Florin Leon

Florin Leon
Advanced Methods in Artificial

Intelligence: Agents, Optimization
and Machine Learning
Habilitation Thesis
March 2014
Table of Contents
Summary ............................................................................................................. 5
Rezumat .............................................................................................................. 9
Part I. Scientific Achievements

Chapter 1. Multiagent Systems with Emergent Behaviour
1.1. Multiagent Role Allocation ................................................................... 15
1.2. Social Networks of Adaptive Agents ................................................. 29
1.3. Behaviours with Different Levels of Complexity .......................... 42
Chapter 2. Agents with Learning Capabilities
2.1. Planning with Quasi-Determined States ......................................... 55
2.2. Reinforcement Learning with State Attractors ............................ 66
2.3. Dual Use of Learning Methods ............................................................ 77
Chapter 3. Quantum-Inspired Evolutionary Algorithms
3.1. Fundamentals of Quantum Computing ........................................... 87
3.2. Binary Encoded Solution for Combinatorial Auctions .............. 88
3.3. Real-Valued Encoded Solution for Multi-Issue Multi-Lateral
Negotiations ............................................................................................... 98
Chapter 4. Applications of Machine Learning Methods

4.1. Prediction of Liquid-Crystalline Properties ...............................
4.2. Stacked Neural Networks for Time Series Forecasting .........
4.3. Stacked Neural Networks for Modelling Chemical
Engineering Phenomena ....................................................................
4.4. Protein Structure Classification ......................................................
Chapter 5. Hybrid Neural and Neuro-Genetic Methods
5.1. Combining Neural Models with Phenomenological Models ..
5.2. Evolutionary Methods for the Determination of the
Architecture and Internal Parameters of a Neural Network ..
5.3. Neuro-Genetic Multiobjective Optimization Using NSGA-II
with Alternative Diversity-Enhancing Metrics ..........................
5.4. Multiobjective Optimization of a Stacked Neural Network
Using an Evolutionary Hyper-Heuristic .......................................
111
118
125
136
141
143
150
153
Part II. Professional Development Plans .............................................. 165
Part III. References ....................................................................................... 175
Summary
The habilitation thesis aims at presenting a selection of research
results obtained in the field of artificial intelligence, mainly dealing with
intelligent agents, multiagent systems, optimization and machine learning
methods.
Concerning multiagent role allocation, a method is proposed by
which the agents can self-organize based on the changes of their individual
utilities. Agents have different preferences regarding the features of the
tasks they are given and their adaptive behaviour is based on the
psychological theory of cognitive dissonance, where an agent working on a
low-preference task gradually improves its attitude towards it. The total
productivity is shown to increase as an emergent property of the system.
Another demonstration of emergent behaviour is based on the design
of an interaction protocol for a task allocation system, which can reveal the
formation of social networks. The agents can improve their solving ability
by learning and can collaborate with their peers to deal with more difficult
tasks. The average number of connections and resources of the agents
follows a power law distribution.
Also, a simple set of interaction rules is proposed that can generate
overall behaviours with different levels of complexity, from asymptotically
stable to chaotic. It is shown that very small perturbations can have a great
impact on the evolution of the system, and some methods of controlling
such perturbations are investigated in order to have a desirable final state.
Another contribution in the subfields of planning and learning is a
method that includes a learning phase into the plan itself, so that the agent
can dynamically recognize the preconditions of an action when the states are
not fully determined, and it can even directly choose its actions based on
learning results.
The notion of state attractor is introduced, which allows the agents to
compute their actions based on the proximity of their current state to the
nearest state attractor. This technique is considered to be an alternative way
of approaching difficult multiagent reinforcement learning problems.
Considering autonomous learning, a system for solving classification
and regression problems is proposed, which involves competition between
5
Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning
different types of agents. They use neural networks to solve external

problems given by the user, but they can also build their own internal
problems out of their experience in order to increase their performance.
Due to the recent research advances in quantum computing, ideas
from this field have been increasingly used as a source of inspiration for
new evolutionary algorithms. Two variants of quantum-inspired
evolutionary algorithms are proposed, characterized by a steady state model
(population-based elitism), a repairing procedure to keep all the individuals
feasible and an evolutionary hill-climbing phase meant to further improve
the quality of the solution. The first one uses binary encoding and the
second one uses real-valued encoding. These algorithms are applied for
solving multi-attribute combinatorial auction problems and for finding nearoptimal outcomes of multi-issue multi-lateral negotiation in multiagent
systems.
Another contribution is the application of classification methods to
chemical engineering problems. One example is the prediction of the liquid
crystalline property of compounds, using methods such as neural networks,
decision trees, instance-based techniques etc. The best results are obtained
with an original algorithm, Non-Nested Generalized Exemplars with
Prototypes, NNGEP.
The classification algorithms are also applied for 27-class problems
and on 4-class reduced problems of protein fold classification.
A modelling methodology based on stacked neural networks is
suggested by combining several individual networks in parallel, whose
outputs are weighted to provide the output of the stack. For time series
forecasting, a stacked neural network is designed containing one normal
multilayer perceptron with bipolar sigmoid activation functions, and the
other with an exponential activation function in the output layer. Other two
chemical engineering phenomena are modelled with stacked neural
networks, and they are shown to outperform individual networks in terms of
generalization capabilities.
The final section of the thesis is concerned with the combination of
different modelling and optimization methods into hybrid ones. In case of
chemical processes which are difficult to model analytically, a hybrid model
is obtained by combining a simplified phenomenological model with a
neural network which approximates the difficult parts of the reaction.
Different types of evolutionary algorithms, such as classical genetic
algorithms, standard differential evolution and self-adaptive differential
evolution are used to determine both the architecture and the internal
parameters of a neural network that models other chemical processes.
Summary
Neural networks can handle multiple outputs corresponding to

multiobjective optimization problems. The Non-dominated Sorting Genetic
Algorithm, NSGA-II, is adapted with alternative metrics such as the set
coverage metric and the spacing metric in order to increase the solution
diversity.
A stacked neural network is designed using an evolutionary hyperheuristic, called NSGA-II-QNSNN, based on the NSGA-II algorithm as a
global optimization method and incorporating a Quasi-Newton algorithm as
a local optimization method for training the neural networks of the stack.
The results prove to be very good not only in terms of accuracy, but also in
terms of structural complexity of the stacks.
Rezumat
Teza de abilitare i propune s prezinte o selecie a rezultatelor de
cercetare obinute n domeniul inteligenei artificiale, legate n principal de
ageni inteligeni, sisteme multi-agent, metode de optimizare i de nvare
automat.
n ceea ce privete alocarea rolurilor n sisteme multi-agent, se
propune o metod prin care agenii se pot auto-organiza pe baza
modificrilor n utilitile lor individuale. Agenii au preferine diferite
referitoare la caracteristicile sarcinilor pe care le primesc, iar
comportamentul lor adaptiv se bazeaz pe teoria disonanei cognitive din
psihologie, n care un agent care lucreaz la o sarcin pentru care manifest
o preferin sczut i mbuntete treptat atitudinea fa de aceasta. Se
arat c o proprietate emergent a sistemului este creterea productivitii
totale.
O alt situaie n care un sistem multi-agent manifest un
comportament emergent este demonstrat prin proiectarea unui protocol de
interaciuni pentru alocarea sarcinilor de lucru, care conduce la formarea de
reele sociale. Agenii i pot mbunti capacitatea de rezolvare prin
nvare i pot colabora cu ali ageni atunci cnd sunt confruntai cu sarcini
mai dificile. Numrul mediu de conexiuni i resurse ale agenilor urmeaz o
distribuie de tip lege de putere.
De asemenea, se propun o serie de reguli simple de interaciune care
pot genera comportamente globale cu diferite niveluri de complexitate, de la
asimptotic stabil la haotic. Se arat c perturbaii foarte mici pot avea un
impact mare asupra evoluiei sistemului i sunt investigate unele metode de
control al acestor perturbaii n scopul de a avea o stare final dorit.
O alt contribuie n subdomeniile planificrii i nvrii este o
metod care include o faz de nvare n planul propriu-zis, astfel nct
agentul poate recunoate dinamic precondiiile unei aciuni atunci cnd
strile nu sunt determinate n totalitate i i poate alege direct aciunile pe
baza rezultatelor de nvare.
Este introdus noiunea de atractor de stri, care permite agenilor s
i calculeze aciunile n funcie de proximitatea strii curente fa de cel mai
apropiat atractor de stri. Aceast tehnic este considerat a fi o abordare

alternativ pentru probleme dificile de nvare cu ntrire multi-agent.
Avnd n vedere nvarea autonom, se propune un sistem pentru
rezolvarea problemelor de clasificare i regresie, care include concurena
ntre diferite tipuri de ageni. Acetia folosesc reele neuronale pentru a
rezolva probleme externe, atribuite de ctre utilizator, dar i pot construi
propriile lor probleme interne, pe baza experienei, n scopul de a-i crete
performanele.
Datorit progreselor recente ale cercetrilor privind calculul cuantic,
idei din acest domeniu pot servi drept surs de inspiraie pentru noi
algoritmi evolutivi. Sunt propuse dou variante de algoritmi evolutivi de
inspiraie cuantic, caracterizate printr-un model de stare de echilibru
(elitism la nivel de populaie), o procedur de reparare pentru a asigura
fezabilitatea tuturor indivizilor i o faz de cutare evolutiv de tip gradient
ascendent menit s mbunteasc suplimentar calitatea soluiei. Primul
folosete codarea binar, iar al doilea folosete codarea cu valori reale.
Aceti algoritmi sunt aplicai pentru rezolvarea unor probleme de licitaie
combinatoric multi-atribut i pentru a gsi rezultate cvasi-optime pentru
negocieri multi-laterale i multi-produs n sisteme multi-agent.
O alt contribuie este aplicarea metodelor de clasificare pentru
probleme de inginerie chimic. Un exemplu este predicia proprietii de
cristal lichid al unor compui folosind metode precum reele neuronale,
arbori de decizie, tehnici bazate pe instane etc. Cele mai bune rezultate sunt
obinute cu un algoritm original care construiete exemplare generalizate
nesuprapuse cu prototipuri, NNGEP.
Algoritmii de clasificare sunt de asemenea aplicai pentru
clasificarea plierii proteinelor n probleme cu 27 de clase i probleme reduse
cu 4 clase.
Este sugerat o metodologie de modelare bazat pe reele neuronale
de tip stiv care combin mai multe reele individuale n paralel i ale cror
rezultate sunt ponderate pentru a furniza ieirea stivei. Pentru prognoza
seriilor de timp, este proiectat o reea neuronal stiv coninnd un
perceptron multistrat normal cu funcii de activare sigmoide bipolare i altul
cu o funcie de activare exponenial n stratul de ieire. Alte dou fenomene
de inginerie chimic sunt modelate cu reele neuronale de tip stiv care
demonstreaz capaciti de generalizare superioare reelelor individuale.
Seciunea final a tezei trateaz combinarea diferitelor metode de
modelare i optimizare n metode hibride. n cazul unor procese chimice
dificil de modelat analitic, un model hibrid este obinut prin combinarea
unui model fenomenologic simplificat cu reele neuronale care aproximeaz
prile dificile ale reaciei.
10
Rezumat
Diferite tipuri de algoritmi evolutivi precum algoritmii genetici

clasici, evoluia diferenial standard i evoluia diferenial auto-adaptiv
sunt utilizate pentru a determina att arhitectura ct i parametrii interni ai
unei reele neuronale care modeleaz alte procese chimice.
Reelele neuronale pot avea mai multe ieiri care corespund unor
probleme de optimizare multi-obiectiv. Algoritmul genetic cu sortare
nedominat, NSGA-II, este adaptat cu metrici alternative, precum metrica
de acoperire a unei mulimi i metrica de spaiere, n scopul de a mri
diversitatea soluiilor.
O reea neuronal stiv este proiectat folosind o hiper-euristic
evolutiv, numit NSGA-II-QNSNN, bazat pe algoritmul NSGA-II ca
metod de optimizare global i incluznd un algoritm cvasi-Newton ca o
metod de optimizare local pentru antrenarea reelelor neuronale ce
compun stiva. Rezultatele se dovedesc a fi foarte bune, nu numai din punct
de vedere al acurateei, ci i al complexitii structurale a stivelor.
11
Scientific Achievements
Chapter 1
Multiagent Systems with

Emergent Behaviour
1.1. Multiagent Role Allocation
Coordination and automatic adoption of roles are important issues for
multiagent systems. Often, coordination mechanisms assume direct or
indirect communication of the agents potential in performing different tasks.
In this study, this problem is addressed from a different perspective, by
using negotiation outcomes for the allocation of tasks defined by attributes,
where the process focuses on agent utility functions regarding these
attributes, without an explicit transmission of agent potentials.
Related Work
The ability of a multi-agent system to dynamically reorganize its structure
and operation at run-time is highly valuable for many application domains.
Therefore, allocating tasks to agents in multi-agent systems is a significant
research issue.
Recently, many approaches have been proposed for task allocation.
Some of them use centralized methods which assume that there is a central
controller to assign tasks to agents (Zheng & Koenig, 2008). The centralized
approach can make the allocation process efficient and effective in a small
network since the central planner has a global view of the system and it
knows which agents are good at which tasks. Compared to this, the
decentralized style is more scalable and robust, but the communication
overhead increases. Some proposed mechanisms form groups of agents
before allocating tasks, which may result in the increase of computation and
communication cost (Lerman & Shehory, 2000).
A popular approach to task allocation is by using negotiation or
auction based mechanisms. Each agent computes a cost for completing a
task and broadcasts the bid for it. The auctioneer agent decides the best
available bid, and the winning bidder attempts to perform the task.
Following the classic contract-net protocol (Davis & Smith, 1983), several
15
variations of this method have been more recently proposed for the control
of unmanned space or aerial vehicles (Lemaire, Alami & Lacroix, 2004;
Thomas et al., 2005). Also, an emergent allocation method for mobile robots
was proposed, where each robot uses only the information obtained from its
immediate neighbours (Atay & Bayazit, 2007).
The Extended Generalized Assignment Problem, E-GAP (Scerri et
al., 2005) studies the assignment of tasks to agents, taking into account the
agents' capacities in order to maximize a total reward. It considers dynamic
domains and interdependencies (possible constraints) among tasks. Beside
the greedy centralized approach to solving such problems, approximate
solutions have been proposed, e.g. algorithms modelling colonies of social
insects, such as SWARM-GAP (Ferreira & Bazzan, 2006).
In cooperative multi-agent systems, roles are used as a design
concept when creating large systems, and they are known to facilitate
specialization of agents. A review of multi-agent role allocation is given by
(Campbell & Wu, 2010).
A negotiation problem is one where multiple agents try to come to
an agreement or deal. Each agent is assumed to have a preference over all
possible deals. Commonly encountered quantitative solutions for the
bargaining problem are, among others, the Nash solution, and the utilitarian
solution.
Learning Models
According to a recent study (Leibowitz et al., 2010), the learning curve L(t)
has an equation relating to the number of successful and failed trials. If S(t)
is the weighted average of success and F(t) is the weighted average of
failure, then:
L(t ) A S (t ) 1 A F (t )
t
L(t ) A p( x)dx 1 A 1 p( x) dx
(1.1)
(1.2)
where A [0,1] and p is a monotonically increasing function of t such that

pn suggests the performance measure at trial n, p0 and p represent the initial
and the asymptotic performance, respectively, and is a constant rate
coefficient:
p(t ) p p p0 e s (t )
16
(1.3)
with 0 p0 p(t ) p 1 .
When coefficient A decreases, the early performance rates increase,
but the late performance rates decrease. When only successful trials are
considered (A = 1), the learning curve has a sigmoid shape, as shown in
figure 1.1 (Leibowitz et al., 2010).
Figure 1.1. The learning curve for p0 = 0.01, p = 0.99, = 0.01

and different values of A
This model was adopted here, taking into account that at the
beginning of the learning period, while discovering the basic concepts, the
effort can be great, but progress is slow. When enough knowledge has been
accumulated, at the middle of the learning curve, progress begins to rapidly
take place. At the end of the learning curve, where expert knowledge
resides, progress is again slow, because complex matters must be addressed.
Description of the Proposed Model
The following subsections (Leon, 2011a; Leon, 2011b) formalize the
definition of tasks that the agents should negotiate for, the adaptation model
of the agents, directly related to the adoption of roles and change in
productivity, and present an evolutionary approach to find a fair,
(near-)optimal task allocation at a given time.
The model can be used to simulate a self-organizing company,
where employees are encouraged to accept tasks themselves, rather than
being assigned, similar to the recommendations provided by the agile
software developing methodologies or modern management practices.
The tasks are considered to be defined by a series of p attributes. A
task has specific values of these attributes, considered to be complexity
17
levels, each within a certain domain. Let T be the set of tasks T {ti } ,
m T and F the set of attributes or features. Each task is defined by the

complexity levels of all the attributes:
ti {c1 , ..., c p } ,
(1.4)
with c j D j , j 1,..., p and p F .

Agents have competence levels l aj associated with each attribute j,
which describe how much knowledge an agent has regarding a particular
feature of a task.
For example, in a software industry environment, attributes can
relate to the use of databases, developing graphical user interfaces, or
working with specific algorithms. A programming task can contain such
features in varying degrees, with different complexity levels. An agent
specialized on databases has a high competence level for the first attribute,
and thus it will have a greater utility for database-related tasks (although
these tasks may also contain issues related to the other attributes, possibly
with lower complexity levels). However, if the commercial interests of the
company determine that other types of tasks should be addressed, the agent
will gradually specialize in other development areas.
The complexity levels refer to only a certain moment in the
evolution of technology. As technology evolves, one can consider that the
competence level naturally decreases, unless a person keeps up with it by
learning.
The model can also address the possible functional limitations of the
agents, i.e. no matter how much an agent tries to become competent for a
certain attribute, it cannot succeed:
aj : laj aj .
(1.5)
It is assumed that an agent has the greatest utility when the

complexity level of an attribute cj approximately matches the competence
level of the agent on that attribute. The utilities of tasks are based on the
valuation of individual attributes the task is composed of. For an agent
a A and a task t T :
p
ua (t ) uaj (c j ) ,
j 1
18
(1.6)
laj c j j
, la aj c j laj
1
j
a
c l j
uaj (c j ) 1 j j a , laj c j laj aj .
a
0, otherwise
(1.7)
The equation for the utility of an individual attribute is adapted after

the shape of a fuzzy triangular number with centre l aj , left width aj and
right width aj . Thus, agents have non-monotonic utility functions.
The reason for choosing this kind of utility function is the following.
If the task is too easy, the agent can solve it with no problems, but it has no
challenge associated with it. If the task is too complex, the agent will have
difficulty achieving it, although it has the chance to improve its own
competence level.
This adapting behaviour of the agents is rooted in the psychological
theory of cognitive dissonance (Festinger, Riecken & Schachter, 1956),
meaning that an agent working in a low-preference (or unpleasant) situation
gradually improves its attitude toward it, in order to reduce the discrepancy
(or dissonance) between the reality and its own painful perspective about it.
The utility of an agent can also depend on certain combinations of
tasks, e.g. an agent cannot handle some tasks simultaneously, or the
interdependencies of several tasks affect its utility:
u a () f (t1 ,...,t s ) ,
(1.8)
where (T ) {} and s .
Evolutionary Approach to Determine Negotiation Outcomes
An evolutionary approach is considered for finding the negotiation
outcomes that are usually believed to be fair or desirable: the Nash solution
and the utilitarian solution.
Let X be a finite set of potential agreements. In our case X contains
combinations of disjoint sets of tasks such that all the tasks are allocated and
each task is allocated to exactly one agent:
19
X S1 ... Sn | Si S , Si S j , i j, i, j A, S1 ... Sn S (1.9)
where S (T ) and n | A | .
An evolutionary algorithm is used to find the desired solutions. The
encoding takes into account the specificity of the problem, i.e. each task
must appear only once in a possible allocation. Therefore, a permutationbased encoding is used. The partition of the permutation between different
agents is defined by n 1 integer genes, with values from 1 to n 1 .
Therefore, a hybrid representation is used: the first genes of the
chromosome are the split points, and the rest contains the actual permutation
of tasks, as shown in figure 1.2.
Figure 1.2. Chromosome encoding and example
The fitness function is the product or the sum of agent utilities for a
given deal, i.e. a chromosome. The crossover and mutation operations are
different for the first part of the chromosome and the second. Normal onepoint crossover is applied to the split point section, while a modified
crossover is applied to the permutation. The procedure for the latter is as
follows: one crossover point is selected, the first part of the first parent is
directly copied into the child, and then the remaining distinct values are
copied in order from the second parent into the remaining loci of the child.
This process is presented in figure 1.3.
Figure 1.3. Permutation crossover (example)
Mutation by resetting a gene is performed to the split points part of

the chromosome. In order to keep the chromosome valid, the split points are
sorted in ascending order. If two split points are equal, a new mutation is
used to make them different, because each agent must be allocated at least
one task. For the permutation part, the interchange mutation is used, i.e. two
20
genes are randomly selected and interchanged. In this way, a new valid
permutation is generated, because no duplicate values appear.
The chosen selection method is the tournament selection with two
individuals, because it is fast and uses only local information. Elitism is
used, i.e. the best individual is directly copied into the next generation, in
order not to lose the best solution of the current generation. However, it was
considered that copying more than one individual would decrease the
genetic diversity needed to find a global optimal solution.
Agent Adaptation: Learning and Forgetting
Following the model described above, the knowledge of the agents traverses
a sigmoid learning curve, defined by the following equation:
L( x )
1
1 e
(1.10)
( x )
Here, parameter controls the steepness of the sigmoid curve in its

central part while parameter translates the learning curve on the abscise in
order to fit the domain of the task attributes.
The main feature of the proposed adaptation process is that it takes
into account the effort of reaching different competence levels, as defined
by the inverse of the sigmoid function:
L1 ( y)
ln(1 y) ln y
(1.11)
Thus, the agents perform equal small steps on the ordinate, and these
reflect into unequal steps on the abscise:
1
j
j
L (1 ) L(la (k )) L(c j ) , la (k ) c j
l (k 1) 1
j
j
L (1 ) L(la (k )) L(c j ) , la (k ) c j
j
a
(1.12)
When an agent receives a task above its competence level, it will

learn using learning rate . The basic idea of moving a small distance
towards the target, which here is the complexity level of a task attribute is
commonly encountered in adaptive methods such as Kohonen selforganizing map (Kohonen, 1982) or Q-Learning reinforcement learning
algorithm (Watkins, 1989). On the other hand, if the agent constantly
21
receives tasks below its competence level, it starts to decrease its

competence in that area with a small forgetting rate . Normally, << .
Duration and Productivity
The duration of solving a task is defined as follows:
|F |
d d aij , i 1.. | T |
(1.13)
d ai max d aij , i 1.. | T |
(1.14)
i
a
j 1
or:
j 1..|F|
The time taken for an agent to handle a specific task attribute with a
certain complexity level is computed as follows:
j
1, l c j
d aij a
j
j
1 L(c j ) L(la ) , la c j
(1.15)
Thus, an agent with a sufficient competence level can solve a task in

a time unit. However, if an agent with a high competence level does so, the
system wastes its resources, because the highly specialized agent could be
used to solve more difficult tasks. The time to solve tasks above ones
competence level also takes into account the effort difference on sigmoid
learning curve multiplied by an amplification factor .
Finally, productivity is defined as the number of solved tasks in a
time period.
Case Studies: Self-Organization of Roles
In order to demonstrate the behaviour of a system following the previously
presented model, we will first consider a situation where 3 agents are given
10 tasks defined by 5 attributes. This scenario was useful to compare the
results given by the evolutionary algorithm to the actual optimal solution
computed by taking into account all 55980 possible assignments in a task
allocation epoch. The total number of possible assignments is: n! S ( m, n) ,
where n is the number of agents, m is the number of tasks and S (m, n) is the
22
Stirling number of the second kind. The domain of all attributes is

D j [0, 9] .
The utility functions are given as such as the first two agents initially
compete over the same type of tasks, and the third agent has different
preferences.
Figure 1.4. The learning curve used by the agents
The learning curve used by the agents is a particularization of

equation 1.10, where 1 and 4.5 , so that the curve should be
positioned symmetrically into the definition domain of the task attributes, as
shown in figure 1.4.
Table 1.1. The attribute values of tasks (case 1)
Task no.
Attribute values
1
2
3
4
5
6
7
8
9
10
74191
38420
70053
27270
03378
86906
99103
86760
02249
50434
Other parameters used are the following: 0.1 , 0.001 ,

10 the population size of the evolutionary algorithm is 50, and the
number of generations for each task allocation epoch is 200. Let us consider
the tasks shown in table 1.1, with their corresponding attribute values.
The negotiation process for the same set of tasks is repeated 100
times. The evolution of agent utilities is presented in figure 1.5. The results
obtained when the negotiation outcome is the Nash solution are displayed in
23
figure 1.5a, and the results for the utilitarian solution are displayed in figure
1.5b.
Figure 1.5. Agent utilities over 100 repeated trials using as a negotiation
outcome: a) the Nash solution; b) the utilitarian solution
One can notice that in both situations the total utilities eventually
stabilize over some value. Using the Nash solution causes more fluctuations,
while using the utilitarian solution causes larger fluctuations earlier and a
faster convergence.
The evolution of attribute utilities of the agents over 100 repeated
trials is displayed in figure 1.6. Since the first two agents have similar initial
attribute preferences, it can be seen that the utility of Attribute 1 is relatively
equal at first and then decreases for Agent 2 while it remains constant for
Agent 1. Similarly, the utility of Attribute 2 remains relatively constant for
Agent 1 and increases for Agent 2. Both agents find new equilibrium states
where they can receive maximum utility by specializing to different types of
tasks.
24
Figure 1.6. Evolution of attribute utilities of the agents

over 100 repeated trials (random tasks)
This effect can be seen more clearly in figure 1.7, where the tasks
given to the agents have a strict specialization: only 2 attributes out of the
total 5 attributes have non-zero complexity levels.
These tasks with their attribute values are presented in table 1.2. In
this case, the evolution of agent utilities varies more drastically in order for
the agents to adapt to a stricter, more competitive environment.
25
Figure 1.7. The evolution of attribute utilities of the agents

over 100 repeated trials (specialized tasks)
Table 1.2. The attribute values of tasks (case 2)

Task no.
Attribute values
1
2
3
4
5
6
7
8
9
10
80600
40900
40500
08060
05040
08040
04070
00065
00085
00084
26
The total productivity of the system is displayed in figure 1.8 and

figure 1.9. The difference between the two figures is the formula for
computing the total duration of a task, using either equation 1.13 or 1.14.
One important thing to underline is that the system productivity
converges to similar values both when using the Nash solution and the
utilitarian solution. The fluctuations in agent utilities over the learning trials
are reflected in the system productivity. Since by using the max function
instead of sum to compose the attribute durations into the task duration the
constraints increase, there are more evident fluctuations in figure 1.9 than in
figure 1.8.
Figure 1.8. The evolution of system productivity

when task duration follows equation 1.13

when task duration follows equation 1.14
The following scenario is used to show the behaviour of the system

when functional limitations are in place: we impose a limitation 11 4 , i.e.
27
Agent 1 can achieve a maximum competence level of 4 for Attribute 1. This

only allows it to take tasks 2, 4, 5 and 9.
Figure 1.10. The evolution of attribute utilities of the agents

with functional limitations
The final allocation when using the Nash solution is therefore:

{2, 4}, {3, 6, 7, 8}, {1, 5, 9, 10} and agents utilities are: {5.11, 8.66, 8.54}.
The evolution of attribute utilities is displayed in figure 1.10, where
one can see the sharp decrease in utility for Agent 1 (figure 1.10a) regarding
Attribute 1.
Scaling
Figure 1.11 shows the productivity evolution with 100 agents and 1000
tasks, when using the Nash solution and the sum function. In this case, the
28
search space is huge, but the evolutionary algorithm can be applied in a

straightforward manner, while allowing more individuals (100) and
generations (1000), at the expense of a greater computing time. Therefore,
the model can easily scale to any (reasonable) number of agents and tasks.
When dealing with more agents, it can be seen that the fluctuations
are reduced, as there are far more possibilities to achieve a good allocation
deal. When the number of agents is small, the initial variation is more
apparent, as agents quickly shift from one role to another. As stated before,
the difference between the Nash and the utilitarian solution also reflect in
the overall system productivity converging to slightly different values.

for 100 agents and 1000 tasks
The increase and stabilization of total productivity is an emergent

property of the system, because the explicit aim of the agents is to adapt and
maximize their own individual utilities, not to solve tasks more quickly. The
shape of the productivity evolution function is very similar to that of
productivity evolution measured in actual human working environments, as
shown for example by (Jovanovic & Nyarko, 1995).
1.2. Social Networks of Adaptive Agents

Complexity theory is a relatively recent research field which mainly studies
the way in which critically interacting components self-organize to form
potentially evolving structures exhibiting a hierarchy of emergent system
properties (Lucas, 2008). Thus, complexity theory includes the new domain
of self-organizing systems, which searches for general rules about the forms
and evolution of system structures, and the methods that could predict the
29
potential organization resulting from changes to the underlying components

(Gabbai et al., 2005).
Currently, many applications of self-organizing systems have been
proposed (Bernon et al., 2006), among which we could mention: the selforganization of broker agent communities for information exchange (Wang,
2002), solutions to constraint satisfaction problems such as the automated
timetabling problem (Picard, Bernon & Gleizes, 2005), flood forecasting
(Georg et al., 2003), land use allocation based on principles of eco-problem
solving (Dury, Le Ber & Chevrier, 1998), and traffic simulation using
holons (Rodriguez, Hilaire & Koukam, 2006). An interesting approach is
the use of the emergent structures given by Conways game of life (Gardner,
1971) to define functional components such as adders, sliding block-based
memories, and even the pattern of a Turing machine (Rendell, 2002).
Emergent Properties
The emergent properties are characteristic of complex systems, which
usually are formed of a relatively small variety of fairly homogenous agents
acting together according to simple local rules of interaction and
communication, with no global or centralized control. Despite their
homogeneity, complex systems are irreducible, because emergent global
properties result from agents operating in a self-organizing manner
(Stepney, Polack & Turner, 2006). Emergent properties are often associated
with properties such as fault tolerance, robustness and adaptability.
Complex Networks
Many complex phenomena are related to a network-based organization.
Complex networks are often encountered in the real world, e.g.
technological networks such as the power grid, biological networks such as
the protein interaction networks or the neural network of the roundworm
Caenorhabditis elegans, and social networks such as scientific collaboration
of human communication networks (Guo & Cai, 2009).
Many real complex networks share distinctive features that make
them different from regular (lattice) and random networks. One such feature
is the small world property, which means that the average shortest path
length between the vertices in the network is small; it usually scales
logarithmically with the size of the network, and the network exhibits a high
clustering effect. This model proposed by (Watts & Strogatz, 1998) was
designed as the simplest possible model that accounts for clustering while
30
retaining the short average path lengths characteristic of (Erds & Rnyi,
1959) model.
A well-known example is the so-called six degrees of separation
in social networks (Milgram, 1967). Another is the scale-free property of
many such networks, i.e. the probability distribution of the number of links
per node P(k) satisfies a power law P(k ) ~ k with the degree exponent
in the range 2 3 (Albert, Jeong & Barabsi, 1999). The World-WideWeb network has been shown to be a scale-free graph (Albert & Barabsi,
2002).
The generation of scale-free graphs can be performed based on two
basic rules (Gaston & DesJardins, 2005): growth (at each time step, a new
node is added to the graph) and preferential attachment (when a new node is
added to the graph, it attaches preferentially to nodes with high degrees).
Model Description
The proposed multiagent system (Leon, 2012a) is composed of a set of
agents A, which are physically distributed over a square grid. However,
this localization does not prevent agents from forming relations with any
other agents, based on the common interest of solving tasks.
Like the general model presented in section 1.1, the tasks have
attributes with complexity levels and agents have corresponding competence
levels.
The tasks are generated in the following way. First, the number of
i
i
, 1 pnn
p , is determined, by using a power
non-null attributes pnn
law or a uniform distribution:
i
P pnn
~
(1.16)
i
P pnn
~ U (1, p) .
(1.17)
i k
nn
or
When using the power law distribution, most of the tasks will have
one or two non-null attributes, therefore they will have a large degree of
specialization. Thus, agents can specialize in performing some type of tasks.
When using a uniform distribution, there will be a larger number of
31
interdisciplinary tasks, therefore the agents will have to interact more in

order to solve them in a cooperative way.
Based on the distribution of non-null attributes, tasks are
continuously generated during the simulation, and the corresponding values
of the non-null attributes are generated from a uniform distribution:
P(c j ) ~ U 1, Lmax ,
(1.18)
where Lmax is the maximum complexity level allowed in a certain

simulation epoch. Later on, we will analyze the situations where Lmax is
constant throughout the whole simulation, or gradually increases.
When an agent receives a task, it first verifies whether the
complexity levels of the attributes are less or equal to its respective
competence levels: c j laj , j 1.. p .
In this case, the agent can solve the task by itself and receives a
monetary payment of:
p
M ai c 2j .
(1.19)
j 1
If the complexity level of a task attribute is greater

corresponding competence level, then the agent can learn in
the proper level. There are two prerequisites. First, each
individual willingness to learn Wal ~ U Wmin , Wmax . The
pass from a competence level to another is given by
probability:
Pal Wa
j
l la
by 1 than the
order to reach
agent has an
probability to
the following
(1.20)
Accordingly, the probability to go from a low level to the next is

much higher than the probability to go from a high level to the next. For
example, if Wal 0.95 , the probability to go from level 1 to level 2 is 0.95,
while the probability to go from level 9 to level 10 is 0.63.
Secondly, a payment is required of the agent for the learning step:
M al c 2j .
32
(1.21)
For this process to work, each agent has an equal initial amount of
money M 0 .
If the agent is fit to solve the whole task following a possible
learning phase, the agent solves it and it receives the same payment as that
described above in equation 1.19.
If the agent is unable to solve a task by itself, it seeks other agents to
solve the parts of the task it cannot handle by itself.
Interaction Protocol and Social Network Formation
The environment randomly distributes a number of tasks fewer than the
number of agents in the system: T A . Since some agents will be able to
solve their tasks, either individually or cooperatively, while others will not,
in the subsequent epochs, tasks will be given preferentially to agents that
previously succeeded more. Thus, each agent has a probability to solve
tasks, defined as:
Tas
P r,
Ta
s
a
(1.22)
where Tas is the number of tasks solved by agent a, and Tar is the total
number of tasks received by agent a.
In this setting, the agents are sorted by their probabilities to solve
tasks, Pas , and only the first T receive new tasks. Ties between agents with
the
s
a
same
Pas
are
broken
randomly.
The
initial
values
are:
r
a
T 1, T 2, a A , and therefore the initial probability to solve is

Pas 0.5, a A .
However, the agents who fail to solve tasks at first become
disadvantaged because they no longer receive new tasks and therefore they
no longer have the direct incentives to improve their competence levels. For
this reason we introduced a perturbation rate R, such that when R = 0, the
tasks are distributed deterministically, as presented above, and when R = 1,
tasks are randomly distributed.
Each agent has a set of connections to other agents, i.e. friends,
initially void. When an agent cannot solve a task by itself, it begins a
breadth-first search on the connection graph. As agents are physically
situated on a lattice, the immediate neighbours are always added to the
33
search queue after the immediate friends. Search is performed without

allowing agents to be visited twice in order to avoid infinite loops.
Every time an agent is not able to solve the originators task either
directly or by learning, its own friends are added to the search queue. In this
way, the entire graph is ultimately explored, and the closest agents are
always selected to solve the tasks, since breadth-first search is complete and
optimal. We ignore the exponential memory requirements of the algorithm,
because we consider that this aspect is irrelevant to our study; the graph
itself has quite a small depth, similar to the networks that exhibit the small
world phenomenon.
Once an agent is found which can solve one or more attributes of a
task, and only when the whole originators task can be solved, the agents on
the path from the originator to the solver are sequentially connected. Finally,
the originator and the solver are directly connected.
Several aspects must be underlined: there is no limitation to the
number of tasks an agent can solve in one round. It may have only one
individually assigned task, but several task attributes received as
subcontracting. The originator agent u receives a commission rate for the
subcontracted task attributes, and the solver agent v receives the rest:
M u' M u c 2j ,
(1.23)
M v' M v 1 c 2j ,
(1.24)
where is the set of task attributes transferred to agent v for solving.

There is no payment for the intermediary agents, and this is one of
the main differences between this approach and the classical contract net
protocol (Smith, 1980). In this study, we consider that the most important
feature of the social network is to transfer information, not actual solutions
for tasks accompanied by monetary payments.
Once a task is solved, it is considered to be solved by the originator
agent u, whose Pus increases. Every connection, once established, is active a
limited number of epochs T after which it disappears. If an agent does not
succeed in solving a task during one epoch, and does not receive another
one in the next epoch, it can keep its task for a maximum number of epochs
C . If an agent cannot eventually solve a task, its Pus consequently
decreases.
34
Case Studies: Parameter Influence on System Behaviour

In this section, the results of some simulations are presented, which
illustrate the behaviour of the multiagent system. We focus on the common
patterns that emerge from the social network created by the system in
execution, while varying some of the parameters of the simulation.
For all the case studies, the following parameters remain unchanged.
The initial amount of money available to the agents M 0 100 . The limits
of the willingness to learn are: Wmin 0.9 and Wmax 1 . The number of
epochs while a connection is active after being created is T 5 . The
maximum number of epochs while an agent can keep a task without being
able to solve it is C 3 . The total number of tasks distributed during every
epoch is T A / 2 .
Figure 1.12. The evolution of the social network after

10, 100, 350 and 500 epochs when the environment is static
Figure 1.12 shows the evolution of the social network after 10, 100,
350 and 500 epochs, from left to right, top to bottom, respectively. The
35
maximum complexity level Lmax 10 remains constant throughout the

simulation. As it can be seen, the number of agents is 400, arranged on a
grid of size 20 x 20. The perturbation rate R = 0, the commission = 0.1 and
the non-null attributes follow a power law distribution (equation 1.16) with
exponent k = 2. One can notice that at the beginning local connections form
on short physical distances on the grid. Gradually, these small clusters
connect, and global networks emerge. Depending on the initial conditions
(the actual values of task complexity levels, agent competence levels, and
the first distribution of tasks), which are randomly generated and can vary
from a simulation to another, one global network or two/few networks can
appear. Their number also greatly depends on the number of agents
(reflected on the size of the grid). After a great increase, the number of
connections gradually decreases, because the agents tend to learn and adapt
to the unchanging environment, depicted by a constant Lmax . Thus, as agents
increase their individual competence levels, they no longer need other
agents to solve their tasks. Finally, when most of the agents become
independent, the number of active connections approaches 0.
Figure 1.13. The status of the social network after 100 epochs for
different numbers of agents: 100, 256 and 2500
36
Figure 1.13 shows the status of the multiagent system after 100
epochs, with the same parameters, when the number of agents varies (100,
256 and 2500), in order to demonstrate the scaling behaviour of the system.
One can see that the formation of global social network(s) is similar. Also,
the evolution of the system as the number of epochs increases is the same:
the agents enhance their competence and no longer need their peers, and
therefore the connections go through a gradual dissolution process.
Figure 1.14. The status of the social network after 500 epochs
when the environment is dynamic
This tendency of the system is altered when the maximum

complexity level of the tasks is no longer constant, but grows as the
simulation proceeds. Figure 1.14 shows the status of the multiagent system
after 500 epochs, when the environment is dynamic and Lmax constantly
increases: Lmax 10 N epochs / 20 . Because the competence levels of the
agents no longer become saturated, the shape of the social network is
maintained throughout the simulation and its dissolution no longer takes
place.
This situation is more similar to the real world problem of task
allocation, where the environment continuously changes and individuals
must change as well in order to stay adapted and solve the increasingly
competitive tasks they are faced with. It is this individual change and
adaptation itself that causes the dynamism of the environment, due to the
interconnectedness of the individuals.
Figure 1.15 shows the social network of the agents after 100, 350
and 500 epochs, respectively, when the environment is not dynamic
(Lmax = 10 and remains constant), and the perturbation rate is R = 0.5.
Although more connections are formed, the agents still tend to saturate their
competence levels. However, the random perturbation involves more agents
in the task-allocation process, and therefore, connections still remain active
37
after 500 epochs, but in a different, mostly neighbour-based configuration,

because it becomes increasingly likely that a competent agent will be found
in the physical neighbourhood of an agent.
Figure 1.15. The evolution of the social network after 100, 350 and 500 epochs
when the environment is static and the perturbation rate R = 0.5
In the following, we will analyze some of the statistical properties of

the multiagent system, taking into account the way in which the average
number of connections, the average wealth, and the average competence
levels of the agents vary during 500 epochs of the simulation.
In figure 1.16, the evolution of the average number of connections is
displayed, when the type of environment and the type of task attribute
generation differs: evolving (dynamic) or non-evolving (static) for the
former, and power law or uniform distribution for the latter. The shape of
the function is similar in all four cases, with a sharp increase toward the
beginning of the simulation. As it was shown above, the uniform
distribution delays the formation of the maximum sized network, however,
when the network is formed, it has a greater number of connections. As the
system converges, the average number of connections in the evolving
setting is greater than for the power law distribution. Also, for the non38
evolving case, the average number of connections decreases more quickly

than in the power law distribution scenario.
Figure 1.16. The evolution of the average number of connections
These average values show that only a few agents have a great
number of connections, while most of the agents have one or no active
connections at all. In fact, the histogram of the number of agents having a
certain number of connections reveal a power law distribution, as shown in
figure 1.17. The ordinate axis (the number of agents) has a logarithmic
scale. The abscise axis shows the five equal size intervals ranging from 0 to
the maximum number of connections.
1000
100
10
1
1
Figure 1.17. Power law distribution of the number of connections
Regarding the evolution of the average competence levels, figure

1.18 displays a typical situation. It is important to note that the shape of this
function is approximately the same in all scenarios.
39
3.5
3
2.5
2
1.5
1
0.5
0
1
101
201
301
401
Figure 1.18. The evolution of average competence levels
An interesting fact was observed, that the willingness to learn Wal

is not crucial to become a top agent. There are best agents with the
greatest competence levels and the most money, which do not have Wal 1 ,
but even a value close to Wmin . It seems that the most important fact is the
number of opportunities to learn (by receiving tasks directly or indirectly,
from other agents in its social network), which forces the agent to eventually
learn and become the most competent.
Regarding the average wealth of the agents, which results from the
monetary payments for solving tasks, a similar distribution is encountered:
there is one (or very few agents) on top, with a large amount of money, and
most agents are on the bottom of the scale. Since all the agents start with an
equal wealth, at some point in the simulation the ratio reverses, a
phenomenon displayed in figures 1.19 and 1.20. The figures show the
histogram with five intervals, from 0 to the maximum amount of money,
and the bars show on the ordinate the number of corresponding agents on a
logarithmic scale. At the beginning (time frame 1, but not epoch 1), most
agents are in internal I5, close to the maximum. During only a few epochs,
the majority of the agents loses their relative wealth compared to the top
agents, and move into interval I1, close to the minimum. The phenomenon
takes place more quickly when the commission 0.1 (figure 1.19), than
when the commission 0.95 (figure 1.20). Therefore, a higher
commission leads to a more egalitarian distribution of the wealth in the first
phase of the simulation, although in the end, the results are similar.
40
1000
I1
100
I2
I3
I4
10
I5
1
1
Figure 1.19. The change histogram of the average wealth

with a commission rate = 0.1
1000
I1
100
I2
I3
I4
10
I5
1
1
9 10 11 12 13 14 15 16 17
Figure 1.20. The change histogram of the average wealth

with a commission rate = 0.95
It is important to note that these figures show only relative values.

The absolute value of the money actually increases for all the agents
involved.
Finally, we analyze the efficiency of the overall task allocation
system, defined as the average probability to solve tasks Pas of all the agents
in the system.
In figure 1.21, one can notice that the efficiency is slightly larger
when the task attributes are generated using a power law, compared to the
uniform distribution. This is because tasks generated in this way are
easier, as they require less collaboration or learning for solving them.
41
Figure 1.21. The evolution of the overall system efficiency with different
distributions for task attribute generation
1.3. Behaviours with Different Levels of Complexity

Chaos has been extensively studied in physical systems, including methods
to control it for uni-, bi- and multi-dimensional systems (Boccaletti, 2000).
Also, concepts such as causality and the principle of minimal change in
dynamic systems have been formalized (Pagnucco & Peppas, 2001).
Many human-related e.g. social or economic systems are nonlinear,
even when the underlying rules of individual interactions are known to be
rational and deterministic. Prediction is very difficult or impossible in these
situations. However, by trying to model such phenomena, we can gain some
insights regarding the fundamental nature of the system. Surprising or
counterintuitive behaviours observed in reality can be sometimes explained
by the results of simulations.
Therefore, the emergence of chaos out of social interactions is very
important for descriptive attempts in psychology and sociology (Katerelos
& Koulouris, 2004), and multiagent systems are a natural way of modelling
such social interactions. Chaotic behaviour in multiagent systems has been
investigated from many perspectives: the control of chaos in biological
systems with a map depending on growth rate (Sol et al., 1999), the use of
a chaotic map by the agents for optimization (Charrier, Bourjot &
Charpillet, 2007) and image segmentation (Melkemi, Batouche & Foufou,
2005), or the study of multiagent systems stability for economic applications
(Chli et al., 2003). However, in most of these approaches, chaos is explicitly
injected into the system, by using a chaotic map, e.g. the well-known
logistic map, in the decision function of the agents.
42
The main goal of this study (Leon, 2013) is the design of simple
interaction rules which in turn can generate, through a cascade effect,
different types of overall behaviours, from stable to chaotic. We believe that
these can be considered metaphors for the different kinds of everyday social
or economic interactions, whose effects are sometimes entirely predictable
and can lead to an equilibrium while some other times fluctuations can
widely affect the system state, and even if the system appears to be stable
for long periods of time, sudden changes can occur unpredictably because of
subtle changes in the internal state of the system. We also aim at
investigating how very small changes can non-locally ripple throughout the
system with great consequences and if it is possible to reverse these changes
in a non-trivial way, i.e. by slightly adjusting the system after the initial
perturbation has occurred.
The Design of the Multiagent System
The main goal in designing the structure and the interactions of the
multiagent system was to find a simple setting that can generate complex
behaviours. A delicate balance is needed in this respect. On the one hand, if
the system is too simple, its behaviour will be completely deterministic and
predictable. On the other hand, if the system is overly complex, it would be
very difficult to assess the contribution of the individual internal elements to
its observed evolution. The multiagent system presented as follows is the
result of many attempts of finding this balance.
The proposed system is comprised of n agents; let A be the set of
agents. Each agent has m needs and m resources, whose values lie in their
predefined domains Dn, Dr +. This is a simplified conceptualization of
any social or economic model, where the interactions of the individuals are
based on some resource exchanges, of any nature, and where individuals
have different valuations of the types of resources involved.
In the present model, it is assumed that the needs of an agent are
fixed (although an adaptive mechanism could be easily implemented, taking
into account, for example, previous results (Leon, 2011a; Leon, 2011b)
reported in section 1.1), that its resources are variable and they change
following the continuous interactions with other agents.
Also, the agents are situated in their execution environment: each
agent a has a position a and can interact only with the other agents in its
neighbourhood a. For simplicity, the environment is considered to be a bidimensional square lattice, but this imposes no limitation on the general
interaction model it can be applied without changes to any environment
topology.
43
Social Model
Throughout the execution of the system, each agent, in turn, chooses
another agent in its local neighbourhood to interact with. Each agent a stores
the number of previous interactions with any other agent b, ia(b), and the
cumulative outcome of these interactions, oa(b), which is based on the
profits resulted from resource exchanges, as described in the following
section.
When an agent a must choose another agent to interact with, it
chooses the agent in its neighbourhood with the highest estimated outcome:
b* arg max oa (b) .
b a
The parallelism of agent execution is simulated by running them

sequentially and in random order. Since one of the goals of the system is to
be deterministic, we define the execution order from the start. Thus, at any
time, it can be known which agent will execute and which other agent it will
interact with. When perturbations are introduced into the system, the same
execution order is preserved. It has been shown that the order of
asynchronous processes plays a role in self-organisation within many multiagent systems (Cornforth, Green & Newth, 2005). However, in our case this
random order is not necessary to generate complex behaviours. Even if the
agents are always executed in lexicographic order (first A1, then A2, then
A3 etc.), sudden changes in utilities still occur, although the overall aspect
of the system evolution is much smoother.
Bilateral Interaction Protocol
In any interaction, each agent tries to satisfy the needs of the other agent as
well as possible, i.e. in decreasing order of its needs. The interaction
actually represents the transfer of a resource quantum from an agent to the
other. Ideally, each agent would satisfy the greatest need of the other.
For example, let us consider 3 needs (N) and 3 resources (R) for 2
agents a and b: Na = {1, 2, 3}, Nb = {2, 3, 1}, Ra = {5, 7, 4}, Rb = {6, 6, 5},
and = 1. Since need 2 is the maximum of agent b, agent a will give b 1 unit
of resource 2. Conversely, b will give a 1 unit of resource 3.
In order to add a layer of nonlinearity, we consider that an exchange
is possible only if the amount of a resource exceeds a threshold level and
if the giving agent a has a greater amount of the corresponding selected
resource rsel than the receiving agent b: Ra (rsel ) Rb (rsel ) and Ra (rsel ) .
44
In the previous situation, if we impose a threshold level = 5, agent

a will still give b 1 unit of resource 2, but b will only satisfy need 1 for
agent a.
Based on these exchanges, the resources are updated and the profit
pa is computed for an agent a as follows:
pa N a (rsel )
Rb (rsel )
.
Ra (rsel )
(1.25)
A bilateral interaction can bring an agent a profit greater or equal to

0. However, its utility should be able to both increase and decrease. For this
purpose, we can compute a statistical average of the profit, pavg, and increase
the utility of an agent if the actual profit is above pavg, and decrease the
utility if the profit is below pavg.
Thus, the equation for updating the utility level of an agent a is:
ua
u a iaadj pa pavg
iaadj 1
(1.26)
where the adjusted number of interactions is: iaadj min bA ia (b), imem ,
imem is the maximum number of overall interactions that the agent can
remember (i.e. take into account) and is the rate of utility change. At the
beginning, the utility of the agent can fluctuate more, as the agent explores
the interactions with its neighbours. Afterwards, the change in utility
decreases, but never becomes too small.
For example, if imem = 20, ua = 0.1, pa = 8.5, = 1, pavg = 7.5 and the
sum of all previous interactions is 2, the utility will change to:
ua = (0.1 2 + (8.5 7.5) 1) / 3 = 0.4. If the sum of all previous
interactions is 100, the same utility will change only to:
ua = (0.1 20 + (8.5 7.5) 1) / 21 = 0.14.
Similarly, the social outcome of an agent a concerning agent b is
updated as follows:
oa (b)
oa (b) ia (b) pa pavg

ia (b) 1
(1.27)
In this case, the social model concerns only 1 agent and thus the use
of the actual number of interactions can help the convergence of the
estimation an agent has about another.
45
Regarding the computation of the average profit, we used a

statistical approach where we took into account 100 continuous interactions
between two randomly initialized agents, which exchange resources for
100000 time steps. The average profit depends on the number of resources,
their domain and the interaction threshold.
Types of Behaviours
A key challenge in applied dynamical systems is the development of
techniques to understand the internal dynamics of a nonlinear system, given
only its observed outputs (Dawes & Freeland, 2013). As the observed output
of our multiagent system, we consider only the agent utilities. We can view
this output as a discrete time series, one for each agent. In the following, we
analyse the evolution of these time series over time. Since there is no
stopping condition for the agent interactions, we restrict our study to a
predefined, finite time horizon, e.g. 1000, 2000 or 10000 time steps.
Depending on the number of agents and the initial state of the
system, several types of behaviours can be observed:
Asymptotically stable: When only 2 agents exist in the system, we

noticed that they can perform an indefinite number of interactions.
They can stabilize to a continuous exchange of resources, possibly
the same resource in both cases ( units of the same resource are
passed back and forth between the 2 agents). With 2 agents, the
system quickly converges to a stable state (figure 1.22). Depending
on the initial state, a stable state can also be reached by some agents
in a system with multiple agents. The typical behaviour in the latter
case is a high frequency vibration around the value of convergence.
However, it is also possible that multiple agents all converge to
stable states and the system remains in equilibrium afterwards;
Quasiperiodic: With more interacting agents in the system, usually
their utilities no longer converge to a stable value. Instead, the values
belong to a certain range, with few, predictable oscillations around
the mean value. Figure 1.23 shows the evolution of the utility of 4
agents over 10000 time steps. In order to smooth out short-term
fluctuations and highlight longer-term trends, a simple moving
average method is used, with a window size of 10 time steps;
Chaotic: With a high number of agents (e.g. over 10), the
complexity of their interactions usually exceeds the deterministically
predictable level. The utilities of some agents widely fluctuate, even
46
after the initial period where a part of the system approaches a stable
zone. Figure 1.24 displays the behaviour of 100 agents over 10000
time steps. A simple moving average is applied here again, with a
window size of 100 time steps. One agent (with a utility value
around 3) has unpredictable great changes, although they appear to
be governed by a higher-level order of some kind. Another agent has
a sudden drop in utility around time step 9000, although it has been
fairly stable before.
We consider that the third type of behaviour is chaotic, since it
satisfies the typical features of chaos (Ditto & Munakata, 1995):
Nonlinearity: Given the nonlinearity caused by the minimum

threshold for resource exchange, the system can be viewed as a
hybrid one, with transitions between different ways of operation.
Also, the maximum of the social outcome can change, thus an agent
can interact with different neighbours, which results in different
profits and further changes to the social outcomes;
Determinism: Apart from the random initialization of the agent
parameters (which nevertheless can be controlled by using the same
seed for the random number generator), all the interaction rules are
deterministic;
Sensitivity to initial conditions: As we will later show in the
experimental study, very small changes in the initial state of the
system can lead to a radically different final state;
Sustained irregularity and mostly impossible long-term predictions:
These are also characteristic of observed behaviours.
Regarding the effect of small perturbations, which in general can be

used to control a chaotic system, out of many runs under different
configurations, we noticed that a perturbation can affect the overall system
behaviour in more ways:
No effect within a predefined time horizon: depending on the agent

positions, the system state and the place where the perturbation
occurs, some changes can have no effect at all;
A temporary effect which is later cancelled out within the time
horizon;
A permanent effect which reflects in the final state of the system,
within the predefined time horizon.
47
Figure 1.22. Asymptotically stable behaviour 2 agents, 1000 time steps
Figure 1.23. Quasiperiodic behaviour 4 agents, 10000 time steps
Figure 1.24. Chaotic behaviour 100 agents, 10000 time steps
48
We can make a parallel between these kinds of effects and the

choices we make in everyday life. Out of the many alternatives that we
have, a different choice can have no effect or sometimes we may not know
that something is different until a later time when the different choice
becomes relevant. Other times, a different choice impacts our environment
immediately. Even if something changes, the overall environment can
eventually reduce the perturbation, or the system can toggle to a whole
different state indefinitely. All these kinds of behaviours have been
observed in the designed multiagent system.
We can measure the degree of chaos introduced by a perturbation by
considering the difference between the changed system and the original
system as a time series, and computing the largest Lyapunov exponent
(LLE) of the variation in an agent utility. Basically, LLE describes the
predictability of a dynamical system. A positive value usually indicates that
the system is chaotic (Rothman, 2006). There are methods, e.g. (Wolf et al.,
1985), which compute the LLE from the output of the system regarded as a
time series.
10
5
LLE = 4.83
LLE = -2.13
201
401
601
801
LLE = -12.98
-5
-10
Figure 1.25. Chaotic and non-chaotic variations
Figure 1.25 displays three situations. The variation with a positive

LLE (4.83) can be considered to be chaotic. We can notice the sudden
change in utility after the half of the simulation, although the perturbation
has occurred in the first time step. A small negative LLE (2.13) indicates
an almost deterministic behaviour, which can correspond to a quasiperiodic
variation. Finally, a high negative LLE (12.98) indicates a deterministic
behaviour, when the time series converges to a value and remains stable
there. Positive LLEs are not only found in some utility variations, but also in
some of the original utility evolutions, depending on the system initial state.
49
Experimental Studies
A mathematical analysis of a nonlinear hybrid system is usually very
difficult. Therefore, in the following, we will present an empirical
experimental study, where we will emphasise different cases or settings
which reveal certain types of behaviour.
Since one of the characteristics of a chaotic system is that small
changes in its initial state can greatly affect the final state through a cascade
effect, we observe the influence of perturbations on the system behaviour.
We also reflect on the question of when it is possible to correct some
distortions with the smallest amount of external energy, such that, after a
perturbation, the system should reach again a desired state within a
corresponding time horizon, through small changes.
In all the case studies presented in this section, the following
parameters were used: the number of agents n = 10, the number of needs
and resources m = 10, their domains Dn = Dr = [0, 10), the resource transfer
quantum = 1, the resource exchange threshold = 5, the interaction
memory imem = 20, the utility change rate = 2, the side length of the agent
square neighbourhood is 4 and the computed average profit pavg = 7.5.
Original Behaviour
The configuration under study is composed of 3 subgraphs (figure 1.26):
one agent, A1, is isolated and cannot interact with any other agent. Two
agents, A2 and A3, form their own bilateral subsystem and seven agents can
interact with one another in their corresponding neighbourhoods. A change
in any of those agents can affect any other one in this subgraph, because, for
example, A4 can influence A7, A7 can influence A9 and A9 can influence
A10. The evolution of the agent utilities for 2000 time steps is displayed in
figure 1.27.
Figure 1.26. The positions of the agents

50
Figure 1.27. The original evolution of agent utilities with no perturbation
Figure 1.28. The consequences of a perturbation of 0.1 in resource 5 of agent A3
Figure 1.29. The consequences of a perturbation of 10-5

in resource 6 of agent A6
51
The Effect of Small Perturbations

In this section, we observe the evolution of the utilities when a very small
perturbation is added to or subtracted from a resource of an agent. Figure
1.28 shows the difference between the changed behaviour due to the
presence of the perturbation and the original behaviour seen in figure 1.27,
with a slightly larger perturbation of 0.1, and when the agents execute in
lexicographic order. Figure 1.29 shows this difference for a perturbation of
only 10-5 and when agents execute in a predefined random order. With 10
agents and 10 resources, this corresponds to a 10-7 change in the initial
system state. We can see that, in general, the smaller a perturbation is, the
longer it takes for its effect to accumulate and impact the observed
behaviour of the system.
The actual number of an agent or resource is not very important, as
we study the overall performance of the system. However, one can notice
that the effects are non-local, and a change in one agent can affect other
agents in its subgraph. Also, even if the perturbation has occurred in the first
time step, big differences can appear later on, after 686 and 1873 time steps,
respectively.
Perturbation Correction
Given a perturbation in the initial time step with a non-null effect on the
system, we are interested in finding a way to cancel or greatly reduce its
impact, as observed on the time horizon and even beyond. Since this
correction must be done from outside the system, and consists in changing
the amount of a resource of an agent, it is also important that we find the
minimum (or a small) amount of change needed to return the system to its
final state as it would have been with no perturbation.
We would also like to find flexible solutions. A trivial solution
would be to reverse the perturbation in the first time step. However, it is
more interesting to see if there can be changes in later steps of the
simulation which can tackle the effect of the initial perturbation.
Because the effects of change are non-local and can propagate
throughout the subgraph of an agents neighbours, we have applied, so far,
the following search methods:
Exhaustive search with one correction point: trying all the resources
of all the agents in each step of the simulation, adding or subtracting
a small amount (e.g. 0.1, 0.5), and observing the maximum utility
52
variation in the final state of the system. If this maximum variation is

below a desired threshold (e.g. 1), then a solution has been found;
Random corrections with one or multiple points: considering 1 or
more (e.g. 3) sets of quadruples (agent, resource, simulation step,
correction amount) which inject changes into the simulation, and
seeing if the final state of the system matches the final state in the
original setting. The random search is by far faster than exhaustive
search, but it cannot tell if any solution exists at all.
Figure 1.30. A perturbation correction with an amount of 0.5 in resource 2 of

agent A4 in step 70, leading to a maximum difference of 0.48 utility units from
the original final state within the test period of 100 time steps
Beside considering only the state of the system at the time horizon
(e.g. 2000 time steps), it is also important to verify if the system behaviour
continues to be desirable. Figure 1.30 shows the effect of a 1 point
correction for the situation presented in figure 1.28, which remains stable
for a test period of 100 more time steps after the initial 2000 ones. However,
if the system is chaotic, it is impossible to guarantee that this difference will
remain small forever.
53
Chapter 2
Agents with Learning Capabilities

2.1. Planning with Quasi-Determined States
One of the fundamental issues of agent-based design is how to choose the
optimal action, or how to achieve an optimal sequence of actions, a plan.
Traditionally, a great deal of work in artificial intelligence was devoted to
planning algorithms.
Many algorithms have been devised to solve classical planning
problems. Besides the straightforward state-space search, with the forward
(progression) and backward (regression) search methods resembling the
conventional problem solving techniques, we can mention the partial-order
planning (POP) algorithms, whose main idea is to establish partial
sequences of actions into a plan, without specifying an ordering between
these sequences beforehand. Within the partial action sequences, a set of
ordering constraints is defined.
Faced with scalability issues, several planning algorithms emerged,
such as: Graphplan (Blum & Furst, 1997), SATPLAN (Kautz & Selman,
1996) and RAX (Johnson et al., 2000).
A critique to the AI establishment was made by Rodney Brooks
(Brooks, 1991a; Brooks, 1991b), who considers that traditional AI planners
are disembodied entities, and not physically situated in their execution
environment, as it is the case with agents. He considers that the abstraction
of representation can be misleading, and overly simplifies the real-world
problems. Brooks proposed the reactive subsumption architecture, stating
the fact that the intelligence of an agent results from the continuous
interaction between the agent and its environment. He considered a set of
behaviour levels, from the low-level, critical reactions to more abstract ones,
and claims that intelligent behaviour is an emergent property of the
interaction of these simpler behaviours.
However, although reactive architectures are well-suited for certain
problems, they are less suited to others (Jennings, Sycara & Woolridge,
1998). Therefore, there is a need to investigate hybrid architectures that
combine the best of the two extremes.
55
Planning with Quasi-Determined States

The classical planning methods use a predicative logic representation for the
states. For example, if a robotic agent has a plan of taking an apple off the
table and putting it into a basket, a typical plan would use a predicate such
as Apple(a) to describe this object.
However, in a real-life situation, if more objects were placed on the
table, it could be difficult for the agent to recognize the apple out of them.
Another difficulty of the agent would arise if the apple on the table would
have a non-standard appearance, in terms of size, shape or colour. In this
case, the preconditions of actions, that describe the states in which the agent
has to make a decision, are not fully determined. We call such states quasidetermined states (Leon, 2010).
In order to address the issue of reactive actions, we propose that
besides recognizing the preconditions of an action, classification can be
used to directly map a state to an action. A training dataset can be used to
choose the appropriate action in a state instead of a conditional planning
method. As an example, we can mention the classical weather dataset, that
decides if someone should play golf or not, based on attributes such as
temperature, humidity and outlook, or the decision to give a loan to a person
based on his/her marital status, age and income. The applications of
classification are presently extensive in real-life situations, such as deciding
the optimal treatment of patients, classifying electronic payments as
legitimate or not, speech and writing recognition, etc. All these and similar
ones could be part of a longer plan of an agent.
Another advantage of using learning for the quasi-determined states
is that the agent can adapt in real-time to the changing conditions of its
execution environment.
There are many inductive learning algorithms that address the
problem of classification. We can mention three main classes of such
algorithms: decision trees, which provide an explicit symbolic result, similar
to the rules on which humans base their conscious reasoning process,
instance-based, similar to the way humans recognize objects or situations
based on analogy to previously encountered ones, and Bayesian, similar to
the way in which humans make decisions based on probabilities or
frequency of event occurrence. Another well-known technique that can be
used for classification is the sub-symbolic, neural network approach.
Synthetically, figure 2.1 shows the two situations where an inductive
learning phase can be included into a plan. Action 1 is a classical action
within a plan. Action 2 has its preconditions determined by classification.
56
Action 3 is dynamically determined by a supervised procedure. ES, the set of

effects of Start pseudo-action represents the initial state of the problem. PF,
the set of preconditions of the Finish pseudo-action, represents the goal of
the problem. Pi designate the preconditions of an action and Ei designate the
effects.
Figure 2.1. A plan with quasi-determined states and learning phases
Case Study: The Mountain Car Problem

As an example of how learning and quasi-determined states can be
incorporated into a planning mechanism for an intelligent agent, we will
consider the Mountain Car problem. It was originally presented by Andrew
Moore in his PhD dissertation (Moore, 1990) and later Sutton and Barto
included it as an exercise in their well-known introductory book to
reinforcement learning (Sutton & Barto, 1998).
The task requires an agent to drive an underpowered car up a steep
mountain road. Since gravity is stronger than the engine of the car, even at
full power the car cannot accelerate up the steep slope (RL-Community,
2010). The movement of the car is described by two continuous output
variables, position x [1.2, 0.5] and velocity v [0.07, 0.07] , and one
discrete input representing its acceleration a. The acceleration is therefore
the action that the agent chooses, and it can be one of the three available
discrete options: full thrust forward (1), no thrust (0) and full thrust
backward (1).
Recently, a 3D version of the problem has been proposed, which
extends the standard 2D variant, where the states are described by 4
continuous variables (Taylor, Kuhlmann & Stone, 2008).
The mountain curve on which the car is moving is defined as:
h = sin(3x). According to the laws of physics, the discrete-time state space
equations of the system are:
57
vt 1 vt 0.0025 cos(3xt ) 0.001 at
xt 1 xt vt 1
(2.1)
where at {1, 0, 1} .
Figure 2.2 presents the setting of the mountain car problem, adapted
after (Singh & Sutton, 1996; Naeeni, 2004).
Figure 2.2. The mountain car problem
Both state variables are kept in the defined range, i.e. all values
above or below the boundaries will be set to their extreme values. When the
position x is equal to the extreme left boundary 1.2, the velocity v is set to
0. The goal, the top of the mountain, is located at x = 0.5.
The problem is particularly interesting because, in order to reach its
goal, the car must gain enough kinetic energy by accelerating in alternating
directions, backward or forward. It must first drive backward, up the other
side of the valley, to gain enough momentum to drive forward up the hill. It
will therefore move away from the goal at first in order to find a working
solution. Also, the states of the problem defined by position and velocity are
continuous, real-valued, and this causes an additional difficulty for a
reinforcement learning algorithm dealing with discrete states. Finally,
because of the external factor, gravity, and the momentum of the car, the
actions may not have the same results in similar states.
58
Figure 2.3. The behavior of the agent using the nave heuristic for different
starting positions
Nave Heuristic
First, we can verify the assumptions of the problem when the agent uses a
nave heuristic, i.e. it maintains the acceleration forward (a = 1) at all times.
Figure 2.3 shows the behaviour of the system for different initial positions.
59
Figure 2.4. The behaviour of the agent using the simple heuristic
for different starting positions
When the initial position is on the top of the mountain on the

opposite direction from the goal, the car momentum is enough to climb the
mountain side in order to reach the goal. The momentum is enough until the
initial position becomes 0.84. In this case, the nave heuristic proves its
limitation because the car will become engaged in an oscillatory movement
over the alternating sides of the valley. Getting closer to the goal, only the
60
initial position of 0.39 or greater will be enough to reach the goal directly,
using forward acceleration.
Simple Heuristic
Taking into account the characteristics of the problem, we can devise a
simple heuristic that ensures the fact that the goal is reached every time. The
heuristic tries to make maximum use of the gravitational force: the
acceleration of the car is the sign of its speed. Figure 2.4 shows the
behaviour of the system for different initial positions in this case. One can
see that even when the initial position x [0.84, 0.38] , the agent reaches
its goal after several amplifying oscillations.
Reinforcement Learning Solution
The simple heuristic presented above does not solve the problem in an
optimal manner, i.e. with a minimum number of time steps. The problem
was originally designed to be solved with reinforcement learning
algorithms, so we employ such a technique to find shorter plans for the
agent.
Model-free reinforcement learning algorithms using temporal
differences such as Q-Learning (Watkins, 1989; Watkins & Dayan, 1992) or
State-Action-Reward-State-Action, SARSA (Rummery & Niranjan, 1994)
need to discretize the continuous input states of the problem. The Q
function, used to determine the best action to be taken in a particular state, is
defined as Q : S A and is usually implemented as a matrix containing
the real-valued rewards r given by the environment in a particular state
s S when performing a particular action a A . The mountain car problem
is also difficult for a reinforcement learning algorithm because all the
rewards are 1, with the exception of the goal state where the reward is 0.
Therefore, the agent becomes aware of a higher reward only when it finally
reaches the goal.
For the following tests, a Matlab implementation of SARSA
algorithm (Martin, 2010) was used. For the initial positions where the first
approach began to fail, and also for the initial position of x = 0.5, which is
the standard start point suggested by the problem author(s), a comparison
was made in terms of the number of time steps of the solution. This
comparison is displayed in figure 2.5. In most cases, the reinforcement
learning algorithm finds shorter plans than the simple heuristic presented
before.
61
Figure 2.5. Comparison between the number of solution steps found

by the reinforcement learning algorithm and the simple heuristic
Figure 2.6. Position and velocity of the car for the initial position of x = 0.5
using simple heuristic and reinforcement learning, respectively
Figure 2.6 further presents a detailed comparison between the two

approaches for an initial position of x = 0.5 in terms of car trajectory and
car speed. The upper row contains the results of the simple heuristic. One
62
can notice that during the second left oscillation, the car hits the fixed wall
and its speed becomes 0. The additional steps of the solution are due to the
fact that it didnt control its acceleration better so that it could climb the left
side of the mountain only up to a position sufficient to gain enough
momentum to reach the goal. The bottom row contains the results of the
reinforcement learning algorithm.
Filtered Supervised Solution
From the Q matrix found by the SARSA algorithm, we can extract a
supervised learning dataset, so that each row of the Q matrix is transformed
into a training instance.
Let S be the matrix of states, S sij , 1 i n, 1 j m , let A be the
action vector, A ai , 1 i p , and Q qij , 1 i n, 1 j p .

Then the supervised learning dataset will be a matrix whose
instances will be the states followed by the optimal action in that state.
Formally, D is a matrix D dij , 1 i n, 1 j m 1 , where:
s , if 1 j m
d ij ij
ak , where qik maxqil | l 1,..., p, if j m 1
(2.2)
More synthetically, the dataset D is defined as:

D S
A*
(2.3)
where:
A* ai* , such that ai* arg maxQ(si , a)

a
(2.4)
However, since the Q matrix was inherently constructed based on

trial and error episodes, eventually composed into coherent policies, we
tried to determine how much of this learned experience was relevant for a
supervised learning setting. The supervised learning algorithm used by the
agent was the simple nearest neighbour algorithm (Cover & Hart, 1967),
due to its simplicity and very good performance (mostly up to 100% on the
training set) when the data are not affected by noise (Leon, 2006). Another
reason for choosing the nearest neighbour algorithm was due to its
63
resemblance to human pattern recognition by analogy. In the case of the

mountain car problem, the action is taken by similarity to previously learned
actions in some situations given by car position and car velocity, i.e. the
input states of the problem. It is hypothesized that the dataset resulting from
the reinforcement learning may have noisy or irrelevant training instance,
which may affect the optimal solution.
In order to filter the dataset, we considered 1000 trials of random
sampling, with a filtering factor varying from 10% to 90%.
Figure 2.7. The percent of failed trials when the filtering factor varies
Figure 2.7 shows the percent of failed trials when the filtering factor
varies. By failed trials we mean plans in which the agent fails to reach the
goal, resulting in a continuous oscillatory behaviour. When the training
dataset is small, the information may be insufficient for the agent to learn
the solution. As additional information is accumulated, the agent begins to
use it to solve the problem more frequently. Thus the failed trials decrease
from over 50% with 27 randomly chosen training instances to less than 1%
for 243 randomly chosen training instances. Of course, all the 270 original
instances are sufficient for the agent to solve the problem every time.
Taking into account only the successful trials, we counted the
average number of steps and the minimum number of steeps needed to reach
the goal, graphically presented in figures 2.8 and 2.9, respectively.
64
Figure 2.8. The average number of steps to goal

when the filtering factor varies
Figure 2.9. The minimum number of steps to goal

when the filtering factor varies
One can see that although the average number of steps increases
when the agent receives more information, the minimum number of steps is
attained only with a small dataset. In order to find the optimal dataset, we
extended the test with 10000 trials only for a filtering factor of 10% for the
initial position of 0.5. From the initial 270 states used by SARSA
discretization, with additional removal of redundant states and with a further
removal of training instances where the acceleration was 0 (because we
considered that the optimal solution was to be attained only when the agent
65
actively pursues the goal, with no passive actions), the number of training
instances was reduced to 12. The best solution has now only 104 steps.
These distinctive instances are displayed in table 2.1.
Table 2.1. The selected training instances
Car position x Car velocity v
(agent state)
0.80
0.04
0.70
0.00
0.70
0.06
0.60
0.06
0.40
0.03
0.40
0.02
0.40
0.04
0.30
0.04
0.10
0.04
0.00
0.01
0.10
0.03
0.10
0.02
Acceleration a
(agent action)
1
1
1
1*
1
1
1
1
1
1
1
1
It is clear that these instances conform to the simple heuristic

described above, with only one exception, marked with italic font in table
2.1 and with the asterisk following the class / action. This instance is
responsible for the decrease in the number of steps, because it tells the
agent when to switch to driving forward on the left side of the valley, and
thus to reach the goal on the right side earlier.
2.2. Reinforcement Learning with State Attractors

Reinforcement learning is a convenient way to allow the agents to
autonomously explore and learn the best action sequences that maximize
their overall value, based on successive rewards received from the
environment (Sutton & Barto, 1998).
A multiagent reinforcement problem adds an additional level of
complexity. Since classic algorithms such as Q-learning (Watkins, 1989)
estimate the values for each possible discrete state-action pair, each agent
causes an exponential increase of the size of the state-action space. Another
challenge is the implicit or explicit need for coordination, since the effect of
an agents action also depends on the actions of the other agents. If their
actions are not consistent, the overall goal may be impeded.
66
Learning should usually exhibit adaptation to the changing

behaviour of the other agents and stability, i.e. convergence to a stationary
policy (Buoniu, Babuka & DeSchutter, 2010). Usually, stability assumes
the convergence to equilibria, which means that the agents strategies should
eventually converge to a coordinated equilibrium. Nash equilibrium are
most frequently used for this purpose. Adaptation ensures that performance
is maintained or improved as the other agents change their policies as well.
Several learning algorithms have been proposed for multiagent
systems, among which we could mention: Fictitious Play (Brown, 1951),
Minimax-Q (Littman, 1994), Nash-Q (Hu & Wellman, 1998), Team-Q
(Littman, 2001), WoLF-PHC (Bowling & Veloso, 2002), AWESOME
(Conitzer & Sandholm, 2003).
The main goals of this study (Leon, 2011c) are to propose a new
benchmark problem for cooperative and competitive multiagent
reinforcement learning (MARL), provide a game theoretic analysis of
rational behaviour among competitive agents, and an evolutionary method
to detect a compact representation of agent policies.
The Proposed Benchmark Problem
One of the main difficulties in dealing with the multiagent version of a
reinforcement learning or planning algorithm is the great increase of the
state space, since the search for a solution must take into account the
combination between all the actions of individual agents. Also, the
environment in such a setting is no longer static, in the acception of (Russel
& Norvig, 2002), but dynamic, because it is not only the actions of a
particular agent that determine the next state of the environment, but the
actions of all the other agents. Therefore, it is not enough for an agent to
learn how to react best to the environment; it must also adapt to the models
of the other agents. In this respect, the dynamism of the problem makes it
similar in a way to the moving target learning problem: the best policy
changes as the other agents policies change (Buoniu, Babuka &
DeSchutter, 2010).
Many test problems that deal with multiagent reinforcement learning
are simple problems, involving two or only a few agents, in environments
with a rather small size. In this section, we will consider a more realistic
problem, with 8 agents executing in a 10 x 10 grid world with obstacles.
The agents must retrieve and deliver 3 blocks in a designated order.
The proposed problem is presented in figure 2.10. The agents are
marked with circles and the blocks are represented by grey squares marked
with rectangles.
67
Figure 2.10. The initial configuration of the proposed benchmark problem
The goal of the agents is to move the 3 blocks (K1, K2, K3) to the
Goal state, in this respective order. There are 3 types of agents: A, B, C, D
(these types can also be interpreted as roles that these heterogeneous agents
can play). There is only 1 agent of type A, 2 agents of types B and C, and 3
agents of type D. The blocks can be carried only by a specific combination
of agent types: K1 can be carried only by an A together with a B agent, K2
can be carried by 3 agents of types B, C, and D, while block K3 can be
carried by a B and a D agent.
The world has 2 types of obstacles: a hard wall, displayed in dark
grey, which no agent can cross, and a soft wall, displayed in light grey,
which only an agent of type C can cross (with a penalty denoting its greater
effort).
The agents can perform 7 actions: move in the four axis-parallel
directions (Left, Right, Up, Down), Pick up an object, Put down an object,
and Wait (perform no action). An important feature is that an object can
refer both to a block and to another agent. Therefore, agents in this setting
can directly act on their peers.
Agents can move into or through a cell occupied by another agent.
They execute in a tick-based manner, all in parallel.
There are several rewards given to the agents in different situations:
1. Moving 1 square on the grid: r = 1;
68
2. Moving 1 square on the grid while carrying an object: r = 2;

3. A C agent moving through a soft wall square: r = 3;
4. Hitting an uncrossable obstacle, including the edges of the grid, or a
block that is not meant for its type: r = 5;
5. Waiting, picking up or putting down another agent: r = 0;
6. Picking up a block: r = 100;
7. Putting down a block to the Goal state: r = 100;
8. Completing the global task (all 3 blocks being delivered):
r = 5000 to all the agents.
Since the rewards are subject to a discount factor < 1 (e.g.
= 0.95), the maximization of the total value of the agents also implies the
minimization of the number of time steps needed to solve the problem.
In order to be solved in a near-optimal way (or at all), the problem
requires the agents to make use of several high-level mechanisms, analyzed
in the following subsections.
Cooperation
The problem goal requires agents of different types to cooperate in order to
achieve the ultimate reward. Moreover, the environment is set in such as
way that some agents must demonstrate entirely altruistic behaviour. For
example, the move of the first block, K1, is only possible if an A agent is
present to carry it. However, agent A cannot come to its location by itself
because it cannot go through the soft wall. One of the two C agents must
help agent A. If one C agent does so, the other C agent will move to get
block K2, thus obtaining a reward of 100. If no C agent helps A, there is no
solution to the overall problem. The completion reward of 5000 can
motivate a C agent to help A.
Competition
Since there are more agents of the same type, they must analyze the best
individual goals based on their locations in the environment. Some goals
may yield greater rewards than others (e.g. getting a certain block), but
those individual goals may not be obtainable because other agents of the
same type can achieve them first (i.e. if they are closer to that particular
block).
The best decisions in these situations can be reached with the help of
game theory, and an analysis will be presented in the following for all
agents involved.
69
Synchronization
A weak form of synchronization appears because two or three agents must
come to the location of a block in order to move it. A stronger, more
interesting form of synchronization is needed in order to find the solution as
quickly as possible. Thus, the moving blocks must arrive to the Goal state
one after another, if possible. This translates into a synchronization phase
between the moving blocks, such that a block should start when another
block passes through a particular, appropriate location.
Internal States
There are multiple, consecutive subgoals to be achieved by the agents (move
to a block, carry it to the Goal state, possibly return for another block, and
carry the latter to the Goal). The need to carry more than one block appears
because all three blocks need a type B agent, but there are only two type B
agents present. Therefore, one B agent must eventually carry two blocks.
Especially in this case, it is difficult to find a unique optimal policy
that makes the agent move virtually on the same trajectory in opposite
directions with different goals. It is more convenient to assume that agents
can have internal states that are triggered by certain events, such as picking
up an object or dropping it to the Goal. Thus the behaviour of the agents is
no longer based on a first order Markov process, where the current state St
depends only on the previous state St-1: P(St | St-1, ... , S0) = P(St | St-1).
Game Theoretic Analysis of Competitors Behaviour
In order to reduce the size of the state space when searching for agent
policies, we use a game theoretic approach as a heuristic or pre-processing
phase. We thus analyze what is the rational meta-action (or decision
regarding an individual subgoal) for the agents belonging to the three types
that involve competition. There is no competition in the case of type A,
because there is only one such agent.
Type B
B1 and B2 agents are located in opposite sides of the grid, next to K3 and K1
or K2, respectively. In order to minimize losses from negative rewards
associated with movement (r = 1), agents should strive in general to move
to their nearest block and get the corresponding reward of 100. One can
notice from figure 2.10 that B2 is closer to K2 than to K1. This leads to the
70
short-term analysis presented in table 2.2. The choice regarding which block
to pursue is equivalent to a game in normal (or strategic) form where the
utilities are represented by the sum of rewards received by each agent. For
clarity and simplicity of computation, we will consider that the discount
factor = 1.
Table 2.2. Short-term utilities of B type agents
B1
K1
K2
K3
K1
16, 96
86, 96
96, 96
B2
K2
84, 98
14, 98
96, 98
K3
84, 82
86, 82
96, 18
The cells of table 2.2 show the utilities received by B1 and B2,
respectively, while following all the combinations of the 3 actions: pursuing
block K1, K2, or K3. The calculation of utilities is based on the reward
model described above. For example, in the first cell (16, 96), we consider
the situation where both agents pursue block K1. Agent B2 will reach it first,
after 4 moves, and on completion, it will get a reward of 100. Therefore, it
will get a corresponding utility of 4 ( 1) 100 96 . Agent B1 will arrive
later and thus get a utility of only 16, for moving 16 squares. According to
the Nash equilibrium analysis, columns K1 and K3 are dominated, because
B2 will get a utility of 98 by choosing K2, rather than getting 96 by choosing
K1, or 82 or 18 by choosing K3. If B2 is rational, it will always choose K2,
irrespective of what B1 does. Therefore, B1 should assume this and try to
maximize its own reward under these circumstances. The best B1 can do
when B2 chooses K2 is to choose K3. In this case, B1 will receive the
maximum reward out of the three possibilities: 84, 14, or 96.
Thus, the pure Nash equilibrium of this game is for B1 to get block
K3 and for B2 to get K2. The corresponding cell is marked in bold.
However, this does not take into account the fact that since block K1
is not taken, the overall problem cannot be solved. Also, it is easy to see that
the B agent that gets block K1 will also have a chance to get the third block,
because it will deliver the block to the Goal and will be free sooner than the
second agent.
The long-term analysis of the situation is presented in table 2.3. The
dominated strategies where an agent can have a negative utility were
marked with . Also, the strategy that was previously the Nash
equilibrium is now marked with x, because it fails to solve the problem,
and thus it is dominated by the strategies leading to the final reward of 5000.
The utilities presented in table 2.3 do not include this final reward.
71
Table 2.3. Long-term utilities of B type agents

B2
B1
K1
K2
K3
K1
152, 331
178, 307
K2
319, 164
K3
Similar to the computations for table 2.2, the resulting utilities are
the sums of the rewards for moving on the shortest path to the targets and
for picking up the targets. Finding the optimal path is a different problem
which can be solved itself by reinforcement learning if the environment is
initially unknown. In order to simplify the solution and concentrate on
higher level issues, we assume that the environment is accessible and
discrete, and the agents have means to compute shortest paths, e.g. by
applying the A* algorithm (Hart, Nilsson & Raphael, 1968).
This game has two Nash equilibria, marked in bold. The subgame
formed by strategies (K1, K3) for B1 and (K1, K2) for B2 has a mixed Nash
equilibrium where the agents can stochastically choose either actions with
probabilities P(B1, K1) = 0.65 and P(B2, K1) = 0.64.
In the following, for our case study, we will consider the pure
equilibrium (178,307), because if we assume a cooperative behaviour, both
the sum of utilities (utilitarian solution) and their product (Nash solution)
are greater than for the (319,164) equilibrium. In this case, agent B1 gets
block K3 and then waits until block K2 is moved to the Goal. The optimal
strategy for B2 is to get K1, move it to the goal, then return for block K2,
and also carry it to the Goal.
Type C
Table 2.4 presents the utilities of the two C type agents when the subgoals
are either picking up block K2 or picking up agent A. Of course, since
agents can pick up any object in their environment, there are many other
possible subgoals. However, we chose to analyze these ones because only
they can contribute to a solution to the problem. The other meta-actions are
irrelevant and could only increase the number of time steps needed to reach
the overall goal. We also included the Wait action for C2, because otherwise
the Nash equilibrium would force it to accept a negative utility.
In this case, the pair of strategies (Move to K2, Wait) is the pure
Nash equilibrium of the game. This would also mean that no block would be
taken to the Goal, because agent A cannot reach K1, and C2 has no incentive
to help it.
72
Table 2.4. Short-term utilities of C type agents
A
K2
C1
A
9, 13
92, 13
C2
K2
9, 88
92, 12
Wait
9, 0
92, 0
However, in the long run, C2 can benefit from the completion of the
joint task, therefore, its dominant strategy becomes Move to A.
Type D
Table 2.5 presents the normal form of the game played by D agents.
In this case, the Nash equilibrium requires D1 and D3 to move to their
nearest blocks, and D2 to Wait, because if the other two agents behave
rationally, it cannot pick up any of the two blocks K2 or K3. The three tables
can be aggregately viewed as a 3D matrix, with one axis corresponding to
one of the three agents involved. The utilities displayed in the cells are in
the following order: u ( D1 ), u ( D2 ), u ( D3 ) .
The game has a pure Nash equilibrium point, marked in bold.
Table 2.5. Utilities of D type agents
D2 K2
D1
K2
K3
D3
K2
(94, 10, 13)
(88, 90, 13)
K3
(94, 10, 95)
(12, 90, 95)
D2 Wait
D1
D2 K3
K2
K3
D1
K2
K3
D3
K2
(94, 94, 87)
(12, 94, 87)
K3
(94, 6, 95)
(88, 6, 95)
D3
K2
(94, 0, 13)
(88, 0, 87)
K3
(94, 0, 95)
(12, 0, 95)
Evolving Policies Based on State Attractors

Even with the simplifications provided by the analysis regarding the rational
subgoals of the agents, the space of possible joint actions remains large. A
classical approach in the reinforcement learning field is to find a way to
reduce the size of the state-action mappings (such as the Q matrix), e.g. by
using a neural network to approximate the learned policy (Tesauro, 1995).
An alternative way to compress the state-action mapping is to use a
nearest neighbour classifier to determine the current action based not
73
necessarily on the current state, but on the nearest state with a defined
action. In our case, in order to develop individual policies that would
perform well when combined in the multiagent system, we resort to the use
of state attractors, i.e. states when an action is defined, such that an action
in another state is computed as the action of the closest state attractor. The
state attractors can be viewed as the centre points of a Voronoi diagram.
When an agent enters the corresponding region, it will follow only the
action specified by its centre.
A genetic algorithm was used to discover the state attractors.
Separate searches were performed for every agent when moving
individually. When several agents are jointly involved in moving a block to
the Goal state, a single joint policy was searched for.
Figure 2.11. The structure of a chromosome
Since there can be more state attractors for a certain policy, a

variable length encoding was used. A chromosome is thus formed of a
variable number of triplets denoting the attractor position (line and column
of the grid, from 0 to 9) and action (from 0 to 6). The structure of a
chromosome is displayed in figure 2.11.
The crossover can be performed by choosing two different positions
in the two parents, but multiple of 3, and then joining the different ends
from the parents in order to form two new children. Mutation can be
performed on any position and consists in resetting the gene value to a
random value in its definition domain. The selection method is the
tournament with 2 individuals. Elitism is used, i.e. the best individual is
directly copied into the next generation, in order not to lose the best solution
of a generation.
Especially at the beginning of the GA search, the policies are not
valid and do not make the agent go to a terminal state (its subgoal). In this
case, in order to prevent an infinite loop, e.g. by continuously hitting a wall
and keeping the same position, we restricted the application of the policy to
a maximum of 200 steps.
The fitness function is given by the total sum of discounted rewards
by following the policy:
N
F ( ) t R( S ;t ) ,
t 0
74
(2.5)
where is the policy resulting from the actual execution of the agent
guided by the state attractors, and N is the number of steps to the terminal
state, N 200. R is a function giving the reward in state S.
When two chromosomes have the same fitness value, the shorter
chromosome is preferred in the tournament. Thus, the GA will try to find
the smallest number of state attractors that define an optimal or near-optimal
policy.
Simple State Attractors
The state attractors for the agents involved in reaching the first subgoals (i.e.
reaching a block) found by the GA are displayed in figure 2.12. The dotted
lines show the actual trajectories of the agents.
Joint State and Double State Attractors
As mentioned above, when two or three agents move a block, the GA finds
a unique policy for the block movement. This is shown in figure 2.13 where
state attractors have names such as K1, K2, or K3. Also, figure 2.13 shows
the trajectory of B2 when returning from the Goal state in order to get block
K2.
Figure 2.12. Simple state attractors and agent trajectories
75
Figure 2.13. Joint state and double state attractors and agent trajectories
The double state attractors are more interesting. In this case, the
movement of a block also takes into account the position of another. In this
way, the system can exhibit a means of synchronization.
Since block K3 must arrive to the Goal just after K2, so that the time
to completion should be minimized, its state attractors also take into account
the state of K2. This is achieved by slightly modifying the encoding of the
chromosomes to have two positions instead of one, and the crossover being
executed onto sets of genes, multiples of 5.
The state attractors in this case are displayed in table 2.6.
Table 2.6. Double state attractors for joint movement of block K3
Line K3
1
1
8
9
9
Column K3
8
8
8
8
9
Line K2
5
5
9
9
9
Column K2
4
5
9
9
9
Action K3
Wait
Down
Down
Right
Put Down
The effect is that K3 starts when K2 approaches and it keeps only a

square distance, so that it is delivered to the goal state one tick after K2 is
delivered.
76
Overall, following these policies, a solution to the problem is found

in 89 time steps. Since a GA is used to find the policies, the results are not
always optimal. For example, when B2 returns to get block K2, it moves
down on column 0 instead of column 1, thus moving 2 additional squares.
However, the use of GA has the advantage of simplicity at the expense of
execution time to obtain a near-optimal solution.
2.3. Dual Use of Learning Methods

Following the use of adaptive and deliberative agents, learning enables
systems to be more flexible and robust, and it makes them better able to deal
with uncertainty and changing conditions, which are key factors for
intelligent behaviour. This is particularly important in multiagent systems,
where the task of predicting the results of interactions among agents can be
very difficult.
Many multiagent system applications can be found in the literature
where agents use neural networks for learning. Such systems were used for
intrusion detection in computer networks (Herrero et al., 2008),
classification of medical records (Oeda, Ichimura & Yoshida, 2004),
optimization of multiagent reinforcement learning algorithms (Unemi &
Nagayoshi, 1996; Crites & Barto, 1998), market modelling and simulation
(Grothmann, 2002; Veit & Czernohous, 2003) or for studying phenomena
such as the Baldwin effect in evolving populations (Red'ko, Mosalov &
Prokhorov, 2005).
The vast majority of works consider the use of neural networks, or
other machine learning methods, either to solve problems posed by the user
in some form, or to make the agents self-adaptive to the environment.
However, a human being uses his/her cognitive abilities to solve both
external problems and problems derived from his/her own experience.
The main original contribution of this work (Leon, Leca & Atanasiu,
2010; Leon & Leca, 2011) is the combination of these two approaches, i.e.
the agents in the system use neural networks to solve external problems, but
they can also build their own internal problems out of their experience in a
competitive multiagent environment in order to optimize their performance.
An emergent behaviour of the system is that a group of agents whose
original performance is bellow that of other groups can greatly improve
their operation by learning from their own experience in an autonomous
way. Thus, they succeed in outperforming the other groups only by
developing an adaptive strategy, not by changing their initial problem
solving capabilities.
77
In the following, we describe the way in which individual agents

operate when presented with learning tasks. Their goal is to build models
based on neural networks for data provided by an external user.
Learning Problems
In order to study the behaviour of the multiagent system where the agents
use neural networks to solve classification and regression tasks, we
considered 20 typical classification and regression problems, presented
along with their characteristics in table 2.7. The type refers either to
classification (C) or regression (R).
Table 2.7. The characteristics of the inductive learning problems
Name
Number of
attributes
Type
and-sym
xor-sym
drugs
iris-d3
iris-d5
iris-num
max
monks1
monks2
monks3
shepard1
shepard2
shepard4
shepard6
sin
weather
weather-sym
problem1
problem2
problem3
2
2
3
4
4
4
2
6
6
6
3
3
3
3
1
4
4
3
2
2
C
C
C
C
C
C
R
C
C
C
C
C
C
C
R
C
C
R
R
R
Number of
training
instances
4
4
12
150
150
150
6
124
169
122
8
8
8
8
40
14
14
1352
676
169
Number of
testing
instances
4
4
0
0
0
12
3
432
432
432
0
0
0
0
0
14
0
0
0
0
The problems and-sym, xor-sym define the Boolean logic problems

and and exclusive or. sin and max are the corresponding functions of 1 or 2
variables. iris-d3 and iris-d5 are variants of the Iris flower problem (Fisher,
1936), where the values are discretized into 3 or 5 intervals, while iris-num
uses the original dataset. The drugs problem tries to find the rules to
78
prescribe one of the two classes of drugs, according to the patients age,
blood pressure, and gender. monks{1-3} are the three classical MONKs
benchmark problems (Thrun et al., 1991). The shepard{1,2,4,6} are
problems proposed while studying human performance on a category
learning task (Shepard, Hovland & Jenkins, 1961). weather and weathersym are variants of the golf-playing problem (Quinlan, 1986) where data
values are symbolic or numerical, and only symbolic, respectively.
Finally, in order to further study the regression capabilities, three
functions were devised, with randomly generated samples. The problems are
defined as follows:
f1 ( x, y, z) sin(x) cos( y)
f 2 ( x, y) sin(x) cos( y)
z,
sin( y)
,
2 cos(x)
f 3 ( x, y) sin( x) cos(x y) ,
(2.6)
(2.7)
(2.8)
where the definition domains of x and y are the same for all: x, y ,
and z 2, 2 .
Behaviour of Individual Agents
The multiagent system is implemented using the JADE framework
(Bellifemine, Caire & Greenwood, 2007) and thus it can be easily
distributed. It is comprised of a varying number of agents that apply one of
three different algorithms to train their neural networks: Backpropagation
(Bryson & Ho, 1969; Werbos, 1974; Rumelhart, Hinton & Williams, 1986),
Quickprop (Fahlman, 1988) and RProp (Riedmiller & Braun, 1993).
The agents are given a number of tasks, consisting in repeated
versions of the 20 classification and regression problems described earlier,
problems that appear in a random order.
Before testing the overall behaviour of the multiagent system, the
problems were considered separately, so that their complexity could be
estimated in terms of execution time while executing the training
algorithms.
Since finding the optimal topology of a neural network is a difficult
problem, the agents successively try increasingly complex network
configurations. The performance of the algorithms depend on the Mean
79
Square Error (MSE) desired by the user, and also on the number of training
epochs. In this scenario, the agents need to find a MSE of 0.001 or lower,
and run the algorithms for 500 epochs. The MSE can be higher for
classification problems, if the percent of correctly classified instances is
100% for both training and testing.
They first use a network topology of 1 hidden layer with 10 neurons.
If this configuration proves to be too simple to learn the model, and the
performance criteria are not met, they use a network topology of 1 hidden
layer with 20 neurons. The third attempt is a topology of 2 hidden layers
with 15 and 5 neurons, respectively. Finally, they use a topology of 2 hidden
layers with 30 and 10 neurons, respectively.
These particular values were considered sufficient for most learning
problems, based on authors previous experience with neural network
models on real-world problems (Curteanu & Leon, 2006; Leon, Piuleac &
Curteanu, 2010). Since the process of building a proper neural network is
meant to be automated and the agents are also required to solve the
problems as quickly as possible, and therefore they do not have the option
of an extensive search for the optimal topology, these 4 variants, although
not guaranteed to be optimal, should be representative for the complexity of
the learning problems.
The ratio between the number of neurons in the first and the second
hidden layer is 3:1, following the heuristic proposed by (Kudrycki, 1988).
Figure 2.14. Average execution time of QP and RP compared to BP
In this approach, the number of epochs is rather small and the target
MSE is rather large. However, these values can emphasize the difference in
convergence speed of the training algorithms. Figure 2.14 displays the
80
average values of 10 tests for each problem and algorithm. The

Backpropagation (BP) time was taken as a reference, and displayed in the
labels on the abscise, with values given in milliseconds. The Quickprop
(QP) and RProp (RP) times are given as percents compared to the reference
times. This way of displaying the data was preferred because of the great
difference (in orders of magnitude) between the training times for different
learning problems.
Behaviour of the Multiagent System
The multiagent system assumes the following rules for task allocation.
Tasks are proposed to the agents and they have the freedom of accepting a
task or not. Only the available agents, i.e. agents that are not engaged in
solving some other task can bid for a new one. The system randomly selects
up to 3 agents to solve a task from the bidding agents. This redundancy is
useful because the neural network training depends on the initial, random
initialization of connection weights. Therefore, a trained network can get
stuck in a local minimum. However, the 4 successive attempts to solve the
problem in addition to a maximum of 3 agents to solve the same problem
should be enough to find a reasonable solution.
If the system has less than 3 available agents, the task can be
assigned to only 1 or 2 agents. If there are no agents available, the task
allocation system waits until some agents become available again.
Agents do not know the task complexity a priori. They can only have
descriptive quantitative information such as that contained in table 2.7.
Nevertheless, two tasks with similar structures may have completely
different complexities, depending on the unknown internal model. Agents
also know the number of other available agents in the system, i.e. possible
competitors, but they do not know the type of the training algorithm the
others are using.
The agents receive rewards for solving tasks. However, only the
agent that provides the smallest error out of the competitors receives the
reward. If more agents provide the same error rate, the network with the
smallest topology, and then the agent with the smallest execution time is
declared the winner.
The utility that agents receive for solving a task is computed by
taking into account the classification accuracy or the MSE of a regression
problem.
For classification problems, the utility is given in terms of the
percent of incorrectly classified instances (PICI):
81
U C 100 PICI .
(2.9)
If the problem has both training and testing data, then the utility is
given as a weighted sum between the percent of incorrectly classified
instances for training and testing, respectively, in order to emphasize the
importance of generalization:
U C 100
training
ICI
testing
2 PICI
.
3
(2.10)
For regression problems, the received utility takes into account the
ratio between the training MSE achieved by the agent and the maximum
allowed error, which is 0.001 in our case:
R
MSEtraining
MSEmax
(2.11)
The utility decrease Ud follows the shape given in figure 2.15. The
utility of a solution for a regression problem is computed by the following
equation:
U R 100 U d .
(2.12)
Figure 2.15. The decrease in utility as a function of the MSE ratio (R)
Similar to the classification case, if the problem also has testing data,
the formula becomes:
82
U R 100
training
d
2 U dtesting
.
3
(2.13)
Case Studies
In this section, we study the overall performance of the agents in terms of
received utility. First, we consider the simple case when agents accept tasks
automatically. Then, we try to optimize their behaviour by allowing them to
learn from their previous experience, so that they can only accept tasks that
they believe would yield them a greater utility. We emphasize a surprising
result of the emergent behaviour of the agents. Finally, we analyze the way
in which agent performance scales when the number of agents and tasks
varies.
Non-Adaptive Agents
In order to analyze and later optimize the utilities received by the agents, we
establish a reference result, when all the agents accept any given task,
provided that they are available.
The total utilities received by the agents are displayed in figure 2.16.
The last 3 bars represent the average utility received by each agent type:
Backpropagation agents, Quickprop agents, and RProp agents.
Figure 2.16. Reference behaviour: agents accept any task
83
Adaptive Backpropagation Agents

Given these results, we aim to improve the performance of Backpropagation
agents, and analyze how a change in their utility affects the other agents in
the system. The strategy that we consider is based on task acceptance. The
Backpropagation algorithm appears to be slower that the other two.
Therefore, it would be logical for Backpropagation agents to refuse the
most complex, time-consuming tasks, that would take them longer to
execute and even then, the error could be worse than that of their
competitors and thus no utility would be received. By getting more simple
tasks more often, one could hope to improve the total utility. Also, if there
are more available agents in the system, there are more chances that they
should be Quickprop or RProp agents, which usually provide better
solutions. Thus, if fewer agents are available, there are more chances that a
Backpropagation agent should be the winner of a task utility.
We thus consider the task information and the final result of
accepting it: winning a reward or losing it. This represents another inductive
learning problem, built out of the internal experience of the agents and not
given by an external user. The attributes taken into account are: the number
of attributes, the number of training instances, the number of testing
instances, the type of the problem, and the number of competitors (other
available agents in the system). The output is given by gaining or losing a
reward.
The optimization of the strategy has clear effects on the performance
of the agents. Figure 2.17 displays the resulting behaviour of the system
when Backpropagation agents use the neural network trained based on their
execution experience.
More succinctly, the total utilities received by the agents are
displayed in figure 2.18, as box plots. Box plots display statistics of data by
showing the smallest value (sample minimum), the lower quartile (Q1 - 25th
percentile), the average value, the upper quartile (Q3 - 75th percentile) and
the largest value (sample maximum). The utilities received by the three
types of agents are the box plots to the left of each section. The three
sections refer to the three types of agents: Backpropagation (BP), Quickprop
(QP) and RProp (RP). It can be noticed that Backpropagation agents
perform rather poorly compared to the other agents. Quickprop agents have
good results and RProp agents seem to perform the best.
It can be clearly seen that the average utility of the Backpropagation
agents significantly improves compared to that of other agents. This
increase in performance also affects the other types of agents in the system:
84
the Backpropagation agents receive an average total utility greater than that
of Quickprop and even RProp agents.
Figure 2.17. System behaviour when Backpropagation agents

use a neural network to accept or reject tasks
Figure 2.18. Box plot summarizing the utilities of agent types when
Backpropagation agents use (adaptive BP) or do not use (non-adaptive BP)
a neural network to accept or reject tasks
85
Scaling
Finally, we address the way in which the system scales, by analyzing the
evolution of the average utility for each type of agents when the number of
agents varies and the number of tasks is constant: 5000 (figure 2.19a), and
when the number of tasks varies and the number of agents is constant: 30,
with 10 agents of each type (figure 2.19b). The figure displays the adaptive
BP (A) and non-adaptive BP (NA) cases for the three types of agents
involved.
Figure 2.19. The evolution of average utilities when:

a) the number of agents varies; b) the number of tasks varies
When the number of agents varies, it can be noted that the average
utility decreases, because the tasks are divided among more solving entities.
When the number of agents grows, the difference between the types of
agents becomes less important. When the number of tasks increases, the
average utility received by agents increases in an approximately linear
manner.
86
Chapter 3
Quantum-Inspired Evolutionary Algorithms

3.1. Fundamentals of Quantum Computing
In quantum computing, the basic unit of information is the qubit, whose
state can be represented as follows:
0 1 ,
(3.1)
where and are complex numbers called probability amplitudes (Nielsen

& Chuang, 2010).
The difference from the classical bit is that a bit must be either 0 or
1, while a qubit can be 0, 1 or a superposition of the two, i.e. simultaneously
0 and 1. A characteristic of quantum mechanics is that by measuring a
quantum state, the state is changed into a classical one. It is said that the
wave function collapses. When a qubit is measured, the outcome is
2
either 0 or 1. The probability of obtaining a 0 is and the probability of
2
obtaining a 1 is . Since the squares of the absolute values of the

probability amplitudes are probabilities, the following relation holds:
2
1.
(3.2)
A quantum chromosome is a string of n qubits:
ci 1
1
2
2
2
... n
... n
(3.3)
where j j 1, j 1...n .
Such a chromosome can simultaneously represent 2 n dimensions
(Vlachogiannis & Lee, 2008) since its state is:
87
Si a00...0 00...0 a00...1 00...1 ... a11...1 11...1 ,
(3.4)
where:
a00...0 1 2 ... n
a00...1 1 2 ... n
a11...1 1 2 ... n
(3.5)
By measuring a quantum chromosome, we can obtain a string of n

bits, which are probabilistic projections of the quantum state onto one of the
2
2
single binary states, with probabilities a00...0 , ..., a11...1 , respectively.
3.2. Binary Encoded Solution for Combinatorial Auctions

Combinatorial auctions are auctions in which bidders can place bids on
combinations of items (packages or bundles) instead of individual items.
They are motivated by the non-linear valuation of the requested resource
bundles due to the inherent interdependencies of the auctioned goods or
services, when the bidders are interested in multiple heterogeneous items
and their valuations are non-additive. This kind of resource allocation is a
common research topic in multi-agent systems.
Traditionally, the search for an optimal solution to the combinatorial
auction problem (CAP) was mainly done by integer programming
(Andersson, Tenhunen & Ygge, 2000) or branch-and-bound (Sandholm,
2002). Due to the high computational effort of such approaches, heuristics
are also used to solve the CAP, which try to reach a balance between
solution quality and computational effort.
There are many real-world problems which require the exploration
of multiple local optima in order to find the global optimum. Evolutionary
algorithms (EAs) proved to be very successful optimization methods,
especially in cases where classical differential-based techniques are difficult
or even impossible to use. As population-based methods, EAs demonstrated
a clear potential for finding the global optimum, despite typical problems
such as premature convergence or low solution precision, especially for
complex optimization problems.
88
With the recent advances and popularization of quantum mechanics

and quantum computing, ideas from this field have been increasingly used
as a source of inspiration for new variants of evolutionary algorithms, with
the main goals of increasing their speed and avoiding premature
convergence.
Real-world applications of combinatorial auctions include transport
services, which are often highly interdependent (Ledyard et al., 2002;
Caplice & She, 2003). Practical transport auctions in the United States can
handle an average annual procurement volume of 150 million USD, and the
whole auction process can take up to a few months.
Combinatorial auctions have also been used for truckload
transportation, bus routes, and industrial procurement, and have been
proposed for airport arrival and departure slots, as well as for allocating
radio spectrum for wireless communication services. Combinatorial
auctions for radio spectrum have been conducted in the United States and
Nigeria. In each case, the motivation for the use of a combinatorial auction
is the presence of complementarities among the items, which differ across
bidders. For example, a truckers cost of handling shipments in one lane
depends on its loads in other lanes. Similarly, a mobile phone operator may
value licenses in two adjacent cities more than the sum of the individual
license values, since roaming between cities is important for the customers
of the operator (Cramton, Shoham & Steinberg, 2006).
The CAP is also known as the winner determination problem,
according to the traditional task of the auctioneers to identify the winner.
Many algorithms have been proposed for the CAP, for example a method
based on stochastic local search (Hoos & Boutilier, 2000), a method for
multi-unit combinatorial auctions (Leyton-Brown, Shoham & Tennenholtz,
2000), an optimal algorithm based on depth-first branch-and-bound for
generalized combinatorial auctions, i.e. multi-unit, double auction and price
reservation (Sandholm & Suri, 2000), or a genetic algorithm with random
key encoding (Khanpour & Movaghar, 2006).
The performance of three different heuristics for CAP have been
compared (Schwind, Stockheim & Rothlauf, 2003): the greedy approach,
simulated annealing and a genetic algorithm (GA), and it was found that the
GA, using a random key encoding, shows the best results with acceptable
execution times.
The CAP is NP-hard, i.e. there is no polynomial-time algorithm that
is guaranteed to compute the optimal allocation. Moreover, it is not
uniformly approximable, i.e. there exist no polynomial-time algorithm and a
constant d such that the algorithm could provide an answer that is at least
89
1/ d of the correct optimal one for all inputs (Cramton, Shoham & Steinberg,
2006).
Multi-Attribute Combinatorial Auctions
While auctions are an excellent form of price discovery, there are usually
other aspects in addition to price that can affect the value of the outcome. In
such cases, if the auction process focuses competition on price only, the
buyer is forced to ignore these other aspects or to handle them outside of the
auction by subsequent contractual arrangements. An effective multi-attribute
auction mechanism should possess the following two characteristics: the
buyer must be able to effectively assess the value of a multi-attribute bid,
and a bidder must be able to effectively bid on several attributes
simultaneously (Chen-Ritzo et al., 2005).
Many authors have studied multi-attribute auctions, with a focus to
bidding languages (Bichler, 1998), applications to outsourcing (Mishra &
Veeranmant, 2002), government procurement for the US Department of
Defense by using a model of multi-dimensional auction, including attributes
such as price and quality (Che, 1993). A review of multi-attribute
combinatorial auctions can be found in (Xie, Li & Sun, 2004).
Formalization
In this section, the combinatorial auction model used in this study (Leon,
2012b) is presented. Let G be the set of goods or items to be auctioned, let B
be the set of bids, and let A be the set of attributes. Each bid Bi has a value
for each attribute:
v( Bi ) 1 ,..., A ,
(3.6)
where k , with k 1.. A and i 1.. B .

The corresponding multi-objective optimization problem is to find a
set of non-dominated vectors (the Pareto frontier), composed of vectors y *
such that for any other dominated vector y, y *i y i for all i, and y *i y i for
some i.
Under these circumstances, for each attribute k one must find:
90

B
max xi vk Bi ,
(3.7)
i 1
where xi 0, 1 is a decision variable which determines whether a bid Bi is

accepted or not, and:
G,
B
i
(3.8)
where is the set of bids for which xi 1 .

An additional constraint is that a good must be included only once in
the solution:
i j
1,
(3.9)
where j is the set of bids containing good j, with j 1.. G .

Evolutionary Multi-Objective Optimization
In contrast to single optimization, in multi-objective optimization both
fitness and selection must support several objectives. Multi-objective
algorithms are especially fit for the multi-attribute combinatorial auction
scenario. Several different variants of multi-objective algorithms have been
introduced with different fitness assignment and selection strategies. Based
on fitness assignment and selection strategies, multi-objective algorithms
can be classified as aggregation-based, population-based and Pareto-based
approaches (Tran, 2006). In a previous study (Crciu & Leon, 2010),
NSGA-II (Deb et al., 2002) was compared with a weighted genetic
algorithm and with VEGA (Schaffer, 1985), and proved to have superior
performance. Consequently, it was selected as a basis for comparison in this
work.
The QIEA-SSEHC Algorithm
The evolutionary operations are performed by means of quantum gates,
which correspond to rotations of the qubit parameters. We can consider that
91
cos( ) and sin( ) , and thus:

cos sin
U
.
sin cos
(3.10)
Usually, an operation that roughly corresponds to crossover is to

rotate a gene of an individual towards the gene value of the best individual
in the current population, by adding a (possible negative) value :
cos( ' )
cos( ) cos( )
(
)
sin( ' )
sin( ) sin( ) .
(3.11)
Mutation is usually performed through a gate that swaps the

probability amplitudes of a qubit, and :
0 1
U
.
1
0
(3.12)
Since crossover in a quantum-inspired evolutionary algorithm is

usually made by means of a rotation gate (equation 3.10), some authors
have used a differential approach to fine-tune the rotation angle, i.e.
gradients of the objective functions to determine the size of the rotation
angle (Li, Wu & Liu, 2012).
However, it can be argued that the nature of this operation is not
exactly in line with the philosophy of the evolutionary approach, where the
objective function may not be differentiable. In the present paper, in order to
improve the quality of the solution, an evolutionary hill-climbing phase was
added after the termination of the QIEA: a set of solutions is generated
starting from the current solution, based only on mutation, and the best
solution among those is selected as the new current solution. The process is
repeated for the same number of generations as the main QIEA.
In addition, a steady state model is used. A new population is
created, but it does not automatically replace the old one. The two
populations are merged, sorted by non-domination and crowding, like in
NSGA-II, and the next generation is produced by selecting the best half.
In order to generate valid candidate solutions, and hence to increase
the diversity of the population by avoiding to discard individuals which do
not satisfy the constraints, a repairing procedure was applied. A greedy
approach is impossible, because with multiple attributes it cannot be
92
determined which gene will have the lowest or highest impact on the fitness
of a chromosome. Therefore, a randomly selected gene with the value of 1
in the measured individual is set to 0, thus decreasing the number of
violated constraints, until the chromosome comes to satisfy all the
constraints of the problem.
Case Studies
Combinatorial Auction Scenarios
In order to create combinatorial auction problems, the CATS software
(Leyton-Brown et al., 2011) was used, which is a generator of combinatorial
auction instances.
Three single-attribute problems were generated following the CATS
distribution L2, where the number of goods is chosen according to a random
distribution, and the price is chosen according to a linear random
distribution, with different degrees of complexity: a small one (with 5 goods
and 30 bids), a medium-sized one (with 10 goods and 200 bids), and a big
one (with 50 goods and 1000 bids).
In order to transform them into multi-attribute problems, some
processing was performed. The three resulting problems have three
attributes, computed as follows:
Attribute A1 , the data generated by CATS;

Attribute A2 r1 k1 A12 , where r1 is a uniform random number,
r1 [0.2, 0.8] and k1 1000;
Attribute A r k 1 / A , where r2 is a uniform random number,
3
r2 [0.25, 0.75] and k2 10 .
In a practical setting, the three attributes could mean for example

price, quality and lead time (the delay between the initiation and execution
of the process subjected to the auction).
The aim of a combinatorial auction is to find a maximum price for a
bundle of goods. The auctioneer wants to maximize the price. If the price
(A1) is high, the quality (A2) must also be high, but the lead time (A3), a
disadvantage, may be low. The ranges of r1 and r2, as well as the values of
the scaling constants k1 and k2 are arbitrary.
93
Performance Metrics
Unlike in the single-objective case, the performance of multi-objective
algorithms is harder to compare. Since the solution is actually a set of nondominated vectors, usually of different cardinalities, it is not straightforward
to assess the quality of a solution in terms of a single real number.
There are two main approaches that try to overcome this difficulty
(Groan, 2003). One focuses on the convergence of the solution to the real
Pareto front (if known, in the ideal case). The other concentrates on the
diversity of a solution, often by measuring how far apart the solution vectors
are from one another.
In this study, a performance metric inspired by the R2 convergence
metric in (Hansen & Jaszkiewicz, 1998) is used, which takes into account
the expected value of a solution given all possible utility functions.
This proposed metric is a weighted sum of partial fitness function
values, averaged over all possible values of the weights:
U (S )
sS aA wW
ua ( s)
p( w) dw
u amax ( s)
,
S
(3.13)
where S is the solution of the algorithm (the set of all non-dominated

solution vectors), A is the set of attributes, W is the set of all utility
functions, represented here as the set of all possible combinations of weights
(provided that the sum of all weights is 1), u a (s) is the utility of a solution
vector for an attribute, i.e. the value of a partial fitness function, and uamax (s)
is the maximum utility obtainable for that attribute, i.e. the optimal result of
an optimization for that attribute alone. p is an intensity function which
gives the probability density of each utility function wW .
From the implementation point of view, the integral is approximated
as a Monte Carlo sampling with 1 million samples, where each trial
generates a combination of normalized weights. Since any two simulations
give the same result for U, it is considered that this number of samples is
sufficient for three attributes. The number of samples is independent from
the number of attributes, and thus this approach can also be applied for a
larger number of attributes, although the approximation will likely have less
precision.
Figure 3.1 shows two solutions of a bi-attribute problem (the
medium-sized problem considered for analysis, only with attributes A1 and
94
A2), obtained by NSGA-II and QIEA-SSEHC. The two-dimensional Pareto

fronts can be easily visualized. One can see that the better solution obtained
by QIEA-SSEHC is reflected in a higher value of the averaged utility U.
Figure 3.1. Solution convergence comparison: graphical display

of Pareto fronts and analytic expression of averaged utility
It must be noted that the value of U is always less than 1, if no

attributes are irrelevant. It can be 1 only when the attributes of the problem
are not conflicting. In this case, there is a unique solution and all the
attributes have their maximum values. Of course, this case is neither
realistic nor interesting.
In terms of solution diversity, the average number of solutions for a
certain search configuration was chosen in this work.
Results
In order to evaluate the performance of the quantum-inspired algorithm,
several experiments were made while varying the parameters: the number of
generations and the population size. After 10 trials for each configuration,
the average and best values of the averaged utility U were selected, as a
convergence metric, and also the average number of solutions, as a diversity
metric.
95
Tables 3.1 and 3.2 show the results obtained by NSGA-II and
QIEA-SSEHC, respectively, for the first problem.
Table 3.1. The results of NGSA-II for the small problem
with 5 goods and 30 bids
Pop. Number of Average Best Average number
size generations utility
utility
of solutions
20
100
0.7183 0.7442
9
100
100
0.7604 0.7905
11.5
20
1000
0.7669 0.7929
10.5
100
1000
0.7910 0.7929
12.1
Table 3.2. The results of QIEA-SSEHC for the small problem

utility
of solutions
20
100
0.7318 0.7734
9.8
100
100
0.7775 0.7929
11.5
20
1000
0.7818 0.7929
11.1
100
1000
0.7929 0.7929
12
Even if the best results are close, one can see that QIEA-SSEHC
performs better even with a small population and a small number of
generations. In all four configurations, the average results of QIEA-SSEHC
are better than those of NGSA-II, and also the solution diversity is greater.
Since QIEA-SSEHC converges faster, there is no further improvement of
solution diversity when the population size and the number of generations
are increased.
QIEA-SSEHC, respectively, for the second problem.
Table 3.3. The results of NGSA-II for the medium-sized problem
utility
of solutions
20
100
0.4873 0.5167
16
100
100
0.5380 0.5570
22.5
20
1000
0.5687 0.6128
17.8
100
1000
0.6209 0.6794
27.2
96
Table 3.4. The results of QIEA-SSEHC for the medium-sized problem

utility
of solutions
20
100
0.5009 0.5219
18.5
100
100
0.5591 0.5848
24.4
20
1000
0.5851 0.6057
18.6
100
1000
0.6452 0.6869
27.7
QIEA-SSEHC also performs better than NGSA-II. It is important to

note that the actual values of the utility can only be used as a means to
compare solutions. An average utility of 0.5 for the second problem is not
worse per se than an average utility of 0.7 for the first problem. The two
problems are different and they have different single-attribute optima.
QIEA-SSEHC, respectively, for the third problem.
Table 3.5. The results of NGSA-II for the big problem
utility
of solutions
20
100
0.2772 0.3583
12.6
100
100
0.3532 0.3814
28.4
20
1000
0.3778 0.3924
10.6
100
1000
0.4299 0.4420
40.6
Table 3.6. The results of QIEA-SSEHC for the big problem

utility
of solutions
20
100
0.3361 0.4047
11
100
100
0.3925 0.4322
28.2
20
1000
0.3981 0.4513
11
100
1000
0.4543 0.4865
44
In this case, QIEA-SSEHC is clearly better than NSGA-II, in terms

of both solution convergence and solution diversity. Therefore, such an
approach is especially useful for large, difficult combinatorial auction
problems.
97
3.3. Real-Valued Encoded Solution for Multi-Issue MultiLateral Negotiations

A negotiation problem can be defined as a situation where multiple agents
try to come to an agreement (or deal), given that each agent has a preference
over all possible deals. A solution to this problem is a deal accepted by all
the agents involved. Agents want to maximize their own utility, but they may
also face the risk of a breakdown in negotiation, or expiration of a deadline
for an agreement (Vidal, 2007). In the general case where several agents
negotiate over several issues, it is much difficult to find the optimal deal.
In this study (Leon, 2012c), we choose a centralized approach to this
problem, by using some evolutionary optimization methods improved with
ideas from quantum computing.
Combinatorial optimization problems were among the first to benefit
as applications of quantum-inspired genetic or evolutionary algorithms.
However, quantum-inspired optimization has also been applied for various
continuous benchmarks problems (Liao, Wang & Qin, 2010; Zongyao &
Zhou, 2011; Li, Wu & Liu, 2012). Quantum-inspired variants have been
proposed as well for differential evolution (Storn & Price, 1997), used to
solve both binary problems such as the knapsack (Hota & Pat, 2011) and
continuous optimization ones (Wang et al., 2012). Other applications
include: training fuzzy neural networks (Zhao et al., 2009), options price
model calibration in financial modelling (Fan et al., 2007), data clustering
(Xiao et al., 2008), image segmentation (Melo et al., 2008) or electric power
applications (Babu, Das & Patvardhan, 2008; Vlachogiannis & Lee, 2008).
Although different from evolutionary algorithms, particle swarm
optimization is also a model of biological behaviour, namely, the social
flocking behaviour of insects or birds. This algorithm was adapted with the
help of ideas from the quantum computing, with applications to numerical
benchmark problems (Pant, Thangaraj & Abraham, 2008), feature selection
and parameter optimization for classification based on neural networks
(Hamed, Kasabov & Shamsuddin, 2011) and linear array antenna synthesis
in the field of electromagnetic research (Mikki & Kishk, 2006).
Real-Valued Encoding
Many optimization problems deal with real multi-dimensional functions. The
task is to find the maximum (or minimum) value of the function, given that
its arguments belong to some predefined ranges:
98
x* arg max f (x)
where x x1 , x2 ,..., xn n and ximin xi ximax , i 1...n .

In the literature, several representation schemes for real-valued
problems have been proposed, as well as different variants for the
evolutionary operators. One approach is to encode the real values as bits,
where a binary number is obtained by measuring a corresponding vector of
qubits. The main problem is that an acceptable precision requires a large
number of qubits, especially for large optimization problems. For example,
one of the multi-issue multi-lateral negotiation problems proposed in this
paper has 100 real numbers that constitute the solutions and need to be
determined. Another approach is to use the phase of the qubits, defined by
their probability amplitudes.
Since cos and sin , there are different schemes suggested
for connecting the potential solutions x with the phase angles , for example
(Wang et al., 2012):
x x min
i
i
ximax ximin
(3.15)
xi ximin
.
i arcsin max
min
x
x
i
i
(3.16)
i arccos
or (Zhao et al., 2009):
Another paper presents a completely different model, in which a

square pulse represents a quantum gene (Fan et al., 2007).
Most of the authors use rotation gates to move an individual gene
toward the value of the currently best individual. Gradients of the objective
functions can be used to determine the size of the rotation angle (Li, Wu &
Liu, 2012). Although a negation gate is used for mutation in many cases,
chaotic mutation can also be employed (Wang et al., 2012), by generating a
chaotic sequence based on the chaotic part of the bifurcation diagram
(Feigenbaum, 1979).
As stated above, a simple way of encoding real-value information
into the qubits is by using phase angles. In this case, a chromosome
employing the so-called double chain coding has the following structure
(Liao, Wang & Qin, 2010):
99
cos( i1 )
ci
sin( i1 )
cos( i 2 )
sin( i 2 )
...
...
cos( in )
sin( in )
The evolutionary operations are performed by means of quantum

gates, presented in equations 3.10, 3.11 and 3.12.
Multi-Issue Multi-Lateral Negotiation
The evaluation of the results of multi-agent negotiation protocols is not
straightforward. However, there are some parameters that can be used to
evaluate these protocols: efficiency, stability, existence of time constraints,
and whether side payments, or money transfers, are allowed (Kraus, 2001).
A qualitative criterion to identify the states that are optimal from a
social point of view is the identification of Pareto optimal states (the Pareto
frontier), i.e. states where it is not possible to increase the utility of some
agents without reducing that of any of the others. As mentioned in section
1.1, commonly used quantitative solutions for the bargaining problem are,
among others, the Nash solution and the utilitarian solution.
Beside these, other solutions which are not always Pareto optimal
have been proposed, such as the egalitarian solution, where all the agents
receive the same utility and the sum of utilities is maximum, and the KalaiSmorodinsky solution, which is Pareto optimal only if all the deals are
possible, and where each agent receives an utility proportional to the
maximum utility that it could be able to receive.
A concrete negotiation problem is the following example (Bo & Li,
2011) with three agents: supplier agent, storage agent and sales agent, and
four issues: the commodity price (with a range from 1000 to 1150 yuans), the
number of commodity (100-200 pieces), the date of payment (5-12 days),
and maintenance period (1-16 months).
The Negotiation Model
In the model considered in this paper, we assume that there are n agents,
which negotiate over m issues. The issues involve dividing an amount
normalized to 1. Agents have preferences (weights) over each issue, which
sum up to 1. The problem is to find a proper division of all the issues such
that the product of the total utilities is maximized, i.e. Nash solution:
U * maxU i
i 1
100
where Ui is the total utility that agent i receives.

The negotiation problem reduces to finding an optimal division
D = (dij) of each issue among the agents. Also, each issue has a value vj of
the amount that is divided. By introducing these values, the divisions dij can
be normalized.
The utility of an agent i for an issue j is computed as follows:
uij wij dij v j
The total utility of an agent is the sum of the issue utilities:
U i uij
j 1
The weights of the agents for the issues wij, are given (known), and so
are the issue values vj. We need to find the divisions dij, which maximize the
product of total utilities.
For example, let us consider the situation with 2 agents and 2 issues
presented in table 3.7. The weights in a row must sum up to 1.
Table 3.7. Agent weights for the 2 x 2 problem
Issues
Agents
A1
A2
I1
I2
Total
0.4
0.5
0.6
0.5
1
1
The values of the issues are presented in table 3.8.

Table 3.8. Issue values for the 2 x 2 problem
Issues
Values
I1
2
I2
1
Table 3.9 displays the optimal, exact divisions which correspond to

the Nash bargaining solution, computed by Microsoft Excel optimization tool
Solver, as shown in table 3.10. In table 3.9, the divisions in a column must
sum up to 1. Table 3.10 contains the individual utilities of the 2 agents (as
computed by equation 3.20), and their product.
101
For the general case, with n agents and m issues, finding the optimal
division requires a general optimization method. In this study, we use and
compare several varieties of quantum-inspired evolutionary algorithms.
Table 3.9. Optimal issue divisions for the 2 x 2 problem
Issues
Agents
A1
A2
Total
I1
I2
0.125
0.875
1
1
0
1
Table 3.10. Agent utilities for the 2 x 2 problem

Utilities
U1
U2
Agents
A1
0.1
0.6
A2
0.875 0
U* (Nash solution)
Total
utility
0.7
0.875
0.6125
The Optimization Algorithms Considered

Classical Methods
As a basis for comparison, two classical methods were first used: the
random generation of values using a uniform distribution for the divisions
(called here Monte Carlo), and a simple evolutionary algorithm with realvalued genes and the well-established operators such as: selection, crossover
and mutation.
The first variant, QIEA1, is one which uses a random initialization of qubits
in terms of a uniform distribution of solutions x in the interval
[0, 1]. Then, the phase angle is arccos x . A rotation gate corresponding
to equation 3.10 is used, with a constant rotation angle d = 0.01 radians. If
the phase angle of an individual is lower than that of the best individual in
the current population, the phase angle of the former is increased by d. If it
is higher, the individuals phase angle is decreased by d. The mutation uses
the probability amplitudes swap, presented in equation 3.12.
102
The second variant, QIEA2, is similar to the first, however it uses

elitism to improve the solution. A new population is created using the same
procedure, but it does not automatically replace the old one. The two
populations are merged, sorted by the values of the fitness, and the next
generation is created by selecting the best half (the so-called steady state
model).
The third variant, QIEA3, uses a different model. The phase angle
of a qubit is randomly initialized following a uniform distribution from the
interval [0, 2]. The probability amplitudes are cos( ) and sin( ) .
From each qubit, 2 real solutions are generated: x1 (1 cos( )) 2 and
x2 (1 sin( )) 2 . The fitness of an individual is f max(x1 , x2 ) . This
approach of generating multiple solutions from the information contained in
a single qubit is much closer to the philosophy of quantum computing. A
resetting mutation is used instead of a flip mutation: a qubit is randomly
reinitialized, in order to introduce a greater diversity into the population and
avoid premature convergence. Also, a population-based elitism was used,
similar to the idea from QIEA2. Finally, since some authors use a differential
approach to fine-tune the rotation angle (Li, Wu & Liu, 2012), it was
considered that the nature of this operation was not exactly in line with the
philosophy of the evolutionary approach, where the objective function may
not be differentiable. However, in order to improve the quality of the
solution, an evolutionary hill-climbing phase was added after the termination
of the QIEA. Here, a set of solutions is generated starting from the current
solution, based only on mutation, and the best solution among those is
selected as the new current solution. The process is repeated for the same
number of generations as the main QIEA.
In all cases, the divisions are normalized in order to comply with the
constraint that
d
i 1
ij
1, j 1...m .
It must be mentioned that none of the three QIEA variants are actual
quantum algorithms meant to be executed on a quantum computer. They are
sequential algorithms for traditional computing architectures, but include
ideas from quantum computing.
103
Case Studies
The Problems
In the case studies, we will use 3 negotiation problems, with increasing
levels of difficulty. The first one is the 2 x 2 problem presented above. Its
exact Nash solution corresponds to a utility product of 0.6125.
The second is a 3 x 3 problem, with the agent weights displayed in
table 3.11 and equal issue values presented in table 3.12. The optimal
solution corresponds to the divisions from table 3.13, which lead to a utility
product of 96.026, as shown in table 3.14.
Table 3.11. Agent weights for the 3 x 3 problem
Issues
Agents
A1
A2
A3
I1
I2
I3
0.2
0.7
0.5
0.3
0.1
0.1
0.5
0.2
0.4
Table 3.12. Issue values for the 3 x 3 problem

Issues
Values
I1
10
I2
10
I3
10
Table 3.13. Optimal issue divisions for the 3 x 3 problem

Issues
Agents
A1
A2
A3
I1
I2
I3
0
0.76
0.24
1
0
0
0.35
0
0.65
Table 3.14. Agent utilities for the 3 x 3 problem

Agents
A1
A2
A3
Nash solution
Total utility
4.75
5.32
3.80
96.026
The third is a 10 x 10 problem, of higher complexity, involving 100

variables to be found. The optimal solution leads to a utility product of
0.968201.
104
The Results
Table 3.15 presents the best results of Monte Carlo sampling, with different
numbers of samples for each problem. In order to be more intuitive, the
results are presented as percents of the optimal result. One can see that this
method only works for small problem sizes. For the third problem, it simply
cannot give a useful result at all, even with 10 million samples. In this case,
the best solution provided is only 2% of the actual optimal solution.
Table 3.15. The results of Monte Carlo sampling
100,000
1,000,000
10,000,000
samples
samples
samples
value
% U*
value
% U*
value
% U*
2x2
0.612391 99.9822 0.612488 99.9980 0.612499 99.9999
3 x 3 86.819573 90.4126 88.107870 91.7542 91.482164 95.2681
10 x 10 0.018156 1.8753 0.020774 2.1456 0.023302 2.4067
Problem
In order to facilitate a comparison, a classical evolutionary algorithm

was applied to the three problems, with different parameters, and the results
are presented in tables 3.16, 3.17 and 3.18. Other parameters that were kept
constant were the crossover rate (90%), the mutation rate (5%), selection by
tournament with 2 individuals and elitism: the best individual was directly
copied into the next generation.
The fitness values are computed after 10 different trials, and they are
presented as percents of the optimum known value.
Table 3.16. The results of the EA for the 2 x 2 problem
Number of
generations
1000
100
100
1000
Population
size
20
100
50
100
Average
fitness (%)
99.9675
99.9063
99.9041
99.9917
Best
fitness (%)
99.9933
99.9944
99.9758
99.9993
One can notice that for simple problems the algorithm performance is
quite good. However, for the 10 x 10 problem, even with a high number of
generations (1000) and a large population size (100), the results do not come
close to the optimal value. The best fitness is only around 60% of the known
optimum.
105

Number of
generations
1000
100
100
1000
Population
size
20
100
50
100
Average
fitness (%)
99.6662
99.1343
98.1748
99.9339
Best
fitness (%)
99.9349
99.4662
99.6375
99.9749

Number of
generations
1000
100
100
1000
Population
size
20
100
50
100
Average
fitness (%)
34.164
14.454
11.593
56.656
Best
fitness (%)
40.713
16.746
12.895
59.604
QIEA1
In order to evaluate the performance of the quantum-inspired algorithms,
several experiments were made while varying the parameters: the number of
generations, the population size and the mutation rate. Table 3.19 displays
the results in terms of average fitness and best fitness.
One can see that the results improve as the number of generations
increases. With 1000 generations, the best fitness comes very close to the
optimum value. Also, the accuracy of the results increases with the
population size, which ensures a greater diversity. The mutation rate must be
larger compared to the one usually used in a classical EA, but if it is much
greater (e.g. 25%), the results are affected by randomness and their quality
decreases. Therefore, an acceptable mutation rate was found to be around
10%.
The results in italics were repeated for convenience. The best results
in each group are marked in bold. Finally, the results with the best
combination of parameters are presented.
Table 3.20 shows the results of the algorithm with different
parameters when applied to the 3 x 3 problem, and Table 3.21 displays the
corresponding results for the 10 x 10 problem.
106
Table 3.19. The results of the QIEA1 for the 2 x 2 problem

Number of
generations
50
100
1000
Population
size
20
20
20
Mutation
rate
0.05
0.05
0.05
Average
fitness (%)
93.97
95.49
96.22
Best
fitness (%)
96.3037
96.4899
99.3993
100
100
100
20
50
100
0.05
0.05
0.05
95.49
96.25
97.74
96.4899
96.9849
99.1758
100
100
100
50
50
50
0.05
0.10
0.25
96.25
97.60
97.11
96.9849
98.2241
97.1142
1000
100
0.10
99.06
99.5515

Number of
generations
1000
100
100
1000
Population
size
20
100
50
100
Mutation
rate
0.05
0.05
0.10
0.10
Average
fitness (%)
54.73
66.27
63.87
65.44
Best
fitness (%)
72.16
73.57
72.73
84.74

Number of
generations
1000
100
100
1000
Population
size
20
100
50
100
Mutation
rate
0.05
0.05
0.10
0.10
Average
fitness (%)
0.927
1.199
1.150
1.092
Best
fitness (%)
1.088
1.255
1.387
1.277
It is evident that, despite the fact that this variant of the QIEA is very
fast, the results are inferior even to those of the Monte Carlo simulation.
QIEA2
Since the first problem is quite simple, in the following we will focus on the
results of the other two problems. Table 3.22 presents the results of QIEA2
on the 3 x 3 problem, and table 3.23 presents the results for the 10 x 10
problem. One can see that the solutions are better than in the previous case,
107
but especially for complex problems, they are still not satisfactory (the best
fitness is only around 4% of the optimal one).
Number of
generations
1000
100
100
1000
Population
size
20
100
50
100
Mutation
rate
0.05
0.05
0.10
0.10
Average
fitness (%)
71.72
80.83
77.05
80.56
Best
fitness (%)
87.38
85.34
85.12
88.21

Number of
generations
1000
100
100
1000
Population
size
20
100
50
100
Mutation
rate
0.05
0.05
0.10
0.10
Average
fitness (%)
3.304
3.529
3.112
4.031
Best
fitness (%)
3.744
3.950
4.088
4.168
QIEA3
Table 3.24 presents the results of the third, improved variant of QIEA on the
3 x 3 problem, and Table 3.25 presents the results for the 10 x 10 problem. It
is clear that by the modifications applied to the quantum-inspired algorithm,
the quality of the solutions is now much better. For the 3 x 3 problem, good
results are obtained even with a small configuration (50 individuals and 100
generations), which leads to a very good execution speed.
Number of
generations
1000
100
100
1000
Population
size
20
100
50
100
Mutation
rate
0.05
0.05
0.10
0.10
Average
fitness (%)
99.9988
99.9703
99.9721
99.9993
Best
fitness (%)
99.9997
99.9977
99.9991
99.9997
For the 10 x 10 problem, the parameters must confer greater diversity

and a greater convergence time, but the solution in the best case is now the
best obtained out of all the methods under consideration (around 93% of the
optimum). The most important criterion that influences the quality of the
108
results seems to be the number of generations, especially since it also

impacts the fine-tuning, evolutionary hill climbing procedure after the main
search.
Number of
generations
1000
100
100
1000
Population
size
20
100
50
100
Mutation
rate
0.05
0.05
0.10
0.10
109
Average
fitness (%)
81.157
58.571
48.728
91.381
Best
fitness (%)
83.591
63.036
51.733
92.787
Chapter 4
Applications of Machine Learning Methods

4.1. Prediction of Liquid-Crystalline Properties
In the polymers field, the efficient design of new materials requires the
prediction of the properties of the candidate polymers and the selection of
the best structure from all the potential possibilities. To solve this problem,
a quantitative structure-property relationship is necessary. One of the most
interesting properties of polymers is the liquid crystalline behaviour,
because in this state, the materials combine two essential properties of the
matter: the order and the mobility. But, because of the complexity of the
liquid crystalline phase, it is not at all easy to predict the occurrence of a
mesophase, i.e. an intermediate state of matter between liquid and solid.
Also, the reduction of the number of experimental trials represents a
requirement that is more and more felt in the field of the study and analysis
of physical, chemical and biological phenomena. In particular, this tendency
is encouraged by the economical advantages that experimental research can
get from it in terms of time and money savings. The determination of the
liquid crystal properties of some organic compounds based on their structure
and quantitative structure-property relationship is a major research subject
in computational chemistry.
The first study selected for exemplification (Leon et al., 2007) is an
opportunity to prove the utility and the efficiency of classification
algorithms and neural networks for quantifying the structure-properties
relation for some copolyethers. We report a new approach to predict the
liquid crystalline behaviour for a series of copolyethers with mesogene
groups in the main chain. The prediction of properties is correlated with the
chemical structure, type and copolymer composition (molar ratio), polymer
molecular weight and two geometric parameters: the length of the spacer
and an asymmetry parameter. In a prediction problem, the substitution of the
experiments means to save time and materials.
The algorithms used are: neural networks with different topologies,
decision trees: the C4.5 algorithm (Quinlan, 1993), random tree (Witten &
Frank, 2000), random forest (Breiman, 2001), the Bayesian classifier, which
111
tries to estimate the probability distribution of data, and an instance-based

algorithm, which can be viewed as an extension of the classical nearest
neighbour approach. Each method can be applied with several variants by
changing the corresponding parameters.
Non-Nested Generalized Exemplars (NNGE) is an instance-based
algorithm (Martin, 1995) which attempts to avoid overgeneralization by
never allowing exemplars to nest or overlap. NNGE always tries to
generalize new examples to their nearest neighbour of the same class. If this
is impossible because of intervening negative examples, no generalization is
performed. If a generalization later conflicts with a negative example, it is
modified to maintain consistency.
Table 4.1 shows the performance of the neural networks and
classification algorithms the two problems. Problem 1 refers to the input
parameters: molar ratio, type of bisphenol, molecular weight and spacer
length. Problem 2 is formulated for the molar ratio, type of bisphenol,
spacer length and asymmetry parameter as input parameters.
Table 4.1. Error rates of the considered classification algorithms
Algorithm
Neural Network (1H-5)
Neural Network (2H-12:4)
C4.5 pruned
C4.5 unpruned
Random Tree
Random Forest (100 trees)
Nave Bayes
Nearest Neighbour
4 - Nearest Neighbour
NNGE
Problem 1
Training Validation
0%
17.65%
0%
11.76%
10.71%
20.69%
8.33%
17.24%
0%
28.57%
0%
17.85%
13.09%
17.24%
0%
13.79%
0%
13.79%
0%
17.24%
Problem 2
Training Validation
0%
10.71%
0%
7.14%
12%
32.0%
12%
34.61%
0%
32%
0%
32%
16%
30%
0%
26.92%
0%
23.07%
0%
26.92%
Although the symbolic methods (such as decision trees) are less

effective, they have the advantage of an explicit representation of the
classification model. C4.5 generally has good error rates over the training
set (especially in the unpruned version) and acceptable generalization
capabilities.
Neural networks demonstrate the best performance. However, the
structure of the classification model is implicit, as it is given by the
connection weights. Also, there are many free parameters involved in neural
networks, such as choosing the best topology, the learning rate, the
momentum factor when trying to accelerate the learning process, which is
112
rather slow.
Although the neural network method yields the best results and the
prediction phase is very fast, the process of finding the best network may be
difficult.
The Bayesian classifier generalizes well, because it is based on the
estimation of the probability distribution of data. Unfortunately, it has high
error rates in both situations.
An interesting approach is the nearest neighbour paradigm,
especially k-NN, which proves to be a good choice for both problems. Like
the neural networks, it doesnt provide a structure for the data. The speed
and simplicity of the learning process is counterweighted by the prediction
phase, which must employ a search process through the already learned
instances in order to find the desired class.
The prediction of the mesophase occurrence with machine learning
methods, as well as the choice and the codification (numerical and nominal)
of different sets of parameters which characterize the structure and the
behaviour of the studied copolyethers represent a new approach in the field.
In the paper summarized above the liquid crystalline behaviour for a
series of copolyethers with mesogene groups in the main chain has been
investigated. It has been shown that for the studied copolyethers the neural
networks prove to be efficient classification tools.
In the second selected article (Leon, Lisa & Curteanu, 2010), the
case study focuses more on other classification algorithms for the crystalline
property prediction rather than neural networks. An original classification
algorithm is also used with accurate prediction results. It extends the
nearest-neighbour methods by allowing the training instances to be grouped
into hyper-rectangles that are treated unitarily during the prediction phase,
which can greatly improve the speed of the classification, and also by
computing information about the prototype of mode value of the attributes
of such hyper-rectangles, which help to give additional information of an
instance regarding how relevant it is to a class, and thus allowing a fuzzy
logic interpretation of the results.
We used an organic compounds database with 390 records which
includes a wide variety of compounds: bis and tris phenyl aromatic, azo
aromatic, azomethinic types containing connecting groups in the rigid core
as azo, azomethine or double bond. We report a new approach to predict the
liquid crystalline behaviour for these compounds, using neural networks and
classification algorithms.
A performance analysis for several well-known classification
algorithms is made. The error is calculated both for the training set, to
determine how accurately a certain algorithm can build a model of the data,
113
and for a validation set, to obtain the prediction capability of that model.
The algorithms presented here belong to two important classes of
classification methods: eager learners and lazy learners. Eager learners use
many computational resources in the first step to build the actual model, and
then the prediction is easy; we chose different decision tree inducers for this
class: C4.5, Random tree, Random forest and REPTree (Esposito et al.,
1999). Lazy learners, on the other hand, build a very simple model and most
of the processing is made in the second step, for prediction; for this category,
we chose a set of classifiers based on the nearest neighbour paradigm:
simple Nearest Neighbour, k-Nearest Neighbour and Non Nested
Generalized Exemplars with Prototypes.
As stated before, instance-based learning reduces the learning effort
by simply storing the examples presented to the learning agent and
classifying the new instances on the basis of their closeness to their
neighbours, i.e. previously encountered instances with similar attribute
value. Experimental results showed that not all the elements of a category
are processed in the same way; some elements are considered more typical
of one category than others are. This is known as the prototypical effect
(Rosch, 1975). For a European, an apple is more representative for the fruit
category than a coconut. Cognitive psychology gives the prototype two
possible interpretations (Medin, Ross & Markman, 2005). It can represent
one or more real exemplars, which appear most frequently when human
subjects are asked to exemplify a category. The second approach considers
the prototype to be an ideal exemplar that sums up the characteristics of
more members of the category.
The NNGE model (Martin, 1995) was extended in order to include
prototype information about each rule or hyper-rectangle. Since a
generalized exemplar contains several instances, it is not necessary for the
statistical average of those instances to match the centre of the
corresponding hyper-rectangle. The prototype may differ from the hyperrectangle geometric centre. The proposed Non-Nested Generalized
Exemplars with Prototypes, NNGEP (Leon, 2006) is an incremental
algorithm, so the prototypes can also be computed incrementally. Adding a
new instance to a hyper-rectangle is a particularization of the general case of
merging two Gaussian prototypes, with means 1 and 2 and standard
deviations 1 and 2 respectively. We understand by merging computing
the mean and standard deviation for a new Gaussian prototype that
would have resulted if all the instances of the two source prototypes had
been added to it from the start.
This helps computing the similarity between a given instance and the
closest generalized exemplar. Although classification can be based on
114
similarity, the two notions are not identical. Even if an instance has been
assigned a certain category due to an existing rule, it may not be
representative of that category. An important advantage of the model built
by NNGEP is that instances have graded membership to categories, which
permit a fuzzy interpretation using fuzzy numbers with multidimensional
Gaussian membership functions.
Using this fuzzy interpretation, if an instances membership in a rule
(hyper-rectangle) is less than a specified threshold, the instance can be
included into a different rule, or form a different rule by itself.
The above procedure can only be applied to numeric attributes.
When attributes are symbolic or discrete, their mean value cannot be
computed. Instead, the mode of the set of attribute values is computed, i.e.
the value that occurs most frequently in the given set. The distance on a
certain dimension between an instance and a prototype is either 0 if the
instance attribute value on that specific dimension is the same as the mode,
or 1 otherwise.
There are 5 inputs to our problem: length of the rigid core (Lrig.),
length of the flexible core (Lflex), molecular weight (M), ratio of molecular
diameter and total length (S) and compound class (classcomp) (1 6).
Concerning the liquid crystal behaviour, we have coded the possibility to
generate a mesophase with 1 and the crystalline or amorphous phases with
0. This is the symbolic output of the model.
A comparison between the algorithms, taking into account the best
error found, is displayed in figure 4.1.
20
18
16
14
error
12
Training
10
Prediction
8
6
4
2
0
Nearest k-Nearest
Neighbor Neighbor
C4.5
Random Random REPTree

Tree
Forest
NNGEP
Figure 4.1. Accuracy comparison between classification algorithms

115
The instance-based algorithms such as Nearest Neighbour and

NNGEP have zero error for the training set, which is always true unless
there are class assignment errors (noise) in the training set itself, i.e. the
same instance assigned more than once with different classes. The k-Nearest
Neighbour (with k = 4 in our case) has a greater error on the training set and
no significantly better prediction accuracy. NNGEP has a good prediction
rate and it also has the advantage of an explicit model compared to the
nearest neighbour variants, where all the instances are memorized.
From the decision tree algorithms, Random forest and Random tree
also have zero error for the training set. The prediction error is slightly
larger for the Random tree; however, its main advantage is that the result
can be easily interpretable, unlike the Random forest. C4.5 builds a more
compact decision tree, however the errors are not better than those of the
Random tree and Random forest. REPTree has greater errors for both
training set and prediction.
The results in figure 4.1 show that the smallest errors in the
validation phase, 14.729%, correspond to k-Nearest Neighbour and to our
original algorithm NNGEP, associated with 6.701% and 0% in the training
phase, respectively. These values mean a prediction accuracy of 85.271%
for liquid crystalline behaviour under the condition of a large set of
validation data (1/3 from the whole database). If the domain of the
validation (unseen) data becomes narrower, the probability of a correct
answer increases, as it can been seen in table 4.2.
Table 4.2. Prediction accuracy for different domains of validation data
Algorithm
33%
20%
10%
Nearest Neighbour
k-Nearest Neighbour
C4.5
Random Tree
Random Forest
REPTree
NNGEP
84.615
85.271
83.761
83.721
84.496
81.395
85.271
90.67576
88.43355
86.09706
90.13394
90.60364
87.20127
91.07333
95.33788
90.86627
87.89403
95.06697
95.30182
91.66764
95.53667
Our interest is to estimate the accuracy of the results for a large

validation set. In this respect, the proposed algorithm, NNGEP, provides
one of the best results.
The neural network with the best performance was MLP(5:42:14:1),
a multilayer perceptron with 5 inputs (Lrig, Lflex, M, S, classcomp), two
116
hidden layers with 42 and 14 hidden neurons, respectively, and 1 output

(with the value of 1 for liquid crystalline behaviour and 0 otherwise).
Although the database has a large number of data, the network
predictions are affected by significant errors, both in the training and
validation phases. Thus, in this case, the other classification algorithms
prove to be more effective, unlike the first application described above,
where neural networks were the preferred classification tools, both from the
point of view of the results provided and from the point of view of the
methodology applied.
Acceptable results are obtained in the case of the database with
aromatic compounds if the classes of compounds are considered separately.
For example, for the class of compounds marked as 2, a MLP(4:42:14:1)
network was designed, with mean squared error MSE = 0.01831, correlation
between experimental data and network predictions r = 0.9785 and percent
error Ep = 3.1133% on the training set.
Table 4.3. Sample of validation phase for MLP(4:42:14:1)
Lrig
Lflex
9.21
9.22
9.22
9.22
9.23
9.23
9.21
9.21
9.21
9.21
9.21
9.21
25.5
20.98
6.22
8.77
8.9
16.62
6.39
9.94
20.61
11.69
17.24
15.2
0.08
0.09
0.19
0.16
0.16
0.11
0.18
0.15
0.10
0.14
0.11
0.12
463
439
270
298
296
381
266
310
439
360
431
404
LC
experimental
0
0
1
1
1
0
0
1
1
1
0
0
LC
network
0
0
0
1
1
0
0
1
1
0
0
0
The data in table 4.3 show the efficiency of the neural model which
has a probability of a correct answer of 83.33% compared to validation data
set (which represents 10% from the data set of compounds 2). Cells marked
in grey represent wrong predictions of the networks. This percent can not be
compared with the results from table 4.2 because it is limited to a single
class of compounds. For the whole database, the probability is less than
75%.
Although neural networks are a very popular classification tool in
many domains, according to our study, they yielded the worse results for
117
our particular problem. In this case, the best predictions were given by our
original algorithm NNGEP, along with perfect accuracy on the training set.
It thus combines the good performance of the classic instance-based
methods with the lower memory requirements due to its hyper-rectangles
and the ease of the interpretation of its explicit model in the form of rules.
4.2. Stacked Neural Networks for Time Series Forecasting

Many processes found in the real world are nonlinear. Therefore, there is a
need for accurate, effective tools to forecast their behaviour. Current
solutions include general methods such as multiple linear regression,
nonlinear regression, artificial neural networks, but also specialized ones,
such as Self-Exciting Threshold Auto-Regression, SETAR (Tong, 1983),
Multivariate Exponential Smoothing, MES (Pfeffermann & Allon, 1989)
and Functional Coefficient Auto-Regressive model, FCAR (Harvill & Ray,
2005). The DAN2 model (Ghiassi & Nangoy, 2009) is a dynamic
architecture for neural networks for solving nonlinear forecasting and
pattern recognition problems, based on the principle of learning and
accumulating knowledge at each layer and propagating and adjusting this
knowledge forward to the next layer.
Recently, hybrid forecasting models have been developed,
integrating neural network techniques with conventional models to improve
their accuracy. A well known example is the Auto-Regression Integrated
Moving Average, ARIMA (Brockwell & Davis, 1991). The Auto-Regressive
Fractionally Integrated Moving Average, ARFIMA (Ravishanker & Ray,
2002) is a time series model that generalizes ARIMA by allowing nonlinear
values in modelling events with long memory The SARIMABP model
(Tseng, Yu & Tzeng, 2002) combines SARIMA, Seasonal ARIMA and the
back-propagation training algorithm to predict seasonal time series for
machinery production and soft drink time series. Another hybrid model,
KARIMA (Voort, Dougherty & Watson, 1996) combines the self organizing
map (Kohonen, 1982) and ARIMA to predict short-term traffic flow.
In this study (Leon & Zaharia, 2010), a hybrid model for time series
forecasting is proposed. It is a stacked neural network, containing one
normal multilayer perceptron with bipolar sigmoid activation functions and
the other with an exponential activation function in the output layer. The
two neural networks have each one hidden layer, as shown in figure 4.2.
118
Figure 4.2. The architecture of the stacked neural network
The inputs of the model are the recent values of the time series,
depending on the size of the sliding window s. Basically, the stack model
predicts the value of the time series at moment t depending on the latest s
values:
yt f ( yt 1 , yt 1 ,..., yt s ) .
(4.1)
The neurons in the hidden layers of both networks that compose the
stack are normal multiplayer perceptron (MLP) neurons, with bipolar
sigmoid, or hyperbolic tangent, activation functions.
The difference between the two neural networks lies in the output
layer of the individual neural networks that compose the stack. The first
network, represented at the top in figure 4.2, has the same activation
function, the bipolar sigmoid in the output layer. The second network has an
exponential function instead.
Each network respects the basic information flow of an MLP, where
the values represent the thresholds of the neurons:
ytn tanh win tanh wnp ,q y p qn n ,
(4.2)
yte exp wej tanh wre,v yr ve e .

r
(4.3)
119
Finally, the stack output ytstack is computed as a weighted average of

the two outputs, y tn and yte :
ytstack w1 ytn w2 yte ,
(4.4)
with w1, 2 [0,1] and w1 w2 1 .

The training of the networks is performed with the classical
back-propagation algorithm (Rumelhart, Hinton & Williams, 1986). Beside
training the two components of the stack, the goal of the model is to find the
optimal weights w1 and w2, such that the error of the whole stack is
minimized.
Case Studies
The Sunspot Series
Wolfers sunspot time series records the yearly number of spots visible on
the surface of the sun. It contains the data from 1700 to 1987, for a total of
288 observations. This data series is considered as nonlinear and nonGaussian and is often used to evaluate the effectiveness of nonlinear models
(Ghiassi & Saidane, 2005; Khasei & Bijari, 2010). With a window size of 5
points and considering 28 points ahead, the performance of the model on the
training set is displayed in figure 4.3. The forecasting capabilities of the
model are displayed in figure 4.4.
Figure 4.3. The model performance on the sunspot training data
120
Figure 4.4. The model predictions for the sunspot data
In figure 4.5, the effect of the weights on the mean square error of
the stack on the testing data is displayed. One can see that the optimal
weights are w1 = 100 and w2 = 0, where w1 is the weight of the neural
network with sigmoid activation functions and w2 = 1 w1 is the weight of
the second neural network, whose output neuron has an exponential
activation function.
Figure 4.5. The evolution of MSE when w1 varies
Table 4.4 shows the errors both for the training and for testing. It
separately presents the mean square errors for the first, sigmoid network, for
the second, exponential network, and for the stack, respectively. Since the
range of the datasets are very different, we also display the MSE on the
normalized data between 0 and 1, in order to better compare the
performance of our model on data with different shapes.
121
Table 4.4. The errors of the model for the sunspot data
Training
Testing
Sigmoid output NN
Exponential output NN
Stack
Sigmoid output NN
Stack
MSE on original data

131.865
827.933
131.865
511.531
2248.409
511.531
MSE on normalized data

-3
3.645 x 10
-3
22.886 x 10
-3
3.645 x 10
-3
14.140 x 10
-3
62.151 x 10
-3
14.140 x 10
The Canadian Lynx Series

This series contains the yearly number of lynx trapped in the Mackenzie
River district of Northern Canada (Stone & He, 2007). The data set has 114
observations, corresponding to the period of 1821-1934. With a window
size of 5 points and considering 11 points ahead, the performance of the
model on the training set is displayed in figure 4.6. The forecasting
capabilities of the model are displayed in figure 4.7.
The evolution of the mean square error of the stack on the testing
data is displayed as a function of the sigmoid NN weight, w1, in figure 4.8.
Here, the optimal weights are w1 = 23 and w2 = 77. Table 4.5 shows the
mean square errors obtained for a window size of 5.
For this dataset, the exponential network can contribute to the stack
result.
Figure 4.6. The model performance on the lynx training data
122
Figure 4.7. The model predictions for the lynx data

Table 4.5. The errors of the model for the lynx data
Training
Testing
Sigmoid output NN
Stack
Sigmoid output NN
Stack

530,416.473
383,963.211
401,015.900
154,951.311
113,955.783
109,902.034
123

-3
10.974 x 10
-3
7.944 x 10
-3
8.297 x 10
-3
3.206 x 10
-3
2.357 x 10
-3
2.273 x 10
UK Industrial Production
This data series contains the index of industrial production in the United
Kingdom, from 1700 to 1912 (Janacek, 2001). We first consider the
performance of the model on the training set with a window size of 5 points
and 21 points ahead, as displayed in figure 4.9. The forecasting capabilities
of the model are shown in figure 4.10.
Figure 4.9. The model performance on the

UK industrial production training data
Figure 4.10. The model predictions for the

UK industrial production data
The evolution of the mean square error of the stack on the testing
data is displayed in figure 4.11, as a function of the sigmoid NN weight, w1.
The optimal weights are w1 = 0 and w2 = 100. Table 4.6 shows the mean
square errors obtained for a window size of 5.
124

Table 4.6. The errors of the model for the UK industrial production data
Training
Testing
Sigmoid output NN
Stack
Sigmoid output NN
Stack

1.185
0.988
0.988
296.895
9.810
9.810

-3
0.132 x 10
-3
0.111 x 10
-3
0.111 x 10
-3
33.281 x 10
-3
1.099 x 10
-3
1.099 x 10
Unlike the previous problems, the exponential nature of this time

series makes it difficult for a normal neural network, with a sigmoid
activation function in the output layer. Therefore, the exponential network
dominates the stack. One can notice that although the sigmoid network can
approximate the training set fairly well, with errors comparable to those of
the exponential network, there is a clear difference in performance for the
prediction phase, where only the exponential network can find a good trend
for the time series.
4.3. Stacked Neural Networks for Modelling Chemical

Engineering Phenomena
The idea of combining neural network models is based on the premise that
different neural networks can capture different aspects of process behaviour
and aggregating this information should reduce uncertainty and provide
more accurate predictions. It is always possible that a stacked network could
125
outperform a single, best trained network for the following reasons (Tian,
Zhang & Morris, 2001):
Optimal network architecture cannot always be guaranteed;

The optimization of network parameters is a problem with many
local minima. Even for a given architecture, the final network
parameters can differ between one run of the algorithm and another;
Different activation functions and learning algorithms can also lead
to different generalization characteristics and no single activation
function or learning algorithm is the best in all cases;
Convergence criteria used for network training can lead to very
different solutions for a given network architecture.
Stacked neural networks designed to improve predictive

performance, have been increasingly used for chemical processes, especially
when dealing with complex nonlinear processes where the understanding of
the phenomena is limited.
The stacked neural network applications presented in this section are
based on two papers.
The first one (Leon, Piuleac & Curteanu, 2010) contains a case study
consisting in establishing the dependency between the reaction yield of
hydrogels based on polyacrylamide and the swelling degree of these
chemical compounds and the reaction conditions such as time, temperature,
monomer concentration, initiator, crosslinking agent and inclusion polymer,
as well as type of the polymer added.
The second one (Piuleac, Poulios, Leon, Curteanu & Kouras, 2010)
presents a neural networks modelling methodology applied to the
heterogeneous photocatalytic decomposition of triclopyr in the presence of
TiO2 as photocatalyst. This methodology includes simple and stacked feedforward neural networks and a technique of optimizing the weights of the
stack to obtain the best performance of the neural models. The developed
models, especially stacked neural networks, accurately predict the final
concentration of triclopyr as a function of reaction conditions.
Stacked Neural Network Modelling of the Synthesis of PolyacrylamideBased Multicomponent Hydrogels
In this application, the experimental data are divided into the training (155
instances) and the validation (20 instances) datasets. 7 input variables are
considered: CM (monomer concentration), CI (initiator concentration), CA
126
(crosslinking agent concentration), PI (amount of inclusion polymer), T

(temperature), t (reaction time) and type of included polymer codified as: 1
no polymer added, 2 starch, 3 PVA and 4 gelatine. Two neural
models with the same seven inputs, but with different outputs were
developed: (yield in crosslinked polymer) model 1 and (swelling
degree) model 2. Thus, the neural network modelling establishes the
influence of the initial conditions on reaction yield and swelling degree,
separately.
The predictions of the swelling degree achieved by a neural model
are extremely useful in practice, because they can replace the experiments
that necessitate a great amount of materials and especially time, since a
determination takes around 20 days.
The next task in the modelling technique is to develop the neural
network topology. To determine the optimum network configuration,
different elements were tested: several neural network types, different
numbers of hidden layers and neurons, learning rules and activation
functions.
Four types of neural networks that have the supervised learning
control as a common characteristic (Multi-Layer Perceptrons, MLP,
Generalized Feed-forward Networks, GFN, Modular Neural Networks,
MNN and Jordan Elman Networks, JEN) were considered.
Several trials showed that 3 individual networks in a stack are
enough in our case. Based on recorded performance, the networks chosen
for the stack were: MLP(7:10:1), MLP(7:15:1) and MLP(7:12:4:1).
Beside performance, the criterion of simplicity was taken into
consideration, in order to make a trade-off between complexity and
performance. Thus, if an increase in the number of layers or neurons did not
provide a significant improvement of performance, simpler networks were
preferred, with slightly worse performance.
A second stage in building the stacks consists in an original method
to find the optimum weights of the individual networks selected in the
previous step. Several stacks are generated by successive trials, using
different weights, and their performance is recorded for training and
validation. For interpolation purposes, other two different neural networks
(MLP) are designed, one for the training and one for validation phase. Their
inputs are two of the stack weights, since the third is implicit: 100 w1 w2.
Their output is the correlation, r. The best network provides the optimal
weights of the stack.
For the networks used in stack optimization, the focus is on the
interpolation, not generalization capability, so that they bring an advantage
over the method of successive trials for stack generation. In addition, by
127
making successive trials it is very difficult to test the whole domain for the
values of the stack weights.
Our technique generates a stack formed of three multilayer
perceptrons. One must emphasize the fact that the presented method can be
easily extended to a larger number of individual networks aggregated into a
stack, and also to building heterogeneous stacks, formed of different types
of neural networks. Moreover, like any modelling methodology based on
neural networks, it can be used for different processes and systems. The
synthesis of hydrogels based on polyacrylamide is considered a good
example due to at least two reasons: the complexity of the process and the
lack of knowledge concerning the physical and chemical laws related to it
and the consistency of the available experimental data.
Results and Discussion
The predictions of the yield proved to be easier. A single neural network
MLP(7:15:5:1) was found with a relative error of 5.2304% and a correlation
of 0.9423 in the validation stage, which are acceptable results.
Concerning the swelling degree, the second parameter of interest in
this case study, by using a single neural network, the minimum validation
error is about 10%. Therefore, a procedure based on stacked neural
networks is used as a solution for improving the model performance. Table
4.7 presents the three neural networks chosen for the stack, along with their
performance in the training and validation phases. The errors are high
enough, but the performance of individual models is presented in order to
compare them with the performance registered by different stacks. The
stacks obtained and tested here have the same components, but with
different weights, i.e. contributions to the general output.
The second step in our modelling methodology based on stacked
neural networks with optimized weights is represented by successive trials
with different values of the weights.
Table 4.7. The three individual neural networks aggregated into a stack
Training
Average error
Correlation
Et%
7.95430
0.91507
Validation
Average error
Correlation
Ev%
10.32700
0.85025
Individual
network
N1
MLP(7:10:1)
N2
MLP(7:15:1)
8.21040
0.90845
12.94670
0.84547
N3
MLP(7:12:4:1)
7.56330
0.92197
9.91280
0.87200
128
Table 4.8 contains several examples. The first column is an identifier

of the stack (S) with the weights given in the columns 2, 3 and 4, for the
neural networks N1, N2 and N3 (table 4.7). The average relative error (%)
and the correlation are also presented for each stack, in both training and
validation phases. The stack with the best validation correlation is stack 1,
and the stack with the best validation error is stack 6.
Table 4.8. Different stacked neural networks and
their performance in training and validation phases
No. stack
Weights,%
Training
Average error
Correlation
Et%
5.26581
0.97815
Validation
Average error
Correlation
Ev%
8.99005
0.94653
w1
w2
w3
10
10
80
20
10
70
4.51110
0.98590
10.75674
0.92844
60
10
30
6.36672
0.96883
19.42466
0.84869
30
20
50
5.10411
0.98228
9.47487
0.94023
40
20
40
5.33578
0.97831
11.64715
0.92025
30
30
40
6.63880
0.94828
8.82913
0.94175
40
30
30
6.92451
0.92289
9.22556
0.93172
50
40
10
7.59543
0.95752
9.13066
0.90260
20
60
20
7.97599
0.95402
9.2432
0.92100
10
20
30
50
6.74835
0.96532
8.9050
0.9459
11
70
20
10
7.9102
0.9155
10.2290
0.8971
12
10
10
80
8.2165
0.9001
10.9921
0.8899
The best validation error corresponds to stack 6 with 8.82913%. In

this case, the training error is 6.63880%. However, if we consider the best
validation correlation, we find that the best stack is stack 1. Its validation
error is 8.99005%, very close to that of stack 6, but the training error is
5.26581%, less than the training error of stack 6. Therefore, we consider the
correlation as a good criterion for choosing the optimal stack weights. One
can also see the better performance of the stacks compared to the individual
networks presented in table 4.7.
In the following, the optimization is applied, thus trying to improve
the model performance, especially its generalization capability. In order to
find the optimal stacked neural network, a separate neural network was
developed for interpolation. It has two inputs, the weights for w1 and w2
(values every 10%, in all possible combinations) and the correlation as the
129
output. Only two inputs are necessary because w3 = 100 w1 w2. One
network was prepared for the interpolation of the training results and
another one for the validation results. These networks were admitted to be
large enough, MLP(2:24:8:1) and MLP(2:21:7:1), for training and
validation, respectively, because the interpolation capacity is here more
important than the generalization capability. The predictions of these models
were generated with a step of 1% in order to find the optimum values for the
weights which lead to smaller errors and a better correlation.
Figure 4.12. The influence of the stack weights upon the error in
the validation stage in neural network modelling of swelling degree
The contours in figure 4.12 show the validation error and the
weights w1 and w2 which correspond to this error. One can see a minimum
validation error of approximately 5%. The validation error was chosen for
display, instead of correlation, because this measure is easier to visualize
and understand by the reader. Table 4.9 contains some examples taken from
the contours in figure 4.12.
The smallest validation error is that of stack 10 (4.80111%), but the
largest correlation is that of stack 4 (0.99993). The training error of stack 10
is 4.21283%, which is significantly greater than the training error of stack 4,
which is 0.03968. At the same time, the validation error of stack 4 is
4.91658%, close to the validation error of stack 10, which is 4.80111%.
130
Table 4.9. The performance of some stacks

constructed with the interpolated weight values
No.
stack
Weights,%
Training
Validation
w1
w2
w3
Average error
Et%
Correlation
Average error
Ev%
Correlation
20
80
4.10903
0.98903
4.9564
0.9836
41
58
0.98174
4.94792
0.99981
20
78
4.13607
0.98854
5.08107
0.97167
40
58
0.03968
0.98223
4.91658
0.99993
43
55
0.97992
5.15297
0.99989
20
77
4.14946
0.98841
5.14339
0.9672
43
54
0.97962
5.19517
0.9999
42
54
0.98014
5.15579
0.9999
21
73
4.1917
0.98838
5.03165
0.97124
10
22
71
4.21283
0.98808
4.80111
0.97331
11
13
23
64
4.32662
0.98759
5.03536
0.97887
12
17
25
58
4.47654
0.98593
4.96954
0.9785
13
18
26
56
4.54248
0.98532
4.81774
0.98017
14
19
26
55
4.58003
0.98488
4.9975
0.97798
15
20
26
54
4.62123
0.98443
5.19274
0.97596
16
23
28
49
4.85229
0.98234
5.15541
0.97659
17
24
29
47
4.96186
0.98142
5.03862
0.97794
18
27
31
42
5.27457
0.97881
5.05065
0.97834
19
28
32
40
5.41643
0.97773
4.94861
0.97979
20
30
33
37
5.62787
0.9759
5.08703
0.97853
21
31
34
35
5.78407
0.97471
4.9974
0.97996
22
32
35
33
5.94462
0.97352
4.91409
0.98154
23
33
35
32
6.00224
0.97273
5.15105
0.97847
24
34
36
30
6.16465
0.97147
5.07881
0.97988
25
37
39
24
6.64903
0.96783
4.91092
0.98509
26
38
39
23
6.70035
0.96679
5.16718
0.98106
27
43
44
13
7.39964
0.96123
5.04476
0.99152
131
Therefore, we can consider that the best weights for the networks
presented in table 4.7, for both training and prediction, are w1 = 2%,
w2 = 40% and w3 = 58%.
For the modelling of the swelling degree good results were obtained
by aggregating individual networks into stacks and weighting their outputs.
Thus, three MLP networks were used. The main idea was to replace a
possibly large, complex network with a stack formed of simple networks. If
the best individual network that modelled the swelling degree was an
MLP(7:12:4:1) with a validation error of 9.9128% and a correlation of
0.872, the stack formed of three similar MLPs reached a validation error of
4.91658% and a correlation of 0.99993 after the optimization of the weights.
One can draw the conclusion that the performance of neural
modelling for the validation phase can be significantly improved by
aggregating stacks and optimizing the weights of the stack components. The
modelling methodology developed here can be easily adapted and applied
for other complex problems.
Stacked Neural Network Modelling of Heterogeneous Photocatalytic
Decomposition of Triclopyr
The phenomenological treatment of such photochemical systems is very
complex. In general, the rate of reaction in heterogeneous photocatalytic
systems is a complex nonlinear function of catalyst loading, light intensity,
initial solution pH, reactant and oxidants concentration, etc. Therefore, the
ability of systems such as artificial neural networks to recognize and
reproduce cause-effect relationships through training, for multiple inputoutput mappings, has gained popularity in various areas of chemical
engineering and also in the field of photocatalytic treatment of wastewater.
For this case study, the photocatalytic degradation of triclopyr, we
model the final concentration of this compound as a function of process
conditions. The neural models consider the irradiation time (t), the initial
concentration of triclopyr (C0), the concentration of TiO2 used as a catalyst (
CTiO2 ) and the concentration of H2O2 ( CH 2O2 ) as inputs, and the final
concentration of triclopyr (C) as the output.
First, the data (368 in total) were split into training and validation
data sets, about 15% being the test data set used to evaluate the performance
of the neural network to data not being used in the training process. In this
way, we can evaluate the most important feature of a neural model the
generalization capability.
132
The number of hidden layers and units was established by training a

different range of networks and selecting the one that best balanced
generalization performance against network size. The best network topology
was determined based on the mean of squared errors (MSE) of the training
data. Hidden neurons, as well as the output layer neuron, use hyperbolic
tangent as the nonlinear activation function. The network was trained using
the well-known back-propagation algorithm. We consider that training ends
when the network error (MSE) on the testing set becomes sufficiently small
and does not increase.
Table 4.10. Different neural networks developed with experimental data
and their performance in the training phase
Network type
MLP(4:5:1)
MLP(4:10:1)
MLP(4:15:1)
MLP(4:20:1)
MLP(4:12:4:1)
MLP(4:24:8:1)
MLP(4:25:20:1)
MLP(4:30:25:1)
MLP(4:42:14:1)
MSE
0.0012
0.0004
0.000354
0.000375
0.000756
0.00058
0.000079
0.000106
0.00047
r
0.988230
0.998497
0.998669
0.99859
0.997156
0.997816
0.999702
0.999599
0.998232
Ep%
19.2430
13.1822
9.9729
13.5923
17.9502
11.6835
3.8722
5.5637
10.6775
Table 4.10 contains a series of MLPs with one or two hidden layers
trained with experimental data, and their performance is registered in the
training stage: mean square error MSE, the correlation between experimental
data and the output of the neural network r and the percent error Ep. Only
several examples are presented in table 4.10 from the many neural networks
trained. Taking into account their performance, three neural networks were
selected: MLP(4:15:1), MLP(4:25:20:1) and MLP(4:30:25:1).
The method used to combine the parallel models was the weighted
summation of the individual outputs. Consequently, the performance of the
stack is influenced by the aggregated individual models and their
corresponding weights.
Individual and stacked neural networks were applied to the training and
validation datasets in order to compare their performance and, finally, to
choose the most appropriate model for the studied process. Like in the
133
previous study, in order to find the optimal stacked neural network, separate
neural networks were developed for interpolation. One network was
prepared for the interpolation of the training results and another one for the
validation results. They had two inputs, the weights for N1 and N2 and the
correlation, r, as the output. Only two inputs are necessary because the third
weight w3 = 100 w1 w2. While these networks were admitted to be large
enough, several variants were tried: MLP(2:24:8:1) and MLP(2:21:7:1), for
training and validation, respectively, because the interpolation capacity is
here more important than the generalization capability. The predictions of
these models were generated with a step of 1% and the maximum
correlation of 0.999045 was obtained with the following contributions of the
individual networks N1, N2 and N3: w1 = 15%, w2 = 52% and w3 = 33%,
respectively. Figure 4.13 shows the variation of the stack performance with
the weights of the component neural networks, for validation stage in case 1.
Because of the interpolation errors, the above result (the maximum)
is not precise. Additional experiments were performed in the neighbourhood
of the potential maximum in order to improve the solution. The value
0.999048 for correlation corresponds to a stack with weights of 12%, 50%
and 38% (stack 1).
In another trial, three neural networks with one single hidden layer
(the simplest networks with acceptable performances), MLP(4:5:1),
MLP(4:10:1) and MLP(4:15:1) were considered for the stack. The entire
procedure described above was repeated to obtain the weights of the
individual networks in the stack which lead to the best correlation in the
validation phase.
An optimization procedure based on a separate neural network for all
training and validation results, MLP(2:15:5:1), with the weights as inputs
and the correlation as the output, gives the weights 9%, 55% and 36% with
a correlation of 0.996071. Figure 4.14 presents the variation of the
correlation values with the weights of the stack, emphasizing the maximum.
Additional simulations around this optimum point found a correlation of
0.996087 for the following weights: 11%, 53% and 36% (stack 2).
The results of the stacks were better than those of the individual
models. Good predictions were obtained in the validation phase; therefore,
these models give a very good representation of the photocatalytic oxidation
of triclopyr and they can provide useful information for experimental
practice.
134
Figure 4.13. The variation of the stack performance in the validation stage
with the weights of the component neural networks for stack 1
Figure 4.14. The variation of the stack performance in the validation phase
with the weights of the component neural networks for stack 2
135
4.4. Protein Structure Classification

In the recent years, research in molecular biology and molecular medicine
has led to the accumulation of large amounts of data. Therefore, there is a
continuous need to extract higher-level relevant information patterns from
the data, which in turn can be more easily understood and used. The
automation of this process can be done with the help of machine learning
techniques, especially with classification and clustering. This caused a strong
interest in employing such methods of knowledge discovery to generate
models of biological systems.
As several genome projects such as the Human Genome Project have
been completed, there is a paradigm shift from static structural genomics to
dynamic functional genomics (Houle et al., 2004). The term structural
genomics refers to the DNA sequence determination and mapping activities,
while functional genomics refers to the assignment of functional
information to known sequences.
There are various databases that divide proteins into individual
domains and then divide these domains into evolutionarily related groups
hierarchically. These databases include the Structural Classification Of
Proteins, SCOP (Murzin et al., 2007), manually divided, Class, Architecture,
Topology, and Homologous super-family, CATH (Orengoet al., 2008) and
Families of Structurally Similar Proteins, FSSP (Holmet et al., 2008),
automatically divided.
The identification of such sequences is a problem that concerns
bioinformatics scientists. There are four levels of protein structural
arrangement or conformation (Brazma et al., 2001; Tzanis, Berberidis &
Vlahavas, 2005):
Primary structure: the sequence of amino acids, forming a chain

called polypeptide;
Secondary structure: the structure that forms a polypeptide after
folding;
Tertiary structure: the stable 3D structure that forms a polypeptide;
Quaternary structure: the final 3D structure of the protein formed by
the conjugation of two or more polypeptides.
The function of a protein largely depends on its structure which itself

depends on the protein sequence. Therefore understanding and predicting
how protein sequence information translates into three-dimensional protein
structure and folding has become one of the most challenging open questions
136
of current molecular biology. Only a small percentage of proteins for which

the sequence is known could be explored for their three-dimensional
structure by physical methods and there is a large gap between the number of
known protein sequences and the number of identified structures
(Markowetz, Edler & Vingron, 2003). Methods of statistical classification
can help to bridge this gap, unlike methods that try to find the 3D structure of
a new protein sequence by aligning it to a protein with given structure, such
as the BLAST search algorithm (Altschul et al., 1997).
Many machine learning techniques have been proposed to deal with
the identification of specific DNA sequences. Unfortunately, traditional
techniques cannot be directly applied to this type of sequencing problems.
For many bioinformatics applications, the data (e.g. sequences) have to be
transformed into an attribute-based representation first.
Protein Datasets
The method used in this study (Leon, Aigntoaiei & Zaharia, 2009) is to find
ways to determine structure similarity by means of classification techniques,
not by using sequence similarity. This approach has achieved some success
mostly through the recognition of the protein folds, which are a common
3-dimensional pattern with the same major secondary structure elements in
the same arrangement and with the same topological connections (Craven et
al., 1995). Comprehensive protein classifications such as SCOP and CATH
identified more than 600 3D protein folding patterns. Protein fold prediction
in the context of this large number of classes presents a rather challenging
classification problem. The more classes are involved, the more difficult it is
to accurately predict the fold for a certain sequence.
We used the dataset analyzed by (Ding & Dubchak, 2001) which,
unlike many other benchmark datasets, has the advantage of an attributebased representation. The dataset contains 27 largest, non-redundant, SCOP
folds, which have seven or more proteins and represent all major structural
classes: , , / , + . The attribute vectors were extracted from protein
sequences (Dubchak et al., 1995; Dubchak et al., 1999).
The parameters extracted from the protein sequence are the
following: C: amino acids composition, S: predicted secondary structure, H:
hydrophobicity, V: normalized van der Waals volume, P: polarity and Z:
polarizability.
For a classification model, the prediction or generalization capability
is usually more important than the ability to fit the training data. In this case,
an independent dataset is used for testing, namely the PDB-40D set
developed by the authors of the SCOP database. This set contains the SCOP
137
sequences having less than 40% identity with each other. From this set, 386
representatives of the same 27 largest folds stated above were selected. All
PDB-40D proteins that had higher than 35% identity with the proteins of the
training set were excluded from the testing set. 90% of the test proteins have
less than 25% sequence identity with the training proteins (Brenner, Chothia
& Hubbard, 1998). Therefore, we can expect that a prediction accuracy on
the testing dataset above 35% is not due to the overlapping patterns of the
test and training dataset, but to the generalization capability of the
classification model.
Case Studies
The algorithms we applied are well established in machine learning. From
the implementation point of view, the variants described by (Witten & Frank,
2000) and (Leon, 2006) were used.
The first experiments were made on the original 6 datasets presented
above: C, H, P, S, V and Z, each consisting in 313 instances. As stated, these
problems have 27 classes, corresponding to the protein folds.
Two investigations were made for each problem: the first by
evaluating the performance of the algorithms on the training set alone (e.g.
Ctrain), and the second by building the models on the training sets, but
computing the performance on the independent testing datasets (e.g. Ctest).
The testing datasets consist in 385 instances each. The instances in both the
training and the testing sets are described by 20 or 21 attributes.
100
90
80
70
C4.5
60
NN
50
k-NN
40
NNGE
30
NB
20
10
0
Ctrain
Ctest
Htrain
Htest
Ptrain
Ptest
Strain
Stest
Vtrain
Vtest
Ztrain
Ztest
Figure 4.15. Performance analysis on the 27 class problems
The results are displayed in figure 4.15. One can notice that the best
accuracy on the training sets is provided by the instance-based methods. On
these problems, the decision tree and the Nave Bayes do not seem to be able
138
to model the underlying data well. The same instance-based methods,

especially k-NN and NNGE, also have the best accuracy on the testing sets.
The best test values are presented in table 4.11, along with the best algorithm
used to obtain them.
By considering the average error rates, we can conclude that the C
model is the easiest to model, whereas H and Z models are particularly
difficult to generalize.
Table 4.11. Best accuracy values for the test datasets
Dataset
Ctest
Htest
Ptest
Stest
Vtest
Ztest
Accuracy
45.71%
35.32%
33.77%
39.22%
34.81%
32.73%
The best algorithm

k-NN
NN
k-NN
k-NN and NNGE
NN
NN
The second class of experiments were performed by considering only

the 4 major structural classes: , , / and + . The results are displayed
in figure 4.16, considering again the training and testing cases. The R index
denotes a reduced number of classes. As expected, the performance of the
algorithms is much better. The instance-based algorithms have a 100%
accuracy on the testing sets, and this time the decision tree algorithm has
better results, over 90%. The Nave Bayes approach still performs poorly on
the training set, but it seems to have much better generalization capabilities.
Table 4.12 presents the best test values.
100
90
80
70
C4.5
60
50
NN
40
NNGE
30
NB
k-NN
20
10
0
CRtrain CRtest HRtrain HRtest PRtrain PRtest SRtrain SRtest VRtrain VRtest ZRtrain ZRtest
Figure 4.16. Performance analysis on the 4 class problems
139
Table 4.12. Best accuracy values for the reduced test datasets
Dataset
R
C test
R
H test
R
P test
R
S test
R
V test
R
Z test
Accuracy
69.09%
59.74%
57.40%
74.03%
59.74%
55.32%
The best algorithm

NB
NN
NB
k-NN
NN
NN
The structure of the models themselves can bring more insight into
the nature of the classification problem. An interesting discovery is that for
the 27 class problems the number of NNGE hyper-rectangles, or rules, are
greater than the number of single instances. This means that in these
problems there are more explicit rules, which however have a lesser scope.
Conversely, for the reduced problems, single instances are more numerous,
therefore the rules found have a greater scope but many training instances
remain outside them, and the testing instances are classified mostly by the
nearest neighbour method.
Fold recognition is a useful structure classification approach,
complementary to the one based on string sequence similarity. The
advantage of a representation with attributes is that many common machine
learning algorithms can be applied to this type of problems without
modification. The purpose of the tests presented in this study is to give an
intuition about the type of classification algorithms which is the most
appropriate for the specific problem of classifying protein folds. The main
result is that instance-based methods, related to the way in which people
classify by analogy or similarity, perform particularly well and also show
good generalization capabilities.
140
Chapter 5
Hybrid Neural and Neuro-Genetic Methods

5.1. Combining Neural Models with Phenomenological
Models
The hybrid modelling is recommended when the phenomenological model
of a process is not entirely known, and experimental measurements of the
quantities of interest are available. Thus, the unelucidated part of the model
can be replaced with a neural network trained with experimental data.
In order to illustrate different ways of hybrid modelling, this study
(Curteanu & Leon, 2006) uses simulation data obtained by running a
complete phenomenological model for training the neural networks and as a
reference in comparisons. As an example, free radical polymerization of
MMA was considered, achieved through a batch bulk procedure. The hybrid
model designed here contains a simplified phenomenological model, which
describes the reaction kinetics without taking into account gel and glass
effects specific to free radical polymerization. These effects correspond to
the sudden increase in the graphs values from figures 5.1 and 5.2. This part
of the model, which is more difficult to obtain, is replaced by a neural
network.
One way of combining a phenomenological model with a neural
network is based on the successive use of the two models, on time intervals
delimited by the critical conversion the conversion on which the onset of
the gel effect is considered. In the first step, until critical conversion, the
phenomenological model without gel and glass effects is used. After this
conversion, modelling is made with a MLP(4:42:14:3) neural network,
which has temperature, initiator concentration, time and initial conversion as
inputs. The outputs are: conversion and average polymerization degrees. It
is necessary to introduce conversion both as an input parameter (the initial
value), and as an output parameter (the final value corresponding to the
current time), because the neural model is not used from zero conversion,
but from an initial conversion equal to the critical conversion.
141
conversion
0.8
0.6
0.4
0.2
0
0
100
200
300
400
500
time [min]
Figure 5.1. Monomer conversion obtained at T = 50C and I0 = 50 mol/m3 with a

hybrid model composed of a simplified phenomenological model () and a
MLP(4:42:14:3) network (o), successively used until xcrit = 0.22; continuous line
complete phenomenological model
40000
35000
DPw
DPn , DPw
30000
25000
20000
15000
10000
5000
DPn
0
0
100
200
300
400
500
time [min]
Figure 5.2. Number and weight average polymerization degrees obtained at T =

50C and I0 = 50 mol/m3 with a hybrid model composed of a simplified
phenomenological model () and a MLP(4:42:14:3) network (o), successively
used until xcrit = 0.22; continuous line complete phenomenological model.
142
For this network, training performances were MSE = 0.0005612 and

r = 0.9994. The program used for hybrid modelling computes critical
conversion, dependent on reaction conditions (temperature and initiator
concentration), then it successively uses the two models, phenomenological
and neural.
Figures 5.1 and 5.2 present an example of the hybrid model. The
computation of the critical conversion is included within the program that
applies the two components of the hybrid model. The complete
phenomenological model was considered as a reference because it models
the actual process well. In this case, the use of the hybrid model leads to
accurate results as well, avoiding the modelling of gel and glass effects (a
difficult part of the phenomenological modelling).
According to the obtained results and because of their simplicity in
design and handling, hybrid models represent a suitable way to render real
processes that have incompletely elucidated or difficult to model parts.
5.2. Evolutionary Methods for the Determination of the

Architecture and Internal Parameters of a Neural Network
Among the known types of neural networks (NN), the feed-forward neural
networks are the most used because of their simplicity, flexible structure,
good qualities of representation and their capability of universal
approximation. The methods used to develop NN optimal architecture can
be classified into the following types:
The trial and error method which consists in successive tests relying
on the development of several configurations of the neural networks
and the evaluation of their performance, e.g. (Piuleac et al., 2010);
Empirical or statistical methods, which study the influence of
different internal parameters of NNs, choosing their optimal values
depending on the network performance, e.g. (Balestrassi et al., 2009);
Hybrid methods, such as the fuzzy inference, in which a network can
be interpreted as an adaptive fuzzy system or it can operate on fuzzy
instead of real numbers, e.g. (Attik, Bougrain & Alexandre, 2005);
Constructive methods and/or pruning algorithms, which add and/or
remove neurons or weights to/from an initial architecture, using a
pre-specified criterion to show the manner that these changes should
affect the network performance, e.g. (Xing & Hu, 2009);
143
Evolutionary strategies which are based on the search within the

space of NN topologies, by varying the NN parameters on the basis
of genetic operators, e.g. (Benardos & Vosniakos, 2007).
Two case studies will be presented in this section. The first one
(Curteanu, Leon, Furtun, Drgoi & Curteanu, 2010) shows a comparison
between three methods: trial and error, a real-coded genetic algorithm and
differential evolution. The second one (Drgoi, Curteanu, Leon, Galaction
& Cacaval, 2011) compares the standard version of differential evolution
with a self-adaptive variant of differential evolution.
Trial and Error, Genetic Algorithm and Differential Evolution
The free radical polymerization of styrene performed by suspension
technique is the first case study selected for presentation (Curteanu, Leon,
Furtun, Drgoi & Curteanu, 2010). A complete mathematical model based
on conservation equations for the elements in the reaction mixture was
elaborated and solved using the distribution moments of the concentrations
(Curteanu, 2003). This model was the simulator for producing the database
for neural network modelling used to predict the changes of monomer
conversion and molecular weight depending on initiator concentration,
temperature and reaction time.
Three methods for determining the neural network topology are
applied and compared using a complex nonlinear process as a case study.
These methods are: a trial and error optimizing methodology applied for NN
parameters (OMP) and two evolutionary strategies based on differential
evolution (DE) and a genetic algorithm (GA).
The steps describing the algorithm for finding the optimum
parameters for a feed-forward neural network are:
1. Finding the optimum number of neurons in the hidden layer for one
hidden layer neural network;
2. Finding the optimum value for the learning rate;
3. Finding the optimum value for the momentum term;
4. Finding the optimum activation function for the output layer;
5. Finding the optimum number of neurons in the hidden layers for a
two hidden layers neural network;
6. Optimizing the parameters for the two hidden layer neural network
following the steps 2-4.
144
The genetic algorithm is based on the representation of NN

parameters into chromosomes (the number of hidden layers, the number of
neurons in these layers, the weights and biases of the neurons), coding this
information as real numbers. A floating point vector was created that
contains all the information from the neural network.
Figure 5.3. The content of a solution vector used in the evolutionary methods
In figure 5.3, NL represents the number of hidden layers, where the

only acceptable values are {0,1,2}. H1 represents the number of neurons in
the first hidden layer, H2 the number of neurons in the second hidden layer.
The parameters w1, ..., wn indicate the weights for each neuron in the neural
network, b1, ..., bn the biases and a1, ..., an the activation functions.
Differential evolution is a global optimization technique proposed by
(Storn & Price, 1997). Because it is a search method over continuous spaces
like those describing an neural network (NN) structure and because it is a
robust algorithm with fast convergence, the DE algorithm is chosen to
determine the topology (architecture) and internal parameters of the NNs
used in this work. To obtain efficient models, the optimal (or near optimal)
topology of NN is an important requirement.
The combination of DE and NN has two major kinds of applications:
training (optimizing) NNs and designing the topology of a network. Among
the first attempts to optimize a neural network with the DE algorithm was
that of (Fischer, Reismann & Hlavackova-Schindler, 1999), where the
optimization was performed on a fixed architecture. The goal was to find a
new training method, and the obtained results were encouraging. After that,
new variations on the same idea appeared. (Plagianakos et al., 2001) used
the DE algorithm to train neural networks with discrete activation functions.
(Subudhi & Jena, 2009) combined DE and Levenberg-Marquardt algorithms
to create a new hybrid method to train a neural network for nonlinear system
identification. An attempt to determine a partial architecture of an NN using
DE was made by (Lahiri & Khalfe, 2010), where the network is constructed
by determining the best number of neurons in the hidden layer, the weights
and the activation functions. However, the topology was constrained to have
only one hidden layer.
As it can be observed, extensive work was done to train the neural
network using DE, but the area of finding the best topology of the network
with the DE algorithm is mainly unexplored.
145
Table 5.1 presents the relative errors obtained by applying the three
methods, for each modelled variable (x, Mn and Mw). The relative errors
were calculated with the following equation:
Er
pdesired pnet
100 ,
pdesired
(5.1)
where pdesired represents x, Mn or Mw as targets of the neural network and pnet

is the prediction of the network for each of the three output variables.
Beside errors, table 5.1 presents the correlations between the data
used for testing and the predictions of the three neural networks selected
through the three methods (see the first column of the table). All three
methods have acceptable results in the testing phase.
Table 5.1. Relative errors and correlations obtained through the three methods
applied for developing neural network topologies
Relative errors %
x
Mn
Mw
OMP
3:21:6:3
GA
3:20:7:3
DE
3:9:7:3
Correlations
Mn
Mw
0.044
5.480
7.018
0.999
0.989
0.988
0.811
2.233
6.130
0.999
0.998
0.994
0.787
2.437
6.385
0.999
0.993
0.979
A comparison between these methods will take into account both the
error values, but also other factors such as the accessibility of the method
(execution time, algorithm complexity etc.) or the purpose to which the
model is intended (prediction, monitoring or control).
In terms of accessibility, DE and GA methods, once implemented,
are easy to use because the execution of the software program provides the
optimal network topology, training and testing errors, and predictions over
the training and testing data. However, the runs of the program should be
repeated because of the stochastic nature of the algorithms. Furthermore,
one cannot say precisely that the results of these algorithms are optimal
networks because there is always the possibility of reaching a local
optimum.
The OMP method is more laborious, but the systemized practical
considerations on modelling a neural network (structured in a 6-step
algorithm) and the criterion and formula used for calculating the
146
performance indices give the method a real chance of obtaining a neural

network of minimum size and maximum performance.
Standard Differential Evolution and Self-Adaptive Differential Evolution
In the second case study (Drgoi, Curteanu, Leon, Galaction & Cacaval,
2011) we propose a method for using DE to determine both the topology
and internal parameters of two neural networks used for predicting and
classifying the behaviour of the simulated fermentation systems in the
presence of n-dodecane as oxygen-vector, from the viewpoint of the oxygen
transfer rate or oxygen mass transfer coefficient.
The initial population is generally initialized by randomly generating
a set of vector solutions, but there is also the alternative of introducing the
solution vector from another source.
After initialization, a set of intermediary solutions called mutants are
created by adding a randomly sampled vector with a scaled difference
between other two vectors. The variant in which the base vector is randomly
chosen and only one differential term is used is called DE/Rand/1 (Price,
Storm & Lampinem, 2005). After the creation of mutant vectors, the current
population is recombined with the mutants to create a trial population. For
the DE algorithm, two generally used crossover methods are known:
binomial and exponential. The exponential crossover is more complex than
the former. Initially, a start index (SI) is randomly generated, and the SI
element is copied from the mutant to the trial vector. Random values
between 0 and 1 are generated and the elements from the mutant are copied
to the trial one until they are less than Cr (crossover rate). After that, all
remaining elements are copied from the target vector.
Three methods for determining the best control parameters are
known: deterministic (parameters are determined using a deterministic law),
adaptive (feedback information from the search process is used to determine
the change) and self-adaptive (the parameters are encoded directly into the
algorithm). The self-adaptive variants tend to behave better than the
classical ones, as various studies report encouraging results, e.g. (Brest et
al., 2006; Das & Suganthan, 2011). In this work, we are referring to this
methodology as Brests self-adaptive mechanism, Janez DE or simply as
jDE (Brest et al., 2006).
To model and simulate the oxygen mass transfer rate in the presence
of n-dodecane as oxygen-vector, a database with 229 instances was used,
represented by the oxygen mass transfer coefficient in the simulated broth.
The data was split into two groups: for training (75%) and for testing (25%).
147
Using the simple DE algorithm, the best neural network for

prediction of the oxygen mass transfer coefficient had rather good
performance: MSEtraining = 0.0217, MSEtesting = 0.0218, Fitness = 45.889,
RMSEtesting = 0.875 and RRMSEtesting = 0.860. Minimum MSE, RMSE,
RRMSE and a maximum fitness function are required.
Figure 5.4. Values of oxygen mass transfer coefficient obtained experimentally

and predicted with the best simple neural networks determined with the
standard DE approach and with jDE method in the training phase
Figure 5.5. Values of oxygen mass transfer coefficient obtained experimentally

and predicted with the best simple neural networks determined with the
standard DE approach and with jDE method in the testing phase
A series of tests using the jDE approach was also performed to

determine which method is better for our case study. The best network
determined with the self-adaptive methodology has a 4:6:1 topology with
MSEtraining = 0.0110, MSEtesting = 0.0127, Fitness = 90.260, RMSEtesting =
148
0.667 and RRMSEtesting = 0.669. Comparisons between the predictions of the

neural network (4:2:1) determined with standard DE approach, as well as
predictions of the (4:6:1) network determined with jDE, and experimental
data are given in figures 5.4 and 5.5 for the training set and testing set,
respectively.
In table 5.2, all parameters of the two best networks obtained with
the two DE approaches, excepting the weights, are listed. Each neuron has
the activation function and the bias (threshold) parameter. Three types of
activation functions were used: step, linear and bipolar sigmoid functions,
which are referred to as Step, Linear and Sigmoid, respectively. In order to
indicate each neuron, notations (Hp.q) and (O.k) were used to represent the
qth neuron from the pth hidden layer and the kth neuron from the output layer,
respectively. The values for biases are computed within the DE algorithm.
Standard DE approach
Table 5.2. Parameters of the two best networks obtained with the standard DE
approach and self-adaptive jDE mechanism
Neural Network parameter
Parameter Value
Number of hidden layers
Number of neurons in the input layer
Parameters for the first hidden layer
jDE self-adaptive approach
Parameters for the output layer
Number of neurons
Activation functions
Biases
Number of neurons
(H1.1):Linear
(H1.1):-4.254
1
Biases
(O.1):Linear
(O.1):0.180
1
4
6
(H1.1):Step
(H1.1):Step
(H1.1):Linear
(H1.1):-0.240
(H1.3):-0.552
(H1.5):-0.093
1
(O.1):Step
(O.1):-0.271
Number of hidden layers

Number of neurons in the input layer
Number of neurons
Parameters for the first hidden layer
Biases
Parameters for the output layer
Number of neurons
Biases
(H1.2):Sigmoid
(H1.2):-3.211
(H1.2):Linear
(H1.2):Sigmoid
(H1.2):Linear
(H1.2):0.201
(H1.4):-0.256
(H1.6):0.142
The best network determined with the self-adaptive methodology

has significant improved performance compared with the best network
determined with the standard DE approach. These results tend to confirm
the superiority of the self-adaptive mechanism observed in other studies.
149
5.3. Neuro-Genetic Multiobjective Optimization Using

NSGA-II with Alternative Diversity-Enhancing Metrics
Process optimization and control can have a significant strategic impact on
production cost reduction and quality enhancement in polymer production
facilities. The optimization of a polymerization process often involves a
series of competing objectives, which must be satisfied at the same time.
Consequently, some research has been recently reported in the literature on
the optimization of polymerization processes using multiple objective
functions and constraints.
Many optimization techniques traditionally involved single (scalar)
objective functions which used a weighted average combination of several
individual objectives The optimization method was found to be simple to
use, but depending on the users decision to specify the weights to the
different objectives based on a good knowledge of the process. This can be a
drawback when the objectives to inter-relate are of different nature.
This study (Furtun, Curteanu & Leon, 2011) presents an original
software implementation of the elitist non-dominated sorting genetic
algorithm (NSGA-II) applied and adapted to the multi-objective
optimization of a polysiloxane synthesis process. An optimized feedforward neural network, modelling the variation in time of the main
parameters of the process, was used to calculate the vector objective
function of NSGA-II, as an enhancement to the multi-objective optimization
procedure. An original technique was utilized in order to find the most
appropriate parameters for maximizing the performance of NSGA-II. The
algorithm provided the optimum reaction conditions and proved to be able
to find the entire non-dominated Pareto front and to quickly evolve optimal
solutions. The use of the neural network makes it also suitable to the multiobjective optimization of processes for which the amount of knowledge is
limited.
Multiobjective Optimization with the NSGA-II Algorithm
The multiobjective optimization procedure was based on the non-dominated
sorting genetic algorithm NSGA-II (Deb et al., 2002), which was used to
obtain the set of Pareto optimal solutions and the corresponding decision
variables. NSGA-II is a multiobjective, fast, elitist (the best individuals are
kept through the evolution process, from one generation to another) genetic
algorithm, which preserves the diversity of the solutions.
150
NSGA-II starts from a randomly generated population of

chromosomes (potential solutions), P0 of size N. The population is sorted
based on non-domination. Each solution is assigned a fitness (or rank) equal
to its non-domination level (1 is the best level). The minimization of the
fitness is assumed. An offspring population (Q0) of size N is then created by
applying the genetic operators: binary tournament selection, recombination,
and mutation.
The population of parents (Pt) and the population of children, (Qt)
are reunited forming a temporary population, Rt of size 2N. The number of
the current generation is denoted by t. The population Rt is then sorted on
non-dominated Pareto fronts, where F1 is the best non-dominated front, F2
the second one, and so on. The new population of chromosomes (Pt+1) of
size N is created by selecting the non-dominated fronts based on rank order
(first F1, then F2, and so on). This procedure is followed until no more fronts
can be accommodated. The last added front which led to the exceeding of
the population size is then sorted based on crowding distance, in descending
order, and the remaining population slots are filled with the best solutions.
The crowding distance is a criterion based on comparison of the congestion
around a solution and is used in the selection of parents for a new individual
and the selection of a new population. A greater crowding distance is
preferred in order to maintain the diversity of the solutions.
The newly obtained population is then used for generating a new
population of offspring and the procedure described above is applied for a
number of times equal to the predefined number of maximum generations.
The feed-forward neural network used to model the polysiloxane synthesis
process had four inputs (temperature - T, reaction time - t, amount of
catalyst - ccat, and amount of water - cwater) corresponding to the neurons in
the first (input) layer and two outputs (monomer conversion, x, and
viscometric molecular weight, Mv) corresponding to the neurons in the
output layer.
The proximity to the true Pareto optimal solution set was estimated
by using the set coverage metric and the distribution of non-dominated
solutions throughout the Pareto front was evaluated with the spacing metric
(Coello, Lamont & Van Veldhuizen, 2007).
Set coverage, i.e. the relative coverage comparison of two sets, was
computed with the following equation:
151
CS(X ,X )
a X ;a X :a a
X
(5.2)
where X, X are two sets of solutions vectors. CS maps the ordered pair (X,
X) to the interval [0, 1]. CS(X, X) = 1 means that all points in X are
weakly dominated by the solutions in X. The opposite, CS(X, X) = 0,
represents the situation when none of the solutions in X are covered by the
set X. Since the domination operator is not symmetric, CS(X, X) is not
necessarily equal to 1 CS(X, X). Therefore, both CS(X, X) and
CS(X, X) need to be considered.
The spacing metric was computed as:
1
PF
PF
(d
i 1
dm )2 ,
(5.3)
where PF is the solution vector (the Pareto front), di is the Euclidean

distance measured in objective space between solution i PF and the
consecutive solution in PF, and dm is the mean value of the above measured
distances.
S numerically describes the spread of solutions in the Pareto front.
When S = 0, all solutions are spaced evenly apart. Thus, an algorithm with a
smaller S is better.
For the calculation of the set coverage metric, an original technique
was designed based on the comparison between the obtained solutions
vectors. The comparison procedure started with the pair of solution vectors
obtained after the simulations with a population size of 10 and 50 maximum
generations and with a population size of 10 and 100 maximum generations,
respectively. The dominant solution vector was chosen for further
comparisons. By continuing this procedure for the solutions vectors
obtained with the other combinations of values for the genetic algorithm
parameters, the best solution vector was found to be the one obtained using
a population size of 500 and 500 maximum generations, as it can be seen in
table 5.3.
The simulations revealed that the new operator used for choosing
only the unique chromosomes to be included in the new population caused a
slight increase in the solutions diversity.
152
Table 5.3. The results of the simulations for optimum popSize and optimum
noGen with CrossoverProbability = 0.9 and MutationProbability = 0.03
Simulation
no.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Current NSGA-II
parameters
popSize
noGen
10
10
10
10
10
50
50
50
50
50
100
100
100
100
100
300
300
300
300
300
500
500
500
500
500
50
100
300
500
1000
50
100
300
500
1000
50
100
300
500
1000
50
100
300
500
1000
50
100
300
500
1000
CS (current,
best)
CS (best,
current)
1
0.2
0.2
0.1
0.1
0.2
0.2
0.36
0.32
0.86
0
0.02
0.02
0.08
0.02
0
0.02
0.58
0.407
0.607
0.69
0.038
0.786
0.888
0.24
1
0
0
0.2
0.2
0.2
0.46
0.24
0.28
0
0.88
0.6
0.61
0.3
0.65
0.973
0.557
0.237
0.853
0.62
0.608
0.992
0.572
0.572
0.93
Best NSGA-II
parameters
popSize
noGen
10
10
10
10
10
10
50
50
50
50
50
50
50
50
50
50
50
50
300
300
300
500
500
500
500
50
50
100
300
300
300
50
50
300
500
1000
1000
1000
1000
1000
1000
1000
1000
300
300
300
50
50
300
500
5.4. Multiobjective Optimization of a Stacked Neural

Network Using an Evolutionary Hyper-Heuristic
Finding the optimal neural network for modelling a particular chemical
process is a time consuming problem and it usually lacks the certainty that
an individual neural network has extracted all the relevant information from
the training data set. The lack of robustness in neural network models, when
153
they are applied to unseen data, is basically due to the over-fitting or underfitting of the training data leading to poor generalization performance.
A combination of two or more neural networks would avoid the
failure of individual component networks caused by the limited training set,
the over-fitting of the noise in the data or the convergence of the training to
local minima. Another advantage of using a combination of multiple neural
networks is based on the fact that different networks could perform well in
different regions of the input space, so using them simultaneously would
increase the prediction accuracy on the entire input space. Such aggregation
methods of neural networks are: stacked neural network e.g. (Zhang, 2008),
neural network ensemble e.g. (Nguyen, Abbass & McKay, 2005) or
aggregated neural network e.g. (Mukherjee & Zhang, 2008).
This study (Furtun, Curteanu & Leon, 2012) is based on the
development of an optimized stacked neural network used for modelling the
synthesis of polyacrylamide-based multicomponent hydrogels. This is a
very complex chemical process and there is no known phenomenological
model that can reproduce the physical and chemical laws that govern it.
Therefore, empirical models that work with input-output data sets, such as
the neural networks, are recommended as alternatives for the modelling of
this kind of process.
The available experimental database consisted in 178 instances
which were randomized and then divided into 75% for neural network
training (134 instances) and 25% for testing (44 instances).
Like in the first case study presented in section 4.3, 7 input variables
were considered for each individual neural network: CM (monomer
concentration), CI (initiator concentration), CA (crosslinking agent
concentration), PI (amount of inclusion polymer), T (temperature), t
(reaction time), and type of included polymer (Pt), codified as 1 no
polymer added, 2 starch, 3 poly(vinyl alcohol) (PVA), and 4 gelatine.
The outputs of the neural models and, implicitly, the outputs of the stacked
neural network, were (yield in crosslinked polymer) and (swelling
degree). Thus, the neural network modelling established the influence of
reaction conditions on reaction yield and swelling degree.
The Multiobjective Optimization Procedure
The core of the multiobjective optimization procedure is the real-coded
NSGA-II algorithm. The multiobjective optimization problem consists in
finding the best topology and the best weights for a stacked neural network,
which lead to the greatest generalization performance, while maintaining the
stacked neural network to a minimum size.
154
The accuracy and the generalization capacity of the stacked neural

network were evaluated with the following constructed formula:
perf_index r ( MSEtrain MSEtest ) ,
(5.4)
where MSEtrain is the mean squared error obtained for the training set,
MSEtest is the mean squared error obtained for the testing set and r is the
linear correlation coefficient for testing. Since the best value for MSEtrain
and MSEtest is 0 and the best value for r is 1, the ideal value for perf_index
(the performance index of the stacked neural network) would be 1. As the
performance index (perf_index) is closer to 1, the better is the accuracy and
the generalization capacity of the stacked neural network.
The output of the neural network corresponding to the yield in
crosslinked polymer () will be denoted as the first output and the output
corresponding to the swelling degree (), as the second output.
The goal of the multiobjective optimization was to maximize the
performance of the stacked neural network, namely minimizing the training
and testing errors and obtaining a testing correlation coefficient of 1, while
minimizing its total number of hidden neurons. The decision variables
optimized with the evolutionary algorithm were: the number of individual
neural networks from the stack, the weights for each output of the individual
neural networks combined in the stack and the number of hidden neurons
for each neural network.
Therefore, the multiobjective function to be maximized in the
optimization procedure was constructed from two fitness functions and had
the following expression:
NNno
f ( NNno,wjk ,Hnk ) perf_index1, Hnk ,

k 0
(5.5)
where Hnk is the number of neurons in the k-th individual neural network
and wjk is the weight of the j-th output of the k-th neural network from the
stack.
The selection of the maximum allowed number of hidden neurons
for every neural network took into account the practical considerations
stated in (Furtun, Curteanu & Cazacu, 2011).
For a better search efficiency, the total number of connection
weights in a neural network must be limited:
155
Nw
1
Nt No
10
(5.6)
If there are few training data, the following equation is preferred:
Nw
1
Nt No
2
(5.7)
The total number of connection weights in a neural network with a

single hidden layer is:
Nw Ni Hn Hn No
(5.8)
If the neural network has two or more outputs, the following

equation is also considered:
2 Ni 1 Hn No ( Ni 1)
(5.9)
In equations 5.6 5.9, Nw is the total number of connection weights

in the neural network, Ni is the number of inputs of the neural network, No
is the number of outputs of the neural network, Nt is the number of
exemplars in the training or testing dataset and Hn is the number of hidden
neurons of the neural network.
NSGA-II-QNSNN Evolutionary Hyper-Heuristic
A hyper-heuristic (Burke et al., 2003) is a heuristic search method that seeks
to automate, often by the incorporation of machine learning techniques, the
process of selecting, combining, generating or adapting several simpler
heuristics (or components of such heuristics) to efficiently solve
computational search problems.
Our multiobjective optimization procedure is based on the realcoded NSGA-II algorithm which was combined with the Quasi-Newton
(QN) training algorithm (Davidon, 1959; Fletcher & Powell, 1963; Boyd &
Vandenberghe, 2004) for designing an optimized stacked neural network.
The resulting evolutionary hyper-heuristic was named NSGA-II-QNSNN.
A chromosome was constructed from 16 genes representing the 16
decision variables: the number of individual neural networks from the stack
(maximum 5), the weights for each output of the individual neural networks
combined in the stack, and the number of hidden neurons for each neural
156
network. The constrains of the decision variables were included in their

encoding.
The actual length of one chromosome was determined by the
number of neural networks included in the stack (NNno), which was
encoded in the first gene of the chromosome. The number of genes encoding
the weights for every output of the individual networks and the number of
genes encoding the number of hidden neurons in every network from the
stack were equal to the value of the first gene in the chromosome. The
remaining empty positions in the chromosome, up to the pre-established
maximum number of neural networks, were filled with zero. This technique
was applied when generating initial random values for the first population of
chromosomes and, also, when applying the mutation and crossover
operators. In order to preserve the diversity of the solutions, the two genetic
operators were separately used, with a predefined probability, for every gene
in the chromosome. When the mutation operation did not occur for a certain
gene, the value of that gene remained unchanged in the child chromosome.
Likewise, when the crossover operation did not take place for a certain gene,
the gene with a nonzero value from the corresponding position in the mother
or father chromosome was copied in the child chromosome.
At each iteration of the evolutionary algorithm, a series of stacked
neural networks were developed and the individual networks composing
them were trained with the Quasi-Newton algorithm, in order to calculate
the fitness values of the stacked neural networks. The internal connection
weights of each individual neural network from every stack were
determined through the training algorithm.
The Quasi-Newton training algorithm is a local search algorithm
which finds the best set of weights that minimize the error between the
network output and its training targets (the training error). This algorithm
was preferred to the gradient descent algorithm (back-propagation
algorithm) because it is faster and it performs well on function
approximation problems. The stopping criterion for the training process was
the decrease of the error gradient under a pre-established value.
The parameters modelled were the reaction yield which characterizes the
productivity of the synthesis of polyacrylamide-based multicomponent
hydrogels and the swelling degree, a property of the hydrogels.
For the proposed evolutionary hyper-heuristic to be superior to the
trial and error method, it must not only provide a global optimal solution,
but also a relatively short computation effort, expressed in terms of time and
157
memory. Consequently, the size of the search space for NSGA-II must be
limited through the complexity of the chromosome structure and the range
of values for its genes. Thus, the decision variables representing the genes
from every chromosome were limited to a certain interval of values. The
number of individual neural networks included in the stack varied from 1 to
5, the weights for each output of the individual neural networks combined in
the stack took values between 0 and 100%, and the number of hidden
neurons for each neural network ranged from 2 to 15. The maximum
number of hidden neurons was established by using equations 5.6, 5.8 and
5.9.
The number of neural networks in the stack and the number of
hidden neurons in each network were limited to the above-mentioned ranges
due to the amount of time and memory required for the convergence of the
evolutionary algorithm in case of a larger stacked neural network. These
limitations were imposed based on a series of tests done with the
evolutionary hyper-heuristic. Also, the results of the multiobjective
optimization showed that, usually, 2 or 3 neural networks in the stack
provide it with a very satisfactory accuracy and a good generalization
capacity. This fact is sustained by the results presented in table 5.4.
Table 5.4. The set of optimal non-dominated solutions obtained
with the NSGA-II-QNSNN evolutionary hyper-heuristic
No. NNno w11 w12 w13 w14 w15 w21 w22 w23 w24 w25 Hn1 Hn2 Hn3 Hn4 Hn5
1
0.17 0.83
0.47 0.53
0.25 0.75
0.08 0.92
0.25 0.75
0.22 0.78
0.22 0.78
0.31 0.69
0.40 0.60
0.02 0.42 0.56
0.12 0.34 0.54
0.97 0.03
0.40 0.60
11
0.80 0.20
0.30 0.70
11
0.01 0.99
0.05 0.95
14
10
0.20 0.80
14
11
0.03 0.97
0.18 0.82
11
14
12
0.02 0.53 0.45
0.02 0.37 0.61
14
12
13
0.08 0.54 0.38
0.07 0.44 0.49
11
14
12
14
0.03 0.38 0.27 0.32
0.06 0.43 0.33 0.18
14
12
12
15
0.01 0.36 0.28 0.35
0.05 0.40 0.39 0.16
11
14
12
12
16
0.02 0.26 0.24 0.26 0.22 0.06 0.42 0.26 0.12 0.14 11
14
12
12
14
158
The best performance for modelling the synthesis of

polyacrylamide-based multicomponent hydrogels was achieved by the
neural network using a linear activation function for the output which
models the yield in crosslinked polymer and a linear activation function for
the output which models the swelling degree. The performance index for the
best neural network (0.761), the training and testing errors, and the testing
linear correlation coefficient are presented in the row marked in bold in
table 5.5. Given these results, the individual networks included in the stack
had linear activation functions in the output layer.
The results from table 5.5 also reveal that a single neural network
cannot provide a very good accuracy for modelling the synthesis of
polyacrylamide-based multicomponent hydrogels, thus strengthening the
idea that a stacked neural network should be created and used in this case.
Table 5.5. The results of the neural modelling using single networks
with different activation functions in the output layer
Activation
function 1
()
Linear
Linear
Linear
Logistic
Logistic
Logistic
Tanh
Tanh
Tanh
Activation
function 2
()
Linear
Logistic
Tanh
Linear
Logistic
Tanh
Linear
Logistic
Tanh
MSEtrain
MSEtest
perf_index
0.012
0.052
0.012
0.117
0.045
0.137
0.011
0.074
0.179
0.049
0.028
0.164
0.050
0.028
0.093
0.140
0.047
0.112
0.822
0.457
0.638
0.227
0.267
0.036
0.669
0.280
0.265
0.761
0.377
0.463
0.060
0.198
-0.194
0.518
0.159
-0.026
Figure 5.6. The Pareto optimal front provided by

the NSGA-II-QNSNN evolutionary hyper-heuristic
159
The NSGA-II-QNSNN evolutionary hyper-heuristic provided a set

of equally good compromises between the complexity of the stacked neural
network and its accuracy and generalization capability, under the form of a
Pareto optimal front. The obtained Pareto optimal front can be visualized in
figure 5.6. The discontinuity that can be observed in the Pareto front is due
to the disconnected areas in the solutions space.
The rows in table 5.6 contain the two conflicting objectives. The best
performance index (0.928) was obtained by a stacked neural network
composed of 5 individual neural networks with a total of 63 hidden neurons,
with a convergence time of 279 min. The software application was run on an
Intel Core 2 CPU 6600 at 2.40 GHz and 2GB RAM. The smallest
complexity was achieved for a stacked neural network with 2 individual
networks and a total of 4 hidden neurons with a performance index of 0.636.
A trade-off between the complexity of the stack and its performance index is
represented by the pair of conflicting objectives perf_index = 0.89,
Total Hn = 16 which was obtained with a stacked neural network composed
of 2 individual neural networks.
Table 5.6. The performance indices and the structural complexity obtained with
the NSGA-II-QNSNN evolutionary hyper-heuristic
No.
MSEtrain
MSEtest
perf_index
Total Hn
1
2
3
4
5
0.109
0.105
0.084
0.087
0.080
0.044
0.037
0.027
0.027
0.026
0.789
0.821
0.875
0.881
0.882
0.636
0.679
0.764
0.767
0.776
4
5
6
7
9
6
7
8
9
10
11
0.072
0.047
0.047
0.024
0.027
0.021
0.024
0.027
0.023
0.016
0.015
0.015
0.892
0.876
0.895
0.930
0.936
0.934
0.796
0.802
0.825
0.89
0.894
0.898
11
14
15
16
19
25
12
13
14
15
16
0.019
0.018
0.020
0.017
0.016
0.011
0.011
0.010
0.010
0.012
0.951
0.951
0.955
0.954
0.956
0.921
0.922
0.925
0.927
0.928
28
37
43
49
63
160
By analyzing the results, it can be concluded that a stacked neural

network with 2 component networks can provide a very good performance
in modelling the synthesis of polyacrylamide-based multicomponent
hydrogels. Although a performance index greater than 0.92 was achieved
only with stacked neural networks with more than two individual networks,
one of the neural networks had a very little influence on the outputs of the
stack. As it can be seen in row 12 of table 5.6, one of the three neural
networks from the stack has only a contribution of 2% over the outputs of
the stack.
The proposed NSGA-II-QNSNN evolutionary hyper-heuristic
proved to be more efficient than the method used in (Leon, Piuleac &
Curteanu, 2010) for the same polymerization process. The multiobjective
optimization procedure offered an entire set of non-dominated solutions,
corresponding to a set of equally good compromises between the
performance and the complexity of the stack, whereas the trial an error
method only provided one stacked neural network. Moreover, although the
performance indices of the stacked neural networks from the non-dominated
solutions set were just slightly better, the structural complexity of each stack
was considerably lower than that of the best stack obtained in the previous
study.
The evolutionary hyper-heuristic also provides a global optimal
solution and it does not need the interaction with a user along the
optimization process. The user just needs to input the values for the
parameters of NSGA-II-QNSNN and then he/she is presented with a set of
optimum stacked neural networks.
A great advantage of the proposed evolutionary hyper-heuristic lies
in the fact that it provides an entire set of non-dominated solutions, giving
thus the decision maker the possibility to choose a certain trade-off based on
the necessities of the problem to be solved.
161
Professional Development Plans
Overview of Scientific Accomplishments

Until now, I have conducted a sustained research activity, reflected in the
publication of 6 books, 6 book chapters, 33 articles in journals and
conference proceedings indexed in ISI Web of Knowledge and 34 articles in
journals and conference proceedings indexed in other international
databases, which constitute a selection of works in line with the subject of
this habilitation thesis.
I was a reviewer for 20 conferences and journals, session chairman
for 5 conferences, and a member of the organizing committees for 2
conferences.
I was the director of a national PNII IDEI research project, I am
currently scientific leader of an FP7 ERA-NET Concert Japan project, with
partners from Germany and Japan and I was assistant coordinator of a
Quality Culture Project financed by the European University Association
(EUA), with a network of universities from Poland, Kosovo, Finland,
Austria, Turkey, Russia, Ireland and Greece, led by the Technical University
Gheorghe Asachi of Iai. I was also part of the research teams in 2 other
European projects and 22 national projects.
So far, my research has focused on 3 directions:
Fundamental and applicative research in the field of intelligent

agents, with an emphasis on the social aspects of interactions and the
design of adaptive local rules which lead to surprising, emergent
behaviour of the system;
Fundamental and applicative research in the field of machine
learning, with an emphasis on classification algorithms;
Interdisciplinary research regarding the application of artificial
intelligence methods to problems in chemical engineering
(modelling and optimization of complex processes) and civil
engineering (seismic risk management).
In the future, I will continue the investigations related to intelligent

agents, learning and optimization methods. These directions have a great
potential for the discovery of new theoretical results, because there are still
many open problems and challenges, and they can also benefit from
practical applications through interdisciplinary collaborations.
In the following, I will briefly summarize some results obtained up
to this point and state some concrete ideas for the future development of my
research.
165
Intelligent Agents and Multiagent Systems

The study regarding dynamic role allocation presents a method by which
agents can self-organize their roles by changing their individual utilities to
fit the particularities of task distributions. This is achieved by reaching a
near-optimal negotiation outcome, computed by means of an evolutionary
algorithm with hybrid encoding. The adaptive behaviour is based on the
psychological theory of cognitive dissonance. Knowledge dynamics effects
such as learning and forgetting are also taken into account. Over repeated
trials, the system is shown to stabilize and the agents converge to specific
roles by handling tasks defined by particular complexity levels of their
attributes. As an emergent property of this behaviour, the overall system
productivity increases and eventually stabilizes, following a typical shape
also revealed by studies about human worker performance.
Compared for example to E-GAP (Scerri et al., 2005), this method
does not address roles explicitly, but implicitly, based on the changes of the
agents individual utilities. Also, rather than treating task allocation as a
constraint optimization problem, it uses negotiation as a mechanism to
achieve emergent coordination.
A future direction of research would be the investigation of using
direct multilateral negotiation techniques, to reach the desired fair
outcomes. This would be especially important in situations where the agents
involved are not fully cooperative and do not reveal their private
information regarding their utilities.
According to the proposed model in which adaptive agents form
social networks, the agents can gradually enhance their knowledge by
learning when presented with tasks to be solved. They also manifest a
collaborative behaviour for solving more complex problems. The social
network acts as a means to transfer information, unlike the classic contract
net protocol which transfers actual tasks. The model also includes payments
for agents which successfully accomplish their goals.
Under different configurations of the environment, it is shown that
the agents form evolving social networks, and the system exhibits several
types of emergent properties: a power law distribution of the number of
connections and agent wealth, the fact that a dynamic or evolving
environment is required to maintain the social networks, the fact that a nondeterministic task distribution increases the overall system efficiency, and
the fact that small changes in task distribution cause great changes in the
resulting system efficiency.
166
In the future, the study can be continued with the investigation of the
influence of system parameters, considered both individually and
collectively. The model can also be used with slight modifications to model
learning behaviour in an e-learning environment, where the agents, i.e. the
learning individuals, can benefit from tutoring and group learning by
solving tasks in a cooperative manner.
We also presented the design of a multiagent system that can display
different types of behaviours, from asymptotically stable to chaotic. In this
case, chaos arises only from the agent interactions, and it is not artificially
introduced through a chaotic map.
Here, as future directions of research, we aim at further analysing the
results of the interactions in order to see whether some probabilistic
predictions can be made, taking into account the system state at a certain
moment. It is important to determine when small perturbations have visible
effects and when they can be controlled. Also, one must investigate whether
classical chaos control techniques used for physical systems such as the
OGY method (Ott, Grebogi & Yorke, 1990), can be applied for this
multiagent system.
Another fundamental question is whether the chaos in the system is
only transient and eventually stabilises into a steady state or its behaviour
remains chaotic forever. Out of many experiments, it was observed that
sometimes the system converges to a stable state. In other cases, chaos
doesnt seem to be only transient, e.g. with 50 agents executing in
lexicographic order, which corresponds to fewer fluctuations, there are still
sudden changes occurring in the utility variation even after 50000 time
steps. One needs to distinguish between these cases as well.
Also, beside the analysis of the exogenous perturbations showing
that very small changes can have a great impact on the evolution of the
system and the investigation of some methods of controlling such
perturbations in order to have a desirable final state, we aim at analysing
endogenous perturbations and the effect of alternative decisions on the
evolution of agent utilities. In this respect, we will suggest different methods
of describing the behaviour of the multiagent system.
The planning method with quasi-determined states shows a way to
include supervised, inductive learning into a planning problem. A model of
extracting a training dataset from the Q matrix of a reinforcement learning
algorithm was described. Since the agent does not possess all the necessary
information at any given time, it needs to compute the optimal action. If the
environment is non-deterministic, the agent can learn and change its model.
A predicative representation of the states is not necessary, because the states
are dynamically recognized by means of predictions made on the basis of
167
the learned model. However, the model can be symbolically interpreted,

because the actual values that compose it are explicitly available.
The idea was extended in using state attractors for general
multiagent reinforcement learning (MARL) problems. We proposed a
difficult multiagent reinforcement learning problem as a benchmark,
involving both cooperation and competition. A way to detect a compact
representation of policies by using state attractors was described, and a
genetic algorithm with variable chromosome length was used to detect
them. A game theoretic approach regarding the initial establishment of
subgoals among competitive agents was shown to be useful as a preprocessing phase in order to reduce the size of the search space.
Due to its generality and suitability for complex situations, we
consider that this method can also be successfully applied to other MARL
benchmark problems.
Also, we will seek other ways of finding the state attractors, different
for the variable-length encoding evolutionary algorithm used so far. We will
investigate the use of other algorithms such as Differential Evolution, DE
(Storn & Price, 1997), Particle Swarm Optimization, PSO (Kennedy &
Eberhart, 1995) or Imperialist Competitive Algorithm, ICA (AtashpazGargari & Lucas, 2007). We will try to find non-evolutionary, iterative
methods of creating and moving the state attractors into optimal locations,
as well.
In the study regarding the dual use of learning methods, we showed
that by modifying the acceptance strategy of tasks, the overall performance
of a group of agents can be improved. With an optimized strategy for task
acceptance, the Backpropagation agents outperform both Quickprop and
RProp agents. The emergent behaviour of the system gives a surprising
result, because the performance of the Backpropagation algorithm itself
remains the same (usually inferior to the other two algorithms). The
difference is made by the intelligence of accepting only the tasks that
seem to fit the reduced capability of Backpropagation agents, and this is
accomplished by merging the common group experiences, without any other
type of direct communication.
Another important feature of the multiagent system is that agents use
neural networks to solve problems provided by the user, but also to learn
from their own experience. Consequently, they improve their behaviour.
Thus, they use the same type of model for both external and internal
purposes.
Our analysis shows that for a given total number of tasks there is an
optimal number of Backpropagation agents where the effect of the
optimization is the most significant.
168
It should be noted that a high classification or regression accuracy

per-se is not the main goal of this work. Other approaches, such as Support
Vector Machines (Cortes & Vapnik, 1995) or boosting techniques, e.g.
AdaBoost (Freund & Schapire, 1997), have been shown to provide better
results than neural networks for many practical problems. The focus of this
study is to present a meta-model that can both address user-defined tasks
and develop internal strategies for the agents to optimize their behaviour in
an autonomous way based on their own experience. Any other inductive
learning method could be used instead of neural networks. This feature can
prove especially important in real-world, open multiagent systems, where
adaptability and simplicity conveyed by the use of a unique, consistent
learning model can provide a decisive advantage for the agents involved.
As future research opportunities, we will try to make use of methods
such as the above-mentioned ones, and also to find other problems where
agents could manifest their learning duality and optimize their performance
based on their experience in solving user-given tasks.
Optimization Methods
Quantum-inspired evolutionary algorithms (QIEA) seem to be a promising
direction of research, given that they outperform classical evolutionary
algorithms especially for large optimization problems. The QIEA-SSEHC
algorithm was proposed for solving multi-attribute combinatorial auction
problems, with the following set of characteristics: an evolutionary hillclimbing phase at the end to fine-tune the solutions and the use of a steady
state model, instead of a generational model. In order to maintain the genetic
diversity, a repairing procedure was employed, which guarantees that all the
chromosomes of the population are feasible, i.e. satisfy the problem
constraints. A metric for the comparison of multi-objective solution sets was
proposed, based on sampling for the integral of the whole space of utility
functions. Also, the average number of solution vectors in the Pareto front
was used as a diversity metric. In order to apply the rotation gate for the
quantum-inspired crossover, a randomly selected non-dominated solution
vector was chosen as the quantum best.
Future research will focus on using different variants of quantum
operators, for example different types of mutations instead of the one based
on flipping probability amplitudes, and especially on finding a real quantum
optimization algorithm, which could exploit the state superpositions to
extract the optimum without actually representing all the possible observed
states.
169
Another real-valued QIEA variant was proposed, which was better

than two other classical techniques and also two other quantum-inspired
algorithms proposed in the literature. It has the following distinct
characteristics, whose combination is believed to be an original improvement
for this class of real-valued quantum-inspired algorithms: a population-based
elitism, a resetting mutation for the qubits and an evolutionary hill climbing
phase at the end of the main search meant to improve the quality of the
solution.
The good performance on a large optimization problem with 100
variables is encouraging, especially compared with the poor results of the
other algorithms under study.
An interesting issue for future research would be to extract multiple
solutions from a group of qubits instead of only one qubit. However, since
there would be an exponential number of solutions, one would have to find a
way to estimate the maximum fitness value without actually representing and
computing all these solutions.
Even if the negotiation model presented in the study can be solved by
linear programming, if non-linearity is added in the computation of the
utilities, the evolutionary methods can provide a convenient, efficient
solution.
The results can be used for negotiation problems in a multiagent
system, in situations where the agents involved are cooperative or are willing
to reveal their private information regarding their utilities to an external,
impartial mediator.
Another direction in using optimization methods was their
application to chemical processes, often modelled themselves with neural or
other models. We intend to continue the interdisciplinary cooperation in this
direction by using other optimization algorithms, such as PSO or ICA.
The combination of a multiobjective evolutionary algorithm
(NSGA-II) with a local search algorithm (Quasi-Newton) is a new approach
in the development and optimization of stacked neural networks. A further
important research direction will by the improvement of the proposed
multiobjective optimization method, in terms of computation time, through
the revision of the search and optimization stages.
Also, we can optimize processes modelled by other regression
methods, such as Support Vector Regression (Drucker et al., 1997).
170
Machine Learning Algorithms

Despite its simplicity, it seems that the stacked hybrid neural model
performs well on a variety of benchmark problems. It is expected that it can
have good results for other important problems that show dynamical and
predictive aspects. The heterogeneous model described for time series can
be easily extended to incorporate other activation functions that can be
x
suitable for a particular problem, such as a double-exponential function ab .
It is also possible to include non-differentiable functions in the model, if one
adopts an evolutionary algorithm for training the neural networks, instead of
gradient-based methods like the classical back-propagation algorithm.
The stack aggregation demonstrated very good results also in the
case of modelling complex chemical processes, such as the synthesis of
polyacrylamide-based multicomponent hydrogels and the heterogeneous
photocatalytic decomposition of triclopyr, where the generalization
capabilities of the stack outperformed those of individual neural networks.
As a future direction of research, different configurations for the
neural models will be tried, beside the classical multilayer perceptron. The
aggregation of neural models with other regression models will be
investigated. Different training methods will be used to find the optimal
architecture and parameters, e.g. constructive and destructive methods,
beside gradient-based and biologically-inspired optimization algorithms.
Different machine learning algorithms, such as Nearest Neighbour,
k-Nearest Neighbour, C4.5, Random Tree, Random Forest, REPTree and
Neural Networks, were used to predict the crystalline behaviour for a large
database. The mentioned methods were tested in different variants and their
efficiency was evaluated based on the error calculated for the training set, to
determine how accurately a certain algorithm can build a model of the data,
and for the validation set, to assess the prediction capability of that model.
In one of our case studies, the best predictions were given by our
original algorithm Non-Nested Generalized Exemplars with Prototypes,
NNGEP, along with perfect accuracy on the training set. It combines the
good performance of classic instance-based methods with a higher search
speed due to its hyper-rectangles and the ease of interpretation of its explicit
model in the form of rules.
We intend to develop the instance-based methods in order to
incorporate recent advances in the field, such as the Large Margin Nearest
Neighbour (Weinberger & Saul, 2009).
171
Objectives for Professional Development

In the following, I will summarize several objectives for my future
professional and scientific development:
short-term objectives (2 years):

o continuing personal research in the field of artificial
intelligence;
o forming a research team with interests in the abovementioned directions;
o publishing at least 3 papers per year in journals or conference
proceedings indexed by ISI Web of Knowledge;
o making an application for a research grant, as a director;
o developing the national and international contact network for
the pursued research domains;
medium-term objectives (3-5 years):
o winning at least one national research grant, as a director;
o participation in at least one European research project, as the
leader of the university research group;
o publishing a book or book chapter in an internationally
recognised publishing house;
long-term objectives (5-15 years):
o publishing at least 3 internationally recognized books;
o being invited as a keynote speaker to recognized
international conferences.
172
References
[1]
Albert R., Barabsi A. L. (2002) Statistical mechanics of complex networks,

Review of Modern Physics, vol. 99, no. 3, pp. 7314-7316.
[2]
Albert R., Jeong H., Barabsi A. L. (1999) Diameter of the world-wide web,
Nature, no. 401, pp. 130-131.
[3]
Altschul S. F., Madden T. L., Schffer A. A., Zhang J., Zhang Z., Miller W.,
Lipman D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs, Nucleic Acids Res, 25, pp. 3389-3402.
[4]
Andersson A., Tenhunen M., Ygge F. (2000) Integer programming for

combinatorial auction winner determination, Proceedings of the Fourth
International Conference on Multi-Agent Systems (ICMAS 00), Boston, MA,
pp. 39-46.
[5]
Atashpaz-Gargari E., Lucas C (2007) Imperialist Competitive Algorithm: An

algorithm for optimization inspired by imperialistic competition, IEEE
Congress on Evolutionary Computation 7, pp. 4661-4666.
[6]
Atay N., Bayazit B. (2007) Emergent Task Allocation for Mobile Robots,
Proceedings of Robotics: Science and Systems Conference.
[7]
Attik M., Bougrain L., Alexandre F. (2005) Neural network topology

optimization, ICANN, vol. 2, pp. 53-58.
[8]
Babu G. S. S., Das D. B., Patvardhan C. (2008) Quantum Evolutionary

Algorithm Solution Of Real Valued Function Optimization: Reactive Power
Dispatch Example, XXXII National Systems Conference.
[9]
Balestrassi P. P., Popova E., Paiva A. P., Marangon Lima J. W. (2009) Design of
experiments on neural network's training for nonlinear time series
forecasting, Neurocomputing, vol. 72, no. 4-6, pp. 1160-1178.
[10] Bellifemine F. L., Caire G., Greenwood D. (2007) Developing Multi-Agent

Systems with JADE, Wiley Series in Agent Technology, Wiley.
[11] Benardos P. G., Vosniakos G. C. (2007) Optimizing feedforward artificial
network architecture, Eng. Appl. Artif. Intell., vol. 20, no. 3, pp. 365-382.
[12] Bernon C., Chevrier V., Hilaire V., Marrow P. (2006) Applications of SelfOrganising Multi-Agent Systems: An Initial Framework for Comparison,
Informatica, vol. 30, pp. 73-82
[13] Bichler M. (1998) Bidding languages and winner determination in multiattribute auctions, IBM Research Report RC 22478.
[14] Blum A., Furst M. L. (1997) Fast planning through planning graph analysis,
Articial Intelligence, 90(1-2).
[15] Bo J., Li J. (2011) A Multi-lateral Multi-issue Negotiation Protocol Based on
Adaptive Genetic Algorithm and Its Application in E-commerce, Energy
Procedia, vol. 13, 2011 International Conference on Energy Systems and
Electrical Power (ESEP 2011), pp. 1380-1386.
[16] Boccaletti S., Grebogi C., Lai Y. C., Mancini H., Maza D. (2000) The Control of
Chaos: Theory and Applications, Physics Reports, vol. 329, pp. 103-197.
175

[17] Bowling M., Veloso M. (2002) Multiagent learning using a variable learning
rate, Articial Intelligence, vol. 136, pp. 215250.
[18] Boyd S. P., Vandenberghe L. (2004) Convex Optimization, Cambridge Univ.
Press.
[19] Brazma A., Parkinson H., Schlitt T., Shojatalab M. (2001) A Quick Introduction
to Elements of Biology - Cells, Molecules, Genes, Functional Genomics,
Microarrays, http://www. ebi. ac. uk/microarray/ biology_intro. html.
[20] Breiman L. (2001) Random Forests, Machine Learning 45 (1):5-32.
[21] Brenner S. E., Chothia C., Hubbard T. J. (1998) Assessing Sequence
Comparison Methods with Reliable Structurally Identified Distant
Evolutionary Relationships, Proc. National Academy of Sciences, vol. 26, pp.
6073-6078.
[22] Brest J., Greiner S., Boskovic B., Mernik M., Zumer V. (2006) Self-Adapting
Control Parameters in Differential Evolution: A Comparative Study on
Numerical Benchmark Problems, Evolut. Comput. 10, pp. 646-657.
[23] Brockwell P. J., Davis R. A. (1991) Time Series: Theory and Methods, SpringerVerlag, 2nd edition.
[24] Brooks R. A. (1991a) Intelligence without Reason, Proceedings of the Twelfth
International Joint Conference on Artificial Intelligence (IJCAI-91), pp. 569595, Sydney, Australia.
[25] Brooks R. A. (1991b) Intelligence without Representation, Artificial
Intelligence, vol. 47, pp. 139-159.
[26] Brown G. W. (1951) Iterative solutions of games by ctitious play, in: T. C.
Koopmans (ed. ) Activitiy Analysis of Production and Allocation, ch. XXIV,
pp. 374376, Wiley.
[27] Bryson A. E., Ho Y. C. (1969) Applied Optimal Control, Blaisdell, New York.
[28] Burke E. K., Hart E., G. Kendall, Newall J., Ross P., Schulenburg S. (2003)
Hyper-heuristics: An emerging direction in modern search technology,
Handbook of Metaheuristics (F. Glover and G. Kochenberger, eds. ), Kluwer,
pp. 457474.
[29] Buoniu L., Babuka R., DeSchutter B. (2010) Multi-Agent Reinforcement
Learning: An Overview, in Innovations in Multi-Agent Systems and
Applications, series Studies in Computational Intelligence, D. Srinivasan, L.
Jain (eds. ), no. 310, pp. 183221, Springer.
[30] Campbell A., Wu A. S. (2010) Multi-agent role allocation: issues, approaches,
and multiple perspectives, Autonomous Agents and Multi-Agent Systems, vol.
22, no. 2, pp. 317-355.
[31] Caplice C., She Y. (2003) Optimization-based procurement for transportation
services, Journal of Business Logistics, vol. 24, no. 2, pp. 109-28.
[32] Charrier R., Bourjot C., Charpillet F. (2007) A Nonlinear Multi-agent System
designed for Swarm Intelligence: the Logistic MAS, First IEEE International
Conference on Self-Adaptive and Self-Organizing Systems, pp. 32-44.
176
References
[33] Che Y. K. (1993) Design competition through multidimensional auctions,
RAND Journal of Economics, vol. 24. no. 4, pp. 668-680.
[34] Chen-Ritzo C. H., Harrison T. P., Kwasnica A. M., Thomas D. J. (2005) Better,
Faster, Cheaper: An experimental analysis of a multi-attribute reverse auction
mechanism with restricted information feedback, Management Science, vol.
51, no. 12, pp. 1753-1762.
[35] Chli M., De Wilde P., Goossenaerts J., Abramov V., Szirbik N., Correia L.,
Mariano P., Ribeiro R. (2003) Stability of Multi-Agent Systems, IEEE
International Conference on Systems, Man and Cybernetics, vol. 1, pp. 551556.
[36] Crciu M. S., Leon F. (2010) Comparative Study of Multiobjective Genetic
Algorithms, Bulletin of the Polytechnic Institute of Iai, Romania, tome LVI
(LX), section Automatic Control and Computer Science, fasc. 1, pp. 35-47.
[37] Coello C. A. C., Lamont G. B., Van Veldhuizen D. A. (2007) Evolutionary
Algorithms for Solving Multi-Objective Problems, in Goldberg, D. E., Koza, J. R.
(Eds. ), Genetic and Evolutionary Computation Series. Springer, New York,
pp. 233-282.
[38] Conitzer V., Sandholm T. (2003) AWESOME: A general multiagent learning
algorithm that converges in self-play and learns a best response against
stationary opponents, Proceedings of the 20th International Conference on
Machine Learning, ICML-03, pp. 8390, Washington, US.
[39] Cornforth D., Green D. G., Newth D. (2005) Ordered asynchronous processes
in multi-agent systems, Physica D (Nonlinear Phenomena), vol. 204, pp. 7082.
[40] Cortes C., Vapnik V. N. (1995) Support-Vector Networks, Machine Learning,
20, [online] http://www. springerlink. com/content/k238jx04hm87j80g/.
Freund Y., Schapire R. E. (1997) A Decision-Theoretic Generalization of OnLine Learning and an Application to Boosting, Journal of Computer and
System Sciences, 55(1):119-139.
[41] Cover T. M., Hart P. E. (1967) Nearest neighbor pattern classification, IEEE
Transactions on Information Theory, vol. 13 (1), pp. 21-27.
[42] Cramton P., Shoham Y., Steinberg R., eds. (2006) Combinatorial Auctions,
MIT Press, Cambridge, MA.
[43] Craven M. W., Mural R. J., Hauser L. J., Uberbacher E. C. (1995) Predicting
protein folding classes without overly relying on homology, ISMB, vol. 3, pp.
98-106.
[44] Crites R., Barto A. G. (1998) Elevator Group Control Using Multiple
Reinforcement Learning Agents, Machine Learning, vol. 33, pp. 235-262 .
[45] Cundari T. R., Deng J., Pop H. F., Sarbu, C. (2000) Structural analysis of
transition metal beta-X substituent interactions. Toward the use of soft
computing methods for catalyst modeling, J. Chem. Inf. Comput. Sci., 40,
1052-1061.
177

[46] Curteanu S. (2003) Modeling and simulation of free radical polymerization of
styrene under semibatch reactor conditions, Cent. Eur. J. Chem., vol. 1, pp. 6990.
[47] Curteanu S., Leon F. (2006) Hybrid Neural Network Models Applied to a Free
Radical Polymerization Process, Polymer-Plastics Technology and
Engineering, vol. 45, pp. 1013-1023, Marcel Dekker, USA.
[48] Curteanu S., Leon F., Furtun R., Drgoi E. N., Curteanu N. (2010) Comparison
between Different Methods for Developing Neural Network Topology Applied
to a Complex Polymerization Process, Proceedings of the International Joint
Conference on Neural Networks, IEEE World Congress on Computational
Intelligence, Barcelona, Spain, pp. 1293-1300.
[49] Das S., Suganthan P. N. (2011) Differential Evolution: A Survey of the State-ofthe-Art, Evolutionary Computation, IEEE T. Evolut. Comput. 15, pp. 4-31.
[50] Davidon W. C. (1959) Variable metric method for minimization, Technical
report, Argonne National Laboratories, Ill.
[51] Davis R., Smith R. G. (1983) Negotiation as a metaphor for distributed
problem solving, Articial Intelligence, vol. 20, pp. 63-109.
[52] Dawes J. H. P., Freeland M. C. (2013) The 01 test for chaos and strange
nonchaotic attractors, [online] people. bath. ac. uk/jhpd20/publications/
sna. pdf.
[53] Deb K., Pratap A., Agarwal S., Meyarivan T. (2002) A Fast and Elitist
Multiobjective Genetic Algorithm: NSGA-II, IEEE Transactions on
Evolutionary Computation, vol. 6, issue 2, pp. 182-197.
[54] Ding C., Dubchak I. (2001) Multi-Class Protein Fold Recognition Using
Support Vector Machines and Neural Networks, Bioinformatics, vol. 17, pp.
349-358.
[55] Ditto W., Munakata T. (1995) Principles and Applications of Chaotic Systems,
Communications of the ACM, vol. 38, no. 11, pp. 96-102.
[56] Drgoi E. N., Curteanu S., Leon F., Galaction A. I., Cacaval D. (2011) Modeling
of oxygen mass transfer in the presence of oxygen-vectors using neural
networks developed by differential evolution algorithm, Engineering
Applications of Artificial Intelligence, Elsevier, vol. 24, issue 7,
Infrastructures and Tools for Multiagent Systems, pp. 1214-1226.
[57] Drucker H., Burges, C. J. C., Kaufman L., Smola A. J., Vapnik V. N. (1997)
Support Vector Regression Machines, Advances in Neural Information
Processing Systems 9, NIPS 1996, pp. 155161, MIT Press.
[58] Dubchak I., Muchnik I., Holbrook S. R., Kim S. H. (1995) Prediction of protein
folding class using global description of amino acid sequence, Proc. Natl Acad
Sci., vol. 92, pp. 8700-8704.
[59] Dubchak I., Muchnik I., Mayor C., Dralyuk I., Kim S. H. (1999) Recognition of a
protein fold in the context of the Structural Classification of Proteins (SCOP)
classification, Proteins, vol. 35, pp. 401-407.
178
References
[60] Dury A., Le Ber F., Chevrier V. (1998) A Reactive Approach for Solving
Constraint Satisfaction Problems: Assigning Land Use to Farming Territories,
Proceedings of Agents Theories, Architectures and Languages, ATAL98,
Lecture Notes in Artificial Intelligence 1555 Intelligent Agents V, J. P.
Muller, M. P. Singh et A. S. Rao (eds), Springer-Verlag, pp. 397-412.
[61] Esposito F., Malerba D., Semeraro G., Tamma V. (1999) The Effects of Pruning
Methods on the Predictive Accuracy of Induced Decision Trees, Applied
Stochastic Models in Business and Industry, pp. 277-299.
[62] Fahlman S. E. (1988) Faster-Learning Variations on Back-Propagation: An
Empirical Study, in Proceedings of the 1988 Connectionist Models Summer
School, Morgan-Kaufmann, Los Altos CA.
[63] Fan K., Brabazon A., OSullivan C., ONeill M. (2007) Option Pricing Model
Calibration using a Real-valued Quantum-inspired Evolutionary Algorithm,
Proceedings of the 9th annual conference on Genetic and evolutionary
computation (GECCO 07), London, England, pp. 1983-1990.
[64] Feigenbaum M. J. (1979) The Universal Metric Properties of Nonlinear
Transformations, Journal of Statistical Physics, vol. 21, pp. 669-706.
[65] Ferreira P. R., Bazzan A. L. C. (2006) Swarm-GAP: A Swarm Based
Approximation Algorithm for E-GAP, First International Workshop on Agent
Technology for Disaster Management, pp. 49-55.
[66] Festinger L., Riecken H. W., Schachter S. (1956) When Prophecy Fails: A
Social and Psychological Study of A Modern Group that Predicted the
Destruction of the World, Harper-Torchbooks.
[67] Fischer M. M., Reismann M., Hlavackova-Schindler K. (1999) Parameter
estimation in neural spatial interaction modelling by a derivative free global
optimization method, Proceedings of IV international conference on
geocomputation, Fredericksburg, USA [online] http://citeseerx. ist. psu.
edu/ viewdoc/download?doi=10. 1. 1. 20. 9676&rep=rep1&type=pdf.
[68] Fisher R. A. (1936) The use of multiple measurements in taxonomic problems,
Annual Eugenics, 7, Part II, pp. 179-188.
[69] Fletcher R., Powell M. J. D. (1963) A rapidly convergent descent method for
minimization, The Computer Journal, 6(2):163168.
[70] Furtun R., Curteanu S., Cazacu M. (2011) Optimization Methodology Applied
to Feed-Forward Artificial Neural Network Parameters, Int. J. Quantum Chem.
111, pp. 539-553.
[71] Furtun R., Curteanu S., Leon F. (2011) An Elitist Non-Dominated Sorting
Genetic Algorithm Enhanced with a Neural Network Applied to the MultiObjective Optimization of a Polysiloxane Synthesis Process, Engineering
Applications of Artificial Intelligence, Elsevier, vol. 24, pp. 772-785.
[72] Furtun R., Curteanu S., Leon F. (2012) Multi-objective optimization of a
stacked neural network using an evolutionary hyper-heuristic, Applied Soft
Computing, Elsevier, vol. 12, issue 1, January 2012, pp. 133-144.
179

[73] Gabbai J. M. E., Yin H., Wright W. A., Allinson N. M. (2005) Self-Organization,
Emergence and Multi-Agent Systems, IEEE International Conference on
Neural Networks and Brain, ICNN&B 05, pp. 1858-1863.
[74] Gardner M. (1971) On Cellular automata, self-reproduction, and the game
life, Scientific American, February 1971.
[75] Gaston M. E., DesJardins M. (2005) Social Networks and Multi-agent
Organizational Performance, Proceedings of the 18th International Florida
Artificial Intelligence Research Society Conference, FLAIRS 2005, special
track on AI for Social Networks, Social Networks for AI, Clearwater, FL.
[76] Georg J. P., Gleizes M. P., Glize P., Rgis C. (2003) Real-time Simulation for
Flood Forecast: an Adaptive Multi-Agent System STAFF, Proceedings of the
AISB03 Symposium on Adaptive Agents and Multi-Agent Systems,
University of Wales, Aberystwyth, pp. 7-11.
[77] Ghiassi M., Nangoy S. (2009) A dynamic articial neural network model for
forecasting nonlinear processes, Computers & Industrial Engineering, vol. 57,
pp. 287-297.
[78] Ghiassi M., Saidane H. (2005) A dynamic architecture for artificial neural
networks, Neurocomputing, vol. 63, pp. 397-413.
[79] Groan C. (2003) Performance metrics for multiobjective optimization
evolutionary algorithms, Proceedings of Conference on Applied and
Industrial Mathematics (CAIM), Oradea, Romania, [online] www. cs. ubbcluj.
ro/~cgrosan/grosanCAIM. pdf.
[80] Grothmann R. (2002) Multi-Agent Market Modeling based on Neural
Networks, PhD Thesis, Faculty of Economics, University of Bremen,
Germany.
[81] Guo L., Cai X. (2009) The Fractal Dimensions of Complex Networks, Chinese
Physics Letters, vol. 26, no. 8.
[82] Hamed H. N. A., Kasabov N. K., Shamsuddin S. M. (2011) Quantum-Inspired
Particle Swarm Optimization for Feature Selection and Parameter
Optimization in Evolving Spiking Neural Networks for Classification Tasks, in
E. Kita (ed. ), Evolutionary Algorithms, InTech, pp. 133-148.
[83] Hansen M. P., Jaszkiewicz A. (1998) Evaluating the quality of approximations
to the nondominated set, Technical Report IMM-RE P-1998-7, Institute of
Mathematical Modeling, Technical University of Denmark.
[84] Hart P. E., Nilsson N. J., Raphael B. (1968) A Formal Basis for the Heuristic
Determination of Minimum Cost Paths, IEEE T Syst Sci Cyb, 4(2):100-107.
[85] Harvill J. L., Ray B. K. (2005) A note on multi-step forecasting with functional
coefficient autoregressive models, International Journal of Forecasting, vol.
21, pp. 717-727.
[86] Herrero A., Corchado E., Pellicer M. A., Abraham A. (2008) Hybrid Multi
Agent-Neural Network Intrusion Detection with Mobile Visualization, in
Collection of Innovations in Hybrid Intelligent Systems, pp. 320-328.
180
References
[87] Holm L., Ouzounis C., Sander C., Tuparev G., Vriend G. (2008) FSSP: Families
of Structurally Similar Proteins,
[online]
ftp://ftp.
ebi.
ac.
uk/pub/databases/fssp.
[88] Hoos H. H., Boutilier C. (2000) Solving combinatorial auctions using
stochastic local search, Proceedings of the Seventeenth National Conference
on Artificial Intelligence (AAAI-2000).
[89] Hota A. R., Pat A. (2011) An Adaptive Quantum-inspired Differential Evolution
Algorithm for 0-1 Knapsack Problem, Computer Science - Neural and
Evolutionary Computing, I. 2. 8.
[90] Houle J. L., Cadigan W., Henry S., Pinnamaneni A., Lundahl S. (2004)
Database Mining in the Human Genome Initiative, Whitepaper, Biodatabases.
com,
Amita
Corporation,
[online]
http://www.
biodatabases.
com/whitepaper. html.
[91] Hu J., Wellman M. P. (1998) Multiagent reinforcement learning: Theoretical
framework and an algorithm, Proceedings of the 15th International
Conference on Machine Learning, ICML-98, pp. 242250, Madison, US.
[92] Janacek G. (2001) Practical Time Series, Oxford University Press.
[93] Jennings N. R., Sycara K., Woolridge M. (1998) A Roadmap of Agent Research
and Development, Autonomous Agents and Multi-Agent Systems, vol. 1, pp.
7-38.
[94] Johnson A., Morris P., Muscettola N., Rajan K. (2000) Planning in
Interplanetary Space, Theory and Practice, Proceedings of AIPS.
[95] Jovanovic B., Nyarko Y. (1995) A Bayesian Learning Model Fitted to a Variety
of Empirical Learning Curves, Brookings Papers on Economic Activity, pp.
247-305.
[96] Katerelos I. D., Koulouris A. G. (2004) Is Prediction Possible? Chaotic Behavior
of Multiple Equilibria Regulation Model in Cellular Automata Topology,
Complexiy, Wiley Periodicals, vol. 10, no. 1.
[97] Kautz H., Selman B. (1996) Pushing the envelope: Planning, propositional
logic and stochastic search, Proceedings of AAAI
[98] Kennedy J., Eberhart R. (1995) Particle Swarm Optimization, Proceedings of
IEEE International Conference on Neural Networks IV, pp. 19421948,
doi:10.1109/ICNN.1995.488968.
[99] Khanpour S., Movaghar A. (2006) Design and Implementation of Optimal
Winner Determination Algorithm in Combinatorial e-Auctions, World
Academy of Science, Engineering and Technology, vol. 20, 2006.
[100] Khasei M., Bijari M. (2010) An artificial neural network (p, d, q) model for
timeseries forecasting, Expert Systems with Applications, vol. 37, pp. 479489.
[101] Kohonen T. (1982) Self-Organized Formation of Topologically Correct
Feature Maps, Biological Cybernetics 43 (1): 5969.
181

[102] Kraus S. (2001) Automated Negotiation and Decision Making in Multiagent
Environments, MultiAgent Systems and Applications, ACAI-EASSS
2001 Proceedings, Luck M., Marik V., Stepankova O., Trappl R. (eds),
Springer-Verlag, pp. 150-172.
[103] Kudrycki T. P. (1988) Neural Network Implementation of a Medical
Diagnostic Expert System, Master's Thesis, College of Engineering, University
[104] Lahiri S. K., Khalfe N. (2010) Modeling of commercial ethylene oxide reactor:
A hybrid approach by artificial neural network & differential evolution, Int. J.
Chem. React. Eng. 8 [online] http://www. bepress. com/ijcre/vol8/A4.
[105] Ledyard J. O., Olson M., Porter D., Swanson J. A., Torma D. P. (2002) The first
use of a combined-value auction for transportation services, Interfaces, vol.
32, no. 5, pp. 4-12.
[106] Leibowitz N., Baum B., Enden G., Karniel A. (2010) The exponential learning
equation as a function of successful trials results in sigmoid performance,
Journal of Mathematical Psychology, vol. 54, pp. 338-340.
[107] Lemaire T., Alami R., Lacroix S. (2004) A distributed task sallocation scheme
in multi-uav context, Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 38223827, New Orleans, LA.
[108] Leon F. (2006) Ageni inteligeni cu capaciti cognitive, Tehnopress, Iai,
Romania.
[109] Leon F. (2010) Intelligent Agent Planning with Quasi-Determined States
Using Inductive Learning, Bulletin of the Polytechnic Institute of Iasi, tome
LVI (LX), section Automatic Control and Computer Science, fasc. 3, pp. 2742.
[110] Leon F. (2011a) Self-Organization of Roles Based on Multilateral Negotiation
for Task Allocation, in Franziska Klugl, Sascha Ossowski (eds. ), Multiagent
System Technologies, Lecture Notes in Artificial Intelligence, LNAI 6973, 9th
German Conference, MATES 2011, Berlin, Germany, Proceedings, SpringerVerlag Berlin Heidelberg 2011, pp. 173-180.
[111] Leon F. (2011b) Multiagent Role Allocation Based on Cognitive Dissonance
Theory, International Review on Computers and Software, vol. 6, no. 5, pp.
715-724.
[112] Leon F. (2011c) Evolving Equilibrium Policies for a Multiagent Reinforcement
Learning Problem with State Attractors, in Piotr Jedrzejowicz, Ngoc Thanh
Nguyen, Kiem Hoang (eds. ), Computational Collective Intelligence.
Technologies and Applications, Part II, Lecture Notes in Artificial
Intelligence, LNAI 6923, Third International Conference, ICCCI 2011 Gdynia,
Poland, September 2011, Proceedings, Springer-Verlag Berlin Heidelberg,
pp. 201-210.
[113] Leon F. (2012a) Emergent Behaviors in Social Networks of Adaptive Agents,
Mathematical Problems in Engineering, vol. 2012, Article ID 857512, 19
pages, doi:10. 1155/2012/857512.
182
References
[114] Leon F. (2012b) A Quantum-Inspired Evolutionary Algorithm for MultiAttribute Combinatorial Auctions, Proceedings of 2012 16th International
Conference on System Theory, Control and Computing (ICSTCC 2012),
Sinaia, Romania
[115] Leon F. (2012c) Real-Valued Quantum-Inspired Evolutionary Algorithm for
Multi-Issue Multi-Lateral Negotiation, Proceedings of 2012 IEEE 8th
International Conference on Intelligent Computer Communication and
Processing (ICCP 2012), pp. 41-48, Cluj-Napoca, Romania
[116] Leon F. (2013) A Multiagent System Generating Complex Behaviours, in
Costin Bdic, Ngoc Thanh Nguyen, Marius Brezovan (eds. ), Computational
Collective Intelligence. Technologies and Applications, Lecture Notes in
Artificial Intelligence, LNAI 8083, 5th International Conference, ICCCI 2013,
Springer-Verlag Berlin Heidelberg, pp. 154-164.
[117] Leon F., Aigntoaiei B. I., Zaharia M. H. (2009) Performance Analysis of
Algorithms for Protein Structure Classification, Proceedings of the 20th
International Workshop on Database and Expert Systems Applications,
DEXA 2009, eds. A. M. Tjoa, R. R. Wagner, IEEE Computer Society,
Conference Publishing Services, pp. 203-207.
[118] Leon F., Curteanu S., Lisa C., Hurduc N. (2007) Machine Learning Methods
Used to Predict the Liquid-Cristalline Behavior of Some Copolyethers,
Molecular Crystals & Liquid Crystals, vol. 469, pp. 1-22, Taylor & Francis
Group, USA.
[119] Leon F., Leca A. D. (2011) Dual Manner of Using Neural Networks in a
Multiagent System to Solve Inductive Learning Problems and to Learn from
Experience, in F. M. T. Brazier, Kees Nieuwenhuis, Gregor Pavlin, Martijn
Warnier, Costin Bdic (eds. ), Intelligent Distributed Computing V, Studies
in Computational Intelligence, vol. 382, Proceedings of the 5th International
Symposium on Intelligent Distributed Computing - IDC 2011, Delft, The
Netherlands October 2011, Springer-Verlag Berlin Heidelberg 2011, pp.
81-91.
[120] Leon F., Leca A. D., Atanasiu G. M. (2010) Strategy Management in a
Multiagent System Using Neural Networks for Inductive and Experience-based
Learning, in C. Bratianu, N. A. Pop (eds. ) - Management & Marketing,
Challenges for Knowledge Society, vol. 5, no. 4, pp. 3-28, Editura Economic,
Bucureti.
[121] Leon F., Lisa C., Curteanu S. (2010) Prediction of the Liquid Crystalline
Property Using Different Classification Methods, Molecular Crystals and
Liquid Crystals, vol. 518, pp. 129-148.
[122] Leon F., Piuleac C. G., Curteanu S. (2010) Stacked Neural Network Modeling
Applied to the Synthesis of Polyacrylamide Based Multicomponent Hydrogels,
Macromolecular Reaction Engineering, vol. 4, pp. 591-598, WILEY-VCH
Verlag GmbH & Co., Germany.
183

[123] Leon F., Zaharia M. H. (2010) Stacked Heterogeneous Neural Networks for
Time Series Forecasting, Mathematical Problems in Engineering, vol. 2010,
Article ID 373648, 20 pages.
[124] Lerman K., Shehory O. (2000) Coalition formation for large-scale electronic
markets, Proceedings of 4th International Conference on Multi-Agent
Systems, pp. 167-174, Boston, Massachusetts, USA.
[125] Leyton-Brown K., Nudelman E., Andrew G., McFadden J., Shoham Y. (2011)
Combinatorial Auction Test Suite (CATS), [online] http://www. cs. ubc. ca/
~kevinlb/CATS.
[126] Leyton-Brown K., Shoham Y., Tennenholtz M. (2000) An algorithm for multiunit combinatorial auctions, Proceedings of the Seventeenth National
Conference on Artificial Intelligence (AAAI- 2000).
[127] Li J., Wu J., Liu H. (2012) A Quantum Genetic Algorithm for Continuous Space
Optimization based on Search Space Partition, International Journal of
Advancements in Computing Technology, vol. 4, no. 4, pp. 73-83.
[128] Liao R., Wang X., Qin Z. (2010) A Novel Quantum-inspired Genetic Algorithm
with Expanded Solution Space, Proceedings of the Second International
Conference on Intelligent Human-Machine Systems and Cybernetics, vol. 2,
pp. 192-195.
[129] Littman M. L. (1994) Markov games as a framework for multi-agent
reinforcement learning, Proceedings of the 11th International Conference on
MachineLearning, ICML-94, pp. 157163, New Brunswick, US.
[130] Littman M. L. (2001) Value-function reinforcement learning in Markov games,
Journal of Cognitive Systems Research, vol. 2, issue 1, pp. 5566.
[131] Lucas C. (2008) Self-Organizing Systems. Frequently Asked Questions, Version
3, September 2008, http://www. calresco. org/sos/sosfaq. htm.
[132] Markowetz F., Edler L., Vingron M. (2003) Support Vector Machines for
Protein Fold Class Prediction, [online] http://cmb. molgen. mpg. de/
compdiag/docs/MarkowetzEdlerVingron. ps.
[133] Martin J. A. (2010) A Reinforcement Learning Environment in Matlab,
[online] http://www. dia. fi. upm. es/~jamartin/download. htm.
[134] Martin, B. (1995) Instance-Based Learning: Nearest Neighbour with
Generalisation, Master of Science Thesis, University of Waikato, Hamilton,
New Zealand.
[135] Medin D. L., Ross B. H., Markman A. B. (2005) Cognitive Psychology, Wiley.
[136] Melkemi K. E., Batouche M., Foufou S. (2005) Chaotic MultiAgent System
Approach for MRF-based Image Segmentation, Proceedings of the 4th
International Symposium on Image and Signal Processing and Anlysis
ISPA'05, Zagreb, Croatia, pp. 268-273.
[137] Melo L. M., Costa G. A. O. P., Feitosa R. Q., Abs da Cruz A. V. (2008) QuantumInspired Evolutionary Algorithm and Differential Evolution Used in the
Adaptation of Segmentation Parameters, GEOBIA 2008 - Pixels, Objects,
184
References
Intelligence: GEOgraphic Object-Based Image Analysis for the 21st Century,
Calgary.
[138] Mikki S., Kishk A. A. (2006) Quantum Particle Swarm Optimization for
Electromagnetics, IEEE Transactions on Antennas and Propagation, vol. 54,
issue 10, pp. 2764-2775.
[139] Milgram S. (1967) The Small World Problem, Psychology Today, vol. 2, pp.
60-67.
[140] Mishra D., Veeranmant D. (2002) A multi-attribute reverse auction for
outsourcing, Proceedings of the 13th International Workshop on Database
and Expert Systems Application, pp. 675-679.
[141] Moore A. (1990) Efficient Memory-Based Learning for Robot Control, PhD
thesis, University of Cambridge.
[142] Mukherjee A., Zhang J. (2008) A reliable multi-objective control strategy for
batch processes based on bootstrap aggregated neural network models, J.
Process Contr. 18, pp. 720-734.
[143] Murzin A. G., Chandonia J. M., Andreeva A., Howorth D., L. LoConte, Ailey B.
G., Brenner S. E., Hubbard T. J. P., Chothia C. (2007) SCOP: Structural
Classification of Proteins, [online] http://scop. mrc-lmb. cam. ac. uk/scop.
[144] Naeeni A. F. (2004) Advanced Multi-Agent Fuzzy Reinforcement Learning,
Master Thesis, Computer Science Department, Dalarna University College,
Sweden,
[online]
http://www2.
informatik.
hu-berlin.
de/~ferdowsi/Thesis/ Master %20Thesis. pdf.
[145] Nguyen M. H., Abbass H. A., McKay R. I. (2005) Stopping criteria for ensemble
of evolutionary artificial neural networks, Appl. Soft Comput. 6, pp. 100-107.
[146] Nielsen M. A., Chuang I. L. (2010) Quantum Computation and Quantum
Information, 10th Anniversary Edition, Cambridge University Press.
[147] Oeda S., Ichimura T., Yoshida K. (2004) Immune Multi Agent Neural Network
and Its Application to the Coronary Heart Disease Database, Lecture Notes in
Computer Science, 2004, vol. 3214, pp. 1097-1105.
[148] Orengo C., Cuff A., Sillitoe I., Lewis T., Clegg A. (2008) CATH: Protein
Structure Classification, Institute of Structural and Molecular Biology,
University College London, http://www. cathdb. info.
[149] Ott E., Grebogi C., Yorke J. A. (1990) Controlling Chaos, Phys. Rev. Lett. 64,
2837.
[150] Pagnucco M., Peppas P. (2001) Causality and Minimal Change Demystified,
Proceedings of the Seventeenth International Joint Conference on Artificial
Intelligence (IJCAI'01), Seattle, USA, pp. 125-130.
[151] Pant M., Thangaraj R., Abraham A. (2008) A New Quantum Behaved Particle
Swarm Optimization, Proceedings of the 10th annual conference on Genetic
and evolutionary computation (GECCO 08), Atlanta, GA, USA, pp. 87-94.
[152] Pfeffermann D., Allon J. (1989) Multivariate exponential smoothing: Methods
and practice, International Journal of Forecasting, vol. 5, pp. 83-98.
185

[153] Picard G., Bernon C., Gleizes M. P. (2005) ETTO: Emergent Timetabling by
Cooperative Self-Organisation, Third International Workshop on Engineering
Self-Organising Applications, ESOA05, Utrecht, The Netherlands, pp. 31-45.
[154] Piuleac C. G., Poulios I., Leon F., Curteanu S., Kouras A. (2010) Modeling
Methodology Based on Stacked Neural Networks Applied to the Photocatalytic
Degradation of Triclopyr, Separation Science and Technology, Taylor &
Francis, USA, vol. 45, pp. 1644-1650.
[155] Piuleac C. G., Rodrigo M., Caizares P., Curteanu S., Sez C. (2010) Ten steps
modelling of electrolysis processes by using neural networks, Environ. Modell.
Softw., vol. 25, no. 1, pp. 74-81.
[156] Plagianakos P., Magoulas G. D., Nousis N. K., Vrahatis M. N. (2001) Training
multilayer networks with discrete activation functions, Proceedings of the
INNS-IEEE international joint conference on neural networks.
[157] Price K. V., Storm R. M., Lampinem J. A. (2005) Differential Evolution A
Practical Approach to Global Optimization, Springer-Verlag, Berlin.
[158] Quinlan J. R. (1986) Induction of Decision Trees, Machine Learning, vol. 1, pp.
81-106.
[159] Quinlan R. (1993) C4. 5: Programs for Machine Learning, Morgan Kaufmann
Publishers, San Mateo, CA.
[160] Ravishanker N., Ray B. K. (2002) Bayesian prediction for vector ARFIMA
processes, International Journal of Forecasting, vol. 18, pp. 207-214.
[161] Red'ko V. G., Mosalov O. P., Prokhorov D. V. (2005) A model of Baldwin effect
in populations of self-learning agents, Proceedings of the International Joint
Conference on Neural Networks, IJCNN 2005, Montreal, Canada, pp. 13551360.
[162] Rendell P. (2002) Turing Universality of the Game of Life, Collision-based
computing, Springer-Verlag London, UK.
[163] Riedmiller M., Braun H. (1993) A direct adaptive method for faster
backpropagation learning: The RPROP algorithm, IEEE Int. Conf. Neur. Net.,
vol. 1, pp. 586-591.
[164] RL-Community (2010) Mountain Car (Java), RL-Library,
http://library. rl-community. org/wiki/Mountain_Car_(Java).
[online]
[165] Rodriguez S., Hilaire V., Koukam A. (2006) Holonic Modeling of Environments
for Situated Multi-agent Systems, Environments for Multi-Agent Systems II,
Second International Workshop, E4MAS 2005, Utrecht, The Netherlands,
Selected Revised and Invited Papers, pp. 18-31.
[166] Rosch, E. (1975) Cognitive Representation of Semantic Categories, J
Experimental Psychology, vol. 104, pp. 192-233.
[167] Rothman D. (2006) Nonlinear Dynamics I: Chaos, [online] http://ocw. mit.
edu/courses/earth-atmospheric-and-planetary-sciences/
12-006jnonlinear-dynamics-i-chaos-fall-2006/lecture-notes/ lecnotes15. pdf.
186
References
[168] Rumelhart D. E., Hinton G. E., Williams R. J. (1986) Learning internal
representations by error propagation, in D. E. Rumelhart, J. L. McClelland
(eds. ): Parallel Distributed Processing: Explorations in the Microstructure
of Cognition, Volume 1, 318-362, The MIT Press, Cambridge, MA.
[169] Rummery G. A., Niranjan M. (1994) On-line Q-learning Using Connectionist
Systems, Technical Report CUED/F-INFENG/TR 166, Engineering
Department, Cambridge University.
[170] Russell S. J., Norvig P. (2002) Artificial Intelligence: A Modern Approach,
Prentice Hall, 2nd Edition.
[171] Sandholm T. (2002) Algorithm for optimal winner determination in
combinatorial auctions, Artificial Intelligence, vol. 135, no. 1-2, pp. 1-54.
[172] Sandholm T., Suri S. (2000) Improved algorithms for optimal winner
determination in combinatorial auction and generalization, Proceedings of
the Seventeenth National Conference on Artificial Intelligence (AAAI-2000).
[173] Scerri P., Farinelli A., Okamoto S., Tambe M. (2005) Allocating tasks in
extreme teams, Proceedings of the Fourth International Joint Conference on
Autonomous Agents and Multiagent Systems, pp. 727-734, ACM Press.
[174] Schaffer D. J. (1985) Multiple objective optimization with vector evaluated
genetic algorithms, Proceedings of the International Conference on Genetic
Algorithm and Their Applications, 1985.
[175] Schwind M., Stockheim T., Rothlauf F. (2003) Optimization Heuristics for the
Combinatorial Auction Problem, Working Papers in Information Systems,
University of Mannheim, [online] http://wi. bwl. uni-mainz. de/Dateien/
working_paper_2003_13. pdf.
[176] Shepard R. N., Hovland C. I., Jenkins H. M. (1961) Learning and memorization
of classifications, Psychological Monographs, 75.
[177] Singh S. P., Sutton R. S. (1996) Reinforcement learning with replacing
eligibility traces, Machine Learning, vol. 22(1/2/3), pp. 123-158.
[178] Smith R. G. (1980) The Contract Net Protocol: High-Level Communication and
Control in a Distributed Problem Solver, IEEE Transactions on Computers,
vol. 29, no. 12, pp. 1104-1113.
[179] Sol R. V., Gamarra J. G. P., Ginovart M., Lpez D. (1999) Controlling Chaos in
Ecology: From Deterministic to Individual-based Models, Bulletin of
Mathematical Biology, vol. 61, pp. 1187-1207
[180] Stepney S., Polack F. A. C., Turner H. R. (2006) Engineering Emergence,
ICECCS'06, Stanford, CA, USA, IEEE, pp 89-97.
[181] Stone L., He D. (2007) Chaotic oscillations and cycles in multi-trophic
ecological systems, Journal of Theoretical Biology, vol. 248, pp. 382-390.
[182] Storn R., Price K. (1997) Differential evolution - a simple and efficient
heuristic for global optimization over continuous spaces, Journal of Global
Optimization, vol. 11, pp. 341-359.
187

[183] Subudhi B., Jena D. (2009) An Improved Differential evolution and Levenberg
Marquardt trained neural networks scheme for nonlinear system
identification, Int. J. Autom. Comput. 6, pp. 137-144.
[184] Sutton R. S., Barto A. G. (1998) Reinforcement Learning: An Introduction, MIT
Press, Cambridge, Massachusetts.
[185] Taylor M. E., Kuhlmann G., Stone P. (2008) Autonomous Transfer for
Reinforcement Learning, Proceedings of the Seventh International Joint
Conference on Autonomous Agents and Multiagent Systems.
[186] Tesauro G. (1995) Temporal Difference Learning and TD-Gammon,
Communications of the ACM, vol. 38, no. 3, pp. 5868.
[187] Thomas G., Howard A. M., Williams A. B., Moore-Alston A. (2005) Multi-robot
task allocation in lunar mission construction scenarios, IEEE International
Conference on Systems, Man and Cybernetics, vol. 1, pp. 518-523, Hawaii.
[188] Thrun S. B., et al. (1991) The MONKs Problems - A Performance Comparison
of Different Learning Algorithms, Technical Report CS-CMU-91-197, Carnegie
Mellon University
[189] Tian Y., Zhang J., Morris J. (2001) Modeling and optimal control of a batch
polymerization reactor using a hybrid stacked recurrent neural networks, Ind.
Eng. Chem. Res., 40: 4525.
[190] Tong H. (1983) Threshold models in nonlinear time series analysis, Springer
Verlag, New York.
[191] Tran K. D. (2006) An Improved Multi-Objective Evolutionary Algorithm with
Adaptable Parameters, Doctoral Dissertation Nova Southeastern University,
FL, USA, [online] www. lania. mx/ ~ccoello/EMOO/thesis-tran. pdf. gz.
[192] Tseng F. M., Yu H. C., Tzeng G. H. (2002) Combining neural network model
with seasonal time series ARIMA model, Technological Forecasting and Social
Change, vol. 69, pp. 71-87.
[193] Tzanis G., Berberidis C., Vlahavas I. (2005) Biological Data Mining,
Encyclopedia of Database Technologies and Applications, in L. C. Rivero, J. H.
Doorn and V. E. Ferraggine (eds. ), IDEA Group Publishing, pp. 35-41.
[194] Unemi T., Nagayoshi M. (1996) Evolution of Reinforcement Learning Agents toward a feasible design of evolvable robot team, in Workshop Notes of
ICMAS'96 Workshop 1: Learning, Interactions and Organizations in
Multiagent Environment.
[195] Veit D., Czernohous C. (2003) Automated Bidding Strategy Adaption using
Learning Agents in Many-to-Many e-Markets, Proceedings of the Workshop
on Agent Mediated Electronic Commerce V, AMEC-V, Melbourne, Australia.
[196] Vidal J. M. (2007) Fundamentals of Multiagent Systems with NetLogo
Examples, [online] http://jmvidal. cse. sc. edu/papers/mas. pdf.
[197] Vlachogiannis J. G., Lee K. Y. (2008) Quantum-Inspired Evolutionary
Algorithm for Real and Reactive Power Dispatch, IEEE Transactions on Power
Systems, vol. 23, no. 4, pp. 1627-1636.
188
References
[198] Vlachogiannis J. G., Lee K. Y. (2008) Quantum-Inspired Evolutionary
Algorithm for Real and Reactive Power Dispatch, IEEE Transactions on Power
Systems, vol. 23, no. 4, pp. 1627-1636.
[199] Voort M. V. D., Dougherty M., Watson S. (1996) Combining Kohonen maps
with ARIMA time series models to forecast traffic flow, Transportation
Research Part C: Emerging Technologies, vol. 4, pp. 307-318.
[200] Wang F. (2002) Self-organising Communities Formed by Middle Agents,
Proceedings of the 1st International Conference on Autonomous Agents and
Multi-Agent Systems, AAMAS02, pp. 1333-1339.
[201] Wang Y., Zhou J., Mo L., Zhang R., Zhang Y. (2012) A Modified Differential
Real-coded Quantum-inspired Evolutionary Algorithm for Continuous Space
Optimization, Journal of Computational Information Systems, vol. 8, no. 4,
pp. 1487-1495.
[202] Watkins C. J. C. H., (1989) Learning from Delayed Rewards, PhD thesis,
Cambridge University.
[203] Watkins C. J. C. H., Dayan P. (1992) Technical Note: Q-Learning, Machine
Learning, vol. 8, pp. 55-68.
[204] Watts D. J., Strogatz S. H. (1998) Collective dynamics of small-world
networks, Nature, no. 393, p. 440.
[205] Weinberger K. Q., Saul L. K. (2009) Distance Metric Learning for Large
Margin Nearest Neighbor Classification, Journal of Machine Learning
Research, vol. 10, pp. 207-244.
[206] Werbos P. J. (1974) Beyond regression: new tools for prediction and analysis
in the behavioral sciences, PhD Thesis, Harvard University, Cambridge, MA.
[207] Witten I. H., Frank E. (2000) Data Mining: Practical machine learning tools
with Java implementations, Morgan Kaufmann, San Francisco.
[208] Wolf A., Swift J. B., Swinney H. L. Vastano J. A. (1985) Determining Lyapunov
exponents from a time series, Physica D, (Nonlinear Phenomena) vol. 16, pp.
285-317.
[209] Xiao J., Yan Y., Lin Y., Yuan L., Zhang J. (2008) A Quantum-inspired Genetic
Algorithm for Data Clustering, Proceedings of IEEE Congress on
Evolutionary Computation (CEC 2008), IEEE World Congress on
Computational Intelligence, pp. 1513-1519.
[210] Xie A., Li Y., Sun W. (2004) Review on the Theory of Multi-Attribute
E-Auction, Asia Pacific Management Review, vol. 9, no. 4, pp. 621-643.
[211] Xing H. J., Hu B. G. (2009) Two-phase construction of multilayer perceptrons
using information theory, IEEE Trans. Neural Networks, vol. 20, no. 4, pp.
715-721.
[212] Zhang J. (2008) Batch-to-batch optimal control of a batch polymerization
process based on stacked neural network models, Chem. Eng. Sci. 63, pp.
1273-1281.
189

[213] Zhao S., Xu G., Tao T., Liang L. (2009) Real-coded chaotic quantum-inspired
genetic algorithm for training of fuzzy neural networks, Computers and
Mathematics with Applications, vol. 57, pp. 2009-2015.
[214] Zheng X., Koenig S. (2008) Reaction functions for task allocation to
cooperative agents, Proceedings of 7th International Conference on
Autonomous Agents and Multiagent Systems, Estoril, Portugal, pp. 559-566.
[215] Zongyao H., Zhou L. (2011) A New Real-Coded Quantum-Inspired
Evolutionary Algorithm, International Journal of Advancements in
Computing Technology, vol. 3, no. 7.
190

Florin Leon - Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

Transféré par

Informations du document

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

Florin Leon - Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

Transféré par

Droits d'auteur :

Formats disponibles

Advanced Methods in Artificial

Part I. Scientific Achievements

Chapter 4. Applications of Machine Learning Methods

Part II. Professional Development Plans .............................................. 165

Part III. References ....................................................................................... 175

Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

different types of agents. They use neural networks to solve external

Neural networks can handle multiple outputs corresponding to

Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

apropiat atractor de stri. Aceast tehnic este considerat a fi o abordare

Diferite tipuri de algoritmi evolutivi precum algoritmii genetici

Multiagent Systems with

Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

where A [0,1] and p is a monotonically increasing function of t such that

Chapter 1. Multiagent Systems with Emergent Behaviour

Figure 1.1. The learning curve for p0 = 0.01, p = 0.99, = 0.01

Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

m T and F the set of attributes or features. Each task is defined by the

with c j D j , j 1,..., p and p F .

It is assumed that an agent has the greatest utility when the

Chapter 1. Multiagent Systems with Emergent Behaviour

The equation for the utility of an individual attribute is adapted after

Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

X S1 ... Sn | Si S , Si S j , i j, i, j A, S1 ... Sn S (1.9)

Figure 1.2. Chromosome encoding and example

Figure 1.3. Permutation crossover (example)

Mutation by resetting a gene is performed to the split points part of

Chapter 1. Multiagent Systems with Emergent Behaviour

Here, parameter controls the steepness of the sigmoid curve in its

When an agent receives a task above its competence level, it will

Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

receives tasks below its competence level, it starts to decrease its

d ai max d aij , i 1.. | T |

Thus, an agent with a sufficient competence level can solve a task in

Chapter 1. Multiagent Systems with Emergent Behaviour

Stirling number of the second kind. The domain of all attributes is

Figure 1.4. The learning curve used by the agents

The learning curve used by the agents is a particularization of

Other parameters used are the following: 0.1 , 0.001 ,

Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

Chapter 1. Multiagent Systems with Emergent Behaviour

Figure 1.6. Evolution of attribute utilities of the agents

Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

Figure 1.7. The evolution of attribute utilities of the agents

Table 1.2. The attribute values of tasks (case 2)

Chapter 1. Multiagent Systems with Emergent Behaviour

The total productivity of the system is displayed in figure 1.8 and

Figure 1.8. The evolution of system productivity

Figure 1.9. The evolution of system productivity

The following scenario is used to show the behaviour of the system

Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

Agent 1 can achieve a maximum competence level of 4 for Attribute 1. This

Figure 1.10. The evolution of attribute utilities of the agents

The final allocation when using the Nash solution is therefore:

Chapter 1. Multiagent Systems with Emergent Behaviour

search space is huge, but the evolutionary algorithm can be applied in a

Figure 1.11. The evolution of system productivity

The increase and stabilization of total productivity is an emergent

1.2. Social Networks of Adaptive Agents

Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning

potential organization resulting from changes to the underlying components

Chapter 1. Multiagent Systems with Emergent Behaviour

Advanced Methods in Artificial Intelligence: Agents, Optimization and Machine Learning