Habilitation Thesis
March 2014
Table of Contents
Summary ............................................................................................................. 5
Rezumat .............................................................................................................. 9
Summary
The habilitation thesis aims at presenting a selection of research
results obtained in the field of artificial intelligence, mainly dealing with
intelligent agents, multiagent systems, optimization and machine learning
methods.
Concerning multiagent role allocation, a method is proposed by
which the agents can self-organize based on the changes of their individual
utilities. Agents have different preferences regarding the features of the
tasks they are given and their adaptive behaviour is based on the
psychological theory of cognitive dissonance, where an agent working on a
low-preference task gradually improves its attitude towards it. The total
productivity is shown to increase as an emergent property of the system.
Another demonstration of emergent behaviour is based on the design
of an interaction protocol for a task allocation system, which can reveal the
formation of social networks. The agents can improve their solving ability
by learning and can collaborate with their peers to deal with more difficult
tasks. The average number of connections and resources of the agents
follows a power law distribution.
Also, a simple set of interaction rules is proposed that can generate
overall behaviours with different levels of complexity, from asymptotically
stable to chaotic. It is shown that very small perturbations can have a great
impact on the evolution of the system, and some methods of controlling
such perturbations are investigated in order to have a desirable final state.
Another contribution in the subfields of planning and learning is a
method that includes a learning phase into the plan itself, so that the agent
can dynamically recognize the preconditions of an action when the states are
not fully determined, and it can even directly choose its actions based on
learning results.
The notion of state attractor is introduced, which allows the agents to
compute their actions based on the proximity of their current state to the
nearest state attractor. This technique is considered to be an alternative way
of approaching difficult multiagent reinforcement learning problems.
Considering autonomous learning, a system for solving classification
and regression problems is proposed, which involves competition between
Rezumat
The habilitation thesis aims to present a selection of the research results obtained in the field of artificial intelligence, related mainly to intelligent agents, multi-agent systems, optimization and machine learning methods.

Concerning role allocation in multi-agent systems, a method is proposed by which the agents can self-organize based on the changes in their individual utilities. The agents have different preferences regarding the features of the tasks they receive, and their adaptive behaviour is based on the psychological theory of cognitive dissonance, in which an agent working on a task for which it has a low preference gradually improves its attitude towards it. It is shown that the increase in total productivity is an emergent property of the system.

Another situation in which a multi-agent system exhibits emergent behaviour is demonstrated through the design of an interaction protocol for task allocation, which leads to the formation of social networks. The agents can improve their solving ability through learning and can collaborate with other agents when faced with more difficult tasks. The average number of connections and resources of the agents follows a power law distribution.

Also, a series of simple interaction rules is proposed that can generate global behaviours with different levels of complexity, from asymptotically stable to chaotic. It is shown that very small perturbations can have a great impact on the evolution of the system, and some methods of controlling these perturbations are investigated in order to reach a desired final state.

Another contribution in the subfields of planning and learning is a method that includes a learning phase in the plan itself, so that the agent can dynamically recognize the preconditions of an action when the states are not fully determined, and it can choose its actions directly based on the learning results.

The notion of state attractor is introduced, which allows the agents to compute their actions based on the proximity of their current state to the nearest state attractor.
Scientific Achievements
Chapter 1
variations of this method have been more recently proposed for the control
of unmanned space or aerial vehicles (Lemaire, Alami & Lacroix, 2004;
Thomas et al., 2005). Also, an emergent allocation method for mobile robots
was proposed, where each robot uses only the information obtained from its
immediate neighbours (Atay & Bayazit, 2007).
The Extended Generalized Assignment Problem, E-GAP (Scerri et
al., 2005) studies the assignment of tasks to agents, taking into account the
agents' capacities in order to maximize a total reward. It considers dynamic
domains and interdependencies (possible constraints) among tasks. Besides
the greedy centralized approach to solving such problems, approximate
solutions have been proposed, e.g. algorithms modelling colonies of social
insects, such as SWARM-GAP (Ferreira & Bazzan, 2006).
In cooperative multi-agent systems, roles are used as a design
concept when creating large systems, and they are known to facilitate
specialization of agents. A review of multi-agent role allocation is given by
(Campbell & Wu, 2010).
A negotiation problem is one where multiple agents try to come to
an agreement or deal. Each agent is assumed to have a preference over all
possible deals. Commonly encountered quantitative solutions for the
bargaining problem are, among others, the Nash solution, and the utilitarian
solution.
Learning Models
According to a recent study (Leibowitz et al., 2010), the learning curve L(t)
has an equation relating to the number of successful and failed trials. If S(t)
is the weighted average of success and F(t) is the weighted average of
failure, then:

L(t) = A · S(t) − (1 − A) · F(t)   (1.1)

L(t) = A ∫₀ᵗ p(x) dx − (1 − A) ∫₀ᵗ (1 − p(x)) dx   (1.2)

p(t) = p∞ − (p∞ − p₀) · e^(−s(t))   (1.3)

with 0 < p₀ ≤ p(t) < p∞ ≤ 1.
When coefficient A decreases, the early performance rates increase,
but the late performance rates decrease. When only successful trials are
considered (A = 1), the learning curve has a sigmoid shape, as shown in
figure 1.1 (Leibowitz et al., 2010).
This model was adopted here, taking into account that at the
beginning of the learning period, while discovering the basic concepts, the
effort can be great, but progress is slow. When enough knowledge has been
accumulated, at the middle of the learning curve, progress begins to rapidly
take place. At the end of the learning curve, where expert knowledge
resides, progress is again slow, because complex matters must be addressed.
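The curve defined by equations (1.1)-(1.3) can be evaluated numerically; the following is only a sketch, assuming a linear memory term s(x) = λ·x and illustrative parameter values, which are not taken from the original study.

```python
import numpy as np

def learning_curve(t, A=1.0, p0=0.05, p_inf=0.95, lam=0.5, n=1000):
    """Numerical sketch of the learning curve:
    L(t) = A * S(t) - (1 - A) * F(t), with S and F accumulated from the
    success probability p(x) = p_inf - (p_inf - p0) * exp(-s(x)).
    The memory term s(x) = lam * x is an assumed form."""
    xs = np.linspace(0.0, t, n)
    dx = xs[1] - xs[0]
    p = p_inf - (p_inf - p0) * np.exp(-lam * xs)
    S = np.sum(p) * dx          # accumulated (weighted) success
    F = np.sum(1.0 - p) * dx    # accumulated (weighted) failure
    return A * S - (1.0 - A) * F
```

With A = 1 only successful trials contribute and the curve grows monotonically; decreasing A trades late performance for early performance, as described above.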
Description of the Proposed Model
The following subsections (Leon, 2011a; Leon, 2011b) formalize the
definition of tasks that the agents should negotiate for, the adaptation model
of the agents, directly related to the adoption of roles and change in
productivity, and present an evolutionary approach to find a fair,
(near-)optimal task allocation at a given time.
The model can be used to simulate a self-organizing company,
where employees are encouraged to accept tasks themselves, rather than
being assigned to them, similar to the recommendations provided by agile
software development methodologies or modern management practices.
The tasks are considered to be defined by a series of p attributes. A
task has specific values of these attributes, considered to be complexity
levels, each within a certain domain. Let T be the set of tasks:

T = {t_i}, t_i = (c_1, ..., c_p).   (1.4)

Each agent a is characterized, for every attribute j, by a preferred
competence level l_a^j and a tolerance δ_a^j, defining its per-attribute
utility function:

u_a^j : D_j → [0, 1].   (1.5)

The utility of agent a for a task t is the sum of its per-attribute utilities:

u_a(t) = Σ_{j=1}^{p} u_a^j(c_j),   (1.6)

where each per-attribute utility is maximal at the preferred level and
decreases linearly within the tolerance interval:

u_a^j(c_j) = (c_j − l_a^j + δ_a^j) / δ_a^j,  if l_a^j − δ_a^j ≤ c_j ≤ l_a^j
             1 − (c_j − l_a^j) / δ_a^j,      if l_a^j < c_j ≤ l_a^j + δ_a^j   (1.7)
             0,                              otherwise.

A deal negotiated by the agents is an allocation of disjoint subsets of tasks:

(S_1, ..., S_n), S_i ∩ S_k = ∅ for i ≠ k, ∪_{i=1}^{n} S_i = T,   (1.8)

where S_i ∈ π(T) and n = |A|.
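A minimal sketch of the per-attribute utility computation, assuming a tent-shaped utility peaked at the preferred level with a given tolerance; the function names and argument names (`l`, `delta`) are illustrative.

```python
def attribute_utility(c, l, delta):
    """Tent-shaped per-attribute utility: maximal at the preferred level l,
    decreasing linearly to zero within a tolerance delta on either side."""
    if l - delta <= c <= l:
        return (c - l + delta) / delta
    if l < c <= l + delta:
        return 1.0 - (c - l) / delta
    return 0.0

def task_utility(complexities, preferred, deltas):
    """The agent's utility for a task is the sum over its attributes."""
    return sum(attribute_utility(c, l, d)
               for c, l, d in zip(complexities, preferred, deltas))
```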
Evolutionary Approach to Determine Negotiation Outcomes
An evolutionary approach is considered for finding the negotiation
outcomes that are usually believed to be fair or desirable: the Nash solution
and the utilitarian solution.
Let X be a finite set of potential agreements. In our case X contains
combinations of disjoint sets of tasks such that all the tasks are allocated and
each task is allocated to exactly one agent:

X = {(S_1, ..., S_n) | S_i ∩ S_k = ∅ for i ≠ k, ∪_{i=1}^{n} S_i = T},   (1.9)

where S_i ∈ π(T) and n = |A|.
An evolutionary algorithm is used to find the desired solutions. The
encoding takes into account the specificity of the problem, i.e. each task
must appear only once in a possible allocation. Therefore, a permutation-based
encoding is used. The partition of the permutation between different
agents is defined by n − 1 integer genes, with values from 1 to |T| − 1.
Therefore, a hybrid representation is used: the first genes of the
chromosome are the split points, and the rest contains the actual permutation
of tasks, as shown in figure 1.2.
The fitness function is the product or the sum of agent utilities for a
given deal, i.e. a chromosome. The crossover and mutation operations are
different for the first part of the chromosome and the second. Normal
one-point crossover is applied to the split point section, while a modified
crossover is applied to the permutation. The procedure for the latter is as
follows: one crossover point is selected, the first part of the first parent is
directly copied into the child, and then the remaining distinct values are
copied in order from the second parent into the remaining loci of the child.
This process is presented in figure 1.3. For the mutation of the permutation
part, two genes are randomly selected and interchanged. In this way, a new
valid permutation is generated, because no duplicate values appear.
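The variation operators for the permutation part can be sketched as follows; a sketch with illustrative names, assuming integer task indices.

```python
import random

def permutation_crossover(p1, p2):
    """Modified one-point crossover for the permutation part: copy a prefix
    of the first parent, then fill the remaining loci with the missing
    values in the order they appear in the second parent."""
    point = random.randint(1, len(p1) - 1)
    child = p1[:point]
    child += [g for g in p2 if g not in child]
    return child

def swap_mutation(perm):
    """Interchange two randomly selected genes; the result is still
    a valid permutation, because no duplicate values appear."""
    i, j = random.sample(range(len(perm)), 2)
    perm = perm[:]
    perm[i], perm[j] = perm[j], perm[i]
    return perm
```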
The chosen selection method is the tournament selection with two
individuals, because it is fast and uses only local information. Elitism is
used, i.e. the best individual is directly copied into the next generation, in
order not to lose the best solution of the current generation. However, it was
considered that copying more than one individual would decrease the
genetic diversity needed to find a global optimal solution.
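The selection scheme described above, binary tournament with single-individual elitism, can be sketched as follows; the function names are illustrative.

```python
import random

def tournament_select(population, fitness):
    """Binary tournament: pick two individuals at random, keep the fitter."""
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def next_generation(population, fitness, offspring_fn):
    """Elitism: the single best individual survives unchanged;
    the rest of the new generation is bred from tournament winners."""
    best = max(population, key=fitness)
    new_pop = [best]
    while len(new_pop) < len(population):
        p1 = tournament_select(population, fitness)
        p2 = tournament_select(population, fitness)
        new_pop.append(offspring_fn(p1, p2))
    return new_pop
```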
Agent Adaptation: Learning and Forgetting
Following the model described above, the knowledge of the agents traverses
a sigmoid learning curve, defined by the following equation:

L(x) = 1 / (1 + e^(−x))   (1.10)

with the inverse:

L⁻¹(y) = −(ln(1 − y) − ln y).   (1.11)

Thus, the agents perform equal small steps on the ordinate, and these
reflect into unequal steps on the abscissa:

l_a^j(k+1) = L⁻¹((1 − ε) · L(l_a^j(k)) + ε · L(c_j)),   (1.12)

where ε is the adaptation step: the competence level increases towards c_j
when l_a^j(k) < c_j (learning) and decreases towards c_j when
l_a^j(k) > c_j (forgetting).
The total time needed by agent a to handle task t_i is obtained from the
per-attribute handling times, either as their sum:

d_a^i = Σ_{j=1}^{p} d_a^{ij}, i = 1..|T|,   (1.13)

or as their maximum:

d_a^i = max_{j=1..p} d_a^{ij}.   (1.14)

The time taken for an agent to handle a specific task attribute with a
certain complexity level is computed as follows:

d_a^{ij} = 1,                      if l_a^j ≥ c_j
           1 + L(c_j) − L(l_a^j),  if l_a^j < c_j.   (1.15)
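The handling-time computation can be sketched as follows, using the sigmoid learning curve L from equation (1.10) and the sum over attributes; the function names are illustrative.

```python
import math

def L(x):
    """Sigmoid learning curve, eq. (1.10)."""
    return 1.0 / (1.0 + math.exp(-x))

def attribute_time(l, c):
    """One time unit if the competence level l covers the complexity c,
    otherwise an extra penalty given by the learning-curve gap."""
    if l >= c:
        return 1.0
    return 1.0 + L(c) - L(l)

def task_time(competences, complexities):
    """The task handling time is the sum over its attributes."""
    return sum(attribute_time(l, c) for l, c in zip(competences, complexities))
```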
Table 1.1. The tasks and their attribute values

Task    Attribute values
1       7 4 1 9 1
2       3 8 4 2 0
3       7 0 0 5 3
4       2 7 2 7 0
5       0 3 3 7 8
6       8 6 9 0 6
7       9 9 1 0 3
8       8 6 7 6 0
9       0 2 2 4 9
10      5 0 4 3 4
figure 1.5a, and the results for the utilitarian solution are displayed in figure
1.5b.
Figure 1.5. Agent utilities over 100 repeated trials using as a negotiation
outcome: a) the Nash solution; b) the utilitarian solution
One can notice that in both situations the total utilities eventually
stabilize over some value. Using the Nash solution causes more fluctuations,
while using the utilitarian solution causes larger fluctuations earlier and a
faster convergence.
The evolution of attribute utilities of the agents over 100 repeated
trials is displayed in figure 1.6. Since the first two agents have similar initial
attribute preferences, it can be seen that the utility of Attribute 1 is relatively
equal at first and then decreases for Agent 2 while it remains constant for
Agent 1. Similarly, the utility of Attribute 2 remains relatively constant for
Agent 1 and increases for Agent 2. Both agents find new equilibrium states
where they can receive maximum utility by specializing to different types of
tasks.
This effect can be seen more clearly in figure 1.7, where the tasks
given to the agents have a strict specialization: only 2 attributes out of the
total 5 attributes have non-zero complexity levels.
These tasks with their attribute values are presented in table 1.2. In
this case, the evolution of agent utilities varies more drastically in order for
the agents to adapt to a stricter, more competitive environment.
Table 1.2. The specialized tasks and their attribute values

Task    Attribute values
1       8 0 6 0 0
2       4 0 9 0 0
3       4 0 5 0 0
4       0 8 0 6 0
5       0 5 0 4 0
6       0 8 0 4 0
7       0 4 0 7 0
8       0 0 0 6 5
9       0 0 0 8 5
10      0 0 0 8 4
retaining the short average path lengths characteristic of the (Erdős &
Rényi, 1959) model.

A well-known example is the so-called "six degrees of separation"
in social networks (Milgram, 1967). Another is the scale-free property of
many such networks, i.e. the probability distribution of the number of links
per node P(k) satisfies a power law P(k) ~ k^(−γ), with the degree exponent
γ typically in the range 2 < γ < 3 (Albert, Jeong & Barabási, 1999). The
World Wide Web network has been shown to be a scale-free graph (Albert &
Barabási, 2002).
The generation of scale-free graphs can be performed based on two
basic rules (Gaston & DesJardins, 2005): growth (at each time step, a new
node is added to the graph) and preferential attachment (when a new node is
added to the graph, it attaches preferentially to nodes with high degrees).
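These two rules can be sketched in a few lines; a Barabási-Albert-style sketch, assuming a three-node seed graph and m = 2 attachments per new node (both are illustrative choices).

```python
import random

def scale_free_graph(n, m=2):
    """Growth + preferential attachment: each new node attaches to m
    distinct existing nodes chosen with probability proportional to
    their current degree."""
    edges = [(0, 1), (1, 2), (0, 2)]             # assumed seed graph
    degree_list = [u for e in edges for u in e]  # node repeated once per degree unit
    for new in range(3, n):
        targets = set()
        while len(targets) < m:
            # sampling from degree_list realizes degree-proportional choice
            targets.add(random.choice(degree_list))
        for t in targets:
            edges.append((new, t))
            degree_list += [new, t]
    return edges
```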
Model Description
The proposed multiagent system (Leon, 2012a) is composed of a set of
agents A, which are physically distributed over a square grid. However,
this localization does not prevent agents from forming relations with any
other agents, based on the common interest of solving tasks.
Like the general model presented in section 1.1, the tasks have
attributes with complexity levels and agents have corresponding competence
levels.
The tasks are generated in the following way. First, the number of
non-null attributes p_nn^i, 1 ≤ p_nn^i ≤ p, is determined, by using a power
law or a uniform distribution:

P(p_nn^i = k) ~ k^(−γ)   (1.16)

or

P(p_nn^i) ~ U(1, p).   (1.17)
When using the power law distribution, most of the tasks will have
one or two non-null attributes, therefore they will have a large degree of
specialization. Thus, agents can specialize in performing some type of tasks.
When using a uniform distribution, there will be a larger number of
The payment received for solving a task is proportional to the sum of the
squared complexity levels of its attributes:

M_a^i = Σ_{j=1}^{p} c_j².   (1.19)
If an agent's competence level for some attribute is lower than the
required complexity level, the agent has a probability to raise it by 1 in
order to reach the required level (equations 1.20 and 1.21).
(1.21)
For this process to work, each agent has an equal initial amount of
money, M_0.
If the agent is fit to solve the whole task following a possible
learning phase, the agent solves it and it receives the same payment as that
described above in equation 1.19.
If the agent is unable to solve a task by itself, it seeks other agents to
solve the parts of the task it cannot handle.
Interaction Protocol and Social Network Formation
The environment randomly distributes a number of tasks fewer than the
number of agents in the system: |T| < |A|. Since some agents will be able to
solve their tasks, either individually or cooperatively, while others will not,
in the subsequent epochs, tasks will be given preferentially to agents that
previously succeeded more. Thus, each agent has a probability to solve
tasks, defined as:

P_a^s = T_a^s / T_a^r,   (1.22)

where T_a^s is the number of tasks solved by agent a, and T_a^r is the total
number of tasks received by agent a.
In this setting, the agents are sorted by their probabilities to solve
tasks, P_a^s, and only the first |T| receive new tasks. Ties between agents
with the same P_a^s are broken randomly.
M_u' = M_u + β · c_j²,   (1.23)

M_v' = M_v + (1 − β) · c_j²,   (1.24)
Figure 1.12 shows the evolution of the social network after 10, 100,
350 and 500 epochs, from left to right, top to bottom, respectively.

Figure 1.13. The status of the social network after 100 epochs for
different numbers of agents: 100, 256 and 2500
Figure 1.13 shows the status of the multiagent system after 100
epochs, with the same parameters, when the number of agents varies (100,
256 and 2500), in order to demonstrate the scaling behaviour of the system.
One can see that the formation of global social network(s) is similar. Also,
the evolution of the system as the number of epochs increases is the same:
the agents enhance their competence and no longer need their peers, and
therefore the connections go through a gradual dissolution process.
Figure 1.14. The status of the social network after 500 epochs
when the environment is dynamic
Figure 1.15. The evolution of the social network after 100, 350 and 500 epochs
when the environment is static and the perturbation rate R = 0.5
These average values show that only a few agents have a great
number of connections, while most of the agents have one or no active
connections at all. In fact, the histogram of the number of agents having a
certain number of connections reveals a power law distribution, as shown in
figure 1.17. The ordinate axis (the number of agents) has a logarithmic
scale. The abscissa axis shows the five equal-size intervals ranging from 0 to
the maximum number of connections.
Figure 1.21. The evolution of the overall system efficiency with different
distributions for task attribute generation
The main goal of this study (Leon, 2013) is the design of simple
interaction rules which in turn can generate, through a cascade effect,
different types of overall behaviours, from stable to chaotic. We believe that
these can be considered metaphors for the different kinds of everyday social
or economic interactions. Their effects are sometimes entirely predictable
and can lead to an equilibrium, while at other times fluctuations can widely
affect the system state; even if the system appears to be stable for long
periods of time, sudden changes can occur unpredictably because of subtle
changes in the internal state of the system. We also aim at investigating how
very small changes can non-locally ripple throughout the system with great
consequences, and whether it is possible to reverse these changes in a
non-trivial way, i.e. by slightly adjusting the system after the initial
perturbation has occurred.
The Design of the Multiagent System
The main goal in designing the structure and the interactions of the
multiagent system was to find a simple setting that can generate complex
behaviours. A delicate balance is needed in this respect. On the one hand, if
the system is too simple, its behaviour will be completely deterministic and
predictable. On the other hand, if the system is overly complex, it would be
very difficult to assess the contribution of the individual internal elements to
its observed evolution. The multiagent system presented as follows is the
result of many attempts of finding this balance.
The proposed system is comprised of n agents; let A be the set of
agents. Each agent has m needs and m resources, whose values lie in their
predefined domains Dn, Dr ⊂ ℝ₊. This is a simplified conceptualization of
any social or economic model, where the interactions of the individuals are
based on some resource exchanges, of any nature, and where individuals
have different valuations of the types of resources involved.
In the present model, it is assumed that the needs of an agent are
fixed (although an adaptive mechanism could be easily implemented, taking
into account, for example, previous results (Leon, 2011a; Leon, 2011b)
reported in section 1.1), that its resources are variable and they change
following the continuous interactions with other agents.
Also, the agents are situated in their execution environment: each
agent a has a position and can interact only with the other agents in its
neighbourhood. For simplicity, the environment is considered to be a
bi-dimensional square lattice, but this imposes no limitation on the general
interaction model: it can be applied without changes to any environment
topology.
Social Model
Throughout the execution of the system, each agent, in turn, chooses
another agent in its local neighbourhood to interact with. Each agent a stores
the number of previous interactions with any other agent b, ia(b), and the
cumulative outcome of these interactions, oa(b), which is based on the
profits resulted from resource exchanges, as described in the following
section.
When an agent a must choose another agent to interact with, it
chooses the agent in its neighbourhood with the highest estimated outcome:
b* = argmax_b o_a(b),

where b ranges over the agents in the neighbourhood of a.
p_a = N_a(r_sel) · R_b(r_sel) / R_a(r_sel).   (1.25)
u_a ← (u_a · i_a^adj + (p_a − p_avg) · η) / (i_a^adj + 1),   (1.26)

where the adjusted number of interactions is i_a^adj = min(Σ_{b∈A} i_a(b), i_mem),
i_mem is the maximum number of overall interactions that the agent can
remember (i.e. take into account) and η is the rate of utility change. At the
beginning, the utility of the agent can fluctuate more, as the agent explores
the interactions with its neighbours. Afterwards, the change in utility
decreases, but never becomes too small.

For example, if i_mem = 20, u_a = 0.1, p_a = 8.5, η = 1, p_avg = 7.5 and the
sum of all previous interactions is 2, the utility will change to:
u_a = (0.1 · 2 + (8.5 − 7.5) · 1) / 3 = 0.4. If the sum of all previous
interactions is 100, the same utility will change only to:
u_a = (0.1 · 20 + (8.5 − 7.5) · 1) / 21 ≈ 0.14.
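The capped running-average update of equation (1.26) can be sketched directly; the worked example above serves as a numerical check.

```python
def update_utility(u_a, p_a, p_avg, total_interactions, i_mem=20, eta=1.0):
    """Running-average utility update with a capped interaction count:
    the cap i_mem keeps the utility responsive to new interactions,
    so the change decreases over time but never becomes too small."""
    i_adj = min(total_interactions, i_mem)
    return (u_a * i_adj + (p_a - p_avg) * eta) / (i_adj + 1)
```

With u_a = 0.1, p_a = 8.5, p_avg = 7.5, η = 1 and 2 previous interactions this yields 0.4, and with 100 previous interactions about 0.14, matching the example above.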
Similarly, the social outcome of an agent a concerning agent b is
updated as follows:

o_a(b) ← (o_a(b) · i_a(b) + (p_a − p_avg) · η) / (i_a(b) + 1).   (1.27)

In this case, the social model concerns only 1 agent and thus the use
of the actual number of interactions can help the convergence of the
estimation an agent has about another.
after the initial period where a part of the system approaches a stable
zone. Figure 1.24 displays the behaviour of 100 agents over 10000
time steps. A simple moving average is applied here again, with a
window size of 100 time steps. One agent (with a utility value
around 3) has unpredictable great changes, although they appear to
be governed by a higher-level order of some kind. Another agent has
a sudden drop in utility around time step 9000, although it has been
fairly stable before.
We consider that the third type of behaviour is chaotic, since it
satisfies the typical features of chaos (Ditto & Munakata, 1995).

[Figure: agent utility series for the three types of behaviour, with largest
Lyapunov exponents LLE = 4.83, LLE = −2.13 and LLE = −12.98]
Experimental Studies
A mathematical analysis of a nonlinear hybrid system is usually very
difficult. Therefore, in the following, we will present an empirical study,
where we will emphasise different cases or settings which reveal certain
types of behaviour.
Since one of the characteristics of a chaotic system is that small
changes in its initial state can greatly affect the final state through a cascade
effect, we observe the influence of perturbations on the system behaviour.
We also reflect on the question of when it is possible to correct some
distortions with the smallest amount of external energy, such that, after a
perturbation, the system should reach again a desired state within a
corresponding time horizon, through small changes.
In all the case studies presented in this section, the following
parameters were used: the number of agents n = 10, the number of needs
and resources m = 10, their domains Dn = Dr = [0, 10), the resource transfer
quantum equal to 1, the resource exchange threshold equal to 5, the interaction
memory i_mem = 20, the utility change rate η = 2, the side length of the agent
square neighbourhood equal to 4, and the computed average profit p_avg = 7.5.
Original Behaviour
The configuration under study is composed of 3 subgraphs (figure 1.26):
one agent, A1, is isolated and cannot interact with any other agent. Two
agents, A2 and A3, form their own bilateral subsystem and seven agents can
interact with one another in their corresponding neighbourhoods. A change
in any of those agents can affect any other one in this subgraph, because, for
example, A4 can influence A7, A7 can influence A9 and A9 can influence
A10. The evolution of the agent utilities for 2000 time steps is displayed in
figure 1.27.
Exhaustive search with one correction point: trying all the resources
of all the agents in each step of the simulation, adding or subtracting
a small amount (e.g. 0.1, 0.5), and observing the maximum utility
Besides considering only the state of the system at the time horizon
(e.g. 2000 time steps), it is also important to verify if the system behaviour
continues to be desirable. Figure 1.30 shows the effect of a one-point
correction for the situation presented in figure 1.28, which remains stable
for a test period of 100 more time steps after the initial 2000 ones. However,
if the system is chaotic, it is impossible to guarantee that this difference will
remain small forever.
Chapter 2
v_{t+1} = v_t + 0.001 · a_t − 0.0025 · cos(3 · x_t)
x_{t+1} = x_t + v_{t+1}   (2.1)

where a_t ∈ {−1, 0, 1}.
Figure 2.2 presents the setting of the mountain car problem, adapted
after (Singh & Sutton, 1996; Naeeni, 2004).
Both state variables are kept in the defined range, i.e. all values
above or below the boundaries will be set to their extreme values. When the
position x is equal to the extreme left boundary −1.2, the velocity v is set to
0. The goal, the top of the mountain, is located at x = 0.5.
The problem is particularly interesting because, in order to reach its
goal, the car must gain enough kinetic energy by accelerating in alternating
directions, backward or forward. It must first drive backward, up the other
side of the valley, to gain enough momentum to drive forward up the hill. It
will therefore move away from the goal at first in order to find a working
solution. Also, the states of the problem defined by position and velocity are
continuous, real-valued, and this causes an additional difficulty for a
reinforcement learning algorithm dealing with discrete states. Finally,
because of the external factor, gravity, and the momentum of the car, the
actions may not have the same results in similar states.
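The dynamics of equation (2.1) and the clamping rules above can be put together in a short simulator; a sketch, assuming the standard velocity bound of ±0.07 (the bound value is not stated in the text) and illustrative function names.

```python
import math

def step(x, v, a):
    """One step of the mountain car dynamics, eq. (2.1), a in {-1, 0, 1}."""
    v = v + 0.001 * a - 0.0025 * math.cos(3 * x)
    v = max(-0.07, min(0.07, v))   # velocity bound (standard formulation)
    x = x + v
    if x < -1.2:                   # hitting the left wall resets the velocity
        x, v = -1.2, 0.0
    return x, v

def run(policy, x0=-0.5, max_steps=5000):
    """Simulate until the goal x >= 0.5 is reached; returns the step count,
    or None if the goal is not reached within max_steps."""
    x, v = x0, 0.0
    for t in range(max_steps):
        if x >= 0.5:
            return t
        x, v = step(x, v, policy(x, v))
    return None
```

For instance, a heuristic that accelerates in the direction of the current speed corresponds to `policy = lambda x, v: 1 if v >= 0 else -1`.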
Figure 2.3. The behaviour of the agent using the naïve heuristic for different
starting positions

Naïve Heuristic

First, we can verify the assumptions of the problem when the agent uses a
naïve heuristic, i.e. it maintains the acceleration forward (a = 1) at all times.
Figure 2.3 shows the behaviour of the system for different initial positions.
Figure 2.4. The behaviour of the agent using the simple heuristic
for different starting positions
initial position of 0.39 or greater will be enough to reach the goal directly,
using forward acceleration.
Simple Heuristic
Taking into account the characteristics of the problem, we can devise a
simple heuristic that ensures the fact that the goal is reached every time. The
heuristic tries to make maximum use of the gravitational force: the
acceleration of the car is the sign of its speed. Figure 2.4 shows the
behaviour of the system for different initial positions in this case. One can
see that even when the initial position x [0.84, 0.38] , the agent reaches
its goal after several amplifying oscillations.
Reinforcement Learning Solution
The simple heuristic presented above does not solve the problem in an
optimal manner, i.e. with a minimum number of time steps. The problem
was originally designed to be solved with reinforcement learning
algorithms, so we employ such a technique to find shorter plans for the
agent.
Model-free reinforcement learning algorithms using temporal
differences, such as Q-Learning (Watkins, 1989; Watkins & Dayan, 1992) or
State-Action-Reward-State-Action, SARSA (Rummery & Niranjan, 1994),
need to discretize the continuous input states of the problem. The Q
function, used to determine the best action to be taken in a particular state, is
defined as Q : S × A → ℝ and is usually implemented as a matrix containing
the real-valued rewards r given by the environment in a particular state
s ∈ S when performing a particular action a ∈ A. The mountain car problem
is also difficult for a reinforcement learning algorithm because all the
rewards are −1, with the exception of the goal state, where the reward is 0.
Therefore, the agent becomes aware of a higher reward only when it finally
reaches the goal.
For the following tests, a Matlab implementation of the SARSA
algorithm (Martin, 2010) was used. For the initial positions where the first
approach began to fail, and also for the initial position of x = −0.5, which is
the standard start point suggested by the problem author(s), a comparison
was made in terms of the number of time steps of the solution. This
comparison is displayed in figure 2.5. In most cases, the reinforcement
learning algorithm finds shorter plans than the simple heuristic presented
before.
Figure 2.6. Position and velocity of the car for the initial position of x = −0.5
using the simple heuristic and reinforcement learning, respectively

One can notice that during the second left oscillation, the car hits the fixed
wall and its speed becomes 0. The additional steps of the solution are due to
the fact that the agent did not control its acceleration so that it could climb
the left side of the mountain only up to a position sufficient to gain enough
momentum to reach the goal. The bottom row contains the results of the
reinforcement learning algorithm.
Filtered Supervised Solution
From the Q matrix found by the SARSA algorithm, we can extract a
supervised learning dataset, so that each row of the Q matrix is transformed
into a training instance.
Let S be the matrix of states, S = (s_ij), 1 ≤ i ≤ n, 1 ≤ j ≤ m, and let A
be the set of possible actions; the best action A* for each state is the one
with the maximum Q value in the corresponding row of the Q matrix
(equations 2.2-2.4).
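The extraction of a supervised dataset from the Q matrix can be sketched as follows, assuming a table of discretized states (position, velocity) and the action set {−1, 0, 1}; the function name is illustrative.

```python
import numpy as np

def q_to_dataset(Q, states):
    """Turn each discretized state's row of Q-values into one supervised
    training instance: state features -> greedy (argmax) action.
    Q has shape (n_states, n_actions); actions assumed to be (-1, 0, 1)."""
    actions = np.array([-1, 0, 1])
    X = np.asarray(states)              # e.g. columns: position, velocity
    y = actions[np.argmax(Q, axis=1)]   # best action per state
    return X, y
```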
Figure 2.7. The percent of failed trials when the filtering factor varies
Figure 2.7 shows the percent of failed trials when the filtering factor
varies. By failed trials we mean plans in which the agent fails to reach the
goal, resulting in a continuous oscillatory behaviour. When the training
dataset is small, the information may be insufficient for the agent to learn
the solution. As additional information is accumulated, the agent begins to
use it to solve the problem more frequently. Thus the failed trials decrease
from over 50% with 27 randomly chosen training instances to less than 1%
for 243 randomly chosen training instances. Of course, all the 270 original
instances are sufficient for the agent to solve the problem every time.
Taking into account only the successful trials, we counted the
average number of steps and the minimum number of steps needed to reach
the goal, graphically presented in figures 2.8 and 2.9, respectively.
One can see that although the average number of steps increases
when the agent receives more information, the minimum number of steps is
attained only with a small dataset. In order to find the optimal dataset, we
extended the test with 10000 trials only for a filtering factor of 10% for the
initial position of −0.5. From the initial 270 states used by the SARSA
discretization, with the additional removal of redundant states and with a further
removal of training instances where the acceleration was 0 (because we
considered that the optimal solution was to be attained only when the agent
actively pursues the goal, with no passive actions), the number of training
instances was reduced to 12. The best solution now has only 104 steps.
These distinctive instances are displayed in table 2.1.
Table 2.1. The selected training instances

Car position x   Car velocity v   Acceleration a
(agent state)                     (agent action)
0.80             0.04             1
0.70             0.00             1
0.70             0.06             1
0.60             0.06             1*
0.40             0.03             1
0.40             0.02             1
0.40             0.04             1
0.30             0.04             1
0.10             0.04             1
0.00             0.01             1
0.10             0.03             1
0.10             0.02             1
The goal of the agents is to move the 3 blocks (K1, K2, K3) to the
Goal state, in this respective order. There are 4 types of agents: A, B, C, D
(these types can also be interpreted as roles that these heterogeneous agents
can play). There is only 1 agent of type A, 2 agents of each of the types B and
C, and 3 agents of type D. The blocks can be carried only by a specific
combination of agent types: K1 can be carried only by an A together with a
B agent, K2 can be carried by 3 agents of types B, C, and D, while block K3
can be carried by a B and a D agent.
The world has 2 types of obstacles: a hard wall, displayed in dark
grey, which no agent can cross, and a soft wall, displayed in light grey,
which only an agent of type C can cross (with a penalty denoting its greater
effort).
The agents can perform 7 actions: move in the four axis-parallel
directions (Left, Right, Up, Down), Pick up an object, Put down an object,
and Wait (perform no action). An important feature is that an object can
refer both to a block and to another agent. Therefore, agents in this setting
can directly act on their peers.
Agents can move into or through a cell occupied by another agent.
They execute in a tick-based manner, all in parallel.
There are several rewards given to the agents in different situations:
1. Moving 1 square on the grid: r = −1;
Synchronization
A weak form of synchronization appears because two or three agents must
come to the location of a block in order to move it. A stronger, more
interesting form of synchronization is needed in order to find the solution as
quickly as possible. Thus, the moving blocks must arrive at the Goal state
one after another, if possible. This translates into a synchronization phase
between the moving blocks, such that a block should start when another
block passes through a particular, appropriate location.
Internal States
There are multiple, consecutive subgoals to be achieved by the agents (move
to a block, carry it to the Goal state, possibly return for another block, and
carry the latter to the Goal). The need to carry more than one block appears
because all three blocks need a type B agent, but there are only two type B
agents present. Therefore, one B agent must eventually carry two blocks.
Especially in this case, it is difficult to find a unique optimal policy
that makes the agent move virtually on the same trajectory in opposite
directions with different goals. It is more convenient to assume that agents
can have internal states that are triggered by certain events, such as picking
up an object or dropping it to the Goal. Thus the behaviour of the agents is
no longer based on a first order Markov process, where the current state St
depends only on the previous state St-1: P(St | St-1, ... , S0) = P(St | St-1).
Game Theoretic Analysis of Competitors' Behaviour
In order to reduce the size of the state space when searching for agent
policies, we use a game theoretic approach as a heuristic or pre-processing
phase. We thus analyze what is the rational meta-action (or decision
regarding an individual subgoal) for the agents belonging to the three types
that involve competition. There is no competition in the case of type A,
because there is only one such agent.
Type B
B1 and B2 agents are located on opposite sides of the grid, next to K3 and K1
or K2, respectively. In order to minimize losses from the negative rewards
associated with movement (r = −1), agents should generally strive to move
to their nearest block and get the corresponding reward of 100. One can
notice from figure 2.10 that B2 is closer to K2 than to K1. This leads to the
short-term analysis presented in table 2.2. The choice regarding which block
to pursue is equivalent to a game in normal (or strategic) form where the
utilities are represented by the sum of rewards received by each agent. For
clarity and simplicity of computation, we will consider that the discount
factor γ = 1.
Table 2.2. Short-term utilities of B type agents

                       B2
             K1         K2         K3
B1    K1    −16, 96    84, 98     84, 82
      K2     86, 96   −14, 98     86, 82
      K3     96, 96    96, 98     96, −18
The cells of table 2.2 show the utilities received by B1 and B2,
respectively, while following all the combinations of the 3 actions: pursuing
block K1, K2, or K3. The calculation of utilities is based on the reward
model described above. For example, in the first cell (−16, 96), we consider
the situation where both agents pursue block K1. Agent B2 will reach it first,
after 4 moves, and on completion, it will get a reward of 100. Therefore, it
will get a corresponding utility of 4 · (−1) + 100 = 96. Agent B1 will arrive
later and thus get a utility of only −16, for moving 16 squares. According to
the Nash equilibrium analysis, columns K1 and K3 are dominated, because
B2 will get a utility of 98 by choosing K2, rather than getting 96 by choosing
K1, or 82 or −18 by choosing K3. If B2 is rational, it will always choose K2,
irrespective of what B1 does. Therefore, B1 should assume this and try to
maximize its own reward under these circumstances. The best B1 can do
when B2 chooses K2 is to choose K3. In this case, B1 will receive the
maximum reward out of the three possibilities: 84, −14, or 96.
Thus, the pure Nash equilibrium of this game is for B1 to get block
K3 and for B2 to get K2. The corresponding cell is marked in bold.
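The dominance argument above can be checked mechanically by enumerating best responses. The payoff matrices below are reconstructed with movement costs entered as negative values (consistent with the worked example 4 · (−1) + 100 = 96); the equilibrium finder is a generic sketch, not code from the thesis:

```python
# Short-term payoffs: rows are B1's choice (K1, K2, K3), columns are B2's.
# Movement costs are taken as negative (-1 per square) -- a reconstruction.
U1 = [[-16,  84, 84],
      [ 86, -14, 86],
      [ 96,  96, 96]]
U2 = [[96, 98,  82],
      [96, 98,  82],
      [96, 98, -18]]

def pure_nash(u1, u2):
    """All cells where each player's action is a best response to the other's."""
    n, m = len(u1), len(u1[0])
    return [(i, j) for i in range(n) for j in range(m)
            if u1[i][j] == max(u1[k][j] for k in range(n))
            and u2[i][j] == max(u2[i][l] for l in range(m))]

blocks = ["K1", "K2", "K3"]
eqs = [(blocks[i], blocks[j]) for i, j in pure_nash(U1, U2)]
print(eqs)  # [('K3', 'K2')] -> B1 pursues K3, B2 pursues K2
```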
However, this does not take into account the fact that since block K1
is not taken, the overall problem cannot be solved. Also, it is easy to see that
the B agent that gets block K1 will also have a chance to get the third block,
because it will deliver the block to the Goal and will be free sooner than the
second agent.
The long-term analysis of the situation is presented in table 2.3. The
dominated strategies where an agent can have a negative utility were
marked with '–'. Also, the strategy that was previously the Nash
equilibrium is now marked with 'x', because it fails to solve the problem,
and thus it is dominated by the strategies leading to the final reward of 5000.
The utilities presented in table 2.3 do not include this final reward.
Table 2.3. Long-term utilities of B type agents

                       B2
             K1          K2          K3
B1    K1    152, 331    319, 164    –
      K2    –           –           –
      K3    178, 307    x           –
Similar to the computations for table 2.2, the resulting utilities are
the sums of the rewards for moving on the shortest path to the targets and
for picking up the targets. Finding the optimal path is a different problem
which can be solved itself by reinforcement learning if the environment is
initially unknown. In order to simplify the solution and concentrate on
higher level issues, we assume that the environment is accessible and
discrete, and the agents have means to compute shortest paths, e.g. by
applying the A* algorithm (Hart, Nilsson & Raphael, 1968).
This game has two Nash equilibria, marked in bold. The subgame
formed by strategies (K1, K3) for B1 and (K1, K2) for B2 has a mixed Nash
equilibrium where the agents can stochastically choose either action with
probabilities P(B1, K1) = 0.65 and P(B2, K1) = 0.64.
In the following, for our case study, we will consider the pure
equilibrium (178,307), because if we assume a cooperative behaviour, both
the sum of utilities (utilitarian solution) and their product (Nash solution)
are greater than for the (319,164) equilibrium. In this case, agent B1 gets
block K3 and then waits until block K2 is moved to the Goal. The optimal
strategy for B2 is to get K1, move it to the goal, then return for block K2,
and also carry it to the Goal.
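The comparison between the two equilibria can be verified with a few lines, using the utility pairs from table 2.3:

```python
# The two pure equilibria of the long-term game (utilities of B1 and B2).
eq_a = (178, 307)   # B1 takes K3; B2 takes K1, then K2
eq_b = (319, 164)

utilitarian = lambda u: u[0] + u[1]     # sum of utilities
nash_product = lambda u: u[0] * u[1]    # Nash bargaining product

assert utilitarian(eq_a) > utilitarian(eq_b)     # 485 > 483
assert nash_product(eq_a) > nash_product(eq_b)   # 54646 > 52316
```

Both criteria agree, which justifies selecting (178, 307) under the cooperative assumption.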
Type C
Table 2.4 presents the utilities of the two C type agents when the subgoals
are either picking up block K2 or picking up agent A. Of course, since
agents can pick up any object in their environment, there are many other
possible subgoals. However, we chose to analyze these ones because only
they can contribute to a solution to the problem. The other meta-actions are
irrelevant and could only increase the number of time steps needed to reach
the overall goal. We also included the Wait action for C2, because otherwise
the Nash equilibrium would force it to accept a negative utility.
In this case, the pair of strategies (Move to K2, Wait) is the pure
Nash equilibrium of the game. This would also mean that no block would be
taken to the Goal, because agent A cannot reach K1, and C2 has no incentive
to help it.
Table 2.4. Utilities of C type agents

                        C1
              A           K2
C2    A      −9, −13     92, −13
      K2     −9, 88      92, −12
      Wait   −9, 0       92, 0
However, in the long run, C2 can benefit from the completion of the
joint task, therefore, its dominant strategy becomes Move to A.
Type D
Table 2.5 presents the normal form of the game played by D agents.
In this case, the Nash equilibrium requires D1 and D3 to move to their
nearest blocks, and D2 to Wait, because if the other two agents behave
rationally, it cannot pick up any of the two blocks K2 or K3. The three tables
can be aggregately viewed as a 3D matrix, with one axis corresponding to
one of the three agents involved. The utilities displayed in the cells are in
the following order: u(D1), u(D2), u(D3).
The game has a pure Nash equilibrium point, marked in bold.
Table 2.5. Utilities of D type agents

D2 → K2:
                       D1
             K2              K3
D3    K2    (94, 10, 13)    (88, 90, 13)
      K3    (94, 10, 95)    (12, 90, 95)

D2 → K3:
                       D1
             K2              K3
D3    K2    (94, 94, 87)    (12, 94, 87)
      K3    (94, 6, 95)     (88, 6, 95)

D2 → Wait:
                       D1
             K2              K3
D3    K2    (94, 0, 13)     (88, 0, 87)
      K3    (94, 0, 95)     (12, 0, 95)
necessarily on the current state, but on the nearest state with a defined
action. In our case, in order to develop individual policies that would
perform well when combined in the multiagent system, we resort to the use
of state attractors, i.e. states where an action is defined, such that an action
in another state is computed as the action of the closest state attractor. The
state attractors can be viewed as the centre points of a Voronoi diagram.
When an agent enters the corresponding region, it will follow only the
action specified by its centre.
A genetic algorithm was used to discover the state attractors.
Separate searches were performed for every agent when moving
individually. When several agents are jointly involved in moving a block to
the Goal state, a single joint policy was searched for.
F(π) = Σ_{t=0}^{N} γ^t · R(S_π,t),  (2.5)

where π is the policy resulting from the actual execution of the agent
guided by the state attractors, and N is the number of steps to the terminal
state, N ≤ 200. R is a function giving the reward in state S.
When two chromosomes have the same fitness value, the shorter
chromosome is preferred in the tournament. Thus, the GA will try to find
the smallest number of state attractors that define an optimal or near-optimal
policy.
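The tie-breaking rule in the tournament can be sketched as follows; `fitness` stands for the evaluation of a chromosome by equation (2.5), and the function name is illustrative:

```python
# Tournament comparison for attractor chromosomes: higher fitness wins;
# on equal fitness the shorter chromosome (fewer state attractors) wins,
# pushing the GA toward minimal sets of attractors.
def tournament_winner(c1, c2, fitness):
    f1, f2 = fitness(c1), fitness(c2)
    if f1 != f2:
        return c1 if f1 > f2 else c2
    return c1 if len(c1) <= len(c2) else c2
```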
Simple State Attractors
The state attractors for the agents involved in reaching the first subgoals (i.e.
reaching a block) found by the GA are displayed in figure 2.12. The dotted
lines show the actual trajectories of the agents.
Joint State and Double State Attractors
As mentioned above, when two or three agents move a block, the GA finds
a unique policy for the block movement. This is shown in figure 2.13 where
state attractors have names such as K1, K2, or K3. Also, figure 2.13 shows
the trajectory of B2 when returning from the Goal state in order to get block
K2.
Figure 2.13. Joint state and double state attractors and agent trajectories
The double state attractors are more interesting. In this case, the
movement of a block also takes into account the position of another. In this
way, the system can exhibit a means of synchronization.
Since block K3 must arrive at the Goal just after K2, so that the time
to completion should be minimized, its state attractors also take into account
the state of K2. This is achieved by slightly modifying the encoding of the
chromosomes to have two positions instead of one, and the crossover being
executed onto sets of genes, multiples of 5.
The state attractors in this case are displayed in table 2.6.
Table 2.6. Double state attractors for joint movement of block K3

Line K3    Column K3    Line K2    Column K2    Action K3
1          8            5          4            Wait
1          8            5          5            Down
8          8            9          9            Down
9          8            9          9            Right
9          9            9          9            Put Down
Problem        Number of     Type    Number of    Number of
               attributes            training     testing
                                     instances    instances
and-sym        2             C       4            4
xor-sym        2             C       4            4
drugs          3             C       12           0
iris-d3        4             C       150          0
iris-d5        4             C       150          0
iris-num       4             C       150          12
max            2             R       6            3
monks1         6             C       124          432
monks2         6             C       169          432
monks3         6             C       122          432
shepard1       3             C       8            0
shepard2       3             C       8            0
shepard4       3             C       8            0
shepard6       3             C       8            0
sin            1             R       40           0
weather        4             C       14           14
weather-sym    4             C       14           0
problem1       3             R       1352         0
problem2       2             R       676          0
problem3       2             R       169          0
prescribe one of the two classes of drugs, according to the patient's age,
blood pressure, and gender. monks{1-3} are the three classical MONKs
benchmark problems (Thrun et al., 1991). The shepard{1,2,4,6} are
problems proposed while studying human performance on a category
learning task (Shepard, Hovland & Jenkins, 1961). weather and weather-sym
are variants of the golf-playing problem (Quinlan, 1986) where data
values are symbolic or numerical, and only symbolic, respectively.
Finally, in order to further study the regression capabilities, three
functions were devised, with randomly generated samples. The problems are
defined as follows:
f1(x, y, z) = sin(x) · cos(y) + z,  (2.6)

f2(x, y) = sin(x) · cos(y) + sin(y) / (2 + cos(x)),  (2.7)

f3(x, y) = sin(x) · cos(x · y),  (2.8)

where the definition domains of x and y are the same for all: x, y ∈ [−π, π],
and z ∈ [−2, 2].
Behaviour of Individual Agents
The multiagent system is implemented using the JADE framework
(Bellifemine, Caire & Greenwood, 2007) and thus it can be easily
distributed. It is comprised of a varying number of agents that apply one of
three different algorithms to train their neural networks: Backpropagation
(Bryson & Ho, 1969; Werbos, 1974; Rumelhart, Hinton & Williams, 1986),
Quickprop (Fahlman, 1988) and RProp (Riedmiller & Braun, 1993).
The agents are given a number of tasks, consisting of repeated
versions of the 20 classification and regression problems described earlier,
which appear in a random order.
Before testing the overall behaviour of the multiagent system, the
problems were considered separately, so that their complexity could be
estimated in terms of execution time while executing the training
algorithms.
Since finding the optimal topology of a neural network is a difficult
problem, the agents successively try increasingly complex network
configurations. The performance of the algorithms depends on the Mean
Square Error (MSE) desired by the user, and also on the number of training
epochs. In this scenario, the agents need to reach an MSE of 0.001 or lower,
and run the algorithms for 500 epochs. The MSE can be higher for
classification problems, if the percent of correctly classified instances is
100% for both training and testing.
They first use a network topology of 1 hidden layer with 10 neurons.
If this configuration proves to be too simple to learn the model, and the
performance criteria are not met, they use a network topology of 1 hidden
layer with 20 neurons. The third attempt is a topology of 2 hidden layers
with 15 and 5 neurons, respectively. Finally, they use a topology of 2 hidden
layers with 30 and 10 neurons, respectively.
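The escalation over topologies can be sketched as a simple loop; `train_and_evaluate` is a hypothetical stand-in for training the network with one of the three algorithms for 500 epochs and returning the achieved MSE:

```python
# Successively larger topologies tried by an agent until the target MSE is met.
TOPOLOGIES = [(10,), (20,), (15, 5), (30, 10)]   # hidden-layer sizes
TARGET_MSE = 0.001
EPOCHS = 500

def solve_task(train_and_evaluate, task):
    for hidden in TOPOLOGIES:
        mse = train_and_evaluate(task, hidden, epochs=EPOCHS)
        if mse <= TARGET_MSE:
            return hidden, mse   # first topology that meets the criterion
    return hidden, mse           # best effort: largest topology tried
```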
These particular values were considered sufficient for most learning
problems, based on the author's previous experience with neural network
models on real-world problems (Curteanu & Leon, 2006; Leon, Piuleac &
Curteanu, 2010). Since the process of building a proper neural network is
meant to be automated, and the agents are also required to solve the
problems as quickly as possible, they do not have the option of an extensive
search for the optimal topology. These 4 variants, although not guaranteed
to be optimal, should be representative for the complexity of the learning
problems.
The ratio between the number of neurons in the first and the second
hidden layer is 3:1, following the heuristic proposed by Kudrycki (1988).
In this approach, the number of epochs is rather small and the target
MSE is rather large. However, these values can emphasize the difference in
convergence speed of the training algorithms. Figure 2.14 displays the
U_C = 100 − P_ICI.  (2.9)
If the problem has both training and testing data, then the utility is
given as a weighted sum between the percent of incorrectly classified
instances for training and testing, respectively, in order to emphasize the
importance of generalization:
U_C = 100 − (P_ICI^training + 2 · P_ICI^testing) / 3.  (2.10)
For regression problems, the received utility takes into account the
ratio between the training MSE achieved by the agent and the maximum
allowed error, which is 0.001 in our case:
R = MSE_training / MSE_max.  (2.11)
The utility decrease Ud follows the shape given in figure 2.15. The
utility of a solution for a regression problem is computed by the following
equation:
U_R = 100 − U_d.  (2.12)
Figure 2.15. The decrease in utility as a function of the MSE ratio (R)
Similar to the classification case, if the problem also has testing data,
the formula becomes:
U_R = 100 − (U_d^training + 2 · U_d^testing) / 3.  (2.13)
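The utility computations translate directly into code, following the 1:2 training/testing weighting described above; the shape of the utility decrease U_d comes from figure 2.15 (not reproduced here), so it is passed in as a function:

```python
# Utility of a solution, following equations (2.9)-(2.13).
def classification_utility(pici_train, pici_test=None):
    """PICI = percent of incorrectly classified instances (0..100)."""
    if pici_test is None:
        return 100 - pici_train                       # (2.9)
    return 100 - (pici_train + 2 * pici_test) / 3     # (2.10)

def regression_utility(u_d, mse_train, mse_test=None, mse_max=0.001):
    """u_d maps the MSE ratio R of (2.11) to a utility decrease (figure 2.15)."""
    if mse_test is None:
        return 100 - u_d(mse_train / mse_max)         # (2.12)
    return 100 - (u_d(mse_train / mse_max)
                  + 2 * u_d(mse_test / mse_max)) / 3  # (2.13)
```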
Case Studies
In this section, we study the overall performance of the agents in terms of
received utility. First, we consider the simple case when agents accept tasks
automatically. Then, we try to optimize their behaviour by allowing them to
learn from their previous experience, so that they can only accept tasks that
they believe would yield them a greater utility. We emphasize a surprising
result of the emergent behaviour of the agents. Finally, we analyze the way
in which agent performance scales when the number of agents and tasks
varies.
Non-Adaptive Agents
In order to analyze and later optimize the utilities received by the agents, we
establish a reference result, when all the agents accept any given task,
provided that they are available.
The total utilities received by the agents are displayed in figure 2.16.
The last 3 bars represent the average utility received by each agent type:
Backpropagation agents, Quickprop agents, and RProp agents.
the Backpropagation agents receive an average total utility greater than that
of Quickprop and even RProp agents.
Figure 2.18. Box plot summarizing the utilities of agent types when
Backpropagation agents use (adaptive BP) or do not use (non-adaptive BP)
a neural network to accept or reject tasks
Scaling
Finally, we address the way in which the system scales, by analyzing the
evolution of the average utility for each type of agent when the number of
agents varies and the number of tasks is constant: 5000 (figure 2.19a), and
when the number of tasks varies and the number of agents is constant: 30,
with 10 agents of each type (figure 2.19b). The figure displays the adaptive
BP (A) and non-adaptive BP (NA) cases for the three types of agents
involved.
When the number of agents varies, it can be noted that the average
utility decreases, because the tasks are divided among more solving entities.
When the number of agents grows, the difference between the types of
agents becomes less important. When the number of tasks increases, the
average utility received by agents increases in an approximately linear
manner.
Chapter 3
|ψ⟩ = α |0⟩ + β |1⟩,  (3.1)

|α|² + |β|² = 1.  (3.2)

A quantum-inspired chromosome is a string of such amplitude pairs:

c_i = | α1  α2  ...  αn |
      | β1  β2  ...  βn |,  (3.3)

where |α_j|² + |β_j|² = 1, j = 1...n.
Such a chromosome can simultaneously represent 2^n dimensions
(Vlachogiannis & Lee, 2008) since its state is:

|ψ⟩ = Σ_{x ∈ {0,1}^n} a_x |x⟩,  (3.4)

where:

a_{00...0} = α1 · α2 · ... · αn
a_{00...1} = α1 · α2 · ... · βn
...
a_{11...1} = β1 · β2 · ... · βn.  (3.5)
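The qubit chromosome of equations (3.1)-(3.5) can be sketched as follows; `observe` performs the usual collapse of a quantum-inspired individual to a classical bit string (a standard QIEA operation, not code from the thesis):

```python
import math, random

def make_chromosome(n):
    """One (alpha, beta) amplitude pair per qubit, with alpha^2 + beta^2 = 1."""
    thetas = [random.uniform(0, math.pi / 2) for _ in range(n)]
    return [(math.cos(t), math.sin(t)) for t in thetas]

def amplitude(chromosome, bits):
    """a_x: product of alpha_j for bit 0 and beta_j for bit 1, as in (3.5)."""
    a = 1.0
    for (alpha, beta), b in zip(chromosome, bits):
        a *= beta if b else alpha
    return a

def observe(chromosome):
    """Collapse to a classical bit string: qubit j yields 1 with prob. beta_j^2."""
    return [1 if random.random() < beta ** 2 else 0 for _, beta in chromosome]

c = make_chromosome(4)
# The squared amplitudes over all 2^4 basis states sum to 1, i.e. the
# chromosome simultaneously represents every 4-bit string.
total = sum(amplitude(c, [int(b) for b in f"{k:04b}"]) ** 2 for k in range(16))
print(round(total, 6))  # 1.0
```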
1/ d of the correct optimal one for all inputs (Cramton, Shoham & Steinberg,
2006).
Multi-Attribute Combinatorial Auctions
While auctions are an excellent form of price discovery, there are usually
other aspects in addition to price that can affect the value of the outcome. In
such cases, if the auction process focuses competition on price only, the
buyer is forced to ignore these other aspects or to handle them outside of the
auction by subsequent contractual arrangements. An effective multi-attribute
auction mechanism should possess the following two characteristics: the
buyer must be able to effectively assess the value of a multi-attribute bid,
and a bidder must be able to effectively bid on several attributes
simultaneously (Chen-Ritzo et al., 2005).
Many authors have studied multi-attribute auctions, with a focus on
bidding languages (Bichler, 1998), applications to outsourcing (Mishra &
Veeramani, 2002), or government procurement for the US Department of
Defense, using a model of multi-dimensional auctions that includes attributes
such as price and quality (Che, 1993). A review of multi-attribute
combinatorial auctions can be found in (Xie, Li & Sun, 2004).
Formalization
In this section, the combinatorial auction model used in this study (Leon,
2012b) is presented. Let G be the set of goods or items to be auctioned, let B
be the set of bids, and let A be the set of attributes. Each bid Bi has a value
for each attribute:
v(B_i) = (v_1, ..., v_|A|),  (3.6)
max Σ_{i=1}^{|B|} x_i · v_k(B_i),  (3.7)

x_i ∈ {0, 1},  (3.8)

x_i + x_j ≤ 1, for all i ≠ j such that B_i and B_j share a good,  (3.9)
(3.10)
cos(θ') = (cos(θ1) + cos(θ2)) / 2,
sin(θ') = (sin(θ1) + sin(θ2)) / 2.  (3.11)

U = | 0  1 |
    | 1  0 |.  (3.12)
determined which gene will have the lowest or highest impact on the fitness
of a chromosome. Therefore, a randomly selected gene with the value of 1
in the measured individual is set to 0, thus decreasing the number of
violated constraints, until the chromosome comes to satisfy all the
constraints of the problem.
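The repair step described above can be sketched as follows, assuming each bid is represented by the set of goods it requests (the data layout is an assumption):

```python
import random

def violated(x, goods_of_bid):
    """True if two accepted bids (genes set to 1) request a common good."""
    taken = set()
    for i, active in enumerate(x):
        if active:
            if taken & goods_of_bid[i]:
                return True
            taken |= goods_of_bid[i]
    return False

def repair(x, goods_of_bid, rng=random):
    """Set randomly chosen genes from 1 to 0 until no constraint is violated."""
    x = list(x)
    while violated(x, goods_of_bid):
        i = rng.choice([i for i, a in enumerate(x) if a])
        x[i] = 0
    return x
```

Since each iteration removes one accepted bid, the loop always terminates with a feasible selection.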
Case Studies
Combinatorial Auction Scenarios
In order to create combinatorial auction problems, the CATS software
(Leyton-Brown et al., 2011) was used, which is a generator of combinatorial
auction instances.
Three single-attribute problems were generated following the CATS
distribution L2, where the number of goods is chosen according to a random
distribution, and the price is chosen according to a linear random
distribution, with different degrees of complexity: a small one (with 5 goods
and 30 bids), a medium-sized one (with 10 goods and 200 bids), and a large
one (with 50 goods and 1000 bids).
In order to transform them into multi-attribute problems, some
processing was performed. The three resulting problems have three
attributes, computed as follows:
Performance Metrics
Unlike in the single-objective case, the performance of multi-objective
algorithms is harder to compare. Since the solution is actually a set of non-dominated vectors, usually of different cardinalities, it is not straightforward
to assess the quality of a solution in terms of a single real number.
There are two main approaches that try to overcome this difficulty
(Groan, 2003). One focuses on the convergence of the solution to the real
Pareto front (if known, in the ideal case). The other concentrates on the
diversity of a solution, often by measuring how far apart the solution vectors
are from one another.
In this study, a performance metric inspired by the R2 convergence
metric in (Hansen & Jaszkiewicz, 1998) is used, which takes into account
the expected value of a solution given all possible utility functions.
This proposed metric is a weighted sum of partial fitness function
values, averaged over all possible values of the weights:
U(S) = ( Σ_{s∈S} Σ_{a∈A} ∫_{w∈W} (u_a(s) / u_a^max(s)) · p(w) dw ) / |S|,  (3.13)
Tables 3.1 and 3.2 show the results obtained by NSGA-II and
QIEA-SSEHC, respectively, for the first problem.
Table 3.1. The results of NSGA-II for the small problem
with 5 goods and 30 bids

Pop.    Number of      Average    Best       Average number
size    generations    utility    utility    of solutions
20      100            0.7183     0.7442     9
100     100            0.7604     0.7905     11.5
20      1000           0.7669     0.7929     10.5
100     1000           0.7910     0.7929     12.1
Even if the best results are close, one can see that QIEA-SSEHC
performs better even with a small population and a small number of
generations. In all four configurations, the average results of QIEA-SSEHC
are better than those of NSGA-II, and the solution diversity is also greater.
Since QIEA-SSEHC converges faster, there is no further improvement of
solution diversity when the population size and the number of generations
are increased.
Tables 3.3 and 3.4 show the results obtained by NSGA-II and
QIEA-SSEHC, respectively, for the second problem.
Table 3.3. The results of NSGA-II for the medium-sized problem
with 10 goods and 200 bids

Pop.    Number of      Average    Best       Average number
size    generations    utility    utility    of solutions
20      100            0.4873     0.5167     16
100     100            0.5380     0.5570     22.5
20      1000           0.5687     0.6128     17.8
100     1000           0.6209     0.6794     27.2
θ_i = arcsin( (x_i − x_i^min) / (x_i^max − x_i^min) ),  (3.15)

θ_i = arccos( (x_i − x_i^min) / (x_i^max − x_i^min) ),  (3.16)

or (Zhao et al., 2009):

c_i = | cos(θ_i1)  cos(θ_i2)  ...  cos(θ_in) |
      | sin(θ_i1)  sin(θ_i2)  ...  sin(θ_in) |
U* = max Π_{i=1}^{n} U_i,

U_i = Σ_{j=1}^{m} u_ij.
The weights of the agents for the issues wij, are given (known), and so
are the issue values vj. We need to find the divisions dij, which maximize the
product of total utilities.
For example, let us consider the situation with 2 agents and 2 issues
presented in table 3.7. The weights in a row must sum up to 1.
Table 3.7. Agent weights for the 2 x 2 problem

Agents    I1     I2     Total
A1        0.4    0.6    1
A2        0.5    0.5    1

Issue values: I1 = 2, I2 = 1.
For the general case, with n agents and m issues, finding the optimal
division requires a general optimization method. In this study, we use and
compare several varieties of quantum-inspired evolutionary algorithms.
Table 3.9. Optimal issue divisions for the 2 x 2 problem

Agents    I1       I2    Total utility
A1        0.125    1     0.7
A2        0.875    0     0.875
Total     1        1     0.6125
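The utility model (each agent's utility is the weighted sum of the issue fractions it receives, U_i = Σ_j w_ij · v_j · d_ij, and the Nash product is the objective) can be verified against tables 3.7 and 3.9:

```python
# Utilities of the agents given weights w, issue values v and divisions d.
def utilities(w, v, d):
    return [sum(wi[j] * v[j] * di[j] for j in range(len(v)))
            for wi, di in zip(w, d)]

w = [[0.4, 0.6], [0.5, 0.5]]       # agent weights (table 3.7)
v = [2, 1]                         # issue values
d = [[0.125, 1.0], [0.875, 0.0]]   # optimal divisions (table 3.9)

u = utilities(w, v, d)
product = u[0] * u[1]
print([round(x, 4) for x in u])    # [0.7, 0.875]
print(round(product, 4))           # 0.6125
```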
Σ_{i=1}^{n} d_ij = 1, j = 1...m.
It must be mentioned that none of the three QIEA variants are actual
quantum algorithms meant to be executed on a quantum computer. They are
sequential algorithms for traditional computing architectures, but include
ideas from quantum computing.
Case Studies
The Problems
In the case studies, we will use 3 negotiation problems, with increasing
levels of difficulty. The first one is the 2 x 2 problem presented above. Its
exact Nash solution corresponds to a utility product of 0.6125.
The second is a 3 x 3 problem, with the agent weights displayed in
table 3.11 and equal issue values presented in table 3.12. The optimal
solution corresponds to the divisions from table 3.13, which lead to a utility
product of 96.026, as shown in table 3.14.
Table 3.11. Agent weights for the 3 x 3 problem

Agents    I1     I2     I3
A1        0.2    0.3    0.5
A2        0.7    0.1    0.2
A3        0.5    0.1    0.4

Table 3.12. Issue values for the 3 x 3 problem

I1    I2    I3
10    10    10

Table 3.13. Optimal issue divisions for the 3 x 3 problem

Agents    I1      I2    I3
A1        0       1     0.35
A2        0.76    0     0
A3        0.24    0     0.65

Table 3.14. Total utilities for the 3 x 3 problem

Agents     Total utility
A1         4.75
A2         5.32
A3         3.80
Product    96.026
96.026
The Results
Table 3.15 presents the best results of Monte Carlo sampling, with different
numbers of samples for each problem. In order to be more intuitive, the
results are presented as percents of the optimal result. One can see that this
method only works for small problem sizes. For the third problem, it simply
cannot give a useful result at all, even with 10 million samples. In this case,
the best solution provided is only 2% of the actual optimal solution.
Table 3.15. The results of Monte Carlo sampling

           100,000 samples        1,000,000 samples       10,000,000 samples
Problem    value        % U*      value        % U*       value        % U*
2 x 2      0.612391     99.9822   0.612488     99.9980    0.612499     99.9999
3 x 3      86.819573    90.4126   88.107870    91.7542    91.482164    95.2681
10 x 10    0.018156     1.8753    0.020774     2.1456     0.023302     2.4067
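A minimal sketch of the Monte Carlo baseline: sample random division matrices whose columns sum to 1 and keep the best utility product. The sampling scheme (normalized uniform draws) is an assumption; the thesis does not detail it:

```python
import random
from functools import reduce

def random_division(n_agents):
    """A random split of one issue among the agents, summing to 1."""
    cuts = [random.random() for _ in range(n_agents)]
    total = sum(cuts)
    return [c / total for c in cuts]

def monte_carlo(w, v, samples):
    """Best product of utilities found over `samples` random divisions."""
    n, m = len(w), len(v)
    best = 0.0
    for _ in range(samples):
        cols = [random_division(n) for _ in range(m)]   # one column per issue
        utils = [sum(w[i][j] * v[j] * cols[j][i] for j in range(m))
                 for i in range(n)]
        best = max(best, reduce(lambda a, b: a * b, utils))
    return best

w = [[0.4, 0.6], [0.5, 0.5]]
v = [2, 1]
print(monte_carlo(w, v, 10000) <= 0.6125 + 1e-9)  # True: the optimum is 0.6125
```

For the 2 x 2 problem such sampling comes very close to the optimum, but, as table 3.15 shows, the approach degrades rapidly with the dimensionality of the division matrix.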
Population size    Average fitness (%)    Best fitness (%)
20                 99.9675                99.9933
100                99.9063                99.9944
50                 99.9041                99.9758
100                99.9917                99.9993
One can notice that for simple problems the algorithm performance is
quite good. However, for the 10 x 10 problem, even with a high number of
generations (1000) and a large population size (100), the results do not come
close to the optimal value. The best fitness is only around 60% of the known
optimum.
Population size    Average fitness (%)    Best fitness (%)
20                 99.6662                99.9349
100                99.1343                99.4662
50                 98.1748                99.6375
100                99.9339                99.9749
Population size    Average fitness (%)    Best fitness (%)
20                 34.164                 40.713
100                14.454                 16.746
50                 11.593                 12.895
100                56.656                 59.604
QIEA1
In order to evaluate the performance of the quantum-inspired algorithms,
several experiments were made while varying the parameters: the number of
generations, the population size and the mutation rate. Table 3.19 displays
the results in terms of average fitness and best fitness.
One can see that the results improve as the number of generations
increases. With 1000 generations, the best fitness comes very close to the
optimum value. Also, the accuracy of the results increases with the
population size, which ensures a greater diversity. The mutation rate must be
larger compared to the one usually used in a classical EA, but if it is much
greater (e.g. 25%), the results are affected by randomness and their quality
decreases. Therefore, an acceptable mutation rate was found to be around
10%.
The results in italics were repeated for convenience. The best results
in each group are marked in bold. Finally, the results with the best
combination of parameters are presented.
Table 3.20 shows the results of the algorithm with different
parameters when applied to the 3 x 3 problem, and Table 3.21 displays the
corresponding results for the 10 x 10 problem.
Table 3.19. The results of the QIEA1 for the 2 x 2 problem

Number of      Population    Mutation    Average        Best
generations    size          rate        fitness (%)    fitness (%)
–              20            0.05        93.97          96.3037
100            20            0.05        95.49          96.4899
–              20            0.05        96.22          99.3993
100            20            0.05        95.49          96.4899
100            50            0.05        96.25          96.9849
100            100           0.05        97.74          99.1758
100            50            0.05        96.25          96.9849
100            50            0.10        97.60          98.2241
100            50            0.25        97.11          97.1142
1000           100           0.10        99.06          99.5515
Table 3.20. The results of the QIEA1 for the 3 x 3 problem

Population    Mutation    Average        Best
size          rate        fitness (%)    fitness (%)
20            0.05        54.73          72.16
100           0.05        66.27          73.57
50            0.10        63.87          72.73
100           0.10        65.44          84.74
Table 3.21. The results of the QIEA1 for the 10 x 10 problem

Population    Mutation    Average        Best
size          rate        fitness (%)    fitness (%)
20            0.05        0.927          1.088
100           0.05        1.199          1.255
50            0.10        1.150          1.387
100           0.10        1.092          1.277
It is evident that, despite the fact that this variant of the QIEA is very
fast, the results are inferior even to those of the Monte Carlo simulation.
QIEA2
Since the first problem is quite simple, in the following we will focus on the
results of the other two problems. Table 3.22 presents the results of QIEA2
on the 3 x 3 problem, and table 3.23 presents the results for the 10 x 10
problem. One can see that the solutions are better than in the previous case,
but especially for complex problems, they are still not satisfactory (the best
fitness is only around 4% of the optimal one).
Table 3.22. The results of the QIEA2 for the 3 x 3 problem

Number of      Population    Mutation    Average        Best
generations    size          rate        fitness (%)    fitness (%)
1000           20            0.05        71.72          87.38
100            100           0.05        80.83          85.34
100            50            0.10        77.05          85.12
1000           100           0.10        80.56          88.21
Table 3.23. The results of the QIEA2 for the 10 x 10 problem

Population    Mutation    Average        Best
size          rate        fitness (%)    fitness (%)
20            0.05        3.304          3.744
100           0.05        3.529          3.950
50            0.10        3.112          4.088
100           0.10        4.031          4.168
QIEA3
Table 3.24 presents the results of the third, improved variant of QIEA on the
3 x 3 problem, and Table 3.25 presents the results for the 10 x 10 problem. It
is clear that by the modifications applied to the quantum-inspired algorithm,
the quality of the solutions is now much better. For the 3 x 3 problem, good
results are obtained even with a small configuration (50 individuals and 100
generations), which leads to a very good execution speed.
Table 3.24. The results of the QIEA3 for the 3 x 3 problem

Number of      Population    Mutation    Average        Best
generations    size          rate        fitness (%)    fitness (%)
1000           20            0.05        99.9988        99.9997
100            100           0.05        99.9703        99.9977
100            50            0.10        99.9721        99.9991
1000           100           0.10        99.9993        99.9997
Table 3.25. The results of the QIEA3 for the 10 x 10 problem

Population    Mutation    Average        Best
size          rate        fitness (%)    fitness (%)
20            0.05        81.157         83.591
100           0.05        58.571         63.036
50            0.10        48.728         51.733
100           0.10        91.381         92.787
Chapter 4
Problem 1                  Problem 2
Training    Validation     Training    Validation
0%          17.65%         0%          10.71%
0%          11.76%         0%          7.14%
10.71%      20.69%         12%         32.0%
8.33%       17.24%         12%         34.61%
0%          28.57%         0%          32%
0%          17.85%         0%          32%
13.09%      17.24%         16%         30%
0%          13.79%         0%          26.92%
0%          13.79%         0%          23.07%
0%          17.24%         0%          26.92%
rather slow.
Although the neural network method yields the best results and the
prediction phase is very fast, the process of finding the best network may be
difficult.
The Bayesian classifier generalizes well, because it is based on the
estimation of the probability distribution of data. Unfortunately, it has high
error rates in both situations.
An interesting approach is the nearest neighbour paradigm,
especially k-NN, which proves to be a good choice for both problems. Like
the neural networks, it does not provide a structure for the data. The speed
and simplicity of the learning process are counterbalanced by the prediction
phase, which must search through the already learned instances in order to
find the desired class.
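As an illustration of this lazy-learning trade-off, a minimal k-NN classifier can be sketched as follows (illustrative Python, not code from the thesis; the Euclidean distance and a majority vote are the usual conventions):

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: a list of (features, label) pairs. Lazy learning simply
    stores the instances; all the work happens at prediction time,
    as a search for the k closest stored instances."""
    nearest = sorted(train, key=lambda inst: math.dist(inst[0], query))[:k]
    # Majority vote among the k nearest neighbours.
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Training is instantaneous (the instances are merely stored), while each prediction costs a pass over the whole training set, which is exactly the speed trade-off discussed above.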
The prediction of the mesophase occurrence with machine learning
methods, as well as the choice and the codification (numerical and nominal)
of different sets of parameters which characterize the structure and the
behaviour of the studied copolyethers represent a new approach in the field.
In the paper summarized above the liquid crystalline behaviour for a
series of copolyethers with mesogene groups in the main chain has been
investigated. It has been shown that for the studied copolyethers the neural
networks prove to be efficient classification tools.
In the second selected article (Leon, Lisa & Curteanu, 2010), the
case study focuses more on other classification algorithms for the crystalline
property prediction rather than neural networks. An original classification
algorithm is also used, with accurate prediction results. It extends the
nearest-neighbour methods in two ways: the training instances can be grouped
into hyper-rectangles that are treated unitarily during the prediction phase,
which can greatly improve the speed of the classification, and information is
computed about the prototype or mode value of the attributes of such
hyper-rectangles, which indicates how representative an instance is of a
class and thus allows a fuzzy logic interpretation of the results.
We used an organic compounds database with 390 records, which
includes a wide variety of compounds: bis- and tris-phenyl aromatic, azo
aromatic and azomethinic types, containing connecting groups in the rigid
core such as azo, azomethine or double bonds. We report a new approach to
predict the liquid crystalline behaviour of these compounds, using neural
networks and classification algorithms.
A performance analysis for several well-known classification
algorithms is made. The error is calculated both for the training set, to
determine how accurately a certain algorithm can build a model of the data,
and for a validation set, to obtain the prediction capability of that model.
The algorithms presented here belong to two important classes of
classification methods: eager learners and lazy learners. Eager learners use
many computational resources in the first step to build the actual model, and
then the prediction is easy; we chose different decision tree inducers for this
class: C4.5, Random tree, Random forest and REPTree (Esposito et al.,
1999). Lazy learners, on the other hand, build a very simple model and most
of the processing is made in the second step, for prediction; for this category,
we chose a set of classifiers based on the nearest neighbour paradigm:
simple Nearest Neighbour, k-Nearest Neighbour and Non Nested
Generalized Exemplars with Prototypes.
As stated before, instance-based learning reduces the learning effort
by simply storing the examples presented to the learning agent and
classifying the new instances on the basis of their closeness to their
neighbours, i.e. previously encountered instances with similar attribute
values. Experimental results showed that not all the elements of a category
are processed in the same way; some elements are considered more typical
of one category than others are. This is known as the prototypical effect
(Rosch, 1975). For a European, an apple is more representative of the fruit
category than a coconut. Cognitive psychology gives the prototype two
possible interpretations (Medin, Ross & Markman, 2005). It can represent
one or more real exemplars, which appear most frequently when human
subjects are asked to exemplify a category. The second approach considers
the prototype to be an ideal exemplar that sums up the characteristics of
more members of the category.
The NNGE model (Martin, 1995) was extended in order to include
prototype information about each rule or hyper-rectangle. Since a
generalized exemplar contains several instances, it is not necessary for the
statistical average of those instances to match the centre of the
corresponding hyper-rectangle. The prototype may differ from the
hyper-rectangle's geometric centre. The proposed Non-Nested Generalized
Exemplars with Prototypes, NNGEP (Leon, 2006), is an incremental
algorithm, so the prototypes can also be computed incrementally. Adding a
new instance to a hyper-rectangle is a particularization of the general case of
merging two Gaussian prototypes, with means μ1 and μ2 and standard
deviations σ1 and σ2, respectively. By merging we understand computing
the mean μ and standard deviation σ of the new Gaussian prototype that
would have resulted if all the instances of the two source prototypes had
been added to it from the start.
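The merge of two Gaussian prototypes can be sketched as follows (a hypothetical helper using population statistics; the thesis does not reproduce the exact formula, so the pooled-mean/pooled-variance convention used here is an assumption):

```python
import math

def merge_gaussian_prototypes(n1, mu1, sigma1, n2, mu2, sigma2):
    """Merge two Gaussian prototypes (instance count, mean, std) as if
    all the underlying instances had been accumulated from the start."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    # Pooled variance: the within-prototype variances plus the spread
    # of the two means around the merged mean.
    var = (n1 * (sigma1 ** 2 + (mu1 - mu) ** 2)
           + n2 * (sigma2 ** 2 + (mu2 - mu) ** 2)) / n
    return n, mu, math.sqrt(var)
```

Adding a single instance x to a hyper-rectangle is then the special case of merging with the degenerate prototype (1, x, 0), which makes the update incremental.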
This helps computing the similarity between a given instance and the
closest generalized exemplar. Although classification can be based on
similarity, the two notions are not identical. Even if an instance has been
assigned a certain category due to an existing rule, it may not be
representative of that category. An important advantage of the model built
by NNGEP is that instances have graded membership to categories, which
permits a fuzzy interpretation using fuzzy numbers with multidimensional
Gaussian membership functions.
Using this fuzzy interpretation, if an instance's membership in a rule
(hyper-rectangle) is less than a specified threshold, the instance can be
included into a different rule, or form a new rule by itself.
The above procedure can only be applied to numeric attributes.
When attributes are symbolic or discrete, their mean value cannot be
computed. Instead, the mode of the set of attribute values is computed, i.e.
the value that occurs most frequently in the given set. The distance on a
certain dimension between an instance and a prototype is either 0 if the
instance attribute value on that specific dimension is the same as the mode,
or 1 otherwise.
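The mode-based prototype and distance for symbolic attributes can be sketched as follows (illustrative helper functions; the attribute values are invented examples):

```python
from collections import Counter

def nominal_prototype(values):
    """The prototype of a symbolic attribute is its mode, i.e. the most
    frequent value in the given set."""
    return Counter(values).most_common(1)[0][0]

def nominal_distance(value, prototype):
    """Distance on a symbolic dimension: 0 if the value matches the
    mode, 1 otherwise."""
    return 0 if value == prototype else 1
```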
There are 5 inputs to our problem: the length of the rigid core (Lrig),
the length of the flexible core (Lflex), the molecular weight (M), the ratio
of the molecular diameter to the total length (S) and the compound class
(classcomp, 1-6). Concerning the liquid crystal behaviour, we have coded the
possibility to generate a mesophase with 1 and the crystalline or amorphous
phases with 0. This is the symbolic output of the model.
A comparison between the algorithms, taking into account the best
error found, is displayed in figure 4.1.
Figure 4.1. The training and prediction errors (%) of the Nearest Neighbour, k-Nearest Neighbour, C4.5 and NNGEP algorithms
Table 4.2. The accuracy of the classification algorithms for different validation percentages

Algorithm | 33% | 20% | 10%
Nearest Neighbour | 84.615 | 90.67576 | 95.33788
k-Nearest Neighbour | 85.271 | 88.43355 | 90.86627
C4.5 | 83.761 | 86.09706 | 87.89403
Random Tree | 83.721 | 90.13394 | 95.06697
Random Forest | 84.496 | 90.60364 | 95.30182
REPTree | 81.395 | 87.20127 | 91.66764
NNGEP | 85.271 | 91.07333 | 95.53667
Table 4.3. Experimental data and network predictions for the validation set

Lflex | Lrig | S | M | LC experimental | LC network
9.21 | 25.5  | 0.08 | 463 | 0 | 0
9.22 | 20.98 | 0.09 | 439 | 0 | 0
9.22 | 6.22  | 0.19 | 270 | 1 | 0
9.22 | 8.77  | 0.16 | 298 | 1 | 1
9.23 | 8.9   | 0.16 | 296 | 1 | 1
9.23 | 16.62 | 0.11 | 381 | 0 | 0
9.21 | 6.39  | 0.18 | 266 | 0 | 0
9.21 | 9.94  | 0.15 | 310 | 1 | 1
9.21 | 20.61 | 0.10 | 439 | 1 | 1
9.21 | 11.69 | 0.14 | 360 | 1 | 0
9.21 | 17.24 | 0.11 | 431 | 0 | 0
9.21 | 15.2  | 0.12 | 404 | 0 | 0
The data in table 4.3 show the efficiency of the neural model, which
has a probability of a correct answer of 83.33% on the validation data
set (which represents 10% of the data set of compounds 2). Cells marked
in grey represent wrong predictions of the networks. This percentage cannot
be compared with the results from table 4.2, because it is limited to a single
class of compounds. For the whole database, the probability is less than
75%.
Although neural networks are a very popular classification tool in
many domains, according to our study they yielded the worst results for
our particular problem. In this case, the best predictions were given by our
original algorithm, NNGEP, along with perfect accuracy on the training set.
It thus combines the good performance of the classic instance-based
methods with the lower memory requirements due to its hyper-rectangles
and the ease of the interpretation of its explicit model in the form of rules.
The inputs of the model are the recent values of the time series,
depending on the size of the sliding window s. Basically, the stack model
predicts the value of the time series at moment t depending on the latest s
values:
y_t = f(y_{t-1}, y_{t-2}, ..., y_{t-s}) .   (4.1)
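The sliding-window construction of equation (4.1) can be sketched as follows (an illustrative helper, not the thesis code):

```python
def sliding_window(series, s):
    """Turn a time series into supervised pairs: the window of the
    latest s values (the model inputs) and the target value y_t."""
    return [(series[t - s:t], series[t]) for t in range(s, len(series))]
```

Each pair gives the network the s most recent observations and asks it to predict the next one.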
The neurons in the hidden layers of both networks that compose the
stack are normal multilayer perceptron (MLP) neurons, with bipolar
sigmoid, or hyperbolic tangent, activation functions.
The difference between the two neural networks lies in the output
layer of the individual neural networks that compose the stack. The first
network, represented at the top in figure 4.2, has the same activation
function, the bipolar sigmoid in the output layer. The second network has an
exponential function instead.
Each network respects the basic information flow of an MLP, where
the θ values represent the thresholds of the neurons (equations 4.2-4.4).
In figure 4.5, the effect of the weights on the mean square error of
the stack on the testing data is displayed. One can see that the optimal
weights are w1 = 100 and w2 = 0, where w1 is the weight of the neural
network with sigmoid activation functions and w2 = 100 − w1 is the weight
of the second neural network, whose output neuron has an exponential
activation function.
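The weighted aggregation of the component outputs can be sketched as follows (a hypothetical helper; the weights are expressed as percentages summing to 100, as in the text):

```python
def stack_output(outputs, weights):
    """Aggregate the outputs of the component networks; weights are
    percentages that must sum to 100 (here w2 = 100 - w1)."""
    assert sum(weights) == 100
    return sum(w / 100.0 * y for w, y in zip(weights, outputs))
```

With w1 = 100 and w2 = 0, the stack simply reproduces the first network's output.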
Table 4.4 shows the errors both for the training and for testing. It
separately presents the mean square errors for the first, sigmoid network, for
the second, exponential network, and for the stack, respectively. Since the
ranges of the datasets are very different, we also display the MSE on the
data normalized between 0 and 1, in order to better compare the
performance of our model on data with different shapes.
Table 4.4. The errors of the model for the sunspot data
(Training and testing mean square errors for the sigmoid output NN, the exponential output NN and the stack.)
UK Industrial Production
This data series contains the index of industrial production in the United
Kingdom, from 1700 to 1912 (Janacek, 2001). We first consider the
performance of the model on the training set with a window size of 5 points
and 21 points ahead, as displayed in figure 4.9. The forecasting capabilities
of the model are shown in figure 4.10.
The evolution of the mean square error of the stack on the testing
data is displayed in figure 4.11, as a function of the sigmoid NN weight, w1.
The optimal weights are w1 = 0 and w2 = 100. Table 4.6 shows the mean
square errors obtained for a window size of 5.
Table 4.6. The training and testing mean square errors of the sigmoid output NN, the exponential output NN and the stack
outperform a single, best trained network for the following reasons (Tian,
Zhang & Morris, 2001): by making successive trials, it is very difficult to
test the whole domain of values for the stack weights.
Our technique generates a stack formed of three multilayer
perceptrons. One must emphasize the fact that the presented method can be
easily extended to a larger number of individual networks aggregated into a
stack, and also to building heterogeneous stacks, formed of different types
of neural networks. Moreover, like any modelling methodology based on
neural networks, it can be used for different processes and systems. The
synthesis of hydrogels based on polyacrylamide is considered a good
example for at least two reasons: the complexity of the process and the
lack of knowledge concerning the physical and chemical laws related to it,
and the consistency of the available experimental data.
Results and Discussion
The predictions of the yield proved to be easier to obtain. A single neural
network, MLP(7:15:5:1), was found with a relative error of 5.2304% and a
correlation of 0.9423 in the validation stage, which are acceptable results.
Concerning the swelling degree, the second parameter of interest in
this case study, the minimum validation error obtained with a single neural
network is about 10%. Therefore, a procedure based on stacked neural
networks is used as a solution for improving the model performance. Table
4.7 presents the three neural networks chosen for the stack, along with their
performance in the training and validation phases. The errors are rather
high, but the performance of the individual models is presented in order to
compare it with the performance registered by different stacks. The
stacks obtained and tested here have the same components, but with
different weights, i.e. contributions to the general output.
The second step in our modelling methodology based on stacked
neural networks with optimized weights is represented by successive trials
with different values of the weights.
Table 4.7. The three individual neural networks aggregated into a stack

Individual network | Topology | Training Et% | Training correlation | Validation Ev% | Validation correlation
N1 | MLP(7:10:1)   | 7.95430 | 0.91507 | 10.32700 | 0.85025
N2 | MLP(7:15:1)   | 8.21040 | 0.90845 | 12.94670 | 0.84547
N3 | MLP(7:12:4:1) | 7.56330 | 0.92197 | 9.91280  | 0.87200
No. | w1 (%) | w2 (%) | w3 (%) | Training Et% | Training correlation | Validation Ev% | Validation correlation
1  | 10 | 10 | 80 | 5.26581 | 0.97815 | 8.99005  | 0.94653
2  | 20 | 10 | 70 | 4.51110 | 0.98590 | 10.75674 | 0.92844
3  | 60 | 10 | 30 | 6.36672 | 0.96883 | 19.42466 | 0.84869
4  | 30 | 20 | 50 | 5.10411 | 0.98228 | 9.47487  | 0.94023
5  | 40 | 20 | 40 | 5.33578 | 0.97831 | 11.64715 | 0.92025
6  | 30 | 30 | 40 | 6.63880 | 0.94828 | 8.82913  | 0.94175
7  | 40 | 30 | 30 | 6.92451 | 0.92289 | 9.22556  | 0.93172
8  | 50 | 40 | 10 | 7.59543 | 0.95752 | 9.13066  | 0.90260
9  | 20 | 60 | 20 | 7.97599 | 0.95402 | 9.2432   | 0.92100
10 | 20 | 30 | 50 | 6.74835 | 0.96532 | 8.9050   | 0.9459
11 | 70 | 20 | 10 | 7.9102  | 0.9155  | 10.2290  | 0.8971
12 | 10 | 10 | 80 | 8.2165  | 0.9001  | 10.9921  | 0.8899
output. Only two inputs are necessary because w3 = 100 − w1 − w2. One
network was prepared for the interpolation of the training results and
another one for the validation results. These networks were considered
large enough, MLP(2:24:8:1) and MLP(2:21:7:1) for training and
validation, respectively, because the interpolation capacity is more
important here than the generalization capability. The predictions of these
models were generated with a step of 1% in order to find the optimum
values of the weights, which lead to smaller errors and a better correlation.
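The 1%-step scan over the weight space can be sketched as follows (illustrative; `predict_error` stands in for the surrogate interpolation network, e.g. the MLP(2:24:8:1), which is an assumption of this sketch):

```python
def best_weights(predict_error):
    """Scan w1 and w2 in 1% steps; w3 is determined as 100 - w1 - w2.
    predict_error(w1, w2) plays the role of the surrogate model that
    interpolates the validation error for a given weight pair."""
    best = None
    for w1 in range(0, 101):
        for w2 in range(0, 101 - w1):
            err = predict_error(w1, w2)
            if best is None or err < best[0]:
                best = (err, w1, w2, 100 - w1 - w2)
    return best
```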
Figure 4.12. The influence of the stack weights upon the error in
the validation stage in neural network modelling of swelling degree
The contours in figure 4.12 show the validation error and the
weights w1 and w2 which correspond to this error. One can see a minimum
validation error of approximately 5%. The validation error was chosen for
display, instead of correlation, because this measure is easier to visualize
and understand by the reader. Table 4.9 contains some examples taken from
the contours in figure 4.12.
The smallest validation error is that of stack 10 (4.80111%), but the
largest correlation is that of stack 4 (0.99993). The training error of stack 10
is 4.21283%, which is significantly greater than the training error of stack 4,
which is 0.03968. At the same time, the validation error of stack 4 is
4.91658%, close to the validation error of stack 10, which is 4.80111%.
Table 4.9. Examples of stacks taken from the contours in figure 4.12

Stack | w1 (%) | w2 (%) | w3 (%) | Training Et% | Training correlation | Validation Ev% | Validation correlation
1  | 0  | 20 | 80 | 4.10903 | 0.98903 | 4.9564  | 0.9836
2  | 1  | 41 | 58 | –       | 0.98174 | 4.94792 | 0.99981
3  | 2  | 20 | 78 | 4.13607 | 0.98854 | 5.08107 | 0.97167
4  | 2  | 40 | 58 | 0.03968 | 0.98223 | 4.91658 | 0.99993
5  | 2  | 43 | 55 | –       | 0.97992 | 5.15297 | 0.99989
6  | 3  | 20 | 77 | 4.14946 | 0.98841 | 5.14339 | 0.9672
7  | 3  | 43 | 54 | –       | 0.97962 | 5.19517 | 0.9999
8  | 4  | 42 | 54 | –       | 0.98014 | 5.15579 | 0.9999
9  | 6  | 21 | 73 | 4.1917  | 0.98838 | 5.03165 | 0.97124
10 | 7  | 22 | 71 | 4.21283 | 0.98808 | 4.80111 | 0.97331
11 | 13 | 23 | 64 | 4.32662 | 0.98759 | 5.03536 | 0.97887
12 | 17 | 25 | 58 | 4.47654 | 0.98593 | 4.96954 | 0.9785
13 | 18 | 26 | 56 | 4.54248 | 0.98532 | 4.81774 | 0.98017
14 | 19 | 26 | 55 | 4.58003 | 0.98488 | 4.9975  | 0.97798
15 | 20 | 26 | 54 | 4.62123 | 0.98443 | 5.19274 | 0.97596
16 | 23 | 28 | 49 | 4.85229 | 0.98234 | 5.15541 | 0.97659
17 | 24 | 29 | 47 | 4.96186 | 0.98142 | 5.03862 | 0.97794
18 | 27 | 31 | 42 | 5.27457 | 0.97881 | 5.05065 | 0.97834
19 | 28 | 32 | 40 | 5.41643 | 0.97773 | 4.94861 | 0.97979
20 | 30 | 33 | 37 | 5.62787 | 0.9759  | 5.08703 | 0.97853
21 | 31 | 34 | 35 | 5.78407 | 0.97471 | 4.9974  | 0.97996
22 | 32 | 35 | 33 | 5.94462 | 0.97352 | 4.91409 | 0.98154
23 | 33 | 35 | 32 | 6.00224 | 0.97273 | 5.15105 | 0.97847
24 | 34 | 36 | 30 | 6.16465 | 0.97147 | 5.07881 | 0.97988
25 | 37 | 39 | 24 | 6.64903 | 0.96783 | 4.91092 | 0.98509
26 | 38 | 39 | 23 | 6.70035 | 0.96679 | 5.16718 | 0.98106
27 | 43 | 44 | 13 | 7.39964 | 0.96123 | 5.04476 | 0.99152
Therefore, we can consider that the best weights for the networks
presented in table 4.7, for both training and prediction, are w1 = 2%,
w2 = 40% and w3 = 58%.
For the modelling of the swelling degree, good results were obtained
by aggregating individual networks into stacks and weighting their outputs.
Thus, three MLP networks were used. The main idea was to replace a
possibly large, complex network with a stack formed of simple networks. If
the best individual network that modelled the swelling degree was an
MLP(7:12:4:1) with a validation error of 9.9128% and a correlation of
0.872, the stack formed of three similar MLPs reached a validation error of
4.91658% and a correlation of 0.99993 after the optimization of the weights.
One can draw the conclusion that the performance of neural
modelling for the validation phase can be significantly improved by
aggregating stacks and optimizing the weights of the stack components. The
modelling methodology developed here can be easily adapted and applied
for other complex problems.
Stacked Neural Network Modelling of Heterogeneous Photocatalytic
Decomposition of Triclopyr
The phenomenological treatment of such photochemical systems is very
complex. In general, the rate of reaction in heterogeneous photocatalytic
systems is a complex nonlinear function of catalyst loading, light intensity,
initial solution pH, reactant and oxidant concentrations, etc. Therefore, the
ability of systems such as artificial neural networks to recognize and
reproduce cause-effect relationships through training, for multiple
input-output mappings, has gained popularity in various areas of chemical
engineering and also in the field of photocatalytic treatment of wastewater.
For this case study, the photocatalytic degradation of triclopyr, we
model the final concentration of this compound as a function of the process
conditions. The neural models consider the irradiation time (t), the initial
concentration of triclopyr (C0), the concentration of TiO2 used as a catalyst
(CTiO2) and the concentration of H2O2 (CH2O2) as inputs, and the final
concentration of triclopyr (C) as the output.
First, the data (368 in total) were split into training and validation
data sets, about 15% forming the test data set used to evaluate the
performance of the neural network on data not used in the training process.
In this way, we can evaluate the most important feature of a neural model:
the generalization capability.
Table 4.10. The performance of the trained neural networks in the training stage

MSE | r | Ep%
0.0012   | 0.988230 | 19.2430
0.0004   | 0.998497 | 13.1822
0.000354 | 0.998669 | 9.9729
0.000375 | 0.99859  | 13.5923
0.000756 | 0.997156 | 17.9502
0.00058  | 0.997816 | 11.6835
0.000079 | 0.999702 | 3.8722
0.000106 | 0.999599 | 5.5637
0.00047  | 0.998232 | 10.6775
Table 4.10 contains a series of MLPs with one or two hidden layers
trained with experimental data, and their performance is registered in the
training stage: mean square error MSE, the correlation between experimental
data and the output of the neural network r and the percent error Ep. Only
several examples are presented in table 4.10 from the many neural networks
trained. Taking into account their performance, three neural networks were
selected: MLP(4:15:1), MLP(4:25:20:1) and MLP(4:30:25:1).
The method used to combine the parallel models was the weighted
summation of the individual outputs. Consequently, the performance of the
stack is influenced by the aggregated individual models and their
corresponding weights.
Results and Discussion
Individual and stacked neural networks were applied to the training and
validation datasets in order to compare their performance and, finally, to
choose the most appropriate model for the studied process. Like in the
previous study, in order to find the optimal stacked neural network, separate
neural networks were developed for interpolation. One network was
prepared for the interpolation of the training results and another one for the
validation results. They had two inputs, the weights for N1 and N2 and the
correlation, r, as the output. Only two inputs are necessary because the
third weight is w3 = 100 − w1 − w2. These networks were considered large
enough; the variants tried were MLP(2:24:8:1) and MLP(2:21:7:1), for
training and validation, respectively, because the interpolation capacity is
more important here than the generalization capability. The predictions of
these models were generated with a step of 1% and the maximum
correlation of 0.999045 was obtained with the following contributions of the
individual networks N1, N2 and N3: w1 = 15%, w2 = 52% and w3 = 33%,
respectively. Figure 4.13 shows the variation of the stack performance with
the weights of the component neural networks, for the validation stage in case 1.
Because of the interpolation errors, the above result (the maximum)
is not precise. Additional experiments were performed in the neighbourhood
of the potential maximum in order to improve the solution. The value
0.999048 for correlation corresponds to a stack with weights of 12%, 50%
and 38% (stack 1).
In another trial, three neural networks with one single hidden layer
(the simplest networks with acceptable performances), MLP(4:5:1),
MLP(4:10:1) and MLP(4:15:1) were considered for the stack. The entire
procedure described above was repeated to obtain the weights of the
individual networks in the stack which lead to the best correlation in the
validation phase.
An optimization procedure based on a separate neural network for all
training and validation results, MLP(2:15:5:1), with the weights as inputs
and the correlation as the output, gives the weights 9%, 55% and 36% with
a correlation of 0.996071. Figure 4.14 presents the variation of the
correlation values with the weights of the stack, emphasizing the maximum.
Additional simulations around this optimum point found a correlation of
0.996087 for the following weights: 11%, 53% and 36% (stack 2).
The results of the stacks were better than those of the individual
models. Good predictions were obtained in the validation phase; therefore,
these models give a very good representation of the photocatalytic oxidation
of triclopyr and they can provide useful information for experimental
practice.
Figure 4.13. The variation of the stack performance in the validation stage
with the weights of the component neural networks for stack 1
Figure 4.14. The variation of the stack performance in the validation phase
with the weights of the component neural networks for stack 2
sequences having less than 40% identity with each other. From this set, 386
representatives of the same 27 largest folds stated above were selected. All
PDB-40D proteins that had higher than 35% identity with the proteins of the
training set were excluded from the testing set. 90% of the test proteins have
less than 25% sequence identity with the training proteins (Brenner, Chothia
& Hubbard, 1998). Therefore, we can expect that a prediction accuracy on
the testing dataset above 35% is not due to the overlapping patterns of the
test and training dataset, but to the generalization capability of the
classification model.
Case Studies
The algorithms we applied are well established in machine learning. From
the implementation point of view, the variants described by (Witten & Frank,
2000) and (Leon, 2006) were used.
The first experiments were made on the original 6 datasets presented
above: C, H, P, S, V and Z, each consisting of 313 instances. As stated, these
problems have 27 classes, corresponding to the protein folds.
Two investigations were made for each problem: the first by
evaluating the performance of the algorithms on the training set alone (e.g.
Ctrain), and the second by building the models on the training sets, but
computing the performance on the independent testing datasets (e.g. Ctest).
The testing datasets consist of 385 instances each. The instances in both the
training and the testing sets are described by 20 or 21 attributes.
Figure 4.15. The accuracy (%) of the C4.5, NN, k-NN, NNGE and NB algorithms on the training and testing sets (Ctrain/Ctest through Ztrain/Ztest)
The results are displayed in figure 4.15. One can notice that the best
accuracy on the training sets is provided by the instance-based methods. On
these problems, the decision tree and the Naïve Bayes do not seem to be able
Table 4.11. Best accuracy values for the original test datasets

Dataset  | Ctest  | Htest  | Ptest  | Stest  | Vtest  | Ztest
Accuracy | 45.71% | 35.32% | 33.77% | 39.22% | 34.81% | 32.73%
Figure 4.16. The accuracy (%) of the C4.5, NN, k-NN, NNGE and NB algorithms on the reduced training and testing sets (CRtrain/CRtest through ZRtrain/ZRtest)
Table 4.12. Best accuracy values for the reduced test datasets

Dataset  | CRtest | HRtest | PRtest | SRtest | VRtest | ZRtest
Accuracy | 69.09% | 59.74% | 57.40% | 74.03% | 59.74% | 55.32%
55.32%
The structure of the models themselves can bring more insight into
the nature of the classification problem. An interesting discovery is that for
the 27-class problems the number of NNGE hyper-rectangles, or rules, is
greater than the number of single instances. This means that in these
problems there are more explicit rules, which however have a narrower
scope. Conversely, for the reduced problems, single instances are more
numerous; therefore the rules found have a greater scope, but many training
instances remain outside them, and the testing instances are classified
mostly by the nearest neighbour method.
Fold recognition is a useful structure classification approach,
complementary to the one based on string sequence similarity. The
advantage of a representation with attributes is that many common machine
learning algorithms can be applied to this type of problem without
modification. The purpose of the tests presented in this study is to give an
intuition about the type of classification algorithms that is the most
appropriate for the specific problem of classifying protein folds. The main
result is that instance-based methods, related to the way in which people
classify by analogy or similarity, perform particularly well and also show
good generalization capabilities.
Chapter 5
Figures 5.1 and 5.2. The evolution of the monomer conversion and of the polymerization degrees DPn and DPw in time [min]
- The trial and error method, which consists in successive tests relying
on the development of several configurations of the neural networks
and the evaluation of their performance, e.g. (Piuleac et al., 2010);
- Empirical or statistical methods, which study the influence of
different internal parameters of NNs, choosing their optimal values
depending on the network performance, e.g. (Balestrassi et al., 2009);
- Hybrid methods, such as the fuzzy inference, in which a network can
be interpreted as an adaptive fuzzy system or it can operate on fuzzy
instead of real numbers, e.g. (Attik, Bougrain & Alexandre, 2005);
- Constructive methods and/or pruning algorithms, which add and/or
remove neurons or weights to/from an initial architecture, using a
pre-specified criterion to show the manner in which these changes
affect the network performance, e.g. (Xing & Hu, 2009);
Two case studies will be presented in this section. The first one
(Curteanu, Leon, Furtună, Drăgoi & Curteanu, 2010) shows a comparison
between three methods: trial and error, a real-coded genetic algorithm and
differential evolution. The second one (Drăgoi, Curteanu, Leon, Galaction
& Caşcaval, 2011) compares the standard version of differential evolution
with a self-adaptive variant of differential evolution.
Trial and Error, Genetic Algorithm and Differential Evolution
The free radical polymerization of styrene performed by the suspension
technique is the first case study selected for presentation (Curteanu, Leon,
Furtună, Drăgoi & Curteanu, 2010). A complete mathematical model based
on conservation equations for the elements in the reaction mixture was
elaborated and solved using the distribution moments of the concentrations
(Curteanu, 2003). This model was the simulator for producing the database
for neural network modelling used to predict the changes of monomer
conversion and molecular weight depending on initiator concentration,
temperature and reaction time.
Three methods for determining the neural network topology are
applied and compared using a complex nonlinear process as a case study.
These methods are: a trial and error optimizing methodology applied for NN
parameters (OMP) and two evolutionary strategies based on differential
evolution (DE) and a genetic algorithm (GA).
The steps describing the algorithm for finding the optimum
parameters for a feed-forward neural network are:
1. Finding the optimum number of neurons in the hidden layer for one
hidden layer neural network;
2. Finding the optimum value for the learning rate;
3. Finding the optimum value for the momentum term;
4. Finding the optimum activation function for the output layer;
5. Finding the optimum number of neurons in the hidden layers for a
two hidden layers neural network;
6. Optimizing the parameters for the two hidden layer neural network
following the steps 2-4.
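One step of this one-parameter-at-a-time procedure can be sketched as follows (an illustrative helper; `optimize_step`, `score` and the parameter names are hypothetical, and `score` is assumed to train a network with the given parameters and return its validation error):

```python
def optimize_step(params, name, candidates, score):
    """One step of the trial-and-error procedure: vary a single
    parameter, keep the others fixed, and retain the value that
    yields the lowest error."""
    best_value, best_err = None, float("inf")
    for value in candidates:
        err = score({**params, name: value})
        if err < best_err:
            best_value, best_err = value, err
    params[name] = best_value
    return params
```

Steps 1-6 above then amount to chaining such calls, one per parameter (hidden neurons, learning rate, momentum, activation function, and so on).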
Figure 5.3. The content of a solution vector used in the evolutionary methods
Table 5.1 presents the relative errors obtained by applying the three
methods, for each modelled variable (x, Mn and Mw). The relative errors
were calculated with the following equation:
E_r = |p_desired − p_net| / p_desired · 100 ,   (5.1)
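Equation (5.1) translates directly into code (an illustrative helper):

```python
def relative_error(p_desired, p_net):
    """Relative error (%) between the desired output and the network
    output, as in equation (5.1)."""
    return abs(p_desired - p_net) / abs(p_desired) * 100
```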
Table 5.1. The relative errors and correlations for the modelled variables

Er x | Er Mn | Er Mw | r x | r Mn | r Mw
0.044 | 5.480 | 7.018 | 0.999 | 0.989 | 0.988
0.811 | 2.233 | 6.130 | 0.999 | 0.998 | 0.994
0.787 | 2.437 | 6.385 | 0.999 | 0.993 | 0.979
A comparison between these methods takes into account not only the
error values, but also other factors such as the accessibility of the method
(execution time, algorithm complexity, etc.) or the purpose for which the
model is intended (prediction, monitoring or control).
In terms of accessibility, DE and GA methods, once implemented,
are easy to use because the execution of the software program provides the
optimal network topology, training and testing errors, and predictions over
the training and testing data. However, the runs of the program should be
repeated because of the stochastic nature of the algorithms. Furthermore,
one cannot say precisely that the results of these algorithms are optimal
networks because there is always the possibility of reaching a local
optimum.
The OMP method is more laborious, but the systemized practical
considerations on modelling a neural network (structured in a 6-step
algorithm) and the criterion and formula used for calculating the
Standard DE approach
Table 5.2. Parameters of the two best networks obtained with the standard DE
approach and self-adaptive jDE mechanism
(For each network: the number of neurons, the activation functions — linear, sigmoid or step — and the bias values of the hidden and output layer neurons.)
CS(X′, X″) = |{a″ ∈ X″ | ∃ a′ ∈ X′ : a′ ⪯ a″}| / |X″| ,   (5.2)

where X′ and X″ are two sets of solution vectors. CS maps the ordered pair
(X′, X″) to the interval [0, 1]. CS(X′, X″) = 1 means that all points in X″ are
weakly dominated by the solutions in X′. The opposite, CS(X′, X″) = 0,
represents the situation when none of the solutions in X″ are covered by the
set X′. Since the domination operator is not symmetric, CS(X′, X″) is not
necessarily equal to 1 − CS(X″, X′). Therefore, both CS(X′, X″) and
CS(X″, X′) need to be considered.
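The coverage metric of equation (5.2) can be sketched as follows (illustrative, assuming minimization objectives):

```python
def weakly_dominates(a, b):
    """Minimization: a weakly dominates b if a is no worse than b
    in every objective."""
    return all(x <= y for x, y in zip(a, b))

def coverage(X1, X2):
    """CS(X1, X2): the fraction of solutions in X2 that are weakly
    dominated by at least one solution in X1."""
    covered = sum(1 for b in X2 if any(weakly_dominates(a, b) for a in X1))
    return covered / len(X2)
```

Note that coverage(X1, X2) and coverage(X2, X1) generally differ, which is why both directions are reported.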
The spacing metric was computed as:
(1 / |PF|) · Σ_{i=1}^{|PF|} (d_i − d_m)² ,   (5.3)
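The spacing metric of equation (5.3) can be sketched as follows (illustrative; the distances d_i between neighbouring Pareto-front points are assumed to be precomputed):

```python
def spacing(distances):
    """Spread of the distances d_i around their mean d_m, as in
    equation (5.3); 0 means perfectly even spacing."""
    d_m = sum(distances) / len(distances)
    return sum((d - d_m) ** 2 for d in distances) / len(distances)
```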
Table 5.3. The results of the simulations for optimum popSize and optimum
noGen with CrossoverProbability = 0.9 and MutationProbability = 0.03

Sim. no. | Current popSize | Current noGen | CS(current, best) | CS(best, current) | Best popSize | Best noGen
1  | 10  | 50   | 1     | 1     | 10  | 50
2  | 10  | 100  | 0.2   | 0     | 10  | 50
3  | 10  | 300  | 0.2   | 0     | 10  | 100
4  | 10  | 500  | 0.1   | 0.2   | 10  | 300
5  | 10  | 1000 | 0.1   | 0.2   | 10  | 300
6  | 50  | 50   | 0.2   | 0.2   | 10  | 300
7  | 50  | 100  | 0.2   | 0.46  | 50  | 50
8  | 50  | 300  | 0.36  | 0.24  | 50  | 50
9  | 50  | 500  | 0.32  | 0.28  | 50  | 300
10 | 50  | 1000 | 0.86  | 0     | 50  | 500
11 | 100 | 50   | 0     | 0.88  | 50  | 1000
12 | 100 | 100  | 0.02  | 0.6   | 50  | 1000
13 | 100 | 300  | 0.02  | 0.61  | 50  | 1000
14 | 100 | 500  | 0.08  | 0.3   | 50  | 1000
15 | 100 | 1000 | 0.02  | 0.65  | 50  | 1000
16 | 300 | 50   | 0     | 0.973 | 50  | 1000
17 | 300 | 100  | 0.02  | 0.557 | 50  | 1000
18 | 300 | 300  | 0.58  | 0.237 | 50  | 1000
19 | 300 | 500  | 0.407 | 0.853 | 300 | 300
20 | 300 | 1000 | 0.607 | 0.62  | 300 | 300
21 | 500 | 50   | 0.69  | 0.608 | 300 | 300
22 | 500 | 100  | 0.038 | 0.992 | 500 | 50
23 | 500 | 300  | 0.786 | 0.572 | 500 | 50
24 | 500 | 500  | 0.888 | 0.572 | 500 | 300
25 | 500 | 1000 | 0.24  | 0.93  | 500 | 500
The poor performance of individual neural networks, when they are applied to unseen data, is basically due to the over-fitting or under-fitting of the training data, leading to poor generalization performance.
A combination of two or more neural networks can avoid the
failure of individual component networks caused by a limited training set,
the over-fitting of the noise in the data, or the convergence of the training
to local minima. Another advantage of combining multiple neural networks
is that different networks may perform well in different regions of the
input space, so using them simultaneously increases the prediction accuracy
over the entire input space. Such aggregation methods include stacked
neural networks (Zhang, 2008), neural network ensembles (Nguyen, Abbass &
McKay, 2005) and aggregated neural networks (Mukherjee & Zhang, 2008).
This study (Furtună, Curteanu & Leon, 2012) is based on the
development of an optimized stacked neural network used for modelling the
synthesis of polyacrylamide-based multicomponent hydrogels. This is a
very complex chemical process, and there is no known phenomenological
model that can reproduce the physical and chemical laws that govern it.
Therefore, empirical models that work with input-output data sets, such as
neural networks, are recommended as alternatives for modelling this kind
of process.
The available experimental database consisted of 178 instances,
which were randomized and then divided into 75% for neural network
training (134 instances) and 25% for testing (44 instances).
As in the first case study presented in section 4.3, 7 input variables
were considered for each individual neural network: CM (monomer
concentration), CI (initiator concentration), CA (crosslinking agent
concentration), PI (amount of inclusion polymer), T (temperature), t
(reaction time), and the type of included polymer (Pt), codified as 1 (no
polymer added), 2 (starch), 3 (poly(vinyl alcohol), PVA) and 4 (gelatine).
The outputs of the neural models and, implicitly, of the stacked neural
network were the yield in crosslinked polymer and the swelling degree.
Thus, the neural network modelling established the influence of the
reaction conditions on the reaction yield and the swelling degree.
The Multiobjective Optimization Procedure
The core of the multiobjective optimization procedure is the real-coded
NSGA-II algorithm. The multiobjective optimization problem consists of
finding the best topology and the best weights for a stacked neural network,
which lead to the greatest generalization performance while keeping the
stacked neural network at a minimum size.
$$perf\_index = r - MSE_{train} - MSE_{test} \quad (5.4)$$

where MSEtrain is the mean squared error obtained for the training set,
MSEtest is the mean squared error obtained for the testing set, and r is the
linear correlation coefficient for testing. Since the best value of MSEtrain
and MSEtest is 0 and the best value of r is 1, the ideal value of perf_index
(the performance index of the stacked neural network) is 1. The closer
perf_index is to 1, the better the accuracy and the generalization capacity
of the stacked neural network.
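As a minimal sketch, assuming the additive form of the performance index, perf_index = r - MSEtrain - MSEtest, which is consistent with the values reported in the results tables:

```python
def perf_index(mse_train, mse_test, r):
    # ideal value is 1, reached when both errors are 0 and the correlation r is 1
    return r - mse_train - mse_test

# values from one reported solution: MSEtrain = 0.109, MSEtest = 0.044, r = 0.789
print(round(perf_index(0.109, 0.044, 0.789), 3))  # 0.636
```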
The output of the neural network corresponding to the yield in
crosslinked polymer will be denoted as the first output, and the output
corresponding to the swelling degree as the second output.
The goal of the multiobjective optimization was to maximize the
performance of the stacked neural network, namely minimizing the training
and testing errors and obtaining a testing correlation coefficient of 1, while
minimizing its total number of hidden neurons. The decision variables
optimized with the evolutionary algorithm were: the number of individual
neural networks from the stack, the weights for each output of the individual
neural networks combined in the stack and the number of hidden neurons
for each neural network.
Therefore, the multiobjective function to be maximized in the
optimization procedure was constructed from two fitness functions and had
the following expression:
$$F = (f_1, f_2) = \left( perf\_index, \ \frac{1}{\sum_{k=1}^{NNno} Hn_k} \right) \quad (5.5)$$
where Hnk is the number of neurons in the k-th individual neural network
and wjk is the weight of the j-th output of the k-th neural network from the
stack.
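The role of the weights wjk can be sketched as a weighted-sum aggregation of the individual networks' outputs. `stack_output` is an illustrative helper, not from the study; the assumption that the weights of each output sum to 1 matches the solutions listed in table 5.4:

```python
def stack_output(outputs, weights):
    # outputs[k][j]: j-th output of the k-th network in the stack
    # weights[j][k]: weight w_jk assigned to the j-th output of the k-th network
    return [sum(w * outputs[k][j] for k, w in enumerate(weights[j]))
            for j in range(len(weights))]

# a stack of 2 networks with 2 outputs (yield and swelling degree)
outputs = [[0.8, 0.3], [0.6, 0.5]]
weights = [[0.25, 0.75],   # w11, w12
           [0.40, 0.60]]   # w21, w22
print([round(y, 2) for y in stack_output(outputs, weights)])  # [0.65, 0.42]
```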
The selection of the maximum allowed number of hidden neurons
for every neural network took into account the practical considerations
stated in (Furtun, Curteanu & Cazacu, 2011).
For better search efficiency, the total number of connection
weights in a neural network must be limited:

$$N_w \geq \frac{1}{10} N_t N_o \quad (5.6)$$

$$N_w \leq \frac{1}{2} N_t N_o \quad (5.7)$$

For a network with one hidden layer, the number of weights is:

$$N_w = N_i \cdot Hn + Hn \cdot N_o \quad (5.8)$$

so the number of hidden neurons is bounded by:

$$Hn \leq \frac{N_t N_o}{2 (N_i + N_o)} \quad (5.9)$$

where N_w is the number of connection weights, N_t the number of training
instances, N_i the number of inputs, N_o the number of outputs and Hn the
number of hidden neurons. Consequently, the size of the search space for NSGA-II must be
limited through the complexity of the chromosome structure and the range
of values for its genes. Thus, the decision variables representing the genes
from every chromosome were limited to a certain interval of values. The
number of individual neural networks included in the stack varied from 1 to
5, the weights for each output of the individual neural networks combined in
the stack took values between 0 and 100%, and the number of hidden
neurons for each neural network ranged from 2 to 15. The maximum
number of hidden neurons was established by using equations 5.6, 5.8 and
5.9.
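A sketch of how such a bound can be obtained, assuming one hidden layer (so that Nw = Ni*Hn + Hn*No) and the upper limit Nw <= Nt*No/2; with the case-study values this reproduces the maximum of 15 hidden neurons:

```python
def max_hidden_neurons(n_inputs, n_outputs, n_train):
    # Nw = Ni*Hn + Hn*No and Nw <= Nt*No/2  =>  Hn <= Nt*No / (2*(Ni + No))
    return round(n_train * n_outputs / (2 * (n_inputs + n_outputs)))

# the case study: Ni = 7 inputs, No = 2 outputs, Nt = 134 training instances
print(max_hidden_neurons(7, 2, 134))  # 15
```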
The number of neural networks in the stack and the number of
hidden neurons in each network were limited to the above-mentioned ranges
because of the amount of time and memory required for the convergence of
the evolutionary algorithm in the case of a larger stacked neural network.
These limitations were imposed based on a series of tests performed with
the evolutionary hyper-heuristic. Also, the results of the multiobjective
optimization showed that, usually, 2 or 3 neural networks in the stack
provide a very satisfactory accuracy and a good generalization capacity.
This fact is supported by the results presented in table 5.4.
Table 5.4. The set of optimal non-dominated solutions obtained
with the NSGA-II-QNSNN evolutionary hyper-heuristic
No. NNno w11 w12 w13 w14 w15 w21 w22 w23 w24 w25 Hn1 Hn2 Hn3 Hn4 Hn5
[Table body garbled in extraction: each row listed the number of networks
NNno, the output weights wjk and the hidden neuron counts Hnk of one
non-dominated solution; the weights of each output sum to 1.]
[Results table; the caption and the first column (the first activation
function) were lost in extraction.]

Activation function 2 | MSEtrain | MSEtest | r     | perf_index
Linear   | 0.012 | 0.049 | 0.822 | 0.761
Logistic | 0.052 | 0.028 | 0.457 | 0.377
Tanh     | 0.012 | 0.164 | 0.638 | 0.463
Linear   | 0.117 | 0.050 | 0.227 | 0.060
Logistic | 0.045 | 0.028 | 0.267 | 0.198
Tanh     | 0.137 | 0.093 | 0.036 | -0.194
Linear   | 0.011 | 0.140 | 0.669 | 0.518
Logistic | 0.074 | 0.047 | 0.280 | 0.159
Tanh     | 0.179 | 0.112 | 0.265 | -0.026
[Results table; the caption was lost in extraction.]

No. | MSEtrain | MSEtest | r     | perf_index | Total Hn
1   | 0.109 | 0.044 | 0.789 | 0.636 | 4
2   | 0.105 | 0.037 | 0.821 | 0.679 | 5
3   | 0.084 | 0.027 | 0.875 | 0.764 | 6
4   | 0.087 | 0.027 | 0.881 | 0.767 | 7
5   | 0.080 | 0.026 | 0.882 | 0.776 | 9
6   | 0.072 | 0.024 | 0.892 | 0.796 | 11
7   | 0.047 | 0.027 | 0.876 | 0.802 | 14
8   | 0.047 | 0.023 | 0.895 | 0.825 | 15
9   | 0.024 | 0.016 | 0.930 | 0.89  | 16
10  | 0.027 | 0.015 | 0.936 | 0.894 | 19
11  | 0.021 | 0.015 | 0.934 | 0.898 | 25
12  | 0.019 | 0.011 | 0.951 | 0.921 | 28
13  | 0.018 | 0.011 | 0.951 | 0.922 | 37
14  | 0.020 | 0.010 | 0.955 | 0.925 | 43
15  | 0.017 | 0.010 | 0.954 | 0.927 | 49
16  | 0.016 | 0.012 | 0.956 | 0.928 | 63
In the future, the study can be continued with the investigation of the
influence of system parameters, considered both individually and
collectively. The model can also be used with slight modifications to model
learning behaviour in an e-learning environment, where the agents, i.e. the
learning individuals, can benefit from tutoring and group learning by
solving tasks in a cooperative manner.
We also presented the design of a multiagent system that can display
different types of behaviours, from asymptotically stable to chaotic. In this
case, chaos arises only from the agent interactions, and it is not artificially
introduced through a chaotic map.
Here, as future directions of research, we aim at further analysing the
results of the interactions in order to see whether some probabilistic
predictions can be made, taking into account the system state at a certain
moment. It is important to determine when small perturbations have visible
effects and when they can be controlled. Also, one must investigate whether
classical chaos control techniques used for physical systems, such as the
OGY method (Ott, Grebogi & Yorke, 1990), can be applied to this
multiagent system.
Another fundamental question is whether the chaos in the system is
only transient, eventually stabilising into a steady state, or its behaviour
remains chaotic forever. Out of many experiments, it was observed that
sometimes the system converges to a stable state. In other cases, chaos
does not seem to be only transient: e.g., with 50 agents executing in
lexicographic order, which corresponds to fewer fluctuations, there are still
sudden changes occurring in the utility variation even after 50,000 time
steps. One needs to distinguish between these cases as well.
Also, besides the analysis of the exogenous perturbations, which
showed that very small changes can have a great impact on the evolution of
the system, and the investigation of some methods of controlling such
perturbations in order to reach a desirable final state, we aim at analysing
endogenous perturbations and the effect of alternative decisions on the
evolution of agent utilities. In this respect, we will suggest different methods
of describing the behaviour of the multiagent system.
The planning method with quasi-determined states shows a way to
include supervised, inductive learning into a planning problem. A model of
extracting a training dataset from the Q matrix of a reinforcement learning
algorithm was described. Since the agent does not possess all the necessary
information at any given time, it needs to compute the optimal action. If the
environment is non-deterministic, the agent can learn and change its model.
A predicate representation of the states is not necessary, because the states
are dynamically recognized by means of predictions made on the basis of
Optimization Methods
Quantum-inspired evolutionary algorithms (QIEA) seem to be a promising
direction of research, given that they outperform classical evolutionary
algorithms especially for large optimization problems. The QIEA-SSEHC
algorithm was proposed for solving multi-attribute combinatorial auction
problems, with the following set of characteristics: an evolutionary
hill-climbing phase at the end to fine-tune the solutions, and the use of a
steady-state model instead of a generational model. In order to maintain the genetic
diversity, a repairing procedure was employed, which guarantees that all the
chromosomes of the population are feasible, i.e. satisfy the problem
constraints. A metric for the comparison of multi-objective solution sets was
proposed, based on sampling to estimate the integral over the whole space of utility
functions. Also, the average number of solution vectors in the Pareto front
was used as a diversity metric. In order to apply the rotation gate for the
quantum-inspired crossover, a randomly selected non-dominated solution
vector was chosen as the quantum best.
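A minimal sketch of a rotation-gate update of this kind; the fixed rotation angle and the update policy below are assumptions for illustration, with the quantum best being any selected non-dominated bit string:

```python
import math

def rotate(delta, alpha, beta):
    # 2x2 rotation applied to the amplitudes (alpha, beta) of one qubit
    c, s = math.cos(delta), math.sin(delta)
    return c * alpha - s * beta, s * alpha + c * beta

def update(qubits, observed, quantum_best, delta=0.05 * math.pi):
    # rotate each qubit so that observing the quantum best's bit becomes more likely
    new = []
    for (a, b), x, g in zip(qubits, observed, quantum_best):
        if x == g:
            new.append((a, b))               # already agrees: leave unchanged
        else:
            sign = 1 if g == 1 else -1       # raise or lower the |1> amplitude
            new.append(rotate(sign * delta, a, b))
    return new

# one qubit in equal superposition, rotated towards bit value 1
q = update([(1 / math.sqrt(2), 1 / math.sqrt(2))], [0], [1])
a, b = q[0]
print(round(a * a + b * b, 6), b > a)  # normalization preserved; |1> amplitude increased
```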
Future research will focus on using different variants of quantum
operators, for example different types of mutations instead of the one based
on flipping probability amplitudes, and especially on finding a real quantum
optimization algorithm, which could exploit the state superpositions to
extract the optimum without actually representing all the possible observed
states.
References
[2] Albert R., Jeong H., Barabási A. L. (1999) Diameter of the world-wide web,
Nature, no. 401, pp. 130-131.
[3] Altschul S. F., Madden T. L., Schäffer A. A., Zhang J., Zhang Z., Miller W.,
Lipman D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs, Nucleic Acids Res, 25, pp. 3389-3402.
[6] Atay N., Bayazit B. (2007) Emergent Task Allocation for Mobile Robots,
Proceedings of Robotics: Science and Systems Conference.
[9] Balestrassi P. P., Popova E., Paiva A. P., Marangon Lima J. W. (2009) Design of
experiments on neural network's training for nonlinear time series
forecasting, Neurocomputing, vol. 72, no. 4-6, pp. 1160-1178.
[33] Che Y. K. (1993) Design competition through multidimensional auctions,
RAND Journal of Economics, vol. 24, no. 4, pp. 668-680.
[34] Chen-Ritzo C. H., Harrison T. P., Kwasnica A. M., Thomas D. J. (2005) Better,
Faster, Cheaper: An experimental analysis of a multi-attribute reverse auction
mechanism with restricted information feedback, Management Science, vol.
51, no. 12, pp. 1753-1762.
[35] Chli M., De Wilde P., Goossenaerts J., Abramov V., Szirbik N., Correia L.,
Mariano P., Ribeiro R. (2003) Stability of Multi-Agent Systems, IEEE
International Conference on Systems, Man and Cybernetics, vol. 1, pp. 551-556.
[36] Crăciu M. S., Leon F. (2010) Comparative Study of Multiobjective Genetic
Algorithms, Bulletin of the Polytechnic Institute of Iaşi, Romania, tome LVI
(LX), section Automatic Control and Computer Science, fasc. 1, pp. 35-47.
[37] Coello C. A. C., Lamont G. B., Van Veldhuizen D. A. (2007) Evolutionary
Algorithms for Solving Multi-Objective Problems, in Goldberg, D. E., Koza, J. R.
(Eds.), Genetic and Evolutionary Computation Series, Springer, New York,
pp. 233-282.
[38] Conitzer V., Sandholm T. (2003) AWESOME: A general multiagent learning
algorithm that converges in self-play and learns a best response against
stationary opponents, Proceedings of the 20th International Conference on
Machine Learning, ICML-03, pp. 83-90, Washington, US.
[39] Cornforth D., Green D. G., Newth D. (2005) Ordered asynchronous processes
in multi-agent systems, Physica D (Nonlinear Phenomena), vol. 204, pp. 70-82.
[40] Cortes C., Vapnik V. N. (1995) Support-Vector Networks, Machine Learning,
20, [online] http://www.springerlink.com/content/k238jx04hm87j80g/.
Freund Y., Schapire R. E. (1997) A Decision-Theoretic Generalization of On-Line
Learning and an Application to Boosting, Journal of Computer and
System Sciences, 55(1), pp. 119-139.
[41] Cover T. M., Hart P. E. (1967) Nearest neighbor pattern classification, IEEE
Transactions on Information Theory, vol. 13 (1), pp. 21-27.
[42] Cramton P., Shoham Y., Steinberg R., eds. (2006) Combinatorial Auctions,
MIT Press, Cambridge, MA.
[43] Craven M. W., Mural R. J., Hauser L. J., Uberbacher E. C. (1995) Predicting
protein folding classes without overly relying on homology, ISMB, vol. 3, pp.
98-106.
[44] Crites R., Barto A. G. (1998) Elevator Group Control Using Multiple
Reinforcement Learning Agents, Machine Learning, vol. 33, pp. 235-262.
[45] Cundari T. R., Deng J., Pop H. F., Sârbu C. (2000) Structural analysis of
transition metal beta-X substituent interactions. Toward the use of soft
computing methods for catalyst modeling, J. Chem. Inf. Comput. Sci., 40,
1052-1061.
[60] Dury A., Le Ber F., Chevrier V. (1998) A Reactive Approach for Solving
Constraint Satisfaction Problems: Assigning Land Use to Farming Territories,
Proceedings of Agent Theories, Architectures and Languages, ATAL'98,
Lecture Notes in Artificial Intelligence 1555, Intelligent Agents V, J. P.
Muller, M. P. Singh and A. S. Rao (eds.), Springer-Verlag, pp. 397-412.
[61] Esposito F., Malerba D., Semeraro G., Tamma V. (1999) The Effects of Pruning
Methods on the Predictive Accuracy of Induced Decision Trees, Applied
Stochastic Models in Business and Industry, pp. 277-299.
[62] Fahlman S. E. (1988) Faster-Learning Variations on Back-Propagation: An
Empirical Study, in Proceedings of the 1988 Connectionist Models Summer
School, Morgan-Kaufmann, Los Altos CA.
[63] Fan K., Brabazon A., O'Sullivan C., O'Neill M. (2007) Option Pricing Model
Calibration using a Real-valued Quantum-inspired Evolutionary Algorithm,
Proceedings of the 9th annual conference on Genetic and evolutionary
computation (GECCO '07), London, England, pp. 1983-1990.
[64] Feigenbaum M. J. (1979) The Universal Metric Properties of Nonlinear
Transformations, Journal of Statistical Physics, vol. 21, pp. 669-706.
[65] Ferreira P. R., Bazzan A. L. C. (2006) Swarm-GAP: A Swarm Based
Approximation Algorithm for E-GAP, First International Workshop on Agent
Technology for Disaster Management, pp. 49-55.
[66] Festinger L., Riecken H. W., Schachter S. (1956) When Prophecy Fails: A
Social and Psychological Study of A Modern Group that Predicted the
Destruction of the World, Harper-Torchbooks.
[67] Fischer M. M., Reismann M., Hlavackova-Schindler K. (1999) Parameter
estimation in neural spatial interaction modelling by a derivative free global
optimization method, Proceedings of IV international conference on
geocomputation, Fredericksburg, USA, [online] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.9676&rep=rep1&type=pdf.
[68] Fisher R. A. (1936) The use of multiple measurements in taxonomic problems,
Annual Eugenics, 7, Part II, pp. 179-188.
[69] Fletcher R., Powell M. J. D. (1963) A rapidly convergent descent method for
minimization, The Computer Journal, 6(2), pp. 163-168.
[70] Furtună R., Curteanu S., Cazacu M. (2011) Optimization Methodology Applied
to Feed-Forward Artificial Neural Network Parameters, Int. J. Quantum Chem.
111, pp. 539-553.
[71] Furtună R., Curteanu S., Leon F. (2011) An Elitist Non-Dominated Sorting
Genetic Algorithm Enhanced with a Neural Network Applied to the
Multi-Objective Optimization of a Polysiloxane Synthesis Process, Engineering
Applications of Artificial Intelligence, Elsevier, vol. 24, pp. 772-785.
[72] Furtună R., Curteanu S., Leon F. (2012) Multi-objective optimization of a
stacked neural network using an evolutionary hyper-heuristic, Applied Soft
Computing, Elsevier, vol. 12, issue 1, January 2012, pp. 133-144.
[87] Holm L., Ouzounis C., Sander C., Tuparev G., Vriend G. (2008) FSSP: Families
of Structurally Similar Proteins, [online] ftp://ftp.ebi.ac.uk/pub/databases/fssp.
[88] Hoos H. H., Boutilier C. (2000) Solving combinatorial auctions using
stochastic local search, Proceedings of the Seventeenth National Conference
on Artificial Intelligence (AAAI-2000).
[89] Hota A. R., Pat A. (2011) An Adaptive Quantum-inspired Differential Evolution
Algorithm for 0-1 Knapsack Problem, Computer Science - Neural and
Evolutionary Computing, I.2.8.
[90] Houle J. L., Cadigan W., Henry S., Pinnamaneni A., Lundahl S. (2004)
Database Mining in the Human Genome Initiative, Whitepaper,
Biodatabases.com, Amita Corporation, [online] http://www.biodatabases.com/whitepaper.html.
[91] Hu J., Wellman M. P. (1998) Multiagent reinforcement learning: Theoretical
framework and an algorithm, Proceedings of the 15th International
Conference on Machine Learning, ICML-98, pp. 242-250, Madison, US.
[92] Janacek G. (2001) Practical Time Series, Oxford University Press.
[93] Jennings N. R., Sycara K., Woolridge M. (1998) A Roadmap of Agent Research
and Development, Autonomous Agents and Multi-Agent Systems, vol. 1, pp.
7-38.
[94] Johnson A., Morris P., Muscettola N., Rajan K. (2000) Planning in
Interplanetary Space, Theory and Practice, Proceedings of AIPS.
[95] Jovanovic B., Nyarko Y. (1995) A Bayesian Learning Model Fitted to a Variety
of Empirical Learning Curves, Brookings Papers on Economic Activity, pp.
247-305.
[96] Katerelos I. D., Koulouris A. G. (2004) Is Prediction Possible? Chaotic Behavior
of Multiple Equilibria Regulation Model in Cellular Automata Topology,
Complexity, Wiley Periodicals, vol. 10, no. 1.
[97] Kautz H., Selman B. (1996) Pushing the envelope: Planning, propositional
logic and stochastic search, Proceedings of AAAI.
[98] Kennedy J., Eberhart R. (1995) Particle Swarm Optimization, Proceedings of
IEEE International Conference on Neural Networks IV, pp. 1942-1948,
doi:10.1109/ICNN.1995.488968.
[99] Khanpour S., Movaghar A. (2006) Design and Implementation of Optimal
Winner Determination Algorithm in Combinatorial e-Auctions, World
Academy of Science, Engineering and Technology, vol. 20, 2006.
[100] Khashei M., Bijari M. (2010) An artificial neural network (p, d, q) model for
time series forecasting, Expert Systems with Applications, vol. 37, pp. 479-489.
[101] Kohonen T. (1982) Self-Organized Formation of Topologically Correct
Feature Maps, Biological Cybernetics, vol. 43, no. 1, pp. 59-69.
[114] Leon F. (2012b) A Quantum-Inspired Evolutionary Algorithm for
Multi-Attribute Combinatorial Auctions, Proceedings of 2012 16th International
Conference on System Theory, Control and Computing (ICSTCC 2012),
Sinaia, Romania.
[115] Leon F. (2012c) Real-Valued Quantum-Inspired Evolutionary Algorithm for
Multi-Issue Multi-Lateral Negotiation, Proceedings of 2012 IEEE 8th
International Conference on Intelligent Computer Communication and
Processing (ICCP 2012), pp. 41-48, Cluj-Napoca, Romania.
[116] Leon F. (2013) A Multiagent System Generating Complex Behaviours, in
Costin Bădică, Ngoc Thanh Nguyen, Marius Brezovan (eds.), Computational
Collective Intelligence. Technologies and Applications, Lecture Notes in
Artificial Intelligence, LNAI 8083, 5th International Conference, ICCCI 2013,
Springer-Verlag Berlin Heidelberg, pp. 154-164.
[117] Leon F., Aigntoaiei B. I., Zaharia M. H. (2009) Performance Analysis of
Algorithms for Protein Structure Classification, Proceedings of the 20th
International Workshop on Database and Expert Systems Applications,
DEXA 2009, eds. A. M. Tjoa, R. R. Wagner, IEEE Computer Society,
Conference Publishing Services, pp. 203-207.
[118] Leon F., Curteanu S., Lisa C., Hurduc N. (2007) Machine Learning Methods
Used to Predict the Liquid-Crystalline Behavior of Some Copolyethers,
Molecular Crystals & Liquid Crystals, vol. 469, pp. 1-22, Taylor & Francis
Group, USA.
[119] Leon F., Leca A. D. (2011) Dual Manner of Using Neural Networks in a
Multiagent System to Solve Inductive Learning Problems and to Learn from
Experience, in F. M. T. Brazier, Kees Nieuwenhuis, Gregor Pavlin, Martijn
Warnier, Costin Bădică (eds.), Intelligent Distributed Computing V, Studies
in Computational Intelligence, vol. 382, Proceedings of the 5th International
Symposium on Intelligent Distributed Computing - IDC 2011, Delft, The
Netherlands October 2011, Springer-Verlag Berlin Heidelberg 2011, pp.
81-91.
[120] Leon F., Leca A. D., Atanasiu G. M. (2010) Strategy Management in a
Multiagent System Using Neural Networks for Inductive and Experience-based
Learning, in C. Bratianu, N. A. Pop (eds.), Management & Marketing,
Challenges for Knowledge Society, vol. 5, no. 4, pp. 3-28, Editura Economică,
Bucureşti.
[121] Leon F., Lisa C., Curteanu S. (2010) Prediction of the Liquid Crystalline
Property Using Different Classification Methods, Molecular Crystals and
Liquid Crystals, vol. 518, pp. 129-148.
[122] Leon F., Piuleac C. G., Curteanu S. (2010) Stacked Neural Network Modeling
Applied to the Synthesis of Polyacrylamide Based Multicomponent Hydrogels,
Macromolecular Reaction Engineering, vol. 4, pp. 591-598, WILEY-VCH
Verlag GmbH & Co., Germany.
Intelligence: GEOgraphic Object-Based Image Analysis for the 21st Century,
Calgary.
[138] Mikki S., Kishk A. A. (2006) Quantum Particle Swarm Optimization for
Electromagnetics, IEEE Transactions on Antennas and Propagation, vol. 54,
issue 10, pp. 2764-2775.
[139] Milgram S. (1967) The Small World Problem, Psychology Today, vol. 2, pp.
60-67.
[140] Mishra D., Veeramani D. (2002) A multi-attribute reverse auction for
outsourcing, Proceedings of the 13th International Workshop on Database
and Expert Systems Application, pp. 675-679.
[141] Moore A. (1990) Efficient Memory-Based Learning for Robot Control, PhD
thesis, University of Cambridge.
[142] Mukherjee A., Zhang J. (2008) A reliable multi-objective control strategy for
batch processes based on bootstrap aggregated neural network models, J.
Process Contr. 18, pp. 720-734.
[143] Murzin A. G., Chandonia J. M., Andreeva A., Howorth D., LoConte L., Ailey B.
G., Brenner S. E., Hubbard T. J. P., Chothia C. (2007) SCOP: Structural
Classification of Proteins, [online] http://scop.mrc-lmb.cam.ac.uk/scop.
[144] Naeeni A. F. (2004) Advanced Multi-Agent Fuzzy Reinforcement Learning,
Master Thesis, Computer Science Department, Dalarna University College,
Sweden, [online] http://www2.informatik.hu-berlin.de/~ferdowsi/Thesis/Master%20Thesis.pdf.
[145] Nguyen M. H., Abbass H. A., McKay R. I. (2005) Stopping criteria for ensemble
of evolutionary artificial neural networks, Appl. Soft Comput. 6, pp. 100-107.
[146] Nielsen M. A., Chuang I. L. (2010) Quantum Computation and Quantum
Information, 10th Anniversary Edition, Cambridge University Press.
[147] Oeda S., Ichimura T., Yoshida K. (2004) Immune Multi Agent Neural Network
and Its Application to the Coronary Heart Disease Database, Lecture Notes in
Computer Science, 2004, vol. 3214, pp. 1097-1105.
[148] Orengo C., Cuff A., Sillitoe I., Lewis T., Clegg A. (2008) CATH: Protein
Structure Classification, Institute of Structural and Molecular Biology,
University College London, [online] http://www.cathdb.info.
[149] Ott E., Grebogi C., Yorke J. A. (1990) Controlling Chaos, Phys. Rev. Lett. 64,
2837.
[150] Pagnucco M., Peppas P. (2001) Causality and Minimal Change Demystified,
Proceedings of the Seventeenth International Joint Conference on Artificial
Intelligence (IJCAI'01), Seattle, USA, pp. 125-130.
[151] Pant M., Thangaraj R., Abraham A. (2008) A New Quantum Behaved Particle
Swarm Optimization, Proceedings of the 10th annual conference on Genetic
and evolutionary computation (GECCO '08), Atlanta, GA, USA, pp. 87-94.
[152] Pfeffermann D., Allon J. (1989) Multivariate exponential smoothing: Methods
and practice, International Journal of Forecasting, vol. 5, pp. 83-98.
[165] Rodriguez S., Hilaire V., Koukam A. (2006) Holonic Modeling of Environments
for Situated Multi-agent Systems, Environments for Multi-Agent Systems II,
Second International Workshop, E4MAS 2005, Utrecht, The Netherlands,
Selected Revised and Invited Papers, pp. 18-31.
[166] Rosch, E. (1975) Cognitive Representation of Semantic Categories, J
Experimental Psychology, vol. 104, pp. 192-233.
[167] Rothman D. (2006) Nonlinear Dynamics I: Chaos, [online] http://ocw.mit.edu/courses/earth-atmospheric-and-planetary-sciences/12-006j-nonlinear-dynamics-i-chaos-fall-2006/lecture-notes/lecnotes15.pdf.
[168] Rumelhart D. E., Hinton G. E., Williams R. J. (1986) Learning internal
representations by error propagation, in D. E. Rumelhart, J. L. McClelland
(eds.): Parallel Distributed Processing: Explorations in the Microstructure
of Cognition, vol. 1, pp. 318-362, The MIT Press, Cambridge, MA.
[169] Rummery G. A., Niranjan M. (1994) On-line Q-learning Using Connectionist
Systems, Technical Report CUED/F-INFENG/TR 166, Engineering
Department, Cambridge University.
[170] Russell S. J., Norvig P. (2002) Artificial Intelligence: A Modern Approach,
Prentice Hall, 2nd Edition.
[171] Sandholm T. (2002) Algorithm for optimal winner determination in
combinatorial auctions, Artificial Intelligence, vol. 135, no. 1-2, pp. 1-54.
[172] Sandholm T., Suri S. (2000) Improved algorithms for optimal winner
determination in combinatorial auction and generalization, Proceedings of
the Seventeenth National Conference on Artificial Intelligence (AAAI-2000).
[173] Scerri P., Farinelli A., Okamoto S., Tambe M. (2005) Allocating tasks in
extreme teams, Proceedings of the Fourth International Joint Conference on
Autonomous Agents and Multiagent Systems, pp. 727-734, ACM Press.
[174] Schaffer D. J. (1985) Multiple objective optimization with vector evaluated
genetic algorithms, Proceedings of the International Conference on Genetic
Algorithm and Their Applications, 1985.
[175] Schwind M., Stockheim T., Rothlauf F. (2003) Optimization Heuristics for the
Combinatorial Auction Problem, Working Papers in Information Systems,
University of Mannheim, [online] http://wi. bwl. uni-mainz. de/Dateien/
working_paper_2003_13. pdf.
[176] Shepard R. N., Hovland C. I., Jenkins H. M. (1961) Learning and memorization
of classifications, Psychological Monographs, 75.
[177] Singh S. P., Sutton R. S. (1996) Reinforcement learning with replacing
eligibility traces, Machine Learning, vol. 22(1/2/3), pp. 123-158.
[178] Smith R. G. (1980) The Contract Net Protocol: High-Level Communication and
Control in a Distributed Problem Solver, IEEE Transactions on Computers,
vol. 29, no. 12, pp. 1104-1113.
[179] Solé R. V., Gamarra J. G. P., Ginovart M., López D. (1999) Controlling Chaos in
Ecology: From Deterministic to Individual-based Models, Bulletin of
Mathematical Biology, vol. 61, pp. 1187-1207.
[180] Stepney S., Polack F. A. C., Turner H. R. (2006) Engineering Emergence,
ICECCS'06, Stanford, CA, USA, IEEE, pp. 89-97.
[181] Stone L., He D. (2007) Chaotic oscillations and cycles in multi-trophic
ecological systems, Journal of Theoretical Biology, vol. 248, pp. 382-390.
[182] Storn R., Price K. (1997) Differential evolution - a simple and efficient
heuristic for global optimization over continuous spaces, Journal of Global
Optimization, vol. 11, pp. 341-359.
[198] Vlachogiannis J. G., Lee K. Y. (2008) Quantum-Inspired Evolutionary
Algorithm for Real and Reactive Power Dispatch, IEEE Transactions on Power
Systems, vol. 23, no. 4, pp. 1627-1636.
[199] Voort M. V. D., Dougherty M., Watson S. (1996) Combining Kohonen maps
with ARIMA time series models to forecast traffic flow, Transportation
Research Part C: Emerging Technologies, vol. 4, pp. 307-318.
[200] Wang F. (2002) Self-organising Communities Formed by Middle Agents,
Proceedings of the 1st International Conference on Autonomous Agents and
Multi-Agent Systems, AAMAS'02, pp. 1333-1339.
[201] Wang Y., Zhou J., Mo L., Zhang R., Zhang Y. (2012) A Modified Differential
Real-coded Quantum-inspired Evolutionary Algorithm for Continuous Space
Optimization, Journal of Computational Information Systems, vol. 8, no. 4,
pp. 1487-1495.
[202] Watkins C. J. C. H., (1989) Learning from Delayed Rewards, PhD thesis,
Cambridge University.
[203] Watkins C. J. C. H., Dayan P. (1992) Technical Note: Q-Learning, Machine
Learning, vol. 8, pp. 55-68.
[204] Watts D. J., Strogatz S. H. (1998) Collective dynamics of 'small-world'
networks, Nature, no. 393, p. 440.
[205] Weinberger K. Q., Saul L. K. (2009) Distance Metric Learning for Large
Margin Nearest Neighbor Classification, Journal of Machine Learning
Research, vol. 10, pp. 207-244.
[206] Werbos P. J. (1974) Beyond regression: new tools for prediction and analysis
in the behavioral sciences, PhD Thesis, Harvard University, Cambridge, MA.
[207] Witten I. H., Frank E. (2000) Data Mining: Practical machine learning tools
with Java implementations, Morgan Kaufmann, San Francisco.
[208] Wolf A., Swift J. B., Swinney H. L., Vastano J. A. (1985) Determining Lyapunov
exponents from a time series, Physica D (Nonlinear Phenomena), vol. 16, pp.
285-317.
[209] Xiao J., Yan Y., Lin Y., Yuan L., Zhang J. (2008) A Quantum-inspired Genetic
Algorithm for Data Clustering, Proceedings of IEEE Congress on
Evolutionary Computation (CEC 2008), IEEE World Congress on
Computational Intelligence, pp. 1513-1519.
[210] Xie A., Li Y., Sun W. (2004) Review on the Theory of Multi-Attribute
E-Auction, Asia Pacific Management Review, vol. 9, no. 4, pp. 621-643.
[211] Xing H. J., Hu B. G. (2009) Two-phase construction of multilayer perceptrons
using information theory, IEEE Trans. Neural Networks, vol. 20, no. 4, pp.
715-721.
[212] Zhang J. (2008) Batch-to-batch optimal control of a batch polymerization
process based on stacked neural network models, Chem. Eng. Sci. 63, pp.
1273-1281.