
Applied Soft Computing 55 (2017) 331–351

Contents lists available at ScienceDirect

Applied Soft Computing


journal homepage: www.elsevier.com/locate/asoc

Co-evolutionary multi-population genetic programming for classification in software defect prediction: An empirical case study

Goran Mausa, Tihana Galinac Grbac

Faculty of Engineering, University of Rijeka, Vukovarska 58, 51000 Rijeka, Croatia

Article history:
Received 8 December 2016
Received in revised form 18 January 2017
Accepted 26 January 2017
Available online 3 February 2017

Keywords:
Genetic programming
Classification
Coevolution
Software defect prediction

Abstract

Evolving diverse ensembles using genetic programming has recently been proposed for classification problems with unbalanced data. Population diversity is crucial for evolving effective algorithms. Multilevel selection strategies that involve additional colonization and migration operations have shown better performance in some applications. Therefore, in this paper, we are interested in analysing the performance of evolving diverse ensembles using genetic programming for software defect prediction with unbalanced data by using different selection strategies. We use colonization and migration operators along with three ensemble selection strategies for the multi-objective evolutionary algorithm. We compare the performance of the operators for software defect prediction datasets with varying levels of data imbalance. Moreover, to generalize the results, gain a broader view and understand the underlying effects, we replicated the same experiments on UCI datasets, which are often used in the evolutionary computing community. The use of multilevel selection strategies provides reliable results with relatively fast convergence speeds and outperforms the other evolutionary algorithms that are often used in this research area and investigated in this paper. This paper also presents a promising ensemble strategy based on a simple convex hull approach and at the same time raises the question of whether an ensemble strategy based on the whole population should also be investigated.

© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Software defect prediction (SDP) is an important decision support activity in software quality assurance for large and complex software systems. Its goal is to improve the allocation of testing resources (and consequently, costs) by identifying defect-prone software components in advance. SDP datasets are collections of measurements performed for each software component (file, class, method), and each component is represented by a number of static code attributes (like cyclomatic complexity, lines of code, number of methods, etc.) and the number of defects identified and corrected on that component. Such datasets are used to build models to predict defective components for a subsequent software release and to adjust the development and verification strategy accordingly. Defects in complex software systems usually behave according to the Pareto principle, meaning that the majority of defects (approximately 80%) are concentrated in a small proportion of system components (approximately 20%). It seems that this simple empirical principle is universally valid for all software systems [1]. On the other hand, software defects generally do not follow any particular probability distribution that could provide a mathematical model [2]. A systematic literature survey performed by [3] did not find a widely applicable SDP model despite a number of studies that aimed to find the best performing model, generally speaking. The main reason lies in the very nature of software datasets and their imbalance, complexity and properties that seem to be dependent on the environmental conditions and application domain. Class distributions are highly skewed, which is connected to the class imbalance, a recognized problem in standard machine learning approaches [4]. The problem is related to learning from a class with rare representatives (the so-called minority class) that are not in balance with learning from the class with the rest of the dataset representatives (majority class). This scenario usually results in poor minority-class accuracies and high majority-class accuracies, although the minority class is the main focus of interest in SDP. A number of techniques have been proposed to address unbalanced datasets. These solutions are not equally effective in all environments and application domains. A taxonomy of techniques has been proposed in [5] and categorized as follows: the internal algorithm approach that accounts for the significance of the majority class internally within the algorithm, the external

Corresponding author. E-mail addresses: gmausa@riteh.hr (G. Mausa), tgalinac@riteh.hr (T. Galinac Grbac).
http://dx.doi.org/10.1016/j.asoc.2017.01.050
1568-4946/© 2017 Elsevier B.V. All rights reserved.

data pre-processing approach that artificially balances the dataset, the cost-sensitive approach that introduces different costs of misclassification for the minority and majority classes, and the ensemble of classifiers that is known to increase the accuracy of single classifiers by combining several classifiers. These techniques may be divided into algorithm-level and data-level approaches, depending on the object of their improvement. The algorithm-level approaches usually outperform the data-level approaches [6,7]. One of the algorithm-level approaches is the use of precisely configured multi-objective classification algorithms. A multi-objective framework of classification algorithms offers a range of solutions that may be grouped into ensembles, and that is how it improves the generalization ability [8]. Ensemble solutions divide the problem space into a number of subdomains and search for several local optimum solutions that are more accurate in the relevant subdomain [9]. A number of studies have reported ensemble algorithms as being the most effective, especially when they are combined with resampling or boosting techniques within hybrid approaches [10,5]. Recent findings show that ensemble algorithms based on an evolutionary approach yield the best performance when used as a class imbalance learning method on raw datasets without pre-processing [11,12]. The approach is based on the use of multi-objective genetic programming (MOGP) with the minority and majority class accuracy as competing objectives in the learning process. In this paper, we will further investigate the performance that can be obtained when applying MOGP to unbalanced SDP datasets.

The genes of an evolutionary genetic programme used for classification most often represent input variables' weights and mathematical or logical operations in a tree-based structure. One combination of genes represents a solution, i.e., a classification algorithm that predicts an outcome based on input variables. Evolutionary algorithms (EAs) use fitness functions to simultaneously develop a set of solutions, referred to as the population, in a series of iterations, referred to as generations. Species are populations that have different genotypes (sets of genes) or phenotypes (fitness functions). The evolutionary operators that may be used to evolve the population through the generations change random genes (mutation), mix genes of different solutions (crossover), maintain solutions' diversity and choose the solutions that are to be kept in the following generation (selection). The population may be divided into subpopulations or co-evolutionary species. Such configurations may also use operators for mixing solutions between subpopulations (migration) or different species (colonization). These two operators migrate a proportion of the most fit solutions after a number of generations, which are referred to as epochs. After evolution ends, a subset of diverse and best performing solutions is grouped into ensembles according to the ensemble selection strategy.

The diversity of solutions is important in a multi-objective EA to prevent genetic drift, premature convergence and stagnation in local optima, which are manifested due to selection pressure. For example, if a problem has two equally fit solutions and our EA with a population size of 100 has developed 51 solutions that aspire towards one optimum and 49 solutions that aspire towards the other optimum, we are increasingly likely to select solutions that aspire towards the first optimum with each subsequent generation [13]. A diversity of solutions is encouraged during the evolution of the MOGP algorithms [14,15], and several techniques exist to enhance it further and improve the ability to find multiple optima in the fitness landscape, an area formed by the fitness functions used as objectives. The multiple subpopulation technique isolates groups of solutions [13], the co-evolution technique changes different species [16], while other niching techniques such as crowding, clearing, fitness sharing, and speciation attempt to preserve diversity within one group of solutions [17–19]. Techniques that work within one group of solutions often require estimates or even a priori knowledge of the fitness landscape, assuming that all optima are nearly equidistant or perfectly discriminant, and fail to differentiate between local and global optima on hard problems [17,18].

Because the SDP problem has been shown to be a complex one with unknown fitness landscape properties and a successful prediction model is still being sought, this paper explores the interactions between genetic operators and approaches such as multiple subpopulations and colonization. Moreover, SDP datasets often suffer from high levels of class imbalance, so we also analyse the impact of ensemble selection strategies. However, the use of MOGP for the purpose of classification in SDP is still rather limited. Hence, we pose the following research question:

RQ: Which combination of evolutionary operators and ensemble selection strategies yields the best-performing MOGP configuration used for the classification of unbalanced SDP datasets?

We achieved good results using the standard MOGP algorithms in SDP [20]. In this paper, we explore the potential of an additional migration operator used in multiple subpopulations of a species and a colonization operator that is used in two phenotypically different co-evolutionary species. Furthermore, some studies analyzed the ensemble selection strategies and found improvements in computationally demanding strategies [12]. This paper explores the potential of using the convex hull approach as an ensemble selection strategy. It is an approach that is successfully used in the evolution process [21] and that promotes phenotypic diversity, which has been shown to be better than promoting genotypic diversity [17].

The paper is structured as follows. Section 2 presents the background behind the two migration-based MOGP approaches and the ensemble selection, followed by the contributions of this paper. Section 3 describes the MOGP approaches that we used and Section 4 gives further details about our empirical case study: the data, the algorithm configuration details, and the experiment scheme. Section 5 gives our results, which we discuss in Section 6, and we conclude the paper with Section 7.

2. Background

There are two important factors that influence the performance of MOGP: (i) the evolutionary operators and (ii) the ensemble selection strategy [13]. Some studies suggest that MOGP performance can be improved by using multiple subpopulations [9] and co-evolution [22,23] in general optimization problems, such as the knapsack problem and the optimization of numerical functions. MOGP classifiers using the migration operator were shown to be an effective method for improving the classification accuracy of noisy biomedical data [24]. Multiple NSGA-II subpopulations that were used to search for different regions of the Pareto front demonstrated their usefulness even on a six-objective knapsack problem [9]. The method allows each subpopulation to exploit the search space in the vicinity of its own known high-quality solutions and then interact with other high-quality solutions that may cover a different portion of the search space, to their mutual benefit [25]. The co-evolutionary approach suggests that the result may be improved problem-solving capabilities at lower computational costs [26]. The approach was further enhanced by using the colonization operator [22,23]. The application of co-evolutionary EAs also exhibited both significant improvements and negative results in the ability to solve difficult optimization problems [16]. The co-evolutionary concept was successfully used to divide the feature landscape and solve a high-dimensional classification problem that contains up to 12,600 features in [27]. A co-evolutionary concept inspired by a development learning theory outperformed or performed equally well as six other GA

configurations on six benchmark test functions [28]. They used a standard MOGP algorithm that interacts with a target CoMOGP when solving instances that are categorized according to their difficulty. A rare example of CoMOGP usage in SDP is conducted by Mu et al. [29], in which they used a co-evolutionary algorithm based on a competitive relationship of five releases from the NASA metrics programme. Their results show the approach to be effective when compared with well-known classification algorithms such as the Naive Bayes, Random Forest and Radial Basis Function Network. Such results encouraged us to expand this research. Ensemble selection is the process of choosing the subset of final solutions that will be given the right to vote on which class every instance should belong to. This process can be performed using a second GA, but this is time-consuming and difficult, especially when the number of potential voters is large [30]. Some improvements were made to this process by using flexible size constraints and selection pressure [12]. A traditional majority voting strategy is found to be effective for ensemble generalization, in which an instance is assigned to the class for which at least 50% of ensemble members voted and where all votes have equal weight [12].

2.1. Multiple subpopulation migrations

The EA based on multiple subpopulations that operate in tandem is also known as the island model EA, parallel EA and coarse-grain parallel EA [13]. The essential idea is to compute the EAs in parallel. Each subpopulation is run separately and, after an epoch, a number of solutions migrate between the populations. The msMOGP algorithm is presented in Fig. 1. This algorithm can be adjusted using the following options: the number and size of subpopulations and the migration operator (selection strategy, migration interval, migration fraction and migration direction).

Fig. 1. Multiple subpopulations evolutionary algorithm.

The subpopulation size should be above a certain critical mass level because overly small subpopulations have a statistical disadvantage compared with single populations. Having determined the total population size P according to the usual recommendation of 10 times the number of features [31], the number of subpopulations N is determined by dividing P by the subpopulation size. The migration operator defines the solutions that are to migrate (selection strategy), the frequency with which the migration takes place (migration interval), the number of solutions that migrate (migration fraction) and the migration direction. There are several selection strategies, such as copy-best and pick-from-fittest-half, or random selection, among other options. The choices of migration interval and migration fraction are somewhat linked. Overly frequent migrations make the islands share the same solutions, and exceedingly large intervals lead to a degradation in performance [32,24]. To prevent an overly rapid convergence to the same optimum, a smaller number of solutions should migrate between the subpopulations. There are also several migration directions that can be used. The forward migration takes place from the (n)th subpopulation only into the (n+1)th subpopulation, while the both-ways migration takes place from the (n)th subpopulation into both the (n+1)th and (n−1)th subpopulations. There are a few approaches to determine the migration directions, but they do not seem to influence the results significantly. Martin et al. [25] concluded that a migration fraction of over 25% of the subpopulation size can be disruptive. They also found that quality requirements for the selection of the migrants increase the occurrence of premature stagnation, and that the random selection strategy outperforms other approaches. A general recommendation is a migration fraction of 10% of the subpopulation size and a migration interval of 20 generations [33].

2.2. Co-evolutionary colonizations

The collaborative learning that occurs during the learning phase may improve the effectiveness of various optimization techniques in classification [34]. Co-evolution is a cross-species interaction on a shared fitness landscape that is based on a predator–prey or host–parasite relationship. In a predator–prey relationship, the solutions compete for survival, and in a host–parasite relationship, the solutions work together [16]. The colonization approach based on the predator–prey relationship in [22,23] was to eliminate the solutions with the worst fitness and replace them with the offspring of the fittest solutions. However, it is important to be aware that an overly high level of elitism may lead to genetic drift and that the weakest solutions may contain high-quality genes [35]. The expected and desirable behaviour of CoMOGPs is a continual improvement called an arms race. However, the potential pitfall of CoMOGPs is that they may exhibit the opposite behaviour of endless mediocrity when the co-evolutionary populations are equally unfit [16], for which genetic drift may be the cause. A less elitist selection strategy yields better results [25], and interactions with randomly chosen solutions improve the performance [36] in similar tasks.

The CoMOGP algorithm is presented in Fig. 2. Its general idea is to have at least two species, i.e., separated populations that interact after an epoch of generations. The interaction can be cooperative or competitive, and its output should provide CoMOGPs with additional information. The co-evolutionary EAs generally have no a priori assumptions regarding (1) sharing properties, (2) internal evolutionary clock synchronization or (3) fitness landscape coupling. Furthermore, the interaction operators are not standardized and need to be set [16]. The interaction approach in which a standard MOGP algorithm interacts with a target CoMOGP in a predator–prey relationship yielded good performance in [28].

Fig. 2. Coevolutionary algorithm.
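The forward-migration step described in Section 2.1 can be sketched as follows. This is our own minimal illustration, not the authors' implementation: the ring topology (the last island migrating back into the first), the list-based islands and the `forward_migration` helper are assumptions made for the sake of the example. Migrants are copied rather than removed from their origin island, and the best `fraction` of each island replaces the worst solutions of its successor.

```python
def forward_migration(subpops, fitness, fraction=0.10):
    # Number of migrants per island (at least one).
    k = max(1, int(fraction * len(subpops[0])))
    # Select the k fittest solutions of every island *before* any replacement,
    # so the migrants are drawn from the pre-migration state.
    best_per_island = [sorted(sub, key=fitness, reverse=True)[:k] for sub in subpops]
    # Forward migration: island n sends copies of its best into island n+1,
    # where they replace the k worst solutions (ring topology assumed here).
    for n, migrants in enumerate(best_per_island):
        target = subpops[(n + 1) % len(subpops)]
        target.sort(key=fitness)      # ascending: worst solutions first
        target[:k] = list(migrants)   # copies of the best replace the worst
    return subpops

# Three toy islands whose "solutions" are just their fitness values.
islands = [[1, 5, 9], [2, 4, 8], [3, 6, 7]]
forward_migration(islands, fitness=lambda x: x, fraction=0.34)
print(islands)  # → [[7, 5, 9], [9, 4, 8], [8, 6, 7]]
```

With the general recommendation cited above from [33], `fraction` would be 0.10 and the call would be repeated once per migration interval of 20 generations.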

2.3. Contributions of this paper

The contributions of this paper are a co-evolutionary MOGP approach based on colonization (CoMOGP), a phenotype Convex-Hull-based ensemble selection strategy (CH), and an empirical case study that was performed to analyse these approaches on a number of carefully collected SDP datasets. The CoMOGP is compared with the traditional single-population MOGP (MOGP) and the multiple subpopulation approach (msMOGP). The CH is compared with the traditional Pareto Front ensemble selection (PF) and the strategy that has no selection, i.e., that includes the whole final population in the voting process (Pop). Furthermore, the paper attempts to generalize the results by performing the same experiment in the domain of general classification datasets that are obtained from the UCI repository [37] and that are also used in related studies [11,12].

Our CoMOGP approach adopts the best aspects of the previously mentioned configurations. Its colonization is based on a predator–prey relationship [22,23] with a less elitist selection strategy [25] that creates an interaction between the best solutions and randomly selected solutions [36]. Our ensemble selection approach is also motivated by several findings. It is not computationally demanding like other similar strategies [30], and it should improve the performance for the usually more important minority class in highly unbalanced classification tasks by creating small and extremely diverse ensembles [12].

3. MOGP approaches for classification

Our research question seeks the best-performing combination of evolutionary operators and ensemble selection strategies in MOGP that are used for the classification of unbalanced SDP datasets. The MOGP configurations that we explore are enhanced by migration and colonization operators, concepts that were proven to be useful in improving the performance of the standard single-population MOGP. The ensemble strategy that we propose is designed to improve the classification performance on tasks with a high level of class imbalance.

3.1. MOGP configurations

This paper explores six different MOGP configurations used for classification. We use the following abbreviations in their names: GP1 and GP2 signify the difference in fitness functions, MOGP denotes the traditional single-population MOGP, msMOGP denotes the usage of the multiple subpopulation configuration and CoMOGP denotes the usage of the co-evolutionary configuration based on colonization. The first set of fitness functions is the majority and minority accuracy (TPR and TNR), as was used by Bhowan et al. [11,12], while the second set of fitness functions is the same set that is used to calculate AUC (TPR and FPR), a very important evaluation metric for classifiers whose output is expressed as a probability in a binary classification [38].

3.1.1. GP general settings

All the MOGP configurations we use are based on the NSGA-II algorithm and its implementation in Matlab 2014. NSGA-II is the most frequently used Pareto dominance-based evolutionary multi-objective algorithm in the literature [9]. We use the GP representation given in Eq. (1), where each weight wi is a floating point number in [−1000, 1000] and each oi is a mathematical operation in the set {+, −, *, %}. The protected division % changes the value of the denominator to 1 to avoid division by zero.

y = w0 o1 (w1 m1) o2 (w2 m2) ... on (wn mn)    (1)

The initial population is determined randomly using a uniform distribution within the domain of each gene, as we have no prior knowledge of which gene values could yield better performance. A general recommendation for the population size is 10 times the number of independent variables [31]. However, we use the same population size as [11] because we use the same UCI datasets. Our population size is set to 500. This size satisfies the general recommendation for the SDP datasets because they contain 50 software metrics as independent variables, but it does not satisfy the same recommendation for the UCI datasets, which contain fewer independent variables. We use the tournament selection method of size 7, which randomly selects 7 solutions, of which only the fittest one gets selected. We use the single point crossover function that randomly selects a gene before which all the genes originate from the first parent and after which all the genes originate from the second parent. The mutation probability is set to 10% and the new value is selected using a uniform distribution within the gene's domain. The replacement strategy uses an elitism count of 5% and a crossover fraction of 80%, ensuring that the 5% most elite solutions survive, 80% of solutions in the new generation are the current generation's offspring and the remaining 15% are the mutated children. As for the stopping criteria, there are various recommendations. One recommendation is that GP end the search after 1000V generations, where V is the number of features contained in the dataset, or if the best solution does not change after 100V generations [31]. On the other hand, Bhowan et al. used 100 generations in [11] and only 30 generations in [12]. Considering that our experiment is comparable to the one performed by Bhowan, we stop the GPs after 200 generations or if the generation does not achieve progress in 10 generations. We purposely made the evolution period longer because the migration and colonization operators require an additional number of generations to influence the population's performance. Furthermore, it will allow us to observe the changes in the evolution process more thoroughly.

3.1.2. Single population MOGP

The MOGP pseudo code is given in Algorithm 1. It is a standard EA process that involves the evaluation and ranking of the current population, a crossover of the fittest solutions to create even more fit new solutions, a mutation of randomly chosen genes and a selection process that keeps the population size stable, as explained in the previous subsection. All the following MOGP configurations have these settings in common. Hence, the following pseudo codes use just the Train keyword to express this process.

Algorithm 1. MOGP pseudo code.

3.1.3. Multi subpopulation MOGP

The msMOGP pseudo code is given in Algorithm 2. Its main characteristic is its N separated subpopulations, whose sizes are set so that their total equals the population size P of the other configurations. The subpopulations are trained individually and forward migrations occur after a number of generations, which are defined by the Migration epoch. The migration operator eliminates a defined fraction of the worst solutions in one subpopulation and replaces them with the same fraction of best solutions from

the other subpopulation. The best solutions that migrate are not removed from their origin subpopulation but rather are copied.

Algorithm 2. msMOGP pseudo code.

3.1.4. Co-evolutionary MOGP

The CoMOGP pseudo code is given in Algorithm 3. The CoMOGP receives colonizers but does not send colonizers of its own, to keep the other species intact. Hence, we evolve two GPs: the MOGP2, which is the source of colonizers, and the CoMOGP1, which receives them. The two GPs have different fitness functions to resemble different species. The colonization takes place after the Colonization epoch of generations. At each epoch, a fraction of the best solutions from MOGP2 competes for survival with randomly selected solutions in CoMOGP1. The colonizers replace the same fraction of solutions, but only those that they dominate in at least one fitness function. The comparison of solutions is performed in terms of the host's, i.e. CoMOGP1's, fitness functions.

Algorithm 3. CoMOGP pseudo code.

3.2. Ensemble selection method

Single (non-ensemble) approaches tend to find a single optimum that may be too general a solution for many complex real-world problems, such as SDP. Ensembles usually perform classification through a majority vote of all their solutions. Without ensemble selection, the full Pareto front or a full population set of solutions may be included to vote in the classification task. However, in tasks with high levels of class imbalance, smaller and more diverse ensembles improve the classification performance, particularly for the typically more important minority class [12].

We examine three different ensemble selection strategies: the Pareto Front ensemble, the Convex Hull ensemble proposed by this paper, and the full population ensemble. The Pareto Front ensemble strategy (PF) gives the right to vote to all the solutions that belong to the Pareto front of non-dominated solutions [39]. It is a traditional and often-used approach [11]. The full population ensemble (Pop) is actually a strategy that does not involve selection. Instead, it gives the right to vote to every solution in the final population. It is used as a sanity check that should prove whether the selection improves the classification performance. The convex hull of a set X of points in the Euclidean plane is the smallest convex polygon that contains all the points of X. The Convex Hull ensemble selection (CH) is the strategy that takes the subset of the Pareto front members that maximizes the area under the Pareto front when calculated using the trapezoid integral approximation. In other words, the Convex Hull tries to cover the optimal points for a given set of classifiers [40]. To the best of our knowledge, this is the first time a CH ensemble selection strategy has been used.

3.3. Evaluation

In this paper, we use three evaluation metrics to evaluate the performance of MOGPs, as follows: the hyperarea (Hyp) of an evolved set of solutions, the geometric mean accuracy (GM) of the ensemble classifier and the area under the ROC curve (AUC). GM is the indicator of ensemble classification performance when solving the classification task. Hyp represents how well all the solutions cover the fitness landscape, i.e., their classification potential [11,12]. AUC is used to depict the tradeoff between benefit and cost [41,7]. Hyp is the area under the Pareto front in the objective-space (TPR, TNR). The Hyp area is calculated using the trapezoidal integral approximation; this approximation method can obtain a value within the range [0,1], and higher results represent a better set of solutions [11]. The GP algorithm indirectly tends to maximize the Hyp value in the training phase, but we calculate the Hyp value in the testing phase. Some evolutionary algorithms, such as the SMS-EMOA, are even based on Hyp (i.e., hypervolume when having more than 2 objectives) and maximize it directly during the evolution [9]. GM is the geometric mean of the majority accuracy (TPR) and the minority accuracy (TNR). It evaluates one ensemble of solutions that can contain the whole population, the Pareto front, or the Convex Hull front when used on a testing set. GM can obtain values within the range [0,1], with higher results being better, and it is more sensitive to class imbalance than the arithmetic mean accuracy [12]. On the other hand, traditional accuracy is not a good performance indicator in situations of class imbalance because it favours the majority class [42,43]. The GP1 configurations are trained and evaluated in the same (TPR, TNR) objective-space, which gives them an advantage over the GP2 configurations that are trained in a different (TPR, FPR) objective-space. Thus, to present a fair comparison for the GP2 configurations, we named the GM as GM1 and we introduce a similar measure, the GM2, as the geometric mean between TPR and 1−FPR. Both TPR and 1−FPR are objectives that need to be maximized, so the GM2 can also obtain values within the range [0,1], with higher results being better. Every ensemble yields a probability value for each instance in the testing sample. Taking a different probability threshold, the final classification output may have a different benefit (TPR) and cost (FPR) ratio. AUC is the single scalar value that indicates the trade-off between that benefit and cost in the receiver operating characteristic (ROC) curve [44]. Unlike Hyp, AUC is calculated as the portion of the area of the unit square under the curve (TPR, FPR) that is obtained for each probability threshold. The worst scenarios are (0,0) and (1,1), and they are the starting and ending points of the ROC [41]. Hence, AUC usually obtains values within the range [0.5, 1], where 0.5 is equal to pure guessing and 1 is ideal classification. Moreover, it is important to observe the misclassification errors

present in both classes. Related research in [12] noted a genetic drift towards the majority class accuracy for the UCI datasets. More solutions with a strong majority class bias are present in the Pareto front as the evolution progresses over generations. This resulted in a higher TPR than TNR in the final generation for each task they analyzed, and this is the reason we will also perform an analysis of the evolution of TPR and TNR for the SDP datasets in a two-dimensional graph. Although this paper does not analyse the misclassification error cost, it is an important aspect of performance in SDP. The cost of not revealing a defect is certainly different from the cost of testing a software module that does not contain a defect. Practitioners may benefit if they could choose the ratio of TPR and TNR that suits their needs.

4. Case study

In this section, we detail the experimental setup, i.e., the data, the main settings of the algorithms and the validation and evaluation criteria used in our empirical study. The experiments were carried out using Matlab R2014a. The experimental environment was an Intel i3-540 @ 3.07 GHz CPU for the UCI datasets and 2 Intel Pentium

Four benchmark datasets from the UCI Machine Learning Repository were also used: classification of radar returns from the ionosphere (Ion), data on cardiac Single Proton Emission Computed Tomography (Spt), predicting the cellular localization sites of proteins (Yst) and the balance scale weight and distance database (Bal). These datasets are publicly available and used in many other studies that inspected the classification of unbalanced data. They were chosen because they were also used in the studies by Bhowan et al. [11,12] that motivated this research. Table 1 presents the quantity of data (the number of files for the SDP datasets and the number of instances for the UCI datasets), the distribution of the two classes, and the number and types of features. For the SDP datasets that we collected, we also report the quality of the SDP data, i.e., the linking rate (LR) and the number of bugs that were reported for each release. LR is the percentage of fixed bugs that are present in the bug tracking repository and were successfully linked, using our technique, with commits in the source code management repository, i.e., the files they affected [46]. For the UCI datasets, we describe the meaning of the two classes they contain.

4.2. Algorithm configuration
E6300 @ 2.80 GHz CPU for the SDP datasets. The idea of dividing the global population into multiple smaller
subpopulations and the idea of co-evolution have the goal of
4.1. Data enhancing solution diversity and consequently improving the over-
all performance of MOGP. By combining these two approaches, we
Seven subsequent releases of the Eclipse project Plug-in Devel- propose a co-evolutionary GP algorithm based on colonization to
opment Environment (PDE) and Java Development Tools (JDT) are evolve MOGP ensembles.
used in this study: 2.0, 2.1, 3.0, 3.1, 3.2, 3.3 and 3.4.1 We also used We introduced the two following co-evolutionary operators:
the publicly available datasets from the Apache Hadoop project.2 survival of the tter solution after random encounters (survival
Eclipse PDE3 provides a comprehensive set of tools to create, operator) and colonization of the ttest solutions (colonization
develop, test, debug, build and deploy Eclipse plug-ins, fragments, operator) between CoMOGP and MOGP of different tness func-
features, update sites and Rich Client Platform products. Eclipse tions. The evolution process is divided into epochs during which
JDT4 provides tool plug-ins that support the development of any different MOGPs are trained. The MOGP is evolved without any
Java application. It adds a Java perspective to the Eclipse Workbench inuence, the msMOGP is evolved with internal migrations after
and the number of views, editors, wizards, builders, code merging a number of generations, and the CoMOGP is evolved with external
and refactoring tools. Apache Hadoop is an open-source software colonization from the MOGP after each epoch. The external colo-
framework for distributed storage processing of large datasets. nization at the end of each epoch is the approach that uses the two
The data were collected using the BuCo Analyzer tool [45], devel- co-evolutionary operators.
oped for the purpose of systematic data collection in SDP research
[46]. It includes the Regular Expression bug-code linking technique 4.3. Experiment setup
that has been shown to be superior to other matching techniques
and some other complex linking techniques that use predictive The six MOGP congurations described in Section 3.1 are used
models on the Eclipse data [47], and it takes into account the most with each dataset presented in Section 4.1. We used the 10 times
important data collection issues [48]. The tool should help prac- repeated 10-fold cross-validation for sampling the data into train-
titioners overcome problems such as fault prediction with no or ing and testing dataset. At the same time, we maintained the class
limited fault data and provide them with a platform for uniform distribution from the original dataset. After executing the evolu-
data collection with minimum noise [49]. This type of platform tion process using the training sets, the MOGPs are evaluated in
is a response to the need for automated tools for the SDP prob- terms of Hyp and GM using the testing sets. Each step of the evo-
lem and the need to obtain more general and widely applicable lutionary process is stored so that the changes in performance can
ndings [50]. The JHawk5 and LOC6 tools are used to extract the be observed.
software metrics. A complete list of metrics is given in Table A.15. Table 2 presents the main differences in the MOGP congura-
Each dataset consists of 50 independent software product metrics tions we used, and Fig. 3 presents their synchronized evolution.
and 1 dependent variable, which is transformed into a binary value Epoch length in Fig. 3 is set to CoMOGPs epoch time of 40 gen-
class variable using the typical threshold of 1 bug. The data collec- erations. CoMOGPs epoch is twice as long as msMOGPs epoch, a
tion procedures that was used for Hadoop datasets has compatible ratio similar to the ratio of the number of solutions that migrate in
settings regarding the choice of bugs and the bug-le duplicated these two approaches. In the case of msMOGP, 20 solutions migrate
links [51]. into every subpopulation, and in the case of CoMOGP, 50 solutions
migrate into the population. Furthermore, a longer period should
give enough time for the algorithm to evolve the solutions that take
the best genes from both species.
1
http://www.seiplab.riteh.uniri.hr/%3Fpage id=834 The analysis is continued with statistical hypothesis tests.
2
www.cs.ucl.ac.uk/staff/F.Sarro/projects/hadoop/ Firstly, we plan to use the ShapiroWilk normality test for each
3
http://eclipse.org/pde/
4
http://www.eclipse.org/jdt/
group of results, obtained after 10 iterations of randomly split train-
5
http://www.virtualmachinery.com/ ing and testing sets. The descriptive statistics will contain mean and
6
http://www.locmetrics.com/ standard deviations if all the results are normally distributed, and
Table 1
Datasets' class distribution.

SDP datasets

Release LR Bugs Files Non-faulty:faulty Features

PDE 2.0 22.3% 561 576 80.7%:19.3% 50 ratio
PDE 2.1 27.4% 427 761 83.7%:16.3% 50 ratio
PDE 3.0 33.6% 1041 881 68.8%:31.2% 50 ratio
PDE 3.1 51.6% 769 1108 67.8%:32.2% 50 ratio
PDE 3.2 69.2% 546 1351 53.8%:46.2% 50 ratio
PDE 3.3 85.3% 727 1713 56.3%:43.7% 50 ratio
PDE 3.4 80.9% 963 2144 71.5%:28.5% 50 ratio

JDT 2.0 48.4% 4276 2397 54.1%:45.9% 50 ratio
JDT 2.1 64.4% 1875 2743 68.1%:31.9% 50 ratio
JDT 3.0 70.9% 3385 3420 61.4%:38.6% 50 ratio
JDT 3.1 80.6% 2653 3883 67.3%:32.7% 50 ratio
JDT 3.2 84.9% 1879 2233 63.5%:36.5% 50 ratio
JDT 3.3 88.7% 1341 4821 76.2%:23.8% 50 ratio
JDT 3.4 90.0% 989 4932 81.5%:18.5% 50 ratio

Hadoop 0.1 Unknown 142 64.8%:35.2% 8 ratio
Hadoop 0.2 Unknown 192 78.1%:21.9% 8 ratio
Hadoop 0.3 Unknown 212 75.0%:25.0% 8 ratio
Hadoop 0.4 Unknown 202 79.2%:20.8% 8 ratio
Hadoop 0.5 Unknown 218 83.0%:17.0% 8 ratio
Hadoop 0.6 Unknown 235 86.8%:13.2% 8 ratio
Hadoop 0.7 Unknown 251 80.9%:19.1% 8 ratio
Hadoop 0.8 Unknown 241 93.4%:6.6% 8 ratio

UCI datasets

Name Classes Instances Non-faulty:faulty Features

Ion Good/bad 351 64.2%:35.8% 34 ratio
Spt Abnormal/normal 267 79.4%:20.6% 22 nominal
Yst Mit/nontarget 1484 83.5%:16.5% 8 nominal, ratio
Bal Balanced/unbalanced 625 92.2%:7.8% 4 interval
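The Non-faulty:faulty columns above follow from the labelling step described in Section 4.1: a file is labelled faulty when at least 1 bug is linked to it. A minimal sketch of that binarization, using illustrative data rather than the actual BuCo Analyzer output:

```python
# Sketch of the Section 4.1 labelling step: per-file bug counts are
# binarized with the usual threshold of >= 1 bug, and the resulting
# class distribution is summarized as in Table 1.
# The variable names here are illustrative, not from the actual tool.

def class_distribution(bug_counts):
    """Binarize per-file bug counts; return (non-faulty %, faulty %)."""
    labels = [1 if bugs >= 1 else 0 for bugs in bug_counts]
    faulty = sum(labels) / len(labels) * 100.0
    return round(100.0 - faulty, 1), round(faulty, 1)

# Toy example: 8 files, 2 of them with at least one linked bug.
print(class_distribution([0, 0, 3, 0, 1, 0, 0, 0]))  # -> (75.0, 25.0)
```
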

the median otherwise. Secondly, we plan to investigate whether there are significant differences between the groups of obtained results. If the groups of results come from normally distributed populations, we will use the T-test, and if not, we will use the Kruskal-Wallis analysis of variance [52,12]. The Kruskal-Wallis analysis of variance is an extension of the Wilcoxon rank-sum test used to test more than two groups of results. Thirdly, we plan to use the multiple comparison test to reveal the particular pairs of groups that are significantly different. When performing the multiple comparison test, we use the Bonferroni correction (overall α = 0.05), i.e., an adjustment that compensates for multiple comparisons.

The Kruskal-Wallis statistical hypothesis and multiple comparison tests assess whether there is a significant difference between the groups of results when performing the following pairwise comparisons (pc):

- pc1: between results obtained from different MOGP configurations that are trained and evaluated using the same dataset.
- pc2: between results obtained from different ensemble selection strategies that are used for the same MOGP configuration.

We summarize the results according to the win-tie-loss procedure [53]. The number of wins is incremented for a given option if, in a pairwise comparison, its mean value is greater and the hypothesis that the two groups of results come from the same population is rejected. The number of losses is incremented for the option that was compared against the winning option. The number of ties is incremented if the results come from the same population, i.e., if the hypothesis is not rejected. Because the number of ties can be computed by subtracting the number of wins and losses from the total number of comparisons, there is no need to report them. These comparisons should enable us to answer our RQs. We plan to perform these analyses separately for the SDP (PDE, JDT and Hadoop) and UCI datasets to reveal any inconsistency and to answer RQ3. We hope to find consistently different behaviour, giving a positive answer, or somewhat consistent results, giving a negative answer. Furthermore, we also plan to compare the evolution characteristics between the different data sources, such as the evolution duration, expressed in terms of generations, which is required for a given MOGP configuration to evolve to stability. Such an analysis could reveal whether there are differences or a common behaviour in the evolution of
Table 2
The MOGP configurations used in our experiment.

Configuration: MOGP1 MOGP2 msMOGP1 msMOGP2 CoMOGP1 CoMOGP2

Fitness function TPR, TNR AUC TPR, TNR AUC TPR, TNR AUC
Population size P = 500 N = 5 × 100 P = 500
Max gen 200
Migration epoch After 20 generations
Migration fraction 20% Pareto Front
Colonization epoch After 40 generations
Colonization fraction 10% Pareto Front
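Table 2's epoch settings can be read as a simple schedule over the 200 generations: msMOGP triggers an internal migration every 20 generations, while CoMOGP receives an external colonization every 40. A sketch of that timing only (the operator bodies are omitted; the epoch lengths are taken from Table 2):

```python
# Epoch boundaries per Table 2: migration every 20 generations for
# msMOGP, colonization every 40 generations for CoMOGP, max gen 200.

def epoch_boundaries(max_gen, epoch_len):
    """Generations at which an epoch ends and its operator is applied."""
    return list(range(epoch_len, max_gen + 1, epoch_len))

migrations = epoch_boundaries(200, 20)     # msMOGP: internal migration
colonizations = epoch_boundaries(200, 40)  # CoMOGP: external colonization

# CoMOGP's epoch is twice as long, so it has half as many boundaries.
print(len(migrations), len(colonizations))  # -> 10 5
```
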
Fig. 3. The evolution of MOGP algorithms in our experiment.
long-lasting and large software products, such as Eclipse PDE and JDT and Apache Hadoop.

5. Results

Tables 3-6 show the median values of Hyperarea, GM1 and AUC that the three ensemble selection strategies obtained for the SDP and UCI datasets using 10 times repeated 10-fold cross-validation. To make the tables easier to read, we highlighted the best performing configurations for each dataset in terms of GM and AUC. According to the Shapiro-Wilk normality test, the groups of results were not normally distributed. Our first experiment was performed using the sampling strategy that splits the datasets randomly into 10 training and testing datasets in a 2/3:1/3 ratio. Tables C.17-C.19 in the Appendix show the min (median) max values of Hyperarea and GM1 for the three ensemble selection strategies and the mean (μ) and standard deviation (σ) of the training time, expressed in seconds, obtained in the first experiment. We also performed a win-tie-loss calculation between all MOGP configurations for each of the evaluation metrics separately. The GP1 configurations were compared against each other in their (TPR, TNR) domain and the GP2 configurations were compared in their (TPR, 1-FPR) domain. The GP1 and GP2 configurations were not compared against each other in one domain because each domain would favour its own configurations. The following analyses therefore evaluated the MOGP configurations in their training domain (GP1 evaluated in GM1, and GP2 evaluated in GM2). The measures of dispersion were left out of Tables 3-6 to make them easier to read. However, the range of these values, given in the Appendix, shows that all algorithms yield stable performance with limited variations. The only exception is the Bal dataset from the UCI repository, which exhibits less stable classification results. The AUC values for the Bal dataset are close to 0.5, which confirms that classification is difficult for that dataset. All the classification configurations exhibit decent (AUC > 0.6), good (AUC > 0.7) or very good (AUC > 0.8) performance for the remaining datasets.

The results show that all the MOGPs evolve populations with high values of Hyp. We compared the Hyp values we obtained for the UCI datasets with those obtained by Bhowan et al. [11] and noticed that they are similar. The highest average values of GM we obtained are even better than those obtained by Bhowan et al. [12] for the Ion and Spt datasets. On the other hand, their results are better for Bal and Yst (their equivalent is named Yst1). The Bal dataset exhibits the worst performance in both Hyp and GM compared to all the other datasets we analysed. The reason might be its very small number of only 4 independent features.

We performed the pc1 analysis, in which the GP1 configurations (MOGP1, msMOGP1 and CoMOGP1) are evaluated in terms of Hyp and GM1 and the GP2 configurations (MOGP2, msMOGP2 and CoMOGP2) are evaluated in terms of Hyp, GM2 and AUC. The results of this analysis are given in Tables 7-10 for the respective datasets.
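The win-tie-loss book-keeping described above, following [53], can be sketched as below. The `significantly_different` parameter stands in for the Kruskal-Wallis and multiple-comparison machinery and is an assumption of this sketch, as are the option names:

```python
# Win-tie-loss tally: a win is counted when an option's mean is greater
# AND the null hypothesis (same population) is rejected; the opposing
# option takes a loss; otherwise both options record a tie.

from itertools import combinations

def win_tie_loss(results, significantly_different):
    mean = lambda xs: sum(xs) / len(xs)
    score = {name: {"win": 0, "loss": 0, "tie": 0} for name in results}
    for a, b in combinations(results, 2):
        if significantly_different(results[a], results[b]):
            hi, lo = (a, b) if mean(results[a]) > mean(results[b]) else (b, a)
            score[hi]["win"] += 1
            score[lo]["loss"] += 1
        else:
            score[a]["tie"] += 1
            score[b]["tie"] += 1
    return score

# Toy usage with a naive "test": reject when means differ by > 0.05.
naive = lambda x, y: abs(sum(x) / len(x) - sum(y) / len(y)) > 0.05
s = win_tie_loss({"MOGP": [0.70, 0.72], "CoMOGP": [0.80, 0.82]}, naive)
print(s["CoMOGP"]["win"], s["MOGP"]["loss"])  # -> 1 1
```
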
Table 3
The median values of Hyperarea, GM1 and AUC for PDE datasets.

Task Conf. Hyperarea GM Pop GM PF GM CH AUC Pop AUC PF AUC CH

PDE 2.0 MOGP1 0.89 0.79 0.81 0.80 0.78 0.74 0.80
MOGP2 0.90 0.77 0.70 0.78 0.77 0.80 0.77
msMOGP1 0.89 0.80 0.83 0.81 0.80 0.79 0.80
msMOGP2 0.89 0.75 0.73 0.76 0.74 0.75 0.75
CoMOGP1 0.88 0.79 0.82 0.81 0.79 0.79 0.81
CoMOGP2 0.90 0.77 0.81 0.79 0.77 0.80 0.79

PDE 2.1 MOGP1 0.84 0.76 0.76 0.76 0.73 0.72 0.75
MOGP2 0.85 0.73 0.76 0.75 0.72 0.75 0.75
msMOGP1 0.84 0.76 0.76 0.76 0.77 0.77 0.75
msMOGP2 0.85 0.70 0.78 0.70 0.70 0.71 0.69
CoMOGP1 0.84 0.76 0.76 0.78 0.76 0.74 0.75
CoMOGP2 0.85 0.72 0.74 0.77 0.72 0.75 0.77

PDE 3.0 MOGP1 0.86 0.74 0.74 0.74 0.74 0.73 0.72
MOGP2 0.85 0.73 0.76 0.72 0.69 0.71 0.69
msMOGP1 0.82 0.70 0.73 0.74 0.72 0.72 0.69
msMOGP2 0.81 0.72 0.70 0.71 0.65 0.68 0.60
CoMOGP1 0.84 0.75 0.74 0.75 0.74 0.74 0.74
CoMOGP2 0.83 0.70 0.73 0.74 0.70 0.73 0.71

PDE 3.1 MOGP1 0.86 0.74 0.73 0.77 0.72 0.72 0.71
MOGP2 0.87 0.70 0.70 0.74 0.70 0.71 0.68
msMOGP1 0.84 0.71 0.71 0.75 0.71 0.71 0.72
msMOGP2 0.83 0.69 0.69 0.70 0.65 0.68 0.58
CoMOGP1 0.88 0.72 0.73 0.75 0.73 0.73 0.73
CoMOGP2 0.87 0.69 0.71 0.72 0.69 0.70 0.70

PDE 3.2 MOGP1 0.84 0.72 0.73 0.72 0.72 0.72 0.72
MOGP2 0.85 0.70 0.71 0.61 0.70 0.71 0.62
msMOGP1 0.86 0.70 0.70 0.73 0.72 0.72 0.73
msMOGP2 0.87 0.66 0.70 0.63 0.66 0.69 0.62
CoMOGP1 0.85 0.71 0.73 0.76 0.73 0.74 0.73
CoMOGP2 0.86 0.71 0.71 0.67 0.70 0.71 0.66

PDE 3.3 MOGP1 0.83 0.73 0.75 0.75 0.73 0.73 0.73
MOGP2 0.82 0.70 0.71 0.61 0.69 0.70 0.62
msMOGP1 0.84 0.68 0.72 0.76 0.71 0.71 0.70
msMOGP2 0.83 0.69 0.71 0.61 0.67 0.70 0.62
CoMOGP1 0.83 0.73 0.74 0.77 0.73 0.73 0.74
CoMOGP2 0.84 0.70 0.71 0.63 0.70 0.71 0.64

PDE 3.4 MOGP1 0.79 0.70 0.70 0.71 0.69 0.70 0.69
MOGP2 0.79 0.66 0.69 0.59 0.66 0.67 0.58
msMOGP1 0.80 0.68 0.68 0.72 0.68 0.68 0.68
msMOGP2 0.81 0.64 0.41 0.60 0.65 0.68 0.59
CoMOGP1 0.78 0.71 0.70 0.73 0.69 0.69 0.69
CoMOGP2 0.79 0.67 0.68 0.62 0.67 0.68 0.62
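The GM columns in Tables 3-6 can be read as geometric means of the two training objectives, with GM1 pairing TPR with TNR and GM2 pairing TPR with 1-FPR, both in [0, 1]. This reading follows the objective definitions above and the related work [12]; treat it as a sketch rather than the paper's exact code:

```python
# Geometric-mean performance measures over the two objectives.

import math

def gm1(tpr, tnr):
    """Geometric mean of minority (TPR) and majority (TNR) accuracy."""
    return math.sqrt(tpr * tnr)

def gm2(tpr, fpr):
    """Geometric mean of TPR and 1 - FPR."""
    return math.sqrt(tpr * (1.0 - fpr))

print(round(gm1(0.9, 0.64), 2))  # -> 0.76
print(round(gm2(0.9, 0.36), 2))  # -> 0.76
```
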

The results in Tables 7-9 are based on the 7 SDP datasets of each project, and the results in Table 10 on the 4 UCI datasets. Therefore, a MOGP configuration can obtain a maximum of 14 wins and losses for one evaluation metric over all SDP datasets, and 8 wins and losses for one evaluation metric over all UCI datasets. The total sum of wins and losses is given in the final row. The number of wins indicates that CoMOGP is the best performing configuration for the SDP datasets from the Eclipse community, achieving 16 wins and only 1 loss for the PDE releases and 38 wins and 2 losses for the JDT releases. The msMOGP configuration exhibited the best performance for the UCI datasets, achieving 10 wins and 8 losses, and the worst performance for all SDP datasets, achieving 22 losses for PDE, 61 losses for JDT and 36 losses for the Hadoop datasets. The MOGP configuration achieved the highest number of wins for the Hadoop datasets, and it was the second best performing configuration on all SDP datasets. It is interesting to notice that, with the exception of the JDT releases, no algorithm performed significantly differently from the others in terms of Hyp. This is the first indication that Hyp may not be the most reliable or the most indicative evaluation metric of a classifier's performance.

Tables 11-14 show the numbers of wins and losses after performing the pc2 analysis. The numbers of wins and losses are calculated for each ensemble selection strategy using the same MOGP configuration and the same datasets over 10 runs. As in the previous analysis, the GP1 configurations are evaluated in terms of GM1, and the GP2 configurations are evaluated in terms of GM2. This analysis does not include an evaluation in terms of Hyp because Hyp does not depend on the ensemble strategy. The results given in Tables 11-13 are based on the 7 SDP datasets of each project, and the results given in Table 14 on the 4 UCI datasets. Again, that is the reason for the different maximum numbers of wins and losses for each MOGP configuration in these two tables: 14 for the SDP datasets and 8 for the UCI datasets. The total number of wins and losses for an ensemble selection strategy, given in the final row, is 6 times greater because there are 6 different MOGP configurations. The motivation for using CH as an ensemble selection strategy was to obtain small and diverse ensembles, on the assumption that such ensembles should improve the performance in terms of minority accuracy when majority accuracy is the other objective. The results confirm this assumption only for some datasets. For the majority of datasets, the ensemble selection based on the whole final population is more often the best choice, although the number of wins is too small to build stronger confidence. The CH ensemble selection strategy exhibited both positive and negative results. It achieved the highest number of wins for both the PDE and Hadoop projects. At the same time, it may be considered the worst performing configuration for the PDE and JDT projects due to its great number of losses. The traditional
Table 4
The median values of Hyperarea, GM1 and AUC for JDT datasets.

Task Conf. Hyperarea GM Pop GM PF GM CH AUC Pop AUC PF AUC CH

JDT 2.0 MOGP1 0.79 0.70 0.69 0.68 0.78 0.77 0.77
MOGP2 0.78 0.68 0.68 0.60 0.76 0.76 0.73
msMOGP1 0.78 0.68 0.68 0.68 0.76 0.76 0.76
msMOGP2 0.77 0.66 0.68 0.61 0.75 0.75 0.71
CoMOGP1 0.79 0.69 0.70 0.68 0.78 0.77 0.77
CoMOGP2 0.79 0.68 0.68 0.60 0.76 0.77 0.73

JDT 2.1 MOGP1 0.84 0.70 0.70 0.73 0.82 0.82 0.82
MOGP2 0.83 0.68 0.69 0.60 0.81 0.81 0.78
msMOGP1 0.83 0.73 0.70 0.75 0.81 0.81 0.81
msMOGP2 0.81 0.67 0.70 0.61 0.80 0.80 0.74
CoMOGP1 0.85 0.74 0.74 0.73 0.82 0.82 0.83
CoMOGP2 0.83 0.68 0.69 0.60 0.81 0.82 0.79

JDT 3.0 MOGP1 0.83 0.73 0.74 0.75 0.82 0.82 0.82
MOGP2 0.83 0.71 0.72 0.65 0.81 0.81 0.77
msMOGP1 0.83 0.73 0.73 0.76 0.81 0.81 0.81
msMOGP2 0.82 0.69 0.72 0.68 0.80 0.81 0.75
CoMOGP1 0.84 0.73 0.75 0.76 0.83 0.82 0.82
CoMOGP2 0.83 0.71 0.72 0.69 0.81 0.83 0.78

JDT 3.1 MOGP1 0.83 0.73 0.72 0.72 0.82 0.81 0.81
MOGP2 0.82 0.70 0.71 0.60 0.81 0.81 0.77
msMOGP1 0.82 0.72 0.71 0.73 0.80 0.81 0.81
msMOGP2 0.81 0.68 0.71 0.62 0.80 0.80 0.74
CoMOGP1 0.83 0.73 0.73 0.68 0.81 0.81 0.83
CoMOGP2 0.83 0.70 0.71 0.60 0.81 0.81 0.78

JDT 3.2 MOGP1 0.83 0.73 0.73 0.66 0.81 0.81 0.81
MOGP2 0.83 0.71 0.72 0.65 0.80 0.80 0.77
msMOGP1 0.82 0.72 0.68 0.72 0.80 0.80 0.80
msMOGP2 0.81 0.70 0.72 0.67 0.79 0.80 0.75
CoMOGP1 0.83 0.72 0.72 0.70 0.81 0.82 0.81
CoMOGP2 0.83 0.72 0.72 0.68 0.80 0.81 0.77

JDT 3.3 MOGP1 0.83 0.74 0.73 0.72 0.81 0.81 0.81
MOGP2 0.82 0.71 0.71 0.63 0.81 0.81 0.77
msMOGP1 0.82 0.73 0.70 0.74 0.81 0.81 0.81
msMOGP2 0.81 0.69 0.71 0.65 0.80 0.80 0.75
CoMOGP1 0.83 0.74 0.73 0.75 0.83 0.81 0.81
CoMOGP2 0.82 0.71 0.71 0.70 0.81 0.81 0.77

JDT 3.4 MOGP1 0.80 0.71 0.71 0.71 0.79 0.79 0.79
MOGP2 0.80 0.70 0.70 0.68 0.79 0.79 0.74
msMOGP1 0.80 0.72 0.69 0.73 0.79 0.78 0.78
msMOGP2 0.79 0.69 0.70 0.65 0.78 0.78 0.73
CoMOGP1 0.80 0.74 0.75 0.73 0.79 0.79 0.80
CoMOGP2 0.80 0.70 0.70 0.70 0.79 0.79 0.75
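The AUC columns in these tables follow the construction described in Section 3: the portion of the unit square under the ROC curve of (FPR, TPR) points swept out by varying the probability threshold, anchored at (0, 0) and (1, 1). A trapezoidal sketch (the point list here is illustrative):

```python
# Trapezoidal area under an ROC curve given as sorted (FPR, TPR) points.

def auc(points):
    """points: (fpr, tpr) pairs sorted by fpr, including (0,0) and (1,1)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

roc = [(0.0, 0.0), (0.2, 0.7), (0.5, 0.9), (1.0, 1.0)]
print(round(auc(roc), 3))  # -> 0.785
```

A diagonal ROC, i.e. only the anchors (0, 0) and (1, 1), gives 0.5, the pure-guessing baseline mentioned in the text.
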

PF ensemble selection strategy outperformed CH for the PDE releases, achieving fewer losses, and performed equally well for the UCI datasets. Unexpectedly, the best performance for the PDE, JDT and UCI datasets is achieved by the ensemble without a selection strategy (Pop), in which the whole final population votes. It achieved the lowest number of losses and the highest difference between wins and losses for the PDE and UCI datasets. It also significantly outperformed the remaining ensemble selection strategies for the JDT datasets, with 31 wins and 0 losses.

Finally, we report the minority and majority accuracy (TPR and TNR) of all the MOGP configurations. We do not report them for all the ensemble selection strategies and all the datasets due to the space this would require. We selected the PDE and JDT datasets because the proposed CoMOGP configuration performed best for these datasets and they showed the highest numbers of wins and losses, i.e., the most significant differences between the ensemble strategies. Figs. 4 and 5 show the min, max and median values of TPR and TNR for the most successful ensemble selection strategies on the PDE datasets (Pop and PF). Figs. 6 and 7 show the min, max and median values of TPR and TNR for the worst performing ensemble selection strategies on the JDT datasets (PF and CH). Fig. 8 presents the median, min and max values of TPR and TNR during the 200-generation-long evolution for all PDE datasets together. In all figures, the min and max values are presented with a thin dotted line and the median values are explained in the legends.

All the configurations exhibit a genetic drift towards the majority class regardless of the ensemble selection strategy. However, the difference between minority and majority accuracy is much lower for the GP1 configurations than for the GP2 configurations. The GP2 configurations used minority accuracy as one objective, but they always achieve lower values of minority accuracy than GP1. This confirms that the usage of minority and majority accuracy as conflicting objectives is very important for the classification of unbalanced datasets. The difference between the min and max values also shows that ensembles of greater size (CH having the smallest and Pop having the greatest) exhibit smaller variations in performance. The evolution of TPR and TNR shows that the majority class bias becomes greater as the evolution progresses for the GP2 but not the GP1 configurations.

6. Discussion

The research presented in this paper is conducted in accordance with the guidelines given by Catal and Diri [54]: the datasets are
Table 5
The median values of Hyperarea, GM1 and AUC for Hadoop datasets.

Task Conf. Hyperarea GM Pop GM PF GM CH AUC Pop AUC PF AUC CH

Hadoop 0.1 MOGP1 0.90 0.73 0.68 0.67 0.81 0.80 0.77
MOGP2 0.91 0.66 0.66 0.66 0.80 0.81 0.81
msMOGP1 0.88 0.68 0.73 0.66 0.79 0.80 0.78
msMOGP2 0.90 0.64 0.65 0.63 0.77 0.79 0.76
CoMOGP1 0.90 0.70 0.68 0.65 0.81 0.82 0.76
CoMOGP2 0.91 0.65 0.66 0.67 0.78 0.80 0.81

Hadoop 0.2 MOGP1 0.86 0.60 0.61 0.50 0.68 0.69 0.65
MOGP2 0.86 0.60 0.59 0.61 0.68 0.70 0.68
msMOGP1 0.84 0.60 0.60 0.56 0.65 0.66 0.63
msMOGP2 0.84 0.48 0.47 0.57 0.63 0.61 0.60
CoMOGP1 0.85 0.58 0.56 0.48 0.67 0.69 0.62
CoMOGP2 0.86 0.54 0.60 0.61 0.67 0.68 0.66

Hadoop 0.3 MOGP1 0.84 0.59 0.57 0.50 0.68 0.69 0.61
MOGP2 0.85 0.57 0.59 0.59 0.69 0.70 0.67
msMOGP1 0.83 0.56 0.57 0.49 0.64 0.66 0.62
msMOGP2 0.84 0.49 0.48 0.53 0.64 0.65 0.60
CoMOGP1 0.84 0.58 0.57 0.48 0.67 0.70 0.63
CoMOGP2 0.84 0.54 0.58 0.60 0.66 0.68 0.64

Hadoop 0.4 MOGP1 0.86 0.56 0.50 0.46 0.68 0.69 0.64
MOGP2 0.86 0.50 0.53 0.61 0.69 0.70 0.70
msMOGP1 0.84 0.58 0.59 0.47 0.64 0.65 0.63
msMOGP2 0.85 0.48 0.48 0.58 0.64 0.65 0.62
CoMOGP1 0.85 0.56 0.50 0.47 0.67 0.71 0.64
CoMOGP2 0.86 0.48 0.52 0.60 0.66 0.68 0.67

Hadoop 0.5 MOGP1 0.75 0.57 0.52 0.44 0.64 0.67 0.59
MOGP2 0.76 0.53 0.53 0.57 0.63 0.64 0.64
msMOGP1 0.74 0.53 0.54 0.46 0.60 0.61 0.58
msMOGP2 0.75 0.43 0.42 0.50 0.59 0.60 0.57
CoMOGP1 0.75 0.54 0.53 0.43 0.63 0.64 0.60
CoMOGP2 0.75 0.48 0.52 0.55 0.64 0.66 0.63

Hadoop 0.6 MOGP1 0.85 0.57 0.59 0.58 0.72 0.71 0.67
MOGP2 0.86 0.58 0.59 0.64 0.71 0.71 0.69
msMOGP1 0.84 0.55 0.62 0.60 0.68 0.70 0.67
msMOGP2 0.84 0.49 0.44 0.57 0.66 0.65 0.63
CoMOGP1 0.85 0.59 0.58 0.59 0.70 0.71 0.68
CoMOGP2 0.85 0.57 0.56 0.62 0.71 0.72 0.69

Hadoop 0.7 MOGP1 0.75 0.57 0.56 0.57 0.67 0.67 0.64
MOGP2 0.76 0.58 0.56 0.59 0.68 0.69 0.68
msMOGP1 0.75 0.55 0.56 0.55 0.65 0.67 0.64
msMOGP2 0.75 0.49 0.43 0.53 0.61 0.65 0.60
CoMOGP1 0.75 0.59 0.58 0.57 0.66 0.71 0.64
CoMOGP2 0.76 0.57 0.56 0.59 0.67 0.72 0.65
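The ensembles evaluated in these tables classify by majority vote of their member solutions (Section 3.2). A minimal sketch in which members are stand-in functions rather than evolved GP trees; resolving ties towards the majority (non-faulty) class is an assumption of this sketch:

```python
# Majority-vote ensemble classification over 0/1 member votes.

def majority_vote(members, x):
    """Return 1 (faulty) when a strict majority of members votes 1."""
    votes = sum(member(x) for member in members)
    return 1 if 2 * votes > len(members) else 0

# Three stand-in members: two vote faulty, one votes non-faulty.
members = [lambda x: 1, lambda x: 1, lambda x: 0]
print(majority_vote(members, x=None))  # -> 1
```
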

publicly available, they contain class-level software metrics and the other hand, (iii) the co-evolutionary approach with the coloniza-
prediction model is based on machine learning. This paper analysed tion operator did improve the performance of MOGP in SDP as it did
a few aspects of the usage of MOGP for the classication of unbal- in general optimization problems such as in [22,23]. The migration
anced data in SDP. We also used several unbalanced datasets from operator and co-evolutionary approach did not reduce the compu-
the UCI repository to make the results more general. The results tational costs, as suggested by [25,13,26]. This is mainly because
that we have obtained for UCI datasets are similar to the results the algorithm implementation that we had used was not prepared
that are obtained by [11,12]. We believe that this proves that our for parallel execution. We did not nd (iv) very large differences
MOGP congurations are implemented correctly and that it makes between TPR and TNR when using ensembles of larger sizes in SDP,
our results more comparable and our conclusions more reliable. In like Bhowan et al. did when they used the PF ensemble selection for
this chapter, we compare our ndings with related research and some UCI datasets [12]. Moreover, (v) the majority class bias does
comment on some new observations. not become greater with the evolution process of PF ensembles of
We conrmed the ndings from [11] that (i) the multiple con- GP1 congurations in SDP datasets, as was the case in [12]. Finally,
icting objective used in MOGP may outperform other well-known we conrmed that (vi) the ensembles of smaller size and greater
soft computing algorithms. This was not our primary concern, so diversity may improve the performance of MOGP in TPR and TNR
the results are only briey mentioned in Appendix B. The power for the classication of unbalanced data. This nding was suggested
of using minority and majority accuracy as objectives is noticed in [12] and it motivated our CH ensemble selection strategy, which
in their simultaneous high performance, which is important for eventually achieved the only wins in GP1 congurations.
SDP and classication with unbalanced data in general. Our results Furthermore, we also made several new observations. We notice
indicate that (ii) the usage of a migration operator in multiple sub- (i) a degradation of performance for all the MOGP congurations
populations did not bring improvements to the performance in with the evolution of the PDE project. With each following release,
SDP as it did in other domains, such as biomedical data [24] or the average value of Hyp is decreased, regardless of data imbalance.
general optimization problems [9]. However, it did perform best We do not have a similar trend for JDT and we cannot see such a
in general classication datasets from the UCI repository. On the trend for UCI datasets as they do not represent an evolution. The
Table 6
The median values of Hyperarea, GM1 and AUC for UCI datasets.

Task Conf. Hyperarea GM Pop GM PF GM CH AUC Pop AUC PF AUC CH

Ion MOGP1 0.95 0.84 0.66 0.86 0.84 0.83 0.81


MOGP2 0.96 0.85 0.77 0.77 0.84 0.83 0.81
msMOGP1 0.94 0.86 0.86 0.84 0.83 0.83 0.82
msMOGP2 0.95 0.85 0.86 0.78 0.84 0.86 0.80
CoMOGP1 0.95 0.84 0.76 0.86 0.84 0.83 0.82
CoMOGP2 0.96 0.85 0.78 0.76 0.84 0.84 0.81

Spt MOGP1 0.90 0.73 0.73 0.70 0.77 0.77 0.74


MOGP2 0.90 0.70 0.71 0.69 0.76 0.76 0.73
msMOGP1 0.89 0.74 0.74 0.71 0.76 0.76 0.75
msMOGP2 0.90 0.70 0.70 0.68 0.74 0.75 0.72
CoMOGP1 0.90 0.71 0.73 0.70 0.79 0.78 0.74
CoMOGP2 0.91 0.59 0.70 0.68 0.75 0.76 0.73

Yst MOGP1 0.98 0.87 0.88 0.88 0.87 0.87 0.87


MOGP2 0.98 0.82 0.83 0.79 0.85 0.86 0.86
msMOGP1 0.98 0.83 0.80 0.87 0.87 0.87 0.87
msMOGP2 0.98 0.82 0.82 0.79 0.87 0.86 0.86
CoMOGP1 0.98 0.86 0.88 0.87 0.87 0.87 0.87
CoMOGP2 0.98 0.82 0.83 0.79 0.88 0.87 0.88

Bal MOGP1 0.75 0.57 0.59 0.66 0.48 0.50 0.50


MOGP2 0.75 0.67 0.00 0.62 0.50 0.50 0.50
msMOGP1 0.75 0.63 0.63 0.62 0.48 0.50 0.50
msMOGP2 0.74 0.51 0.48 0.59 0.50 0.50 0.49
CoMOGP1 0.77 0.55 0.48 0.70 0.49 0.50 0.53
CoMOGP2 0.73 0.66 0.64 0.67 0.50 0.50 0.50
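Hyperarea (Hyp) in Tables 3-6 summarizes a whole evolved front by the area of the objective space it dominates. A sketch for two maximization objectives in [0, 1]; taking (0, 0) as the reference point is an assumption of this sketch:

```python
# 2-D hypervolume ("hyperarea") of a front under maximization of both
# objectives, measured from the reference point (0, 0): sweep the points
# by descending first objective and accumulate disjoint rectangles.

def hyperarea(front):
    area, best_y = 0.0, 0.0
    for x, y in sorted(front, reverse=True):
        if y > best_y:
            area += x * (y - best_y)
            best_y = y
    return area

print(round(hyperarea([(0.8, 0.5), (0.5, 0.9)]), 2))  # -> 0.6
```
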

Table 7
The MOGP configurations' win-loss count for each fitness landscape used for the PDE datasets.

MOGP msMOGP CoMOGP MOGP msMOGP CoMOGP

GP1: Wins Losses


Hyp 0 0 0 0 0 0
GM1 pop 1 0 3 0 4 0
GM1 PF 1 0 1 0 2 0
GM1 CH 0 0 0 0 0 0
AUC pop 1 2 2 2 3 0
AUC PF 1 2 2 3 2 0
AUC CH 1 0 1 0 2 0

GP2: Wins Losses


Hyp 0 0 0 0 0 0
GM2 pop 0 0 1 0 1 0
GM2 PF 0 2 1 1 1 1
GM2 CH 0 0 0 0 0 0
AUC pop 1 2 2 2 3 0
AUC PF 1 2 2 3 2 0
AUC CH 1 0 1 0 2 0

Total: 8 10 16 11 22 1
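The PF ensemble selection strategy evaluated in these tables votes with the non-dominated solutions of the final population only. A sketch of that selection for two maximization objectives (e.g., TPR and TNR); the solution encoding as plain objective pairs is an assumption for illustration:

```python
# Pareto-front extraction: keep solutions not weakly dominated by any
# other distinct solution (both objectives maximized).

def pareto_front(solutions):
    def dominated(p, q):  # does q dominate p?
        return q[0] >= p[0] and q[1] >= p[1] and q != p
    return [p for p in solutions
            if not any(dominated(p, q) for q in solutions)]

pop = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.3, 0.9)]
print(pareto_front(pop))  # -> [(0.9, 0.2), (0.6, 0.6), (0.3, 0.9)]
```
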

Table 8
The MOGP configurations' win-loss count for each fitness landscape used for the JDT datasets.

MOGP msMOGP CoMOGP MOGP msMOGP CoMOGP

GP1: Wins Losses


Hyp 1 0 2 0 3 0
GM1 pop 0 0 0 0 0 0
GM1 PF 0 0 0 0 0 0
GM1 CH 0 0 2 0 2 0
AUC pop 0 0 0 0 0 0
AUC PF 0 0 0 0 0 0
AUC CH 0 0 0 0 0 0

GP2: Wins Losses


Hyp 7 0 7 0 14 0
GM2 pop 0 0 0 0 0 0
GM2 PF 5 0 6 0 11 0
GM2 CH 0 4 0 2 0 2
AUC pop 3 0 6 0 9 0
AUC PF 2 0 7 0 9 0
AUC CH 6 0 8 1 13 0

Total: 24 4 38 3 61 2
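One plausible reading of the CH strategy examined in these tables is to keep only the solutions lying on the upper convex hull of the front in objective space, which yields the small, diverse ensemble the text aims for. The paper's exact construction may differ, so treat this as an assumption-laden sketch:

```python
# Upper convex hull of 2-D objective points (monotone-chain style),
# sweeping from the largest first objective to the smallest.

def cross(o, a, b):
    """Cross product of vectors o->a and o->b (left turn > 0)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def upper_hull(points):
    hull = []
    for p in sorted(points, reverse=True):  # right-to-left sweep
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

pop = [(0.2, 0.9), (0.5, 0.8), (0.5, 0.5), (0.8, 0.3), (0.6, 0.75)]
print(upper_hull(pop))
# -> [(0.8, 0.3), (0.6, 0.75), (0.5, 0.8), (0.2, 0.9)]
```

Note that the interior point (0.5, 0.5) is discarded, while (0.6, 0.75) survives because it lies above the line joining its hull neighbours.
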
Table 9
The MOGP configurations' win-loss count for each fitness landscape used for the Hadoop datasets.

MOGP msMOGP CoMOGP MOGP msMOGP CoMOGP

GP1: Wins Losses


Hyp 0 0 0 0 0 0
GM1 pop 4 0 3 0 7 0
GM1 PF 7 0 5 0 12 0
GM1 CH 0 0 0 0 0 0
AUC pop 0 0 0 0 0 0
AUC PF 0 0 0 0 0 0
AUC CH 0 0 0 0 0 0

GP2: Wins Losses


Hyp 0 0 0 0 0 0
GM2 pop 1 0 0 0 1 0
GM2 PF 1 0 2 0 3 0
GM2 CH 2 0 0 0 2 0
AUC pop 2 0 0 0 2 0
AUC PF 3 0 2 0 5 0
AUC CH 3 0 1 0 4 0

Total: 23 0 13 0 36 0

Table 10
The MOGP configurations' win-loss count for each fitness landscape used for the UCI datasets.

MOGP msMOGP CoMOGP MOGP msMOGP CoMOGP

GP1: Wins Losses


Hyp 0 0 0 0 0 0
GM1 pop 1 0 1 0 2 0
GM1 PF 1 2 1 1 2 1
GM1 CH 0 1 1 1 1 0
AUC pop 0 0 0 0 0 0
AUC PF 1 0 1 0 2 0
AUC CH 0 1 1 1 1 0

GP2: Wins Losses


Hyp 0 0 0 0 0 0
GM2 pop 0 2 0 1 0 1
GM2 PF 0 4 0 2 0 2
GM2 CH 0 0 0 0 0 0
AUC pop 0 0 0 0 0 0
AUC PF 0 0 0 0 0 0
AUC CH 0 0 0 0 0 0

Total: 3 10 5 6 8 4

Table 11
The ensemble selection strategies' win-loss count for each MOGP configuration used for PDE datasets.

Pop PF CH Pop PF CH

GM: Wins Losses


MOGP1 1 1 4 2 2 2
msMOGP1 6 2 1 0 4 5
CoMOGP1 1 1 7 4 3 2
MOGP2 1 2 0 0 0 3
msMOGP2 1 1 4 2 2 2
CoMOGP2 4 3 0 0 0 7

Total: 14 10 16 8 11 21

Table 12
The ensemble selection strategies' win-loss count for each MOGP configuration used for JDT datasets.

Pop PF CH Pop PF CH

GM: Wins Losses


MOGP1 0 0 0 0 0 0
msMOGP1 7 4 0 0 0 11
CoMOGP1 2 1 0 0 0 3
MOGP2 12 1 1 0 7 7
msMOGP2 2 1 0 0 0 3
CoMOGP2 8 4 0 0 1 11

Total: 31 11 1 0 8 35

Table 13
The ensemble selection strategies' win-loss count for each MOGP configuration used for Hadoop datasets.

Pop PF CH Pop PF CH

GM: Wins Losses


MOGP1 1 2 0 0 0 3
msMOGP1 0 0 3 1 2 0
CoMOGP1 0 0 0 0 0 0
MOGP2 0 0 7 3 4 0
msMOGP2 1 3 0 0 0 4
CoMOGP2 0 0 3 2 1 0

Total: 2 5 13 6 7 7

Table 14
The ensemble selection strategies' win-loss count for each MOGP configuration used for UCI datasets.

Pop PF CH Pop PF CH

GM: Wins Losses


MOGP1 0 0 1 0 1 0
msMOGP1 1 1 0 0 1 1
CoMOGP1 0 0 0 0 0 0
MOGP2 1 1 0 0 0 2
msMOGP2 0 0 2 1 1 0
CoMOGP2 1 1 0 0 1 1

Total: 3 3 3 1 4 4

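The win-loss counts in Tables 7-14 tally, for each configuration, how many of its pairwise comparisons showed a statistically significant difference over the repeated runs. As an illustrative sketch only (the paper's exact inferential procedure follows [52]; the rank-sum z-test with |z| > 1.96 is an assumption made here), such a tally can be computed as follows:

```python
import math
from itertools import combinations

def rank_sum_z(a, b):
    """Normal-approximation z statistic of the Wilcoxon rank-sum test."""
    pooled = sorted(a + b)
    ranks = {}
    i = 0
    while i < len(pooled):  # assign average ranks, handling ties
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based positions i+1..j
        i = j
    n1, n2 = len(a), len(b)
    r_a = sum(ranks[v] for v in a)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (r_a - mu) / sigma

def win_loss(results, z_crit=1.96):
    """Tally significant pairwise wins and losses per configuration.

    results: {configuration name: list of per-run metric values (higher = better)}
    """
    wins = {k: 0 for k in results}
    losses = {k: 0 for k in results}
    for x, y in combinations(results, 2):
        z = rank_sum_z(results[x], results[y])
        if abs(z) > z_crit:  # significant difference between the two samples
            better, worse = (x, y) if z > 0 else (y, x)
            wins[better] += 1
            losses[worse] += 1
    return wins, losses
```

A configuration that is never significantly different from any other contributes zero to both columns, which is why many cells in the tables above are 0.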
PDE and JDT datasets represent the evolution of a project through a number of releases, each with a different publishing year. We also noticed that (ii) a higher data imbalance does not necessarily lead to lower Hyp or GM values in the UCI datasets, where the Yst dataset is a good example. This might be an indication that the MOGP is not strongly influenced by the level of data imbalance. Finally, our results (iii) aroused certain suspicions regarding the benefits of using an ensemble selection for the GP configurations that we analysed. The number of significantly different results between the three ensemble selection techniques is unexpectedly low and, in the case of the GP2 configurations, the Pop strategy that uses no selection yields the best performance. This might indicate that (iv) the computationally

Fig. 4. TPR and TNR values for population ensembles in PDE datasets.

Fig. 5. TPR and TNR values for Pareto Front ensembles in PDE datasets.

Fig. 6. TPR and TNR values for Pareto Front ensembles in JDT datasets.

Fig. 7. TPR and TNR values for Convex Hull ensembles in JDT datasets.

Fig. 8. TPR and TNR values for PF ensembles during the evolution in PDE datasets.

demanding ensemble selection techniques are indeed required to see significant improvements, as in [30,12].

6.1. Threats to validity

Empirical case studies are subject to threats to validity [55]. Lack of industrial data is a known problem in software defect prediction and a threat to construct validity. However, we have collected data from large and complex open source projects with a long evolution that resemble industrial projects. Conclusion validity is threatened if the results cannot be repeated. In our case, both the data and the MOGP configurations are available and the experiment is easily repeated. Furthermore, we have used the same datasets, similar GP configurations, and equal evaluation metrics that were also used by other recent studies. Internal validity may be threatened by the use of the specific programming environment and by the specific implementation of the MOGP, like the one we have used. Furthermore, the causality of our conclusions may be influenced by certain underlying factors that may be present in the open source community and that we are not aware of in the data collection process. To minimize the impact of this threat, we collected the SDP datasets according to the contemporary best performing data collection procedure. The generalization of our conclusions, i.e. the external validity, is limited to the projects and open source communities that we have used in this case study. Expanding the research to a larger number of datasets would improve the generalization of the results and conclusions. However, the generalization is somewhat improved by the fact that we have also used several UCI datasets for general classification purposes.

7. Conclusion

Our RQ is in search of the best-performing combination of evolutionary operators and ensemble selection strategies in MOGP when used for the classification of unbalanced SDP datasets. The summed number of wins and losses is too small to make clear and general conclusions. However, our empirical case study showed that the co-evolutionary MOGP approach based on the colonization operator outperformed the other examined approaches, the single population and the multiple population MOGP configurations, in the case of the SDP datasets. It also showed that instead of searching for the best fitness functions in various MOGP configurations, it may be more useful to combine them in a coevolutionary approach. However, the results did not give a clear answer regarding the choice of ensemble selection. In the case of MOGP configurations that tend to optimize minority and majority accuracy simultaneously, small and diverse ensembles tend to improve the results. A simple technique such as Convex Hull makes such an improvement only in the Eclipse PDE and Apache Hadoop datasets. On the other hand, the ensemble configuration that used the whole population for voting exhibited very good performance, especially for the JDT datasets. In summary, the proposed co-evolutionary MOGPs based on colonization were shown to be a promising solution for performing classification tasks with unbalanced data. The ensemble selection strategy based on the convex hull, on the other hand, exhibited both positive and negative results in the attempt to improve the classification performance. This empirical case study raised doubts regarding the benefits of using an ensemble selection strategy without exploring the strategy that uses the whole population in the final ensemble of MOGP classifiers.

Acknowledgements

This work has been supported in part by the Croatian Science Foundation's funding of the project UIP-2014-09-7945 and by the University of Rijeka Research Grant 13.09.2.2.16.

Appendix A. SDP dataset features

Table A.15 presents the independent variables, i.e., the features that we collected for each SDP dataset that comes from the PDE and JDT projects. For the purpose of classification, we omitted only the file descriptions and the name of the superclass (SUPER) because they

Table A.15
Source code metrics in our analysis.

# Metric Description

File description
1 File Path Unique identifier of each analyzed public class in the files we examined
2 Product The name of the analyzed open source project
3 Release The release of the analyzed open source project

LOC metrics
4 LOC* Total lines of code
5 SLOC-P Physical executable source lines of code
6 SLOC-L Logical source lines of code
7 MVG McCabe VG complexity
8 BLOC Blank lines of code
9 C&SLOC Lines with both code and comments
10 CLOC Comment lines of code
11 CWORD Number of comment words
12 HCLOC Header comments
13 HCWORD Header words

JHawk
14 AVCC Average Cyclomatic Complexity of all the methods in the class
15 CCML Total number of comment lines in the class
16 CCOM Total Number of Comments in the class
17 CBO* Coupling Between Objects
18 COH Cohesion
19 DIT* Depth of inheritance tree for this class
20 EXT Number of external method calls made from the class
21 FIN Fan In (or Afferent Coupling)
22 FOUT Fan Out (or Efferent Coupling)
23 HBUG Cumulative Halstead Bugs of all the methods in the class
24 HEFF Cumulative Halstead Effort of all the components in the class
25 HIER Number of methods called that are defined in the hierarchy of the class
26 HLTH Cumulative Halstead Length of all the components in the class
27 HVOL Cumulative Halstead Volume of all the components in the class
28 INST Number of instance variables (or attributes) defined in this class
29 INTR Number of interfaces implemented by this class
30 MOD Number of modifiers (public, protected etc.) defined for this class
31 LCOM* Lack of Cohesion of Methods
32 LCOM2 LCOM by keeping a count of the number of method pairs that share instance variable references
33 LMC Number of Local method calls, i.e. calls to methods that are defined in this class
34 MAXCC Maximum Cyclomatic Complexity of any method in the class
35 MI Maintainability Index – a complex calculation involving a number of different metrics
36 MINC Maintainability index – as MI above but the calculation does not include the comment part
37 MPC Message Passing Coupling – the number of external methods called by all the methods in the class
38 NAME Name of class
39 NCO Number of commands – number of methods in the class that do not return a value
40 NLOC Total number of Lines of Code in the class – class level plus for each of the methods defined in it
41 NOMT* Number of methods in class
42 NOS Total Number of Java Statements in class
43 NQU Number of queries (number of methods in the class that return a value)
44 NSUP Number of superclasses to this class (including the Object class)
45 NSUB* Number of subclasses of this class
46 PACK Number of packages imported by this class
47 R-R Reuse ratio
48 S-R Specialization Ratio
49 RFC* Response for class
50 SIX Specialization Index – measures the extent to which subclasses override
51 SUPER Name of Superclass
52 TCC Total Cyclomatic Complexity of all the methods in the class
53 UWCS Unweighted class size

Fault status
54 Num def Number of faults found for a file in SCMS repository
55 Status* Whether the file is F (1) or NF (0)

are not numerical features. There are 8 features and 1 dependent variable that belong to the Hadoop datasets, and they are marked with * in Table A.15.

Appendix B. Comparing CoMOGP, logistic regression and rotation forest

Table B.16 presents another experiment. We compared the best performing CoMOGP1 configuration with CH ensemble selection against Logistic Regression (LogReg) and Rotation Forest (RotFor). LogReg is a well-known and widely used statistical classification algorithm that exhibits reliable performance. RotFor is a novel classification algorithm that uses principal component analysis upon randomly selected subsets of independent metrics to create ensembles of decision trees. It has been shown to be a very well performing algorithm. It outperformed both LogReg and Random Forest (another novel classification algorithm of excellent performance) when used for SDP in our previous study [56]. The experiments with LogReg and RotFor were executed in Weka. The results indicate that the CoMOGP1 configuration outperforms both LogReg and RotFor in terms of GM in almost every case. The only exceptions are the Ion dataset, where RotFor is the best performing classifier, and three JDT datasets, where RotFor is better than or equally good as CoMOGP1. There seems to be an insignificant difference between the performance of CoMOGP1 and RotFor in the case of the Yst, PDE3.3 and PDE3.4 datasets.

Appendix C. First experiment results

The first experiment was performed by randomly splitting the dataset into 10 training and testing datasets in a 2/3:1/3 ratio. Tables C.17-C.19 present the min (median) max values of Hyperarea and GM1 for the three ensemble selection strategies, and the mean (μ) and standard deviation (σ) of the training time, expressed in seconds, obtained for seven subsequent releases of Eclipse PDE and JDT and four UCI datasets over 10 runs.
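The GM values compared in Table B.16 follow the usual geometric-mean definition for two-class problems, GM = sqrt(TPR x TNR). The following sketch (an illustrative reconstruction, not the paper's code) computes it from confusion-matrix counts:

```python
import math

def gm(tp, fn, tn, fp):
    """Geometric mean of the per-class accuracies, GM = sqrt(TPR * TNR)."""
    tpr = tp / (tp + fn)  # accuracy on the minority (faulty) class
    tnr = tn / (tn + fp)  # accuracy on the majority (non-faulty) class
    return math.sqrt(tpr * tnr)
```

A classifier that labels everything as the majority class gets TPR = 0 and hence GM = 0, which is how LogReg and RotFor score 0.000 on the Bal dataset in Table B.16.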

Table B.16
GM (σ) values for all examined datasets.

CoMOGP1 LogReg RotFor

PDE 2.0 0.815 (0.026) 0.662 (0.044) 0.665 (0.067)


PDE 2.1 0.748 (0.040) 0.532 (0.054) 0.545 (0.051)
PDE 3.0 0.740 (0.030) 0.624 (0.077) 0.671 (0.018)
PDE 3.1 0.728 (0.019) 0.646 (0.044) 0.699 (0.022)
PDE 3.2 0.726 (0.021) 0.577 (0.007) 0.645 (0.020)
PDE 3.3 0.720 (0.034) 0.673 (0.023) 0.716 (0.014)
PDE 3.4 0.689 (0.016) 0.592 (0.013) 0.668 (0.022)

JDT 2.0 0.699 (0.010) 0.683 (0.027) 0.735 (0.018)


JDT 2.1 0.737 (0.018) 0.662 (0.014) 0.728 (0.019)
JDT 3.0 0.731 (0.016) 0.668 (0.014) 0.731 (0.012)
JDT 3.1 0.724 (0.026) 0.598 (0.015) 0.683 (0.025)
JDT 3.2 0.727 (0.022) 0.685 (0.015) 0.743 (0.020)
JDT 3.3 0.733 (0.011) 0.489 (0.016) 0.584 (0.019)
JDT 3.4 0.719 (0.010) 0.420 (0.017) 0.460 (0.013)

Ion 0.895 (0.012) 0.813 (0.050) 0.916 (0.028)


Spt 0.807 (0.017) 0.640 (0.044) 0.648 (0.042)
Yst 0.866 (0.023) 0.822 (0.009) 0.844 (0.011)
Bal 0.703 (0.089) 0.000 (0.000) 0.000 (0.000)

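The repeated random-split protocol behind Tables C.17-C.19 (Appendix C) can be sketched as follows; the function names and the fixed seed are illustrative assumptions, not from the paper:

```python
import random
import statistics

def repeated_holdout(n_samples, n_runs=10, train_frac=2/3, seed=42):
    """Yield (train_idx, test_idx) pairs for repeated random 2/3:1/3 splits."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the demo
    for _ in range(n_runs):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        cut = round(n_samples * train_frac)
        yield idx[:cut], idx[cut:]

def min_median_max(values):
    """The min (median) max summary reported in Tables C.17-C.19."""
    return min(values), statistics.median(values), max(values)
```

Each run trains on 2/3 of the instances and tests on the remaining 1/3; the per-run metrics are then summarized over the 10 runs.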
Table C.17
The min (median) max of Hyp and GM1 and μ (σ) of training time for PDE datasets.

Task Conf. Hyperarea GM Pop GM PF GM CH Time

PDE 2.0 MOGP1 0.86 (0.89) 0.91 0.76 (0.79) 0.82 0.67 (0.73) 0.81 0.78 (0.80) 0.82 154.1 (7.0)
MOGP2 0.86 (0.89) 0.90 0.76 (0.77) 0.80 0.78 (0.81) 0.83 0.71 (0.78) 0.81 149.2 (9.7)
msMOGP1 0.87 (0.90) 0.93 0.79 (0.80) 0.82 0.74 (0.79) 0.83 0.72 (0.81) 0.83 148.8 (7.1)
msMOGP2 0.84 (0.86) 0.89 0.70 (0.75) 0.78 0.71 (0.76) 0.79 0.66 (0.76) 0.81 150.3 (4.4)
CoMOGP1 0.86 (0.90) 0.92 0.77 (0.79) 0.82 0.77 (0.79) 0.82 0.78 (0.81) 0.85 150.0 (8.9)
CoMOGP2 0.87 (0.90) 0.92 0.74 (0.77) 0.81 0.77 (0.80) 0.84 0.72 (0.79) 0.84 148.6 (7.2)

PDE 2.1 MOGP1 0.82 (0.85) 0.87 0.69 (0.73) 0.76 0.69 (0.71) 0.77 0.69 (0.76) 0.79 183.3 (7.9)
MOGP2 0.82 (0.85) 0.88 0.67 (0.72) 0.79 0.71 (0.75) 0.79 0.69 (0.75) 0.82 175.9 (8.9)
msMOGP1 0.83 (0.86) 0.88 0.74 (0.77) 0.81 0.73 (0.76) 0.81 0.63 (0.77) 0.80 181.6 (2.6)
msMOGP2 0.78 (0.82) 0.87 0.65 (0.70) 0.77 0.65 (0.70) 0.81 0.57 (0.70) 0.76 174.1 (9.2)
CoMOGP1 0.83 (0.87) 0.89 0.70 (0.76) 0.81 0.68 (0.74) 0.79 0.68 (0.75) 0.82 180.0 (7.1)
CoMOGP2 0.84 (0.86) 0.88 0.69 (0.72) 0.78 0.71 (0.74) 0.83 0.74 (0.77) 0.80 176.7 (8.4)

PDE 3.0 MOGP1 0.81 (0.82) 0.83 0.71 (0.74) 0.76 0.70 (0.73) 0.76 0.68 (0.72) 0.75 201.7 (3.5)
MOGP2 0.76 (0.81) 0.83 0.65 (0.69) 0.74 0.66 (0.72) 0.74 0.63 (0.68) 0.77 193.1 (9.5)
msMOGP1 0.78 (0.80) 0.84 0.70 (0.72) 0.75 0.66 (0.72) 0.76 0.65 (0.70) 0.73 187.2 (10.4)
msMOGP2 0.73 (0.77) 0.82 0.62 (0.65) 0.70 0.64 (0.68) 0.71 0.26 (0.63) 0.68 191.1 (3.6)
CoMOGP1 0.81 (0.83) 0.84 0.70 (0.74) 0.77 0.70 (0.74) 0.77 0.69 (0.74) 0.79 196.5 (4.3)
CoMOGP2 0.81 (0.83) 0.84 0.68 (0.70) 0.74 0.70 (0.73) 0.76 0.66 (0.71) 0.77 195.5 (6.3)

PDE 3.1 MOGP1 0.77 (0.80) 0.83 0.69 (0.73) 0.75 0.69 (0.72) 0.75 0.66 (0.71) 0.76 213.0 (17.8)
MOGP2 0.77 (0.80) 0.82 0.67 (0.70) 0.73 0.68 (0.70) 0.74 0.65 (0.68) 0.73 212.4 (14.0)
msMOGP1 0.75 (0.79) 0.81 0.69 (0.71) 0.73 0.68 (0.71) 0.73 0.71 (0.71) 0.74 224.3 (4.4)
msMOGP2 0.71 (0.78) 0.80 0.62 (0.66) 0.68 0.63 (0.69) 0.71 0.37 (0.63) 0.69 215.9 (9.9)
CoMOGP1 0.79 (0.81) 0.83 0.70 (0.72) 0.76 0.70 (0.72) 0.75 0.70 (0.73) 0.76 208.1 (11.4)
CoMOGP2 0.78 (0.81) 0.83 0.66 (0.69) 0.71 0.67 (0.70) 0.72 0.66 (0.70) 0.75 207.0 (11.1)

Table C.17 (Continued)

Task Conf. Hyperarea GM Pop GM PF GM CH Time

PDE 3.2 MOGP1 0.79 (0.80) 0.83 0.69 (0.72) 0.75 0.70 (0.72) 0.75 0.65 (0.72) 0.75 251.1 (19.2)
MOGP2 0.78 (0.82) 0.83 0.67 (0.70) 0.72 0.68 (0.71) 0.73 0.55 (0.61) 0.70 257.5 (8.2)
msMOGP1 0.80 (0.82) 0.83 0.70 (0.73) 0.74 0.66 (0.73) 0.75 0.69 (0.73) 0.74 253.1 (9.6)
msMOGP2 0.75 (0.78) 0.79 0.63 (0.66) 0.68 0.66 (0.70) 0.71 0.57 (0.63) 0.67 253.8 (6.0)
CoMOGP1 0.80 (0.82) 0.84 0.71 (0.74) 0.76 0.71 (0.74) 0.77 0.70 (0.73) 0.75 249.9 (13.9)
CoMOGP2 0.79 (0.82) 0.83 0.68 (0.70) 0.72 0.70 (0.71) 0.73 0.61 (0.67) 0.71 248.9 (12.2)

PDE 3.3 MOGP1 0.79 (0.82) 0.83 0.70 (0.73) 0.74 0.70 (0.73) 0.74 0.70 (0.73) 0.74 310.7 (3.3)
MOGP2 0.78 (0.81) 0.82 0.67 (0.70) 0.72 0.67 (0.71) 0.73 0.59 (0.61) 0.69 299.6 (17.8)
msMOGP1 0.78 (0.81) 0.82 0.68 (0.72) 0.75 0.66 (0.72) 0.75 0.61 (0.71) 0.74 296.1 (9.3)
msMOGP2 0.76 (0.78) 0.80 0.65 (0.67) 0.69 0.66 (0.71) 0.73 0.56 (0.61) 0.69 297.2 (6.9)
CoMOGP1 0.79 (0.82) 0.83 0.71 (0.73) 0.75 0.72 (0.73) 0.75 0.63 (0.73) 0.76 300.5 (15.4)
CoMOGP2 0.79 (0.82) 0.83 0.67 (0.70) 0.73 0.70 (0.71) 0.73 0.55 (0.63) 0.73 299.0 (17.6)

PDE 3.4 MOGP1 0.75 (0.77) 0.78 0.68 (0.69) 0.70 0.68 (0.70) 0.71 0.67 (0.69) 0.70 372.4 (13.8)
MOGP2 0.75 (0.76) 0.77 0.64 (0.66) 0.68 0.65 (0.67) 0.68 0.50 (0.59) 0.64 363.6 (18.6)
msMOGP1 0.73 (0.75) 0.77 0.67 (0.68) 0.70 0.67 (0.68) 0.71 0.67 (0.68) 0.70 351.7 (12.9)
msMOGP2 0.73 (0.74) 0.76 0.63 (0.64) 0.67 0.66 (0.67) 0.70 0.49 (0.60) 0.65 361.9 (13.8)
CoMOGP1 0.76 (0.77) 0.78 0.67 (0.70) 0.70 0.68 (0.70) 0.70 0.66 (0.69) 0.70 364.0 (10.3)
CoMOGP2 0.76 (0.77) 0.78 0.65 (0.67) 0.68 0.67 (0.68) 0.70 0.56 (0.62) 0.68 361.5 (15.7)

Table C.18
The min (median) max of Hyp and GM1 and μ (σ) of training time for JDT datasets.

Task Conf. Hyperarea GM Pop GM PF GM CH Time

JDT 2.0 MOGP1 0.74 (0.77) 0.78 0.67 (0.69) 0.71 0.68 (0.69) 0.72 0.65 (0.69) 0.72 379.6 (19.6)
MOGP2 0.73 (0.76) 0.78 0.66 (0.68) 0.70 0.66 (0.68) 0.71 0.51 (0.58) 0.66 385.4 (11.7)
msMOGP1 0.72 (0.75) 0.77 0.66 (0.68) 0.69 0.66 (0.68) 0.69 0.63 (0.68) 0.69 374.3 (13.8)
msMOGP2 0.72 (0.73) 0.75 0.62 (0.65) 0.67 0.65 (0.67) 0.68 0.50 (0.59) 0.68 378.8 (8.6)
CoMOGP1 0.75 (0.77) 0.79 0.68 (0.69) 0.72 0.68 (0.70) 0.72 0.68 (0.70) 0.71 378.5 (17.5)
CoMOGP2 0.75 (0.77) 0.78 0.67 (0.68) 0.71 0.68 (0.69) 0.71 0.53 (0.60) 0.65 379.4 (14.5)

JDT 2.1 MOGP1 0.80 (0.82) 0.82 0.72 (0.74) 0.74 0.72 (0.74) 0.75 0.70 (0.74) 0.75 429.6 (16.3)
MOGP2 0.80 (0.81) 0.83 0.66 (0.70) 0.72 0.67 (0.71) 0.72 0.58 (0.60) 0.71 428.9 (14.1)
msMOGP1 0.78 (0.80) 0.83 0.70 (0.72) 0.75 0.69 (0.72) 0.75 0.70 (0.72) 0.75 430.5 (14.0)
msMOGP2 0.75 (0.78) 0.80 0.63 (0.65) 0.67 0.67 (0.69) 0.72 0.52 (0.62) 0.66 431.8 (9.0)
CoMOGP1 0.81 (0.82) 0.84 0.73 (0.74) 0.76 0.73 (0.74) 0.76 0.70 (0.74) 0.75 426.9 (14.5)
CoMOGP2 0.81 (0.82) 0.83 0.67 (0.71) 0.73 0.67 (0.71) 0.74 0.60 (0.64) 0.71 426.6 (13.0)

JDT 3.0 MOGP1 0.78 (0.80) 0.82 0.70 (0.72) 0.74 0.69 (0.73) 0.75 0.71 (0.72) 0.75 523.2 (24.8)
MOGP2 0.78 (0.80) 0.81 0.66 (0.70) 0.73 0.67 (0.70) 0.73 0.52 (0.62) 0.66 519.8 (22.4)
msMOGP1 0.78 (0.80) 0.81 0.71 (0.72) 0.73 0.71 (0.72) 0.73 0.70 (0.72) 0.74 524.4 (9.0)
msMOGP2 0.76 (0.79) 0.81 0.66 (0.69) 0.70 0.69 (0.71) 0.72 0.56 (0.63) 0.70 521.6 (9.0)
CoMOGP1 0.80 (0.81) 0.82 0.72 (0.73) 0.75 0.72 (0.73) 0.75 0.70 (0.73) 0.75 514.6 (25.2)
CoMOGP2 0.79 (0.81) 0.82 0.66 (0.70) 0.73 0.70 (0.72) 0.74 0.55 (0.61) 0.67 516.3 (23.1)

JDT 3.1 MOGP1 0.79 (0.81) 0.82 0.70 (0.73) 0.74 0.71 (0.73) 0.74 0.65 (0.72) 0.74 615.6 (24.8)
MOGP2 0.78 (0.80) 0.81 0.65 (0.69) 0.72 0.67 (0.71) 0.73 0.52 (0.60) 0.64 599.5 (39.4)
msMOGP1 0.76 (0.79) 0.81 0.69 (0.71) 0.72 0.70 (0.71) 0.72 0.67 (0.71) 0.73 599.2 (36.3)
msMOGP2 0.75 (0.78) 0.80 0.62 (0.66) 0.70 0.63 (0.71) 0.72 0.57 (0.62) 0.67 598.6 (20.3)
CoMOGP1 0.80 (0.81) 0.82 0.72 (0.73) 0.74 0.72 (0.73) 0.75 0.65 (0.73) 0.75 604.9 (35.6)
CoMOGP2 0.80 (0.81) 0.82 0.68 (0.70) 0.72 0.69 (0.71) 0.72 0.55 (0.62) 0.67 599.4 (36.9)

JDT 3.2 MOGP1 0.79 (0.81) 0.82 0.70 (0.72) 0.73 0.70 (0.73) 0.74 0.68 (0.72) 0.74 361.3 (9.6)
MOGP2 0.78 (0.79) 0.81 0.69 (0.70) 0.73 0.69 (0.71) 0.73 0.62 (0.65) 0.67 361.5 (4.9)
msMOGP1 0.78 (0.79) 0.82 0.70 (0.72) 0.73 0.69 (0.72) 0.74 0.68 (0.71) 0.74 356.3 (8.6)
msMOGP2 0.75 (0.78) 0.80 0.67 (0.70) 0.71 0.68 (0.71) 0.74 0.61 (0.65) 0.67 359.0 (5.0)
CoMOGP1 0.80 (0.81) 0.82 0.71 (0.73) 0.74 0.71 (0.73) 0.74 0.68 (0.74) 0.75 362.2 (9.7)
CoMOGP2 0.80 (0.81) 0.82 0.71 (0.72) 0.73 0.71 (0.73) 0.74 0.62 (0.67) 0.72 360.3 (9.6)

JDT 3.3 MOGP1 0.79 (0.81) 0.82 0.70 (0.73) 0.74 0.70 (0.73) 0.74 0.67 (0.72) 0.74 724.1 (34.2)
MOGP2 0.79 (0.80) 0.82 0.67 (0.71) 0.72 0.67 (0.72) 0.72 0.53 (0.63) 0.67 705.0 (37.1)
msMOGP1 0.78 (0.79) 0.81 0.70 (0.72) 0.73 0.69 (0.72) 0.73 0.70 (0.72) 0.73 721.9 (30.9)
msMOGP2 0.76 (0.78) 0.81 0.67 (0.69) 0.71 0.68 (0.70) 0.74 0.47 (0.60) 0.70 719.9 (19.3)
CoMOGP1 0.80 (0.81) 0.82 0.72 (0.74) 0.75 0.72 (0.74) 0.75 0.71 (0.74) 0.74 707.2 (34.3)
CoMOGP2 0.80 (0.81) 0.82 0.66 (0.71) 0.73 0.70 (0.72) 0.73 0.56 (0.60) 0.65 706.0 (31.7)

JDT 3.4 MOGP1 0.76 (0.78) 0.81 0.70 (0.71) 0.73 0.70 (0.71) 0.73 0.68 (0.71) 0.72 728.8 (35.1)
MOGP2 0.76 (0.79) 0.80 0.69 (0.70) 0.71 0.69 (0.71) 0.72 0.51 (0.61) 0.68 724.5 (32.7)
msMOGP1 0.76 (0.78) 0.78 0.68 (0.71) 0.72 0.68 (0.71) 0.72 0.69 (0.70) 0.72 738.7 (20.5)
msMOGP2 0.74 (0.77) 0.79 0.64 (0.67) 0.70 0.67 (0.70) 0.72 0.55 (0.61) 0.68 727.0 (19.5)
CoMOGP1 0.78 (0.80) 0.81 0.70 (0.72) 0.73 0.71 (0.72) 0.73 0.69 (0.72) 0.73 726.7 (32.6)
CoMOGP2 0.77 (0.79) 0.80 0.68 (0.70) 0.71 0.69 (0.71) 0.72 0.50 (0.57) 0.66 721.3 (29.9)

Table C.19
The min (median) max of Hyp and GM1 and μ (σ) of training time for UCI datasets.

Task Conf. Hyperarea GM Pop GM PF GM CH Time

Ion MOGP1 0.89 (0.90) 0.92 0.83 (0.84) 0.88 0.00 (0.00) 0.87 0.79 (0.86) 0.92 105.8 (4.6)
MOGP2 0.88 (0.92) 0.92 0.83 (0.84) 0.86 0.00 (0.42) 0.85 0.60 (0.77) 0.87 101.6 (5.4)
msMOGP1 0.90 (0.92) 0.96 0.87 (0.89) 0.94 0.85 (0.89) 0.93 0.74 (0.88) 0.90 98.2 (4.4)
msMOGP2 0.89 (0.91) 0.91 0.81 (0.86) 0.88 0.80 (0.86) 0.88 0.65 (0.75) 0.82 102.1 (3.9)
CoMOGP1 0.91 (0.92) 0.94 0.84 (0.89) 0.93 0.00 (0.88) 0.91 0.88 (0.90) 0.91 103.0 (4.5)
CoMOGP2 0.90 (0.92) 0.93 0.87 (0.90) 0.93 0.00 (0.26) 0.86 0.37 (0.74) 0.80 102.0 (4.4)

Spt MOGP1 0.85 (0.86) 0.88 0.77 (0.78) 0.79 0.75 (0.79) 0.79 0.77 (0.78) 0.80 77.8 (3.0)
MOGP2 0.83 (0.87) 0.91 0.75 (0.76) 0.81 0.20 (0.78) 0.80 0.71 (0.74) 0.75 79.2 (2.9)
msMOGP1 0.86 (0.88) 0.89 0.77 (0.80) 0.84 0.78 (0.81) 0.82 0.74 (0.82) 0.83 71.4 (1.2)
msMOGP2 0.85 (0.86) 0.87 0.78 (0.79) 0.79 0.78 (0.80) 0.81 0.54 (0.79) 0.79 70.3 (1.6)
CoMOGP1 0.85 (0.88) 0.91 0.79 (0.81) 0.83 0.78 (0.81) 0.83 0.79 (0.80) 0.84 79.5 (2.3)
CoMOGP2 0.86 (0.86) 0.90 0.75 (0.78) 0.83 0.76 (0.79) 0.82 0.69 (0.75) 0.81 77.9 (2.2)

Yst MOGP1 0.95 (0.96) 0.98 0.76 (0.81) 0.88 0.44 (0.77) 0.85 0.83 (0.88) 0.91 213.9 (1.3)
MOGP2 0.96 (0.97) 0.97 0.79 (0.80) 0.80 0.77 (0.78) 0.80 0.78 (0.82) 0.89 210.2 (1.2)
msMOGP1 0.97 (0.98) 0.98 0.81 (0.83) 0.87 0.78 (0.83) 0.86 0.81 (0.88) 0.93 210.3 (1.7)
msMOGP2 0.94 (0.96) 0.97 0.79 (0.81) 0.85 0.74 (0.81) 0.84 0.72 (0.84) 0.90 206.7 (1.7)
CoMOGP1 0.97 (0.98) 0.98 0.84 (0.86) 0.87 0.84 (0.87) 0.88 0.84 (0.87) 0.89 214.5 (0.8)
CoMOGP2 0.97 (0.97) 0.97 0.80 (0.81) 0.84 0.76 (0.78) 0.84 0.68 (0.80) 0.92 211.0 (0.8)

Bal MOGP1 0.72 (0.75) 0.79 0.25 (0.57) 0.67 0.25 (0.59) 0.71 0.61 (0.66) 0.73 109.6 (0.2)
MOGP2 0.74 (0.75) 0.77 0.61 (0.67) 0.70 0.00 (0.00) 0.05 0.58 (0.62) 0.74 108.1 (0.4)
msMOGP1 0.72 (0.75) 0.80 0.55 (0.63) 0.73 0.52 (0.63) 0.74 0.46 (0.62) 0.77 105.8 (0.2)
msMOGP2 0.73 (0.74) 0.78 0.41 (0.51) 0.60 0.42 (0.48) 0.53 0.56 (0.59) 0.67 104.6 (0.1)
CoMOGP1 0.76 (0.77) 0.79 0.30 (0.55) 0.59 0.42 (0.48) 0.64 0.61 (0.76) 0.79 109.6 (0.3)
CoMOGP2 0.73 (0.75) 0.77 0.64 (0.66) 0.70 0.29 (0.64) 0.72 0.61 (0.67) 0.71 108.4 (0.3)

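The Hyperarea column in the tables above can be read as the area of TPR/TNR space dominated by the evolved Pareto front with respect to the reference point (0, 0). Under that assumption (the paper's exact computation is not reproduced here), a minimal sketch:

```python
def hyperarea(points):
    """Area dominated by a set of (TPR, TNR) points w.r.t. reference (0, 0).

    The dominated region is the union of the rectangles [0, tpr] x [0, tnr];
    keep the non-dominated staircase, sort by TPR, and sum rectangle strips.
    """
    # keep only non-dominated points (no other point >= in both objectives)
    front = [p for p in points
             if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)]
    front.sort()  # ascending TPR; TNR is then descending along the front
    area, prev_x = 0.0, 0.0
    for x, y in front:
        area += (x - prev_x) * y  # strip (prev_x, x] has height y
        prev_x = x
    return area
```

A front hugging the (1, 1) corner yields a Hyperarea close to 1, matching the best values reported in Tables C.17-C.19.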
References

[1] T. Galinac Grbac, P. Runeson, D. Huljenic, A second replicated quantitative analysis of fault distributions in complex software systems, IEEE Trans. Softw. Eng. 39 (4) (2013) 462–476.
[2] T. Galinac Grbac, D. Huljenic, On the probability distribution of faults in complex software systems, Inf. Softw. Technol. 58 (2015) 250–258.
[3] T. Hall, S. Beecham, D. Bowes, D. Gray, S. Counsell, A systematic literature review on fault prediction performance in software engineering, IEEE Trans. Softw. Eng. 38 (6) (2012) 1276–1304.
[4] F. Provost, Machine learning from imbalanced data sets 101 (extended abstract).
[5] M. Galar, A. Fernández, E. Barrenechea, H. Bustince, F. Herrera, A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches, IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 42 (4) (2012) 463–484.
[6] C. Seiffert, T.M. Khoshgoftaar, J.V. Hulse, Improving software-quality predictions with data sampling and boosting, IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 39 (6) (2009) 1283–1294.
[7] J. Ren, K. Qin, Y. Ma, G. Luo, On software defect prediction using machine learning, J. Appl. Math. (2014) 785435:1–785435:8.
[8] L. Graning, Y. Jin, B. Sendhoff, Generalization improvement in multi-objective learning, in: The 2006 IEEE International Joint Conference on Neural Network Proceedings, 2006, pp. 4839–4846.
[9] H. Ishibuchi, N. Akedo, Y. Nojima, Behavior of multiobjective evolutionary algorithms on many-objective knapsack problems, IEEE Trans. Evol. Comput. 19 (2) (2015) 264–283.
[10] B. Wang, J. Pineau, Online bagging and boosting for imbalanced data streams, IEEE Trans. Knowl. Data Eng. 28 (12) (2016) 3353–3366.
[11] U. Bhowan, M. Johnston, M. Zhang, X. Yao, Evolving diverse ensembles using genetic programming for classification with unbalanced data, IEEE Trans. Evol. Comput. 17 (3) (2013) 368–386.
[12] U. Bhowan, M. Johnston, M. Zhang, X. Yao, Reusing genetic programming for ensemble selection in classification of unbalanced data, IEEE Trans. Evol. Comput. (2013).
[13] A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, Springer-Verlag, 2003.
[14] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, Trans. Evol. Comput. 6 (2) (2002) 182–197.
[15] U. Bhowan, M. Johnston, M. Zhang, Evolving ensembles in multi-objective genetic programming for classification with unbalanced data, in: 13th Annual Genetic and Evolutionary Computation Conference, GECCO 2011 Proceedings, Dublin, Ireland, July 12–16, 2011, 2011, pp. 1331–1338.
[16] K. De Jong, Co-evolutionary algorithms: a useful computational abstraction? in: Proceeding of 7th International Symposium, SSBSE, 2015, pp. 3–11.
[17] B. Sareni, L. Krahenbuhl, Fitness sharing and niching methods revisited, Trans. Evol. Comput. 2 (3) (1998) 97–106.
[18] A.D. Cioppa, C.D. Stefano, A. Marcelli, Where are the niches? Dynamic fitness sharing, IEEE Trans. Evol. Comput. 11 (4) (2007) 453–465.
[19] S. Biswas, S. Kundu, S. Das, Inducing niching behavior in differential evolution through local information sharing, IEEE Trans. Evol. Comput. 19 (2) (2015) 246–263.
[20] E. Rubinic, G. Mausa, T. Galinac Grbac, Software defect classification with a variant of NSGA-II and simple voting strategies, in: M. Barros, Y. Labiche (Eds.), Search-Based Software Engineering, Vol. 9275 of Lecture Notes in Computer Science, Springer International Publishing, 2015, pp. 347–353.
[21] P. Wang, M. Emmerich, R. Li, K. Tang, T. Bäck, X. Yao, Convex hull-based multi-objective genetic programming for maximizing ROC performance, CoRR abs/1303.3145.
[22] R. Akbari, V. Zeighami, K. Ziarati, MLGA: a multilevel cooperative genetic algorithm, in: 2010 IEEE Fifth International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA), 2010, pp. 271–277.
[23] K. Ziarati, R. Akbari, A multilevel evolutionary algorithm for optimizing numerical functions, Int. J. Ind. Eng. Comput. 2 (2) (2011) 419–430.
[24] B.T. Skinner, H.T. Nguyen, D. Liu, Distributed classifier migration in XCS for classification of electroencephalographic signals, in: IEEE Congress on Evolutionary Computation, IEEE, 2007, pp. 2829–2836.
[25] W.N. Martin, J. Lienig, J.P. Cohoon, Island (migration) models: evolutionary algorithms based on punctuated equilibria, 6 (1997) 3.
[26] M.A. Potter, K.A.D. Jong, A cooperative coevolutionary approach to function optimization, in: Proceedings of the International Conference on Evolutionary Computation. The Third Conference on Parallel Problem Solving from Nature: Parallel Problem Solving from Nature, PPSN III, Springer-Verlag, London, UK, 1994, pp. 249–257.
[27] M. Abedini, M. Kirley, CoXCS: a coevolutionary learning classifier based on feature space partitioning, in: A.E. Nicholson, X. Li (Eds.), Australasian Conference on Artificial Intelligence, Vol. 5866 of Lecture Notes in Computer Science, Springer, 2009, pp. 360–369.
[28] J. Lohn, W. Kraus, G. Haith, Comparing a coevolutionary genetic algorithm for multiobjective optimization, in: Proc. of Congress on Evolutionary Computing 02, 2002.
[29] X.-D. Mu, R.-H. Chang, L. Zhang, Software defect prediction based on competitive organization coevolutionary algorithm, J. Converg. Inf. Technol. 7 (5) (2012) 325–332.
[30] X. Yao, Y. Liu, Making use of population information in evolutionary artificial neural networks, Trans. Syst. Man Cyber. Part B 28 (3) (1998) 417–425.
[31] S.-J. Huang, N.-H. Chiu, Optimization of analogy weights by genetic algorithm for software effort estimation, IST 48 (11) (2006) 1034–1045.
[32] Z. Skolicki, K. De Jong, The influence of migration sizes and intervals on island models, in: Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation, GECCO 05, NY, USA, ACM, 2005, pp. 1295–1302.
[33] M. Tomassini, Spatially Structured Evolutionary Algorithms, Springer-Verlag, 2005.
[34] U. Boryczka, J. Kozak, Enhancing the effectiveness of ant colony decision tree algorithms by co-learning, Appl. Soft Comput. 30C (2015) 166–178.
[35] C.W. Ahn, Advances in Evolutionary Algorithms: Theory, Design and Practice, Vol. 18 of Studies in Computational Intelligence, Springer, 2006.
[36] R.P. Wiegand, W.C. Liles, K.A.D. Jong, An empirical analysis of collaboration methods in cooperative coevolutionary algorithms, in: Proceedings from the Genetic and Evolutionary Computation Conference, Morgan Kaufmann, 2001, pp. 1235–1242.
[37] M. Lichman, UCI Machine Learning Repository, 2013, http://archive.ics.uci.edu/ml.
[38] C. Cortes, M. Mohri, AUC optimization vs. error rate minimization, in: Advances in Neural Information Processing Systems, MIT Press, 2004.
[39] C. Chatelain, S. Adam, Y. Lecourtier, L. Heutte, T. Paquet, A multi-model selection framework for unknown and/or evolutive misclassification cost problems, Pattern Recognit. 43 (3) (2010) 815–823.
[40] J. Zhao, V.B. Fernandes, L. Jiao, I. Yevseyeva, A. Maulana, R. Li, T. Bäck, M.T.M. Emmerich, Multiobjective optimization of classifiers by means of 3-d convex hull based evolutionary algorithm, CoRR abs/1412.5710.
[41] Y. Ma, G. Luo, X. Zeng, A. Chen, Transfer learning for cross-company software defect prediction, Inf. Softw. Technol. 54 (3) (2012) 248–256.
[42] U. Bhowan, M. Johnston, M. Zhang, Differentiating between individual class performance in genetic programming fitness for classification with unbalanced data, in: IEEE Congress on Evolutionary Computation, IEEE, 2009, pp. 2802–2809.
[43] U. Bhowan, M. Johnston, M. Zhang, Developing new fitness functions in genetic programming for classification with unbalanced data, IEEE Trans. Syst. Man Cybern. Part B 42 (2) (2012) 406–421.
[44] T. Fawcett, An introduction to ROC analysis, Pattern Recogn. Lett. 27 (8) (2006) 861–874.
[45] G. Mausa, T. Galinac Grbac, B. Dalbelo Basic, Software defect prediction with bug-code analyzer – a data collection tool demo, in: Proc. of SoftCOM 14, 2014.
[46] G. Mausa, T. Galinac Grbac, B. Dalbelo Basic, A systematic data collection procedure for software defect prediction, Comput. Sci. Inf. Syst. 13 (1) (2016) 173–197.
[47] G. Mausa, P. Perkovic, T. Galinac Grbac, I. Stajduhar, Techniques for bug-code linking, in: Proc. of SQAMIA 14, 2014, pp. 47–55.
[48] G. Mausa, T. Galinac Grbac, B. Dalbelo Basic, Data collection for software defect prediction – an exploratory case study of open source software projects, in: Proceedings of MIPRO 14, Opatija, Croatia, 2015, pp. 513–519.
[49] C. Catal, Software mining and fault prediction, Wiley Interdisc. Rew. Data Min. Knowl. Discov. 2 (5) (2012) 420–426.
[50] C. Catal, Review: Software fault prediction: a literature review and current trends, Expert Syst. Appl. 38 (4) (2011) 4626–4636.
[51] M. Harman, S. Islam, Y. Jia, L. Minku, F. Sarro, K. Srivisut, Less is more: temporal fault predictive performance over multiple Hadoop releases, in: Proc. of SSBSE 14, 2014, pp. 240–246.
[52] L. Richard, Concepts & Applications of Inferential Statistics, 2011.
[53] E. Kocaguneli, T. Menzies, J.W. Keung, On the value of ensemble effort estimation, IEEE Trans. Softw. Eng. 38 (6) (2012) 1403–1416.
[54] C. Catal, B. Diri, A systematic review of software fault prediction studies, Expert Syst. Appl. 36 (4) (2009) 7346–7354.
[55] P. Runeson, M. Höst, Guidelines for conducting and reporting case study research in software engineering, Empir. Softw. Eng. 14 (2) (2009) 131–164.
[56] G. Mausa, N. Bogunovic, T. Galinac Grbac, B. Dalbelo Basic, Rotation forest in software defect prediction, in: Z. Budimac, M. Hericko (Eds.), SQAMIA, Vol. 1375 of CEUR Workshop Proceedings, CEUR-WS.org, 2015, pp. 35–43.
