Computer and Communication Systems, Faculty of Engineering, University Putra Malaysia, 43400, Serdang, Selangor, Malaysia
Laboratory of Applied and Computational Statistics, Institute for Mathematical Research, UPM, 43400, Serdang, Selangor, Malaysia
Article info
Article history:
Received 17 October 2009
Accepted 10 January 2011
Available online 28 January 2011
Keywords:
back propagation neural network
committee neural network
fuzzy genetic algorithm
reservoir properties
Abstract
Combining several appropriate experts can improve the generalization performance of the group compared to a single network alone. There are different ways of combining the intelligent systems' outputs in the combiner of a committee neural network, such as simple averaging, gating networks, stacking, support vector machines, and genetic algorithms. Premature convergence is a classical problem in finding the optimal solution with genetic algorithms. In this paper, we propose a new technique for choosing the female chromosome during sexual selection to avoid premature convergence in a genetic algorithm. A bi-linear allocation lifetime approach is used to label the chromosomes based on their fitness values, which will then be used to characterize the diversity of the population. The label of the selected male chromosome and the population diversity of the previous generation are then applied within a set of fuzzy rules to select a suitable female chromosome for recombination. Finally, we use the fuzzy genetic algorithm to combine the outputs of the experts to predict a reservoir parameter in the petroleum industry. The results show that the proposed method (fuzzy genetic algorithm) gives the smallest error and the highest correlation coefficient compared to the five members and the genetic algorithm, and produces significant information on the reliability of the permeability predictions.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
There are several reasons for distributing a learning task among a number of individual networks. The main one is to improve generalization ability, because the generalization of individual networks is not unique. The combination of several Artificial Neural Networks (ANNs) performing the same task is called an ensemble of neural networks or a committee of neural networks. When the networks are of different types, it is called a committee of machines. In ensemble methods, the ensemble candidates are different. There are a number of ways to create different individuals: different training data, initial conditions, network topologies, and training algorithms. After selecting the individuals and training them, their generated results are combined by some method. The committee machine structure can be viewed in Fig. 1. In the committee machine, the expectation is that different experts converge to different local minima on the error surface, and the overall output improves the performance (Wolpert, 1992; Efron and Tibshirani, 1993; Rezaee, 2001). The mean square error (MSE) between an individual output and the expected output (target) can be expressed in terms of the bias squared plus the variance (Haykin, 1999). The MSE equation makes it clear that we can reduce either the bias or the variance to reduce the neural network error. Unfortunately, it is found that for the concerned
Corresponding author. Tel.: +60 124422445.
E-mail addresses: sajkenari@yahoo.com (S.A. Jafari), syamsiah@eng.upm.edu.my
(S. Mashohor), jalali@inspem.upm.edu.my (M.J. Varnamkhasti).
0920-4105/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.petrol.2011.01.006
S.A. Jafari et al. / Journal of Petroleum Science and Engineering 76 (2011) 217–223
Fig. 1. Committee neural network with k members: the input X(n) feeds the experts NN 1 to NN k, whose outputs y1(n) to yk(n) are merged by a combiner into the overall output Y(n).
and the (M − 1) remaining data sets as the training data. After M repetitions, we have M overlapping training sets and M independent test sets. Since the training sets are different, the generated errors after training are expected to fall in different local error minima and therefore lead to different results. The performance of each expert is measured on the corresponding test data set. Breiman, Friedman, Olshen and Stone used cross-validation to prune classification tree algorithms.
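The M-fold partitioning described above can be sketched as follows. This is a minimal illustration; the function name `make_cv_ensemble_splits` and its arguments are our own naming, not from the paper:

```python
import numpy as np

def make_cv_ensemble_splits(n_samples, m_folds, seed=0):
    """Partition sample indices into m_folds disjoint test sets; each
    expert trains on the remaining (m_folds - 1) folds, as described
    in the text."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, m_folds)
    splits = []
    for k in range(m_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(m_folds) if j != k])
        splits.append((train, test))
    return splits
```

Each of the M experts then trains on one `train` index set and is evaluated on the corresponding held-out `test` set.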
2.4. Stacking (Wolpert, 1992)
The first part of the stacking method is similar to the cross-validation method. As mentioned above, there are M training sets and M test sets. We use the M training sets to train two generalizers G1 and G2, and then the M test sets are put into G1 and G2 (these outputs will be used as second-space generalizer inputs). The outputs of G1 and G2 and the target value, (g1i, g2i, yi), will be used as the training set of the generalizer G, the second-space generalizer.
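The construction of the second-space training set (g1i, g2i, yi) can be sketched as below; `stack_level_two_training_set` is an illustrative name, and `g1`/`g2` stand for the trained level-one generalizers G1 and G2:

```python
import numpy as np

def stack_level_two_training_set(g1, g2, X_test_folds, y_test_folds):
    """Build the second-space training set (g1_i, g2_i, y_i): the
    level-one generalizers predict on the held-out test folds, and
    their outputs paired with the targets train the combiner G."""
    features, targets = [], []
    for X_te, y_te in zip(X_test_folds, y_test_folds):
        p1 = g1(X_te)          # predictions of G1 on unseen data
        p2 = g2(X_te)          # predictions of G2 on unseen data
        features.append(np.column_stack([p1, p2]))
        targets.append(y_te)
    return np.vstack(features), np.concatenate(targets)
```

The returned feature matrix and target vector form the training data for the second-space generalizer G.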
2.5. Boosting by filtering (Schapire, 1990), AdaBoost (Freund and Schapire)
In this method, there are three experts. The first expert is trained with M training data from the source training data set, and the result of the first expert is applied to the second expert. The second expert is then trained with this data set. After training the second expert, the training data of the source data set are passed through the first and second experts. Finally, the third expert is trained only on the data on which the outputs of the first and second experts disagree. That is, if there is a disagreement between the first and second experts on a certain data point, this data point is passed to the third expert. The final result is formed from the outputs of the three experts. Freund and Schapire (1995) and Drucker et al. (1994) have shown that the boosting algorithm is very effective in many experiments. Another method of boosting is adaptive boosting. In this method, the training data are selected according to their probability. For each data point, if the predicted value is close to the target value, the probability of choosing this data point is low; otherwise the probability is high. This method gives more chances to such data for retraining. For a classification problem we can use majority voting, and for a regression problem the result with the lowest error rate is selected. AdaBoost is sensitive to noisy data and outliers, but it is less susceptible to overfitting than most learning algorithms.
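The adaptive reweighting idea above can be sketched as follows. The thresholds `tol` and `boost` are our own illustrative knobs, not parameters from the paper:

```python
import numpy as np

def update_sampling_probabilities(p, predictions, targets, tol=0.1, boost=2.0):
    """Adaptive-boosting idea from the text: data the current expert
    already predicts well (|error| <= tol) keep a low selection
    probability; poorly predicted data get a higher one."""
    errors = np.abs(predictions - targets)
    p = np.where(errors > tol, p * boost, p)
    return p / p.sum()   # renormalise to a probability distribution
```

Sampling the next expert's training data from the updated distribution gives the hard examples more chances of being retrained on.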
3. Combination methods
The last stage of designing a Committee Machine (CM) is the combination of the expert outputs. Many investigations have been carried out to find combining methods that merge the expert outputs and produce the final outputs. In this section, we introduce some traditional combining methods for the CM. Some of them are suitable for classifiers and some of them perform well in regression.
3.1. Simple averaging (Lincoln and Skrzypek, 1990)
One of the most frequently used combination methods is simple averaging. In this method, after training the committee members, the final output is obtained by averaging the outputs of the committee members. It is easy to show by Cauchy's inequality that the Mean Square Error (MSE) of a committee machine with the simple averaging method is less than or equal to the average of the MSEs of the individual experts. This method is most useful when the variances of the ensemble members are different, because simple averaging can reduce the variance of the nets. The disadvantage of simple averaging is the equal weight given to every committee member, i.e. there is no difference between the weights of two committee members with low and high generalization ability.
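The Cauchy-inequality property above can be checked numerically. The data here are synthetic, purely for illustration, and not from the paper:

```python
import numpy as np

# Illustrative data: five noisy expert predictions of a common target.
rng = np.random.default_rng(42)
target = rng.normal(size=100)
experts = target + rng.normal(scale=0.3, size=(5, 100))

mse = lambda y: np.mean((y - target) ** 2)
avg_member_mse = np.mean([mse(e) for e in experts])
committee_mse = mse(experts.mean(axis=0))   # simple averaging

# By Cauchy's inequality the committee MSE cannot exceed the
# average of the members' MSEs.
assert committee_mse <= avg_member_mse
```

With independent noise the committee MSE is typically well below the member average, which is the variance-reduction effect described in the text.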
MSE_GA = (1/n) Σ_{i=1}^{n} (w1 y1i + w2 y2i + ... + wk yki − Ti)^2,  with  Σ_{i=1}^{k} wi = 1,

where y1i is the output of the first network on the ith input (the ith training pattern), wi is the weight of the ith member, Ti is the target value of the ith input, and n is the number of training data.
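The weighted-averaging error above can be computed directly; `weighted_committee_mse` is an illustrative helper name:

```python
import numpy as np

def weighted_committee_mse(weights, outputs, targets):
    """MSE of the weighted committee, (1/n) * sum_i
    (w1*y1i + ... + wk*yki - Ti)^2, with the weights summing to one."""
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0), "weights must sum to one"
    combined = weights @ outputs        # shape (n,): weighted combination
    return np.mean((combined - targets) ** 2)
```

Here `outputs` is a k-by-n matrix whose jth row holds expert j's outputs on the n training patterns; a GA (or the FGA of Section 4) searches over `weights` to minimise this quantity.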
3.3. Majority voting (Hansen and Salamon, 1990)
This combination method is the most popular for classification problems. If more than half of the individuals vote for a prediction, majority voting selects this prediction as the final output. Majority voting ignores the fact that networks in the minority can sometimes produce the correct results. At this stage of combination, it ignores the existence of diversity, which is the motivation for ensembles.
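The strict-majority rule above can be stated in a few lines:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class predicted by more than half of the experts,
    or None when no class reaches a strict majority."""
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > len(predictions) / 2 else None
```

Returning `None` in the tied case makes explicit that plain majority voting has no answer when no prediction clears half the votes.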
3.4. Ranking (Ho et al., 1994; Al-Ghoneim and Kumar, 1996)
This method uses experimental results obtained by a set of experts on a set of datasets to generate a ranking of those experts (each expert has a rank associated with each input dataset). The ranks of each expert are then aggregated by methods such as average rank, success rate ratio, and significant wins to generate a final ranking of the experts. The final ranking can be used to select one or more suitable experts for test (unseen) data (Brazdil and Soares, 2000).
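The average-rank aggregation mentioned above can be sketched as follows, assuming the per-dataset performance is recorded as an error table (datasets as rows, experts as columns); the helper name is ours:

```python
import numpy as np

def average_rank(error_table):
    """Rank the experts on each dataset by error (1 = best) and
    average the ranks across datasets, one simple way to produce
    the final ranking described in the text."""
    errors = np.asarray(error_table, dtype=float)   # datasets x experts
    ranks = errors.argsort(axis=1).argsort(axis=1) + 1
    return ranks.mean(axis=0)
```

The expert with the smallest average rank would be selected for unseen data.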
There are no unique criteria for the selection of the mentioned combination methods. The choice mainly depends on the characteristics of the particular application at hand, e.g. the nature of the application (classification or regression), the size and quality of the training data, and the generated errors in the region of the input space. Using one combination method on an ensemble for a regression problem may generate good results; however, it may not work for a classification problem, and vice versa. Much work has been done to introduce combining methods in ensemble approaches. Major contributions are weighted majority voting (Kuncheva, 2004), decision templates (Kuncheva et al., 2001), naive Bayesian fusion (Xu et al., 1992), Dempster–Shafer combination (Ahmadzadeh and Petrou, 2003) and the fuzzy integral (Cho and Kim, 1995).
4. Fuzzy genetic algorithm (FGA) for combining
The Genetic Algorithm (GA) is a search optimization technique that mimics some of the processes of natural selection and evolution. In optimization, when a GA fails to find the global optimum, the problem is often attributed to premature convergence, which means that the sampling process converges on a local optimum rather than the global optimum. Sexual selection by means of female preferences has promoted the evolution of complex male ornaments in many animal groups. A sex-determination system is a biological system that determines the development of sexual characteristics in an organism. Most sexual organisms have two sexes. In many cases, sex determination is genetic: males and females have different alleles or even different genes that specify their sexual morphology. In a classical GA, chromosomes reproduce asexually: any two chromosomes may be parents in crossover. Gender division and sexual selection inspired a model of
gendered GA in which crossover takes place only between chromosomes of an opposite sex. In this study, a relation between the age and
fitness, as in biological systems, affecting the selection procedure is proposed. A bi-linear allocation lifetime approach is used to label the chromosomes based on their fitness values, which will then be used to characterize the diversity of the population. Inspired by the non-genetic sex-determination system that exists in some species of reptiles, including alligators and some turtles, where sex is determined by the temperature at which the egg is incubated, we divided the population into two groups, male and female, so that a male and a female can be selected in an alternating way. In each generation, the layout of the selection of males and females is different. During the sexual selection, the male chromosome is selected randomly. The label of the selected male chromosome and the population diversity of the previous generation are then applied within a set of fuzzy rules to select a suitable female chromosome (Jalali and Lee, 2009). Fuzzy systems are encountered in numerous areas of application. Fuzzy rules, for example, viewed as a generic mechanism of granular knowledge representation, are positioned at the center of knowledge-based systems. A fuzzy IF-THEN rule consists of an IF part (antecedent) and a THEN part (consequent). The antecedent is a combination of terms, whereas the consequent is exactly one term. In the antecedent, the terms can be combined by using fuzzy conjunction, disjunction and negation. A term is an expression of the form X = T, where X is a linguistic variable and T is one of its linguistic terms. In this paper, we use the linguistic variable age for chromosomes. Fig. 2 describes the linguistic variable age, where Infant, Teenager, Adult and Elderly are the linguistic values.
The system applied in our study uses triangular membership functions, the (minimum) intersection operator and the correlation-product inference procedure. Defuzzification of the outputs was performed using the fuzzy centroid method described by Kosko (1992). To find the membership function, we use the fitness value of each chromosome and the minimum, maximum and average fitness values of the population in each generation. Each chromosome has its own label determined by the age function. Let

η = (f_i − f_min)/(f_avr − f_min)  if f_i ≤ f_avr,  or  η = (f_i − f_avr)/(f_max − f_avr)  otherwise,  with  Δ = f_avr − f_i,

and

age(c_i) = L + η/n  if Δ ≥ 0;  age(c_i) = U + η/n  if Δ < 0.
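The fitness-based labelling and the triangular membership functions can be sketched as below. The function names and the exact piecewise form of `eta` follow our reading of the equations above and are illustrative rather than a verbatim transcription:

```python
def eta(f_i, f_min, f_avr, f_max):
    """Normalised fitness deviation used to age a chromosome:
    measured against the population minimum when f_i is below the
    average fitness, and against the maximum otherwise."""
    if f_i <= f_avr:
        return (f_i - f_min) / (f_avr - f_min)
    return (f_i - f_avr) / (f_max - f_avr)

def triangular(x, a, b, c):
    """Triangular membership function with peak at b and support [a, c],
    as used for the age and diversity linguistic terms."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
```

Each chromosome's age value is then fuzzified through the four age terms (Infant, Teenager, Adult, Elderly) before the fuzzy rules of Table 1 are applied.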
Fig. 2. The linguistic variable Age, its syntactic rules, and its linguistic terms Infant, Teenager, Adult and Elderly.
Fig. 3. Membership functions of the age linguistic terms Infant, Teenager, Adult and Elderly (centred near 0.25, 0.45, 0.65 and 0.85).

Fig. 4. Membership functions of the population diversity D(c_i), with linguistic terms High, Medium, Low and Very Low (centred near 2.5, 4.5, 6.5 and 8.5).

Table 1
Fuzzy rules for selecting the female chromosome.

Male age (Mage)  Diversity   Female age (Fage)
Infant           High        Elderly or adult
Infant           Medium      Adult
Infant           Low         Adult or teenager
Infant           Very low    Teenager or infant
Teenager         High        Elderly or adult
Teenager         Medium      Elderly
Teenager         Low         Adult or teenager
Teenager         Very low    Teenager or infant
Adult            High        Elderly or adult
Adult            Medium      Adult or teenager
Adult            Low         Teenager or infant
Adult            Very low    Infant
Elderly          High        Adult or teenager
Elderly          Medium      Teenager or infant
Elderly          Low         Infant
Elderly          Very low    Infant

Fig. 5. K-point and random-number (KPR) crossover between a male and a female chromosome.
Fig. 6. Variation of the female chromosome's age as the male chromosome's age and the population diversity change.
MSE_FGA = (1/n) Σ_{i=1}^{n} (w1 y1i + w2 y2i + w3 y3i + w4 y4i + w5 y5i − Ti)^2   (8)
In this function, yji is the output of the j-th expert, where j = 1, 2, ..., 5, Ti is the target value of the i-th input, and n is the number of training data. The parameters of the applied FGA are described as follows. We divided the population into two groups, male and female, so that a male and a female can be selected in an alternating way. In each generation, the layout of this selection is different. During the sexual selection, the male chromosome is selected randomly, and the label of the selected chromosome (1) and the population diversity (4) of the previous generation are then applied within the set of fuzzy rules (as in Table 1) to select a suitable female chromosome. For crossover we consider the K-point and random number (KPR) method. In this method, when a male and a female are selected, a positive integer k is selected randomly, and these parents are divided into (k + 1) parts. Then k of these parts are used for the offspring as in the usual k-point cut method, and the remaining part of each offspring is completed with random numbers of 0 or 1. Fig. 5 shows this technique for a two-point cut in the offspring. The places of these parts can be changed randomly (Jalali and Lee, 2009).
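The KPR crossover described above can be sketched as follows for binary chromosomes. The function name and the choice to swap alternating segments between the parents are illustrative assumptions; the defining feature, one segment of each offspring filled with random bits, follows the text:

```python
import random

def kpr_crossover(male, female, k, seed=None):
    """K-point-and-random-number (KPR) crossover sketch: cut the
    parents at k random points into k + 1 parts, copy k parts from
    the parents as in ordinary k-point crossover, and fill the
    remaining part of each offspring with random bits."""
    rng = random.Random(seed)
    n = len(male)
    cuts = sorted(rng.sample(range(1, n), k))
    bounds = [0] + cuts + [n]
    parts = list(zip(bounds[:-1], bounds[1:]))      # the k + 1 segments
    rand_part = rng.randrange(len(parts))           # segment left random
    child1, child2 = list(male), list(female)
    for i, (lo, hi) in enumerate(parts):
        if i == rand_part:
            child1[lo:hi] = [rng.randint(0, 1) for _ in range(hi - lo)]
            child2[lo:hi] = [rng.randint(0, 1) for _ in range(hi - lo)]
        elif i % 2 == 1:                            # alternate segments swap
            child1[lo:hi], child2[lo:hi] = child2[lo:hi], child1[lo:hi]
    return child1, child2
```

Choosing which segment is randomised (and where the swapped parts land) can itself be randomised, matching the remark that the places of the parts may change.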
Fig. 9. (a–f). Crossplots showing R between core and predicted permeability using the five training algorithms and the FGA.
Table 2
The comparison of MSE and R² for test data using five training algorithms, GA and FGA.

Algorithm   R²       MSE
LM          0.8274   0.0012
BR          0.8239   0.0012
OSS         0.7257   0.0015
RP          0.751    0.0016
SCG         0.7885   0.0015
GA          0.8438   0.001
FGA         0.8523   0.00092
References

Arabas, J., Michalewicz, Z., et al., 1994. GAVaPS: a genetic algorithm with varying population size. Evolutionary Computation, 1994: IEEE World Congress on Computational Intelligence, vol. 1, pp. 73–78.
Bhatt, A., Helle, H.B., 2002. Committee neural networks for porosity and permeability prediction from well logs. Geophys. Prospect. 50 (6), 645–660.
Brazdil, P., Soares, C., 2000. A Comparison of Ranking Methods for Classification Algorithm Selection, pp. 63–75.
Breiman, L., 1996. Bagging predictors. Mach. Learn. 24 (2), 123–140.
Chen, C.-H., Lin, Z.-S., 2006. A committee machine with empirical formulas for permeability prediction. Comput. Geosci. 32 (4), 485–496.
Cho, S.-B., Kim, J.H., 1995. An HMM/MLP architecture for sequence recognition. Neural Comput. 7 (2), 358–369.
Drucker, H., Cortes, C., et al., 1994. Boosting and other ensemble methods. Neural Comput. 6 (6), 1289–1301.
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
Freund, Y., Schapire, R., 1995. A decision-theoretic generalization of on-line learning and an application to boosting. European Conference on Computational Learning Theory, pp. 23–37.
Hansen, L.K., Salamon, P., 1990. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12 (10), 993–1001.
Haykin, S., 1999. Neural Networks: A Comprehensive Foundation. Prentice-Hall, Upper Saddle River, NJ.
Ho, T.K., Hull, J.J., et al., 1994. Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. 16 (1), 66–75.
Jacobs, R.A., 1995. Methods for combining experts' probability assessments. Neural Comput. 7 (5), 867–888.
Jalali, M., Lee, L.S., 2009. Fuzzy genetic algorithm with sexual selection (FGASS). Second Int. Conf. and Workshop on Basic and Applied Science, 2–4 June, Johor Bahru, Malaysia.
Kadkhodaie-Ilkhchi, A., Rahimpour-Bonab, H., et al., 2009a. A committee machine with intelligent systems for estimation of total organic carbon content from petrophysical data: an example from Kangan and Dalan reservoirs in South Pars Gas Field, Iran. Comput. Geosci. 35 (3), 459–474.
Kadkhodaie-Ilkhchi, A., Rezaee, M.R., et al., 2009b. A committee neural network for prediction of normalized oil content from well log data: an example from South Pars Gas Field, Persian Gulf. J. Petrol. Sci. Eng. 65 (1–2), 23–32.
Kosko, B., 1992. Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Prentice-Hall, Inc., p. 449.
Krogh, A., Vedelsby, J., 1995. Neural network ensembles, cross validation, and active learning. Adv. Neural Inf. Process. Syst. 7, 231–238.
Kuncheva, L.I., 2004. Classifier Ensembles for Changing Environments, pp. 1–15.
Kuncheva, L.I., Bezdek, J.C., et al., 2001. Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recognit. 34 (2), 299–314.
Lincoln, W., Skrzypek, J., 1990. Synergy of clustering multiple back propagation networks. Adv. Neural Inf. Process. Syst. 2, 650–657.
Naftaly, U., Intrator, N., Horn, D., 1997. Optimal ensemble averaging of neural networks. Network 8, 283–296.
Opitz, D.W., Shavlik, J.W., 1996. Actively searching for an effective neural network ensemble. Connect. Sci. 8 (3), 337–354.
Raviv, Y., Intrator, N., 1996. Bootstrapping with noise: an effective regularization technique. Connect. Sci. 8 (3), 355–372.
Rezaee, M.R., 2001. Petroleum Geology. Alavi Publications, Tehran, Iran.
Schapire, R.E., 1990. The strength of weak learnability. Mach. Learn. 5 (2), 197–227.
University of Texas at Austin Petroleum Extension Service, 1999. A Dictionary for the Petroleum Industry. Petroleum Extension Service.
Wolpert, D.H., 1992. Stacked generalization. Neural Netw. 5, 241–259.
Xu, L., Krzyzak, A., et al., 1992. Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22 (3), 418–435.