Project Guide: Dr. S. G. Sanjeevi (Head of the Department) Associate Professor 12/31/2011
CONTENTS
1. Introduction
   a. Bayesian Network
   b. K2 Algorithm
   c. Learning Variable Ordering (VO)
2. Previous Experiments
   a. Evolutionary Algorithms (EAs)
   b. VOGA (Variable Ordering Genetic Algorithm)
      i. What is VOGA?
      ii. How is it implemented?
      iii. Experiment
3. Scope
   a. Differential Evolution
      i. Algorithm
4. Conclusion
5. References
Shruti B 8772
Mouli C R K 8792
Divya B V 8773
INTRODUCTION
Bayesian Network:
A Bayesian Network (BN) has a directed acyclic graph (DAG) structure G. Each node in the graph corresponds to a discrete random variable in the domain. An edge Y → X in the graph describes a parent-child relation in which Y is the parent and X is the child. All parents of X constitute the parent set of X, denoted π(X). In addition to the graph, each node has a conditional probability table (CPT) specifying the probability of each possible state of the node given each possible combination of states of its parents. If a node has no parent, the table gives the marginal probabilities of the node. In the process of learning BNs from data, the BN variables represent the dataset attributes (or features). When using algorithms based on heuristic search, the initial order of the dataset attributes may be an important issue: some of these algorithms depend on this ordering to determine the direction of the arcs, such that an earlier attribute (in an ordered list) is a possible parent only of the later ones. Instead of encoding a joint probability distribution over a set of random variables, a Bayesian Classifier (BC) aims at correctly predicting the value of a designated discrete class variable given a vector of attributes (predictors). Methods for learning Bayesian Networks may be used to induce a BC, and that is what is done in this work. The BN learning algorithm applied in our experiments is based on the K2 algorithm, which constructs a BN from data using a heuristic search.
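As a concrete illustration of these definitions (our own minimal sketch, not code from the report), consider a two-node network Y → X over binary variables, where each CPT is a plain dictionary and the joint distribution factorizes along the DAG:

```python
# P(Y): marginal table for the root node Y (no parents).
p_y = {0: 0.6, 1: 0.4}

# P(X | Y): one distribution over X for each state of the parent Y.
p_x_given_y = {
    0: {0: 0.9, 1: 0.1},
    1: {0: 0.3, 1: 0.7},
}

def joint(y, x):
    """Joint probability factorizes along the DAG: P(Y, X) = P(Y) * P(X | Y)."""
    return p_y[y] * p_x_given_y[y][x]

# Sanity check: the joint distribution sums to 1 over all states.
total = sum(joint(y, x) for y in (0, 1) for x in (0, 1))
print(total)  # 1.0
```

The same factorization extends to any DAG: each node contributes one factor, conditioned on its parent set.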
K2 Algorithm:
The K2 algorithm constructs a BN from data using a heuristic search. It receives as input a complete database and a VO. Given these inputs, the K2 algorithm searches for the BN structure that best represents the database. This algorithm is commonly applied due to its performance in terms of computational complexity (time) and its good results when an adequate VO is supplied. The attribute-preorder assumption is used to reduce the number of possible structures to be learned. In this sense, K2 uses an ordered list (containing all the attributes, including the class), which asserts that only the attributes positioned before a given attribute A may be parents of A. Hence, the first attribute in the list has no parent, i.e. it is a root node in the BN.
The algorithm uses a greedy method to search for the best structure. It begins as if every node had no parent. Then, beginning with the second attribute from the ordered list (the first one is a root node), the possible parents are tested and those that maximize the whole probability structure are added to the network. This process is repeated for all attributes in order to get the best possible structure. The K2 metric used to test each possible parent set of each variable is defined by the following equation:

g(x_i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!

where the dataset D has m objects; each attribute x_i has r_i possible values; q_i is the number of distinct instantiations of π(x_i) in D; N_{ijk} is the number of objects in D in which x_i has value v_{ik} and π(x_i) is instantiated as w_{ij}; and N_{ij} = \sum_{k=1}^{r_i} N_{ijk}.
With the best structure already defined, the network's conditional probabilities are determined, using a Bayesian estimation over the (now fixed) network structure. When the dataset D has a distinguished class variable, K2 may be used as a BC learning algorithm. This is exactly our assumption.
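To make the score concrete, here is a hedged Python sketch of the metric above, computed in log space (the tables later in this report also list g as a log score, which avoids factorial overflow). The function and attribute names are our own illustrations, not code from the original work:

```python
from math import lgamma
from collections import Counter

def log_g(data, child, parents, r):
    """log g(child, parents) for a dataset given as a list of dict rows.

    r is the number of possible values of the child attribute."""
    counts = {}
    for row in data:
        w = tuple(row[p] for p in parents)        # parent instantiation w_ij
        counts.setdefault(w, Counter())[row[child]] += 1
    total = 0.0
    for c in counts.values():                     # one term per instantiation j
        n_ij = sum(c.values())                    # N_ij = sum_k N_ijk
        total += lgamma(r) - lgamma(n_ij + r)     # log[(r-1)! / (N_ij + r - 1)!]
        total += sum(lgamma(n + 1) for n in c.values())  # sum_k log(N_ijk!)
    return total

# Toy usage on a three-row dataset (attribute names are illustrative):
data = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}]
print(log_g(data, "B", ["A"], r=2))   # log(1/6) ≈ -1.7918
```

Note that `lgamma(r)` equals log((r - 1)!), so each term matches the equation factor by factor; K2 would evaluate this score once per candidate parent set and keep the best.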
Genetic Algorithms (GAs) have proven to be an effective optimization tool for many different types of problems. Several works propose hybrid GA/Bayes methods using a GA to define an adequate VO. One presented a genetic algorithm to search for the best variable ordering, in which each element of the population is a possible ordering and the fitness function is the K2 metric. Another implemented a GA for the problem of permutation of variables in BN learning and inference. A third considers a subgroup of the set of dependence/independence relations to get the variable ordering, a process guided by genetic algorithms and simulated annealing. Even though a number of works deal with this issue, most of them are designed to learn unrestricted BNs. Our GA/Bayes hybrid approach (VOGA), on the other hand, is devoted to learning Bayesian Classifiers from data. In this sense, the class variable may play an interesting role in the variable ordering definition.
PREVIOUS EXPERIMENTS
What is VOGA?
Genetic algorithms like VOGA and VOGA+ have been used to optimize the process of learning BCs from data through the identification of a suitable VO. In these genetic algorithms, each element of the population is a possible ordering and its fitness is the K2 metric (g value). Evolutionary algorithms with canonical crossover and mutation have also been used to find an appropriate VO.
How is it implemented?
VOGA generates a random initial population. Each chromosome is evaluated by the K2 algorithm, whose function g is used as the fitness function. The best chromosomes are selected, and the next generation is produced using crossover and mutation operators. The process is repeated, and for each generation the best ordering is stored. If there is no improvement after 10 generations, the algorithm stops and returns the best ordering found. The flowchart summarizes the whole process.
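The loop just described can be sketched as follows. This is a hedged illustration only: the fitness used here is a hypothetical stand-in (agreement with a known target ordering) rather than the K2 g function, which would require a dataset and the full K2 search, and the operator choices (order crossover, swap mutation, keep-best-half selection) are ours:

```python
import random

def fitness(order, target):
    # Placeholder fitness: attributes already in their target position.
    return sum(1 for a, b in zip(order, target) if a == b)

def order_crossover(p1, p2):
    """Keep a slice of p1, fill the rest in p2's order (result stays a permutation)."""
    i, j = sorted(random.sample(range(len(p1)), 2))
    hole = set(p1[i:j])
    rest = [g for g in p2 if g not in hole]
    return rest[:i] + p1[i:j] + rest[i:]

def mutate(order, rate=0.1):
    order = order[:]
    if random.random() < rate:
        a, b = random.sample(range(len(order)), 2)
        order[a], order[b] = order[b], order[a]   # swap two positions
    return order

def voga_like(attrs, target, pop_size=20, patience=10):
    pop = [random.sample(attrs, len(attrs)) for _ in range(pop_size)]
    best, stall = max(pop, key=lambda o: fitness(o, target)), 0
    while stall < patience:                       # stop after 10 stagnant generations
        pop.sort(key=lambda o: fitness(o, target), reverse=True)
        parents = pop[: pop_size // 2]            # selection: keep the best half
        children = [mutate(order_crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
        gen_best = max(pop, key=lambda o: fitness(o, target))
        if fitness(gen_best, target) > fitness(best, target):
            best, stall = gen_best, 0
        else:
            stall += 1
    return best

random.seed(0)
best = voga_like(list("ABCDE"), target=list("ABCDE"))
# `best` is a permutation of the attributes, returned after 10 stagnant generations.
```

In actual VOGA, `fitness` would run K2 on the dataset with the chromosome as the VO and return the resulting g score.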
[Flow chart: Start → Read data → Chromosomes Evaluation → Selection → Crossover and Mutation → Chromosomes Evaluation → Stop? → Returns the best VO → End]

In addition to the aforementioned VOGA algorithm, a slightly different version, namely VOGA+, was implemented, in which the initial population is not randomly generated. In VOGA+, more information about the class variable is used to optimize the initial population and, therefore, to try to obtain better BC structures (mainly in domains having many attributes). In order to define the VO of the initial population chromosomes, the χ² (chi-squared) statistical test is performed on each variable jointly with the class variable (for this reason, VOGA+ can only be applied in a classification context, where there is a distinguished variable, namely the class variable). Thus, the strength of the dependence relationship between each variable and the class can be measured. Subsequently, the variables are ordered decreasingly according to their χ² scores: the first variable in the ordered list has the highest χ² score, i.e. it is the most dependent on the class. Obviously, the relation between the χ² statistical test and the best VO may not hold strictly, but previous work shows that good results can be achieved using this heuristic. Having defined the VO given by the χ² statistical test, all initial population chromosomes are set to this VO (all chromosomes are identical).
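The VOGA+ seeding step can be sketched as below, under stated assumptions: a plain χ² statistic is computed from each attribute-class contingency table (no significance test is applied, only the raw score is used for ranking), and all names are illustrative:

```python
from collections import Counter

def chi2(data, attr, cls="class"):
    """Chi-squared statistic of the (attr x class) contingency table."""
    n = len(data)
    joint = Counter((row[attr], row[cls]) for row in data)
    a_tot = Counter(row[attr] for row in data)
    c_tot = Counter(row[cls] for row in data)
    score = 0.0
    for a in a_tot:
        for c in c_tot:
            expected = a_tot[a] * c_tot[c] / n          # independence assumption
            observed = joint.get((a, c), 0)
            score += (observed - expected) ** 2 / expected
    return score

def chi2_ordering(data, attrs, cls="class"):
    # Decreasing order: the most class-dependent attribute comes first.
    return sorted(attrs, key=lambda a: chi2(data, a, cls), reverse=True)

# Toy dataset where A perfectly predicts the class and B is independent of it:
data = [
    {"A": 0, "B": 0, "class": 0},
    {"A": 0, "B": 1, "class": 0},
    {"A": 1, "B": 0, "class": 1},
    {"A": 1, "B": 1, "class": 1},
]
print(chi2_ordering(data, ["B", "A"]))  # ['A', 'B']
```

All initial chromosomes in VOGA+ would then be set to this single ordering before the GA loop starts.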
Experiment:
Seven domains were used in our simulations: two well-known Bayesian Network domains (Engine Fuel System and Asia) and five benchmark problems from the U. C. Irvine repository, namely Balance, Breast-w, Congressional Voting Records (Voting), Vehicle and Iris. The following table summarizes the dataset features, with dataset name (Data), number of attributes plus class (AT), number of instances (IN) and number of classes (CL).

Data      AT  IN     CL
Asia       8  15000   2
Balance    5  625     3
Breast-w  10  683     2
Engine     9  15000   2
Iris       5  150     3
Vehicle   19  846     4
Voting    17  232     2
The experiments were conducted following the steps below.
1. Initially, the datasets were used as input to the K2 algorithm. The VO was the original one given in the file. The Bayesian score (g) obtained for each dataset was stored.
2. The same datasets used in step 1 were used as input to VOGA and VOGA+. The Bayesian score (g) obtained for each dataset and the number of generations necessary to reach the solution were stored.
Results achieved in steps 1 and 2 are presented in the following tables. The first table shows the Bayesian score (g function) of each achieved Bayesian Network structure; for each dataset, the best result is the highest (least negative) score.

Data      K2      VOGA    VOGA+
Asia      -33610  -33610  -33608
Balance   -4457   -4457   -4457
Breast-w  -8159   -8159   -8159
Engine    -33809  -33755  -33755
Iris      -2026   -2026   -2026
Vehicle   -10357  -10006  -9956
Voting    -1749   -1727   -1724
Analyzing the results presented in the table above, it is possible to infer that, as far as the Bayesian score (g function) is concerned, in all performed experiments VOGA produced results at least as good as the ones produced by K2, and in 3 out of the 7 datasets VOGA improved on the results obtained using K2. In addition, VOGA+ performed at least as well as VOGA, and in 3 out of the 7 datasets VOGA+ improved on the results obtained using VOGA. Another interesting issue revealed in Table 2 is that the datasets having a higher number of attributes, namely Vehicle (19 attributes) and Voting (17 attributes), favored the proposed method (VOGA), mainly when using the enhanced version VOGA+. The following table shows the number of generations needed by VOGA and VOGA+ for each dataset.

Data      VOGA  VOGA+
Asia      11    19
Balance   11    11
Breast-w  11    11
Engine    13    12
Iris      11    11
Vehicle   11    15
Voting    38    6
Where the number of generations is concerned, in 4 out of the 7 datasets (Balance, Breast-w, Engine and Iris) VOGA and VOGA+ presented (mostly) the same results. The other 3 datasets (Asia, Vehicle and Voting) revealed that, when the number of generations was not the same for VOGA and VOGA+, the Bayesian score obtained by the latter was always better.
SCOPE
Scope: replacing the Genetic Algorithm with the Differential Evolution (DE) algorithm, for better convergence and (possibly) a better Variable Ordering.
Differential Evolution:
A basic variant of the DE algorithm works by having a population of candidate solutions (called agents). These agents are moved around in the search-space by using simple mathematical formulae to combine the positions of existing agents from the population. If the new position of an agent is an improvement it is accepted and forms part of the population, otherwise the new position is simply discarded.
Algorithm:
Differential Evolution Algorithm: Let x designate a candidate solution (agent) in the population, with n the dimensionality of the problem to be optimized. The basic DE algorithm can then be described as follows:
1. Initialize all agents with random positions in the search-space.
2. Until a termination criterion is met (e.g. number of iterations performed, or adequate fitness reached), repeat the following for each agent x in the population:
   a. Pick three agents a, b and c from the population at random; they must be distinct from each other as well as from agent x.
   b. Pick a random index R ∈ {1, ..., n}.
   c. Compute the agent's potentially new position y = (y_1, ..., y_n) as follows: for each i, pick a uniformly distributed number r_i ∈ [0, 1); if r_i < CR or i = R, set y_i = a_i + F(b_i - c_i), otherwise set y_i = x_i.
   d. If f(y) ≤ f(x), replace agent x in the population with the improved candidate y.
3. Pick the agent from the population that has the highest fitness or lowest cost and return it as the best found candidate solution.

Note that F ∈ [0, 2] (the differential weight) and CR ∈ [0, 1] (the crossover probability) are selectable by the practitioner, along with the population size.
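The steps above can be sketched as a runnable DE loop in continuous space. This is a hedged illustration: the sphere objective, the bounds and the parameter values are our own choices for demonstration, and applying DE to a VO (a permutation) would additionally require a decoding scheme such as ranking the continuous components:

```python
import random

def sphere(x):
    """Illustrative objective: minimum 0 at the origin."""
    return sum(v * v for v in x)

def differential_evolution(f, n, pop_size=20, F=0.8, CR=0.9, iters=200,
                           lo=-5.0, hi=5.0):
    random.seed(42)  # fixed seed so the sketch is reproducible
    pop = [[random.uniform(lo, hi) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(iters):
        for idx, x in enumerate(pop):
            # Three agents distinct from each other and from x.
            a, b, c = random.sample([p for i, p in enumerate(pop) if i != idx], 3)
            R = random.randrange(n)   # index that always takes the mutant value
            y = [a[i] + F * (b[i] - c[i]) if (random.random() < CR or i == R)
                 else x[i]
                 for i in range(n)]
            if f(y) <= f(x):          # greedy replacement keeps the better agent
                pop[idx] = y
    return min(pop, key=f)            # lowest-cost agent is the final answer

best = differential_evolution(sphere, n=3)
print(sphere(best))  # near 0
```

Unlike the GA in VOGA, DE needs no separate crossover and mutation operators: the difference vector F(b - c) provides the perturbation, which is one motivation for trying it here.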
CONCLUSION
Future experiments will apply differential evolution to find a suitable variable ordering and (possibly) to extend the results to unrestricted Bayesian Networks.
REFERENCES
SANTOS, E. B.; HRUSCHKA JR.; EBECKEN. Evolutionary Algorithm using Random Multi-point Crossover Operator for Learning Bayesian Network Structures. In: 9TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, 2010.
SANTOS, E. B.; HRUSCHKA JR., E. R. VOGA: Variable Ordering Genetic Algorithm for Learning Bayesian Classifiers. In: 6TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS 2006), 2006, Auckland.