
Classification with Bayesian Networks

Damián Abalo Mirón


March 1, 2011

Abstract
The objective of this paper is to test different classifiers based on Bayesian
networks and to compare their results. The experiments described in this paper
were carried out with the laboratory software BayesiaLab[4] on the IRIS
dataset[2], which is described later.

Keywords: Bayesian Network, BayesiaLab, Classifiers, Experiment, IRIS

Contents
1 Introduction
2 Dataset
3 Experiment
3.1 Naive Bayes and Sons and Spouses
3.2 Augmented Naive Bayes
3.3 Markov Blanket, Augmented Markov Blanket and Minimal Augmented Markov Blanket
3.4 Semisupervised Learning
4 Conclusions

1 Introduction
Bayesian networks belong to the family of probabilistic graphical models used
to represent knowledge of a certain domain[3]. Each node in the graph
represents a random variable, and the edges between nodes represent their
probabilistic dependencies. In this experiment we compare different classifiers
based on Bayesian networks: naive Bayes, augmented naive Bayes, sons and
spouses, Markov blanket, augmented Markov blanket, minimal augmented Markov
blanket, and semisupervised learning.
For this purpose we use the demo version of the software BayesiaLab[4].
BayesiaLab is a Bayesian network publishing and automatic learning program that
represents expert knowledge and allows one to discover it in a mass of data.
To test and compare these classifiers we also need a dataset suited for the
classification task. This dataset is described in the following section of the
paper.

2 Dataset
The IRIS dataset[2] is perhaps the best known database in the pattern
recognition literature. Fisher's paper is a classic in the field and is
referenced frequently to this day. The dataset contains 3 classes of 50
instances each, where each class refers to a type of iris plant. One class is
linearly separable from the other 2; the latter are not linearly separable from
each other. Each instance has 4 input attributes, all expressed in centimeters:
the length and the width of the petal and of the sepal.
The only formatting applied to the data file found in the UCI Repository was
the addition of a header line with the names of the attributes. The dataset was
saved to a .data file and later imported into the BayesiaLab[4] software.
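For readers without BayesiaLab, the same data can be inspected through
scikit-learn's bundled copy of the UCI file; a minimal sketch verifying the
class and attribute structure described above:

```python
# Sketch: load the IRIS dataset and check the structure described in the text.
# Uses scikit-learn's bundled copy of the UCI data; the paper itself imports
# a .data file into BayesiaLab instead.
from collections import Counter

from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# 150 instances, 4 attributes (sepal length/width, petal length/width, in cm)
print(X.shape)             # (150, 4)
print(iris.feature_names)

# 3 classes of 50 instances each
counts = Counter(iris.target_names[c] for c in y)
print(counts)              # 50 setosa, 50 versicolor, 50 virginica
```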

3 Experiment
In this section of the paper we present the experiment results. First of all we
have to load the dataset into the software, which applies a KMeans algorithm
with 4 intervals to discretize the data. This process can be seen in the
table (1).
Once we have imported the dataset, we have an empty (edgeless) Bayesian
network (2), ready for the different learners we are going to work with.
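The discretization step can be approximated outside BayesiaLab; a minimal
sketch with scikit-learn's KBinsDiscretizer, assuming k-means binning with 4
intervals per attribute (an approximation, not BayesiaLab's exact procedure):

```python
# Sketch of the preprocessing step: discretize each continuous attribute into
# 4 intervals with k-means, approximating what BayesiaLab does on import.
from sklearn.datasets import load_iris
from sklearn.preprocessing import KBinsDiscretizer

X = load_iris().data

disc = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="kmeans")
X_disc = disc.fit_transform(X)

print(X_disc.shape)               # still (150, 4), values now in {0, 1, 2, 3}
print(sorted(set(X_disc[:, 0])))  # bin labels observed for sepal length
```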

3.1 Naive Bayes and Sons and Spouses


The result of training the network with the Naive Bayes learner can be seen
in (3). In the graph, the class (C) is probabilistically dependent on all the
other attributes: Petal Length (PL), Petal Width (PW), Sepal Length (SL) and
Sepal Width (SW). This can be expressed as P(C | PL, PW, SL, SW).

Figure 1: Table with the dataset import report.

Figure 2: Empty Bayesian network obtained from the IRIS dataset.

Figure 3: Naive Bayes and Sons and Spouses trained Bayesian network.

Figure 4: Augmented Naive Bayes trained Bayesian network.
The results obtained from this learning give a precision of 100% for iris
setosa, 90% for iris versicolor and 94% for iris virginica, for a total
precision of 94.67%. In a later section these results will be compared with
those of the other learners.
Using the Sons and Spouses learner, the sons of the class attribute form a
subset of nodes which may also have other parents; in this case, however, the
learner yields the very same solution as the first one (3), since it is a
simple solution with very high performance, having exactly the same
dependencies and precision P(C | PL, PW, SL, SW).
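A naive Bayes classifier with this exact structure can be sketched in
scikit-learn, assuming the same 4-bin k-means discretization; this
approximates, but does not reproduce, BayesiaLab's learner and its reported
94.67% figure:

```python
# Sketch of the naive Bayes structure P(C | PL, PW, SL, SW): the class is the
# single parent of every attribute node. This approximates the paper's
# pipeline (k-means discretization + naive Bayes), not BayesiaLab itself.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer

iris = load_iris()
X_disc = KBinsDiscretizer(n_bins=4, encode="ordinal",
                          strategy="kmeans").fit_transform(iris.data)

# CategoricalNB expects integer category codes
clf = CategoricalNB().fit(X_disc.astype(int), iris.target)
acc = clf.score(X_disc.astype(int), iris.target)  # in-sample, as in the paper
print(f"in-sample accuracy: {acc:.4f}")
```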

3.2 Augmented Naive Bayes


This learner forces the class attribute to depend on all the other attributes,
taking all of them into account, and additionally allows the attributes to be
correlated with each other. Again, the initial solution with the class
dependent on all the attributes performs so well that it is also the final
solution of the algorithm. We get the same graph (4), where the blue arrows
indicate enforced edges. The result is exactly the same as in the first
section, with the same performance and the same dependencies
P(C | PL, PW, SL, SW).
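The graph shared by these first three learners corresponds to the standard
naive Bayes factorization; written out (a restatement of the dependencies
above, not taken from BayesiaLab's output):

```latex
% Naive Bayes: the class C is the only parent of every attribute node,
% so the posterior factorizes over the attributes.
P(C \mid PL, PW, SL, SW) \propto P(C)\, P(PL \mid C)\, P(PW \mid C)\, P(SL \mid C)\, P(SW \mid C)
```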

3.3 Markov Blanket, Augmented Markov Blanket and Minimal Augmented Markov Blanket

The Markov blanket learner searches the neighborhood of the target node for
parents, sons and spouses: the sons are the nodes with a probabilistic
dependence on the class; the spouses are probabilistically independent of it,
but related to it through common sons; and the parents are the nodes the class
depends on. This is a very fast learning method, since it focuses entirely on
the target node. The Augmented Markov Blanket algorithm, as well as the Minimal
Augmented Markov Blanket, are extensions of it.
The graph obtained from training the network with the Markov Blanket learning
algorithm can be seen in (5). The only attribute taken into account is Petal
Width (PW), since it is the most influential one, and the class (C) depends on
it: P(C | PW). It is a very simple graph, but it obtains very good performance:
100% precision for iris setosa, 98% for iris versicolor and 90% for iris
virginica, for a total precision of 96%.
For the extended methods, Augmented Markov Blanket and Minimal Augmented
Markov Blanket, the results are exactly the same as those obtained with the
plain Markov blanket, since we do not have a large number of attributes; they
also produce the same graph (5).
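As a rough cross-check of the blanket found by BayesiaLab, one can rank the
attributes by mutual information with the class using scikit-learn (an
estimate on the continuous data, not BayesiaLab's own selection criterion);
petal width should come out on top or essentially tied with petal length:

```python
# Rank the four attributes by estimated mutual information with the class.
# The petal attributes should dominate the sepal ones, consistent with the
# single-node Markov blanket {Petal Width} found in the experiment.
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

iris = load_iris()
mi = mutual_info_classif(iris.data, iris.target, random_state=0)

for name, score in sorted(zip(iris.feature_names, mi), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```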

3.4 Semisupervised Learning


The semisupervised learning algorithm searches for relations between nodes
that lie within a certain distance of the class attribute. In the obtained
graph (6), the only attribute with a direct relation to the class attribute is
again Petal Width (PW), but this node is in turn related to the other
attributes. The dependencies between the attributes are the following:
P(C | PW), P(PW | PL) and P(PL | SW, SL).
Regarding performance, this algorithm gives exactly the same results as the
previous Markov blanket algorithm.

Figure 5: Markov Blanket, Augmented Markov Blanket and Minimal Augmented
Markov Blanket trained Bayesian network.

Figure 6: Semisupervised Learning trained Bayesian network.
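Reading the arrows of the semisupervised graph as a factorization of the joint
distribution (assuming the edge directions implied by the dependencies listed
above), we get:

```latex
% SL and SW are root nodes, PL depends on both, PW on PL, and the class on PW.
P(C, PL, PW, SL, SW) = P(SL)\, P(SW)\, P(PL \mid SL, SW)\, P(PW \mid PL)\, P(C \mid PW)
```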

4 Conclusions
All the tested learners obtain very good classification performance on the
IRIS dataset. The Naive Bayes, Sons and Spouses and Augmented Naive Bayes
learners all converge to the same graph, in which the class depends on all
four attributes, reaching a total precision of 94.67%. The Markov Blanket
learner, its augmented and minimal augmented variants, and the semisupervised
learning algorithm produce a much simpler graph in which the class depends
only on Petal Width, and yet they reach a slightly higher total precision of
96%. On a small, mostly linearly separable dataset such as IRIS, the simpler
structures are therefore not only faster to learn but also at least as precise
as the more complex ones.

References
[1] Author, Title, Journal/Editor, (year)
[2] Fisher, IRIS Dataset, UCI Machine Learning Repository, (1936)
[3] Ben-Gal I., Bayesian Networks, Encyclopedia of Statistics in Quality
and Reliability, Wiley and Sons, (2007)
[4] Bayesia SAS, BayesiaLab 5.0 Demo Version, (2010-2011)
