
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)

Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 2, March April 2013 ISSN 2278-6856

Comparative Study on Uncertain Knowledge Based Frameworks


Mona Gamal¹, Ahmed Abo El-Fatoh², Shereef Barakat³

¹,²,³ Mansoura University, Faculty of Computer and Information Sciences, Information System Department, Egypt

Abstract: Uncertain knowledge management is a difficult and persistent problem. Many researchers have tried to handle it with traditional probabilistic and mathematical methods, but these methods are very complicated. Soft computing algorithms can replace the traditional techniques in handling uncertain knowledge problems. To be completely automated, an uncertain knowledge based system framework has to handle three problems. First, the variables that deal with uncertainty and vagueness need to be prepared well (for fuzzy variables, this means generating the fuzzy membership functions). Second, an accurate and efficient set of rules has to be found; this rule set serves as the system equation. Third, the system equation has to be enhanced to show what the system will look like as a time series. This paper discusses the problems that face knowledge based frameworks and introduces different frameworks, with the advantages and disadvantages of each approach and a comparison of their accuracy rates.

Keywords: Knowledge Based Systems, Uncertain Knowledge, Fuzzy Rule Based Systems, Fuzzy Rough Rule Based System, Parallel Genetic Algorithms, Fuzzy Membership Generation, Self Organized Feature Maps, Fuzzy Cellular Automata.

1. INTRODUCTION

A knowledge based system (KBS) [15.] is a framework for extracting valuable information from concrete data sets. These frameworks used to be designed manually or semi-manually (part manual and part automated). Artificial intelligence algorithms (soft computing algorithms) [7.] can work together to completely automate the implementation of a KBS. Uncertain, incomplete and imprecise data force the KBS to deal with vague concepts. Fuzzy systems [6.][9.] are considered to work very efficiently with these vague concepts, and fuzzy rough systems are a competitive hybrid for dealing with uncertainty. Many soft computing techniques have been used to solve various KBS problems. For example, Genetic Algorithms (GAs) [2.][3.][16.][22.] have been used for optimization and search problems. Rough sets [20.] and Artificial Neural Networks [8.] are very good tools for classification and prediction problems. They are efficient in dealing with discrete data; however, the real world deals with values like tall, short, normal, abnormal and so on. Fuzzy set theory [8.][10.], which deals with linguistic values of the variables, is therefore used to create rules which handle the linguistic world's problems.

The hybridization of fuzzy systems with various soft computing techniques is an active research topic. Hybrids such as fuzzy rough, fuzzy neural, fuzzy genetic algorithms and many others are very powerful in dealing with uncertain knowledge in linguistic form, away from the complicated mathematical calculations of probabilities. The fuzzy rough hybrid [17.] is of particular interest for building equivalence classes with soft boundaries and degrees of membership of the objects inside these classes.

After the development of the KBS, the system equation is the rule set that represents the data space under consideration. This system equation can be enhanced to produce a system equation in time series, which is important for future recommendations and for covering the whole data space with rules instead of objects. Fuzzy cellular automata [4.] are parallel processing systems composed of a set of interconnected cells. These systems can be used efficiently to build a grid of rules in time sequence based on initial if-then rules produced by a soft computing rule generating technique.

This paper introduces a comparative study of knowledge based systems that use different approaches. Section 2 discusses knowledge based system framework problems such as variables preparation (membership function generation), rule set generation and system equation enhancement. Section 3 introduces some knowledge based system frameworks with a short explanation as well as the advantages and disadvantages of each framework. Section 4 compares the previously explained approaches and provides the accuracy rates for different data sets. Section 5 gives the conclusion.

2. Uncertain Knowledge Based System


Knowledge Discovery (KD) [14.] is the process of extracting valuable knowledge from concrete data sets. Knowledge based systems (KBS) are frameworks which implement the KD process using appropriate soft computing techniques. Three major problems face the KBS: variables preparation (especially for fuzzy variables, because of uncertain situations), rule set generation (fuzzy and fuzzy rough rules are convenient for the same reason) and the enhancement of the system equation into a time series.

2.1 Fuzzy Variables Preparation (Membership Generation)

Fuzzy variables are variables that have fuzzy values which belong to a set of fuzzy subsets. These fuzzy subsets are called labels, terms or words that define the fuzzy variable. The fuzzy values (elements) of a variable partially belong to the fuzzy subsets. The degree to which an element belongs to a fuzzy subset is called the membership degree. This degree of membership is characterized by a fuzzy membership function

$$\mu_A(u) : U \rightarrow [0, 1] \tag{1}$$

where U is called the universe and A is a fuzzy subset of U. Membership values are real numbers in the interval [0,1], where 0 means that the element does not belong to the subset A and 1 means that the element entirely belongs to the subset. Fuzzy membership functions can be represented in many forms, one of which is the analogue form. The definition of the membership function used to be collected from experts. Using clustering techniques such as self organized feature maps to find the subsets in the data space of a fuzzy variable is a good way to obtain the variable value and the membership degree of that value in the feature subsets of the data space [1.].

2.2 Rule Set Generation (Fuzzy & Fuzzy Rough Rules)

Fuzzy rule generation is an interesting topic, and many researchers have tried to find a model for building such rules. At first, fuzzy if-then rules were usually derived from human experts, but gathering these rules from an expert was very difficult, and the rules might be affected by the expert's perspective. Therefore, many approaches were proposed to automatically generate fuzzy if-then rules from training data sets. These approaches focused on the accuracy of the rules and the speed of generating them. There are many types of fuzzy rules used for building knowledge based fuzzy systems. One of them is the Zadeh-Mamdani fuzzy rule [13.], an if-then rule whose conditions and decisions both consist of fuzzy variables that belong to some fuzzy sets with some degree of membership:

IF x is A, THEN y is B

where (x is A) and (y is B) are two fuzzy propositions; x and y are fuzzy variables defined over universes of discourse U and V respectively; and A and B are fuzzy sets defined by their fuzzy membership functions $\mu_A(u) : U \rightarrow [0, 1]$ and $\mu_B(v) : V \rightarrow [0, 1]$.

Finding a good, accurate and efficient set of rules is the core of any KBS, and automating this process is a tricky problem.

2.3 Enhancing The Uncertain Knowledge Based System

The set of fuzzy rough rules is the system equation and can be represented graphically on a d-dimensional grid. The grid dimensions are decided by the number of conditional attributes in the core set. This system equation has some error rate, its rules do not fully cover the data space, and it does not contribute to future recommendations because it does not depend on a time parameter. A fuzzy cellular automaton (FCA) [4.] is a parallel distributed processing system that builds a grid of cells from some initial fuzzy configuration over time sequence iterations according to some fuzzy transition function (update rule). The system equation (the rule set) can be used as the initial configuration for the FCA. During the iterations of the FCA, each cell state is formed according to the fuzzy n4V1 nonstable update rule (transition function) [5.]. This transition function is a fuzzification of the regular n4V1 nonstable update rule [1.], obtained by replacing the Boolean operators AND, OR and NOT by their fuzzy extensions.
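As a rough illustration of the fuzzification described in Section 2.3, the sketch below replaces Boolean AND, OR and NOT with one common choice of fuzzy extensions (min, max and 1 - x) and iterates a one-dimensional fuzzy cellular automaton. The text does not define the n4V1 nonstable rule or say which fuzzy operators are used, so the local rule here (an exclusive-or of the two neighbours), the min/max operators, the ring-shaped grid and all function names are assumptions made only for illustration.

```python
from typing import List


def f_and(a: float, b: float) -> float:
    """Fuzzy AND (assumed here to be the minimum operator)."""
    return min(a, b)


def f_or(a: float, b: float) -> float:
    """Fuzzy OR (assumed here to be the maximum operator)."""
    return max(a, b)


def f_not(a: float) -> float:
    """Fuzzy NOT (standard complement)."""
    return 1.0 - a


def fuzzy_update(left: float, right: float) -> float:
    """Fuzzified stand-in rule: (left AND NOT right) OR (NOT left AND right).

    This is NOT the n4V1 nonstable rule; it is a hypothetical Boolean rule
    used only to show how the operators are swapped.
    """
    return f_or(f_and(left, f_not(right)), f_and(f_not(left), right))


def step(cells: List[float]) -> List[float]:
    """Update every cell in parallel from its two neighbours on a ring."""
    n = len(cells)
    return [fuzzy_update(cells[(i - 1) % n], cells[(i + 1) % n]) for i in range(n)]


def run(initial: List[float], iterations: int) -> List[List[float]]:
    """Return the initial configuration followed by one row per iteration."""
    history = [list(initial)]
    for _ in range(iterations):
        history.append(step(history[-1]))
    return history


if __name__ == "__main__":
    # Initial configuration: membership degrees in [0, 1] (made-up values).
    for row in run([0.2, 0.9, 0.5, 0.0], iterations=3):
        print(row)
```

With crisp states (0 or 1) the fuzzified rule behaves like its Boolean original, while intermediate states yield intermediate degrees. Each row of the returned history is one time step of the grid.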

3. UNCERTAIN KNOWLEDGE BASED FRAMEWORKS

3.1 Knowledge Based System Using C4.5

This system uses C4.5 as a machine learning classifier. It simply applies the data set features to the C4.5 algorithm, which generates a tree representation of the data space. The system is simple, but: (1) classification through this system has a very high error rate; (2) the system does not handle uncertain data sets and vague concepts; (3) the attribute reduction problem is not solved.

3.2 Knowledge Based System Using Neural Networks

This framework is a transform hybrid model composed of a preprocessing sub-module and an Artificial Neural Network (ANN) [8.] training module. It has three main modules: (1) Data Preprocessing: this module is responsible for managing the medical data introduced to the system for further classification. The management includes data reduction, which removes the unnecessary attributes that do not contribute to the classification process. (2) The Neural Network Training Module: this module takes the training data set and applies it to the network structure to modify the connecting weights and produce the final network structure. (3) The Neural Network Testing Module: this module applies the testing data set to the network and computes the accuracy in terms of the absolute error rate.

This framework uses the preprocessing phase to remove the redundant conditional features, and a soft computing technique (ANN) to find the knowledge patterns in the training data sets, which may contain some uncertainty and imprecision. However, the reasoning of the ANN, and how it actually works, is difficult to explain, and the classification rules cannot be extracted by a straightforward procedure.

3.3 Building Fuzzy Rules By a Hybrid of Self Organized Feature Maps & Parallel Genetic Algorithms

The fuzzy rules are simple if-then rules but with fuzzy variables. The framework "A Hybrid of Self Organized Feature Maps and Parallel Genetic Algorithms for Uncertain Knowledge" [12.] tries to build these fuzzy rules in two phases. The first phase generates the membership function for the subsets of the fuzzy variables. The second phase designs the fuzzy rules using a set of training data and checks their applicability on the test data. Self Organized Feature Maps (SOFM) [1.][18.], as a clustering mechanism, and Parallel Genetic Algorithms (PGAs) [19.][21.], as a parallel evolutionary searching technique, are used for generating the membership functions [1.] and finding the fuzzy rules respectively.
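The membership generation phase of this framework can be sketched with a very small one-dimensional self organized feature map. This is a hedged illustration and not the procedure of the cited works: the neighbourhood schedule, the learning rate, the inverse-distance conversion from prototypes to membership degrees, the sample values and the names `train_sofm_1d` and `memberships` are all assumptions.

```python
import random
from typing import List, Sequence


def train_sofm_1d(values: Sequence[float], n_units: int = 3, epochs: int = 50,
                  lr0: float = 0.5, seed: int = 0) -> List[float]:
    """Cluster the values of one feature with a tiny 1-D SOFM.

    Returns one prototype per fuzzy subset (for example low, medium, high).
    """
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    # Start with prototypes spread evenly over the observed range.
    units = [lo + (hi - lo) * (i + 0.5) / n_units for i in range(n_units)]
    data = list(values)
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)          # decaying learning rate
        # Direct neighbours of the winner learn only in the first half.
        neighbour_rate = 0.5 if epoch < epochs // 2 else 0.0
        rng.shuffle(data)
        for x in data:
            # Best matching unit: the prototype closest to the sample.
            bmu = min(range(n_units), key=lambda i: abs(x - units[i]))
            for i in range(n_units):
                gap = abs(i - bmu)
                h = 1.0 if gap == 0 else (neighbour_rate if gap == 1 else 0.0)
                units[i] += lr * h * (x - units[i])
    return sorted(units)


def memberships(x: float, units: Sequence[float]) -> List[float]:
    """Assumed conversion: closeness to each prototype, normalised to sum to 1."""
    closeness = [1.0 / (1e-9 + abs(x - u)) for u in units]
    total = sum(closeness)
    return [c / total for c in closeness]


if __name__ == "__main__":
    feature = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]  # made-up values
    prototypes = train_sofm_1d(feature)
    print("prototypes:", prototypes)
    print("degrees for 4.0:", memberships(4.0, prototypes))
```

Each returned degree lies in [0, 1], in line with the definition of a membership function in Section 2.1. A real system would pick the number of units and the membership shape to suit the data.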
The training data, together with the feature data collected from experts, are the inputs to the process that generates the fuzzy membership functions for the feature subsets. This process outputs a data file containing the values of the fuzzy variables and their corresponding membership degrees in the subsets of these variables. These fuzzy membership degrees and the training data are then used as inputs to the fuzzy rule generation process, which outputs the corresponding fuzzy rule set after testing the rules against the test data records. The main components are:

Generating fuzzy membership function for features subsets: This module uses the unsupervised learning and clustering capabilities of the SOFM to generate the membership functions of the feature subsets.

Generating fuzzy rules: This module uses the PGA search mechanisms to build and search for the best fuzzy rules using its fitness function, and then tests the resulting rules against the prepared testing data.

The framework uses soft computing techniques to solve knowledge based system problems such as preparing the fuzzy membership functions for the fuzzy variables and finding the best set of fuzzy rules to represent the data set under consideration. However, the PGA does not guarantee finding the optimal set of fuzzy rules, and its running time becomes a problem for large data sets and for an increasing number of conditional attributes. In addition, it is not clearly stated how the feature reduction process is accomplished, and the rule set produced by the PGA (the rule based system equation) is not enhanced to find rules that depend on a time parameter.

3.4 A Fuzzy Rough Rule Based System Enhanced By Fuzzy Cellular Automata

This framework generates a fuzzy rough rule based system and enhances it using fuzzy cellular automata. The fuzzy rough rules are simple if-then rules but with fuzzy rough variables.
This research tries to build these fuzzy rough rules in three phases. The first phase generates the membership function for the subsets of the fuzzy variables. The second phase reduces the features using the fuzzy membership dependency between the features and the data set. The third phase designs the fuzzy rough rules by summarizing the data of the reduced features based on rough set theory, and then tests the efficiency of these rules on the test data set.

The training data, together with the feature data collected from experts, are the inputs to the process that generates the fuzzy membership functions for the feature subsets. This process outputs a data file containing the values of the fuzzy variables and their corresponding membership degrees in the subsets of these variables. These fuzzy membership degrees and the training data are then used as inputs to the feature reduction process, which measures the attribute dependency and produces the core attributes. The fuzzy rough rule generation process takes the core attributes (the reduced training data set) and summarizes the data set to output the corresponding fuzzy rough rule set after testing the rules against the test data records. The set of fuzzy rough rules is used as the initial state of the fuzzy cellular automata parallel system, which enhances the rule based system and produces what can be described as an equation of the system over the time sequence. The main components are:

Generating fuzzy membership function for features subsets: This module uses the unsupervised learning and clustering capabilities of the SOFM to generate the membership functions of the feature subsets.

Reducing Features: This module uses the fuzzy rough attribute reduction (FRAR) algorithm to reduce the features, based on measuring the dependency membership degree between the fuzzy variables and the training data set, to produce the reduct (core attributes).
Generating fuzzy rough rules: This module summarizes the reduced data set according to fuzzy rough theory to generate the corresponding fuzzy rough rules. These rules are applied to the testing data set to measure their accuracy rate.

Enhancing Rule Set using Fuzzy Cellular Automata: This module takes the fuzzy rough rule set as an initial state and iterates according to the fuzzy n4V1 nonstable update rule to produce the view of the fuzzy rough rule based system in the time sequence. These views of the system may give good recommendations to the experts in the field of the data under consideration (training data) and cover the whole data space with accurate rules for further classification or prediction.

This framework clearly states the knowledge based system development problems and their solutions using soft computing techniques. It also uses the hybrid of fuzzy rough rule sets, which combines fuzzy systems, which deal efficiently with uncertainty, and rough sets, which handle conditional attribute reduction effectively. The enhancement of the system equation using fuzzy cellular automata to produce a time series equation is another advantage of this framework. However, fuzzy cellular automata on grids of more than 2 or 3 dimensions raise time and space complexity problems which need to be solved.

4. Comparison between Different Approaches

The comparison between different knowledge based system (KBS) frameworks aims to provide a perspective on how these frameworks solve the problems which face KBS development. This study compares the frameworks in terms of their accuracy rates for different data sets. The accuracy measure used in this work is the proportion of data objects correctly classified out of all objects classified in the prepared test sets.

The data sets used in this research to test the model are taken from the UCI machine learning repository (accessed on 1 August 2012), and their properties are given in Table 1. The records of each data set are divided into two equal parts (one for the training data and one for the test data).

Table 1: Description of the data sets properties

| Name of the data set | No. of attributes | No. of continuous attributes | No. of categorical attributes | No. of data records | No. of classes |
|---|---|---|---|---|---|
| Weather | 4 | 2 | 2 | 14 | 2 |
| Breast Cancer | 10 | 10 | 0 | 699 | 2 |
| Wine | 13 | 13 | 0 | 168 | 3 |
| Liver | 6 | 6 | 0 | 345 | 2 |
| Iris | 4 | 4 | 0 | 150 | 3 |

Figure 1 shows the comparison between the classification accuracy of the four KB frameworks. The first and second frameworks use the C4.5 and ANN classifiers respectively to extract knowledge from the data sets without feature preprocessing (feature membership generation and reduction). The third framework applies the SOFM to prepare the fuzzy variables and the PGA to find the fuzzy rules. The fourth framework also applies the SOFM for fuzzy variable preparation, then a fuzzy rough reduction algorithm, finds the fuzzy rough rules using a data summarizing technique, and finally uses fuzzy cellular automata to enhance the fuzzy rough rule set. Figure 1 plots the classification accuracy on the Y axis against the data sets on the X axis. The classification criterion is to build a set of classification rules on the core attributes of the training set and to test this rule set on the test set. Table 2 shows the comparison numerically.

Figure 1: Classification Accuracy (Y axis: accuracy, 0 to 120; X axis: data set; series: C4.5, neural, SOM + PGA, fuzzy rough)

Table 2: Comparison between the proposed model and other techniques found in the field of generating fuzzy rules

| Data set | C4.5 | ANN | SOFM + PGA | SOFM + Fuzzy Rough |
|---|---|---|---|---|
| Iris | 84.5 | 91.2 | 81.3 | 89.3 |
| Weather | 68 | 78 | 71.4 | 100 |
| Liver | 49 | 47 | 60.7 | 60.1 |
| Breast Cancer | 95.1 | 95.7 | 94.2 | 97 |
| Wine | 82 | 89 | 64.3 | 88.5 |

The comparison shows that the fuzzy and fuzzy rough frameworks (hybrid models) compete with each other for accuracy and are more efficient than the classification approaches (C4.5 and ANN with feature reduction preprocessing). Among the hybrid models, the fourth (A Fuzzy Rough Rule Based System Enhanced By Fuzzy Cellular Automata) proved its ability on most test sets.
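The accuracy measure and the figures reported in Table 2 can be explored with a few lines of arithmetic. The sketch below is only an illustration: the dictionary holds the values transcribed from Table 2, while the function names and the choice to summarize each framework by its unweighted mean over the five data sets are assumptions, not part of the original study.

```python
from typing import Dict

# Accuracy (%) per framework and data set, transcribed from Table 2.
TABLE_2: Dict[str, Dict[str, float]] = {
    "C4.5": {"Iris": 84.5, "Weather": 68.0, "Liver": 49.0,
             "Breast Cancer": 95.1, "Wine": 82.0},
    "ANN": {"Iris": 91.2, "Weather": 78.0, "Liver": 47.0,
            "Breast Cancer": 95.7, "Wine": 89.0},
    "SOFM + PGA": {"Iris": 81.3, "Weather": 71.4, "Liver": 60.7,
                   "Breast Cancer": 94.2, "Wine": 64.3},
    "SOFM + Fuzzy Rough": {"Iris": 89.3, "Weather": 100.0, "Liver": 60.1,
                           "Breast Cancer": 97.0, "Wine": 88.5},
}


def accuracy(correct: int, total: int) -> float:
    """Proportion of test objects classified correctly, as a percentage."""
    return 100.0 * correct / total


def mean_accuracy(table: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean accuracy of each framework over all data sets."""
    return {fw: sum(row.values()) / len(row) for fw, row in table.items()}


def best_per_data_set(table: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Name of the most accurate framework for each data set."""
    data_sets = next(iter(table.values())).keys()
    return {ds: max(table, key=lambda fw: table[fw][ds]) for ds in data_sets}


if __name__ == "__main__":
    for framework, mean in mean_accuracy(TABLE_2).items():
        print(f"{framework}: {mean:.2f}")
    print(best_per_data_set(TABLE_2))
```

As a sanity check on the measure, `accuracy(5, 7)` rounds to 71.4, which would correspond to the Weather entry for SOFM + PGA if the 14 records are split in half as described.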

5. CONCLUSION
Knowledge based systems are very important tools in the data mining field because they extract knowledge from concrete uncertain data. Rule based systems are a branch of knowledge based systems that depend on a set of rules to extract information (consequences) from data (propositions). Uncertainty in the data requires a special kind of rule. A number of problems face KBS development. The first is the variables preparation process: the variables should be able to deal with uncertainty and vagueness. Fuzzy variables are very good at dealing with such situations, but to define them the fuzzy membership functions must be specified efficiently, which is a challenging procedure. The second problem is to find the set of rules which implements the classification process under uncertain and vague circumstances. Fuzzy and fuzzy rough rules are very efficient in dealing with uncertainty, so finding such a rule set solves this problem. The classification rule set works as the KBS equation, and the third problem is to enhance this equation so that it works with a time parameter and helps in predicting future recommendations in the data space under consideration.

This paper showed that preparing the fuzzy variables (generating the fuzzy membership functions) and using rule sets that deal efficiently with uncertainty and vagueness (fuzzy and fuzzy rough rule sets) increase KBS performance in terms of the accuracy obtained in the classification phase. Enhancing the set of rules that represents the system equation helps in generating an equation that depends on a time parameter, in other words a time series equation. Such equations help in predicting the future output of the system and could be very helpful in giving good advice or recommendations to the experts in the field of the data set under consideration.

The comparison covered two new KBS frameworks and two traditional techniques. The first new framework uses the SOFM to generate membership functions and the PGA to find the set of fuzzy rules. The second uses the SOFM for the same purpose, fuzzy rough reduction, data summarizing techniques to find the fuzzy rough rules, and fuzzy cellular automata to enhance the resulting rule set. The traditional techniques are C4.5, used to extract the knowledge patterns from the data sets, and ANN, used for the same purpose but with a preprocessing phase based on ordinary feature reduction techniques. In this comparison, the hybrid KBS with fuzzy rough feature reduction proved its ability to increase the classification accuracy (system performance). The new systems also enhance the rule set and are hence capable of providing future recommendations in the data space under consideration.

REFERENCES
[1.] A. Piwonska and F. Seredynski, "Solving Two-Dimensional Binary Classification Problem with Use of Cellular Automata", in AUTOMATA, the 17th International Workshop on Cellular Automata and Discrete Complex Systems, Proceedings, Santiago, Chile, 2011.
[2.] Chih-Chung Yang, N.K. Bose, Generating fuzzy membership function with self-organizing feature map, Letters Vol. 1, pp. 356-365, April 2006.
[3.] D. E. Goldberg, Genetic algorithms in search, optimization, and machine learning, Addison-Wesley, 412, 1989.
[4.] Fevrier Valdez, Patricia Melin and Oscar Castillo, Evolutionary method combining Particle Swarm Optimisation and Genetic Algorithms using fuzzy logic for parameter adaptation and aggregation: the case neural network optimization for face recognition, IJAISC, Vol. 2(1/2), pp. 77-102, 2010.
[5.] H. Betel and P. Flocchini, "On the Relationship between Boolean and Fuzzy Cellular Automata", 2009.
[6.] H. Ishibuchi, K. Nozaki and H. Tanaka, Adaptive Fuzzy Rule-Based Classification Systems, IEEE Trans. on Fuzzy Systems, Vol. 4, no. 3, pp. 238-250, 1996.
[7.] J. H. Holland, Adaptation in natural and artificial systems, University of Michigan Press, Ann Arbor, MI, 1975.
[8.] Janusz Kacprzyk, "Studies in Fuzziness and Soft Computing", ISBN 978-3540737223, ISSN: 1434-9922 (Print) 1860-0808 (Online), Springer Berlin / Heidelberg, 2009.
[9.] Kenji Suzuki, "Artificial Neural Networks: Methodological Advances and Biomedical Applications", InTech, ISBN-13: 9789533072432, 2011.
[10.] Lotfi A. Zadeh, "From computing with numbers to computing with words - from manipulation of measurements to manipulation of perceptions", in International Journal of Applied Math and Computer Science, Vol. 12, no. 3, pp. 307-324, 2002.
[11.] Lotfi A. Zadeh, "Fuzzy sets and systems". In: Fox J, editor. System Theory. Brooklyn, NY: Polytechnic Press, pp. 29-39, 1965.
[12.] Mariusz Nowostawski, Riccardo Poli, Parallel Genetic Algorithms Taxonomy, Proceedings of Third International Conference on Knowledge-based Intelligent Information Engineering Systems KES'99, Adelaide, South Australia, 31 August - 1 September 1999.
[13.] Mona Gamal, Ahmed Abo El-fatoh, Shereef Barakat and Elsayed Radwan, "A Hybrid of Self Organized Feature Maps and Parallel Genetic Algorithms for Uncertain Knowledge", International Journal of Computer Applications, Foundation of Computer Science, New York, USA, Vol. 60, no. 6, 2012.
[14.] Nikola K. Kasabov, Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering, the MIT Press, Cambridge, MA, ISBN 0-262-11212-4, 1996.
[15.] O. Cordon, F. Gomide, F. Herrera, F. Hoffmann, and L. Magdalena, Ten years of genetic fuzzy systems: Current framework and new trends, Fuzzy Sets and Systems, pp. 5-31, 2004.
[16.] Oded Maimon and Lior Rokach, "Soft Computing for Knowledge Discovery and Data Mining", ISBN-10: 0387699341, ISBN-13: 978-0387699349, Springer; 2008 edition, November 26, 2007.
[17.] Saroj, Nishant Prabhat, A Genetic-Fuzzy Algorithm to Discover Fuzzy Classification Rules for Mixed Attributes Datasets, International Journal of Computer Applications, Vol. 34, No. 5, November 2011.
[18.] S. P. Tiwari and Arun K. Srivastava, "Fuzzy rough sets, fuzzy preorders and fuzzy topologies", Fuzzy Sets and Systems, Vol. 210, pp. 63-68, January 2013.
[19.] T. Kohonen, Self-Organizing Maps, Springer Series in Information Sciences, Vol. 30, Springer, Berlin, Heidelberg, New York, ISBN 3-540-67921-9, ISSN 0720-678X, 1995, 1997, 2001.
[20.] Reiko Tanese, Parallel genetic algorithms for a hypercube, Proc. of the Second International Conference on Genetic Algorithms, 1987.
[21.] Y. Caballero, D. Alvarez, R. Bello and M. M. Garcia, "Feature Selection Algorithms Using Rough Set Theory", in Intelligent Systems Design and Applications, ISDA Seventh International Conference on, pp. 407-411, 2007.
[22.] Z. Konfrt, Parallel Genetic Algorithms: advances, computing trends, application and perspectives, 18th IPDPS 2004, IEEE CS, Santa Fe, New Mexico, pp. 162, 2004.
[23.] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, 252, 1992.
