Encyclopedia of Artificial Intelligence
Information Science Reference
Hershey • New York
Director of Editorial Content: Kristin Klinger
Managing Development Editor: Kristin Roth
Development Editorial Assistant: Julia Mosemann, Rebecca Beistline
Senior Managing Editor: Jennifer Neidig
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Typesetter: Jennifer Neidig, Amanda Appicello, Cindy Consonery
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.
Copyright © 2009 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any
means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate
a claim of ownership by IGI Global of the trademark or registered trademark.
Encyclopedia of artificial intelligence / Juan Ramón Rabuñal Dopico, Julián Dorado de la Calle, and Alejandro Pazos Sierra, editors.
p. cm.
Includes bibliographical references and index.
Summary: "This book is a comprehensive and in-depth reference to the most recent developments in the field covering theoretical developments, tech-
niques, technologies, among others"--Provided by publisher.
ISBN 978-1-59904-849-9 (hardcover) -- ISBN 978-1-59904-850-5 (ebook)
1. Artificial intelligence--Encyclopedias. I. Rabuñal, Juan Ramón, 1973- II. Dorado, Julián, 1970- III. Pazos Sierra, Alejandro.
Q334.2.E63 2008
006.303--dc22
2008027245
All work contributed to this encyclopedia set is new, previously-unpublished material. The views expressed in this encyclopedia set are those of the
authors, but not necessarily of the publisher.
If a library purchased a print copy of this publication, please go to http://www.igi-global.com/agreement for information on activating
the library's complimentary electronic access to this publication.
Editorial Advisory Board
Lluís Jofre
Polytechnical University of Catalunya, Spain
List of Contributors
Volume I
Active Learning with SVM / Jun Jiang, City University of Hong Kong, Hong Kong; and
Horace H. S. Ip, City University of Hong Kong, Hong Kong...............................................................................1
Adaptive Neural Algorithms for PCA and ICA / Radu Mutihac, University of Bucharest, Romania................22
Adaptive Neuro-Fuzzy Systems / Larbi Esmahi, Athabasca University, Canada; Kristian Williamson,
Statistics Canada, Canada; and Elarbi Badidi, United Arab Emirates University, UAE..................................31
Adaptive Technology and Its Applications / João José Neto, Universidade de São Paulo, Brazil....................37
Agent-Based Intelligent System Modeling / Zaiyong Tang, Salem State College, USA; Xiaoyu Huang,
University of Shanghai for Science & Technology, China; and Kallol Bagchi, University of Texas
at El Paso, USA...................................................................................................................................................51
AI and Ideas by Statistical Mechanics / Lester Ingber, Lester Ingber Research, USA.......................................58
AI Methods for Analyzing Microarray Data / Amira Djebbari, National Research Council Canada,
Canada; Aedín C. Culhane, Harvard School of Public Health, USA; Alice J. Armstrong,
The George Washington University, USA; and John Quackenbush, Harvard School of Public Health,
USA.....................................................................................................................................................................65
Algorithms for Association Rule Mining / Vasudha Bhatnagar, University of Delhi, India;
Anamika Gupta, University of Delhi, India; and Naveen Kumar, University of Delhi, India............................76
Ambient Intelligence / Fariba Sadri, Imperial College London, UK; and Kostas Stathis,
Royal Holloway, University of London, UK.......................................................................................................85
Analytics for Noisy Unstructured Text Data I / Shourya Roy, IBM Research, India Research Lab, India;
and L. Venkata Subramaniam, IBM Research, India Research Lab, India........................................................99
Analytics for Noisy Unstructured Text Data II / L. Venkata Subramaniam, IBM Research, India Research
Lab, India; and Shourya Roy, IBM Research, India Research Lab, India.......................................................105
ANN Application in the Field of Structural Concrete / Juan L. Pérez, University of A Coruña, Spain;
Mª Isabel Martínez, University of A Coruña, Spain; and Manuel F. Herrador, University of A
Coruña, Spain................................................................................................................................................... 118
ANN Development with EC Tools: An Overview / Daniel Rivero, University of A Coruña, Spain;
and Juan Ramón Rabuñal Dopico, University of A Coruña, Spain..................................................................125
ANN-Based Defects Diagnosis of Industrial Optical Devices / Matthieu Voiry, University of Paris,
France & SAGEM REOSC, France; Véronique Amarger, University of Paris, France; Joel Bernier,
SAGEM REOSC, France; and Kurosh Madani, University of Paris, France..................................................131
Artificial Intelligence and Education / Eduardo Sánchez, University of Santiago de Compostela, Spain;
and Manuel Lama, University of Santiago de Compostela, Spain...................................................................138
Artificial Intelligence and Rubble-Mound Breakwater Stability / Gregorio Iglesias Rodriquez, University of
Santiago de Compostela, Spain; Alberte Castro Ponte, University of Santiago de Compostela, Spain;
Rodrigo Carballo Sanchez, University of Santiago de Compostela, Spain; and Miguel Ángel Losada
Rodriguez, University of Granada, Spain.........................................................................................................144
Artificial Intelligence for Information Retrieval / Thomas Mandl, University of Hildesheim, Germany.........151
Artificial Intelligence in Computer-Aided Diagnosis / Paulo Eduardo Ambrósio, Santa Cruz State
University, Brazil..............................................................................................................................................157
Artificial Neural Networks and Cognitive Modelling / Amanda J.C. Sharkey, University of Sheffield, UK....161
Artificial NeuroGlial Networks / Ana Belén Porto Pazos, University of A Coruña, Spain; Alberto Alvarellos
González, University of A Coruña, Spain; and Félix Montañés Pazos, University of A Coruña, Spain..........167
Association Rule Mining / Vasudha Bhatnagar, University of Delhi, India; and Sarabjeet Kochhar,
University of Delhi, India.................................................................................................................................172
Bayesian Neural Networks for Image Restoration / Radu Mutihac, University of Bucharest,
Romania............................................................................................................................................................223
Bioinspired Associative Memories / Roberto A. Vazquez, Center for Computing Research, IPN,
Mexico; and Humberto Sossa, Center for Computing Research, IPN, Mexico................................................248
Blind Source Separation by ICA / Miguel A. Ferrer, University of Las Palmas de Gran Canaria, Spain;
and Aday Tejera Santana, University of Las Palmas de Gran Canaria, Spain................................................270
Class Prediction in Test Sets with Shifted Distributions / Óscar Pérez, Universidad Autónoma
de Madrid, Spain; and Manuel Sánchez-Montañés, Universidad Autónoma de Madrid, Spain......................282
Cluster Analysis of Gene Expression Data / Alan Wee-Chung Liew, Griffith University, Australia;
Ngai-Fong Law, The Hong Kong Polytechnic University, Hong Kong; and Hong Yan, City University
of Hong Kong, Hong Kong & University of Sydney, Australia........................................................................289
Clustering Algorithm for Arbitrary Data Sets / Yu-Chen Song, Inner Mongolia University of Science
and Technology, China; and Hai-Dong Meng, Inner Mongolia University of Science and Technology,
China.................................................................................................................................................................297
CNS Tumor Prediction Using Gene Expression Data Part I / Atiq Islam, University of Memphis, USA;
Khan M. Iftekharuddin, University of Memphis, USA; E. Olusegun George, University of Memphis, USA;
and David J. Russomanno, University of Memphis, USA.................................................................................304
CNS Tumor Prediction Using Gene Expression Data Part II / Atiq Islam, University of Memphis, USA;
Khan M. Iftekharuddin, University of Memphis, USA; E. Olusegun George, University of Memphis, USA;
and David J. Russomanno, University of Memphis, USA.................................................................................312
Combining Classifiers and Learning Mixture-of-Experts / Lei Xu, Chinese University of Hong Kong,
Hong Kong & Peking University, China; and Shun-ichi Amari, Brain Science Institute, Japan.....................318
Comparison of Cooling Schedules for Simulated Annealing, A / José Fernando Díaz Martín,
University of Deusto, Spain; and Jesús M. Riaño Sierra, University of Deusto, Spain...................................344
Component Analysis in Artificial Vision / Oscar Déniz Suárez, University of Las Palmas de Gran Canaria,
Spain; and Gloria Bueno García, University of Castilla-La Mancha, Spain...................................................367
Computer Vision for Wave Flume Experiments / Óscar Ibáñez, University of A Coruña, Spain;
and Juan Rabuñal Dopico, University of A Coruña, Spain..............................................................................383
Conditional Hazard Estimating Neural Networks / Antonio Eleuteri, Royal Liverpool University Hospital,
UK; Azzam Taktak, Royal Liverpool University Hospital, UK; Bertil Damato, Royal Liverpool University
Hospital, UK; Angela Douglas, Liverpool Women's Hospital, UK; and Sarah Coupland, Royal Liverpool
University Hospital, UK....................................................................................................................................390
Configuration / Luca Anselma, Università di Torino, Italy; and Diego Magro,
Università di Torino, Italy.................................................................................................................................396
Continuous ACO in a SVR Traffic Forecasting Model / Wei-Chiang Hong, Oriental Institute
of Technology, Taiwan.......................................................................................................................................410
Data Mining Fundamental Concepts and Critical Issues / John Wang, Montclair State University, USA;
Qiyang Chen, Montclair State University, USA; and James Yao, Montclair State University, USA................418
Data Warehousing Development and Design Methodologies / James Yao, Montclair State University,
USA; and John Wang, Montclair State University, USA..................................................................................424
Decision Making in Intelligent Agents / Mats Danielson, Stockholm University, Sweden &
Royal Institute of Technology, Sweden; and Love Ekenberg, Stockholm University, Sweden &
Royal Institute of Technology, Sweden..............................................................................................................431
Decision Tree Applications for Data Modelling / Man Wai Lee, Brunel University, UK;
Kyriacos Chrysostomou, Brunel University, UK; Sherry Y. Chen, Brunel University, UK; and
Xiaohui Liu, Brunel University, UK..................................................................................................................437
Developmental Robotics / Max Lungarella, University of Zurich, Switzerland; and Gabriel Gómez,
University of Zurich, Switzerland.....................................................................................................................464
Device-Level Majority von Neumann Multiplexing / Valeriu Beiu, United Arab Emirates University,
UAE; Walid Ibrahim, United Arab Emirates University, UAE; and Sanja Lazarova-Molnar, United Arab
Emirates University, UAE.................................................................................................................................471
Disk-Based Search / Stefan Edelkamp, University of Dortmund, Germany; and Shahid Jabbar,
University of Dortmund, Germany...................................................................................................................501
Distributed Constraint Reasoning / Marius C. Silaghi, Florida Institute of Technology, USA;
and Makoto Yokoo, Kyushu University, Japan.................................................................................................507
EC Techniques in the Structural Concrete Field / Juan L. Pérez, University of A Coruña, Spain; Belén
González-Fonteboa, University of A Coruña, Spain; and Fernando Martínez Abella,
University of A Coruña, Spain..........................................................................................................................526
Emulating Subjective Criteria in Corpus Validation / Ignasi Iriondo, Universitat Ramon Llull, Spain;
Santiago Planet, Universitat Ramon Llull, Spain; Francesc Alías, Universitat Ramon Llull, Spain;
Joan-Claudi Socoró, Universitat Ramon Llull, Spain; and Elisa Martínez, Universitat Ramon Llull,
Spain.................................................................................................................................................................541
Volume II
Energy Minimizing Active Models in Artificial Vision / Gloria Bueno García, University of
Castilla La Mancha, Spain; Antonio Martínez, University of Castilla La Mancha, Spain;
Roberto González, University of Castilla La Mancha, Spain; and Manuel Torres,
University of Castilla La Mancha, Spain.......................................................................................................547
Ensemble of ANN for Traffic Sign Recognition / M. Paz Sesmero Lorente, Universidad Carlos III
de Madrid, Spain; Juan Manuel Alonso-Weber, Universidad Carlos III de Madrid, Spain;
Germán Gutiérrez Sánchez, Universidad Carlos III de Madrid, Spain; Agapito Ledezma Espino,
Universidad Carlos III de Madrid, Spain; and Araceli Sanchis de Miguel, Universidad Carlos
III de Madrid, Spain..........................................................................................................................................554
Ensemble of SVM Classifiers for Spam Filtering / Ángela Blanco, Universidad Pontificia de
Salamanca, Spain; and Manuel Martín-Merino, Universidad Pontificia de Salamanca, Spain......................561
Evolutionary Approaches for ANNs Design / Antonia Azzini, University of Milan, Italy; and
Andrea G.B. Tettamanzi, University of Milan, Italy.........................................................................................575
Evolutionary Approaches to Variable Selection / Marcos Gestal, University of A Coruña, Spain;
and José Manuel Andrade, University of A Coruña, Spain..............................................................................581
Evolutionary Computing Approach for Ad-Hoc Networks / Prayag Narula, University of Delhi,
India; Sudip Misra, Yale University, USA; and Sanjay Kumar Dhurandher, University of Delhi,
India..................................................................................................................................................................589
Evolved Synthesis of Digital Circuits / Laurenţiu Ionescu, University of Pitesti, Romania; Alin Mazare,
University of Pitesti, Romania; Gheorghe Şerban, University of Pitesti, Romania; and Emil Sofron,
University of Pitesti, Romania..........................................................................................................................609
Evolving Graphs for ANN Development and Simplification / Daniel Rivero, University of
A Coruña, Spain; and David Periscal, University of A Coruña, Spain............................................................618
Facial Expression Recognition for HCI Applications / Fadi Dornaika, Institut Géographique National,
France; and Bogdan Raducanu, Computer Vision Center, Spain....................................................................625
Full-Text Search Engines for Databases / László Kovács, University of Miskolc, Hungary;
and Domonkos Tikk, Budapest University of Technology and Economics, Hungary.......................................654
Fuzzy Control Systems: An Introduction / Guanrong Chen, City University of Hong Kong,
Hong Kong; and Young Hoon Joo, Kunsan National University, Korea..........................................................688
Fuzzy Logic Applied to Biomedical Image Analysis / Alfonso Castro, University of A Coruña,
Spain; and Bernardino Arcay, University of A Coruña, Spain.........................................................................710
Fuzzy Logic Estimator for Variant SNR Environments / Rosa Maria Alsina Pagès, Universitat Ramon
Llull, Spain; Clàudia Mateo Segura, Universitat Ramon Llull, Spain; and Joan-Claudi Socoró Carrié,
Universitat Ramon Llull, Spain........................................................................................................................719
Fuzzy Systems Modeling: An Introduction / Young Hoon Joo, Kunsan National University, Korea;
and Guanrong Chen, City University of Hong Kong, Hong Kong, China.......................................................734
Gene Regulation Network Use for Information Processing / Enrique Fernandez-Blanco, University of
A Coruña, Spain; and J. Andrés Serantes, University of A Coruña, Spain.......................................................744
Genetic Algorithm Applications to Optimization Modeling / Pi-Sheng Deng, California State University
at Stanislaus, USA.............................................................................................................................................748
Genetic Algorithms for Wireless Sensor Networks / João H. Kleinschmidt, State University of Campinas,
Brazil.................................................................................................................................................................755
Genetic Fuzzy Systems Applied to Ports and Coasts Engineering / Óscar Ibáñez, University of
A Coruña, Spain; and Alberte Castro Ponte, University of Santiago de Compostela, Spain...........................759
Growing Self-Organizing Maps for Data Analysis / Soledad Delgado, Technical University of Madrid,
Spain; Consuelo Gonzalo, Technical University of Madrid, Spain; Estíbaliz Martínez, Technical
University of Madrid, Spain; and Águeda Arquero, Technical University of Madrid, Spain...........................781
GTM User Modeling for aIGA Weight Tuning in TTS Synthesis / Lluís Formiga, Universitat Ramon Llull,
Spain; and Francesc Alías, Universitat Ramon Llull, Spain............................................................788
Handling Fuzzy Similarity for Data Classification / Roy Gelbard, Bar-Ilan University, Israel;
and Avichai Meged, Bar-Ilan University, Israel...............................................................................................796
Harmony Search for Multiple Dam Scheduling / Zong Woo Geem, Johns Hopkins University, USA.............803
Hierarchical Neuro-Fuzzy Systems Part I / Marley Vellasco, PUC-Rio, Brazil; Marco Pacheco,
PUC-Rio, Brazil; Karla Figueiredo, UERJ, Brazil; and Flavio Souza, UERJ, Brazil.....................................808
Hierarchical Neuro-Fuzzy Systems Part II / Marley Vellasco, PUC-Rio, Brazil; Marco Pacheco,
PUC-Rio, Brazil; Karla Figueiredo, UERJ, Brazil; and Flavio Souza, UERJ, Brazil.....................................817
High Level Design Approach for FPGA Implementation of ANNs / Nouma Izeboudjen, Centre de
Développement des Technologies Avancées (CDTA), Algérie; Ahcene Farah, Ajman University,
UAE; Hamid Bessalah, Centre de Développement des Technologies Avancées (CDTA), Algérie;
Ahmed Bouridene, Queen's University of Belfast, Ireland; and Nassim Chikhi,
Centre de Développement des Technologies Avancées (CDTA), Algérie..........................................................831
HOPS: A Hybrid Dual Camera Vision System / Stefano Cagnoni, Università degli Studi di Parma, Italy;
Monica Mordonini, Università degli Studi di Parma, Italy; Luca Mussi, Università degli Studi di
Perugia, Italy; and Giovanni Adorni, Università degli Studi di Genova, Italy................................................840
Hybrid Dual Camera Vision System / Stefano Cagnoni, Università degli Studi di Parma, Italy;
Monica Mordonini, Università degli Studi di Parma, Italy; Luca Mussi, Università degli Studi di
Perugia, Italy; and Giovanni Adorni, Università degli Studi di Genova, Italy................................................848
Hybrid Meta-Heuristics Based System for Dynamic Scheduling / Ana Maria Madureira,
Polytechnic Institute of Porto, Portugal...........................................................................................................853
Hybrid System for Automatic Infant Cry Recognition I, A / Carlos Alberto Reyes-García, Instituto
Nacional de Astrofísica Óptica y Electrónica, Mexico; Ramon Zatarain, Instituto Tecnologico de
Culiacan, Mexico; Lucia Barron, Instituto Tecnologico de Culiacan, Mexico; and
Orion Fausto Reyes-Galaviz, Universidad Autónoma de Tlaxcala, Mexico....................................................860
Hybrid System for Automatic Infant Cry Recognition II, A / Carlos Alberto Reyes-García,
Instituto Nacional de Astrofísica Óptica y Electrónica, Mexico; Sandra E. Barajas, Instituto Nacional
de Astrofísica Óptica y Electrónica, Mexico; Esteban Tlelo-Cuautle, Instituto Nacional de Astrofísica
Óptica y Electrónica, Mexico; and Orion Fausto Reyes-Galaviz, Universidad Autónoma de Tlaxcala,
Mexico...............................................................................................................................................867
IA Algorithm Acceleration Using GPUs / Antonio Seoane, University of A Coruña, Spain; and
Alberto Jaspe, University of A Coruña, Spain..................................................................................................873
Improving the Naïve Bayes Classifier / Liwei Fan, National University of Singapore, Singapore;
and Kim Leng Poh, National University of Singapore, Singapore...................................................................879
Incorporating Fuzzy Logic in Data Mining Tasks / Lior Rokach, Ben Gurion University, Israel....................884
Independent Subspaces / Lei Xu, Chinese University of Hong Kong, Hong Kong & Peking University,
China.................................................................................................................................................................892
Intelligent MAS in System Engineering and Robotics / G. Nicolás Marichal, University of La Laguna,
Spain; and Evelio J. González, University of La Laguna, Spain......................................................................917
Intelligent Query Answering Mechanism in Multi Agent Systems / Safiye Turgay, Abant İzzet Baysal
University, Turkey; and Fahrettin Yaman, Abant İzzet Baysal University, Turkey...........................................924
Intelligent Radar Detectors / Raúl Vicen Bueno, University of Alcalá, Spain; Manuel Rosa Zurera,
University of Alcalá, Spain; María Pilar Jarabo Amores, University of Alcalá, Spain;
Roberto Gil Pita, University of Alcalá, Spain; and David de la Mata Moya, University of Alcalá,
Spain.................................................................................................................................................................933
Intelligent Software Agents Analysis in E-Commerce I / Xin Luo, The University of New Mexico, USA;
and Somasheker Akkaladevi, Virginia State University, USA...........................................................................940
Intelligent Software Agents Analysis in E-Commerce II / Xin Luo, The University of New Mexico, USA;
and Somasheker Akkaladevi, Virginia State University, USA...........................................................................945
Intelligent Traffic Sign Classifiers / Raúl Vicen Bueno, University of Alcalá, Spain; Elena Torijano
Gordo, University of Alcalá, Spain; Antonio García González, University of Alcalá, Spain;
Manuel Rosa Zurera, University of Alcalá, Spain; and Roberto Gil Pita, University of Alcalá, Spain...........956
Interactive Systems and Sources of Uncertainties / Qiyang Chen, Montclair State University, USA;
and John Wang, Montclair State University, USA............................................................................................963
Intuitionistic Fuzzy Image Processing / Ioannis K. Vlachos, Aristotle University of Thessaloniki, Greece;
and George D. Sergiadis, Aristotle University of Thessaloniki, Greece...........................................................967
Knowledge Management Tools and Their Desirable Characteristics / Juan Ares, University of A Coruña,
Spain; Rafael García, University of A Coruña, Spain; María Seoane, University of A Coruña, Spain;
and Sonia Suárez, University of A Coruña, Spain............................................................982
Kohonen Maps and TS Algorithms / Marie-Thérèse Boyer-Xambeu, Université de Paris VII LED,
France; Ghislain Deleplace, Université de Paris VIII LED, France; Patrice Gaubert,
Université de Paris 12 ERUDITE, France; Lucien Gillard, CNRS LED, France; and
Madalina Olteanu, Université de Paris I CES SAMOS, France...................................................................996
Learning in Feed-Forward Artificial Neural Networks I / Lluís A. Belanche Muñoz, Universitat
Politècnica de Catalunya, Spain.....................................................................................................................1004
Learning Nash Equilibria in Non-Cooperative Games / Alfredo Garro, University of Calabria, Italy.........1018
Learning-Based Planning / Sergio Jiménez Celorrio, Universidad Carlos III de Madrid, Spain;
and Tomás de la Rosa Turbides, Universidad Carlos III de Madrid, Spain...................................................1024
Longitudinal Analysis of Labour Market Data with SOM, A / Patrick Rousset, CEREQ, France;
and Jean-François Giret, CEREQ, France....................................................................................................1029
Managing Uncertainties in Interactive Systems / Qiyang Chen, Montclair State University, USA;
and John Wang, Montclair State University, USA..........................................................................................1036
Microarray Information and Data Integration Using SAMIDI / Juan M. Gómez, Universidad Carlos III
de Madrid, Spain; Ricardo Colomo, Universidad Carlos III de Madrid, Spain; Marcos Ruano,
Universidad Carlos III de Madrid, Spain; and Ángel García, Universidad Carlos III de Madrid, Spain.....1064
Mobile Robots Navigation, Mapping, and Localization Part I / Lee Gim Hee, DSO National
Laboratories, Singapore; and Marcelo H. Ang Jr., National University of Singapore, Singapore................1072
Mobile Robots Navigation, Mapping, and Localization Part II / Lee Gim Hee, DSO National
Laboratories, Singapore; and Marcelo H. Ang Jr., National University of Singapore, Singapore................1080
Modal Logics for Reasoning about Multiagent Systems / Nikolay V. Shilov, Russian Academy
of Science, Institute of Informatics Systems, Russia; and Natalia Garanina, Russian Academy of Science,
Institute of Informatics Systems, Russia..........................................................................................................1089
Morphological Filtering Principles / Jose Crespo, Universidad Politécnica de Madrid, Spain.................... 1102
Multilayer Optimization Approach for Fuzzy Systems / Ivan N. Silva, University of São Paulo, Brazil;
and Rogerio A. Flauzino, University of São Paulo, Brazil............................................................................. 1121
Multi-Layered Semantic Data Models / László Kovács, University of Miskolc, Hungary; and
Tanja Sieber, University of Miskolc, Hungary................................................................................................ 1130
Multilogistic Regression by Product Units / P.A. Gutiérrez, University of Córdoba, Spain; C. Hervás,
University of Córdoba, Spain; F.J. Martínez-Estudillo, INSA ETEA, Spain; and M. Carbonero,
INSA ETEA, Spain....................................................................................................................................... 1136
Multi-Objective Evolutionary Algorithms / Sanjoy Das, Kansas State University, USA; and
Bijaya K. Panigrahi, Indian Institute of Technology, India............................................................................ 1145
Narrative Information and the NKRL Solution / Gian Piero Zarri, LaLIC, University Paris 4-Sorbonne,
France ............................................................................................................................................................ 1159
Narrative Information Problems / Gian Piero Zarri, LaLIC, University Paris 4-Sorbonne, France......... 1167
Natural Language Processing and Biological Methods / Gemma Bel Enguix, Rovira i
Virgili University, Spain; and M. Dolores Jimnez Lpez, Rovira i Virgili University, Spain........................ 1173
Natural Language Understanding and Assessment / Vasile Rus, The University of Memphis, USA;
Philip McCarthy, University of Memphis, USA; Danielle S. McNamara, The University of Memphis,
USA; and Art Graesser, University of Memphis, USA.................................................................................... 1179
Navigation by Image-Based Visual Homing / Matthew Szenher, University of Edinburgh, UK.................... 1185
Nelder-Mead Evolutionary Hybrid Algorithms / Sanjoy Das, Kansas State University, USA....................... 1191
Neural Control System for Autonomous Vehicles / Francisco García-Córdova, Polytechnic University
of Cartagena (UPCT), Spain; Antonio Guerrero-González, Polytechnic University of Cartagena (UPCT),
Spain; and Fulgencio Marín-García, Polytechnic University of Cartagena (UPCT), Spain......................... 1197
Neural Network-Based Visual Data Mining for Cancer Data / Enrique Romero, Technical University of
Catalonia, Spain; Julio J. Valdés, National Research Council Canada, Canada; and Alan J. Barton,
National Research Council Canada, Canada.................................................................................................1205
Neural Network-Based Process Analysis in Sport / Juergen Perl, University of Mainz, Germany...............1212
Neural Networks and Equilibria, Synchronization, and Time Lags / Daniela Danciu, University
of Craiova, Romania; and Vladimir Răsvan, University of Craiova, Romania.............................................1219
Neural Networks and HOS for Power Quality Evaluation / Juan J. González De la Rosa,
Universities of Cádiz-Córdoba, Spain; Carlos G. Puntonet, University of Granada, Spain;
and A. Moreno-Muñoz, Universities of Cádiz-Córdoba, Spain......................................................................1226
New Self-Organizing Map for Dissimilarity Data, A / Tien Ho-Phuoc, GIPSA-lab, France;
and Anne Guerin-Dugue, GIPSA-lab, France................................................................................................1244
NLP Techniques in Intelligent Tutoring Systems / Chutima Boonthum, Hampton University, USA;
Irwin B. Levinstein, Old Dominion University, USA; Danielle S. McNamara, The University of Memphis,
USA; Joseph P. Magliano, Northern Illinois University, USA; and Keith K. Millis,
The University of Memphis, USA....................................................................................................................1253
Nonlinear Techniques for Signals Characterization / Jesús Bernardino Alonso Hernández, University
of Las Palmas de Gran Canaria, Spain; and Patricia Henríquez Rodríguez, University of Las Palmas
de Gran Canaria, Spain..................................................................................................................................1266
Ontologies and Processing Patterns for Microarrays / Mónica Miguélez Rico, University of A Coruña,
Spain; José Antonio Seoane Fernández, University of A Coruña, Spain; and Julián Dorado de la Calle,
University of A Coruña, Spain........................................................................................................................1273
Ontologies for Education and Learning Design / Manuel Lama, University of Santiago de Compostela,
Spain; and Eduardo Sánchez, University of Santiago de Compostela, Spain................................................1278
Ontology Alignment Overview / José Manuel Vázquez Naya, University of A Coruña, Spain;
Marcos Martínez Romero, University of A Coruña, Spain; Javier Pereira Loureiro, University of
A Coruña, Spain; and Alejandro Pazos Sierra, University of A Coruña, Spain.............................................1283
Personalized Decision Support Systems / Neal Shambaugh, West Virginia University, USA........................1310
Planning Agent for Geriatric Residences / Javier Bajo, Universidad Pontificia de Salamanca, Spain;
Dante I. Tapia, Universidad de Salamanca, Spain; Sara Rodríguez, Universidad de Salamanca, Spain;
and Juan M. Corchado, Universidad de Salamanca, Spain...........................................................................1316
Privacy-Preserving Estimation / Mohammad Saad Al-Ahmadi, King Fahd University of Petroleum and
Minerals (KFUPM), Saudi Arabia; and Rathindra Sarathy, Oklahoma State University, USA....................1323
Randomized Hough Transform / Lei Xu, Chinese University of Hong Kong, Hong Kong & Peking
University, China; and Erkki Oja, Helsinki University of Technology, Finland............................................1343
RBF Networks for Power System Topology Verification / Robert Lukomski, Wroclaw University of
Technology, Poland; and Kazimierz Wilkosz, Wroclaw University of Technology, Poland............................1356
Robot Model of Dynamic Appraisal and Response, A / Carlos Herrera, Intelligent Systems Research
Centre, University of Ulster, Northern Ireland; Tom Ziemke, University of Skövde, Sweden; and
Thomas M. McGinnity, Intelligent Systems Research Centre, University of Ulster,
Northern Ireland.............................................................................................................................................1376
Robots in Education / Muhammad Ali Yousuf, Tecnológico de Monterrey, Santa Fe Campus, México......1383
Robust Learning Algorithm with LTS Error Function / Andrzej Rusiecki, Wroclaw University of
Technology, Poland.........................................................................................................................................1389
Rough Set-Based Neuro-Fuzzy System / Kai Keng Ang, Institute for Infocomm Research, Singapore;
and Chai Quek, Nanyang Technological University, Singapore.....................................................................1396
Rule Engines and Agent-Based Systems / Agostino Poggi, Università di Parma, Italy; and
Michele Tomaiuolo, Università di Parma, Italy.............................................................................................1404
Sequence Processing with Recurrent Neural Networks / Chun-Cheng Peng, University of London, UK;
and George D. Magoulas, University of London, UK.................................................................................... 1411
Shortening Automated Negotiation Threads via Neural Nets / Ioanna Roussaki, National Technical
University of Athens, Greece; Ioannis Papaioannou, National Technical University of Athens, Greece;
and Miltiades Anagnostou, National Technical University of Athens, Greece...............................................1418
Signed Formulae as a New Update Process / Fernando Zacarías Flores, Benemérita Universidad
Autónoma de Puebla, Mexico; Dionicio Zacarías Flores, Benemérita Universidad Autónoma de Puebla,
Mexico; Rosalba Cuapa Canto, Benemérita Universidad Autónoma de Puebla, Mexico; and
Luis Miguel Guzmán Muñoz, Benemérita Universidad Autónoma de Puebla, Mexico..................................1426
Solar Radiation Forecasting Model / Fatih Onur Hocaoğlu, Anadolu University Eskişehir, Turkey;
Ömer Nezih Gerek, Anadolu University Eskişehir, Turkey; and Mehmet Kurban, Anadolu University
Eskişehir, Turkey.............................................................................................................................................1433
State of the Art in Writers Off-Line Identification / Carlos M. Travieso González, University of
Las Palmas de Gran Canaria, Spain; and Carlos F. Romero, University of Las Palmas de Gran
Canaria, Spain................................................................................................................................................1447
State-of-the-Art on Video-Based Face Recognition / Yan Yan, Tsinghua University, Beijing, China;
and Yu-Jin Zhang, Tsinghua University, Beijing, China.................................................................................1455
Stationary Density of Stochastic Search Processes / Arturo Berrones, Universidad Autónoma de Nuevo
León, México; Dexmont Peña, Universidad Autónoma de Nuevo León, Mexico; and
Ricardo Sánchez, Universidad Autónoma de Nuevo León, Mexico...............................................................1462
Statistical Simulations on Perceptron-Based Adders / Snorre Aunet, University of Oslo, Norway &
Centers for Neural Inspired Nano Architectures, Norway; and Hans Kristian Otnes Berge,
University of Oslo, Norway.............................................................................................................................1474
Stochastic Approximation Monte Carlo for MLP Learning / Faming Liang, Texas A&M
University, USA...............................................................................................................................................1482
Study of the Performance Effect of Genetic Operators, A / Pi-Sheng Deng, California State University
at Stanislaus, USA...........................................................................................................................................1504
Support Vector Machines / Cecilio Angulo, Technical University of Catalonia, Spain; and
Luis Gonzalez-Abril, Technical University of Catalonia, Spain.....................................................................1518
Swarm Intelligence Approach for Ad-Hoc Networks / Prayag Narula, University of Delhi, India;
Sudip Misra, Yale University, USA; and Sanjay Kumar Dhurandher, University of Delhi, India..................1530
Symbol Grounding Problem / Angelo Loula, State University of Feira de Santana, Brazil & State
University of Campinas (UNICAMP), Brazil; and João Queiroz, State University of Campinas (UNICAMP),
Brazil & Federal University of Bahia, Brazil.................................................................................................1543
Synthetic Neuron Implementations / Snorre Aunet, University of Oslo, Norway & Centers for
Neural Inspired Nano Architectures, Norway.................................................................................................1555
Thermal Design of Gas-Fired Cooktop Burners Through ANN / T.T. Wong, The Hong Kong Polytechnic
University, Hong Kong; and C.W. Leung, The Hong Kong Polytechnic University, Hong Kong..................1568
2D-PAGE Analysis Using Evolutionary Computation / Pablo Mesejo, University of A Coruña, Spain;
Enrique Fernández-Blanco, University of A Coruña, Spain; Diego Martínez-Feijóo, University of A
Coruña, Spain; and Francisco J. Blanco, Juan Canalejo Hospital, Spain....................................................1583
Visualizing Cancer Databases Using Hybrid Spaces / Julio J. Valdés, National Research Council Canada,
Canada; and Alan J. Barton, National Research Council Canada, Canada.................................................1589
Voltage Instability Detection Using Neural Networks / Adnan Khashman, Near East University,
Turkey; Kadri Buruncuk, Near East University, Turkey; and Samir Jabr, Near East University,
Turkey..............................................................................................................................................................1596
Preface
Throughout history, humankind has always sought to enhance three main capacities: the physical, the metaphysical and the intellectual.
From the physical viewpoint, we invented and developed all kinds of tools (levers, wheels, cams, pistons, etc.), culminating in the sophisticated machines that exist today.
Regarding the metaphysical aspect, the initial celebration of magical-animistic rituals led to attempts, whether real or literary, to create life ex nihilo: life from inert substance. The most recent approaches involve the cryopreservation of deceased people so that they might be returned to life in the future; the generation of life in the laboratory, by means of cells, tissues, organs, systems or individuals created from previously frozen stem cells, is also currently pursued.
The third aspect, the intellectual one, is the most interesting here. There have been many contributions, from devices that increased calculating ability, such as the abacus, to later theoretical proposals for solving problems, such as the Ars Magna of Ramon Llull. The first known written reference to Artificial Intelligence appears in The Iliad, where Homer describes the visit of the goddess Thetis and her son Achilles to the workshop of Hephaestus, god of smiths: "At once he was helped along by female servants made of gold, who moved to him. They look like living servant girls, possessing minds, hearts with intelligence, vocal chords, and strength."
However, the first reference to Artificial Intelligence as it is currently understood can be found in the proposal made by J. McCarthy to the Rockefeller Foundation in 1956. This proposal requested funds to support a month-long meeting of twelve researchers, the Dartmouth Summer Research Project, in order to establish the basis of what McCarthy named Artificial Intelligence.
Although the precursors of Artificial Intelligence (S. Ramón y Cajal, N. Wiener, D. Hebb, C. Shannon and W. McCulloch, among many others) came from multiple scientific disciplines, the true driving forces (A. Turing, J. von Neumann, M. Minsky, K. Gödel, ...) emerged in the second third of the 20th century with the appearance of certain tools, the computers, capable of handling fairly complex problems. Other scientists, such as J. Hopfield or J. Holland, proposed in the last third of the century biology-inspired approaches that enabled the treatment of complex real-world problems, some of which even require a certain adaptive ability.
This long and productive history of Artificial Intelligence demanded an encyclopaedia that could capture the current state of this multidisciplinary topic, where researchers from fields as varied as neuroscience, computer science, the cognitive sciences, the exact sciences and different engineering areas converge.
This work intends to provide wide and well-balanced coverage of all the points of interest that currently exist in the field of Artificial Intelligence, from the most theoretical fundamentals to the most recent industrial applications.
Many researchers were contacted, and several calls for contributions were announced in different forums of the scientific field addressed here.
All the proposals were carefully reviewed by the editors in order to balance the contributions as far as possible, with the intention of achieving an accurate and wide-ranging document that exemplifies this field.
A first selection was performed after the reception of all the proposals; each selected proposal was then sent to three external expert reviewers for a double-blind peer review. As a result of this strict and complex process, a high number of contributions (approximately 80%) were rejected or required modification before final acceptance.
The effort of the last two years is now believed to be worthwhile; at least this is the belief of the editors who, with the invaluable help of the many people mentioned in the acknowledgements, have managed to get this complete encyclopaedia off the ground. The numbers speak for themselves: 233 articles published, written by 442 authors from 38 different countries and reviewed by 238 scientific reviewers. The diverse and comprehensive coverage of the disciplines directly related to Artificial Intelligence should also contribute to a better understanding of all the research in this important field of study. It is also intended that the contributions compiled in this work will have a considerable impact on the expansion and development of the related body of knowledge, making the encyclopaedia an important reference source for researchers and system developers in this area. It is hoped that it will be an effective aid to a better understanding of the concepts, problems, trends, challenges and opportunities related to this field of study, and that it will be useful for research colleagues, teaching staff, students, etc. The editors will be happy to know that this work has inspired its readers to contribute new advances and discoveries in this fantastic area of work, advances that may themselves improve the quality of life in different areas of society: productive processes, health care, or any other area where a system or product developed with Artificial Intelligence techniques and procedures might be used.
Juan Ramón Rabuñal Dopico is associate professor in the Department of Information and Communications Technologies, University of A Coruña (Spain). He completed his degree in computer science in 1996; in 2002 he received a PhD in computer science with his thesis "Methodology for the Development of Knowledge Extraction Systems in ANNs", and in 2008 he received a PhD in civil engineering. He has worked on several Spanish and European projects and has published many books and papers in international journals. He is currently working in the areas of evolutionary computation, artificial neural networks, and knowledge extraction systems.
Julián Dorado is associate professor in the Faculty of Computer Science, University of A Coruña (Spain). He completed his degree in computer science in 1994. In 1999 he received a PhD, with a special mention as European doctor. In 2004 he completed a degree in biology. He has taught at the university for more than eight years. He has published many books and papers in journals and international conferences. He is presently working on bioinformatics, evolutionary computing, artificial neural networks, computer graphics, and data mining.
Alejandro Pazos is professor in computer science, University of A Coruña (Spain). He was born in Padrón in 1959. He received his MD from the Faculty of Medicine, University of Santiago de Compostela, in 1987. He obtained a Master in Knowledge Engineering in 1989 and a PhD in computer science in 1990 from the Polytechnic University of Madrid, and a PhD in medicine in 1996 from the Complutense University of Madrid. He has worked with research groups at the Georgia Institute of Technology, Harvard Medical School, Stanford University, the Polytechnic University of Madrid, etc. He founded and directs the research laboratory Artificial Neural Networks and Adaptive Systems in the Faculty of Computer Science and is co-director of the Medical Informatics and Radiology Diagnostic Center at the University of A Coruña.
Horace H. S. Ip
City University of Hong Kong, Hong Kong
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Active Learning with SVM
where
EvalFun(x, L, h, H): the function for evaluating a potential query x (the lowest value is the best here)
L: the current labeled training set
H: the hypothesis space
Version Space Theory

Based on the Probably Approximately Correct (PAC) learning model, the goal of machine learning is to find a consistent classifier which has the lowest generalization error bound. The Gibbs generalization error bound (McAllester, 1998) is defined as

ε_Gibbs(m, P_H, z, δ) = (1/m) [ ln(1 / P_H(V(z))) + ln(2em / δ) ]

where P_H denotes a prior distribution over the hypothesis space H, V(z) denotes the version space of the training set z, m is the size of z, and δ is a constant in [0, 1]. It follows that the generalization error bound of the consistent classifiers is controlled by the volume of the version space if the distribution over the version space is uniform. This provides a theoretical justification for version space based ALSVMs.

Query by Committee with SVM

This algorithm was proposed by (Freund et al., 1997): 2k classifiers are randomly sampled, and the sample on which these classifiers have maximal disagreement, which approximately halves the version space, is queried to the oracle. However, the complexity of the structure of the version space makes random sampling within it difficult. (Warmuth, Ratsch, Mathieson, Liao, & Lemmem, 2003) successfully applied the billiard-playing algorithm to randomly sample the classifiers in the SVM version space, and experiments showed that its performance was comparable to that of the standard distance-based ALSVM (SD-ALSVM), which will be introduced later. The deficiency is that the process is time-consuming.

Standard Distance Based Active Learning with SVM

For SVM, the version space can be defined as

V = { w ∈ W | ||w|| = 1, y_i (w · Φ(x_i)) > 0, i = 1, ..., m }

where Φ(·) denotes the function which maps the original input space X into a high-dimensional space Φ(X), and W denotes the parameter space. SVM has two properties which lead to its tractability with AL. The first is its duality property: each point w in V corresponds to one hyperplane in Φ(X) which divides Φ(X) into two parts, and vice versa. The other property is that the SVM solution w* is the center of the version space when the version space is symmetric, or near its center when it is asymmetric. Based on these two properties, (Tong & Koller, 2001) inferred a lemma that the sample nearest to the
Figure 2a. The projection of the parameter space around the version space [figure labels: hyperplanes induced by support vectors and by the candidate samples a, b, c; w*, the solution of SVM]

Figure 2b. [The data space: support vectors, the margin, the +1 and -1 classes, and candidate unlabeled samples a, b, c]
decision boundary can make the expected size of the version space decrease fastest. Thus the sample nearest to the decision boundary will be queried to the oracle (Figure 2). This is the so-called SD-ALSVM, which has low additional computation for selecting the queried sample and fine performance in real applications.

Batch Running Mode Distance Based Active Learning with SVM

instead of 7/8, of the size of the version space. It can be observed that this was ascribed to the small angles between their induced hyperplanes.
To overcome this problem, (Brinker, 2003) proposed a new selection strategy by incorporating a diversity measure that considers the angles between the induced hyperplanes. Let the labeled set be L and the pool query set be Q in the current round; then, based on the diversity criterion, the further added sample x_q should be
Figure 3. One example of simple batch querying with a, b and c samples with pure SD-ALSVM [figure labels: hyperplanes induced by support vectors and by the candidate samples; w*, the solution of SVM]

Figure 4. One example of batch querying with a, b and c samples by incorporating diversity into SD-ALSVM [figure labels: hyperplanes induced by support vectors and by the candidate samples; w*, the solution of SVM; the largest inscribed hypersphere]
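The angle-diversity batch selection discussed here can be sketched in code. The following is an illustrative reconstruction, not the article's implementation: the RBF kernel, its gamma, and the λ trade-off weight are assumptions, and the cosine of the angle between the hyperplanes induced by two samples is computed from the kernel as |K(x_i, x_j)| / sqrt(K(x_i, x_i) K(x_j, x_j)).

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    # K(a, b) = exp(-gamma * ||a - b||^2); kernel choice is an assumption
    return np.exp(-gamma * np.sum((a - b) ** 2))

def angle_cosine(xi, xj, kernel=rbf_kernel):
    # |cos| of the angle between the hyperplanes induced by xi and xj
    # in feature space: |K(xi, xj)| / sqrt(K(xi, xi) * K(xj, xj))
    return abs(kernel(xi, xj)) / np.sqrt(kernel(xi, xi) * kernel(xj, xj))

def select_batch(pool, dist_to_boundary, batch_size, lam=0.5):
    """Greedy angle-diversity batch selection (Brinker-style sketch).

    pool:             (n, d) candidate samples
    dist_to_boundary: (n,) |f(x)| for each candidate under the current SVM
    lam:              trade-off between closeness to the boundary and diversity
    """
    chosen = []
    remaining = list(range(len(pool)))
    while remaining and len(chosen) < batch_size:
        best, best_score = None, np.inf
        for i in remaining:
            # penalize candidates whose induced hyperplane is nearly
            # parallel to one already chosen (small angle, large |cos|)
            div = max((angle_cosine(pool[i], pool[j]) for j in chosen),
                      default=0.0)
            score = lam * dist_to_boundary[i] + (1 - lam) * div
            if score < best_score:
                best, best_score = i, score
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With λ = 1 this reduces to pure SD-ALSVM selection; smaller λ trades closeness to the boundary for diversity within the batch.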
where the diversity term denotes the cosine of the angle between the two hyperplanes induced by x_j and x_i; thus it is known as the angle diversity criterion. It can be observed that the reduced volume of the version space in Figure 4 is larger than that in Figure 3.

RETIN Active Learning

Let (I_j), j ∈ [1...n], be the samples in a potential query set Q, and r(i, k) be the function that, at iteration i, codes the position k in the relevance ranking according to the distance to the current decision boundary; then a sequence can be obtained as follows:

I_r(i,1), I_r(i,2), ..., I_r(i,s(i)), ..., I_r(i,s(i)+m-1), ..., I_r(i,n)
(most relevant ... queried data ... least relevant)

In SD-ALSVM, s(i) is such that I_r(i,s(i)), ..., I_r(i,s(i)+m-1) are the m closest samples to the SVM boundary. This strategy implicitly relies on a strong assumption: an accurate estimation of the SVM boundary. However, the decision boundary is usually unstable at the initial iterations. (Gosselin & Cord, 2004) noticed that, even if the decision boundary may change a lot during the earlier iterations, the ranking function r() is quite stable. Thus they proposed a balanced selection criterion that is independent of the frontier, in which an adaptive method was designed to tune s during the feedback iterations. It was expressed by

uncorrelated. It is difficult to ensure this condition in real applications.
Active Learning with Transductive SVM

In the first stages of SD-ALSVM, a few labeled data may lead to a great deviation of the current solution from the true solution, while if unlabeled samples are considered, the solution may be closer to the true solution. (Wang, Chan, & Zhang, 2003) showed that the closer the current solution is to the true one, the more the size of the version space will be reduced. They incorporated the Transductive SVM (TSVM) to produce more accurate intermediate solutions. However, several studies (T. Zhang & Oles, 2000) challenged whether TSVM is really so helpful with unlabeled data, in theory and in practice. (Hoi & Lyu, 2005) applied semi-supervised learning techniques based on Gaussian fields and harmonic functions instead, and the improvements were reported to be significant.

In AL, the feedback from the oracle can also help to identify the important features, and (Raghavan, Madani, & Jones, 2006) showed that such work can improve the performance of the final classifier significantly. In (Su, Li, & Zhang, 2001), Principal Components Analysis was used to identify important features. To our knowledge, there are few reports addressing this issue.

The Scaling of Active Learning

The scaling of AL to very large databases has not been extensively studied yet. However, it is an important issue for many real applications. Some approaches have been proposed on how to index the database (Lai, Goh, & Chang, 2004) and how to overcome the concept complexities that accompany the scalability of the dataset (Panda, Goh, & Chang, 2006).
Cohn, D., Atlas, L., & Ladner, R. (1994). Improving Generalization with Active Learning. Machine Learning, 15, 201-221.

Cohn, D. A. (1997). Minimizing Statistical Bias with Queries. In M. Mozer et al. (Eds.), Advances in Neural Information Processing Systems 9. Also appears as AI Lab Memo 1552, CBCL Paper 124.

Cohn, D. A., Ghahramani, Z., & Jordan, M. I. (1996). Active Learning with Statistical Models. Journal of Artificial Intelligence Research, 4, 129-145.

Cord, M., Gosselin, P. H., & Philipp-Foliguet, S. (2007). Stochastic exploration and active learning for image retrieval. Image and Vision Computing, 25(1), 14-23.

Dagli, C. K., Rajaram, S., & Huang, T. S. (2006). Utilizing Information Theoretic Diversity for SVM Active Learning. Paper presented at the International Conference on Pattern Recognition, Hong Kong.

Engelbrecht, A. P., & Brits, R. (2002). Supervised Training Using an Unsupervised Approach to Active Learning. Neural Processing Letters, 15, 14.

Ferecatu, M., Crucianu, M., & Boujemaa, N. (2004). Reducing the redundancy in the selection of samples for SVM-based relevance feedback.

Freund, Y., Seung, H. S., Shamir, E., & Tishby, N. (1997). Selective Sampling Using the Query by Committee Algorithm. Machine Learning, 28, 133-168.

Gosselin, P. H., & Cord, M. (2004). RETIN AL: An active learning strategy for image category retrieval. Paper presented at the International Conference on Image Processing.

He, J., Li, M., Zhang, H.-J., Tong, H., & Zhang, C. (2004). Mean version space: A new active learning method for content-based image retrieval. Paper presented at the Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval.

Hoi, S. C. H., & Lyu, M. R. (2005). A semi-supervised active learning framework for image retrieval. Paper presented at the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

Lai, W.-C., Goh, K., & Chang, E. Y. (2004, June). On Scalability of Active Learning for Formulating Query Concepts (long version of the ICME invited paper). Paper presented at the Workshop on Computer Vision Meets Databases (CVDB) in cooperation with the ACM International Conference on Management of Data (SIGMOD), Paris.

McAllester, D. A. (1998). Some PAC-Bayesian Theorems. Paper presented at the Proceedings of the 11th Annual Conference on Computational Learning Theory, Madison, Wisconsin.

McCallum, A. K., & Nigam, K. (1998). Employing EM and Pool-Based Active Learning for Text Classification. Paper presented at the Proceedings of the 15th International Conference on Machine Learning.

Muslea, I., Minton, S., & Knoblock, C. A. (2000). Selective Sampling with Redundant Views. Paper presented at the Proceedings of the 17th National Conference on Artificial Intelligence.

Muslea, I., Minton, S., & Knoblock, C. A. (2002). Active + Semi-Supervised Learning = Robust Multi-View Learning. Paper presented at the Proceedings of the 19th International Conference on Machine Learning.

Panda, N., Goh, K., & Chang, E. Y. (2006). Active Learning in Very Large Image Databases. Journal of Multimedia Tools and Applications, Special Issue on Computer Vision Meets Databases.

Raghavan, H., Madani, O., & Jones, R. (2006). Active Learning with Feedback on Both Features and Instances. Journal of Machine Learning Research, 7, 1655-1686.

Roy, N., & McCallum, A. (2001). Toward Optimal Active Learning Through Sampling Estimation of Error Reduction. Paper presented at the Proceedings of the 18th International Conference on Machine Learning.

Su, Z., Li, S., & Zhang, H. (2001). Extraction of Feature Subspaces for Content-Based Retrieval Using Relevance Feedback. Paper presented at ACM Multimedia, Ottawa, Ontario, Canada.

Tong, S., & Koller, D. (2001). Support Vector Machine Active Learning with Application to Text Classification. Journal of Machine Learning Research, 2, 45-66.

Wang, L., Chan, K. L., & Zhang, Z. (2003). Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval. Paper presented at the Proceedings of IEEE Computer Vision and Pattern Recognition.
Warmuth, M. K., Ratsch, G., Mathieson, M., Liao, J., & Lemmem, C. (2003). Active Learning in the Drug Discovery Process. Journal of Chemical Information and Computer Sciences, 43(2), 667-673.

Xu, Z., Xu, X., Yu, K., & Tresp, V. (2003). A Hybrid Relevance-Feedback Approach to Text Retrieval. Paper presented at the Proceedings of the 25th European Conference on Information Retrieval Research, Lecture Notes in Computer Science.

Yin, P., Bhanu, B., Chang, K., & Dong, A. (2005). Integrating Relevance Feedback Techniques for Image Retrieval Using Reinforcement Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10), 1536-1551.

Zhang, L., Lin, F., & Zhang, B. (2001). Support Vector Machine Learning for Image Retrieval. Paper presented at the International Conference on Image Processing.

Zhang, T., & Oles, F. (2000). A Probability Analysis on the Value of Unlabeled Data for Classification Problems. Paper presented at the Proceedings of the 17th International Conference on Machine Learning, San Francisco, CA.

KEY TERMS

Heuristic Active Learning: The set of active learning algorithms in which the sample selection criterion is based on some heuristic objective function. For example, version space based active learning selects the sample which can most reduce the size of the version space.

Hypothesis Space: The set of all hypotheses within which the objective hypothesis is assumed to be found.

Semi-Supervised Learning: The set of learning algorithms in which both labelled and unlabelled data in the training dataset are directly used to train the classifier.

Statistical Active Learning: The set of active learning algorithms in which the sample selection criterion is based on some statistical objective function, such as minimization of the generalisation error, bias and variance. Statistical active learning is usually statistically optimal.

Supervised Learning: The set of learning algorithms in which the samples in the training dataset are all labelled.

Unsupervised Learning: The set of learning algorithms in which the samples in the training dataset are all unlabelled.

Version Space: The subset of the hypothesis space which is consistent with the training set.
Adaptive Algorithms for Intelligent Geometric Computing
for the next viewpoint can be computed by refining the existing mesh.

ADAPTIVE GEOMETRIC COMPUTING

This chapter presents an Adaptive Multi-Resolution Technique for real-time terrain visualization, utilizing a clever way of optimizing the mesh dynamically for smooth and continuous visualization at a very high efficiency (frame rate) (Apu and Gavrilova, 2005, 2007). Our method is characterized by the efficient representation of the massive underlying terrain, utilizes efficient transitions between detail levels, and achieves frame-rate constancy, ensuring visual continuity. At the core of the method is adaptive processing: a formalized hierarchical representation that exploits the subsequent refinement principle. This gives us full control over the complexity of the feature space. An error metric is assigned by a higher-level process where objects (or features) are initially classified into different labels. Thus, this adaptive method is highly useful for feature space representation. In 2006, Gavrilova and Apu showed that such methods can act as a powerful tool not only for terrain rendering, but also for motion planning and adaptive simulations (Apu and Gavrilova, 2006). They introduced an Adaptive Spatial Memory (ASM) model that utilizes the adaptive approach in a real-time online algorithm for multi-agent collaborative motion planning. They demonstrated that the powerful notion of adaptive computation can be applied to perception and understanding of space. An extension of this method to 3D motion planning, developed as part of collaborative research with Prof. I. Kolingerova's group, has been reported to be significantly more efficient than conventional methods (Broz et al., 2007).

We first move to discuss evolutionary computing. We demonstrate the power of adaptive computation by developing and applying an adaptive computational model to missile simulation (Apu and Gavrilova, 2006). The adaptive algorithms described above have the property that spatial memory units can form, refine and collapse to simulate learning, adapting and responding to stimuli. The result is a complex multi-agent learning algorithm that clearly demonstrates organic behaviors, such as the sense of territory, trails and tracks observed in flocks and herds of wild animals and insects. This gives a motivation to explore the mechanism in application to swarm behavior modeling.

Swarm Intelligence (SI) is the property of a system whereby the collective behaviors of unsophisticated agents interacting locally with their environment cause coherent functional global patterns to emerge (Bonabeau, 1999). Swarm intelligence provides a basis for the exploration of the collective (distributed) behavior of a group of agents without centralized control or the provision of a global model. Agents in such a system have limited perception (or intelligence) and cannot individually carry out complex tasks. According to Bonabeau, by regulating the behavior of the agents in the swarm, one can demonstrate emergent behavior and intelligence as a collective phenomenon. Although the swarming phenomenon is largely observed in biological organisms such as an ant colony or a flock of birds, it has recently been used to simulate complex dynamic systems focused on accomplishing a well-defined objective (Kennedy, 2001; Raupp and Thalmann, 2001).
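The refine-and-collapse cycle at the heart of such adaptive hierarchies can be sketched as follows; the quad-tree layout, the error-decay model and the tolerance values are illustrative assumptions, not the authors' actual formulation:

```python
# Sketch of hierarchical subsequent refinement: a quad-tree node is
# split while its error metric exceeds a tolerance, and merged back
# when the extra detail is no longer needed. Error values and the
# error-decay model (error / 4 per level) are placeholders.

class QuadNode:
    def __init__(self, x, y, size, error):
        self.x, self.y, self.size = x, y, size
        self.error = error          # assigned by a higher-level process
        self.children = []

    def refine(self, tolerance):
        """Recursively subdivide while the local error is too large."""
        if self.error > tolerance and self.size > 1:
            half = self.size // 2
            self.children = [
                QuadNode(self.x + dx, self.y + dy, half, self.error / 4)
                for dx in (0, half) for dy in (0, half)
            ]
            for child in self.children:
                child.refine(tolerance)

    def collapse(self, tolerance):
        """Merge leaf children away when their detail is not needed."""
        for child in self.children:
            child.collapse(tolerance)
        if self.children and all(c.error <= tolerance / 4 and not c.children
                                 for c in self.children):
            self.children = []

    def leaf_count(self):
        return 1 if not self.children else sum(c.leaf_count()
                                               for c in self.children)

root = QuadNode(0, 0, 8, error=16.0)
root.refine(tolerance=2.0)
fine = root.leaf_count()        # more leaves where the error was high
root.collapse(tolerance=8.0)    # a coarser tolerance merges detail away
coarse = root.leaf_count()
print(fine, coarse)  # 16 4
```

The same expand/collapse pattern gives the frame-rate control described in the text: tightening or loosening the tolerance trades mesh complexity against fidelity on a per-region basis.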
Let us now investigate the application of the adaptive computational paradigm and the swarm intelligence concept to missile behavior simulation (Apu and Gavrilova, 2006). First of all, let us note that complex strategic behavior can be observed by means of a task-oriented artificial evolutionary process in which the behaviors of individual missiles are described with surprising simplicity. Secondly, the global effectiveness and behavior of the missile swarm is relatively unaffected by the disruption or destruction of individual units. From a strategic point of view, this adaptive behavior is a strongly desired property in military applications, which motivates our interest in applying it to missile simulation. Note that this problem was chosen because it presents a complex challenge for which an optimum solution is very hard to obtain using traditional methods. The dynamic and competitive relationship between missiles and turrets makes it extremely difficult to model using a deterministic approach. It should also be noted that the problem has an easy evaluation metric that allows determining fitness values precisely.

Now, let us summarize the idea of evolutionary optimization by applying a genetic algorithm to evolve the missile genotype. We are particularly interested in observing the evolution of complex 3D formations and tactical strategies that the swarm learns to maximize its effectiveness during an attack simulation run. The simulation is based on attack, evasion and defense. While the missile sets its strategy to strike the target, the battleship prepares to shoot down as many missiles as possible (Figure 2 illustrates the basic missile maneuvers). Each attempt to destroy the target is called an attack simulation run. Its effectiveness equals the number of missiles hitting the target; therefore, the outcome of the simulation is easily quantifiable. On the other hand, the interaction between missiles and the battleship is complex and nontrivial. As a result, war strategies may emerge in which a local penalty (i.e. sacrificing a missile) can optimize global efficiency (i.e. a deception strategy). The simplest form of information known to each missile is its position and orientation and the location of the target. This information is augmented with information about the missile's neighborhood and environment, which influences the missile navigation pattern. For the actual missile behavior simulation, we use a strategy based on a modified version of the Boids flocking technique.

Figure 2. Basic maneuvers for a missile using the Gene String

We have just outlined the necessary set of actions to reach the target or interact with the environment. This is the basic building block of missile navigation. The gene string is another important part; it reflects the complexity with which such courses of action can be chosen. It contains a unique combination of maneuvers (such as attack, evasion, etc.) that evolve to create complex combined intelligence. We describe the fitness of the missile gene in terms of collective performance. After investigating various possibilities, we developed and used a two-dimensional adaptive fitness function to evolve the missile strains in one evolutionary system. Details on this approach can be found in (Apu and Gavrilova, 2006).

After extensive experimentation, we have found many interesting characteristics, such as geometric attack formations and organic behaviors observed among swarms, in addition to highly anticipated strategies such as simultaneous attack, deception, retreat and others (see Figure 3). We also examined adaptability by randomizing the simulation coordinates, distance, initial formation, attack rate, and other parameters of the missiles, and measured the mean and variance of the fitness function. Results have shown that many of the genotypes that evolved are highly adaptive to the environment.

We have just reviewed the application of the adaptive computational paradigm to swarm intelligence and briefly described the efficient tactical swarm simulation method (Apu and Gavrilova, 2006). The results clearly demonstrate that the swarm is able to develop complex strategy through the evolutionary process of genotype mutation.
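To make the gene-string idea concrete, here is a toy genetic algorithm in the spirit of the approach. The maneuver alphabet, the fitness function and all parameters below are invented placeholders; the actual system scored a gene string by the number of missiles hitting the target in a full 3D simulation:

```python
import random

# Toy genetic algorithm over maneuver "gene strings". The maneuver
# codes and the fitness rule are invented stand-ins for illustration.
random.seed(1)
MANEUVERS = "AEDF"   # hypothetical codes: attack, evade, deceive, flock
GENE_LEN, POP, GENERATIONS = 12, 30, 40

def fitness(gene):
    # Placeholder collective-performance score: reward attacks that
    # are set up by a preceding evasion or deception maneuver.
    return sum(1 for a, b in zip(gene, gene[1:]) if b == "A" and a in "ED")

def mutate(gene, rate=0.1):
    return "".join(random.choice(MANEUVERS) if random.random() < rate else g
                   for g in gene)

def crossover(p1, p2):
    cut = random.randrange(1, GENE_LEN)
    return p1[:cut] + p2[cut:]

population = ["".join(random.choices(MANEUVERS, k=GENE_LEN))
              for _ in range(POP)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP // 2]            # truncation selection
    population = parents + [mutate(crossover(*random.sample(parents, 2)))
                            for _ in range(POP - len(parents))]

best = max(population, key=fitness)
print(best, fitness(best))
```

Even this toy loop shows the mechanism described above: selection over whole maneuver strings lets useful maneuver combinations (here, deceive-then-attack) accumulate without any missile being individually programmed with a strategy.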
This contribution, among other works on adaptive computational intelligence, will be profiled in detail in an upcoming book in the Springer-Verlag book series on Computational Intelligence (Gavrilova, 2007).

As stated in the introduction, adaptive computation is based on a variable-complexity, level-of-detail paradigm, where a physical phenomenon can be simulated by the continuous process of local adaptation of spatial complexity. As presented by M. Gavrilova in the Plenary Lecture at the 3IA Eurographics Conference, France, in 2006, the adaptive paradigm is a powerful computational model that can also be applied to a vast area of biometric research. This section therefore reviews methods and techniques based on adaptive geometric methods in application to biometric problems. It emphasizes the advantages that an intelligent approach to geometric computing brings to the area of complex biometric data processing (Gavrilova, 2007).

In information technology, biometrics refers to the study of physical and behavioral characteristics with the purpose of person identification (Yanushkevich, Gavrilova, Wang and Srihari, 2007). In recent years, the area of biometrics has witnessed tremendous growth, partly as a result of a pressing need for increased security, and partly as a response to the new technological advances that are literally changing the way we live. The availability of much more affordable storage and of high-resolution biometric image capturing devices has contributed to the accumulation of very large datasets of biometric data. In the earlier sections, we studied the background of adaptive mesh generation. Let us now look at the background research on topology-based data structures and their application to biometric research. This information is highly relevant to the goals of modeling and visualizing complex biometric data. At the same time as the adaptive methodology was developing in GIS, interest in topology-based data structures, such as Voronoi diagrams and Delaunay triangulations, has grown significantly. Some preliminary results on the utilization of these topology-based data structures in biometrics began to appear. For instance, research on image processing using Voronoi diagrams was presented in (Liang and Asano, 2004; Asano, 2006), studies utilizing the Voronoi diagram for fingerprint synthesis were conducted by (Bebis et al., 1999; Capelli et al., 2002), and various surveys of methods for modeling human faces using triangular meshes appeared in (Wen and Huang, 2004; Li and Jain, 2005; Wayman et al., 2005). Some interesting results were recently obtained in the BTLab, University of Calgary, through the development of topology-based feature extraction algorithms for fingerprint matching (Wang et al., 2006, 2007; an illustration is found in Figure 4), 3D facial expression modeling (Luo et al., 2006) and iris synthesis (Wecker et al., 2005). A comprehensive review of topology-based approaches in biometric modeling and synthesis can be found in a recent book chapter on the subject (Gavrilova, 2007).

In this chapter, we propose to manage the challenges arising from large volumes of complex biometric data through the innovative utilization of the adaptive paradigm. We suggest a combination of topology-based and hierarchy-based methodology to store and search for biometric data, as well as to optimize such a representation based on data access and usage. Namely, retrieval of the data, or creation of a real-time visualization, can be based on the dynamic pattern of data usage (how often, what type of data, how much detail, etc.), recorded and analyzed while the biometric system is being used for recognition and identification purposes.

Figure 3. (a) Deception pattern, (b) Distraction pattern, (c) Organic motion pattern
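A usage-driven hierarchical store of the kind just described can be sketched as follows; the region keys, access-count threshold, depth limit and tree layout are invented for illustration and are not the actual AMS design:

```python
# Sketch of a usage-driven hierarchical store: frequently retrieved
# regions of a biometric record are expanded to finer detail levels,
# and rarely used sub-trees are merged away. All thresholds and keys
# below are illustrative assumptions.

class DetailNode:
    def __init__(self, label, depth=0, max_depth=3):
        self.label, self.depth, self.max_depth = label, depth, max_depth
        self.accesses = 0
        self.children = {}

    def access(self, path):
        """Record one retrieval along a path of sub-region keys,
        expanding the tree (adding detail levels) as it is used."""
        self.accesses += 1
        if path:
            head, rest = path[0], path[1:]
            if head not in self.children and self.depth < self.max_depth:
                self.children[head] = DetailNode(head, self.depth + 1,
                                                 self.max_depth)
            if head in self.children:
                self.children[head].access(rest)

    def collapse(self, min_accesses=2):
        """Greedily merge away sub-trees that are rarely retrieved."""
        self.children = {k: c for k, c in self.children.items()
                         if c.accesses >= min_accesses}
        for child in self.children.values():
            child.collapse(min_accesses)

    def size(self):
        return 1 + sum(c.size() for c in self.children.values())

store = DetailNode("face")
for _ in range(5):
    store.access(["iris", "texture"])   # hot region gains detail levels
store.access(["scar"])                   # cold region touched only once
store.collapse(min_accesses=2)
print(store.size())  # 3: the hot iris/texture path survives, "scar" merges
```

The point of the sketch is the greedy adaptation step: the tree's shape is dictated by the recorded usage pattern rather than by a fixed resolution.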
In addition to using this information for optimized data representation and retrieval, we also propose to incorporate intelligent learning techniques to predict the most likely patterns of system usage and to represent and organize the data accordingly.

On the practical side, to achieve our goal, we propose a novel way to represent complex biometric data through the organization of the data in a hierarchical tree-like structure. Such an organization is similar in principle to the Adaptive Memory Subdivision (AMS), capable of representing and retrieving varying amounts of information at the level of detail that needs to be represented. A spatial quad-tree is used to hold the information about the system, as well as the instructions on how to process this information. Expansion is realized through the spatial subdivision technique that refines the data and increases the level of detail, and collapsing is realized through the merge operation that simplifies the data representation and makes it more compact. A greedy strategy is used to optimally adapt to the best representation based on the user requirements, the amount of available data and resources, the required resolution, and so on. This powerful technique enables us to achieve the goal of compact biometric data representation, which allows, for instance, efficiently storing minor details of the modeled face (e.g. scars, wrinkles) or detailed patterns of the iris.

FUTURE TRENDS

In addition to data representation, the adaptive technique can be highly useful in biometric feature extraction, with the purpose of fast and reliable retrieval and matching of biometric data, and in implementing dynamic changes to the model. The methodology has a high potential of becoming one of the key approaches in biometric data modeling and synthesis.

CONCLUSION

The chapter reviewed the adaptive computational paradigm in application to surface modeling, evolutionary computing and biometric research. Some of the key future developments in the upcoming years will undoubtedly highlight the area, inspiring new generations of intelligent biometric systems with adaptive behavior.

REFERENCES

Apu, R. & Gavrilova, M. (2005) Geo-Mass: Modeling Massive Terrain in Real-Time, GEOMATICA J., 59(3), 313-322.

Apu, R. & Gavrilova, M. (2006) Battle Swarm: An Evolutionary Approach to Complex Swarm Intelligence, 3IA Int. C. Comp. Graphics and AI, Limoges, France, 139-150.

Apu, R. & Gavrilova, M. (2007) Fast and Efficient Rendering System for Real-Time Terrain Visualization, IJCSE Journal, 2(2), 5/6.

Apu, R. & Gavrilova, M. (2006) An Efficient Swarm Neighborhood Management for a 3D Tactical Simulator, ISVD 2006, IEEE-CS Proceedings, 85-93.
Asano, T. (2006) Aspect-Ratio Voronoi Diagram with Applications, ISVD 2006, IEEE-CS Proceedings, 32-39.

Bebis, G., Deaconu, T. & Georgiopoulos, M. (1999) Fingerprint Identification using Delaunay Triangulation, ICIIS 99, Maryland, 452-459.

Bonabeau, E., Dorigo, M. & Theraulaz, G. (1999) Swarm Intelligence: From Natural to Artificial Systems, NY: Oxford Univ. Press.

Broz, P., Kolingerova, I., Zitka, P., Apu, R. & Gavrilova, M. (2007) Path planning in dynamic environment using an adaptive mesh, SCCG 2007, Spring Conference on Computer Graphics 2007, ACM SIGGRAPH.

Capelli, R., Maio, D. & Maltoni, D. (2002) Synthetic Fingerprint-Database Generation, ICPR 2002, Canada, vol. 3, 369-376.

Duchaineau, M. et al. (1997) ROAMing Terrain: Real-Time Optimally Adapting Meshes, IEEE Visualization 97, 81-88.

Gavrilova, M.L. (2007) Computational Geometry and Image Processing in Biometrics: On the Path to Convergence, in Image Pattern Recognition: Synthesis and Analysis in Biometrics, Chapter 4, 103-133, World Scientific Publishers.

Gavrilova, M.L. Computational Intelligence: A Geometry-Based Approach, in book series Studies in Computational Intelligence, Springer-Verlag, Ed. Janusz Kacprzyk, to appear.

Gavrilova, M.L. (2006) IEEE-CS Book of the 3rd International Symposium on Voronoi Diagrams in Science and Engineering, IEEE-CS, Softcover, 270 pages.

Gavrilova, M.L. (2006) Geometric Algorithms in 3D Real-Time Rendering and Facial Expression Modeling, 3IA 2006 Plenary Lecture, Eurographics, Limoges, France, 5-18.

Hoppe, H. (1997) View-Dependent Refinement of Progressive Meshes, SIGGRAPH 97 Proceedings, 189-198.

Kennedy, J., Eberhart, R.C. & Shi, Y. (2001) Swarm Intelligence, San Francisco: Morgan Kaufmann Publishers.

Li, S., Liu, X. & Wu, E. (2003) Feature-Based Visibility-Driven CLOD for Terrain, Proc. Pacific Graphics 2003, 313-322, IEEE Press.

Li, S. & Jain, A. (2005) Handbook of Face Recognition, Springer-Verlag.

Liang, X.F. & Asano, T. (2004) A fast denoising method for binary fingerprint image, IASTED, Spain, 309-313.

Lindstrom, P. & Koller, D. (1996) Real-time continuous level of detail rendering of height fields, SIGGRAPH 1996 Proceedings, 109-118.

Luo, Y., Gavrilova, M. & Sousa, M.C. (2006) NPAR by Example: line drawing facial animation from photographs, CGIV 06, IEEE, Computer Graphics, Imaging and Visualization, 514-521.

Raupp, S. & Thalmann, D. (2001) Hierarchical Model for Real Time Simulation of Virtual Human Crowds, IEEE Trans. on Visualization and Computer Graphics, 7(2), 152-164.

Shafae, M. & Pajarola, R. (2003) DStrips: Dynamic Triangle Strips for Real-Time Mesh Simplification and Rendering, Pacific Graphics 2003, 271-280.

Wang, C., Luo, Y., Gavrilova, M. & Rokne, J. (2007) Fingerprint Image Matching Using a Hierarchical Approach, in Computational Intelligence in Information Assurance and Security, Springer SCI Series, 175-198.

Wang, H., Gavrilova, M., Luo, Y. & Rokne, J. (2006) An Efficient Algorithm for Fingerprint Matching, ICPR 2006, Int. C. on Pattern Recognition, Hong Kong, IEEE-CS, 1034-1037.

Wayman, J., Jain, A., Maltoni, D. & Maio, D. (2005) Biometric Systems: Technology, Design and Performance Evaluation, Springer.

Wecker, L., Samavati, F. & Gavrilova, M. (2005) Iris Synthesis: A Multi-Resolution Approach, GRAPHITE 2005, ACM Press, 121-125.

Wen, Z. & Huang, T. (2004) 3D Face Processing: Modeling, Analysis and Synthesis, Kluwer.

Yanushkevich, S., Gavrilova, M., Wang, P. & Srihari, S. (2007) Image Pattern Recognition: Synthesis and Analysis in Biometrics, World Scientific.
Adaptive Business Intelligence
MAIN FOCUS OF THE CHAPTER

"The answer to my problem is hidden in my data but I cannot dig it up!" This popular statement has been around for years, as business managers gathered and stored massive amounts of data in the belief that they contain some valuable insight. But business managers eventually discovered that raw data are rarely of any benefit, and that their real value depends on an organization's ability to analyze them. Hence, the need emerged for software systems capable of retrieving, summarizing, and interpreting data for end-users (Moss and Atre, 2003).

This need fueled the emergence of hundreds of business intelligence companies that specialized in providing software systems and services for extracting knowledge from raw data. These software systems would analyze a company's operational data and provide knowledge in the form of tables, graphs, pies, charts, and other statistics. For example, a business intelligence report may state that 57% of customers are between the ages of 40 and 50, or that product X sells much better in Florida than in Georgia.1

Consequently, the general goal of most business intelligence systems was to: (1) access data from a variety of different sources; (2) transform these data into information, and then into knowledge; and (3) provide an easy-to-use graphical interface to display this knowledge. In other words, a business intelligence system was responsible for collecting and digesting data, and presenting knowledge in a friendly way (thus enhancing the end-user's ability to make good decisions). The diagram in Figure 1 illustrates the processes that underpin a traditional business intelligence system.

Although different texts have illustrated the relationship between data and knowledge in different ways (e.g., Davenport and Prusak, 2006; Prusak, 1997; Shortliffe and Cimino, 2006), the commonly accepted distinction between data, information, and knowledge is:

Data are collected on a daily basis in the form of bits, numbers, symbols, and objects.

Information is organized data, which are preprocessed, cleaned, arranged into structures, and stripped of redundancy.

Knowledge is integrated information, which includes facts and relationships that have been perceived, discovered, or learned.

Because knowledge is such an essential component of any decision-making process (as the old saying goes, "Knowledge is power!"), many businesses have viewed knowledge as the final objective. But it seems that knowledge is no longer enough. A business may know a lot about its customers (it may have hundreds of charts and graphs that organize its customers by age, preferences, geographical location, and sales history), but management may still be unsure of what decision to make! And here lies the difference between "decision support" and "decision making": all the knowledge in the world will not guarantee the right or best decision.

Moreover, recent research in psychology indicates that widely held beliefs can actually hamper the decision-making process. For example, common beliefs like "the more knowledge we have, the better our decisions will be," or "we can distinguish between useful and irrelevant knowledge," are not supported by empirical evidence. Having more knowledge merely increases our confidence, but it does not improve the accuracy of our decisions.
Figure 1. A traditional business intelligence system: Data → Data Preparation → Data Mining, transforming data into Information and then Knowledge
Similarly, people supplied with "good" and "bad" knowledge often have trouble distinguishing between the two, proving that irrelevant knowledge decreases our decision-making effectiveness.

Today, most business managers realize that a gap exists between having the right knowledge and making the right decision. Because this gap affects management's ability to answer fundamental business questions (such as "What should be done to increase profits? Reduce costs? Or increase market share?"), the future of business intelligence lies in systems that can provide answers and recommendations, rather than mounds of knowledge in the form of reports. The future of business intelligence lies in systems that can make decisions! As a result, there is a new trend emerging in the marketplace called Adaptive Business Intelligence. In addition to performing the role of traditional business intelligence (transforming data into knowledge), Adaptive Business Intelligence also includes the decision-making process, which is based on prediction and optimization, as shown in Figure 2.

While business intelligence is often defined as "a broad category of application programs and technologies for gathering, storing, analyzing, and providing access to data," the term Adaptive Business Intelligence can be defined as "the discipline of using prediction and optimization techniques to build self-learning decisioning systems." Adaptive Business Intelligence systems include elements of data mining, predictive modeling, forecasting, optimization, and adaptability, and are used by business managers to make better decisions.

This relatively new approach to business intelligence is capable of recommending the best course of action (based on past data), but it does so in a very special way: an Adaptive Business Intelligence system incorporates prediction and optimization modules to recommend near-optimal decisions, and an adaptability module for improving future recommendations. Such systems can help business managers make decisions that increase efficiency, productivity, and competitiveness. Furthermore, the importance of adaptability cannot be overemphasized. After all, what is the point of using a software system that produces subpar schedules, inaccurate demand forecasts, and inferior logistic plans, time after time? Would it not be wonderful to use a software system that could adapt to changes in the marketplace? A software system that could improve with time? These products are very appealing for individual consumers, because, despite their mass production, they are capable of adapting to the preferences of each unique owner after some period of time.
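The interplay of the three modules can be sketched in a few lines; the smoothing predictor, the profit model, and all numbers below are invented placeholders, not the components of any actual Adaptive Business Intelligence product:

```python
# Minimal Adaptive Business Intelligence loop: a prediction module
# forecasts demand, an optimization module picks an order quantity,
# and an adaptability module updates the predictor from outcomes.
# All models, costs, and demand figures are invented placeholders.

class Predictor:
    def __init__(self, estimate=100.0):
        self.estimate = estimate            # initial demand forecast

    def predict(self):
        return self.estimate

    def adapt(self, observed, rate=0.3):
        # Adaptability module: exponential smoothing toward reality.
        self.estimate += rate * (observed - self.estimate)

def optimize(forecast, unit_cost=4.0, price=10.0):
    """Pick the order quantity maximizing profit under the forecast."""
    def profit(q):
        return price * min(q, forecast) - unit_cost * q
    return max(range(0, 201, 10), key=profit)

predictor = Predictor()
decisions = []
for observed_demand in [120, 130, 125, 140]:    # data arriving over time
    decisions.append(optimize(predictor.predict()))
    predictor.adapt(observed_demand)            # improve future advice

print(decisions, round(predictor.predict(), 1))
# [100, 110, 110, 120] 123.7
```

Note how the recommended quantity drifts upward as the adaptability module corrects the forecast: the optimizer itself never changes, yet its recommendations improve with time.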
Figure 2. An Adaptive Business Intelligence system: Data → Data Preparation → Data Mining → Prediction → Optimization → Decision, with an Adaptability module closing the loop over Information and Knowledge
The growing popularity of adaptability is also underscored by a recent publication of the US Department of Defense, which lists 19 important research topics for the next decade, and many of them include the term adaptive: "Adaptive Coordinated Control in the Multi-agent 3D Dynamic Battlefield," "Control for Adaptive and Cooperative Systems," "Adaptive System Interoperability," "Adaptive Materials for Energy-Absorbing Structures," and "Complex Adaptive Networks for Cooperative Control."

For sure, adaptability was recognized as an important component of intelligence quite some time ago: Alfred Binet (born 1857), French psychologist and inventor of the first usable intelligence test, defined intelligence as "... judgment, otherwise called good sense, practical sense, initiative, the faculty of adapting one's self to circumstances." Adaptability is a vital component of any intelligent system, as it is hard to argue that a system is intelligent if it does not have the capacity to adapt. For humans, the importance of adaptability is obvious: our ability to adapt was a key element in the evolutionary process. In psychology, a behavior or trait is adaptive when it helps an individual adjust and function well within a changing social environment. In the case of artificial intelligence, consider a chess program capable of beating the world chess master: should we call this program intelligent? Probably not. We can attribute the program's performance to its ability to evaluate the current board situation against a multitude of possible future boards before selecting the best move. However, because the program cannot learn or adapt to new rules, the program will lose its effectiveness if the rules of the game are changed or modified. Consequently, because the program is incapable of learning or adapting to new rules, the program is not intelligent.

The same holds true for any expert system. No one questions the usefulness of expert systems in some environments (which are usually well defined and static), but expert systems that are incapable of learning and adapting should not be called intelligent. Some expert knowledge was programmed in, that is all.

So, what are the future trends for Adaptive Business Intelligence? In the words of Jim Goodnight, the CEO of SAS Institute (Collins et al., 2007):

"Until recently, business intelligence was limited to basic query and reporting, and it never really provided that much intelligence ..."

However, this is about to change. Keith Collins, the Chief Technology Officer of SAS Institute (Collins et al., 2007), believes that:

"A new platform definition is emerging for business intelligence, where BI is no longer defined as simple query and reporting. [...] In the next five years, we'll also see a shift in performance management to what we're calling predictive performance management, where analytics play a huge role in moving us beyond just simple metrics to more powerful measures."

Further, Jim Davis, the VP of Marketing of SAS Institute (Collins et al., 2007), stated:

"In the next three to five years, we'll reach a tipping point where more organizations will be using BI to focus on how to optimize processes and influence the bottom line ..."

Finally, it would be important to incorporate adaptability in the prediction and optimization components of future Adaptive Business Intelligence systems. There are some recent, successful implementations of Adaptive Business Intelligence systems reported (e.g., Michalewicz et al., 2005), which provide daily decision support for large corporations and result in multi-million dollar returns on investment. There are also companies (e.g., www.solveitsoftware.com) which specialize in the development of Adaptive Business Intelligence tools. However, further research effort is required. For example, most of the research in machine learning has focused on using historical data to build prediction models. Once the model is built and evaluated, the goal is accomplished. However, because new data arrive at regular intervals, building and evaluating a model is just the first step in Adaptive Business Intelligence. Because these models need to be updated regularly (something that the adaptability module is responsible for), we expect to see more emphasis on this updating process in machine learning research. Also, the frequency of updating the prediction module, which can vary from seconds (e.g., in real-time currency trading systems) to weeks and months (e.g., in fraud detection systems), may require different techniques and methodologies.

In general, Adaptive Business Intelligence systems would include research results from control theory, statistics, operations research, machine learning, and modern heuristic methods, to name a few. We also expect that major advances will continue to be made in modern optimization techniques. In the years to come, more and more research papers will be published on constrained and multi-objective optimization problems, and on optimization problems set in dynamic environments. This is essential, as most real-world business problems are constrained, multi-objective, and set in a time-changing environment.
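The regular-update requirement discussed above can be illustrated with a tiny online learner; the least-mean-squares update below is a standard textbook technique, used here as a generic stand-in for an adaptability module (the data stream is synthetic):

```python
# Online (incremental) updating of a prediction model as new data
# arrive, instead of one-off batch training. The LMS update rule is
# standard; the noiseless stream y = 2x + 1 is synthetic.

class OnlineLinearModel:
    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """One least-mean-squares step on a newly arrived (x, y) pair."""
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error

model = OnlineLinearModel()
# Each arriving point triggers an update, mimicking an adaptability
# module that refreshes the prediction module at every interval.
for step in range(5000):
    x = (step % 10) / 10.0
    model.update(x, 2 * x + 1)

print(round(model.w, 2), round(model.b, 2))  # converges toward w=2, b=1
```

The update frequency is just the rate at which `update` is called, which is why the same mechanism can serve both a per-second trading system and a monthly fraud-detection refresh.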
CONCLUSION

It is not surprising that the fundamental components of Adaptive Business Intelligence are already emerging in other areas of business. For example, the Six Sigma methodology is a great example of a well-structured, data-driven methodology for eliminating defects, waste, and quality-control problems in many industries. This methodology recommends the sequence of steps shown in Figure 3. Note that the above sequence is very close in spirit to part of the previous diagram, as it describes (in more detail) the adaptability control loop. Clearly, we have to measure, analyze, and improve, as we operate in a dynamic environment, so the process of improvement is continuous. The SAS Institute proposes another methodology, which is more oriented towards data mining activities. Their methodology recommends the sequence of steps shown in Figure 4. Again, note that the above sequence is very close to another part of our diagram, as it describes (in more detail) the transformation from data to knowledge. It is not surprising that businesses are placing considerable emphasis on these areas, because better decisions usually translate into better financial performance. And better financial performance is what Adaptive Business Intelligence is all about. Systems based on Adaptive Business Intelligence aim at solving real-world business problems that have complex constraints, are set in time-changing environments, have several (possibly conflicting) objectives, and where the number of possible solutions is too large to enumerate. Solving these problems requires a system that incorporates modules for prediction, optimization, and adaptability.

REFERENCES

Coello, C.A.C., Van Veldhuizen, A.A., and Lamont, G.B. (2002). Evolutionary algorithms for solving multi-objective problems. Kluwer Academic.

Collins, K., Goodnight, J., Hagström, M., and Davis, J. (2007). The future of business intelligence: Four questions, four views. SASCOM, First quarter, 2007.

Davenport, T.H. and Prusak, L. (2006). Working knowledge. Academic Internet Publishers.

Deb, K. (2001). Multi-objective optimization using evolutionary algorithms. Wiley.

Dhar, V. and Stein, R. (1997). Seven methods for transforming corporate data into business intelligence. Prentice Hall.

Loshin, D. (2003). Business intelligence: The savvy manager's guide. Morgan Kaufmann.

Makridakis, S., Wheelwright, S.C., and Hyndman, R.J. (1998). Forecasting: Methods and applications. Wiley.

Michalewicz, Z. and Fogel, D.B. (2004). How to solve it: Modern heuristics, 2nd edition. Springer.
Michalewicz, Z., Schmidt, M., Michalewicz, M., and TERMS AND DEFINITIONS
Chiriac, C. (2005). A decision-support system based A
on computational intelligence: A case study. IEEE Adaptive Business Intelligence: The discipline of
Intelligent Systems, 20(4), 44-49. using prediction and optimization techniques to build
self-learning decisioning systems.
Michalewicz, Z., Schmidt, M., Michalewicz, M., and
Chiriac, C. (2007). Adaptive business intelligence. Business Intelligence: A collection of tools,
Springer. methods, technologies, and processes needed to
transform data into actionable knowledge.
Moss, L. T. and Atre, S. (2003). Business intelligence
roadmap. Addison Wesley. Data: Pieces collected on a daily basis in the form
of bits, numbers, symbols, and objects.
Prusak, L. (1997). Knowledge in organizations. But-
terworth-Heinemann. Data Mining: The application of analytical methods
and tools to data for the purpose of identifying patterns,
Shortliffe, E. H. and Cimino, J. J. Eds (2006). Biomedi-
relationships, or obtaining systems that perform useful
cal informatics: Computer applications in health care
tasks such as classification, prediction, estimation, or
and biomedicine. Springer.
affinity grouping.
Vitt, E., Luckevich, M., and Misner, S. (2002). Business
Information: Organized data, which are prepro-
intelligence: Making better decisions faster. Microsoft
cessed, cleaned, arranged into structures, and stripped
Press.
of redundancy.
Weiss, S. M. and Indurkhya, N., (1998). Predictive
Knowledge: Integrated information, which in-
data mining. Morgan Kaufmann.
cludes facts and relationships that have been perceived,
Witten, I. H. and Frank, E. (2005). Data mining: discovered, or learned.
Practical machine learning tools and techniques, 2nd
Optimization: Process of finding the solution that
edition. Morgan Kaufmann.
is the best fit to the available resources.
Prediction: A statement or claim that a particular
event will occur in the future.
ENDNOTE
1
Note that business intelligence can be defined both
as a state (a report that contains knowledge) and
a process (software responsible for converting
data into knowledge).
INTRODUCTION

Artificial neural networks (ANNs) (McCulloch & Pitts, 1943) (Haykin, 1999) were developed as models of their biological counterparts, aiming to emulate real neural systems and mimic the structural organization and function of the human brain. Their applications were based on the ability of self-designing to solve a problem by learning the solution from data. A comparative study of neural implementations running principal component analysis (PCA) and independent component analysis (ICA) was carried out. Artificially generated data, additively corrupted with white noise in order to enforce randomness, were employed to critically evaluate and assess the reliability of data projections. Analysis in both time and frequency domains showed the superiority of the estimated independent components (ICs) relative to principal components (PCs) in faithful retrieval of the genuine (latent) source signals.

Neural computation belongs to information processing dealing with adaptive, parallel, and distributed (localized) signal processing. In data analysis, a common task consists in finding an adequate subspace of multivariate data for subsequent processing and interpretation. Linear transforms are frequently employed in data model selection due to their computational and conceptual simplicity. Some common linear transforms are PCA, factor analysis (FA), projection pursuit (PP), and, more recently, ICA (Comon, 1994). The latter emerged as an extension of nonlinear PCA (Hotelling, 1933) and developed in the context of blind source separation (BSS) (Cardoso, 1998) in signal and array processing. ICA is also related to recent theories of the visual brain (Barlow, 1991), which assume that consecutive processing steps lead to a progressive reduction in the redundancy of representation (Olshausen and Field, 1996).

This contribution is an overview of the PCA and ICA neuromorphic architectures and their associated algorithmic implementations, increasingly used as exploratory techniques. The discussion is conducted on artificially generated sub- and super-Gaussian source signals.

BACKGROUND

In neural computation, transforming methods amount to unsupervised learning, since the representation is only learned from data without any external control. Irrespective of the nature of learning, the neural adaptation may be formally conceived as an optimization problem: an objective function describes the task to be performed by the network, and a numerical optimization procedure allows adapting the network parameters (e.g., connection weights, biases, internal parameters). This process amounts to search or nonlinear programming in a quite large parameter space. However, any prior knowledge available on the solution might be efficiently exploited to narrow the search space. In supervised learning, the additional knowledge is incorporated in the net architecture or learning rules (Gold, 1996). Less extensive research was focused on unsupervised learning. In this respect, the mathematical methods usually employed are drawn from classical constrained multivariate nonlinear optimization and rely on the Lagrange multipliers method, the penalty or barrier techniques, and classical numerical algebra techniques, such as deflation/renormalization (Fiori, 2000), the Gram-Schmidt orthogonalization procedure, or the projection over the orthogonal group (Yang, 1995).

PCA and ICA Models

Mathematically, the linear stationary PCA and ICA models can be defined on the basis of a common data model. Suppose that some stochastic processes are represented by three random (column) vectors x(t), n(t) ∈ ℝ^N and s(t) ∈ ℝ^M with zero mean and finite covariance, with the components of s(t) = {s_1(t), s_2(t), ..., s_M(t)} being statistically independent and at most one Gaussian. Let A be a rectangular constant full column rank N × M matrix with at least as many rows as columns (N ≥ M), and denote by t the sample index (i.e., time or sample point) taking the discrete values t = 1, 2, ..., T. The common data model is then:

x(t) = As(t) + n(t)    (1)
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Adaptive Neural Algorithms for PCA and ICA
Here s(t), x(t), n(t), and A are the sources, the observed data, the (unknown) noise in the data, and the (unknown) mixing matrix, respectively, whereas a_i, i = 1, 2, ..., M are the columns of A. Mixing is supposed to be instantaneous, so there is no time delay between a (latent) source variable s_i(t) mixing into an observable (data) variable x_j(t), with i = 1, 2, ..., M and j = 1, 2, ..., N.

Consider that the stochastic vector process {x(t)} ∈ ℝ^N has the mean E{x(t)} = 0 and the covariance matrix C_x = E{x(t)x(t)^T}. The goal of PCA is to identify the dependence structure in each dimension and to come out with an orthogonal transform matrix W of size L × N from ℝ^N to ℝ^L, L < N, such that the L-dimensional output vector y(t) = Wx(t) sufficiently represents the intrinsic features of the input data, and where the covariance matrix C_y of {y(t)} is a diagonal matrix D with the diagonal elements arranged in descending order, d_{i,i} ≥ d_{i+1,i+1}. The restoration of {x(t)} from {y(t)}, say {x̂(t)}, is consequently given by x̂(t) = W^T W x(t) (Figure 1). For a given L, PCA aims to find an optimal value of W. The subspace spanned by the first L principal eigenvectors {c_1, c_2, ..., c_L}, with L < N, is called the PCA subspace of dimensionality L.

The ICA problem can be formulated as follows: given T realizations of x(t), estimate both the matrix A and the corresponding realizations of s(t). In BSS the task is somewhat relaxed to finding the waveforms {s_i(t)} of the sources knowing only the (observed) mixtures {x_j(t)}. If no suppositions are made about the noise, the additive noise term is omitted in (1). A practical strategy is to include noise in the signals as supplementary term(s); hence the ICA model (Fig. 2) becomes:

x(t) = As(t) = Σ_{i=1}^{M} a_i s_i(t)    (2)

The source separation consists in updating an unmixing matrix B(t), without resorting to any information about the spatial mixing matrix A, so that the output vector y(t) = B(t)x(t) becomes an estimate ŝ(t) of the original independent source signals s(t). The separating matrix B(t) is divided in two parts dealing with the dependencies in the first two moments, i.e., the whitening matrix V(t), and the dependencies in the higher-order statistics, i.e., the orthogonal (rotation) matrix W(t).
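As a concrete illustration of the data model (1)-(2), the following NumPy sketch generates independent zero-mean sources, mixes them through a random full-column-rank matrix, and forms the sample covariance of the observations. The dimensions, noise level, and source distributions are illustrative choices, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, N = 10_000, 2, 3          # samples, sources, sensors (N >= M)

# Independent, zero-mean, unit-variance sources:
# one sub-Gaussian (uniform) and one super-Gaussian (Laplacian).
s = np.vstack([
    rng.uniform(-np.sqrt(3), np.sqrt(3), T),
    rng.laplace(0, 1 / np.sqrt(2), T),
])

A = rng.standard_normal((N, M))       # full-column-rank mixing matrix
n = 0.01 * rng.standard_normal((N, T))
x = A @ s + n                         # observed mixtures, eq. (1)

# Sample estimate of the covariance C_x = E{x x^T} of the zero-mean mixtures.
Cx = (x @ x.T) / T
print(x.shape, Cx.shape)
```

The BSS task is then to recover s (up to scaling and permutation) from x alone, without knowledge of A.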
function used. An update rule is specified by the iterative incremental change ΔW of the rotation matrix W, which gives the general form of the learning rule:

W ← W + ΔW    (3)

First, consider a single artificial neuron receiving an M-dimensional input vector x. It gradually adapts its weight vector w so that the function E{f(w^T x)} is maximized, where E is the expectation with respect to the (unknown) probability density of x and f is a continuous objective function. The function f is bounded by setting constant the Euclidian norm of w. A constrained gradient ascent learning rule based on a sequence of sample functions for relatively small learning rates α(t) is then (Oja, 1995):

w(t+1) = w(t) + α(t)[I − w(t)w(t)^T] x(t) g(x(t)^T w(t))    (4)

where g = f′. Any PCA learning rule tends to find that direction in the input space along which the data has maximal variance. If all directions in the input space have equal variance, the one-unit case with a suitable nonlinearity is approximately minimizing the kurtosis of the neuron input. It means that the weight vector of the unit will be determined by the direction in the input space on which the projection of the input data is mostly clustered and deviates significantly from normality. This task is essentially the goal of the PP technique.

In the case of single-layer ANNs consisting of L parallel units, with each unit i having the same M-element input vector x and its own weight vector w_i, which together comprise an M × L weight matrix W = [w_1, w_2, ..., w_L], the following training rule obtained from (4) is a generalization of the linear PCA learning rule (in matrix form):

W(t+1) = W(t) + α(t)[I − W(t)W(t)^T] x(t) g(x(t)^T W(t))    (5)

Due to the instability of the above nonlinear Hebbian learning rule in the multi-unit case, a different approach based on optimizing two criteria simultaneously was introduced (Oja, 1982):

W(t+1) = W(t) + α(t) x(t) g(y(t))^T + γ(t) W(t)[I − W(t)^T W(t)]    (6)

Here α(t) is chosen positive or negative depending on our interest in maximizing or minimizing, respectively, the objective function J_1(w_i) = E{f(x^T w_i)}. Similarly, γ(t) is another gain parameter that is always positive and constrains the weight vectors to orthonormality, which is imposed by an appropriate penalty function such as:

J_2(w_i) = ½(1 − w_i^T w_i)² + ½ Σ_{j=1, j≠i}^{M} (w_i^T w_j)²

This is the bigradient algorithm, which is iterated until the weight vectors have converged with the desired accuracy. The algorithm can use normalized Hebbian or anti-Hebbian learning in a unified formula. Starting from the one-unit rule, the multi-unit bigradient algorithm can simultaneously extract several robust counterparts of the principal or minor eigenvectors of the data covariance matrix (Wang, 1996).

In the case of multilayered ANNs, the transfer functions of the hidden nodes can be expressed by radial basis functions (RBF), whose parameters could be learnt by a two-stage gradient descent strategy. A new growing RBF-node insertion strategy with different RBFs is used in order to improve the net performance. The learning strategy is reported to save computational time and memory space in the approximation of continuous and discontinuous mappings (Esposito et al., 2000).

Neural ICA

Various forms of unsupervised learning have been implemented in ANNs beyond standard PCA, like nonlinear PCA and ICA. Data whitening can be neurally emulated by PCA with a simple iterative algorithm that updates the sphering matrix V(t):
V(t+1) = V(t) − α(t)[v(t)v(t)^T − I] V(t)    (7)

After getting the decorrelation matrix V(t), the basic task for ICA algorithms remains to come out with an orthogonal matrix W(t), which is equivalent to a suitable rotation of the decorrelated data v(t) = V(t)x(t), aiming to maximize the product of the marginal densities of its components. There are various neural approaches to estimate the rotation matrix W(t). An important class of algorithms is based on maximization of network entropy (Bell, 1995). The BS nonlinear information maximization (infomax) algorithm performs online stochastic gradient ascent in mutual information (MI) between the outputs and inputs of a network. By minimizing the mutual information between the output components, the update rule obtained is:

ΔW = [W^T]^{−1} + x(1 − 2y)^T,  Δw_0 = 1 − 2y    (8)

The extended infomax algorithm was developed from the infomax principle, satisfying a general stability criterion and preserving the simple initial architecture of the network. Applying either the natural or relative gradient (Cardoso, 1996) for optimization, its learning rule yields results that compete with fixed-point batch computations.

The equivariant adaptive separation via independence (EASI) algorithm introduced by Cardoso and Laheld (1996) is a nonlinear decorrelation method. The objective function J(W) = E{f(Wx)} is subject to minimization with the orthogonal constraint imposed on W and the nonlinearity g = f′ chosen according to the data kurtosis. Its basic update rule equates to:

ΔW = −λ[yy^T − I + g(y)y^T − y g(y)^T] W    (9)

A related class of fixed-point algorithms optimizes contrast functions of the form J_G(w) = [E{f(w^T x)} − E{f(v)}]², with v standing for the standardized Gaussian variable. The learning rule is basically a Gram-Schmidt-like decorrelation method.
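The EASI update (9) can be sketched as a runnable serial algorithm. The normalized form of the update, the cubic nonlinearity (suited to sub-Gaussian sources), the learning rate, and the toy mixing matrix are all illustrative assumptions for this sketch, not parameters taken from the article:

```python
import numpy as np

rng = np.random.default_rng(2)
T, M = 50_000, 2

# Two independent sub-Gaussian (uniform) unit-variance sources, mixed by A.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), (M, T))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])       # illustrative, well-conditioned mixing matrix
X = A @ s

g = lambda y: y ** 3             # cubic nonlinearity for sub-Gaussian sources
W = np.eye(M)                    # separating matrix estimate
lam = 0.002
I = np.eye(M)

for _ in range(2):                               # two passes over the data
    for x in X.T:
        y = W @ x
        lin = (np.outer(y, y) - I) / (1 + lam * (y @ y))
        nonlin = (np.outer(g(y), y) - np.outer(y, g(y))) / (1 + lam * abs(y @ g(y)))
        W -= lam * (lin + nonlin) @ W            # normalized EASI step, cf. (9)

Q = W @ A    # overall transform; ideally a scaled permutation matrix
print(np.round(Q, 2))
```

After convergence, each row of Q = WA should be dominated by a single entry, i.e., each output recovers one source up to scale and ordering.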
S(2) = sign(sin(12t + 9cos(2πt/29)))

Saw-tooth:

S(3) = (rem(t, 79) − 17)/23

Impulsive curve:

S(4) = ((rem(t, 23) − 11)/9)^5

S(6) = ((rand(1, T) < 0.5)·2 − 1)·log(rand(1, T))

Figure 4. Sub-Gaussian (left) and super-Gaussian (right) source signals and their corresponding histograms (bottom)
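The legible source definitions above can be reproduced numerically and checked for their sub- or super-Gaussian character via the sample excess kurtosis. The constants inside S2 are an assumption where the original typesetting is unclear, and the sample length is arbitrary:

```python
import numpy as np

T = 5000
t = np.arange(1, T + 1)
rng = np.random.default_rng(3)

# S2's constants are assumed where the printed formula is ambiguous.
S2 = np.sign(np.sin(12 * t + 9 * np.cos(2 * np.pi * t / 29)))  # square-like
S3 = (np.remainder(t, 79) - 17) / 23                           # saw-tooth
S4 = ((np.remainder(t, 23) - 11) / 9) ** 5                     # impulsive curve
S6 = ((rng.random(T) < 0.5) * 2 - 1) * np.log(rng.random(T))   # spiky noise

def excess_kurtosis(s):
    """Sample excess kurtosis: negative for sub-, positive for super-Gaussian."""
    s = s - s.mean()
    return (s ** 4).mean() / ((s ** 2).mean() ** 2) - 3.0

for name, sig in [("S2", S2), ("S3", S3), ("S4", S4), ("S6", S6)]:
    print(name, round(excess_kurtosis(sig), 2))
```

The square-like and saw-tooth signals come out sub-Gaussian (negative excess kurtosis), while the impulsive and spiky signals come out super-Gaussian, matching the grouping in Figure 4.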
The performance of the algorithms in performing ICA can be measured by means of some quantitative indexes. The first we used was defined as the signal-to-interference ratio (SIR):

SIR = (1/N) Σ_{i=1}^{N} 10 log_10 [ max(Q_i)² / (Q_i^T Q_i − max(Q_i)²) ]    (11)

where Q = BA is the overall transforming matrix of the latent source components, Q_i is the i-th column of Q, max(Q_i) is the maximum element of Q_i, and N is the number of the source signals. The higher the SIR is, the better the separation performance of the algorithm.

A second employed index was the distance between the overall transforming matrix Q and an ideal permutation matrix, which is interpreted as the cross-talking error (CTE):

CTE = Σ_{i=1}^{N} ( Σ_{j=1}^{N} |Q_ij| / max|Q_i| − 1 ) + Σ_{j=1}^{N} ( Σ_{i=1}^{N} |Q_ij| / max|Q_j| − 1 )    (12)

Above, Q_ij is the ij-th element of Q, max|Q_i| is the maximum absolute valued element of row i in Q, and max|Q_j| is the maximum absolute valued element of column j in Q. A permutation matrix is defined so that on each of its rows and columns only one of the elements equals unity, while all the other elements are zero. It means that the CTE attains its minimum value zero for an exact permutation matrix (i.e., perfect decomposition) and goes positively higher the more Q deviates from a permutation matrix (i.e., decomposition of lower accuracy).

We defined the relative signal retrieval error (SRE) as the Euclidian distance between the source signals and their best matching estimated components, normalized to the number of source signals, times the number of time samples, and times the module of the source signals:

SRE = (1/(TN)) · [ Σ_{i=1}^{N} Σ_{t=1}^{T} |x_i(t) − y_i(t)|² ] / [ Σ_{i=1}^{N} Σ_{t=1}^{T} |x_i(t)|² ],  t = 1, 2, ..., T    (13)

The lower the SRE is, the better the estimates approximate the latent source signals.

The stabilized version of the FastICA algorithm is attractive by its fast and reliable convergence, and by the lack of parameters to be tuned. The natural gradient incorporated in the BS extended infomax performs better than the original gradient ascent and is computationally less demanding. Though the BS algorithm is theoretically optimal in the sense of dealing with mutual information as its objective function, like all neural unsupervised algorithms, its performance heavily depends on the learning rates, and its convergence is rather slow. The EGLD algorithm separates skewed distributions, even for zero kurtosis. In terms of computational time, the BS extended infomax algorithm was the fastest, FastICA most faithfully retrieved the sources among all algorithms under test, while the EASI algorithm came out with a full transform matrix Q that is the closest to unity.

FUTURE TRENDS

Neuromorphic methods in exploratory analysis and data mining are rapidly emerging applications of unsupervised neural training. In recent years, new learning algorithms have been proposed, yet their theoretical properties, range of optimal applicability, and comparative assessment have remained largely unexplored. No convergence theorems are associated with the training algorithms in use. Moreover, algorithm convergence heavily depends on the proper choice of the learning rate(s) and, even when convergence is accomplished, the neural algorithms are relatively slow compared with batch-type computations. Nonlinear and nonstationary neural ICA is expected to be developed due to ANNs' nonalgorithmic processing and their ability to learn nonanalytical relationships if adequately trained.
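The quantitative indexes defined earlier, SIR (11) and CTE (12), can be sketched as plain functions of the overall transform Q = BA. The example matrices below are illustrative, not results from the study:

```python
import numpy as np

def sir(Q):
    """Signal-to-interference ratio, cf. eq. (11); higher is better."""
    Q = np.asarray(Q, float)
    peaks = np.max(np.abs(Q), axis=1) ** 2          # max(Q_i)^2
    power = np.sum(Q ** 2, axis=1)                  # Q_i^T Q_i
    return np.mean(10 * np.log10(peaks / (power - peaks)))

def cte(Q):
    """Cross-talking error, cf. eq. (12); zero for an exact permutation."""
    Q = np.abs(np.asarray(Q, float))
    rows = (Q / Q.max(axis=1, keepdims=True)).sum(axis=1) - 1
    cols = (Q / Q.max(axis=0, keepdims=True)).sum(axis=0) - 1
    return rows.sum() + cols.sum()

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])        # exact permutation: perfect decomposition
Q = np.array([[0.05, 1.0],
              [0.9, 0.1]])        # good but imperfect separation
print(cte(P), round(cte(Q), 3), round(sir(Q), 1))
```

As expected from the definitions, the CTE of a permutation matrix is exactly zero, while any cross-talk raises the CTE and lowers the SIR.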
CONCLUSION

Both PCA and ICA share some common features, like aiming at building generative models that are likely to have produced the observed data, and performing information preservation and redundancy reduction. In a neuromorphic approach, the model parameters are treated as network weights that are changed during the learning process. The main difficulty in function approximation stems from choosing the network parameters that have to be fixed a priori, and those that must be learnt by means of an adequate training rule.

PCA and ICA have major applications in data mining and exploratory data analysis, such as signal characterization, optimal feature extraction, and data compression, as well as being the basis of subspace classifiers in pattern recognition. ICA is much better suited than PCA to perform BSS, blind deconvolution, and equalization.

REFERENCES

Esposito, A., Marinaro, M., & Scarpetta, S. (2000). Approximation of continuous and discontinuous mappings by a growing neural RBF-based algorithm. Neural Networks, 13(6), 651-665.

Fiori, S., & Piazza, F. (2000). A general class of APEX-like PCA neural algorithms. IEEE Transactions on Circuits and Systems - Part I, 47, 1394-1398.

Gold, S., Rangarajan, A., & Mjolsness, E. (1996). Learning with preknowledge: Clustering with point and graph matching distance. Neural Computation, 8, 787-804.

Haykin, S. (1999). Neural networks (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.

Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417-441 and 498-520.

Hyvärinen, A., & Oja, E. (1997). A fast fixed-point algorithm for ICA. Neural Computation, 9, 1483-1492.

Karhunen, J. (1996). Neural approaches to independent component analysis and source separation. Proceedings ESANN'96, Bruges, Belgium, 249-266.
Yang, B. (1995). Projection approximation subspace tracking. IEEE Transactions on Signal Processing, 43, 1247-1252.

KEY TERMS

Artificial Neural Networks (ANNs): An information-processing synthetic system made up of several simple nonlinear processing units connected by elements that have information storage and programming functions, adapting and learning from patterns, which mimics a biological neural network.

Blind Source Separation (BSS): Separation of latent nonredundant (e.g., mutually statistically independent or decorrelated) source signals from a set of linear mixtures, such that the regularity of each resulting signal is maximized, and the regularity between the signals is minimized (i.e., statistical independence is maximized), without (almost) any information on the sources.

Confirmatory Data Analysis (CDA): An approach which, subsequent to data acquisition, proceeds with the imposition of a prior model and analysis, estimation, and testing of model parameters.

Exploratory Data Analysis (EDA): An approach based on allowing the data itself to reveal its underlying structure and model, heavily using the collection of techniques known as statistical graphics.

Independent Component Analysis (ICA): An exploratory method for separating a linear mixture of latent signal sources into independent components, as optimal estimates of the original sources, on the basis of their mutual statistical independence and non-Gaussianity.

Learning Rule: Weight change strategy in a connectionist system aiming to optimize a certain objective function. Learning rules are iteratively applied to the training set inputs, with the error gradually reduced as the weights adapt.

Principal Component Analysis (PCA): An orthogonal linear transform based on singular value decomposition that projects data to a subspace that preserves maximum variance.
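The PCA definition above (an orthogonal linear transform based on singular value decomposition, projecting to a maximum-variance subspace) can be sketched directly. The data dimensions and subspace size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic data with variances ~9, 1, and 0.01 along the three axes.
X = rng.standard_normal((500, 3)) @ np.diag([3.0, 1.0, 0.1])
X -= X.mean(axis=0)                     # center, so covariance is X^T X / n

# SVD of the centered data yields the orthogonal transform of the definition.
U, svals, Vt = np.linalg.svd(X, full_matrices=False)
L = 2
W = Vt[:L]            # L x N orthogonal transform onto the PCA subspace
Y = X @ W.T           # projected (principal component) data
X_hat = Y @ W         # restoration from the subspace, cf. x_hat = W^T W x

# Fraction of total variance preserved by the first L components.
explained = (svals[:L] ** 2).sum() / (svals ** 2).sum()
print(round(explained, 3))
```

With a strongly anisotropic covariance like the one above, two components preserve almost all of the variance, which is exactly the sense in which the PCA subspace is optimal.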
Kristian Williamson
Statistics Canada, Canada
Elarbi Badidi
United Arab Emirates University, UAE
Adaptive Neuro-Fuzzy Systems
and outputs, the neural network can be trained to approximate it. The disadvantages of neural networks are that they are not guaranteed to converge, that is, to be trained properly, and that after they have been trained they cannot give any information about why they take a particular course of action when given a particular input.

Fuzzy logic inference systems can give human-readable and understandable information about why a particular course of action was taken, because they are governed by a series of IF-THEN rules. Fuzzy logic systems can adapt in a way that their rules, and the parameters of the fuzzy sets associated with those rules, can be changed to meet some criteria. However, fuzzy logic systems lack the capability for self-learning, and must be modified by an external entity. Another salient feature of fuzzy logic systems is that they are, like artificial neural networks, capable of acting as universal approximators.

The common feature of being able to act as a universal approximator is the basis of most attempts to merge these two technologies. Not only can it be used to approximate a function, but it can also be used by both neural networks and fuzzy logic systems to approximate each other as well (Pal et al., 1999, pp. 66).

Universal approximation is the ability of a system to replicate a function to some degree. Both neural networks and fuzzy logic systems do this by using a non-mathematical model of the system (Jang et al., 1997, pp. 238; Pal et al., 1999, pp. 19). The term approximate is used as the model does not have to match the simulated function exactly, although it is sometimes possible to do so if enough information about the function is available. In most cases it is not necessary or even desirable to perfectly simulate a function, as this takes time and resources that may not be available, and close is often good enough.

Categories of Neuro-Fuzzy Systems

Efforts to combine fuzzy logic and neural networks have been underway for several years, and many methods have been attempted and implemented. These methods fall into two major categories:

Fuzzy Neural Networks (FNN): neural networks that can use fuzzy data, such as fuzzy rules, sets, and values (Jin, 2003, pp. 205-220).

Neural-Fuzzy Systems (NFS): fuzzy systems augmented by neural networks (Jin, 2003, pp. 111-140).

There are also four main architectures used for implementing neuro-fuzzy systems:

Fuzzy multi-layer networks (Jang, 1993; Mitra et al., 1995; Mitra et al., 2000; Mamdani et al., 1999; Sugeno et al., 1988; Takagi et al., 1985).

Fuzzy self-organizing map networks (Drobics et al., 2000; Kosko, 1997, pp. 98; Haykin, 1999, pp. 443).

Black-box fuzzy ANN (Bellazzi et al., 1999; Qiu, 2000; Monti, 1996).

Hybrid architectures (Zatwarnicki, 2005; Borzemski et al., 2003; Marichal et al., 2001; Rahmoun et al., 2001; Koprinska et al., 2000; Wang et al., 1999; Whitfort et al., 1995).

DEVELOPMENT OF ADAPTIVE NEURO-FUZZY SYSTEMS

Developing an adaptive neuro-fuzzy system is a process similar to the procedures used to create fuzzy logic systems and neural networks. One advantage of this combined approach is that it is usually no more complicated than either approach taken individually.

As noted above, there are two methods of creating a neuro-fuzzy system: integrating fuzzy logic into a neural network framework (FNN), and implementing neural networks into a fuzzy logic system (NFS). A fuzzy neural network is just a neural network with some fuzzy logic components; hence it is generally trained like a normal neural network.

Training Process: The training regimen for an NFS differs slightly from that used to create a neural network or a fuzzy logic system in some key ways, while at the same time incorporating many improvements over those training methods. The training process of a neuro-fuzzy system has five main steps (Von Altrock, 1995, pp. 71-75):

Obtain Training Data: The data must cover all possible inputs and outputs, and all the critical regions of the function, if it is to model it in an appropriate manner.
Create a Fuzzy Logic System: The fuzzy system may be an existing system which is known to work, such as one that has been in production for some time, or one that has been created by following expert system development methodologies.

Define the Neural Fuzzy Learning: This phase deals with defining what you want the system to learn. This allows greater control over the learning process while still allowing for rule knowledge discovery.

Training Phase: Run the training algorithm. The algorithm may have parameters that can be adjusted to modify how the system is to be modified during training.

Optimization and Verification: Validation can take many forms, but will usually involve feeding the system a series of known inputs to determine whether the system generates the desired output, or is within acceptable parameters. Furthermore, the rules and membership functions may be extracted so they can be examined by human experts for correctness.

CONCLUSION AND FUTURE DEVELOPMENTS

Advantages of ANF systems: Although there are many ways to implement a neuro-fuzzy system, the advantages described for these systems are remarkably uniform across the literature. The advantages attributed to neuro-fuzzy systems as compared to ANNs are usually related to the following aspects:

Faster to train: This is due to the massive number of connections present in the ANN, and the non-trivial number of calculations associated with each. As well, most neural fuzzy systems can be trained by going through the data once, whereas a neural network may need to be exposed to the same training data many times before it converges.

Less computational resources: A neural fuzzy system is smaller in size and contains fewer internal connections than a comparable ANN; hence it is faster and uses significantly fewer resources.

Offer the possibility to extract the rules: This is a major advantage over ANNs, in that the rules governing a system can be communicated to the human users in an easily understandable form.

Limitation of ANF systems: The greatest limitation in creating adaptive systems is known as the Curse of Dimensionality, which is named after the exponential growth in the number of features that the model has to keep track of as the number of input attributes increases. Each attribute in the model is a variable in the system, which corresponds to an axis in a multidimensional graph that the function is mapped into. The connections between different attributes correspond to the number of potential rules in the system, as given by the formula:

N_rules = (L_linguistic_terms)^variables (Gorrostieta et al., 2006)

This formula becomes more complicated if there are different numbers of linguistic variables (fuzzy sets) covering each attribute dimension. Fortunately, there are ways around this problem. As the neural fuzzy system is only approximating the function being modeled, the system may not need all the attributes to achieve the desired results.

Another area of criticism in the neuro-fuzzy field is related to aspects that cannot be learned or approximated. One of the best-known aspects here is the caveat attached to universal approximation. In fact, the function being approximated has to be continuous; a continuous function is a function that does not have a singularity, a point where it goes to infinity. Other functions that adaptive neuro-fuzzy systems may have problems learning are things like encryption algorithms, which are purposely designed to be resistant to this type of analysis.

Future developments: Predicting the future has always been hard; however, for ANF technology future expansion has been made easy by the widespread use of its basis technologies (neural networks and fuzzy logic). Mixing these technologies creates synergies, as they remediate each other's weaknesses. ANF technology allows complex systems to be grown instead of someone having to build them.

One of the most promising areas for ANF systems is system mining. There exist many cases where we wish to automate a system that cannot be systematically described in a mathematical manner. This means there is no way of creating a system using classical development methodologies (i.e., programming a simulation). If we have an adequately large set of examples of inputs and their corresponding outputs, ANF can be used to get a model of the system. The rules and their associated fuzzy sets can then be extracted from this system and examined for details about how the system works.
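The rule-count formula quoted under the limitations above is a one-line computation, which makes the exponential growth easy to check. The specific term and variable counts below are illustrative:

```python
# N_rules = (linguistic terms) ** (input variables): the exponential growth
# behind the Curse of Dimensionality discussed above.
def n_rules(linguistic_terms: int, variables: int) -> int:
    return linguistic_terms ** variables

assert n_rules(3, 2) == 9        # 3 terms over 2 inputs -> 9 candidate rules
assert n_rules(5, 4) == 625      # modest growth in either factor compounds fast
print(n_rules(7, 10))            # 7 terms over 10 inputs -> 282475249 rules
```

This is why pruning inputs, as suggested above, matters: dropping even one attribute divides the candidate rule space by the number of linguistic terms.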
fuzzy sets can then be extracted from this system and Haykin, S. (1999). Neural Networks: A Comprehensive
examined for details about how the system works. This Foundation. Prentice Hall Publishers, 2nd edition
knowledge can be used to build the system directly.
Jang, J. S. R., Sun C. T. & Mizutani E. (1997). Neuro-
One interesting application of this technology is to audit
Fuzzy and Soft Computing: A Computational Approach
existing complex systems. The extracted rules could
to Learning and Machine Intelligence. Prentice Hall
be used to determine if the rules match the exceptions
Publishers, US Ed edition.
of what the system is supposed to do, and even detect
fraud actions. Alternatively, the extracted model may show an alternative, and/or more efficient, manner of implementing the system.
Adaptive Neuro-Fuzzy Systems
KEY TERMS

Artificial Neural Networks (ANN): An artificial neural network, often just called a neural network (NN), is an interconnected group of artificial neurons that uses a mathematical model or computational model for information processing based on a connectionist approach to computation. Knowledge is acquired by the network from its environment through a learning process, and interneuron connection strengths (synaptic weights) are used to store the acquired knowledge.

Evolving Fuzzy Neural Network (EFuNN): An Evolving Fuzzy Neural Network is a dynamic architecture in which the rule nodes grow when needed and shrink by aggregation. New rule units and connections can be added easily without disrupting existing nodes.
The learning scheme is often based on the concept of the winning rule node.

Fuzzy Logic: Fuzzy logic is an application area of fuzzy set theory dealing with uncertainty in reasoning. It utilizes concepts, principles, and methods developed within fuzzy set theory for formulating various forms of sound approximate reasoning. Fuzzy logic allows set membership values to range (inclusively) between 0 and 1 and, in its linguistic form, admits imprecise concepts like "slightly", "quite" and "very". Specifically, it allows partial membership in a set.

Fuzzy Neural Networks (FNN): Neural networks that are enhanced with fuzzy logic capabilities, such as using fuzzy data, fuzzy rules, fuzzy sets and fuzzy values.

Neuro-Fuzzy Systems (NFS): A neuro-fuzzy system is a fuzzy system that uses a learning algorithm derived from or inspired by neural network theory to determine its parameters (fuzzy sets and fuzzy rules) by processing data samples.

Self-Organizing Map (SOM): The self-organizing map is a subtype of artificial neural network. It is trained using unsupervised learning to produce a low-dimensional representation of the training samples while preserving the topological properties of the input space. The self-organizing map is a single-layer feed-forward network whose output neurons are arranged in a low-dimensional (usually 2D or 3D) grid. Each input is connected to all output neurons, and attached to every neuron there is a weight vector with the same dimensionality as the input vectors. The number of input dimensions is usually much higher than the output grid dimension, so SOMs are mainly used for dimensionality reduction rather than expansion.

Soft Computing: Soft Computing refers to a partnership of computational techniques in computer science, artificial intelligence, machine learning and some engineering disciplines, which attempt to study, model, and analyze complex phenomena. The principal partners in this partnership are fuzzy logic, neuro-computing, probabilistic reasoning, and genetic algorithms. Thus the principle of soft computing is to exploit the tolerance for imprecision, uncertainty, and partial truth to achieve tractability, robustness, low-cost solutions, and a better rapport with reality.
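The SOM training procedure described above can be sketched in a few lines. This is a purely illustrative toy (deterministic initial weights instead of random ones, and no neighbourhood update, which a full SOM would include):

```python
# Minimal SOM sketch: a grid of output neurons, each holding a weight
# vector of the same dimensionality as the inputs; training moves the
# best-matching unit (BMU) toward each sample. A full SOM would also
# update the BMU's grid neighbours with a shrinking radius.

def train_som(samples, weights, epochs=20, lr=0.5):
    weights = {node: list(vec) for node, vec in weights.items()}
    for _ in range(epochs):
        for x in samples:
            # BMU: the output neuron whose weight vector is closest to x.
            bmu = min(weights,
                      key=lambda n: sum((w - v) ** 2
                                        for w, v in zip(weights[n], x)))
            # Move the BMU's weight vector toward the sample.
            weights[bmu] = [w + lr * (v - w) for w, v in zip(weights[bmu], x)]
    return weights

# Four output neurons on a 2x2 grid, 3-dimensional inputs (made-up values).
init = {(0, 0): [0.2, 0.2, 0.2], (0, 1): [0.8, 0.8, 0.8],
        (1, 0): [0.5, 0.1, 0.9], (1, 1): [0.4, 0.6, 0.5]}
w = train_som([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)], init)
```

After training, the two winning neurons' weight vectors have converged to the two input samples, illustrating how the 3-dimensional inputs are summarized on the 2D grid.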
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Adaptive Technology and Its Applications
from the rule set defining some subjacent non-adaptive device.

Adaptive actions enable adaptive devices to dynamically change their behavior without external help, by modifying their own set of defining rules whenever their subjacent rule is executed.

For practical reasons, up to two adaptive actions are allowed: one to be performed prior to the execution of its underlying rule, and the other after it.

An adaptive device behaves just as if it were piecewise non-adaptive: starting with the configuration of its initial underlying device, it iterates the following two steps, until reaching some well-defined final configuration:

- While no adaptive action is executed, run the underlying device;
- Modify the set of rules defining the device by executing an adaptive action.

Rule-Driven Devices

A rule-driven device is any formal abstraction whose behavior is described by a rule set that maps each possible configuration of the device into a corresponding next one.

A device is deterministic when, for any configuration and any input, a single next configuration is possible. Otherwise, it is said to be non-deterministic.

Non-deterministic devices allow multiple valid possibilities for each move, and require backtracking, so deterministic equivalents are usually preferable in practice.

Assume that:

- D is some rule-driven device, defined as D = (C, R, S, c0, A).
- C is its set of possible configurations.
- R ⊆ C × (S ∪ {ε}) × C is the set of rules describing its behavior, where ε denotes the empty stimulus, representing no event at all.
- S is its set of valid input stimuli.
- c0 ∈ C is its initial configuration.
- A ⊆ C is its set of final configurations.

Let ci (r)⇒ ci+1 (for short, ci ⇒ ci+1) denote the application of some rule r = (ci, s, ci+1) ∈ R to the current configuration ci in response to some input stimulus s ∈ S ∪ {ε}, yielding its next configuration ci+1.

Successive applications of rules in response to a stream w ∈ S* of input stimuli, starting from the initial configuration c0 and leading to some final configuration c ∈ A, are denoted c0 ⇒w* c (the star postfix operator in the formulae denotes the Kleene closure: its preceding element may be re-instantiated or reapplied an arbitrary number of times).

We say that D defines a sentence w if, and only if, c0 ⇒w* c holds for some c ∈ A. The collection L(D) of all such sentences is called the language defined by D:

L(D) = { w ∈ S* | c0 ⇒w* c, c ∈ A }.

Adaptive (Rule-Driven) Devices

An adaptive rule-driven device AD = (ND0, AM) associates an initial subjacent rule-driven device ND0 = (C, NR0, S, c0, A) to some adaptive mechanism AM that can dynamically change its behavior by modifying its defining rules.

That is accomplished by executing non-null adaptive actions chosen from a set AA of adaptive actions, which includes the null adaptive action a0.

A built-in counter t starts at 0 and is self-incremented upon any adaptive action's execution. Let Xj denote the value of X after j executions of adaptive actions by AD.

Adaptive actions in AA call functions that map AD's current set ARt of adaptive rules into ARt+1 by inserting and removing adaptive rules ar from AM.

Let AR be the set of all possible sets of adaptive rules for AD. Any ak ∈ AA maps the current set of rules ARt ∈ AR into ARt+1 ∈ AR:

ak : AR → AR

AM associates to each rule nrp ∈ NR of AD's underlying device ND a pair of adaptive actions (bap, aap) ∈ AA:

AM ⊆ AA × NR × AA
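The rule-driven device formalism above can be exercised with a small sketch (an illustrative toy, not from the article; the states and language are made up): a deterministic device stored as a mapping from (configuration, stimulus) pairs to next configurations, with acceptance exactly as in the definition of L(D).

```python
# Minimal deterministic rule-driven device D = (C, R, S, c0, A):
# R is stored as a mapping (configuration, stimulus) -> next configuration.

def defines(rules, c0, finals, stream):
    """True iff c0 =>w* c holds for some final configuration c in A."""
    c = c0
    for s in stream:
        if (c, s) not in rules:
            return False      # no applicable rule: the device halts and rejects
        c = rules[(c, s)]
    return c in finals

# Hypothetical device whose language L(D) is a*b:
# loop on 'a' at configuration 1, consume one 'b' to reach final configuration 2.
rules = {(1, 'a'): 1, (1, 'b'): 2}
assert defines(rules, 1, {2}, "aab")
assert not defines(rules, 1, {2}, "aba")
```

An adaptive device would additionally let the rules dictionary be rewritten by adaptive actions attached to the rules, as formalized above.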
Hence, level-i adaptive actions can modify both the set of level-i adaptive rules and the set of elementary adaptive actions defining level-(i-1) adaptive functions.

A SIMPLE ILLUSTRATIVE EXAMPLE

In the following example, graphical notation is used for clarity and conciseness. When drawing automata, circles represent states (as usual); double-line circles indicate final states; arrows indicate transitions; labels on the arrows indicate the tokens consumed by the transition and (optionally) an associated adaptive action. When representing adaptive functions, automata fragments in brackets stand for a group of transitions to be added (+) or removed (-) when the adaptive action is applied.

Figure 1 shows the starting shape of an adaptive automaton that accepts a^n b^(2n) c^(3n), n ≥ 0. At state 1, it includes a transition consuming a, which performs adaptive action A( ).

Figure 2 defines how A( ) operates:

- Using state 2 as reference, eliminate the empty transitions involving states x and y;
- Add a sequence starting at x, with two transitions consuming b;
- Append the sequence of two empty transitions sharing state 2;
- Append a sequence with three transitions consuming c, ending at y.

Figure 1. Initial configuration of the illustrative adaptive automaton [states 1, 2, 3; the transition at state 1 consumes a and calls A( )]

Figure 2. Definition of the adaptive function A( ) [A( ) locates the empty transitions entering and leaving state 2 (states x and y), removes them, and adds in their place a sequence consuming b b, two new empty transitions around state 2, and a sequence consuming c c c ending at y]

Figure 3 shows the first two shape changes of this automaton after consuming the first two symbols a (at state 1) in the sentence a^2 b^4 c^6. In its last shape, the automaton trivially consumes the remaining b^4 c^6, and does not change any more.

There are many other examples of adaptive devices in the references. This almost trivial and intuitive case was shown here for illustration purposes only.

Knowledge Representation

The preceding example illustrates how adaptive devices use the set of rules as their only element for representing and handling knowledge.

A rule (here, a transition) may handle parametric information in its components (here, the transition's origin and destination states, the token labeling the transition, the adaptive function it calls, etc.). Rules may be combined together in order to represent some non-elementary information (here, the sequences of transitions consuming tokens b and c keep track of the value of n in each particular sentence). This way, rules and their components may work, and may be interpreted, as low-level elements of knowledge.
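This example can be simulated directly. The sketch below is deliberately simplified: the spliced-in transitions are modelled only as the ordered list of tokens they consume, and the adaptive action A( ) splices two b-rules and three c-rules per a consumed, matching the shapes in Figure 3.

```python
# Sketch of the adaptive automaton for a^n b^(2n) c^(3n): the transitions
# spliced in by A() are kept as the list of tokens they consume, in order.

def accepts(sentence):
    rules = []                      # tokens consumed by the spliced transitions
    i = 0
    while i < len(sentence) and sentence[i] == 'a':
        # Adaptive action A(): insert two b-transitions before the
        # reference state and three c-transitions after it.
        nb = rules.count('b')
        rules = ['b', 'b'] + rules              # grow the b-sequence
        rules = rules[:nb + 2] + ['c', 'c', 'c'] + rules[nb + 2:]
        i += 1
    # With no 'a' left, the automaton no longer changes shape: the rest
    # of the input must exactly traverse the spliced transitions.
    return list(sentence[i:]) == rules

assert accepts("")                              # n = 0
assert accepts("abbccc")                        # n = 1
assert accepts("aabbbbcccccc")                  # n = 2: a^2 b^4 c^6
assert not accepts("aabbcc")
```

The growing rule list plays the role described under Knowledge Representation: its length encodes the value of n gathered while the a's are consumed.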
Figure 3. Configurations of the adaptive automaton after executing A( ) once and twice [after one execution, the path from state 1 to state 3 consumes b b then c c c; after two executions, b b b b then c c c c c c]

Although it is impossible to impose rules on how to represent and handle knowledge in systems represented with adaptive devices, the details of the learning process may be chosen according to the particular needs of each system being modeled.

In practice, the learning behavior of an adaptive device may be identified and measured by tracking the progress of the set of rules during its operation and interpreting the dynamics of its changes.

In the above example, when transitions are added to the automaton by executing adaptive action A( ), one may interpret the length of the sequence of transitions consuming b (or c) as a manifestation of the knowledge that is being gathered by the adaptive automaton on the value of n (its exact value becomes available after the sub-string of tokens a is consumed).

FUTURE TRENDS

Adaptive abstractions represent a significant theoretical advance in Computer Science, by introducing and exploring powerful non-classical concepts such as time-varying behavior, autonomously dynamic rule sets, multi-level hierarchy, and static and dynamic adaptive actions.

Those concepts allow establishing a modeling style proper for describing complex learning systems, for efficiently solving traditionally hard problems, for dealing with self-modifying learning methods, and for providing computer languages and environments for the comfortable elaboration of quality programs with dynamically-variant behavior.

All those features are vital for conceiving, modeling, designing and implementing applications in Artificial Intelligence, which benefits from adaptivity while expressing traditionally difficult-to-describe Artificial Intelligence facts.

Listed below are features Adaptive Technology offers to several fields of Computation, especially Artificial Intelligence-related ones, indicating their main impacts and applications.

- Adaptive Technology provides a true computation model, constructed around formal foundations. Most Artificial Intelligence techniques in use are very hard to express and follow, since the connection between elements of the models and the information they represent is often implicit, so their operational reasoning is difficult for a human to track and plan. Adaptive rule-driven devices concentrate all stored knowledge in their rules, and the whole logic that handles such information in their adaptive actions. Such properties open for Artificial Intelligence the possibility to observe, understand and control adaptive-device-modeled phenomena. By following and interpreting how and why changes occur in the device's set of rules, and by tracking the semantics of adaptive actions, one can infer the reasoning behind the model's reactions to its input.
- Adaptive devices have enough processing power to model complex computations.
In (Neto, 2000), some well-succeeded use cases are shown with simple and efficient adaptive devices used instead of complex traditional formulations.

- Adaptive Devices are Turing Machine-equivalent computation models that may be used in the construction of single-notation full specifications of programming languages, including lexical, syntactical and context-dependent static-semantic issues, language built-in features such as arithmetic operations, libraries, semantics, code generation and optimization, run-time code interpreting, etc.
- Adaptive devices are well suited for representing complex languages, including idioms. Natural languages particularly require several features to be expressed and handled, such as word inflections, orthography, multiple syntax forms, phrase ordering, ellipsis, permutation, ambiguities, anaphora and others. A few simple techniques allow adaptive devices to deal with such elements, strongly simplifying the effort of representing and processing them. Applications are wide, including machine translation, data mining, text-to-voice and voice-to-text conversion, etc.
- Computer art is another fascinating potential application of adaptive devices. Music and other artistic expressions are forms of human language. Given some language descriptions, computers can capture human skills and automatically generate interesting outputs. Well-succeeded experiments were carried out in the field of music, with excellent results (Basseto, 1999).
- Decision-taking systems may use Adaptive Decision Tables and Trees for constructing intelligent systems that accept training patterns, learn how to classify them, and can therefore classify unknown patterns. Well-succeeded experiments include: classifying geometric patterns, decoding sign languages, locating patterns in images, generating diagnoses from symptoms and medical data, etc.
- Language inference uses Adaptive Devices to generate formal descriptions of languages from samples, by identifying and collecting structural information and generalizing on the evidence of repetitive or recursive constructs (Matsuno, 2006).
- Adaptive Devices can be used for learning purposes by storing as rules the gathered information on some monitored phenomenon. In educational systems, the behavior of both students and trainers can be inferred and used to decide how to proceed.
- One can construct Adaptive Devices whose underlying abstraction is a computer language. Statements in such languages may be considered as rules defining the behavior of a program. By attaching adaptive rules to statements, the program becomes self-modifiable. Adaptive languages are needed for adaptive applications to be expressed naturally. For adaptivity to become a true programming style, techniques and methods must be developed to construct good adaptive software, since adaptive applications developed so far were usually produced in a strictly ad-hoc way.

CONCLUSION

Adaptive Technology concerns techniques, methods and subjects referring to the actual application of adaptivity. Adaptive automata (Neto, 1994) were first proposed for practical representation of context-sensitive languages (Rubinstein, 1995). Adaptive grammars (Iwai, 2000) were employed as their generative counterpart (Burshteyn, 1990), (Christiansen, 1990), (Cabasino, 1992), (Shutt, 1993), (Jackson, 2006).

For the specification and analysis of real-time reactive systems, works were developed based on adaptive versions of statecharts (Almeida Jr., 1995), (Santos, 1997). An interesting confirmation of the power and usability of adaptive devices for modeling complex systems (Neto, 2000) was the successful use of Adaptive Markov Chains in a computer music-generating device (Basseto, 1999).

Adaptive Decision Tables (Neto, 2001) and Adaptive Decision Trees (Pistori, 2006) are nowadays being experimented with in decision-taking applications.

Experiments have been reported that explore the potential of adaptive devices for constructing language inference systems (Neto, 1998), (Matsuno, 2006).

An important area in which adaptive devices show their strength is the specification and processing of natural languages (Neto, 2003). Many other results are being achieved in representing syntactical context-dependencies of natural language.
Simulation and modeling of intelligent systems are other concrete applications of adaptive formalisms, as illustrated in the description of the control mechanism of an intelligent autonomous vehicle which collects information from its environment and builds maps for navigation.

Many other applications for adaptive devices are possible in several fields.

REFERENCES

(* or ** - downloadable from the LTA Website; ** - in Portuguese only)

Almeida Jr., J.R. (1995)**. STAD - Uma ferramenta para representação e simulação de sistemas através de statecharts adaptativos. São Paulo, 202p. Doctoral Thesis. Escola Politécnica, Universidade de São Paulo.

Basseto, B.A. & Neto, J.J. (1999)*. A stochastic musical composer based on adaptive algorithms. Anais do XIX Congresso Nacional da Sociedade Brasileira de Computação, SBC-99, Vol. 3, pp. 105-113.

Burshteyn, B. (1990). Generation and recognition of formal languages by modifiable grammars. ACM SIGPLAN Notices, v.25, n.12, p.45-53.

Cabasino, S., Paolucci, P.S. & Todesco, G.M. (1992). Dynamic parsers and evolving grammars. ACM SIGPLAN Notices, v.27, n.11, p.39-48.

Christiansen, H. (1990). A survey of adaptable grammars. ACM SIGPLAN Notices, v.25, n.11, p.33-44.

Iwai, M.K. (2000)**. Um formalismo gramatical adaptativo para linguagens dependentes de contexto. São Paulo, 191p. Doctoral Thesis. Escola Politécnica, Universidade de São Paulo.

Jackson, Q.T. (2006). Adapting to Babel: Adaptivity and context-sensitivity in parsing, from a^n b^n c^n to RNA. A Thotic Technology Partners Research Monograph.

LTA Website: http://www.pcs.usp.br/~lta

Matsuno, I.P. (2006)**. Um Estudo do Processo de Inferência de Gramáticas Regulares e Livres de Contexto Baseados em Modelos Adaptativos. M.Sc. Dissertation, Escola Politécnica, Universidade de São Paulo.

Neto, J.J. (1994)*. Adaptive automata for context-dependent languages. ACM SIGPLAN Notices, v.29, n.9, p.115-124.

Neto, J.J. & Iwai, M.K. (1998)*. Adaptive automata for syntax learning. XXIV Conferencia Latinoamericana de Informática, CLEI 98, Quito, Ecuador, tomo 1, pp. 135-146.

Neto, J.J. (2000)*. Solving Complex Problems Efficiently with Adaptive Automata. CIAA 2000, Fifth International Conference on Implementation and Application of Automata, London, Ontario, Canada.

Neto, J.J. (2001)*. Adaptive Rule-Driven Devices: General Formulation and Case Study. In Watson, B.W. & Wood, D. (Eds.), Implementation and Application of Automata, 6th International Conference, CIAA 2001, Lecture Notes in Computer Science Vol. 2494, Pretoria, South Africa, July 23-25, Springer-Verlag, pp. 234-250.

Neto, J.J. & Moraes, M. de. (2003)*. Using Adaptive Formalisms to Describe Context-Dependencies in Natural Language. Computational Processing of the Portuguese Language, 6th International Workshop, PROPOR 2003, LNAI Volume 2721, Faro, Portugal, June 26-27, Springer-Verlag, pp. 94-97.

Pistori, H., Neto, J.J. & Pereira, M.C. (2006)*. Adaptive Non-Deterministic Decision Trees: General Formulation and Case Study. INFOCOMP Journal of Computer Science, Lavras, MG.

Rubinstein, R.S. & Shutt, J.N. (1995). Self-modifying finite automata: An introduction. Information Processing Letters, v.56, n.4, p.185-190.

Santos, J.M.N. (1997)**. Um formalismo adaptativo com mecanismos de sincronização para aplicações concorrentes. São Paulo, 98p. M.Sc. Dissertation. Escola Politécnica, Universidade de São Paulo.

Shutt, J.N. (1993). Recursive adaptable grammar. M.S. Thesis, Computer Science Department, Worcester Polytechnic Institute, Worcester, MA.

KEY TERMS

Adaptivity: Property exhibited by structures that dynamically and autonomously change their own behavior in response to input stimuli.

Adaptive Computation Model: Turing-powerful abstraction that mimics the behavior of potentially self-modifying complex systems.
Advanced Cellular Neural Networks Image Processing
virtually any spatial filter from DIP theory can be implemented on such a feed-forward CNN, ensuring binary output stability via the definition of a central feedback absolute value greater than 1.

ADVANCED CNN IMAGE PROCESSING

In this section, a description of more complex CNN models is given in order to provide a deeper insight into CNN design, including multi-layer structures and nonlinear templates, and also to illustrate its powerful DIP capabilities.

Nonlinear Templates

A problem often addressed in DIP edge detection is robustness against noise (Jain, 1989). In this sense, the EDGE CNN detector for grey-scale images, given by

A = 2,   B_EDGE = [ -1 -1 -1 ; -1 8 -1 ; -1 -1 -1 ],   I = -0.5        (4)

is a typical example of a weak-against-noise filter, as a result of a fixed linear feed-forward template combined with excitatory feedback. One way to provide the detector with more robustness against noise is via the definition of a nonlinear B template of the form:

B_CONTOUR = [ b b b ; b 0 b ; b b b ],   where b = 0.5 if |xij - xkl| > th, and b = -1 if |xij - xkl| ≤ th        (5)

This nonlinear template actually defines different coefficients for the surrounding pixels prior to performing the spatial filtering of the input image X. Thus, a CNN defined with nonlinear templates is generally dependent on X, and cannot be treated as an isotropic model. Just two values for the surrounding coefficients of B are allowed: one excitatory, for luminance differences with the central pixel greater than a threshold th (i.e. edge pixels), and the other inhibitory, doubled in absolute value, for similar pixels, where th is usually set around 0.5. The feedback template A = 2 remains unchanged, but the value for the bias I must be chosen from the following analysis.

For a given state element zij, the contribution wij of the feed-forward nonlinear filter of (5) may be expressed as:

wij = -1.0 ps + 0.5 pe = -(8 - pe) + 0.5 pe = -8 + 1.5 pe        (6)

where ps is the number of similar pixels in the 3×3 neighbourhood and pe the rest, i.e. the edge pixels. E.g. if the central pixel has 8 edge neighbours, wij = 12 - 8 = 4, whereas if all its neighbours are similar to it, then wij = -8. Thus, a pixel will be selected as an edge depending on the number of its edge neighbours, providing the possibility of noise reduction. For instance, edge detection for pixels with at least 3 edge neighbours forces I ∈ (4, 5).

The main result is that the inclusion of nonlinearities in the definition of the B coefficients and, by extension, the pixel-wise definition of the main CNN parameters, gives rise to more powerful and complex DIP filters (Chua & Roska, 1993).

Morphologic Operators

Mathematical Morphology is an important contributor to the DIP field. In the classic approach, every morphologic operator is based on a series of simple concepts from Set Theory. Moreover, all of them can be divided into combinations of two basic operators: erosion and dilation (Serra, 1982). Both operators take two pieces of data as input: the binary input image and the so-called structuring element, which is usually represented by a 3×3 template.

A pixel belongs to an object if it is active (i.e. its value is 1, or black), whereas the rest of the pixels are classified as background, zero-valued elements. Basic morphologic operators are defined using only object pixels, marked as 1 in the structuring element. If a pixel is not used in the match, it is left blank. Both dilation and erosion operators may be defined by the structuring elements
background.

More complex morphologic operators are based on structuring elements that also contain background pixels. This is the case of the Hit-and-Miss Transform (HMT), a generalized morphologic operator used to identify certain local pixel configurations. For instance, the structuring elements defined by

0 1 1        · 1 ·
0 1 1        0 1 0
0 0 0        0 0 0        (8)

(blank positions, shown here as ·, are unused) are used to find 90° convex corner object pixels within the image. A pixel will be selected as active in the output image if its local neighbourhood exactly matches that defined by the structuring element. However, in order to calculate a full, non-orientated corner detector it is necessary to perform 8 HMTs, one for each rotated version of (8), OR-ing the 8 intermediate output images to obtain the final image (Fisher et al., 2004).

In the CNN context, the HMT may be obtained in a straightforward manner by:

A = 2,   B_HMT: bij = 1 if sij = 1, 0 otherwise,   I = 0.5 - ps        (9)

where S = {sij} is the structuring element and ps is the total number of active pixels in it.

Since the input template B of the HMT CNN is defined via the structuring element S, and given that there are 2^9 = 512 distinct 3×3 possible structuring elements, there will also be 512 different hit-and-miss erosions. For achieving the opposite result, i.e. hit-and-miss dilation, the threshold must be the opposite of that in (9) (Chua & Roska, 2002).

where X and Y are the input and output images, respectively, and T is a spatial operator defined over a neighbourhood Sr around each pixel X(i, j), as defined in (1). Based on this neighbourhood, spatial operators can be grouped into two types: Single Point Processing Operators, also known as Mapping Operators, and Local Processing Operators, which can be defined by a spatial filter (i.e. 2D discrete convolution) mask (Jain, 1989).

The simplest form of T is obtained when Sr is 1 pixel in size. In this case, Y only depends on the intensity value of X for every pixel, and T becomes an intensity-level transformation function, or mapping, of the form

s = T(r)        (11)

where r and s are variables that represent grey level in X and Y, respectively.

According to this formulation, mappings can be achieved by direct application of a function over a range of input intensity levels. By properly choosing the form of T, a number of effects can be obtained, such as grey-level inversion, dynamic range compression or expansion (i.e. contrast enhancement), and threshold binarization for obtaining binary masks used in analysis and morphologic DIP.

A mapping is linear if its function T is also linear. Otherwise, T is not linear and the mapping is also nonlinear. An example of a nonlinear mapping is the CNN output function (3). It consists of three linear segments: two saturated levels, -1 and +1, and the central linear segment with unitary slope that connects them. This function is said to be piecewise linear and is closely related to the well-known sigmoid function utilized in the Hopfield ANN (Chua & Roska, 1993).
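The pixel-wise decision implied by (9) can be checked numerically. The sketch below is illustrative only: the structuring element is a made-up corner pattern, images are coded as +1 (object) and -1 (background), and only the sign of the net feed-forward input is evaluated, not the CNN dynamics themselves.

```python
# Hit-and-miss decision of (9) for one pixel: B copies the structuring
# element's active positions (bij = 1 where sij = 1, else 0), and the
# bias I = 0.5 - ps makes the net input positive only on an exact match
# of all ps active positions.

def hmt_pixel(neigh, struct):
    """neigh: 3x3 pixel values in {+1, -1}; struct: 3x3 entries in {1, 0}."""
    ps = sum(s for row in struct for s in row)        # active pixels in S
    w = sum(n for nrow, srow in zip(neigh, struct)
              for n, s in zip(nrow, srow) if s == 1)  # feed-forward contribution
    return w + (0.5 - ps) > 0                         # sign of the state, as in (9)

corner = [[0, 1, 0],
          [0, 1, 1],
          [0, 0, 0]]                                  # hypothetical structuring element
match    = [[-1,  1, -1], [-1, 1,  1], [-1, -1, -1]]
mismatch = [[-1,  1, -1], [-1, 1, -1], [-1, -1, -1]]
assert hmt_pixel(match, corner)
assert not hmt_pixel(mismatch, corner)
```

A single flipped pixel among the active positions lowers the feed-forward sum by 2, which the bias margin of 0.5 cannot absorb, so only exact matches fire.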
Advanced Cellular Neural Networks Image Processing
The bias I controls the average point of the input range, where the output function gives a zero-valued outcome.

Starting from the original CNN cell or neuron (1)-(3), a brief review of the Dynamic Range Control (DRC) CNN model first defined in (Fernández et al., 2006) follows. This network is designed to perform a piecewise linear mapping T over X, with input range [m−d, m+d] and output range [a, b]. Thus,

T(X(i,j)) = a                                          for −∞ < X(i,j) ≤ m − d
T(X(i,j)) = ((b − a)/(2d))·(X(i,j) − m) + (b + a)/2    for m − d < X(i,j) ≤ m + d
T(X(i,j)) = b                                          for m + d < X(i,j) < +∞

In order to be able to implement this function in a multi-layer CNN, the following constraints must be met:

b − a ≤ 2 and d ≤ 1     (13)

A CNN cell which controls the desired input range can be defined with the following parameters:

A1 = 0,  B1 = 1/d,  I1 = −m/d     (14)

This network performs a linear mapping between [m−d, m+d] and [−1, +1]. Its output is the input of a second CNN whose parameters are:

A2 = 0,  B2 = (b − a)/2,  I2 = (b + a)/2     (15)

For instance, the normalization of an input range [f, g] into [0, 1] is obtained with:

a = 0,  b = 1,  m = (g + f)/2,  d = (g − f)/2     (16)

The DRC network can be easily applied to a first-order piecewise polynomial approximation of nonlinear, continuous mappings. One of the valid possibilities is the multi-layer DRC CNN implementation of error-controlled Chebyshev polynomials, as described in (Fernández et al., 2006). The possible mappings include, among many others, the absolute value, logarithmic, exponential, radial basis and integer and real-valued power functions.

FUTURE TRENDS

There is a continuous quest by engineers and specialists: compete with and imitate nature, especially some smart animals. Vision is one particular area which computer engineers are interested in. In this context, the so-called Bionic Eye (Werblin et al., 1995) embedded in the CNN-UM architecture is ideal for implementing many spatio-temporal neuromorphic models.

With its powerful image processing toolbox and a compact VLSI implementation (Rodríguez et al., 2004), the CNN-UM can be used to program or mimic different models of retinas and even combinations of them. Moreover, it can combine biologically based models, biologically inspired models, and analogic artificial image processing algorithms. This combination will surely bring a broader kind of applications and developments.
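The two-layer DRC mapping of Eqs. (14)-(16) can be sketched numerically. In this minimal sketch `np.clip` stands in for the CNN piecewise-linear output function (3), and the function name is illustrative.

```python
import numpy as np

def drc_map(x, a, b, m, d):
    """Two-layer DRC mapping: cell (14) maps [m-d, m+d] onto
    [-1, +1]; cell (15) maps [-1, +1] onto [a, b]. np.clip stands
    in for the CNN piecewise-linear output function."""
    f = lambda v: np.clip(v, -1.0, 1.0)
    y1 = f(x / d - m / d)                     # B1 = 1/d, I1 = -m/d
    return f((b - a) / 2 * y1 + (b + a) / 2)  # B2 = (b-a)/2, I2 = (b+a)/2
```

With the parameters of (16), a = 0, b = 1, m = (g + f)/2, d = (g − f)/2, an input range [f, g] is normalized onto [0, 1].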
CONCLUSION

CNN is now a fundamental and powerful toolkit for real-time nonlinear image processing tasks, mainly due to its versatile programmability, which has powered its hardware development for visual sensing applications (Roska et al., 1999).

REFERENCES

Chua, L.O., & Roska, T. (2002). Cellular Neural Networks and Visual Computing: Foundations and Applications. Cambridge, UK: Cambridge University Press.

Chua, L.O., & Roska, T. (1993). The CNN Paradigm. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 40, 147-156.

Chua, L.O., & Yang, L. (1988). Cellular Neural Networks: Theory and Applications. IEEE Transactions on Circuits and Systems, 35, 1257-1290.

Fernández, J.A., Preciado, V.M., & Jaramillo, M.A. (2006). Nonlinear Mappings with Cellular Neural Networks. Lecture Notes in Computer Science, 4177, 350-359.

Fisher, R., Perkins, S., Walker, A., & Wolfart, E. (2004). Hypermedia Image Processing Reference (HIPR2). Website: http://homepages.inf.ed.ac.uk/rbf/HIPR2, University of Edinburgh, UK.

Jain, A.K. (1989). Fundamentals of Digital Image Processing. Englewood Cliffs, NJ, USA: Prentice-Hall.

Kék, L., & Zarándy, Á. (1998). Implementation of Large Neighborhood Non-Linear Templates on the CNN Universal Machine. International Journal of Circuit Theory and Applications, 26, 551-566.

Perko, M., Fajfar, I., Tuma, T., & Puhan, J. (1998). Fast Fourier Transform Computation Using a Digital CNN Simulator. 5th IEEE International Workshop on Cellular Neural Networks and Their Applications Proceedings, 230-236.

Perko, M., Fajfar, I., Tuma, T., & Puhan, J. (2000). Low-Cost, High-Performance CNN Simulator Implemented in FPGA. 6th IEEE International Workshop on Cellular Neural Networks and Their Applications Proceedings, 277-282.

Rodríguez, A., Liñán, G., Carranza, L., Roca, E., Carmona, R., Jiménez, F., Domínguez, R., & Espejo, S. (2004). ACE16k: The Third Generation of Mixed-Signal SIMD-CNN ACE Chips Toward VSoCs. IEEE Transactions on Circuits and Systems I: Regular Papers, 51, 851-863.

Roska, T., & Chua, L.O. (1993). The CNN Universal Machine: An Analogic Array Computer. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 40, 163-173.

Roska, T., Zarándy, Á., Zöld, S., Földesy, P., & Szolgay, P. (1999). The Computational Infrastructure of Analogic CNN Computing - Part I: The CNN-UM Chip Prototyping System. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 46, 261-268.

Serra, J. (1982). Image Analysis and Mathematical Morphology. London, UK: Academic Press.

Venetianer, P.L., Werblin, F., Roska, T., & Chua, L.O. (1995). Analogic CNN Algorithms for Some Image Compression and Restoration Tasks. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 42, 278-284.

Werblin, F., Roska, T., & Chua, L.O. (1995). The Analogic Cellular Neural Network as a Bionic Eye. International Journal of Circuit Theory and Applications, 23, 541-569.

KEY TERMS

Bionics: The application of methods and systems found in nature to the study and design of engineering systems. The word seems to have been formed from "biology" and "electronics" and was first used by J. E. Steele in 1958.

Chebyshev Polynomial: An important type of polynomial used in data interpolation, providing the best approximation of a continuous function under the maximum norm.

Dynamic Range: A term used to describe the ratio between the smallest and largest possible values of a variable quantity.

FPGA: Acronym that stands for Field-Programmable Gate Array, a semiconductor device invented
Xiaoyu Huang
University of Shanghai for Science & Technology, China
Kallol Bagchi
University of Texas at El Paso, USA
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Agent-Based Intelligent System Modeling
used to assist the user in performing non-repetitive tasks, such as seeking information, shopping, scheduling, monitoring, control, negotiation, and bargaining. Intelligent agents may come in various shapes and forms such as knowbots, softbots, taskbots, personal agents, shopbots, information agents, etc. No matter what shape or form they have, intelligent agents exhibit one or more of the following characteristics:

Autonomous: Being able to exercise control over their own actions.
Adaptive/Learning: Being able to learn and adapt to their external environment.
Social: Being able to communicate, bargain, collaborate, and compete with other agents on behalf of their masters (users).
Mobile: Being able to migrate themselves from one machine/system to another in a network, such as the Web.
Goal-oriented: Being able to act in accordance with built-in goals and objectives.
Communicative: Being able to communicate with people or other agents through protocols such as agent communication language (ACL).
Intelligent: Being able to exhibit intelligent behavior such as reasoning, generalizing, learning, dealing with uncertainty, using heuristics, and natural language processing.

AGENT-BASED MODELING

Using intelligent agents and their actions and interactions in a given environment to simulate the complex dynamics of a system is referred to as agent-based modeling. ABM research is closely related to the research in complex systems, emergence, computational sociology, multi-agent systems, evolutionary programming, and intelligent organizations. In ABM, system behavior results from individual behaviors and collective behaviors of the agents. Researchers of ABM are interested in how macro phenomena emerge from micro level behaviors among a heterogeneous set of interacting agents (Holland, 1992). Every agent has its attributes and its behavior rules. When agents encounter each other in the agent society, each agent individually assesses the situation and makes decisions on the basis of its behavior rules. In general, individual agents do not have global awareness in the multi-agent system.

Agent-based modeling allows a researcher to set different parameters and behavior rules of individual agents. The modeler makes assumptions that are most relevant to the situation at hand, and then watches phenomena emerge from the interactions of the agents. Various hypotheses can be tested by changing agent parameters and rules. The emergent collective pattern of the agent society often leads to results that may not have been predicted.

One of the main advantages of ABM over traditional mathematical equation based modeling is the ability to model individual styles and attributes, rather than assuming homogeneity of the whole population. Traditional models based on analytical techniques often become intractable as the systems reach real-world levels of complexity. ABM is particularly suitable for studying system dynamics that are generated from interactions of heterogeneous individuals. In recent years, ABM has been used in studying many real world systems, such as stock markets (Castiglione 2000), group selection (Pepper 2000), and workflow and information diffusion (Neri 2004). Bonabeau (2002) presents a good summary of ABM methodology and the scenarios where ABM is appropriate.

ABM is, however, not immune from criticism. Per Bonabeau (2002), an agent-based model will only be as accurate as the assumptions and data that went into it, but even approximate simulations can be very valuable. It has also been observed that ABM relies on simplified models of rule-based human behavior that often fail to take into consideration the complexity of human cognition. Besides, it suffers from the unwrapping problem, as the solution is built into the program and thus prevents the occurrence of new or unexpected events (Macy, 2002).

ABM FOR INTELLIGENT SYSTEMS

An intelligent system is a system that can sense and respond to its environment in pursuing its goals and objectives. It can learn and adapt based on past experience. Examples of intelligent systems include, but are not limited to, the following: biological life such as human beings, artificial intelligence applications, robots, organizations, nations, projects, and social movements.
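The agent characteristics listed above can be made concrete with a minimal class sketch. All class and method names here are illustrative, not from the article; mobility and the other traits are reduced to their simplest possible form.

```python
import itertools

class Agent:
    """Toy intelligent agent illustrating several of the listed traits."""
    _ids = itertools.count()

    def __init__(self, goal):
        self.id = next(Agent._ids)
        self.goal = goal            # goal-oriented: built-in objective
        self.knowledge = {}         # supports adaptation/learning

    def perceive(self, observations):
        # adaptive/learning: fold observations into the agent's knowledge
        self.knowledge.update(observations)

    def act(self):
        # autonomous: the agent decides its own action toward its goal
        return f"pursue:{self.goal}"

    def communicate(self, other, message):
        # social/communicative: send a message to another agent
        other.receive(self, message)

    def receive(self, sender, message):
        self.knowledge[f"msg_from_{sender.id}"] = message
```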
Walter Fritz (1997) suggests that the key components of an intelligent system include objectives, senses, concepts, growth of a concept, present situation, response rules, mental methods, selection, actions, reinforcement, memory and forgetting, sleeping, and patterns (high level concepts). It is apparent that traditional analytical modeling techniques are not able to model many of the components of intelligent systems, let alone the complete system dynamics. However, ABM lends itself well to such a task. All those components can be modeled as agents (albeit some in an abstract sense). An intelligent system is thus made of inter-related and interactive agents. ABM is especially suitable for intelligent systems consisting of a large number of heterogeneous participants, such as a human organization.

Agent-based modeling for intelligent systems starts with a thorough analysis of the intelligent system. Since the system under consideration may exhibit complex behaviors, we need to identify one or a few key features to focus on. Given a scenario of the target intelligent system, we first establish a set of objectives that we aim to achieve via the simulation of the agent-based representation of the intelligent system. The objectives of the research can be expressed as a set of questions to which we seek answers (Doran, 2006).

A conceptual model is created to lay out the requirements for achieving the objectives. This includes defining the entities, such as agents, environment, resources, processes, and relationships. The conceptual modeling phase answers the question of "what": what is needed. The design model determines the "how": agents and environments may be designed or empirically grounded. In practice, a study may start with simple models, often with designed agents and environments, to explore certain specific dynamics of the system.

The design model is refined through the calibration process, in which design parameters are modified to improve the desired characteristics of the model. The final step in the modeling process is validation, where we check the agent individual behavior, interactions, and emergent properties of the system against expected design features. Validation usually involves comparison of model outcomes, often at the macro-level, with comparable outcomes in the real world (Midgley, et al., 2007). Figure 1 shows the complete modeling process. A general tutorial on ABM is given by Macal and North (2005).

Figure 1. Agent-based modeling process

We present an example of using agent-based intelligent system modeling for studying the acceptance and diffusion of innovative ideas or technology. Diffusion of innovation has been studied extensively over the last few decades (Rogers, 1995). However, traditional research in innovation diffusion has been grounded on case based analysis and analytical systems modeling
(e.g., using differential and difference equations). Agent-based modeling for diffusion of innovation is relatively new. Our example is adopted from a model created by Michael Samuels (2007), implemented with a popular agent modeling system, NetLogo.

The objective of innovation diffusion modeling is to answer questions such as how an idea or technology is adopted in a population, how different people (e.g., innovators, early adopters, and change agents) influence each other, and under what condition an innovation will be accepted or rejected by the population. In the conceptual modeling, we identify various factors that influence an individual's propensity for adopting the innovation. Those factors are broadly divided into two categories: internal influences (e.g., word-of-mouth) and external influences (e.g., mass media). Any factor that exerts its influence through individual contact is considered internal influence.

Individuals in the target population are divided into four groups: adopter, potential (adopter), change agent, and disrupter. Adopters are those who have adopted the innovation, while potentials are those who have a certain likelihood to adopt the innovation. Change agents are the champions of the innovation. They are very knowledgeable and enthusiastic about the innovation, and often play a critical role in facilitating its diffusion. Disrupters are those who play an opposite role of change agents. They are against the current innovation, oftentimes because they favor an even newer and perceived better innovation. The four groups of agents and their relationships are depicted in Figure 2. It is common, although not necessary, to assume that those four groups make up the entire population.

Figure 2. Agents and influences (solid lines: internal influence; dotted lines: external influence)

In a traditional diffusion model, such as the Bass model (Bass, 1996), the diffusion rate depends only on the number of adopters (and potential adopters, given fixed population size). Characteristics of individuals in the population are ignored. Even in those models where it is assumed that potential adopters have varying thresholds for adopting an innovation (Abrahamson and Rosenkopf, 1997), the individuality is very limited. However, in agent-based modeling, the types of individuals and individual characteristics are essentially unbounded. For example, we can easily divide adopters into innovators, early adopters, and late adopters, etc. If necessary, various demographic and social-economic features can be bestowed to individual agents. Furthermore, both internal influence and external influence can be further attributed to more specific causes. For example, internal influence through social networks can be divided into traditional social networks that consist of friends and acquaintances and virtual social networks formed online. Table 1 lists typical factors that affect the propensity of adopting an innovation.

Table 1. Typical internal and external influences
Internal influence: word-of-mouth, telephone, email, instant message, chat, blog, social networks (online/offline)
External influence: newspapers, television, laws, policies and regulations, environment, culture, Internet/Web, online communities, RSS

An initial study of innovation diffusion, such as the one in Michael Samuels (2007), can simply aggregate all internal influences into word-of-mouth and all external influences into mass media. Each potential adopter's tendency of converting to an adopter is influenced by chance encounters with other agents. If a potential adopter meets a change agent, who is an avid promoter of the innovation, he would become more knowledgeable about the advantages of the innovation, and more likely to adopt. An encounter with a disrupter
creates the opposite effect, as a disrupter favors a different type of innovation.

In order for the simulated model to accurately reflect a real-world situation, the model structure and parameter values should be carefully selected. For example, we need to decide how much influence each encounter will result in; what is the probability of encountering a change agent or a disrupter; how much influence is coming from the mass media, etc. We can get these values through surveys, statistical analysis of empirical data, or experiments specifically designed to elicit data from real world situations.

TRENDS AND RESEARCH ISSUES

As illustrated through the example of modeling the diffusion of innovation in an organization, industry, or society, agent-based modeling can be used to model the adaptation of intelligent systems that consist of intelligent individuals. As most intelligent systems are complex in both structure and system dynamics, traditional modeling tools that require too many unrealistic assumptions have become less effective in modeling intelligent systems. In recent years, agent-based modeling has found a wide spectrum of applications such as in business strategic solutions, supply chain management, stock markets, power economy, social evolution, military operations, security, and ecology (North and Macal, 2007). As ABM tools and resources become more accessible, research and applications of agent-based intelligent system modeling are expected to increase in the near future.

Some challenges remain, though. Using ABM to model intelligent systems is a research area that draws theories from other fields, such as economics, psychology, sociology, etc., but without its own well established theoretic foundation. ABM has four key assumptions (Macy and Willer, 2002): agents act locally with little or no central authority; agents are interdependent; agents follow simple rules; and agents are adaptive. However, some of those assumptions may not be applicable to intelligent system modeling. Central authorities, or central authoritative information such as mass media in the innovation diffusion example, may play an important role in intelligent organizations. Not all agents are alike in an intelligent system. Some may be independent, non-adaptive, or following complex behavior rules.

ABM uses a bottom-up approach, creating emergent behaviors of an intelligent system through actors rather than factors. However, macro-level factors have direct impact on macro behaviors of the system. Macy and Willer (2002) suggest that bringing those macro-level factors back will make agent-based modeling more effective, especially in intelligent systems such as social organizations.

Recent intelligent systems research has developed the concept of integrating human and machine-based data, knowledge, and intelligence. Kirn (1996) postulates that the organization of the 21st century will involve artificial agent based systems highly intertwined with the human intelligence of the organization. Thus, a new challenge for agent-based intelligent system modeling is to develop models that account for interaction, aggregation, and coordination of intelligent agents and human agents. The ABM will represent not only the human players in an intelligent system, but also the intelligent agents that are developed in real-world applications in those systems.

CONCLUSION

Modeling intelligent systems involving multiple intelligent players has been difficult using traditional approaches. We have reviewed recent developments in agent-based modeling and suggest agent-based modeling is well suited for studying intelligent systems, especially those systems with sophisticated and heterogeneous participants. Agent-based modeling allows us to model system behaviors based on the actions and interactions of individuals in the system. Although most ABM research focuses on local rules and behaviors, it is possible to integrate global influences in the models. ABM represents a novel approach to model intelligent systems. Combined with traditional modeling approaches (for example, micro-level simulation as proposed in MoSeS), ABM offers researchers a promising tool to solve complex and practical problems and to broaden research endeavors (Wu, 2007).
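The encounter-driven diffusion dynamics described in the innovation example can be sketched as a toy simulation. This is a deliberately simplified, hypothetical model; all parameter values and names are illustrative and are not taken from Samuels (2007): each step, every remaining potential adopter receives external mass-media influence and a chance encounter that raises (change agent) or lowers (disrupter) its adoption propensity.

```python
import random

def diffuse(n_potentials=100, p_change_agent=0.10, p_disrupter=0.05,
            media=0.01, steps=200, seed=42):
    """Toy agent-based diffusion sketch; returns (adopters, remaining)."""
    rng = random.Random(seed)
    propensity = [0.0] * n_potentials   # per-agent adoption propensity
    adopters = 0
    for _ in range(steps):
        remaining = []
        for p in propensity:
            p += media                              # external influence
            r = rng.random()
            if r < p_change_agent:                  # met a change agent
                p += 0.05
            elif r < p_change_agent + p_disrupter:  # met a disrupter
                p -= 0.05
            if rng.random() < p:                    # conversion to adopter
                adopters += 1
            else:
                remaining.append(p)
        propensity = remaining
    return adopters, len(propensity)
```

Changing the encounter probabilities or the media weight corresponds to the hypothesis-testing use of ABM discussed above.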
Wu, B. (2007). A Hybrid Approach for Spatial MSM. NSF/ESRC Agenda Setting Workshop on Agent-Based Modeling of Complex Spatial Systems: April 14-16, 2007.

KEY TERMS

Agent Based Modeling: Using intelligent agents and their actions and interactions in a given environment to simulate the complex dynamics of a system.

Diffusion of Innovation: Popularized by Everett Rogers, it is the study of the process by which an innovation is communicated and adopted over time among the members of a social system.

Intelligent Agent: An autonomous software program that is able to learn and adapt to its environment in order to perform certain tasks delegated to it by its master.

Intelligent System: A system that has a coherent set of components and subsystems working together to engage in goal-driven activities.

Intelligent System Modeling: The process of construction, calibration, and validation of models of intelligent systems.

Multi-Agent System: A distributed system with a group of intelligent agents that communicate, bargain, compete, and cooperate with other agents and the environment to achieve goals designated by their masters.

Organizational Intelligence: The ability of an organization to perceive, interpret, and select the most appropriate response to the environment in order to advance its goals.
AI and Ideas by Statistical Mechanics
developed in a series of about 30 published papers from 1981-2001 (Ingber, 1983; Ingber, 1985; Ingber, 1992; Ingber, 1994; Ingber, 1995; Ingber, 1997). These papers also address long-standing issues of information measured by electroencephalography (EEG) as arising from bottom-up local interactions (of clusters of thousands to tens of thousands of neurons interacting via short-ranged fibers), or top-down influences of global interactions (mediated by long-ranged myelinated fibers). SMNI does this by including both local and global interactions as being necessary to develop neocortical circuitry.

Statistical Mechanics of Financial Markets (SMFM)

Tools of financial risk management, developed to process correlated multivariate systems with differing non-Gaussian distributions using modern copula analysis, enable bona fide correlations and uncertainties of success and failure to be calculated. Since 1984, the author has published about 20 papers developing a Statistical Mechanics of Financial Markets (SMFM), many available at http://www.ingber.com. These are relevant to ISM, to properly deal with real-world distributions that arise in such varied contexts.

Gaussian copulas are developed in a project Trading in Risk Dimensions (TRD) (Ingber, 2006). Other copula distributions are possible, e.g., Student-t distributions. These alternative distributions can be quite slow because inverse transformations typically are not as quick as for the present distribution. Copulas are cited as an important component of risk management not yet widely used by risk management practitioners (Blanco, 2005).

Sampling Tools

Computational approaches developed to process different approaches to modeling phenomena must not be confused with the models of these phenomena. For example, the meme approach lends itself well to a computational scheme in the spirit of genetic algorithms (GA). The cost/objective function that describes the phenomena of course could be processed by any other sampling technique such as simulated annealing (SA). One comparison (Ingber & Rosen, 1992) demonstrated the superiority of SA over GA on cost/objective functions used in a GA database. That study used Very Fast Simulated Annealing (VFSR), created by the author for military simulation studies (Ingber, 1989), which has evolved into Adaptive Simulated Annealing (ASA) (Ingber, 1993). However, it is the author's experience that the Art and Science of sampling complex systems requires tuning expertise of the researcher as well as good codes, and GA or SA likely would do as well on cost functions for this study.

If there are not analytic or relatively standard math functions for the transformations required, then these transformations must be performed explicitly numerically in code such as TRD. Then, the ASA_PARALLEL OPTIONS already existing in ASA (developed as part of the 1994 National Science Foundation Parallelizing ASA and PATHINT Project (PAPP)) would be very useful to speed up real time calculations (Ingber, 1993). Below, only a few topics relevant to ISM are discussed. More details are in a previous report (Ingber, 2006).

SMNI AND SMFM APPLIED TO ARTIFICIAL INTELLIGENCE

Neocortex has evolved to use minicolumns of neurons interacting via short-ranged interactions in macrocolumns, and interacting via long-ranged interactions across regions of macrocolumns. This common architecture processes patterns of information within and among different regions of sensory, motor, associative cortex, etc. Therefore, the premise of this approach is that this is a good model to describe and analyze evolution/propagation of ideas among defined populations.

Relevant to this study is that a spatial-temporal lattice-field short-time conditional multiplicative-noise (nonlinear in drifts and diffusions) multivariate Gaussian-Markovian probability distribution is developed faithful to neocortical function/physiology. Such probability distributions are a basic input into the approach used here. The SMNI model was the first physical application of a nonlinear multivariate calculus developed by other mathematical physicists in the late 1970s to define a statistical mechanics of multivariate nonlinear nonequilibrium systems (Graham, 1977; Langouche et al., 1982).
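The generic simulated annealing referred to in the sampling discussion above can be sketched as follows. This is a plain Metropolis-style SA sketch, not Ingber's VFSR/ASA itself; the function names, cooling schedule, and default parameters are all illustrative.

```python
import math
import random

def anneal(cost, x0, step=1.0, t0=1.0, n_iter=5000, seed=0):
    """Minimize a 1-D cost function by simulated annealing:
    Metropolis acceptance with a slowly decreasing temperature."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    for k in range(1, n_iter + 1):
        t = t0 / math.log(k + 1.0)              # illustrative cooling schedule
        y = x + step * rng.uniform(-1.0, 1.0)   # candidate move
        fy = cost(y)
        # accept downhill moves always; uphill moves with prob exp(-delta/T)
        if fy < fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
    return best, fbest
```

The same cost/objective function could equally be handed to a GA or another sampler, which is the point made in the text.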
SMNI builds from synaptic interactions to minicolumnar, macrocolumnar, and regional interactions in neocortex. Since 1981, a series of SMNI papers has developed models of columns and regions of neocortex, spanning mm to cm of tissue. Most of these papers have dealt explicitly with calculating properties of STM and scalp EEG in order to test the basic formulation of this approach (Ingber, 1983; Ingber, 1985; Ingber & Nunez, 1995).

The SMNI modeling of local mesocolumnar interactions (convergence and divergence between minicolumnar and macrocolumnar interactions) was tested on STM phenomena. The SMNI modeling of macrocolumnar interactions across regions was tested on EEG phenomena.

SMNI studies have detailed that maximal numbers of attractors lie within the physical firing space of both excitatory and inhibitory minicolumnar firings, consistent with experimentally observed capacities of auditory and visual STM, when a "centering" mechanism is enforced by shifting background noise in synaptic interactions, consistent with experimental observations under conditions of selective attention (Ingber, 1985; Ingber, 1994).

These calculations were further supported by high-resolution evolution of the short-time conditional-probability propagator using PATHINT (Ingber & Nunez, 1995). SMNI correctly calculated the stability and duration of STM, the primacy versus recency rule,
Figure 1. Illustrated are three biophysical scales of neocortical interactions: (a)-(a*)-(a') microscopic neurons; (b)-(b') mesocolumnar domains; (c)-(c') macroscopic regions (Ingber, 1983). SMNI has developed appropriate conditional probability distributions at each level, aggregating up from the smallest levels of interactions. In (a*) synaptic inter-neuronal interactions, averaged over by mesocolumns, are phenomenologically described by the mean and variance of a distribution. Similarly, in (a) intraneuronal transmissions are phenomenologically described by the mean and variance of a distribution. Mesocolumnar averaged excitatory (E) and inhibitory (I) neuronal firings M are represented in (a'). In (b) the vertical organization of minicolumns is sketched together with their horizontal stratification, yielding a physiological entity, the mesocolumn. In (b') the overlap of interacting mesocolumns at locations r and r′ from times t and t + τ is sketched. In (c) macroscopic regions of neocortex are depicted as arising from many mesocolumnar domains. (c') sketches how regions may be coupled by long-ranged interactions.
random access to memories within tenths of a second as observed, the observed 7±2 capacity rule of auditory memory, and the observed 4±2 capacity rule of visual memory.

SMNI also calculates how STM patterns (e.g., from a given region or even aggregated from multiple regions) may be encoded by dynamic modification of synaptic parameters (within experimentally observed ranges) into long-term memory patterns (LTM) (Ingber, 1983).

Figure 2. Scales of interactions among minicolumns are represented, within macrocolumns, across macrocolumns, and across regions of macrocolumns
defined populations (of reasonable but small population samples, e.g., of 100-1000), with macrocolumns defined by their local parameters within specific regions (larger samples of populations) and with parameterized long-ranged inter-regional and external connectivities. Parameters of a given subset of macrocolumns will be fit using ASA to patterns representing Ideas, akin to acquiring hard-wired long-term (LTM) patterns. Parameters of external and inter-regional interactions will be determined that promote or inhibit the spread of these Ideas, by determining the degree of fits and overlaps of probability distributions relative to the seeded macrocolumns.

That is, the same Ideas/patterns may be represented in other than the seeded macrocolumns by local confluence of macrocolumnar and long-ranged firings, akin to STM, or by different hard-wired parameter LTM sets that can support the same local firings in other regions (possible in nonlinear systems). SMNI also calculates how STM can be dynamically encoded into LTM (Ingber, 1983).

Small populations in regions will be sampled to determine if the propagated Idea(s) exists in its pattern space where it did exist prior to its interactions with the seeded population. SMNI derives nonlinear functions as arguments of probability distributions, leading to multiple STM, e.g., 7±2 for auditory memory capacity. Some investigation will be made into nonlinear functional forms other than those derived for SMNI, e.g., to have capacities of tens or hundreds of patterns for ISM.

Application of TRD Analysis

This approach includes application of methods of portfolio risk analysis to such statistical systems, correcting two kinds of errors committed in multivariate risk analyses: (E1) Although the distributions of variables being considered are not Gaussian (or not tested to see how close they are to Gaussian), standard statistical calculations appropriate only to Gaussian distributions are employed. (E2) Either correlations among the variables are ignored, or the mistakes committed in (E1), incorrectly assuming variables are Gaussian, are compounded by calculating correlations as if all variables were Gaussian.

It should be understood that any sampling algorithm processing a huge number of states can find many multiple optima. ASA's MULTI_MIN OPTIONS are used to save multiple optima during sampling. Some algorithms might label these states as "mutations" of optimal states. It is important to be able to include them in final decisions, e.g., to apply additional metrics of performance specific to applications. Experience with risk-managing portfolios shows that all criteria are not best considered by lumping them all into one cost function, but rather good judgment should be applied to multiple stages of pre-processing and post-processing when performing such sampling, e.g., adding additional metrics of performance.

FUTURE TRENDS

Given financial and political motivations to merge information discussed in the Introduction, it is inevitable that many AI algorithms will be developed, and many current AI algorithms will be enhanced, to address these issues.

CONCLUSION

It seems appropriate to base an approach for propagation of generic ideas on the only system so far demonstrated to develop and nurture ideas, i.e., the neocortical brain. A statistical mechanical model of neocortical interactions, developed by the author and tested successfully in describing short-term memory and EEG indicators, Ideas by Statistical Mechanics (ISM) (Ingber, 2006) is the proposed model. ISM develops subsets of macrocolumnar activity of multivariate stochastic descriptions of defined populations, with macrocolumns defined by their local parameters within specific regions and with parameterized endogenous inter-regional and exogenous external connectivities. Tools of financial risk management, developed to process correlated multivariate systems with differing non-Gaussian distributions using modern copula analysis, importance-sampled using ASA, will enable bona fide correlations and uncertainties of success and failure to be calculated.
multiple optima. ASA's MULTI_MIN OPTIONS are
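Errors (E1) and (E2) can be illustrated with a small sketch. The chapter's proposed TRD analysis uses modern copula analysis with ASA; the following toy Python example, with invented lognormal data and a made-up latent correlation of 0.8, only illustrates the underlying idea: rank-transforming heavy-tailed variables to Gaussian "normal scores" before correlating them recovers the latent dependence that a naive Pearson estimate on the raw, non-Gaussian marginals attenuates.

```python
import math
import random
from statistics import NormalDist

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def normal_scores(x):
    # rank-transform to (0, 1), then map through the inverse Gaussian CDF;
    # this captures the dependence structure while discarding the marginals
    n = len(x)
    rank = {v: r for r, v in enumerate(sorted(x), start=1)}
    nd = NormalDist()
    return [nd.inv_cdf(rank[v] / (n + 1)) for v in x]

random.seed(0)
z = [random.gauss(0, 1) for _ in range(4000)]
w = [random.gauss(0, 1) for _ in range(4000)]
# heavy-tailed (lognormal) marginals with latent Gaussian correlation 0.8
x = [math.exp(a) for a in z]
y = [math.exp(0.8 * a + 0.6 * b) for a, b in zip(z, w)]

raw = pearson(x, y)                                   # attenuated by the marginals
scored = pearson(normal_scores(x), normal_scores(y))  # near the latent 0.8
```

The `scored` estimate stays close to the latent correlation, while the raw Pearson value computed "as if all variables were Gaussian" is distorted by the marginals, which is exactly the (E1)/(E2) pitfall.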
AI and Ideas by Statistical Mechanics
Aedín C. Culhane
Harvard School of Public Health, USA
Alice J. Armstrong
The George Washington University, USA
John Quackenbush
Harvard School of Public Health, USA
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
AI Methods for Analyzing Microarray Data
with fewer variables while retaining the same predictive power (Guyon & Elisseeff, 2003). For this reason multivariate methods are more appropriate.
As an alternative, wrappers consider the learning algorithm as a black box and use prediction accuracy to evaluate feature subsets (Kohavi & John, 1997). Wrappers are more direct than filter methods but depend on the particular learning algorithm used. The computational complexity associated with wrappers is prohibitive due to the curse of dimensionality, so typically filters are used with forward selection (starting with an empty set and adding features one by one) instead of backward elimination (starting with all features and removing them one by one). Dimension reduction approaches are also used for multivariate feature selection.

Dimension Reduction Approaches

Principal component analysis (PCA) is widely used for dimension reduction in machine learning (Wall et al., 2003). The idea behind PCA is quite intuitive: correlated objects can be combined to reduce data dimensionality. Relationships between gene expression profiles in a data matrix can be expressed as a linear combination such that collinear variables are regressed onto a new set of coordinates. PCA, its underlying method Singular Value Decomposition (SVD), related approaches such as correspondence analysis (COA), and multidimensional scaling (MDS) have been applied to microarray data and are reviewed by Brazma & Culhane (2005). Studies have reported that COA or other dual scaling dimension reduction approaches such as spectral map analysis may be more appropriate than PCA for decomposition of microarray data (Wouters et al., 2003).
While PCA considers the variance of the whole dataset, clustering approaches examine the pairwise distance between instances or features. Therefore, these methods are complementary and are often both used in exploratory data analysis. However, difficulties in interpreting the results in terms of discrete genes limit the application of these methods.

Clustering

What we see as one disease is often a collection of disease subtypes. Class discovery aims to discover these subtypes by finding groups of instances with similar expression patterns. Hierarchical clustering is an agglomerative method which starts with singletons and groups similar data points using some distance measure, such that the two data points that are most similar are grouped together in a cluster by making them children of a parent node in the tree. This process is repeated in a bottom-up fashion until all data points belong to a single cluster (corresponding to the root of the tree).
Hierarchical and other clustering approaches, including K-means, have been applied to microarray data (Causton et al., 2003). Hierarchical clustering was applied to study gene expression in samples from patients with diffuse large B-cell lymphoma (DLBCL), resulting in the discovery of two subtypes of the disease. These groups were found by analyzing microarray data from biopsy samples of patients who had not been previously treated. These patients continued to be studied after chemotherapy, and researchers found that the two newly discovered disease subtypes had different survival rates, confirming the hypothesis that the subtypes had significantly different pathologies (Alizadeh et al., 2000).
While clustering simply groups the given data based on pair-wise distances, when information (i.e., labels) is known a priori about some or all of the data, a supervised approach can be used to obtain a classifier that can predict the label of new instances.

Classification (Supervised Learning)

The large dimensionality of microarray data means that all classification methods are susceptible to over-fitting. Several supervised approaches have been applied to microarray data, including Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and k-NNs, among others (Hastie et al., 2001).
A very challenging and clinically relevant problem is the accurate diagnosis of the primary origin of metastatic tumors. Bloom et al. (2004) applied ANNs to the microarray data of 21 tumor types, achieving 88% accuracy in predicting the primary site of origin of metastatic cancers with unknown origin. A classification accuracy of 84% was obtained on an independent test set, with important implications for diagnosing cancer origin and directing therapy.
In a comparison of different SVM approaches, multicategory SVMs were reported to outperform other popular machine learning algorithms such as k-NNs and ANNs (Statnikov et al., 2005) when applied to 11 publicly available microarray datasets related to cancer.
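The agglomerative procedure can be sketched in a few lines of Python (an illustrative single-linkage version using a 1-D distance for brevity, not tied to any particular microarray toolkit):

```python
def agglomerative(points, dist=lambda a, b: abs(a - b)):
    """Bottom-up (agglomerative) clustering: start with singleton clusters and
    repeatedly merge the two closest clusters until a single cluster, the root
    of the tree, remains."""
    clusters = [(i,) for i in range(len(points))]   # leaves: one point each
    merges = []                                     # records the tree structure
    while len(clusters) > 1:
        # single linkage: cluster distance = distance of closest member pair
        pairs = [(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        a, b = min(pairs, key=lambda p: min(dist(points[i], points[j])
                                            for i in p[0] for j in p[1]))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append(a + b)
    return merges  # last entry contains every point (the root)
```

On `[0.0, 0.1, 5.0, 5.2]`, for instance, the first merges are `(0, 1)` and `(2, 3)`, and the final merge is the root containing all four points.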
It is worth noting that feature selection can significantly improve classification performance.

Cross-Validation

Cross-validation (CV) is appropriate in microarray studies, which are often limited by the number of instances (e.g., patient samples). In k-fold CV, the training set is divided into k subsets of equal size. In each iteration, k-1 subsets are used for training and one subset is used for testing. This process is repeated k times and the mean accuracy is reported. Unfortunately, some published studies have applied CV only partially, by applying CV to the creation of the prediction rule while excluding feature selection. This introduces a bias in the estimated error rates and over-estimates the classification accuracy (Simon et al., 2003). As a consequence, results from many studies are controversial due to methodological flaws (Dupuy & Simon, 2007). Therefore, models must be evaluated carefully to prevent selection bias (Ambroise & McLachlan, 2002). Nested CV is recommended, with an inner CV loop to perform the tuning of the parameters and an outer CV to compute an estimate of the error (Varma & Simon, 2006).
Several studies which have examined similar biological problems have reported poor overlap in gene expression signatures. Brenton et al. (2005) compared two gene lists predictive of breast cancer prognosis and found only 3 genes in common. Even though the intersection of specific gene lists is poor, the highly correlated nature of microarray data means that many gene lists may have similar prediction accuracy (Ein-Dor et al., 2004). Gene signatures identified from different breast cancer studies with few genes in common were shown to have comparable success in predicting patient survival (Buyse et al., 2006).
Commonly used supervised learning algorithms yield black-box models, prompting the need for interpretable models that provide insights about the underlying biological mechanism that produced the data.

Network Analysis

Bayesian networks (BNs), derived from an alliance between graph theory and probability theory, can capture dependencies among many variables (Pearl, 1988; Heckerman, 1996). Friedman et al. (2000) introduced a multinomial model framework for BNs to reverse-engineer networks and showed that this method differs from clustering in that it can discover gene interactions other than correlation when applied to yeast gene expression data. Spirtes et al. (2002) highlight some of the difficulties of applying this approach to microarray data. Nevertheless, many extensions of this research direction have been explored. Correlation is not necessarily a good predictor of interactions, and weak interactions are essential to understand disease progression. Identifying the biologically meaningful interactions from the spurious ones is challenging, and BNs are particularly well-suited for modeling stochastic biological processes.
The exponential growth of data produced by microarray technology, as well as other high-throughput data (e.g., protein-protein interactions), calls for novel AI approaches as the paradigm shifts from a reductionist to a mechanistic, systems view in the life sciences.

FUTURE TRENDS

Uncovering the underlying biological mechanisms that generate these data is harder than prediction and has the potential to have far-reaching implications for understanding disease etiologies. Time series analysis (Bar-Joseph, 2004) is a first step to understanding the dynamics of gene regulation, but, eventually, we need to use the technology not only to observe gene expression data but also to direct intervention experiments (Peer et al., 2001; Yoo et al., 2002) and develop methods to investigate the fundamental problem of distinguishing correlation from causation.

CONCLUSION

We have reviewed AI methods for pre-processing, clustering, feature selection, classification and mechanistic analysis of microarray data. The clusters, gene lists, molecular fingerprints and network hypotheses produced by these approaches have already shown impact: from discovering new disease subtypes and biological markers, to predicting clinical outcome for directing treatment, as well as unraveling gene networks. From the AI perspective, this field offers challenging problems and may have a tremendous impact on biology and medicine.
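The k-fold procedure, and the requirement that feature selection and parameter tuning stay inside the training loop, can be sketched as follows (a generic illustration; `train_fn` is a hypothetical user-supplied training routine, not an API from the chapter):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle instance indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, train_fn, k=5):
    """Each fold serves once as the held-out test set; the mean accuracy over
    the k folds is reported.  To avoid selection bias, any feature selection
    must happen inside train_fn, i.e., on the training folds only."""
    folds = k_fold_indices(len(X), k)
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k
```

Nested CV simply wraps another `k_fold_indices` split inside `train_fn` to tune parameters, keeping the outer folds strictly for error estimation.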
REFERENCES

Alizadeh A.A., Eisen M.B., Davis R.E., Ma C., Lossos I.S., Rosenwald A., et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403(6769), 503-11.

Ambroise C., & McLachlan G.J. (2002). Selection bias in gene extraction on the basis of microarray gene-expression data. Proceedings of the National Academy of Sciences, 99(10), 6562-6.

Bar-Joseph Z. (2004). Analyzing time series gene expression data. Bioinformatics, 20(16), 2493-503.

Bloom G., Yang I.V., Boulware D., Kwong K.Y., Coppola D., Eschrich S., et al. (2004). Multi-platform, multi-site, microarray-based human tumor classification. American Journal of Pathology, 164(1), 9-16.

Brenton J.D., Carey L.A., Ahmed A.A., & Caldas C. (2005). Molecular classification and molecular forecasting of breast cancer: ready for clinical application? Journal of Clinical Oncology, 23(29), 7350-60.

Brazma A., & Culhane A.C. (2005). Algorithms for gene expression analysis. In Jorde L.B., Little P.F.R., Dunn M.J., Subramaniam S. (Eds.), Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics (3148-3159). London: John Wiley & Sons.

Buyse M., Loi S., van't Veer L., Viale G., Delorenzi M., Glas A.M., et al. (2006). Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer. Journal of the National Cancer Institute, 98, 1183-92.

Causton H.C., Quackenbush J., & Brazma A. (2003). Microarray Gene Expression Data Analysis: A Beginner's Guide. Oxford: Blackwell Science Limited.

Dupuy A., & Simon R.M. (2007). Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. Journal of the National Cancer Institute, 99(2), 147-57.

Ein-Dor L., Kela I., Getz G., Givol D., & Domany E. (2004). Outcome signature genes in breast cancer: is there a unique set? Bioinformatics, 21(2), 171-8.

Eisen M.B., Spellman P.T., Brown P.O., & Botstein D. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 95, 14863-14868.

Friedman N., Linial M., Nachman I., & Peer D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3-4), 601-20.

Golub T.R., Slonim D.K., Tamayo P., Huard C., Gaasenbeek M., Mesirov J.P., et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286(5439), 531.

Guyon I., & Elisseeff A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.

Hastie T., Tibshirani R., & Friedman J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer Series in Statistics.

Heckerman D. (1996). A Tutorial on Learning with Bayesian Networks. Technical Report MSR-TR-95-06. Microsoft Research.

Inza I., Larrañaga P., Blanco R., & Cerrolaza A.J. (2004). Filter versus wrapper gene selection approaches in DNA microarray domains. Artificial Intelligence in Medicine, special issue on data mining in genomics and proteomics, 31(2), 91-103.

Kohavi R., & John G.H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), 273-324.

Nutt C.L., Mani D.R., Betensky R.A., Tamayo P., Cairncross J.G., Ladd C., et al. (2003). Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research, 63, 1602-1607.

Peer D., Regev A., Elidan G., & Friedman N. (2001). Inferring subnetworks from perturbed expression profiles. Bioinformatics, 17(Suppl 1), S215-24.

Pearl J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo: Morgan Kaufmann Publishers.

Quackenbush J. (2002). Microarray data normalization and transformation. Nature Genetics, 32, 496-501.

Quackenbush J. (2006). Microarray analysis and tumor classification. The New England Journal of Medicine, 354(23), 2463-72.
Simon R., Radmacher M.D., Dobbin K., & McShane L.M. (2003). Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. Journal of the National Cancer Institute, 95(1), 14-8.

Spirtes P., Glymour C., Scheines R., Kauffman S., Aimale V., & Wimberly F. (2001). Constructing Bayesian network models of gene expression networks from microarray data. Proceedings of the Atlantic Symposium on Computational Biology, Genome Information Systems and Technology.

Statnikov A., Aliferis C.F., Tsamardinos I., Hardin D., & Levy S. (2005). A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics, 21(5), 631-643.

Troyanskaya O., Cantor M., Sherlock G., Brown P., Hastie T., Tibshirani R., et al. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520-5.

Tusher V.G., Tibshirani R., & Chu G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences, 98(9), 5116-5121.

Varma S., & Simon R. (2006). Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics, 7, 91.

Witten I.H., & Frank E. (2000). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers Inc.

Wall M., Rechtsteiner A., & Rocha L. (2003). Singular value decomposition and principal component analysis. In D.P. Berrar, W. Dubitzky, M. Granzow (Eds.), A Practical Approach to Microarray Data Analysis (91-109). Norwell: Kluwer.

Wouters L., Gohlmann H.W., Bijnens L., Kass S.U., Molenberghs G., & Lewi P.J. (2003). Graphical exploration of gene expression data: a comparative study of three multivariate methods. Biometrics, 59, 1131-1139.

Yoo C., Thorsson V., & Cooper G.F. (2002). Discovery of causal relationships in a gene-regulation pathway from a mixture of experimental and observational DNA microarray data. Biocomputing: Proceedings of the Pacific Symposium, 7, 498-509.

KEY TERMS

Curse of Dimensionality: A situation where the number of features (genes) is much larger than the number of instances (biological samples), known in statistics as the p >> n problem.

Feature Selection: The problem of finding a subset (or subsets) of features so as to improve the performance of learning algorithms.

Microarray: An experimental assay which measures the abundances of mRNA (the intermediary between DNA and proteins) corresponding to gene expression levels in biological samples.

Multiple Testing Problem: A problem that occurs when a large number of hypotheses are tested simultaneously using a user-defined cut-off p-value, which may lead to rejecting a non-negligible number of null hypotheses by chance.

Over-Fitting: A situation where a model learns spurious relationships and as a result can predict training data labels but not generalize to predict future data.

Supervised Learning: A learning algorithm that is given a training set consisting of feature vectors associated with class labels, and whose goal is to learn a classifier that can predict the class labels of future instances.

Unsupervised Learning: A learning algorithm that tries to identify clusters based on similarity between features or between instances (or both), without taking into account any prior knowledge.
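The Multiple Testing Problem can be made concrete with a short simulation (illustrative only, not from the chapter): under the null hypothesis p-values are uniform on [0, 1], so a fixed cut-off rejects roughly alpha·m true nulls by chance, while a Bonferroni-corrected cut-off of alpha/m essentially eliminates such false rejections.

```python
import random

random.seed(1)
m = 10_000          # number of hypotheses tested, e.g., one per gene
alpha = 0.05

# under the null hypothesis every p-value is uniform on [0, 1], so a naive
# per-test cut-off rejects about m * alpha = 500 true nulls purely by chance
p_values = [random.random() for _ in range(m)]
false_rejections = sum(p < alpha for p in p_values)

# the Bonferroni correction controls this by testing against alpha / m
bonferroni_rejections = sum(p < alpha / m for p in p_values)
```

Running this, `false_rejections` lands near 500 while `bonferroni_rejections` is almost always zero, which is why microarray studies correct their per-gene p-values.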
Emilio Soria-Olivas
University of Valencia, Spain
Antonio J. Serrano-López
University of Valencia, Spain
An AI Walk from Pharmacokinetics to Marketing
Focusing on problems that are closer to the real-life application that will be described in the next section, there are also a number of recent works involving the use of ML for drug delivery in kidney disease. For instance, a comparison of renal-related adverse drug reactions between rofecoxib and celecoxib, based on the WHO/Uppsala Monitoring Centre safety database, was carried out by Zhao, Reynolds, Lefkowith, Whelton, and Arellano (2001). Disproportionality in the association between a particular drug and renal-related adverse drug reactions was evaluated using a Bayesian confidence propagation neural network method. A study of the prediction of cyclosporine dosage in patients after kidney transplantation using neural networks and kernel-based methods was carried out in (Camps et al., 2003). In (Gaweda, Jacobs, Brier, & Zurada, 2003), a pharmacodynamic population analysis in CRF patients using ANNs was performed. Such models allow for adjusting the dosing regime. Finally, in (Martín et al., 2003), the use of neural networks was proposed for the optimization of EPO dosage in patients suffering from anaemia connected with CRF.

Web Recommender Systems

Recommender systems are widely used in web sites, including Google. The main goal of these systems is to recommend objects which a user might be interested in. Two main approaches have been used, content-based and collaborative filtering (Zukerman & Albrecht, 2001), although other kinds of techniques have also been proposed (Burke, 2002).
Collaborative recommenders aggregate ratings or recommendations of objects, find user similarities based on their ratings, and finally provide new recommendations based on inter-user comparisons. Some of the most relevant systems using this technique are GroupLens/NetPerceptions and Recommender. The main advantage of collaborative techniques is that they are independent from any machine-readable representation of the objects, and that they work well for complex objects where subjective judgements are responsible for much of the variation in preferences.
Content-based learning is used when a user's past behaviour is a reliable indicator of his/her future behaviour. It is particularly suitable for situations in which users tend to exhibit idiosyncratic behaviour. However, this approach requires a system to collect relatively large amounts of data from each user in order to enable the formulation of a statistical model. Examples of systems of this kind are text recommendation systems like the newsgroup filtering system NewsWeeder, which uses words from its texts as features.

Marketing

The latest marketing trends are more concerned with maintaining current customers and optimizing their behaviour than with getting new ones. For this reason, relational marketing focuses on what a company must do to achieve this objective. The relationships between a company and its customers follow an action-response sequence, where the customers can modify their behaviour in accordance with the marketing actions developed by the company.
The development of a good and individualized policy is not easy because there are many variables to take into account. Applications of this kind can be viewed as a Markov chain problem, in which a company decides what action to take once the customer properties in the current state (time t) are known. Reinforcement Learning (RL) can be used to solve this task, since previous applications have demonstrated its suitability in this area. In (Sun, 2003), RL was applied to analyse mailing campaigns by studying how an action at time t influences actions at subsequent times. In (Abe et al., 2002) and (Pednault, Abe, & Zadrozny, 2002), several RL algorithms were benchmarked on mailing problems. In (Abe, 2004), RL was used to optimize cross-channel marketing.

AI CONTRIBUTIONS IN REAL-LIFE APPLICATIONS

The previous section presented a review of related work. In this section, we focus on the authors' experience in using AI to solve real-life problems. In order to show the versatility of AI methods, we focus on particular applications from three different fields of knowledge, the same ones that were reviewed in the previous section.

Pharmacokinetics

Although we have also worked with other pharmacokinetic problems, in this work we focus on maybe the most relevant problem, which is the
optimization of EPO dosages in patients within a haemodialysis program. Patients who suffer from CRF tend to suffer from an associated anaemia as well. EPO is the treatment of choice for this kind of anaemia. The use of this drug has greatly reduced cardiovascular problems and the necessity of multiple transfusions. However, EPO is expensive, making the already costly CRF program even more so. Moreover, there are significant risks associated with EPO, such as thrombo-embolisms and vascular problems, if haemoglobin (Hb) levels are too high or they increase too fast. Consequently, optimizing dosage is critical to ensure adequate pharmacotherapy as well as a reasonable treatment cost.
Population models, widely used by pharmacokinetics researchers, are not suitable for this problem since the response to the treatment with EPO is highly dependent on the patient. The same dosages may have very different responses in different patients, most notably the so-called EPO-resistant patients, who do not respond to EPO treatment even after receiving high dosages. Therefore, it is preferable to focus on an individualized treatment.
Our first approach to this problem was based on predicting the Hb level given a certain administered dose of EPO. Although the final goal is to individualize EPO doses, we did not predict EPO dose but Hb level. The reason is that EPO predictors would model the physicians' protocol, whereas Hb predictors model the body's response to the treatment, hence being a more objective approach. In particular, the following models were used: GARCH (Hamilton, 1994), MLP, FIR neural network, Elman's recurrent neural network and SVR (Haykin, 1999). Accurate prediction models were obtained, especially when using ANNs and SVR. Dynamic neural networks (i.e., FIR and recurrent) did not notably outperform the static MLP, probably due to the short length of the time series (Martín et al., 2003). An easy-to-use software application was developed to be used by clinicians, in which, after filling in a patient's data and a certain EPO dose, the predicted Hb level for the next month was shown.
Although the prediction models were accurate, we realized that this prediction approach had a major flaw. Despite obtaining accurate models, we had not yet achieved a straightforward way to transfer the extracted knowledge to daily clinical practice, because clinicians had to play with different doses to analyse the best solution to attain a certain Hb level. It would be better to have an automatic model that suggests the actions to be made in order to attain the targeted range of Hb, rather than this indirect approach. This reflection made us research new models, and we came up with the use of RL (Sutton & Barto, 1998). We are currently working on this topic, but we have already achieved promising results, finding policies (sequences of actions) that appear to be better than those followed in the hospital, i.e., there is a higher number of patients within the desired target of Hb at the end of the treatment (Martín et al., 2006a).

Web Recommender Systems

A completely different application is described in this subsection, namely, the development of web recommender systems. The authors proposed a new approach to develop recommender systems based on collaborative filtering, but also including an analysis of the feasibility of the recommender by using a prediction stage (Martín et al., 2006b).
The very basic idea was to use clustering algorithms in order to find groups of similar users. The following clustering algorithms were taken into account: K-means, FCM, HCA, the EM algorithm, SOMs and ART. New users were assigned to one of the groups found by these clustering algorithms, and then they were recommended web services that were usually accessed by other users of their same group, but had not yet been accessed by these new users (in order to maximize the usefulness of the approach). Using controlled data sets, the study concluded that ART and SOMs showed a very good behaviour with data sets of very different characteristics, whereas HCA and EM showed an acceptable behaviour provided that the dimensionality of the data set was not too high and the overlap was slight. Algorithms based on K-means achieved the most limited success in the acceptance of offered recommendations.
Even though the use of RL was only slightly studied, it seems to be a suitable choice for this problem, since the internal dynamics of the problem are easily tackled by RL, and moreover the interference between the recommendation interface and the user can be minimized with an adequate definition of the rewards (Hernández, Gaudioso, & Boticario, 2004).
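The clustering stage used for grouping similar users can be sketched generically (a plain K-means illustration in Python, not the authors' implementation; 1-D "user profiles" are used for brevity):

```python
import random

def kmeans(points, k, iters=25, seed=0):
    """Plain K-means on 1-D points: alternate between assigning each point to
    its nearest centroid and moving each centroid to its cluster mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters
```

For instance, `kmeans([0.0, 0.4, 1.0, 9.0, 9.6, 10.0], 2)` recovers the two well-separated user groups, with centroids near 0.47 and 9.53; a new user would then be assigned to the nearer centroid and offered the services popular in that group.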
Seelf-Organizing Maps to design a marketing campaign. Sutton, R. S., & Barto, A. G. (1998). Reinforcement
Proceedings of the 2nd International Conference on Learning: An Introduction. MIT Press, Cambridge, A
Machine Intelligence ACIDCA-ICMI 2005, 259-265. MA, USA.
Gray, D. L., Ash, S. R., Jacobi, J. & Michel, A. N. Zhao, S. Z., Reynolds, M. W., Leikowith, J., Whelton,
(1991). The training and use of an artificial neural A., & Arellano, F. M. (2001). A comparison of renal-
network to monitor use of medication in treatment of related adverse drug reactions between rofecoxib
complex patients. Journal of Clinical Engineering, 16 and celecoxib, based on World Health Organization/
(4), 331-336. Uppsala Monitoring Centre safety database. Clinical
Therapeutics, 23 (9), 1478-1491.
Hamilton, J. D. (1994). Time Series Analysis, Princeton
University Press, Princeton NJ, USA. Zukerman, I., & Albrecht, D. (2001). Predictive
statistical models for user modeling. User Modeling
Haykin, S. (1999). Neural Networks (2nd ed.). Prentice
and User-Adapted Interaction, 11, 5-18.
Hall, Englewood Cliffs, NJ, USA.
Hernndez, F., Gaudioso, E. & Boticario, J. G. (2004)
KEy TERMS
A reinforcement approach to achieve unobstrusive and
interactive recommendation systems for web-based
Agent: In RL terms, it is the responsible of
communities. Proceedings of Adaptive Hypermedia
making decisions according to observations of its
2004, 409-412.
environment.
Lisboa, P. J. G. (2002). A review of evidence of health
Environment: In RL terms, it is every external
benefit from artificial neural networks in medical intervention. Neural Networks, 15(1), 11-39.

Martín, J. D., Soria, E., Camps, G., Serrano, A. J., Pérez, J. J., & Jiménez, N. V. (2003). Use of neural networks for dosage individualisation of erythropoietin in patients with secondary anemia to chronic renal failure. Computers in Biology and Medicine, 33(4), 361-373.

Martín, J. D., Soria, E., Chorro, V., Climente, M., & Jiménez, N. V. (2006a). Reinforcement learning for anemia management in hemodialysis patients treated with erythropoietic stimulating factors. Proceedings of the Workshop on Planning, Learning and Monitoring with Uncertainty and Dynamic Worlds, European Conference on Artificial Intelligence 2006, 19-24.

Martín, J. D., Palomares, A., Balaguer, E., Soria, E., Gómez, J., & Soriano, A. (2006b). Studying the feasibility of a recommender in a citizen web portal based on user modeling and clustering algorithms. Expert Systems with Applications, 30(2), 299-312.

Pednault, E., Abe, N., & Zadrozny, B. (2002). Sequential cost-sensitive decision making with reinforcement learning. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2002, 259-268.

Sun, P. (2003). Constructing learning models from data: The dynamic catalog mailing problem. Ph.D. dissertation, Tsinghua University, China.

condition to the agent.

Exploration-Exploitation Dilemma: A classical RL dilemma in which a trade-off solution must be achieved. Exploration means random search of new actions in order to achieve a likely (but yet unknown) better reward than all the known ones, while exploitation is focused on exploiting the current knowledge for the maximization of the reward (greedy approach).

Life-Time Value: A measure widely used in marketing applications that offers the long-term result that has to be maximized.

Reward: In RL terms, the immediate reward is the value returned by the environment to the agent depending on the action taken. The long-term reward is the sum of all the immediate rewards throughout a complete decision process.

Sensitivity: A success rate measure similar to specificity that offers the ratio of positives that are correctly classified by the model. (Refer to Specificity.)

Specificity: A success rate measure in a classification problem. If there are two classes (namely, positive and negative), specificity measures the ratio of negatives that are correctly classified by the model.
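The exploration-exploitation dilemma defined above is commonly handled with an ε-greedy rule: with probability ε the agent explores a random action, otherwise it exploits the action with the best estimated reward. A minimal sketch, where the three-armed bandit environment and all reward probabilities are illustrative assumptions rather than taken from the article:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon,
    otherwise exploit the action with the highest estimated reward."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                     # exploration
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploitation (greedy)

# Hypothetical 3-armed bandit; the agent keeps running averages of the
# immediate rewards it has observed for each action.
q = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
true_means = [0.2, 0.5, 0.8]   # unknown to the agent

rng = random.Random(0)
for _ in range(1000):
    a = epsilon_greedy(q, epsilon=0.1, rng=rng)
    reward = 1.0 if rng.random() < true_means[a] else 0.0  # immediate reward
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]   # incremental mean update
```

With ε = 0 the rule is purely greedy; a small positive ε keeps sampling apparently inferior actions, which is what lets the agent discover a better reward than all the known ones.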
Anamika Gupta
University of Delhi, India
Naveen Kumar
University of Delhi, India
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Algorithms for Association Rule Mining
ARM Algorithms
ing to which, no superset of an infrequent item-set can be frequent.

Agrawal et al. (Agrawal, R., Imielinski, T., & Swami, A., 1993; Agrawal, R., & Srikant, R., 1994) proposed the Apriori algorithm, which is the most popular iterative algorithm in this category. It starts with finding the frequent item-sets of size one and goes up level by level, finding candidate item-sets of size k by joining item-sets of size k-1. Two item-sets, each of size k-1, join to form an item-set of size k if and only if they have their first k-2 items in common. At each level the algorithm prunes the candidate item-sets using the anti-monotonic property and subsequently scans the database to find the support of the pruned candidate item-sets. The process continues as long as the set of frequent item-sets found at a level is non-empty. Since each iteration requires a database scan, the maximum number of database scans required is the same as the size of the maximal item-set. Fig. 3 and Fig. 4 give the pseudo code of the Apriori algorithm and a running example, respectively.

Two of the major bottlenecks in the Apriori algorithm are i) the number of passes and ii) the number of candidates generated. The first is likely to cause an I/O bottleneck and the second causes a heavy load on memory and CPU usage. Researchers have proposed solutions to these problems with considerable success. Although a detailed discussion of these solutions is beyond the scope of this article, a brief mention is necessary.

Hash techniques reduce the number of candidates by making a hash table and discarding a bucket if it has support less than the ms. Thus at each level the memory requirement is reduced because of a smaller candidate set. The reduction is most significant at the lower levels. Maintaining a list of transaction ids for each candidate set reduces the database access. The Dynamic Item-set Counting algorithm reduces the number of scans by counting candidate sets of different cardinality in a single scan (Brin, S., Motwani, R., Ullman, J. D., & Tsur, S., 1997). The Pincer Search algorithm uses a bi-directional strategy to prune the candidate set from the top (maximal) and the bottom (1-itemsets) (Lin, D., & Kedem, Z. M., 1998). Partitioning and sampling strategies have also been proposed to speed up the counting task. An excellent comparison of the Apriori algorithm and its variants is given in (Hipp, J., Güntzer, U., & Nakhaeizadeh, G., 2000).

Tree Based Algorithms

Tree based algorithms have been proposed to overcome the problem of multiple database scans. These algorithms compress (sometimes lossily) the database into a tree data structure and reduce the number of database scans appreciably. Subsequently the tree is used to mine for the support of all frequent item-sets.

The Set-Enumeration tree used in the Max-Miner algorithm (Bayardo, R. J., 1998) orders the candidate sets while searching for maximal frequent item-sets. The data structure facilitates quick identification of long frequent item-sets based on the information gathered during each pass. The algorithm is particularly suitable for dense databases with maximal item-sets of high cardinality.

Han et al. (Han, J., Pei, J., & Yin, Y., 2000) proposed the Frequent Pattern (FP)-growth algorithm, which performs a database scan and finds frequent item-sets of cardinality one. It arranges all frequent item-sets in a header table in the descending order of their supports. During the second database scan, the algorithm constructs an in-memory data structure called the FP-Tree by inserting each transaction after rearranging it in descending order of support. A node in the FP-Tree stores a single attribute so that each path in the tree
represents and counts the corresponding record in the database. A link from the header connects all the nodes of an item. This structural information is used while mining the FP-Tree. The FP-Growth algorithm recursively generates sub-trees from FP-Trees corresponding to each frequent item-set.

Coenen et al. (Coenen, F., Leng, P., & Ahmed, S., 2004) proposed the Total Support Tree (T-Tree) and Partial Support Tree (P-Tree) data structures, which offer a significant advantage in terms of storage and execution. These data structures are compressed set-enumeration trees; they are constructed after one scan of the database and store all the item-sets as distinct records.

Figure 4. Running example of the Apriori algorithm for finding frequent item-sets (ms = 40%)

Discovery of Frequent Closed Item-Sets

Level Wise Approach

Pasquier et al. (Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L., 1999) proposed the Close method to find Frequent Closed Item-sets (FCI). This method finds closures based on Galois closure operators and computes the generators. The Galois closure operator h(X) for some X ⊆ I is defined as the intersection of the transactions in D containing item-set X. An item-set X is a closed item-set if and only if h(X) = X. One of the smallest, arbitrarily chosen, item-sets p such that h(p) = X is known as a generator of X.

The Close method is based on the Apriori algorithm. It starts from 1-item-sets, finds the closure based on the Galois closure operator, and goes up level by level, computing generators and their closures (i.e. FCI) at each level. At each level, candidate generator item-sets of size k are found by joining generator item-sets of size k-1 using the combinatorial procedure used in the Apriori algorithm. The candidate generators are pruned using two strategies: i) remove candidate generators whose subsets are not all frequent; ii) remove a candidate generator if the closure of one of its subsets is a superset of the generator. Subsequently the algorithm finds the support of the pruned candidate generators. Each iteration requires one pass over the database to construct the set of FCI and count their support.

Tree Based Approach

Wang et al. (Wang, J., Han, J., & Pei, J., 2003) proposed the Closet+ algorithm to compute FCI and their supports using the FP-tree structure. The algorithm is based on a divide-and-conquer strategy and computes the locally frequent items of a certain prefix by building and scanning its projected database.

Concept Lattice Based Approach

The Concept lattice is a core structure of Formal Concept Analysis (FCA). FCA is a branch of mathematics based on Concepts and Concept hierarchies. A Concept (A,B) is defined as a pair of a set of objects A (known as the extent) and a set of attributes B (known as the intent) such that the set of all attributes common to the objects of extent A is the same as B, and the set of all objects containing all the attributes of intent B is the same as A. In other words, no object other than the objects of set A contains all the attributes of B, and no attribute other than the attributes in set B is contained in all the objects of set A. The Concept lattice is a complete lattice of all Concepts. Stumme, G. (1999) discovered that the intent
Exhibit A.

add extent {all transactions} to the list of extents
For each item i ∈ I
  For each set X in the list of extents
    find X ∩ {set of transactions containing i}
    include it in the list of extents if not included earlier
  EndFor
EndFor
B of the Concept (A,B) represents the closed item-set, which implies that all algorithms for finding Concepts can be used to find closed item-sets. Kuznetsov, S. O., & Obiedkov, S. A. (2002) provide a comparison of the performance of various algorithms for finding Concepts. The naïve method to compute Concepts, proposed by Ganter, is given in Exhibit A.

This method generates all the Concepts, i.e. all closed item-sets. The closed item-sets generated using this method in example 1 are {A}, {B}, {C}, {A,B}, {A,C}, {B,D}, {B,C,D}, {B,D,E}, {B,C,D,E}. The Frequent Closed item-sets are {A}, {B}, {C}, {B,D}, {B,C,D}, {B,D,E}.

The Concept lattice for the frequent closed item-sets is given in Figure 5.

Figure 5. Concept lattice
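Ganter-style naïve enumeration as in Exhibit A, intersecting each item's transaction set with every extent found so far and then reading off the intents as closed item-sets, can be sketched as follows. The toy transaction database here is a hypothetical illustration, not the article's example 1:

```python
# Hypothetical transaction database: tid -> set of items.
D = {
    1: {"A", "C"},
    2: {"B", "C", "D"},
    3: {"A", "B", "C"},
    4: {"B", "D"},
}
items = sorted(set().union(*D.values()))

def intent(extent):
    """Items common to all transactions of `extent` (empty extent -> all items)."""
    tids = list(extent)
    if not tids:
        return set(items)
    common = set(D[tids[0]])
    for t in tids[1:]:
        common &= D[t]
    return common

# Exhibit A: start from the extent of all transactions; for every item,
# intersect its transaction set with each extent found so far and keep
# any new extent.
extents = [frozenset(D)]            # the extent {all transactions}
for i in items:
    tids_i = frozenset(t for t, its in D.items() if i in its)
    for X in list(extents):
        e = X & tids_i
        if e not in extents:
            extents.append(e)

# The intents of the extents are exactly the closed item-sets.
closed = {frozenset(intent(e)) for e in extents}
```

For this toy database, {A} is not closed (every transaction containing A also contains C, so h({A}) = {A,C}), while {B} and {B,C,D} are, matching the definition h(X) = X.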
Exhibit B.

For each frequent item-set I,
  generate all non-empty subsets of I
  For every non-empty subset s of I,
    output the rule s → (I − s) if support(I) / support(s) >= mc
  EndFor
EndFor
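The level-wise candidate generation described earlier, combined with the rule-generation loop of Exhibit B, can be sketched in Python. The transaction list and the ms and mc values below are illustrative assumptions only:

```python
from itertools import combinations

def apriori(transactions, ms):
    """Return {frozenset: support} for all item-sets with support >= ms.
    Support is the fraction of transactions containing the item-set."""
    n = len(transactions)

    def support(s):
        return sum(1 for t in transactions if s <= t) / n

    # Level 1: candidate item-sets of size one.
    level = {frozenset([i]) for t in transactions for i in t}
    freq = {}
    while level:
        level = {s for s in level if support(s) >= ms}
        freq.update({s: support(s) for s in level})
        # Join step: two k-item-sets sharing k-1 items give a (k+1)-candidate.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        # Prune step (anti-monotonicity): keep candidates all of whose
        # k-subsets are frequent.
        level = {c for c in candidates
                 if all(frozenset(sub) in freq
                        for sub in combinations(c, len(c) - 1))}
    return freq

def rules(freq, mc):
    """Exhibit B: emit s -> (I - s) when support(I) / support(s) >= mc."""
    out = []
    for I, sup_I in freq.items():
        for r in range(1, len(I)):
            for s in map(frozenset, combinations(I, r)):
                if sup_I / freq[s] >= mc:
                    out.append((set(s), set(I - s)))
    return out

T = [{"A", "B", "C"}, {"B", "C"}, {"A", "B"}, {"B", "D"}]  # hypothetical data
freq = apriori(T, ms=0.5)
assoc = rules(freq, mc=0.8)
```

Note that the number of while-loop iterations equals the size of the largest frequent item-set, mirroring the observation above that the number of database scans is bounded by the size of the maximal item-set.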
chines to achieve scalability (Hipp, J., Güntzer, U., & Nakhaeizadeh, G., 2000). Algorithms which partition the dataset exchange counters, while algorithms which partition the counters exchange datasets, incurring high communication cost.

FUTURE TRENDS

Discovery of Frequent Closed Item-sets (FCI) is a big lead in ARM algorithms. With the current growth rate of databases and the increasing use of ARM in various scientific and commercial applications, we envisage tremendous scope for research in parallel, incremental and distributed algorithms for FCI. Use of the lattice structure for FCI offers the promise of scalability. On-line mining of streaming datasets using the FCI approach is an interesting direction to work on.

CONCLUSION

The article presents the basic approach for Association Rule Mining, focusing on some common algorithms for finding frequent item-sets and frequent closed item-sets. Various approaches have been discussed to find such item-sets. The Formal Concept Analysis approach for finding frequent closed item-sets is also discussed. Generation of rules from frequent item-sets and frequent closed item-sets is briefly discussed. The article addresses the scalability issues involved in the various algorithms.

REFERENCES

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining Association Rules Between Sets of Items in Large Databases. Proceedings of the 1993 ACM International Conference on Management of Data, 207-216, Washington, D.C.

Agrawal, R., & Srikant, R. (1994). Fast Algorithms for Mining Association Rules in Large Databases. Proceedings of the Twentieth International Conference on VLDB, 487-499, Santiago, Chile.

Bayardo, R. J. (1998). Efficiently Mining Long Patterns From Databases. Proceedings of the ACM International Conference on Management of Data.

Brin, S., Motwani, R., Ullman, J. D., & Tsur, S. (1997). Dynamic Item-set Counting and Implication Rules for Market Basket Data. ACM Special Interest Group on Management of Data, 26(2), 255.

Coenen, F., Leng, P., & Ahmed, S. (2004). Data Structures for Association Rule Mining: T-Trees and P-Trees. IEEE TKDE, 16(6).

Han, J., Pei, J., & Yin, Y. (2000). Mining Frequent Patterns Without Candidate Generation. Proceedings of the ACM International Conference on Management of Data, ACM Press, 1-12.

Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann Publishers.

Hipp, J., Güntzer, U., & Nakhaeizadeh, G. (2000). Algorithms for Association Rule Mining: A General Survey and Comparison. SIGKDD Explorations.

Kuznetsov, S. O., & Obiedkov, S. A. (2002). Comparing Performance of Algorithms for Generating Concept Lattices. Journal of Experimental and Theoretical Artificial Intelligence.

Lin, D., & Kedem, Z. M. (1998). Pincer Search: A New Algorithm for Discovering the Maximum Frequent
Sets. Proceedings of the 6th International Conference on Extending Database Technology (EDBT), Valencia, Spain.

Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Efficient Mining of Association Rules Using Closed Item-set Lattices. Information Systems, 24(1), 25-46.

Stumme, G. (1999). Conceptual Knowledge Discovery with Frequent Concept Lattices. FB4-Preprint 2043, TU Darmstadt.

Wang, J., Han, J., & Pei, J. (2003). Closet+: Searching for the Best Strategies for Mining Frequent Closed Item-sets. Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 236-245, New York, USA, ACM Press.

Zaki, M. J. (2000). Generating Non-Redundant Association Rules. Proceedings of the International Conference on Knowledge Discovery and Data Mining.

Zaki, M. J., & Hsiao, C. J. (2005). Efficient algorithms for mining closed item-sets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4), 462-478.

B′ = {g ∈ G | gIm for all m ∈ B} (the set of objects common to the attributes in B).

A formal concept of the context (G,M,I) is a pair (A,B) with A ⊆ G, B ⊆ M,

A′ = B and B′ = A

A is called the extent and B is the intent of the concept (A,B).

Frequent Closed Item-Set: An item-set X is a closed item-set if there exists no item-set X′ such that:

i. X′ is a proper superset of X,
ii. every transaction containing X also contains X′.

A closed item-set X is frequent if its support exceeds the given support threshold.

Galois Connection: Let D = (O,I,R) be a data mining context where O and I are finite sets of objects (transactions) and items respectively. R ⊆ O × I is a binary relation between objects and items. For O′ ⊆ O and I′ ⊆ I, we define the mappings shown in Exhibit C.

Exhibit C.

f: 2^O → 2^I   g: 2^I → 2^O
f(O′) = {i ∈ I | ∀o ∈ O′, (o,i) ∈ R}
g(I′) = {o ∈ O | ∀i ∈ I′, (o,i) ∈ R}
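The two mappings of Exhibit C, and the closure operator h = f ∘ g they induce on item-sets, can be sketched as follows; the three-transaction context is a hypothetical illustration:

```python
# Data mining context D = (O, I, R): the relation R is given here as
# a mapping tid -> set of items contained in that transaction.
D = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"a", "b", "c"}}
ALL_ITEMS = set().union(*D.values())

def f(objs):
    """f: 2^O -> 2^I -- the items i with (o,i) in R for every o in objs."""
    common = set(ALL_ITEMS)
    for o in objs:
        common &= D[o]
    return common

def g(items):
    """g: 2^I -> 2^O -- the objects containing every item of `items`."""
    return {o for o, its in D.items() if items <= its}

def h(items):
    """Galois closure on item-sets: h = f o g."""
    return f(g(items))
```

An item-set X is closed exactly when h(X) == X; here h({"a"}) yields {"a", "b"}, so {"a"} is not closed while {"a", "b"} is, matching the definition above.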
Ambient Intelligence
Fariba Sadri
Imperial College London, UK
Kostas Stathis
Royal Holloway, University of London, UK
located nearby. This person has left home without his medication and would like to find out where to access similar drugs. He has asked his D-Me, in natural language, to investigate this. Dimitrios happens to suffer from a similar health problem and uses the same drugs. His D-Me processes the incoming request for information, and decides neither to reveal Dimitrios' identity nor to offer direct help, but to provide the elderly person's D-Me with a list of the closest medicine shops and a potential contact with a self-help group.

2. Carmen plans her journey to work. She asks AmI, by voice command, to find her someone with whom she can share a lift to work in half an hour. She then plans the dinner party she is to give that evening. She wishes to bake a cake, and her e-fridge flashes a recipe on the e-fridge screen and highlights the ingredients that are missing. Carmen completes her shopping list on the screen and asks for it to be delivered to the nearest distribution point in her neighbourhood. All goods are smart tagged, so she can check the progress of her virtual shopping from any enabled device anywhere, and make alterations. Carmen makes her journey to work in a car with dynamic traffic guidance facilities and traffic systems that dynamically adjust speed limits depending on congestion and pollution levels. When she returns home the AmI welcomes her and suggests that on the next day she should telework, as a big demonstration is planned downtown.

The demands that drive AmI and provide opportunities are the improvement of safety and quality of life, enhancement of productivity and of the quality of products and services, including public services such as hospitals, schools, the military and police, and industrial innovation. AmI is intended to facilitate human contact and community and cultural enhancement, and ultimately it should inspire trust and confidence.

Some of the technologies required for AmI are summarised in Figure 1.

Figure 1. Technologies required for AmI: components and software platform

AmI work builds on ubiquitous computing and sensor network and mobile technologies. To provide the intelligence and naturalness required, it is our view that significant contributions can come from advances in artificial intelligence and agent technologies. Artificial intelligence has a long history of research on planning, scheduling, temporal reasoning, fault diagnosis, hypothetical reasoning, and reasoning with incomplete and uncertain information. All of these are techniques that can contribute to AmI, where actions and decisions have to be taken in real time, often with dynamic and uncertain knowledge about the environment and the user. Agent technology research has concentrated on agent architectures that combine several, often cognitive, capabilities, including reactivity and adaptability, as well as the formation of agent societies through communication, norms and protocols.

Recent work has attempted to exploit these techniques for AmI. In (Augusto and Nugent 2004) the use of temporal reasoning combined with active databases is explored in the context of smart homes. In (Sadri 2007) the use of temporal reasoning together with agents is explored to deal with similar scenarios, where information observed in a home environment is evaluated, deviations from normal behaviour and risky situations are recognised, and compensating actions are recommended.

The relationship of AmI to cognitive agents is motivated by (Stathis and Toni 2004), who argue that computational logic elevates the level of the system to that of a user. They advocate the KGP agent model (Kakas et al 2004) to investigate how to assist a traveller to act independently and safely in an unknown environment using a personal communicator. (Augusto et al 2006) address the process of taking decisions in the presence of conflicting options. (Li and Ji 2005) offer a new probabilistic framework based on Bayesian Networks for dealing with ambiguous and uncertain sensory observations and users' changing states, in order to provide correct assistance.

(Amigoni et al 2005) address the goal-oriented aspect of AmI applications, and in particular the planning problem within AmI. They conclude that a combination of centralised and distributed planning capabilities is required, due to the distributed nature of AmI and the participation of heterogeneous agents with different capabilities. They offer an approach based on Hierarchical Task Networks, taking the perspective of a multi-agent paradigm for AmI.

The paradigm of embedded agents for AmI environments, with a focus on developing learning and adaptation techniques for the agents, is discussed in (Hagras et al 2004, and Hagras and Callaghan 2005). Each agent is equipped with sensors and effectors and uses a learning system based on fuzzy logic. A real AmI environment in the form of an intelligent dormitory is used for experimentation.

Privacy and security in the context of AmI applications at home, at work, and in the health, shopping and mobility domains are discussed in (Friedewald et al 2007). For such applications they consider security threats such as surveillance of users, identity theft and malicious attacks, as well as the potential of the digital divide amongst communities and social pressures.

AMBIENT INTELLIGENCE FOR INDEPENDENT LIVING

One major use of AmI is to support services for independent living, to prolong the time people can live decently in their own homes by increasing their autonomy and self-confidence. This may involve the elimination of monotonous everyday activities, monitoring and caring for the elderly, provision of security, or saving resources. The aim of such AmI applications is to help:

maintain the safety of a person by monitoring his environment and recognizing and anticipating risks, and taking appropriate actions,
provide assistance in daily activities and requirements, for example, by reminding and advising about medication and nutrition, and
improve quality of life, for example by providing personalized information about entertainment and social activities.

This area has attracted a great deal of attention in recent years, because of increased longevity and the aging population in many parts of the world. For such an AmI system to be useful and accepted it needs to be versatile, adaptable, capable of dealing with changing environments and situations, transparent, and easy, and even pleasant, to interact with.

We believe that it would be promising to explore an approach based on providing an agent architecture consisting of a society of heterogeneous, intelligent, embedded agents, each specialised in one or more functionalities. The agents should be capable of sharing information through communication, and their dialogues and behaviour should be governed by context-dependent and dynamic norms.

The basic capabilities for intelligent agents include:

Sensing: to allow the agent to observe the environment
Reactivity: to provide context-dependent dynamic behaviour and the ability to adapt to changes in the environment
Planning: to provide goal-directed behaviour
Goal Decision: to allow dynamic decisions about which goals have higher priorities
Action execution: to allow the agent to affect the environment.

All of these functionalities also require reasoning about spatio-temporal constraints reflecting the environment in which an AmI system operates.

Most of these functionalities have been integrated in the KGP model (Kakas et al, 2004), whose architecture is shown in Figure 2 and implemented in the PROSOCS system (Bracciali et al, 2006). The use of reactivity for communication and dialogue policies has also been discussed in, for example, (Sadri et al, 2003). The inclusion of normative behaviour has been discussed in (Sadri et al, 2006), where we also consider how to choose amongst different types of goals, depending on the governing norms. For a general discussion on the importance of norms in artificial societies see (Pitt, 2005).

KGP agents are situated in the environment via their physical capabilities. Information received from the environment (including other agents) updates the agent's state and provides input to its dynamic cycle theory, which, in turn, determines the next steps in terms of its transitions, using its reasoning capabilities.

Figure 2. The architecture of a KGP agent

FUTURE TRENDS

As with most other information and communication technologies, AmI is not likely to be good or bad on its own; its value will be judged from the different ways the technology will be used to improve people's lives. In this section we discuss new opportunities and challenges for the integration of AmI with what people do in ordinary settings. We abstract away from hardware trends and focus on areas that are software related and are likely to play an important role in the adoption of AmI technologies.

A focal point is the observation that people discover and understand the world through visual and conversational interactions. As a result, in the coming years we expect the design of AmI systems to focus on ways that will allow humans to interact in natural ways, using their common skills such as speaking, gesturing and glancing. This kind of natural interaction (Leibe et al 2000) will complement existing interfaces and will require that AmI systems be capable of representing virtual objects, possibly in 3D, as well as capturing people's moves in the environment and identifying which of these moves are directed to virtual objects.

We also expect to see new research directed towards processing of sensor data with different information (Massaro and Friedman 1990) and different kinds of formats such as audio, video, and RFID. Efficient techniques to index, search, and structure these data, and ways to transform them to the higher-level semantic information required by cognitive agents, will be an important area for future work. Similarly, the reverse of this process is likely to be of equal importance, namely, how to translate high-level information to the lower-level signals required by actuators that are situated in the environment.

Given that sensors and actuators will provide the link with the physical environment, we also anticipate further research to address the general linking of AmI systems to already existing computing infrastructures such as the semantic web. This work will create hybrid environments that will need to combine useful information from existing wired technologies with information from wireless ones (Stathis et al 2007). To enable the creation of such environments we imagine the need to build new frameworks and middleware to facilitate the integration of heterogeneous AmI systems and make their interoperation more flexible.

Another important issue is how the human experience in AmI will be managed in a way that will be as unobtrusive as possible. In this we foresee that developments in cognitive systems will play a very important role. Although there will be many areas of cognitive system behaviour that will need to be addressed, we
anticipate that the development of agent models that adapt and learn (Sutton and Barto 1998) will be of great importance. The challenge here will be how to integrate the output of these adaptive and learning capabilities into the reasoning and decision processes of the agent. The resulting cognitive behaviour must differentiate between newly learned concepts and existing ones, as well as discriminate between normal behaviour and exceptions.

We expect that AmI will emerge with the formation of user communities who live and work in a particular locality (Stathis et al 2006). The issue then becomes how to manage all the information that is provided and captured as the system evolves. We foresee research to address issues such as semantic annotation of content, and partitioning and ownership of information.

Linking local communities with smart homes, e-healthcare, mobile commerce, and transportation systems will eventually give rise to a global AmI system. For applications in such a system to be embraced by people we will need to see specific human factors studies to decide how unobtrusive, acceptable and desirable the actions of the AmI environment seem to the people who use them. Some human factors studies should focus on issues of presentation of objects and agents in a 3D setting, as well as on the important issues of privacy, trust and security.

To make possible the customization of system interactions to different classes of users, it is required to acquire and store information about these users. Thus, for people to trust AmI interactions in the future, we must ensure that the omnipresent intelligent environment maintains privacy in an ethical manner. Ethical or, better, normative behaviour can be ensured not only at the cognitive level (Sadri et al 2006), but also at the lower, implementation level of the AmI platform. In this context, ensuring that communicated information is encrypted, certified, and follows transparent security policies will be required to build systems less vulnerable to malicious attacks. Finally, we also envisage changes to the business models that would characterise AmI interactions (Hax and Wilde 2001).

CONCLUSION

The successful adoption of AmI is predicated on the suitable combination of ubiquitous computing, artificial intelligence and agent technologies. A useful class of applications that can test such a combination is AmI supporting independent living. For such applications we have identified the trends that are likely to play an important role in the future.

REFERENCES

Augusto, J. C., Liu, J., & Chen, L. (2006). Using ambient intelligence for disaster management. In Proceedings of the 10th International Conference on Knowledge-based Intelligent Information and Engineering Systems (KES 2006), Springer Verlag.

Augusto, J. C., & Nugent, C. D. (2004). The use of temporal reasoning and management of complex events in smart homes. In Proceedings of the European Conference on Artificial Intelligence (ECAI), 778-782.

Bracciali, A., Endriss, U., Demetriou, N., Kakas, A. C., Lu, L., & Stathis, K. (2006). Crafting the mind of PROSOCS agents. Applied Artificial Intelligence, 20(2-4), 105-131.

Ducatel, K., Bogdanowicz, M., Scapolo, F., Leijten, J., & Burgelman, J.-C. (2001). Scenarios for ambient intelligence in 2010. IST Advisory Group Final Report, European Commission.

Ducatel, K., Bogdanowicz, M., Scapolo, F., Leijten, J., & Burgelman, J.-C. (2003). Ambient intelligence: from vision to reality. IST Advisory Group Draft Report, European Commission.

Dutton, W. H. (1999). Society on the line: information politics in the digital age. Oxford, Oxford University Press.

Friedewald, M., Vildjiounaite, E., Punie, Y., & Wright, D. (2007). Privacy, identity and security in ambient intelligence: a scenario analysis. Telematics and Informatics, 24, 15-29.

Hagras, H., Callaghan, V., Colley, M., Clarke, G., Pounds-Cornish, A., & Duman, H. (2004). Creating an ambient intelligence environment using embedded agents. IEEE Intelligent Systems, 19(6), 12-20.

Hagras, H., & Callaghan, V. (2005). An intelligent fuzzy agent approach for realizing ambient intelligence in intelligent inhabited environments. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 35(1), 55-65.

Hax, A., & Wilde, D., II. (2001). The Delta Model - discovering new sources of profitability in a networked economy. European Management Journal, 9, 379-391.

Kakas, A., Mancarella, P., Sadri, F., Stathis, K., & Toni, F. (2004). The KGP model of agency. In Proceedings of the European Conference on Artificial Intelligence, 33-37.

Leibe, B., Starner, T., Ribarsky, W., Wartell, Z., Krum, D., Singletary, B., & Hodges, L. (2000). The Perceptive Workbench: towards spontaneous and natural interaction in semi-immersive virtual environments. IEEE Virtual Reality Conference (VR2000), 13-20.

Li, X., & Ji, Q. (2005). Active affective state detection and user assistance with dynamic Bayesian networks. IEEE Transactions on Systems, Man, and Cybernetics: Special Issue on Ambient Intelligence, 35(1), 93-105.

Massaro, D. W., & Friedman, D. (1990). Models of integration given multiple sources of information. Psychological Review, 97, 225-252.

Pitt, J. (2005). The open agent society as a platform for the user-friendly information society. AI & Society, 19(2), 123-158.

Sadri, F. (2007). Ambient intelligence for care of the elderly in their homes. In Proceedings of the 2nd Workshop on Artificial Intelligence Techniques for Ambient Intelligence (AITAmI 07), 62-67.

Sadri, F., Stathis, K., & Toni, F. (2006). Normative KGP agents. Journal of Computational and Mathematical Organization Theory, 12(2-3), 101-126.

Sadri, F., Toni, F., & Torroni, P. (2003). Minimally intrusive negotiating agents for resource sharing. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI 03), 796-801.

Stathis, K., de Bruijn, O., Spence, R., & Purcell, P. (2006). Ambient intelligence: human-agent interactions in a networked community. In Purcell, P. (ed.), Networked Neighbourhoods: The Connected Community in Context, Springer, 279-304.

Stathis, K., Kafetzoglou, S., Papavasiliou, S., & Bromuri, S. (2007). Sensor network grids: agent environments combined with QoS in wireless sensor networks. In Proceedings of the 3rd International Conference on Autonomic and Autonomous Systems, IEEE, 47-52.

Stathis, K., & Toni, F. (2004). Ambient intelligence using KGP agents. Workshop at the Second European Symposium on Ambient Intelligence, Lecture Notes in Computer Science 3295, 351-362.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: an introduction. MIT Press.

The Aware Home Initiative (2007), http://www.cc.gatech.edu/fce/house/house.html.

The Oxygen Project (2007), http://www.oxygen.lcs.mit.edu.

The Sony Interaction Lab (2007), http://www.sonycsl.co.jp/IL/index.html.

TERMS AND DEFINITIONS

Artificial Societies: Complex systems consisting of a, possibly large, set of agents whose interactions are constrained by norms and by the roles the agents are responsible for playing.

Cognitive Agents: Software agents endowed with high-level mental attitudes, such as beliefs, goals and plans.

Context Awareness: Refers to the idea that computers can both sense and react according to the state of the environment in which they are situated. Devices may have information about the circumstances under which they are able to operate and react accordingly.

Natural Interaction: The investigation of the relationships between humans and machines, aiming to create interactive artifacts that respect and exploit the natural dynamics through which people communicate and discover the real world.

Smart Homes: Homes equipped with intelligent sensors and devices within a communications infrastructure that allows the various systems and devices to communicate with each other for monitoring and maintenance purposes.
Ambient Intelligence
INTRODUCTION

The trend towards hardware cost reduction and miniaturization allows the inclusion of computing devices in many objects and environments (embedded systems). Ambient Intelligence (AmI) deals with a new world where computing devices are spread everywhere (ubiquity), allowing the human being to interact with physical world environments in an intelligent and unobtrusive way. These environments should be aware of the needs of people, customizing requirements and forecasting behaviours.

AmI environments may be as diverse as homes, offices, meeting rooms, schools, hospitals, control centres, transports, touristic attractions, stores, sport installations, and music devices.

Ambient Intelligence involves many different disciplines, like automation (sensors, control, and actuators), human-machine interaction and computer graphics, communication, ubiquitous computing, embedded systems, and, obviously, Artificial Intelligence. Within Artificial Intelligence, research aims to include more intelligence in AmI environments, allowing better support for the human being and access to the essential knowledge needed to make better decisions when interacting with these environments.

BACKGROUND

Ambient Intelligence (AmI) is a concept developed by the European Commission's IST Advisory Group ISTAG (ISTAG, 2001)(ISTAG, 2002). ISTAG believes that it is necessary to take a holistic view of Ambient Intelligence, considering not just the technology, but the whole of the innovation supply-chain from science to end-user, and also the various features of the academic, industrial and administrative environment that facilitate or hinder realisation of the AmI vision (ISTAG, 2003). Due to the great number of technologies involved in the Ambient Intelligence concept, we may find several works pointing in the direction of Ambient Intelligence trends that appeared even before the ISTAG vision.

Where Artificial Intelligence (AI) is concerned, Ambient Intelligence is a new meaningful step in the evolution of AI (Ramos, 2007). AI has walked closely side-by-side with the evolution of Computer Science and Engineering. The building of the first artificial neural models and hardware, with the work of Walter Pitts and Warren McCulloch (Pitts & McCulloch, 1943) and the SNARC system of Marvin Minsky and Dean Edmonds, corresponds to the first step. Computer-based Intelligent Systems, like the MYCIN Expert System (Shortliffe, 1976), or network-based Intelligent Systems, like the AUTHORIZER'S ASSISTANT (Rothi, 1990) used by American Express for authorizing transactions by consulting several databases, are the kind of systems of the second step of AI. From the 80s, Intelligent Agents and Multi-Agent Systems have established the third step, leading more recently to Ontologies and the Semantic Web. From hardware to the computer, from the computer to the local network, from the local network to the Internet, and from the Internet to the Web, Artificial Intelligence was at the state of the art of computing, most of the time a little ahead of the technology limits.

Now the centre is no longer the hardware, the computer, or even the network. Intelligence must be provided to the environments we use daily. We are aware of the push in the direction of Intelligent Homes, Intelligent Vehicles, Intelligent Transportation Systems, Intelligent Manufacturing Systems, even Intelligent Cities. This is the reason why the Ambient Intelligence concept is so important nowadays (Ramos, 2007).

Ambient Intelligence is not possible without Artificial Intelligence. On the other hand, AI researchers must be aware of the need to integrate their techniques with those of other scientific communities (e.g. Automation, Computer Graphics, Communications). Ambient Intelligence is a tremendous challenge, needing the best effort of different scientific communities.
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Ambient Intelligence Environments
There is a variety of concepts and technologies related with Ambient Intelligence. Ubiquitous Computing, Pervasive Computing, Embedded Systems, and Context Awareness are the most common. However, these concepts are different from Ambient Intelligence.

The concept of Ubiquitous Computing (UbiComp) was introduced by Mark Weiser during his tenure as Chief Technologist of the Palo Alto Research Center (PARC) (Weiser, 1991). Ubiquitous Computing means that we have access to computing devices anywhere in an integrated and coherent way. Ubiquitous Computing was mainly driven by the Communications and Computing devices scientific communities, but is now involving other research areas. Ambient Intelligence differs from Ubiquitous Computing because sometimes the environment where Ambient Intelligence is considered is simply local. Another difference is that Ambient Intelligence puts more emphasis on intelligence than Ubiquitous Computing. However, ubiquity is a real need today and Ambient Intelligence systems are considering this feature.

A concept that is sometimes seen as a synonym of Ubiquitous Computing is Pervasive Computing. According to Teresa Dillon, Ubiquitous Computing is best considered as the underlying framework, the embedded systems, networks and displays which are invisible and everywhere, allowing us to plug-and-play devices and tools. Pervasive Computing, on the other hand, is related with all the physical parts of our lives: mobile phone, hand-held computer or smart jacket (Dillon, 2006).

Embedded Systems mean that electronic and computing devices are embedded in everyday objects or goods. Today goods like cars are equipped with microprocessors; the same is true for washing machines, refrigerators, and toys. The Embedded Systems community is more driven by the electronics and automation scientific communities. Current efforts go in the direction of including electronic and computing devices in the most usual and simple objects we use, like furniture or mirrors. Ambient Intelligence differs from Embedded Systems since computing devices may be clearly visible in AmI scenarios. However, there is a clear trend to involve more embedded systems in Ambient Intelligence.

Context Awareness means that the system is aware of the current situation we are dealing with. An example is the automatic detection of the current situation in a Control Centre. Are we in the presence of a normal situation, or are we dealing with a critical situation, or even an emergency? In this Control Centre the intelligent alarm processor will exhibit different outputs according to the identified situation (Vale, Moura, Fernandes, Marques, Rosado, Ramos, 1997). The automobile industry is also investing in Context Aware systems, like near-accident detection. The Human-Computer Interaction scientific community is paying a lot of attention to Context Awareness. Context Awareness is one of the most desired concepts to include in Ambient Intelligence; identification of the context is important for deciding how to act in an intelligent way.

There are different views of the importance of other concepts and technologies in the Ambient Intelligence field. Usually these differences derive from the basic scientific community of the authors. ISTAG sees the technology research requirements from different points of view (Components, Integration, System, and User/Person). In (ISTAG, 2003) the following ambient components are mentioned: smart materials; MEMS and sensor technologies; embedded systems; ubiquitous communications; I/O device technology; and adaptive software. In the same document ISTAG refers to the following intelligence components: media management and handling; natural interaction; computational intelligence; context awareness; and emotional computing.

Recently, Ambient Intelligence has been receiving significant attention from the Artificial Intelligence community. We may refer to the Ambient Intelligence Workshops organized by Juan Augusto and Daniel Shapiro at ECAI 2006 (European Conference on Artificial Intelligence) and IJCAI 2007 (International Joint Conference on Artificial Intelligence), and the Special Issue on Ambient Intelligence, coordinated by Carlos Ramos, Juan Augusto and Daniel Shapiro, to appear in the March/April 2008 issue of the IEEE Intelligent Systems magazine.

AMBIENT INTELLIGENCE PROTOTYPES AND SYSTEMS

Here we will analyse some examples of Ambient Intelligence prototypes and systems, divided by area of application.
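The context-awareness idea behind the intelligent alarm processor described above can be sketched in a few lines. This is only an illustration: the situation names, input signals and thresholds below are invented for the example and are not taken from SPARSE or any cited system.

```python
# Toy context-aware alarm processor: infer the operating context from simple
# observable signals, then change the output according to that context.

def classify_situation(alarms_per_minute, breakers_tripped):
    """Infer the current situation (normal, critical or emergency)."""
    if breakers_tripped > 5 or alarms_per_minute > 100:
        return "emergency"
    if breakers_tripped > 0 or alarms_per_minute > 20:
        return "critical"
    return "normal"

def process_alarms(alarms, situation):
    """Exhibit different outputs according to the identified situation."""
    if situation == "normal":
        return alarms                                         # show everything
    if situation == "critical":
        return [a for a in alarms if a["priority"] <= 2]      # hide low priority
    return [a for a in alarms if a["priority"] == 1]          # emergency: top only

alarms = [{"id": 1, "priority": 1}, {"id": 2, "priority": 2}, {"id": 3, "priority": 3}]
situation = classify_situation(alarms_per_minute=150, breakers_tripped=7)
visible = process_alarms(alarms, situation)   # emergency -> only priority-1 alarms
```

The point of the sketch is that the same alarm stream produces different outputs depending on the identified context, which is the essence of context awareness.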
Since the first experiences with NAVLAB 1 (Thorpe, Herbert, Kanade, Shafer, 1988), Carnegie Mellon University has developed several prototypes for Autonomous Vehicle Driving and Assistance. The most recent, NAVLAB 11, is an autonomous Jeep. Most car industry companies are doing research in the area of Intelligent Vehicles for tasks like car parking assistance or pre-collision detection.

Another example of AmI application is related with Transports, namely in connection with Intelligent Transportation Systems (ITS). The ITS Joint Program of the US Department of Transportation identified several areas of application, namely: arterial management; freeway management; transit management; incident management; emergency management; electronic payment; traveller information; information management; crash prevention and safety; roadway operations and management; road weather management; commercial vehicle operations; and intermodal freight. In all these application areas Ambient Intelligence can be used.

Several studies point to the aging of the population during the next decades. While being a good result of increasing life expectancy, this also implies some …

AmI in Tourism and Cultural Heritage

Tourism and Cultural Heritage are good application areas for Ambient Intelligence. Tourism is a growing industry. In the past, tourists were satisfied with pre-defined tours, equal for all people. However, there is a trend towards customization, and the same tour can be conceived to adapt to tourists according to their preferences.

The immersive tour post is an example of such an experience (Park, Nam, Shi, Golub, Van Loan, 2006). MEGA is a user-friendly virtual guide to assist visitors in the Parco Archeologico della Valle dei Templi, an archaeological area with ancient Greek temples in Agrigento, located in Sicily, Italy (Pilato, Augello, Santangelo, Gentile, Gaglio, 2006). DALICA has been used for constructing and updating the user profile of visitors of Villa Adriana in Tivoli, near Rome, Italy (Constantini, Inverardi, Mostarda, Tocchio, Tsintza, 2007).

The human being spends considerable time in working places like offices, meeting rooms, manufacturing plants, and control centres.
SPARSE is a project initially created for helping Power Systems Control Centre Operators in the diagnosis and restoration of incidents (Vale, Moura, Fernandes, Marques, Rosado, Ramos, 1997). It is a good example of context awareness, since the developed system is aware of the on-going situation, acting in different ways according to the normal or critical situation of the power system. This system is evolving towards an Ambient Intelligence framework applied to Control Centres.

Decision Making is one of the most important activities of the human being. Nowadays decisions require considering many different points of view, so decisions are commonly taken by formal or informal groups of persons. Groups exchange ideas or engage in a process of argumentation and counter-argumentation, negotiate, cooperate, collaborate or even discuss techniques and/or methodologies for problem solving. Group Decision Making is a social activity in which the discussion and results consider a combination of rational and emotional aspects. ArgEmotionAgents is a project on the application of Ambient Intelligence to group argumentation and decision support considering emotional aspects, running in the Laboratory of Ambient Intelligence for Decision Support (LAID), seen in Figure 1 (Marreiros, Santos, Ramos, Neves, Novais, Machado, Bulas-Cruz, 2007), a kind of Intelligent Decision Room. This work also has a part involving ubiquity support.

AmI in Sports

Sports involve high-level athletes and many more practitioners. Many sports are practised without the help of any associated devices, opening here a clear opportunity for Ambient Intelligence to create sports assistance devices and environments.

FlyMaster NAV+ is a free-flight on-board pilot assistant (e.g. gliding, paragliding), using the FlyMaster F1 module with access to GPS and sensorial information. FlyMaster Avionics S.A., a spin-off, was created to commercialize these products (see Figure 2).

AMBIENT INTELLIGENCE PLATFORMS

Some companies and academic institutions are investing in the creation of Ambient Intelligence generation platforms.

The Endeavour project is developed by the University of California, Berkeley (http://endeavour.cs.berkeley.edu/). The project aims to specify, design, and implement prototypes at a planet scale, self-organized and involving an adaptive Information Utility.

Oxygen enables pervasive human-centred computing through a combination of specific user and system technologies (http://www.oxygen.lcs.mit.edu/). This project provides speech and vision technologies enabling us to communicate with Oxygen as if we were interacting with another person, saving much time and effort (Rudolph, 2001).

The Portolano project was developed at the University of Washington and seeks to create a testbed for research into the emerging field of invisible computing (http://portolano.cs.washington.edu/). Invisible computing is possible with devices so highly optimized to particular tasks that they blend into the world and require little technical knowledge from the users (Esler, Hightower, Anderson, Borrielo, 1999).

The EasyLiving project of the Microsoft Research Vision Group corresponds to a prototype architecture and associated technologies for building intelligent environments (Brumitt, Meyers, Krumm, Kern, Shafer,
2000). The EasyLiving goal is to facilitate the interaction of people with other people, with computers, and with devices (http://research.microsoft.com/easyliving/).

FUTURE TRENDS

Ambient Intelligence deals with a futuristic notion for our lives. Most of the practical experiences concerning Ambient Intelligence are still in a very incipient phase, due to the recent emergence of this concept. Today, the separation between the computer and the environment is not clear. However, for new generations things will be more transparent, and environments with Ambient Intelligence will be more widely accepted.

In the area of transport, AmI will cover several aspects. The first will be related with the vehicle itself. Several capabilities are starting to be available, like the automatic identification of the situation (e.g. pre-collision identification, identification of the driver's condition). Other aspects will be related with traffic information. Today, GPS devices are widespread, but they deal with static information. Adding on-line traffic conditions will enable the driver to avoid roads with accidents. Technology is taking good steps in the direction of automatic vehicle driving. But in the near future the developed systems will be seen more as driver assistants than as autonomous driving systems.

Another area where AmI will experience a strong development will be Health Care, especially Elderly Care. Patients will receive this support to allow a more autonomous life in their homes. Automatic acquisition of vital signals (e.g. blood pressure, temperature) will allow automatic emergency calls to be made when the patient's health is in significant trouble. The monitoring of the person will also be done in his/her home, trying to detect deviations from expected situations and habits.

Home support will assist normal personal and family life. Intelligent Homes will be a reality. Home residents will pay less attention to routine home management aspects, for example, how many bottles of red wine are available for the week's meals, or whether the specific ingredients for a cake are all available.

AmI for job support is also expected. Decision Support Systems will be oriented to on-the-job environments. This will be clear in offices, meeting rooms, call centres, control centres, and plants.

CONCLUSION

This article presents the state of the art concerning the Ambient Intelligence field. After the history of the concept, we established definitions of some related concepts and illustrated them with examples. There is a long way to go in order to achieve the Ambient Intelligence vision; however, in the future, this concept will be referred to as one of the landmarks in the development of Artificial Intelligence.
Analytics for Noisy Unstructured Text Data I
L. Venkata Subramaniam
IBM Research, India Research Lab, India
transcripts etc. Agents are expected to summarize an interaction as soon as they are done with it and before picking up the next one. As the agents work under immense time pressure, the summary logs are very poorly written and sometimes even difficult for humans to interpret. Analysis of such call logs is important for identifying problem areas, agent performance, evolving problems, etc.

In this chapter we will focus on automatic classification of noisy text. Automatic text classification refers to segregating documents into different topics depending on content. For example, categorizing customer emails according to topics such as billing problem, address change, or product enquiry. It has important applications in the fields of email categorization, building and maintaining web directories (e.g. DMoz), spam filtering, automatic call and email routing in contact centers, pornographic material filtering, and so on.

After feature extraction from documents, each document is converted into a document vector. Documents are represented in a vector space; each dimension of this space represents a single feature, and the importance of that feature in the document gives its distance from the origin along that dimension. The simplest representation of document vectors uses the binary event model, where if a feature j ∈ V appears in document di, then the jth component of di is 1, otherwise it is 0. One of the most popular statistical classification techniques is naive Bayes (McCallum, 1998). In the naive Bayes technique the probability of a document d belonging to class c is computed as:

Pr(c | d) = Pr(c, d) / Pr(d) = Pr(c) Pr(d | c) / Pr(d) ∝ Pr(c) Pr(d | c)
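The binary event model and the naive Bayes rule above can be sketched in a few lines of Python. The toy vocabulary, training documents and the Laplace-smoothing choice below are illustrative assumptions for the example, not part of any cited system.

```python
from collections import defaultdict
import math

# Toy labelled corpus (invented for illustration).
train = [("exciting offer get a free laptop", "spam"),
         ("free offer win laptop", "spam"),
         ("meeting agenda for the project", "ham"),
         ("project budget meeting", "ham")]

vocab = sorted({w for text, _ in train for w in text.split()})

def to_binary_vector(text):
    """Binary event model: component j is 1 iff feature j appears in the document."""
    words = set(text.split())
    return [1 if w in words else 0 for w in vocab]

# Estimate Pr(c) and per-class feature probabilities from the training set.
class_count = defaultdict(int)
word_count = defaultdict(lambda: defaultdict(int))
for text, c in train:
    class_count[c] += 1
    for w in set(text.split()):
        word_count[c][w] += 1

def predict(text):
    """Return argmax_c Pr(c) * Pr(d | c), computed in log space."""
    words = set(text.split())
    best_c, best_score = None, float("-inf")
    for c, n_c in class_count.items():
        score = math.log(n_c / len(train))          # log Pr(c)
        for w in vocab:
            p = (word_count[c][w] + 1) / (n_c + 2)  # Laplace smoothing
            score += math.log(p if w in words else 1 - p)
        if score > best_score:
            best_c, best_score = c, score
    return best_c

predict("free laptop offer")   # classified as spam by the toy model
```

Because a misspelled feature such as "frree" is absent from the vocabulary, it contributes nothing to the score, which is precisely how noise distorts the feature space as discussed in this chapter.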
and negative documents. The rule is then added to the rule set. The remaining documents are used to build other rules in the next iteration.

In both statistical as well as rule based text classification techniques, the content of the text is the sole determiner of the category to be assigned. However, noise in the text distorts the content, and hence readers can expect the categorization performance to be affected by noise in the text. Classifiers are essentially trained to identify correlations between extracted features (words) and different categories, which can later be utilized to categorize new documents. For example, words like "exciting offer get a free laptop" might have a stronger correlation with the category spam emails than with non-spam emails. Noise in text distorts this feature space: "excitinng ofer get frree lap top" will be a new set of features and the categorizer will not be able to relate it to the spam emails category. The feature space explodes as the same feature can appear in different forms due to spelling errors, poor recognition, wrong transcription, etc. In the remaining part of this section we will give an overview of how people have approached the problem of categorizing noisy text.

Categorization of OCRed Documents

Electronically recognized handwritten documents and documents generated from the OCR process are typical examples of noisy text because of the errors introduced by the recognition process. Vinciarelli (Vinciarelli, 2004) has studied the characteristics of noise present in such data and its effects on categorization accuracy. A subset of documents from the Reuters-21578 text classification dataset was taken and noise was introduced using two methods: first, a subset of documents was manually written and recognized using an offline handwriting recognition system; in the second, the OCR-based extraction process was simulated by randomly changing a certain percentage of characters. According to them, for recall values up to 60-70 percent depending on the sources, the categorization system is robust to noise even when the Term Error Rate is higher than 40 percent. It was also observed that the results on the handwritten data appeared to be lower than those obtained from the OCR simulations. Generic systems for text categorization based on statistical analysis of representative text corpora have been proposed (Bayer et. al., 1998). Features are extracted from training texts by selecting substrings from actual word forms and applying statistical information and general linguistic knowledge, followed by dimensionality reduction by linear transformation. The actual categorization system is based on a minimum least-squares approach. The system is evaluated on the tasks of categorizing abstracts of paper-based German technical reports and business letters concerning complaints. Approximately 80% classification accuracy is obtained and it is seen that the system is very robust against recognition or typing errors.

Issues with categorizing OCRed documents are also discussed by many other authors (Brooks & Teahan, 2007), (Hoch, 1994) and (Taghva et. al., 2001).

Categorization of ASRed Documents

Automatic Speech Recognition (ASR) is the process of converting an acoustic signal to a sequence of words. Researchers have proposed different techniques for speech recognition tasks based on Hidden Markov Models (HMM), neural networks, and Dynamic Time Warping (DTW) (Trentin & Gori, 2001). The performance of an ASR system is typically measured in terms of Word Error Rate (WER), which is derived from the Levenshtein distance, working at the word level instead of the character level. WER can be computed as

WER = (S + D + I) / N

where S is the number of substitutions, D is the number of deletions, I is the number of insertions, and N is the number of words in the reference. Bahl et. al. (Bahl et. al., 1995) have built an ASR system and demonstrated its capability on benchmark datasets.

ASR systems give rise to word substitutions, deletions and insertions, while OCR systems produce essentially word substitutions. Moreover, ASR systems are constrained by a lexicon and can give as output only words belonging to it, while OCR systems can work without a lexicon (this corresponds to the possibility of transcribing any character string) and can output sequences of symbols not necessarily corresponding to actual words. Such differences are expected to have a strong influence on the performance of systems designed for categorizing ASRed documents in comparison to categorization of OCRed documents. A lot of work on automatic call type classification for the purpose of
categorizing calls (Tang et al., 2003), call routing (Kuo and Lee, 2003; Haffner et al., 2003), obtaining call log summaries (Douglas et al., 2005), and agent assisting and monitoring (Mishne et al., 2005) has appeared in the past. Here calls are classified based on the transcription from an ASR system. One interesting study of the effect of ASR noise on text classification was done on a subset of the benchmark text classification dataset Reuters-21578² (Agarwal et. al., 2007). They read out and automatically transcribed 200 documents and applied a text classifier trained on the clean Reuters-21578 training corpus³. Surprisingly, in spite of the high degree of noise, they did not observe much degradation in accuracy.

Effect of Spelling Errors on Categorization

Spelling errors are an integral part of written text, electronic as well as non-electronic. Every reader of this book must have been scolded by their teacher in school for spelling words wrongly! In this era of electronic text, people have become less careful while writing, resulting in poorly written text containing abbreviations, short forms, acronyms, and wrong spellings. Such electronic text documents, including email, chat logs, postings, and SMSs, are sometimes difficult to interpret even for human beings. It goes without saying that text analytics on such noisy data is a non-trivial task.

Wrong spellings can affect automatic classification performance in multiple ways depending on the nature of the classification technique being used. In the case of statistical techniques, spelling differences distort the feature space. If the training as well as the test data corpus is noisy, the classifier will, while learning the model, treat variants of the same words as different features. As a result, the observed joint probability distribution will be different from the actual distribution. If the proportion of wrongly spelt words is high, then the distortion can be significant and will hurt the accuracy of the resultant classifier. However, if the classifier is trained on a clean corpus and the test documents are noisy, then wrongly spelt words will be treated as unseen words and will not help in classification. In an unlikely situation, a wrongly spelt word present in a test document may become a different valid feature and, worse, may become a valid indicative feature of a different class. A standard technique in the text classification process is feature selection, which happens after feature extraction and before training. Feature selection typically employs statistical measures over the training corpus and ranks features in order of the amount of information (correlation) they have with respect to the class labels of the classification task at hand. After the feature set has been ranked, the top few features are retained (typically of the order of hundreds or a few thousand) and the others are discarded. Feature selection should be able to eliminate wrongly spelt words present in the training data provided (i) the proportion of wrongly spelt words is not very large and (ii) there is no regular pattern in the spelling errors⁴. However, it has been observed that even at a high degree of spelling errors the classification accuracy does not suffer much (Agarwal et al., 2007).

Rule based classification techniques are also negatively affected by spelling errors. If the training data contains spelling errors, then some of the rules may not attain the required statistical significance. Due to spelling errors present in the test data, a valid rule may not fire and, worse, an invalid rule may fire, leading to a wrong categorization. Suppose RIPPER has learnt a rule set like:

Assign category sports IF
(the document contains "sports") OR
(the document contains "exercise" AND "outdoor") OR
(the document contains "exercise" but not "homework", "exam") OR
(the document contains "play" AND "rule") OR
…

A hypothetical test document containing repeated occurrences of exercise, but each time wrongly spelt as exarcise, will not be categorized into the sports category and will hence lead to misclassification.

CONCLUSION

In this chapter we have looked at noisy text analytics. This topic is gaining in importance as more and more noisy data gets generated and needs processing. In particular, we have looked at techniques for correcting noisy text and for doing classification. We have presented a survey of existing techniques in the area and have shown that even though it is a difficult problem, it is possible to address it with a combination of new and existing techniques.
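As a concrete companion to the noise measurements cited in this survey, the WER formula defined in the section on ASRed documents can be computed with a word-level Levenshtein distance. A minimal sketch (the sample sentences are invented):

```python
def wer(reference, hypothesis):
    """Word Error Rate: (S + D + I) / N, via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,    # substitution / match
                           dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1)          # insertion
    return dp[len(ref)][len(hyp)] / len(ref)

wer("the cat sat on the mat", "the cat sit on mat")   # one substitution, one deletion
```

Here the hypothesis has one substitution (sat → sit) and one deletion (the missing "the"), so WER = 2/6 for a six-word reference.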
ENDNOTES

1. According to http://www.mrc-cbu.cam.ac.uk/%7Emattd/Cmabrigde/, this is an internet hoax. However, we found it interesting and hence included it here.
2. http://www.daviddlewis.com/resources/testcollections/
3. This dataset is available from http://kdd.ics.uci.edu/databases/reuters_transcribed/reuters_transcribed.html
4. Note: this assumption may not hold true in the case of cognitive errors.
Analytics for Noisy Unstructured Text Data II
Shourya Roy
IBM Research, India Research Lab, India
applications such as semantic role labelling (Sang et. al., 2005).

There is also recent work on correcting the output of SMS text (Aw et. al., 2006) (Choudhury et. al., 2007), OCR errors (Nartker et. al., 2003) and ASR errors (Sarma & Palmer, 2004).

INFORMATION EXTRACTION FROM NOISY TEXT

The goal of Information Extraction (IE) is to automatically extract structured information from unstructured documents. The extracted structured information has to be contextually and semantically well-defined data from a given domain. A typical application of IE is to scan a set of documents written in natural language and populate a database with the information extracted. The MUC (Message Understanding Conference) conference was one effort at codifying the IE task and expanding it (Chinchor, 1998).

There are two basic approaches to the design of IE systems. One comprises the knowledge engineering approach, where a domain expert writes a set of rules to extract the sought-after information. Typically the process of building the system is iterative: a set of rules is written, the system is run, and the output is examined to see how the system is performing. The domain expert then modifies the rules to overcome any under- or over-generation in the output. The second is the automatic training approach. This approach is similar to classification, in that the texts are appropriately annotated with the information being extracted. For example, if we would like to build a city name extractor, then the training set would include documents with all the city names marked. An IE system would be trained on this annotated corpus to learn the patterns that would help in extracting the necessary entities.

An information extraction system typically consists of natural language processing steps such as morphological processing, lexical processing and syntactic analysis. These include stemming to reduce inflected forms of words to their stem, parts of speech tagging to assign labels such as noun, verb, etc. to each word, and parsing to determine the grammatical structure of sentences.

Named Entity Annotation of Web Posts

Extraction of named entities is a key IE task. It seeks to locate and classify atomic elements in the text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. Entity recognition systems use either rule based techniques or statistical models. Typically a parser or a parts of speech tagger identifies elements such as nouns, noun phrases, or pronouns. These elements, along with surface forms of the text, are used to define templates for extracting the named entities. For example, to tag company names it would be desirable to look at noun phrases that contain the words company or incorporated. These rules can be automatically learnt using a tagged corpus or can be defined manually. Most known approaches do this on clean, well formed text. However, named entity annotation of web posts such as online classifieds, product listings etc. is harder because these texts are not grammatical or well written. In such cases reference sets have been used to annotate parts of the posts (Michelson & Knoblock, 2005). The reference set is thought of as a relational set of data with a defined schema and consistent attribute values. Posts are then matched to their nearest records in the reference set. In the biological domain, gene name annotation, even though it is performed on well written scientific articles, can be thought of in the context of noise, because many gene names overlap with common English words or biomedical terms. There have been studies on the performance of the gene name annotator when trained on noisy data (Vlachos, 2006).

Information Extraction from OCRed Documents

Documents obtained from OCR may have not only unknown words and compound words, but also incorrect words due to OCR errors. In their work, Miller et. al. (Miller et. al., 2000) have measured the effect of OCR noise on IE performance. Many IE methods work directly on the document image to avoid errors resulting from converting to text. They adopt keyword matching by searching for string patterns and then use global document models consisting of keyword models
and their logical relationships to achieve robustness
in matching (Lu & Tan, 2004). The presence of OCR
errors has a detrimental effect on information access
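The template-based entity tagging described above can be sketched with a couple of hand-written surface patterns. The patterns, labels and sample sentence below are illustrative inventions, not rules from any of the cited systems:

```python
import re

# Hand-written surface templates, in the spirit of rule-based entity
# recognition: a run of capitalized words ending in a corporate marker
# is tagged as an organization; a capitalized word after a title as a person.
COMPANY = re.compile(r"\b((?:[A-Z][a-z]+\s)+(?:Company|Incorporated|Inc\.|Corp\.))")
PERSON = re.compile(r"\b(?:Mr\.|Ms\.|Dr\.)\s([A-Z][a-z]+)")

def tag_entities(text):
    """Return (entity, label) pairs found by the surface templates."""
    tags = [(m, "ORG") for m in COMPANY.findall(text)]
    tags += [(m, "PER") for m in PERSON.findall(text)]
    return tags

print(tag_entities("Dr. Smith joined Acme Widgets Incorporated last year."))
# → [('Acme Widgets Incorporated', 'ORG'), ('Smith', 'PER')]
```

On noisy text (SMS, OCR output), such rigid patterns degrade quickly, which is exactly why the reference-set and statistical approaches discussed above matter.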
Analytics for Noisy Unstructured Text Data II
R. Golding & Y. Schabes (1996). Combining Trigram-Based and Feature-Based Methods for Context-Sensitive Spelling Correction. Proceedings of the Thirty-Fourth Annual Meeting of the Association for Computational Linguistics (71-78).

K. Kukich (1992). Techniques for Automatically Correcting Words in Text. ACM Computing Surveys, 24(4) (377-439).

Y. Lu & C. L. Tan (2004). Information Retrieval in Document Image Databases. IEEE Transactions on Knowledge and Data Engineering, 16(11) (1398-1410).

L. Mangu & E. Brill (1997). Automatic Rule Acquisition for Spelling Correction. Proc. 14th International Conference on Machine Learning (187-194).

M. Michelson & C. A. Knoblock (2005). Semantic Annotation of Unstructured and Ungrammatical Text. In Proceedings of the International Joint Conference on Artificial Intelligence.

D. Miller, S. Boisen, R. Schwartz, R. Stone & R. Weischedel (2000). Named Entity Extraction from Noisy Input: Speech and OCR. Proceedings of the Sixth Conference on Applied Natural Language Processing.

T. Nartker, K. Taghva, R. Young, J. Borsack & A. Condit (2003). OCR Correction Based On Document Level Knowledge. In Proc. IS&T/SPIE 2003 Intl. Symp. on Electronic Imaging Science and Technology, volume 5010, Santa Clara, CA.

T. Nasukawa, D. Punjani, S. Roy, L. V. Subramaniam & H. Takeuchi (2007). Adding Sentence Boundaries to Conversational Speech Transcriptions Using Noisily Labeled Examples. In Proceedings of the IJCAI 2007 Workshop on Analytics for Noisy Unstructured Text Data (AND 2007), Hyderabad, India.

S. Roy & L. V. Subramaniam (2006). Automatic Generation of Domain Models for Call-Centers from Noisy Transcriptions. In Proceedings of the Joint Conference of the Association for Computational Linguistics and the International Committee on Computational Linguistics (ACL-COLING 2006), Sydney, Australia.

E. T. K. Sang, S. Canisius, A. van den Bosch & T. Bogers (2005). Applying Spelling Error Correction Techniques for Improving Semantic Role Labelling. In Proceedings of CoNLL.

Sarma & D. Palmer (2004). Context-based Speech Recognition Error Detection and Correction. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004.

K. Taghva, T. Nartker & J. Borsack (2004). Information Access in the Presence of OCR Errors. ACM Hardcopy Document Processing Workshop, Washington, DC, USA (1-8).

K. Taghva, T. Nartker, J. Borsack, S. Lumos, A. Condit & Young (2001). Evaluating Text Categorization in the Presence of OCR Errors. In Proceedings of IS&T SPIE 2001 International Symposium on Electronic Imaging Science and Technology (68-74).

E. M. Zamora, J. J. Pollock & A. Zamora (1983). The Use of Trigram Analysis for Spelling Error Detection. Information Processing and Management, 17, 305-316.

KEY TERMS

Automatic Speech Recognition: Machine recognition and conversion of spoken words into text.

Data Mining: The application of analytical methods and tools to data for the purpose of identifying patterns, relationships or obtaining systems that perform useful tasks such as classification, prediction, estimation, or affinity grouping.

Information Extraction: Automatic extraction of structured knowledge from unstructured documents.

Knowledge Extraction: Explicitation of the internal knowledge of a system or set of data in a way that is easily interpretable by the user.

Noisy Text: Text with any kind of difference in the surface form from the intended, correct or original text.

Optical Character Recognition: Translation of images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text.

Rule Induction: Process of learning, from cases or instances, if-then rule relationships that consist of an antecedent (if-part, defining the preconditions or
Alberto Curra
University of A Coruña, Spain

M. Gloria López
University of A Coruña, Spain

Virginia Mato
University of A Coruña, Spain

Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Angiographic Images Segmentation Techniques
Methods Based on Differential Geometry

The methods that are based on differential geometry treat images as hypersurfaces and extract their features using curvature and surface crests. The points of the hypersurface crest correspond to the central axis of the structure of a vessel. This method can be applied to bidimensional as well as tridimensional images; angiograms are bidimensional images and are therefore modelled as tridimensional hypersurfaces.

Examples of reconstructions can be found in Prinet et al. (Prinet, Mona & Rocchisani, 1995), who treat the images as parametric surfaces and extract their features by means of surfaces and crests.

Correspondence Filters Methods

The correspondence filter approach convolutes the image with multiple correspondence filters so as to extract the regions of interest. The filters are designed to detect different sizes and orientations.

Poli and Valli (Poli & Valli, 1997) apply this technique with an algorithm that details a series of multiorientation linear filters that are obtained as linear combinations of Gaussian kernels. These filters are sensitive to different vessel widths and orientations.

Mao et al. (Mao, Ruan, Bruno, Toumoulin, Collorec & Haigron, 1992) also use this type of filter in an algorithm based on visual perception models, which affirm that the relevant parts of objects in noisy images normally appear grouped.

Morphological Mathematical Methods

Mathematical morphology defines a series of operators that apply structural elements to the images so that their morphological features can be preserved and irrelevant elements eliminated. The main morphological operations are the following:

Dilatation: Expands objects, fills up empty spaces, and connects disjunct regions.
Erosion: Contracts objects, separates regions.
Closure: Dilatation + Erosion.
Opening: Erosion + Dilatation.
"Top hat" transformation: Extracts the structures with a linear shape.
"Watershed" transformation: Inundates the image that is taken as a topographic map, and extracts the parts that are not "flooded".

Eiho and Qian (Eiho & Qian, 1997) use a purely morphological approach to define an algorithm that consists of the following steps:

1. Application of the top hat operator to emphasize the vessels
2. Erosion to eliminate the areas that do not correspond to vessels
3. Extraction of the tree from a point provided by the user and on the basis of grey levels
4. Slimming down of the tree
5. Extraction of edges through watershed transformation

MODEL-BASED TECHNIQUES

These approaches use explicit vessel models to extract the vascular tree. They can be divided into four categories.
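The morphological operators defined above compose neatly. As an illustrative sketch (not the cited algorithm itself), here they are in one dimension on an invented intensity profile, where the top-hat transformation keeps a narrow bright "vessel" and suppresses the background:

```python
def erode(signal, size=3):
    """Grayscale erosion: each sample becomes the minimum of its window."""
    r = size // 2
    return [min(signal[max(0, i - r):i + r + 1]) for i in range(len(signal))]

def dilate(signal, size=3):
    """Grayscale dilatation: each sample becomes the maximum of its window."""
    r = size // 2
    return [max(signal[max(0, i - r):i + r + 1]) for i in range(len(signal))]

def opening(signal, size=3):
    """Opening = erosion followed by dilatation; removes thin bright peaks."""
    return dilate(erode(signal, size), size)

def top_hat(signal, size=3):
    """Top-hat = signal minus its opening; keeps only structures thinner
    than the structuring element (such as narrow bright vessels)."""
    return [s - o for s, o in zip(signal, opening(signal, size))]

# A flat background with one narrow bright 'vessel' at index 3:
profile = [1, 1, 1, 5, 1, 1, 1]
print(top_hat(profile))  # → [0, 0, 0, 4, 0, 0, 0]
```

A real angiographic pipeline would apply 2D structuring elements to the full frame; the 1D version only shows why the top-hat step emphasizes vessels before erosion and watershed extraction.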
A generalized cylinder (GC) is a solid whose central axis is a 3D curve. Each point of that axis has a limited and closed section that is perpendicular to it. A GC is therefore defined in space by a spatial curve or axis and a function that defines the section along that axis. The section is usually an ellipse. Technically, GCs should be included in the parametric methods section, but the work that has been done in this field is so extensive that it deserves its own category.

The construction of the coronary tree model requires one single view to build the 2D tree and estimate the sections. However, there is no information on the depth or the area of the sections, so a second projection will be required.

Contrary to the approaches based on pattern recognition, where local operators are applied to the entire image, techniques based on arterial follow-up apply local operators in an area that presumably belongs to a vessel, covering its length. From a given point of departure, the operators detect the central axis and, by analyzing the pixels that are orthogonal to the tracking direction, the vessel's edges. There are various methods to determine the central axis and the edges: some methods carry out a sequential tracking and incorporate connectivity information after a simple edge detection operation; other methods use this information to sequentially track the contours. There are also approaches based on the intensity of the crests, on fuzzy sets, or on the representation of graphs, where the purpose lies in finding the optimal path in the graph that represents the image.

Lu and Eiho (Lu & Eiho, 1993) have described a follow-up algorithm for the vascular edges in angiographies that considers the inclusion of branches and consists of three steps:

1. Edge detection
2. Branch search
3. Tracking of sequential contours

The user must provide the point of departure, the direction, and the search range. The edge points are evaluated with a differential smoothing operator in a line that is perpendicular to the direction of the vessel. This operator also serves to detect the branches.

TECHNIQUES BASED ON ARTIFICIAL INTELLIGENCE

Approaches based on Artificial Intelligence use high-level knowledge to guide the segmentation and delineation of vascular structures, and sometimes use different types of knowledge from various sources.

One possibility (Smets, Verbeeck, Suetens, & Oosterlinck, 1988) is to use rules that codify knowledge on the morphology of blood vessels; these rules are then used to formulate a hierarchy with which to create the model. This type of system does not offer good results in arterial bifurcations or in arteries with occlusions.

Another approach (Stansfield, 1986) consists in formulating a rule-based expert system to identify the arteries. During the first phase, the image is processed without making use of domain knowledge to extract segments of the vessels. It is only in the second phase that domain knowledge on cardiac anatomy and physiology is applied.

The latter approach is more robust than the former, but it presents the inconvenience of not combining all the segments into one vascular structure.

FUTURE TRENDS

It cannot be said that one technique has a more promising future than another, but the current tendency is to move away from the abovementioned classical segmentation algorithms towards 3D and even 4D reconstructions of the coronary tree.

Other lines of research focus on obtaining angiographic images by means of new acquisition technologies such as Magnetic Resonance, Computerized High Speed Tomography, or two-armed angiograph devices that achieve two simultaneous projections in
combination with the use of ultrasound intravascular devices. This type of acquisition simplifies the creation of tridimensional structures, either directly from the acquisition or after a simple processing of the bidimensional images.

REFERENCES

Brown, B. G., Bolson, E., Frimer, M., & Dodge, H. T. (1977). Quantitative coronary arteriography. Circulation, 55:329-337.

Brown, B. G., Bolson, E., Frimer, M., & Dodge, H. T. (1982). Arteriographic assessment of coronary atherosclerosis. Arteriosclerosis, 2:2-15.

Eiho, S., & Qian, Y. (1997). Detection of coronary artery tree using morphological operator. In Computers in Cardiology 1997, 525-528.

Gonzalez, R. C., & Woods, R. E. (1996). Digital Image Processing. Addison-Wesley Publishing Company, Inc., Reading, Massachusetts, USA.

Greenes, R. A., & Brinkley, K. F. (2001). Imaging Systems. In Medical Informatics: Computer Applications in Health Care and Biomedicine (485-538). Second Edition. Springer-Verlag, New York, USA.

Guo, D., & Richardson, P. (1998). Automatic vessel extraction from angiogram images. In Computers in Cardiology 1998, 441-444.

Jain, R. C., Kasturi, R., & Schunck, B. G. (1995). Machine Vision. McGraw-Hill.

Kirbas, C., & Quek, F. (2004). A review of vessel extraction techniques and algorithms. ACM Comput. Surv., 36(2), 81-121.

Klein, A. K., Lee, F., & Amini, A. A. (1997). Quantitative coronary angiography with deformable spline models. IEEE Transactions on Medical Imaging, 16(5):468-482.

Lu, S., & Eiho, S. (1993). Automatic detection of the coronary arterial contours with sub-branches from an x-ray angiogram. In Computers in Cardiology 1993. Proceedings, 575-578.

Malladi, R., Sethian, J. A., & Vemuri, B. C. (1995). Shape modeling with front propagation: a level set approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17:158-175.

Mao, F., Ruan, S., Bruno, A., Toumoulin, C., Collorec, R., & Haigron, P. (1992). Extraction of structural features in digital subtraction angiography. Biomedical Engineering Days, 1992. Proceedings of the 1992 International, 166-169.

McInerney, T., & Terzopoulos, D. (1997). Medical image segmentation using topologically adaptable surfaces. In CVRMed-MRCAS '97: Proceedings of the First Joint Conference on Computer Vision, Virtual Reality and Robotics in Medicine and Medical Robotics and Computer-Assisted Surgery, 23-32, London, UK, Springer-Verlag.

Nyström, I., Sanniti di Baja, G., & Svensson, S. (2001). Representing volumetric vascular structures using curve skeletons. In Edoardo Ardizzone and Vito Di Gesù, editors, Proceedings of 11th International Conference on Image Analysis and Processing (ICIAP 2001), 495-500, Palermo, Italy, IEEE Computer Society.

O'Brien, J. F., & Ezquerra, N. F. (1994). Automated segmentation of coronary vessels in angiographic image sequences utilizing temporal, spatial and structural constraints. Technical report, Georgia Institute of Technology.

Pappas, T. N., & Lim, J. S. (1984). Estimation of coronary artery boundaries in angiograms. Appl. Digital Image Processing VII, 504:312-321.

Pellot, C., Herment, A., Sigelle, M., Horain, P., Maître, H., & Peronneau, P. (1994). A 3D reconstruction of vascular structures from two x-ray angiograms using an adapted simulated annealing algorithm. IEEE Transactions on Medical Imaging, 13:48-60.

Petrocelli, R. R., Manbeck, K. M., & Elion, J. L. (1993). Three dimensional structure recognition in digital angiograms using Gauss-Markov methods. In Computers in Cardiology 1993. Proceedings, 101-104.

Poli, R., & Valli, G. (1997). An algorithm for real-time vessel enhancement and detection. Computer Methods and Programs in Biomedicine, 52:1-22.

Prinet, V., Mona, O., & Rocchisani, J. M. (1995). Multi-dimensional vessels extraction using crest lines. In Engineering in Medicine and Biology Society, 1995. IEEE 17th Annual Conference, 1:393-394.
Quek, F. H. K., & Kirbas, C. (2001). Simulated wave propagation and traceback in vascular extraction. In Medical Imaging and Augmented Reality, 2001. Proceedings. International Workshop, 229-234.

Shmueli, K., Brody, W. R., & Macovski, A. (1983). Estimation of blood vessel boundaries in x-ray images. Opt. Eng., 22:110-116.

Smets, C., Verbeeck, G., Suetens, P., & Oosterlinck, A. (1988). A knowledge-based system for the delineation of blood vessels on subtraction angiograms. Pattern Recogn. Lett., 8(2):113-121.

Stansfield, S. A. (1986). ANGY: A rule-based expert system for automatic segmentation of coronary vessels from digital subtracted angiograms. PAMI, 8(3):188-199.

KEY TERMS

Angiography: Image of blood vessels obtained by any possible procedure.

Artery: Each of the vessels that take the blood from the heart to the other body parts.

Computerized Tomography: Exploration of X-rays that produces detailed images of axial cuts of the body. A CT obtains many images by rotating around the body. A computer combines all these images into a final image that represents the body cut like a slice.

Expert System: Computer or computer program that can give responses that are similar to those of an expert.

Segmentation: In computer vision, segmentation refers to the process of partitioning a digital image into multiple regions. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (structures) in images; in this case, the coronary tree in digital angiography frames.

Stenosis: A stenosis is an abnormal narrowing in a blood vessel or other tubular organ or structure. A coronary artery that is constricted or narrowed is called stenosed. Buildup of fat, cholesterol and other substances over time may clog the artery. Many heart attacks are caused by a complete blockage of a vessel in the heart, called a coronary artery.

Thresholding: A technique for the processing of digital images that consists in applying a certain property or operation to those pixels whose intensity value exceeds a defined threshold.
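The thresholding operation just defined amounts to a one-line test per pixel. A minimal sketch, on a toy intensity grid invented for illustration:

```python
def threshold(image, t):
    """Binary segmentation: mark pixels whose intensity exceeds t."""
    return [[1 if pixel > t else 0 for pixel in row] for row in image]

# Toy 3x3 intensity grid with a bright central column; a real angiogram
# would be a full grayscale frame with a contrast-filled vessel.
frame = [[10, 200, 12],
         [15, 210, 14],
         [11, 205, 13]]
print(threshold(frame, 128))  # → [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

In practice the threshold t is chosen from the image histogram; a fixed global value rarely suffices on real angiographic frames.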
M. Isabel Martínez
University of A Coruña, Spain

Manuel F. Herrador
University of A Coruña, Spain

Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
ANN Application in the Field of Structural Concrete
information moves in one direction only (from input to output), and networks with partial or total feedback, where information can flow in any direction.

Finally, learning rules and training type must be defined. Learning rules are divided into supervised and non-supervised (Brown & Harris, 1994) (Lin & Lee, 1996), and within the latter, self-organizing learning and reinforcement learning (Hoskins & Himmelblau, 1992). The type of training will be determined by the type of learning chosen.

An Introduction to Concrete (Material and Structure)

Structural concrete is a construction material created from the mixture of cement, water, aggregates and additions or admixtures with diverse functions. The goal is to create a material with rock-like appearance, with sufficient compressive strength and the ability to adopt adequate structural shapes. Concrete is moldable during its preparation phase: once mixed together, the components produce a fluid mass which conveniently occupies the cavities in a mould named formwork. After a few hours, concrete hardens thanks to the chemical hydration reaction undergone by cement, generating a paste which envelops the aggregates and gives the ensemble the appearance of an artificial rock somewhat similar to a conglomerate.

Hardened concrete offers good compressive strength, but very low tensile strength. This is why structures created with this material must be reinforced by use of steel rebars, configured as rods which are placed (before pouring the concrete) along the lines where calculation predicts the highest tensile stresses. Cracking, which reduces the durability of the structure, is thus hindered, and sufficient resistance is guaranteed with a very low probability of failure. The entirety formed by concrete and rebar is referred to as Structural Concrete (Shah, 1993).

Two phases thus characterize the evolution of concrete in time. In the first phase, concrete must be fluid enough to ensure ease of placement, and a time to initial set long enough to allow transportation from plant to worksite. Flowability depends basically on the type and quantity of the ingredients in the mixture. Special chemical admixtures (such as plasticizers and superplasticizers) guarantee flowability without grossly increasing the amount of water, whose ratio relative to the amount of cement (the water/cement ratio, w/c) is in reverse proportion to the strength attained. The science of rheology deals with the study of the behavior of fresh concrete. A variety of tests can be used to determine flowability of fresh concrete, the most popular amongst them being the Abrams cone (Abrams, 1922) or slump cone test (Domone, 1998).

The second phase (and the longest over time) is the hardened phase of concrete, which determines the behavior of the structure it gives shape to, from the point of view of serviceability (by imposing limitations on cracking and compliance) and resistance to failure (by imposing limitations on the minimal loads that can be resisted, as compared to the internal forces produced by external loading), always within the frame of sufficient durability for the service life foreseen.

The study of structural concrete has been undertaken from many different viewpoints. The experimental path has been very productive, generating over the past 50 years a database (with a tendency to scatter) which has been used to sanction studies carried out along the second and third paths that follow. The analytical path also constitutes a fundamental tool to approach concrete behavior, both from the material and the structural point of view. Development of theoretical behavior models goes back to the early 20th century, and the theoretical equations developed since have been corrected through testing (as mentioned above) before becoming part of codes and specifications. This method of analysis has been reinforced by the development of numerical methods and computational systems capable of solving a great number of simultaneous equations. In particular, the Finite Element Method (and other methods in the same family) and optimization techniques have brought a remarkable capacity to approximate the behavior of structural concrete, having their results benchmarked in many applications by the aforementioned experimental testing.

Three basic lines of study are thus available. Being complementary, they have played a decisive role in the production of national and international codes and rules which guide or legislate the design, execution and maintenance of structural concrete works. Concrete is a complex material, which presents a number of problems for analytical study, and so is an adequate field for the development of analysis techniques based on neural networks (González, Martínez and Carro, 2006).
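As a minimal illustration of a supervised learning rule of the kind distinguished above, here is a single linear neuron trained with the delta rule; the data, learning rate and epoch count are invented for the example, not taken from the cited works:

```python
def train_neuron(samples, targets, lr=0.1, epochs=200):
    """Supervised training of one linear neuron with the delta rule:
    after each sample, weights move against the gradient of the
    squared error between target and output."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            y = sum(wi * xi for wi, xi in zip(w, x)) + b  # neuron output
            err = t - y                                    # supervision signal
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Learn y = 2*x + 1 from four labelled samples (supervised learning):
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_neuron(xs, ys)
print(w, b)  # approaches w ≈ [2.0], b ≈ 1.0
```

Non-supervised rules differ precisely in lacking the target `t`: the weight update must then be driven by the data distribution itself (self-organization) or by a scalar reward (reinforcement).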
Application of Artificial Neural Networks to problems in the field of structural concrete has unfolded in the past few years in two ways. On one hand, analytical and structural optimization systems faster than traditional (usually iterative) methods have been generated starting with expressions and calculation rules. On the other, the numerous databases created from the large amount of tests published in the scientific community have allowed for the development of very powerful ANNs which have thrown light on various complex phenomena. In a few cases, specifically designed codes have been improved through the use of these techniques; some examples follow.

Application of Artificial Neural Networks to Optimization Problems

Design of concrete structures is based on the determination of two basic parameters: member thickness (effective depth d, the depth of a beam or slab section measured from the compression face to the centroid of reinforcement) and amount of reinforcement (established as the total area As of steel in a section, materialized as rebars, or the reinforcement ratio, the ratio between steel area and concrete area in the section). Calculation methods are iterative, since a large number of conditions must be verified in the structure, and the aforementioned parameters are fixed as a function of three basic conditions which are sequentially followed: structural safety, maximum ductility at failure and minimal cost. Design rules, expressed through equations, allow for a first solution which is corrected to meet all calculation scenarios, finally converging when the differences between input and output parameters are negligible.

In some cases it is possible to develop optimization algorithms whose analytical formulation opens the way to the generation of a database. Hadi (Hadi, 2003) has performed this work for simply supported reinforced concrete beams, and the expressions obtained after the optimization process determine the parameters specified above, while simultaneously assigning the cost associated to the optimal solution (related to the cost of materials and formwork). With these expressions, Hadi develops a database with the following variables: applied flexural moment (M), compressive strength of concrete (fc), steel strength (fy), section width (b), section depth (h), and unit costs of concrete (Cc), steel (Cs) and formwork (Cf).

Network parameters used are as follows. The number of training samples is 550; number of input layer neurons is 8; number of hidden layer neurons is 10; number of output layer neurons is 4; type of backpropagation is Levenberg-Marquardt backpropagation; activation function is the sigmoidal function; learning rate is 0.01; number of epochs is 3000; sum-square error achieved is 0.08. The network has been tested with 50 samples and yielded an average error of 6.1%.

Hadi studies various factors when choosing network architecture and backpropagation algorithm type. When two layers of hidden neurons are used, precision is not improved while computation time is increased. The number of samples depends on the complexity of the problem and the number of input and output parameters. If a value is fixed for the input costs, there are no noticeable precision improvements between training the network with 200 or 1000 samples. When costs are introduced as input parameters, 100 samples are not enough to achieve convergence in training. Finally, the training algorithm is also checked, studying the range between pure backpropagation (too slow for training), backpropagation with momentum and with adaptive learning, backpropagation with the Levenberg-Marquardt updating rule, and fast learning backpropagation. The latter is finally retained since it requires less time to get the network to converge while providing very good results (Demuth & Beale, 1995).

Application of Artificial Neural Networks to Prediction of Concrete Physical Parameters Measurable Through Testing: Concrete Strength and Consistency

Other neural network applications are supported by large experimental databases, created through years of research, which allow for the prediction of phenomena with complex analytical formulation.

One of these cases is the determination of two basic concrete parameters: its workability when mixed, necessary for ease of placement of concrete, and its compressive strength once hardened, which is basic to the evaluation of the capacity of the structure. The variables that necessarily determine these two parameters are the components of concrete: amounts of cement, water, fine aggregate (sand), coarse aggregate (small gravel and large gravel), and other components such as pozzolanic additions (which bring soundness
and delayed strength increase, especially in the case of Slump output (errors up to 25%). This is due to the
fly ash and silica fume) and admixtures (which fluidify fact that the relation between the chosen variables and A
the fresh mixture allowing the use of reduced amounts of strength is much stronger than in the case of slump,
water). There are still no analytical or numerical models which is influenced by other non-contemplated variables
that faithfully predict fresh concrete consistency (related (e. g. type and power of concrete mixer, mixing order
to flowability, and usually evaluated by the slump of components, aggregate moisture) and the method
of a molded concrete cone) or compressive strength for measurement of consistency, whose adequacy for
(determined by crushing of prismatic specimens in a the particular type of concrete used in the database is
press). questioned by some authors.
Öztaş et al. (Öztaş, Pala, Özbay, Kanca, Çağlar & Bhatti, 2006) have developed a neural network from 187 concrete mixes, for which all parameters are known, using 169 of them for training and 18, randomly selected, for verification. Database variables are sometimes taken as a ratio between them, since there is available knowledge about the dependency of slump and strength on such parameters. The established range for the 7-parameter set is shown in Table 1.

The network architecture is determined by 7 input neurons and two hidden layers of 5 and 3 neurons respectively. The back-propagation learning algorithm has been used in this feed-forward, two-hidden-layer network. The learning algorithm used in the study is the scaled conjugate gradient algorithm (SCGA), the activation function is sigmoidal, and the number of epochs is 10,000. The prediction capacity of the network is better in the compressive strength output (maximum error of 6%) than in the slump output.

Table 1. Input parameter ranges

Input parameter      Minimum   Maximum
W/B (ratio, %) (a)      –         –
W (kg/m³) (b)           –         –
s/a (ratio, %) (c)      –         –
FA (ratio, %) (d)       –         –
AE (kg/m³) (e)          –         –
SF (ratio, %) (f)       –         –
SP (kg/m³) (g)          –         –

(a) [Water]/[binder] ratio, considering binder as the lump sum of cement, fly ash and silica fume
(b) Amount of water
(c) [Amount of sand]/[Total aggregate (sand + small gravel + large gravel)] (%)
(d) Percentage of cement substituted by fly ash
(e) Amount of air-entraining agent
(f) Percentage of cement substituted by silica fume
(g) Amount of superplasticizer

Application of Artificial Neural Networks to the Development of Design Formulae and Codes

The last application presented in this paper is the response analysis to shear forces in concrete beams. These forces generate transverse tensile stresses in concrete beams which require placement of rebars perpendicular to the beam axis, known as hoops or ties. Analytical determination of the failure load from the variables that intervene in this problem is very complex, and in general most of the formulae used today are based on experimental interpolations with no dimensional consistency. Cladera and Marí (Cladera & Marí, 2004) have studied the problem through laboratory testing, developing a neural network for the strength analysis of beams with no shear reinforcement. They rely on a database compiled by Bentz (Bentz, 2000) and Kuchma (Kuchma, 2002), where the variables are effective depth (d), beam width (b, though introduced as d/b), shear span (a/d, see Figure 1), longitudinal reinforcement ratio (ρl = As/bd) and compressive strength of concrete (fc). Of course, the failure load is provided for each of the 177 tests found in the database. They use 147 tests to train the network and 30 for verification, on a one-hidden-layer architecture with 10 hidden neurons and a back-propagation learning mechanism. The ranges for the variables are shown in Table 2.

Table 2. Input parameter ranges

Parameter     Minimum   Maximum
d (mm)           –         –
d/b              –         –
ρl (%)           –         –
fc (MPa)         –         –
a/d              –         –
Vfail (kN)       –         –
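The kind of feed-forward regression network described in this section can be sketched in plain NumPy. This is an illustrative reconstruction, not the authors' code: the data are synthetic stand-ins for the mix database, plain batch gradient descent replaces the scaled conjugate gradient algorithm used in the study, and only the layout (7 inputs, hidden layers of 5 and 3 sigmoidal neurons, a single regression output here) follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the mix database: 169 training mixes, 7 parameters.
X = rng.uniform(0.0, 1.0, size=(169, 7))
true_w = rng.uniform(-1.0, 1.0, size=7)
y = (X @ true_w + 0.05 * rng.standard_normal(169))[:, None]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 7 input neurons, hidden layers of 5 and 3 neurons, one regression output.
sizes = [7, 5, 3, 1]
Ws = [rng.standard_normal((a, b)) * 0.5 for a, b in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros((1, b)) for b in sizes[1:]]

def forward(X):
    acts = [X]
    for i, (W, b) in enumerate(zip(Ws, bs)):
        z = acts[-1] @ W + b
        acts.append(sigmoid(z) if i < len(Ws) - 1 else z)   # linear output
    return acts

def mse():
    return float(np.mean((forward(X)[-1] - y) ** 2))

def train_step(lr=0.1):
    acts = forward(X)
    delta = (acts[-1] - y) / len(X)            # gradient of MSE/2 at output
    for i in reversed(range(len(Ws))):
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0, keepdims=True)
        if i > 0:                               # back-propagate through sigmoid
            delta = (delta @ Ws[i].T) * acts[i] * (1.0 - acts[i])
        Ws[i] -= lr * grad_W
        bs[i] -= lr * grad_b

mse_before = mse()
for _ in range(10_000):                         # 10,000 epochs, as in the study
    train_step()
mse_after = mse()
```

Once such a network is trained, it can be queried like the "virtual laboratory" described below: fix all but one input within the ranges of Table 1 and sweep the remaining parameter to study its influence.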
ANN Application in the Field of Structural Concrete
Table 3. Comparison between available codes and proposed equations for shear strength

Procedure            ACI   ACI   MC-90   EC-2   AASHTO   Eq. (7)   Eq. (8)
Average               –     –      –       –       –        –         –
Median                –     –      –       –       –        –         –
Standard deviation    –     –      –       –       –        –         –
CoV (%)               –     –      –       –       –        –         –
Minimum               –     –      –       –       –        –         –
Maximum               –     –      –       –       –        –         –
Almost 8,000 iterations were required to attain the best results.

The adjustment provided by training presents an average ratio Vtest/Vpred of 0.99, and 1.02 in validation. The authors have effectively created a laboratory with a neural network, in which they test (within the parameter range) new beams by changing exclusively one parameter each time. Finally, they come up with two alternative design formulae that noticeably improve on any formula developed up to that moment. Table 3 presents a comparison between those two expressions (named Eq. 7 and Eq. 8) and others found in a series of international codes.

CONCLUSION

The field of structural concrete shows great potential for the application of neural networks. Successful approaches to optimization, prediction of complex physical parameters and design formulae development have been presented. The network topology used in most cases for structural concrete is feed-forward, multilayer with backpropagation, typically with one or two hidden layers. The most commonly used training algorithms are gradient descent with momentum and adaptive learning, and Levenberg-Marquardt.

The biggest potential of ANNs is their capacity to generate virtual testing laboratories which substitute, with precision, expensive real laboratory tests within the proper range of values. A methodical testing program throws light on the influence of the different variables in complex phenomena at reduced cost.

The field of structural concrete counts upon extensive databases, generated through the years, that can be analyzed with this technique. An effort should be made to compile and homogenize these databases to extract the maximum possible knowledge, which has great influence on structural safety.

ACKNOWLEDGMENT

This work was partially supported by the Spanish Ministry of Education and Science (Ministerio de Educación y Ciencia) (Ref. BIA2005-09412-C03-01), grants (Ref. 111/2006/2-3.2) funded by the Spanish
Ministry of Environment (Ministerio de Medio Ambiente), and grants from the General Directorate of Research, Development and Innovation (Dirección Xeral de Investigación, Desenvolvemento e Innovación) of the Xunta de Galicia (Ref. PGIDT06PXIC118137PN). The work of Juan L. Pérez is supported by an FPI grant (Ref. BES-2006-13535) from the Spanish Ministry of Education and Science (Ministerio de Educación y Ciencia).

REFERENCES

Abrams, D.A. (1922). Proportion Concrete Mixtures. Proceedings of the American Concrete Institute, 174-181.

Bentz, E.C. (2000). Sectional analysis of reinforced concrete members. PhD thesis, Department of Civil Engineering, University of Toronto.

Brown, M. & Harris, C. (1994). Neurofuzzy adaptive modelling and control. Prentice-Hall.

Cladera, A. & Marí, A.R. (2004). Shear design procedure for reinforced normal and high-strength concrete beams using artificial neural networks. Part I: beams without stirrups. Engineering Structures, (26), 917-926.

Demuth, H. & Beale, M. (1995). Neural network toolbox for use with MATLAB. MA: The MathWorks, Inc.

Domone, P. (1998). The Slump Flow Test for High-Workability Concrete. Cement and Concrete Research, (28-2), 177-182.

González, B. (2002). Hormigones con áridos reciclados procedentes de demoliciones: dosificaciones, propiedades mecánicas y comportamiento estructural a cortante. PhD thesis, Department of Construction Technology, University of A Coruña.

González, B., Martínez, I. & Carro, D. (2006). Prediction of the consistency of concrete by means of the use of ANN. Artificial Neural Networks in Real-Life Applications. Ed. Idea Group Inc. 188-200.

Hadi, M. (2003). Neural networks applications in concrete structures. Computers and Structures, (81), 373-381.

Hoskins, J.C. & Himmelblau, D.M. (1992). Process control via artificial neural networks and reinforcement learning. Computers and Chemical Engineering, 16(4), 241-251.

Kuchma, D. (1999-2002). Shear data bank. University of Illinois, Urbana-Champaign.

Lin, C.T. & Lee, C.S. (1996). Neural Fuzzy Systems: A neuro-fuzzy synergism to intelligent systems. Prentice-Hall.

McCulloch, W.S. & Pitts, W. (1943). A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, (5), 115-133.

Öztaş, A., Pala, M., Özbay, E., Kanca, E., Çağlar, N. & Bhatti, M.A. (2006). Predicting the compressive strength and slump of high strength concrete using neural network. Construction and Building Materials, (20), 769-775.

Shah, S.P. (1993). Recent trends in the science and technology of concrete, concrete technology, new trends, industrial applications. Proceedings of the international RILEM workshop, London, E & FN Spon. 118.

Wasserman, P. (1989). Neural Computing. New York: Van Nostrand Reinhold.

KEY TERMS

Compression: Stress generated by pressing or squeezing.

Consistency: The relative mobility or ability of freshly mixed concrete or mortar to flow; the usual measurement for concrete is slump, equal to the subsidence measured to the nearest 1/4 in. (6 mm) of a molded specimen immediately after removal of the slump cone.

Ductility: That property of a material by virtue of which it may undergo large permanent deformation without rupture.

Formwork: Total system of support for freshly placed concrete including the mold or sheathing that contacts the concrete as well as supporting members, hardware, and necessary bracing; sometimes called shuttering in the UK.

Shear Span: Distance between a reaction and the nearest load point.
Juan Rabuñal
University of A Coruña, Spain
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
ANN Development with EC Tools
been published about different techniques in this area (Cantú-Paz & Kamath, 2005). These techniques follow the general strategy of an evolutionary algorithm: an initial population consisting of different genotypes, each one of them codifying different parameters (typically, the weights of the connections and/or the architecture of the network and/or the learning rules), is randomly created. This population is evaluated in order to determine the fitness of each individual. Afterwards, this population is repeatedly made to evolve by means of different genetic operators (replication, crossover, mutation, etc.) until a determined termination criterion is fulfilled (for example, a sufficiently good individual is obtained, or a predetermined maximum number of generations is reached).

Essentially, the ANN generation process by means of evolutionary algorithms is divided into three main groups: evolution of the weights, architectures, and learning rules.

Evolution of Weights

The evolution of the weights begins with a network with a predetermined topology. In this case, the problem is to establish, by means of training, the values of the network connection weights. This is generally conceived as a problem of minimization of the network error, taken, for example, as the Mean Square Error of the network between the desired outputs and the ones achieved by the network. Most of the training algorithms, such as the backpropagation algorithm (BP) (Rumelhart, Hinton & Williams, 1986), are based on gradient minimization. This has several drawbacks (Whitley, Starkweather & Bogart, 1990), the most important being that quite frequently the algorithm becomes stuck in a local minimum of the error function and is unable to find the global minimum, especially if the error function is multimodal and/or non-differentiable. One way of overcoming these problems is to carry out the training by means of an Evolutionary Algorithm (Whitley, Starkweather & Bogart, 1990); i.e., formulate the training process as the evolution of the weights in an environment defined by the network architecture and the task to be done (the problem to be solved). In these cases, the weights can be represented in the individual's genetic material as a string of binary values (Whitley, Starkweather & Bogart, 1990) or a string of real numbers (Greenwood, 1997). Traditional genetic algorithms (Holland, 1975) use a genotypic codification method with the shape of binary strings. In this way, much work has emerged that codifies the values of the weights by means of a concatenation of the binary values which represent them (Whitley, Starkweather & Bogart, 1990). The big advantage of these approximations is their generality and that they are very simple to apply, i.e., it is very easy and quick to apply the operators of uniform crossover and mutation on a binary string. The disadvantage of using this type of codification is the problem of permutation. This problem was raised upon considering that the order in which the weights are taken in the string causes equivalent networks to possibly correspond with totally different individuals. This leads the crossover operator to become very inefficient. Logically, the weight value codification has also emerged in the form of real number concatenation, each one of them associated with a determined weight (Greenwood, 1997). By means of genetic operators designed to work with this type of codification, and given that the existing ones for bit strings cannot be used here, several studies (Montana & Davis, 1989) showed that this type of codification produces better results, with more efficiency and scalability, than the BP algorithm.

Evolution of the Architectures

The evolution of the architectures includes the generation of the topological structure, i.e., the topology and connectivity of the neurons, and the transfer function of each neuron of the network. The architecture of a network has a great importance in order to successfully apply the ANNs, as the architecture has a very significant impact on the processing capacity of the network. In this way, on one hand, a network with few connections and a linear transfer function may not be able to resolve a problem that another network having other characteristics (distinct number of neurons, connections or types of functions) would be able to resolve. On the other hand, a network having a high number of non-linear connections and nodes could be overfitted and learn the noise which is present in the training data as an inherent part of it, without being able to discriminate between them, and in the end, not have a good generalization capacity. Therefore, the design of a network is crucial, and this task is classically carried out by human experts using their own experience, based on trial and error, experimenting with a different set of architectures. The evolution of architectures has
been possible thanks to the appearance of constructive and destructive algorithms (Sietsma & Dow, 1991). In general terms, a constructive algorithm begins with a minimum network (with a small number of layers, neurons and connections) and successively adds new layers, nodes and connections, if they are necessary, during the training. A destructive algorithm carries out the opposite operation, i.e., it begins with a maximum network and eliminates unnecessary nodes and connections during the training. However, the methods based on Hill Climbing algorithms are quite susceptible to falling into a local minimum (Angeline, Saunders & Pollack, 1994).

In order to develop ANN architectures by means of an evolutionary algorithm, it is necessary to decide how to codify a network inside the genotype so it can be used by the genetic operators. For this, different types of network codifications have emerged.

In the first codification method, direct codification, there is a one-to-one correspondence between the genes and the phenotypic representation (Miller, Todd & Hedge, 1989). The most typical codification method consists of a matrix C = (cij) of NxN size which represents an architecture of N nodes, where cij indicates the presence or absence of a connection between the i and j nodes. It is possible to use cij = 1 to indicate a connection and cij = 0 to indicate an absence of connection. In fact, cij could take real values instead of Booleans to represent the value of the connection weight between neurons i and j, and in this way, architecture and connections can be developed simultaneously (Alba, Aldana & Troya, 1993). The restrictions which are required in the architectures can easily be incorporated into this representational scheme. For example, a feed-forward network would have non-zero coefficients only in the upper right hand triangle of the matrix. These types of codification are generally very simple and easy to implement. However, they have a lot of disadvantages, such as scalability, the impossibility of codifying repeated structures, or permutation (i.e., different networks which are functionally equivalent can correspond with different genotypes) (Yao & Liu, 1998).

As a counterproposal to this type of direct codification method, there are also the indirect codification types in existence. With the objective of reducing the length of the genotypes, only some of the characteristics of the architecture are codified into the chromosome. Within this type of codification, there are various types of representation.

First, the parametric representations have to be mentioned. The network can be represented by a set of parameters such as the number of hidden layers, the number of connections between two layers, etc. There are several ways of codifying these parameters inside the chromosome (Harp, Samad & Guha, 1989). Although the parametric representations can reduce the length of the chromosome, the evolutionary algorithm makes a search in a limited space within the possible searchable space that represents all the possible architectures. Another type of non-direct codification is based on a representational system with the shape of grammatical rules (Yao & Shi, 1995). In this system, the network is represented by a set of rules, with the shape of production rules, which will build a matrix that represents the network.

Other types of codification, more inspired by the world of biology, are the ones known as growing methods. With them, the genotype does not codify the network any longer, but instead it contains a set of instructions. The decodification of the genotype consists of the execution of these instructions, which will provoke the construction of the phenotype (Husbands, Harvey, Cliff & Miller, 1994). These instructions usually include neural migrations, neuronal duplication or transformation, and neuronal differentiation.

Finally, and within the indirect codification methods, there are other methods which are very different from the ones already described. Andersen describes a technique in which each individual of a population represents a hidden node instead of the architecture (Andersen & Tsoi, 1993). Each hidden layer is constructed automatically by means of an evolutionary process which uses a genetic algorithm. This method has the limitation that only feed-forward networks can be constructed, and there is also a tendency for various nodes with a similar functionality to emerge, which inserts some redundancy inside the network that must be eliminated.

One important characteristic is that, in general, these methods only develop architectures, which is the most common, or else architectures and weights together. The transfer function of each architecture node is assumed to have been previously determined by a human expert, and to be the same for all of the network nodes (at least, for all of the nodes of the same layer), although the transfer function has been shown to have a great importance on the behaviour of the network (Lovell & Tsoi, 1992). Few methods have
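The weight-evolution approach described above, where a real-number chromosome encodes all connection weights of a fixed-topology network and is evolved instead of gradient-trained, can be sketched as follows. The XOR task, the 2-2-1 topology, and all GA settings here are illustrative assumptions in the spirit of Montana & Davis (1989), not details taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy task: fit XOR with a fixed 2-2-1 sigmoidal network whose weights are
# evolved by a real-coded genetic algorithm (no gradient information used).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

N_W = 2 * 2 + 2 + 2 * 1 + 1   # weights and biases of the 2-2-1 topology = 9

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(chrom, X):
    # Decode the flat chromosome into the fixed 2-2-1 topology.
    W1 = chrom[0:4].reshape(2, 2); b1 = chrom[4:6]
    W2 = chrom[6:8].reshape(2, 1); b2 = chrom[8]
    h = sigmoid(X @ W1 + b1)
    return sigmoid(h @ W2 + b2).ravel()

def mse(chrom):
    # Fitness: minimize the Mean Square Error of the decoded network.
    return float(np.mean((forward(chrom, X) - y) ** 2))

pop = rng.standard_normal((60, N_W)) * 2.0     # random initial population

for gen in range(300):
    fitness = np.array([mse(ind) for ind in pop])
    order = np.argsort(fitness)                # lower error = fitter
    parents = pop[order[:20]]                  # truncation selection (elitist)
    children = []
    while len(children) < len(pop) - len(parents):
        pa, pb = parents[rng.integers(20, size=2)]
        alpha = rng.random(N_W)                # arithmetic crossover
        child = alpha * pa + (1 - alpha) * pb
        child += rng.standard_normal(N_W) * 0.2   # Gaussian mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = min(pop, key=mse)
```

Because the chromosome is a flat vector of reals, the crossover and mutation operators act directly on weight values, which is the real-coded alternative to binary-string codification discussed in the text.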
Crosher, D. (1993). The artificial evolution of a generalized class of adaptive processes. Preprints of AI'93 Workshop on Evolutionary Computation. 18-36.

Greenwood, G.W. (1997). Training partially recurrent neural networks using evolutionary strategies. IEEE Trans. Speech Audio Processing, (5), 192-194.

Harp, S.A., Samad, T. & Guha, A. (1989). Toward the genetic synthesis of neural networks. Proc. 3rd Int. Conf. Genetic Algorithms and Their Applications. 360-369.

Haykin, S. (1999). Neural Networks (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.

Holland, J.H. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: University of Michigan Press.

Husbands, P., Harvey, I., Cliff, D. & Miller, G. (1994). The use of genetic algorithms for the development of sensorimotor control systems. From Perception to Action. (P. Gaussier and J.D. Nicoud, eds.). Los Alamitos, CA: IEEE Press.

Kim, H., Jung, S., Kim, T. & Park, K. (1996). Fast learning method for backpropagation neural network by evolutionary adaptation of learning rates. Neurocomputing, 11(1), 101-106.

Lovell, D.R. & Tsoi, A.C. (1992). The Performance of the Neocognitron with various S-Cell and C-Cell Transfer Functions. Intell. Machines Lab., Dep. Elect. Eng., Univ. Queensland, Tech. Rep.

McCulloch, W.S. & Pitts, W. (1943). A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, (5), 115-133.

Miller, G.F., Todd, P.M. & Hedge, S.U. (1989). Designing neural networks using genetic algorithms. Proceedings of the Third International Conference on Genetic Algorithms. San Mateo, CA: Morgan Kaufmann. 379-384.

Montana, D. & Davis, L. (1989). Training feed-forward neural networks using genetic algorithms. Proc. 11th Int. Joint Conf. Artificial Intelligence. San Mateo, CA: Morgan Kaufmann. 762-767.

Rabuñal, J.R. & Dorado, J. (2005). Artificial Neural Networks in Real-Life Applications. Idea Group Inc.

Ribert, A., Stocker, E., Lecourtier, Y. & Ennaji, A. (1994). Optimizing a Neural Network Architecture with an Adaptive Parameter Genetic Algorithm. Lecture Notes in Computer Science. Springer-Verlag. (1240), 527-535.

Rumelhart, D.E., Hinton, G.E. & Williams, R.J. (1986). Learning internal representations by error propagation. Parallel Distributed Processing: Explorations in the Microstructures of Cognition. D.E. Rumelhart & J.L. McClelland, Eds. Cambridge, MA: MIT Press. (1), 318-362.

Sietsma, J. & Dow, R.J.F. (1991). Creating Artificial Neural Networks that generalize. Neural Networks, (4) 1, 67-79.

Turney, P., Whitley, D. & Anderson, R. (1996). Special issue on the Baldwinian effect. Evolutionary Computation, 4(3), 213-329.

Whitley, D., Starkweather, T. & Bogart, C. (1990). Genetic algorithms and neural networks: Optimizing connections and connectivity. Parallel Comput., Vol. 14, No. 3, 347-361.

Yao, X. & Shi, Y. (1995). A preliminary study on designing artificial neural networks using co-evolution. Proc. IEEE Singapore Int. Conf. Intelligent Control and Instrumentation. 149-154.

Yao, X. & Liu, Y. (1998). Toward designing artificial neural networks by evolution. Appl. Math. Computation, vol. 91, no. 1, 83-90.

KEY TERMS

Artificial Neural Networks: Interconnected set of many simple processing units, commonly called neurons, that use a mathematical model representing an input/output relation.

Back-Propagation Algorithm: Supervised learning technique used by ANNs, that iteratively modifies the weights of the connections of the network so that the error given by the network after the comparison of the outputs with the desired ones decreases.

Evolutionary Computation: Set of Artificial Intelligence techniques used in optimization problems, which are inspired by biological mechanisms such as natural evolution.
Véronique Amarger
University of Paris, France
Joël Bernier
SAGEM REOSC, France
Kurosh Madani
University of Paris, France
ANN-Based Defects Diagnosis of Industrial Optical Devices
from other microscopy techniques, have motivated our preference for this imaging technique. The first of them is related to the higher sensitivity of this technique compared to the other classical microscopy techniques (Dark Field, Bright Field) (Flewitt & Wild, 1994). Furthermore, DIC microscopy is robust regarding lighting non-homogeneity. Finally, this technology provides information relative to depth (the 3rd dimension), which could be exploited to typify roughness or defect depth. This last advantage offers precious additional potentiality to characterize scratch and dig flaws in high-tech optical devices. Therefore, Nomarski microscopy seems to be a suitable technique to detect surface imperfections.

On the other hand, since they have shown many attractive features in complex pattern recognition and classification tasks (Zhang, 2000) (Egmont-Petersen, de Ridder, & Handels, 2002), artificial neural network based techniques are used to solve difficult problems. In our particular case, the problem is related to the classification of small defects on a great observation surface. These promising techniques could however encounter difficulties when dealing with high-dimensional data. That is why we are also interested in data dimensionality reduction methods.

DEFECTS DETECTION AND CLASSIFICATION

The suggested diagnosis process is described in broad outline in the diagram of Figure 1. Every step is presented: first the detection and data extraction phases, and then the classification phase coupled with dimensionality reduction. In a second part, some investigations on real industrial data are carried out and the obtained results are presented.

Detection and Data Extraction

The proposed method (Voiry, Houbre, Amarger, & Madani, 2005) includes four phases:

Pre-processing: DIC-issued digital image transformation in order to reduce the lighting heterogeneity influence and to enhance the visibility of the aimed defects,
Adaptive matching: adaptive process to match defects,
Filtering and segmentation: noise removal and defect outline characterization,
Defect image extraction: correct defect representation construction.

Finally, the image associated to a given detected defect gives an isolated (from other items) representation of the defect (i.e. it depicts the defect in its immediate environment), like depicted in Figure 2.

But the information contained in such generated images is highly redundant, and these images do not necessarily have the same dimension (typically this dimension can turn out to be a hundred times as high). That is why this raw data (images) cannot be directly processed and has first to be appropriately encoded, using some transformations. Such transformations must naturally be invariant with regard to geometric transformations (translation, rotation and scaling) and robust regarding different perturbations (noise, luminance variation and background variation). The Fourier-Mellin transformation is used as it provides invariant descriptors, which are considered to have good coding capacity in classification tasks (Choksuriwong, Laurent, & Emile, 2005) (Derrode, 1999) (Ghorbel, 1994). Finally, the processed features have to be normalized, using the centring-reducing transformation. Providing a set of 13 features using such a transform is a first acceptable compromise between industrial environment real-time processing constraints and defect image representation quality (Voiry, Madani, Amarger, & Houbre, 2006).
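The centring-reducing transformation mentioned above is ordinary feature standardization (subtract the mean, divide by the standard deviation, per feature). A minimal sketch, with synthetic values standing in for the 13 Fourier-Mellin descriptors:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for a batch of defect descriptors: 200 items x 13 features
# (the chapter's 13 Fourier-Mellin invariants; values here are synthetic).
features = rng.normal(loc=5.0, scale=3.0, size=(200, 13))

# Centring-reducing transformation: each feature column ends up with zero
# mean and unit variance before being fed to the classifier.
mu = features.mean(axis=0)
sigma = features.std(axis=0)
normalized = (features - mu) / sigma
```

In deployment, `mu` and `sigma` would be estimated on the training set and reused unchanged for new defect images, so that training and test descriptors share the same scale.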
Figure 2. Images of characteristic items: (a) Scratch; (b) dig; (c) dust; (d) cleaning marks
Dimensionality Reduction

To obtain a correct description of defects, we must consider a more or less important number of Fourier-Mellin invariants. But dealing with high-dimensional data poses problems, known as the "curse of dimensionality" (Verleysen, 2001). First, the sample number required to reach a predefined level of precision in approximation tasks increases exponentially with dimension. Thus, intuitively, the sample number needed to properly learn a problem quickly becomes much too large to be collected by real systems when the dimension of the data increases. Moreover, surprising phenomena appear when working in high dimension (Demartines, 1994): for example, the variance of distances between vectors remains fixed while its average increases with the space dimension, and Gaussian kernel local properties are also lost. These last points explain why the behaviour of a number of artificial neural network algorithms can be affected when dealing with high-dimensional data. Fortunately, most real-world problem data are located in a manifold of dimension p (the data intrinsic dimension) much smaller than their raw dimension. Reducing data dimensionality to this smaller value can therefore decrease the problems related to high dimension.

In order to reduce the problem dimensionality, we use Curvilinear Distance Analysis (CDA). This technique is related to Curvilinear Component Analysis (CCA), whose goal is to reproduce the topology of an n-dimension original space in a new p-dimension space (where p < n) without fixing any configuration of the topology (Demartines & Hérault, 1993). To do so, a criterion characterizing the differences between original and projected space topologies is processed:

E_{CCA} = \frac{1}{2} \sum_{i} \sum_{j \neq i} \left( d_{ij}^{n} - d_{ij}^{p} \right)^{2} F\left( d_{ij}^{p} \right)    (1)

where d_{ij}^{n} (respectively d_{ij}^{p}) is the Euclidean distance between vectors x_i and x_j of the considered distribution in the original space (resp. in the projected space), and F is a decreasing function which favours local topology with respect to the global topology. This energy function is minimized by stochastic gradient descent (Demartines & Hérault, 1995):

\forall i \neq j, \quad \Delta x_{i}^{p} = \alpha(t)\, u\left( \lambda(t) - d_{ij}^{p} \right) \frac{d_{ij}^{n} - d_{ij}^{p}}{d_{ij}^{p}} \left( x_{i}^{p} - x_{j}^{p} \right)    (2)

where \alpha : \mathbb{R}^{+} \to [0;1] and \lambda : \mathbb{R}^{+} \to \mathbb{R}^{+} are two decreasing functions representing respectively a learning parameter and a neighbourhood factor, and u is the step function. CCA also provides a similar method to project, in a continuous way, new points of the original space onto the projected space, using the knowledge of already projected vectors.

But, since CCA encounters difficulties with the unfolding of very non-linear manifolds, an evolution called CDA has been proposed (Lee, Lendasse, Donckers, & Verleysen, 2000). It involves curvilinear distances (in order to better approximate geodesic distances on the considered manifold) instead of Euclidean ones. Curvilinear distances are processed in two steps. First, a graph is built between vectors by considering a k-NN, ε, or other neighbourhood, weighted by the Euclidean distance between adjacent nodes. Then the curvilinear distance between two vectors is computed as the minimal distance between these vectors in the graph using Dijkstra's algorithm. Finally, the original CCA algorithm is applied using the processed curvilinear
distances. This algorithm allows dealing with very non-linear manifolds and is much more robust against the choices of the α and λ functions.

It has been successfully used as a preliminary step before maximum likelihood classification in (Lennon, Mercier, Mouchot, & Hubert-Moy, 2001), and we have also shown its positive impact on the performance of neural network based classification (Voiry, Madani, Amarger, & Bernier, 2007). In this last paper, we first demonstrated that a synthetic problem (nevertheless defined from our real industrial data) whose intrinsic dimensionality is two is better treated by an MLP after 2D dimension reduction than in its raw expression. We also showed that CDA performs better for this problem than CCA and Self-Organizing Map pre-processing.

Implementation on Industrial Optical Devices

In order to validate the above-presented concepts and to provide an industrial prototype, an automatic control system has been realized. It involves an Olympus B52 microscope combined with a Corvus stage, which allows scanning an entire optical component (presented in Figure 3). 50x magnification is used, which leads to microscopic fields of 1.77 mm x 1.33 mm and pixels of 1.28 µm x 1.28 µm. The proposed image processing method is applied on-line. A post-processing software enables collecting the pieces of a defect that are detected in different microscopic fields (for example pieces of a long scratch) to form only one defect, and computing an overall cartography of the checked device (Figure 3). These facilities were used to acquire a great number of Nomarski images, from which defect images were extracted using the aforementioned technique. Two experiments, called A and B, were carried out, using two different optical devices. Table 1 shows the different parameters corresponding to these experiments. It is important to note that, in order to avoid false class learning, item images depicting microscopic field boundaries or two (or more) different defects were discarded from the used database. Furthermore, the studied optical devices were not specially cleaned, which accounts for the presence of some dusts and cleaning marks. Items of these two databases were labelled by an expert with two different labels: dust (class 1) and other defects (class -1). Table 1 also shows the item repartition between the two defined classes.

Using this experimental set-up, a classification experiment was performed. It involved a multilayer perceptron with n input neurons, 35 neurons in one hidden layer, and 2 output neurons (an n-35-2 MLP). First, this artificial neural network was trained for the discrimination task between classes 1 and -1, using database B. This training phase used the BFGS (Broyden, Fletcher, Goldfarb, and Shanno) algorithm with Bayesian regularization, and was repeated 5 times. Subsequently, the generalization ability of the obtained neural network was assessed using database A. Since databases A and B issued from different optical devices, such generalization results are significant. Following this procedure, 14 different experiments were conducted with the aim of studying the global classification performance and the impact of CDA dimensionality reduction on this performance. The first experiment used the original Fourier-Mellin issued features (13-dimensional); the others used the same features after CDA reduction to an n-dimensional space (with n varying between 2 and 13). Figure 4 depicts the global classification performances (calculated by averaging the percentage of well-classified items over the 5 trainings) for the 14 different experiments, as well as
Figure 3. Automatic control system and cartography of a 100mm x 65mm optical device
Figure 4. Classification performances for different CDA issued data dimensionality. Classification performances
using raw data (13-dimensional) are also depicted as dotted lines.
[Plot: percentage of correct classifier answers (50 to 100%) versus data dimensionality (2 to 13), with curves for class -1, class 1, and global classification performance.]
ANN-Based Defects Diagnosis of Industrial Optical Devices
criminate the false defects (correctable defects) from Ghorbel, F. (1994). A Complete Invariant Description
abiding (permanent) ones. In this paper is described for Gray Level Images by the Harmonic Analysis Ap-
a complete framework, which allows detecting all de- proach. Pattern Recognition, 15, 1043-1051.
fects present in a raw Nomarski image and extracting
Lee, J. A., Lendasse, A., Donckers, N., & Verleysen,
pertinent features for classification of these defects.
M. (2000). A Robust Nonlinear Projection Method. In
Obtained proper performances for dust versus other
European Symposium on Artificial Neural Networks
defects classification task with MLP neural network has
- ESANN2000.
demonstrated the pertinence of proposed approach. In
addition, data dimensionality reduction permits to use Lennon, M., Mercier, G., Mouchot, M. C., & Hubert-
low complexity classifier (while keeping performance Moy, L. (2001). Curvilinear Component Analysis for
level) and therefore to save processing time. Nonlinear Dimensionality Reduction of Hyperspectral
Images. Proceedings of SPIE, 4541, 157-168.
Verleysen, M. (2001). Learning high-dimensional
REFERENCES
data. In LFTNC2001 - NATO Advanced Research
Workshop on Limitations and Future Trends in Neural
Bouchareine, P. (1999). Mtrologie des Surfaces.
Computing.
Techniques de lIngnieur, R1390.
Voiry, M., Houbre, F., Amarger, V., & Madani, K. (2005).
Chatterjee, S. (2003). Design Considerations and Fabri-
Toward Surface Imperfections Diagnosis Using Optical
cation Techniques of Nomarski Reflection Microscope.
Microscopy Imaging in Industrial Environment. IAR &
Optical Engineering, 42, 2202-2212.
ACD Workshop 2005 Proceedings, 139-144.
Choksuriwong, A., Laurent, H., & Emile, B. (2005).
Voiry, M., Madani, K., Amarger, V., & Bernier, J. (2007).
Comparison of invariant descriptors for object rec-
Impact of Data Dimensionality Reduction on Neural
ognition. IEEE International Conference on Image
Based Classification: Application to Industrial Defects
Processing (ICIP), 377-380.
Classification. Proceedings of the 3rd International
Demartines, P. (1994). Analyse de Donnes par Rseaux Workshop on Artificial Neural Networks and Intelligent
de Neurones Auto-Organiss. PhD Thesis - Institut Information Processing - ANNIIP 2007, 56-65.
National Polytechnique de Grenoble.
Voiry, M., Madani, K., Amarger, V., & Houbre, F. (2006).
Demartines, P. & Hrault, J. (1993). Vector Quantiza- Toward Automatic Defects Clustering in Industrial
tion and Projection Neural Network. Lecture Notes in Production Process Combining Optical Detection and
Computer Science, 686, 328-333. Unsupervised Artificial Neural Network Techniques.
Proceedings of the 2nd International Workshop on
Demartines, P. & Hrault, J. (1995). CCA : Curvilinear
Artificial Neural Networks and Intelligent Information
Component Analysis. Proceedings of 15th workshop
Processing - ANNIIP 2006, 25-34.
GRETSI.
Zhang, G. P. (2000). Neural Networks for Classifica-
Derrode, S. (1999). Reprsentation de Formes Planes
tion: A Survey. IEEE Trans.on Systems, Man, and
Niveaux de Gris par Diffrentes Approximations de
Cybernetics - Part C: Applications and Reviews, 30,
Fourier-Mellin Analytique en vue dIndexation de Bases
451-462.
dImages. PhD Thesis - Universit de Rennes I.
Egmont-Petersen, M., de Ridder, D., & Handels, H.
KEy TERMS
(2002). Image Processing with Neural Networks - A
Review. Pattern Recognition, 35, 2279-2301.
Artificial Neural Networks: A network of many
Flewitt, P. E. J. & Wild, R. K. (1994). Light Microscopy. simple processors (units or neurons) that imitates
In Physical Methods for materials characterisation. a biological neural network. The units are connected
by unidirectional communication channels, which
carry numeric data. Neural networks can be trained
to find nonlinear relationships in data, and are used
ANN-Based Defects Diagnosis of Industrial Optical Devices
in applications such as robotics, speech recognition, Data Raw Dimension: When data is described
signal processing or medical diagnosis. by vectors (sets of characteristic values), data raw A
dimension is simply the number of components of
Backpropagation algorithm: Learning algorithm
these vectors.
of ANNs, based on minimising the error obtained from
the comparison between the outputs that the network Detection: Identification of a phenomenon among
gives after the application of a set of network inputs others from a number of characteristic features or
and the outputs it should give (the desired outputs). symptoms. In our work, it consists in identifying
surface irregularities on optical devices.
Classification: Affectation of a phenomenon to a
predefined class or category by studying its character- MLP (Multi Layer Perceptron): This widely
istic features. In our work it consists in determining the used artificial neural network employs the perceptron
nature of detected optical devices surface defects (for as simple processor. The model of the perceptron,
example dust or other type of defects). proposed by Rosenblatt is as follows:
Data Dimensionality Reduction: Data dimension-
ality reduction is the transformation of high-dimensional
data into a meaningful representation of reduced dimen-
sionality. The goal is to find the important relationships
between parameters and reproduce those relationships
in a lower dimensionality space. Ideally, the obtained
representation has a dimensionality that corresponds to
the intrinsic dimensionality of the data. Dimensional-
ity reduction is important in many domains, since it
facilitates classification, visualization, and compression
of high-dimensional data. In our work its performed
using Curvilinear Distance Analysis.
Data Intrinsic Dimension: When data is described
by vectors (sets of characteristic values), data intrinsic In this diagram, the X represent the inputs and
dimension is the effective number of degrees of free- Y the output of the neuron. Each input is multiplied
dom of the vectors set. Generally, this dimension is by the weight w, a threshold b is subtracted from the
smaller than the data raw dimension because it may result and finally Y is processed by the application of
exist linear and/or non-linear relations between the an activation function f. The weights of the connection
different components of the vectors. are adjusted during a learning phase using backpropa-
gation algorithm.
Manuel Lama
University of Santiago de Compostela, Spain
INTRODUCTION tional tools based on AI, and (2) the most relevant AI
techniques applied on the development of intelligent
Governments and institutions are facing the new de- educational systems.
mands of a rapidly changing society. Among many
significant trends, some facts should be considered
(Silverstein, 2006): (1) the increment of number and EXAMPLES OF EDUCATIONAL TOOLS
type of students; and (2) the limitations imposed by BASED ON AI
educational costs and course schedules. About the for-
mer, the need of a continuous update of knowledge and The field of Artificial Intelligence can contribute with
competences in an evolving work environment requires interesting solutions to the needs of the educational
life-long learning solutions. An increasing number of domain (Kennedy, 2002). In what follows, the type
young adults are returning to classrooms in order to of systems that can be built based on AI techniques
finish their graduate degrees or attend postgraduate are outlined.
programs to achieve an specialization on a certain
domain. About the later, due to the emergence of new Intelligent Tutoring Systems
types of students, budget constraints and schedule
conflicts appear. Workers and immigrants, for instance, The Intelligent Tutoring Systems are applications
are relevant groups for which educational costs and that provide personalized/adaptive learning without
job incompatible schedules could be the key factor the intervention of human teachers (VanLehn, 2006).
to register into a course or to give up a program after They are constituted by three main components: (1)
investing time and effort on it. In order to solve the knowledge of the educational contents, (2) knowledge
needs derived from this social context, new educational of the student, and (3) knowledge of the learning pro-
approaches should be proposed: (1) to improve and cedures and methodologies. These systems promise to
extend the online learning courses, which would reduce radically transform our vision of online learning. As
student costs and allows to cover the educational needs opposed to the hypertext-based e-learning applications,
of a higher number of students, and (2) to automate which provide the students with a certain number of
learning processes, then reducing teacher costs and opportunities to search for the correct answer before
providing a more personalized educational experience showing it, the intelligent tutoring systems perform
anytime, anywhere. like coaches not only after the introduction of the re-
As a result of this context, in the last decade an sponse, but also offering suggestions when the students
increasing interest on applying computer technologies doubt or are blocked during the process of solving the
in the field of Education has been observed. On this problem. In this way, the assistance guide the learning
regard, the paradigms of the Artificial Intelligence process rather than merely saying what is correct or
(AI) field are attracting an special attention to solve what is wrong.
the issues derived from the introduction of computers There exist numerous examples of intelligent tutor-
as supporting resources of different learning strategies. ing systems, some of them developed at universities
In this paper we review the state-of-art of the applica- as research projects while others created with business
tion of Artificial Intelligence techniques in the field of goals. Among the first ones, the Andes systems (Van-
Education, focusing on (1) the most popular educa- Lehn, Lynch, Schulze, Shapiro, Shelby, Taylor, Treacy,
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
AI and Education
Weinstein & Wintersgill, 2005), developed under the shows the progress carried out by the students and
guidance of Kurt VanLehn of the University of Pittsburg, determines their proficiency and degree of competence A
is a popular example. The system is in charge of guid- on the use of foreign languages. As for commercial
ing the students while they try to solve different sets of applications, Intellimetric is a Web-based system that
problems and exercises. When the student ask for help lets students to submit their work online (Intellimetric,
in the middle of an activity, the system either provides 2007). In a few seconds, the AI-supported grading
hints in order to step further towards the solution or engine automatically provides the score of the work.
points out what was wrong in some earlier step. Andes The company claims a reliability of 99%, meaning that
was successfully evaluated during 5 years in the Naval 99 percent of the time the engines scores match those
Academy of the United States and can be downloaded provided by human teachers.
for free. Another relevant system is Cognitive Tutor
(Koedinger, Anderson, Hadley & Mark, 1997), is a Computer Supported Collaborative
comprehensive secondary mathematics curricula and Learning
computer-based tutoring program developed by John R.
Anderson, professor at the Carnegie Mellon University. The environments of computer supported collaborative
The Cognitive Tutor is an example of how research learning are aimed at facilitating the learning process
prototypes can be evolved into commercial solutions, providing the students both the context and tools to
as it is nowadays used in 1,500 schools in the United interact and work in a collaborative way with their class-
States. On the business side, Read-On! is presented as mates (Soller, Martinez, Jermann & Muehlenbrock,
a product that teaches reading comprehension skills 2005). In intelligent-based systems, the collaboration is
for adults. It analyzes and diagnoses the specific defi- usually carried out with the help of software agents in
ciencies and problems of each student and then adapts charge of mediating and supporting student interaction
the learning process based on that features (Read On, to achieve the proposed learning objectives.
2007). It includes an authoring tool that allows course The research prototypes are the suitable test-beds
designers to adapt course contents to different student to prove new ideas and concepts, to provide the best
profiles in a fast and flexible way. collaborative strategies. The DEGREE system, for
instance, allows the characterization of group behav-
Automatic Evaluation Systems iours as well as the individual behaviours of the people
constituting them, on the basis of a set of attributes
Automatic Evaluation Systems are mainly focused on or tags. The mediator agent utilizes those attributes,
evaluating the strengths and weaknesses of students in which are introduced by students, in order to provide
different learning activities through assessment tests recommendations and suggestions to improve the in-
(Conejo, Guzmn, Millan, Trella, Perez-de-la-Cruz. teraction inside each group (Barros & Verdejo, 2000).
& Rios, 2004). In this way, these systems not only In the business domain there exist multiple solutions
perform the automatic correction of the test, but also although they do not offer intelligent mediation to
derive automatically useful information about the facilitate the collaborative interactions. The DEBBIE
competences and skills obtained by the students during system (DePauw Electronic Blackboard for Interactive
the educational process. Education) is one of the most popular (Berque, John-
Among the automatic evaluation systems, we could son, Hutcheson, Jovanovic, Moore, Singer & Slattery,
highlight ToL (Test On Line) (Tartaglia & Tresso, 2002), 2000). It was originally developed at the beginning of
which have been used by Physics students in the Poly- year 2000 at the University of Depauw, and managed
technic University of Milano. The system is composed later by the DyKnow company, which was specifically
of a database of tests, an algorithm for question selec- created to make profit with DEBBIE (Schnitzler, 2004).
tion, and a mechanism for the automatic evaluation The technology that currently offers DyKnow allows
of tests, which can be additionally configured by the both teachers and students to instantaneously share
teachers. CELLA (Comprehensive English Language information and ideas. The final goal is to support
Learning Assesment) (Cella, 2007) is another system student tasks in the classroom by eliminating the need
that evaluates the student competence on using and of performing simple tasks, as for instance backing up
understanding the English language. The application the teachers presentations. The students could therefore
AI and Education
be more focused on understanding as well as analyzing use of student models. Broadly speaking, these models
the concepts presented by the teacher. imply the construction of a qualitative representation
of student behavior in terms of existing background
Game-Based Learning knowledge about a domain (McCalla, 1992). These
representations can be further used in intelligent tutor-
Learning based on serious games, a term coined to ing systems, intelligent learning environments, and
distinguish between learning-oriented games used in to develop autonomous intelligent agents that may
education and purely entertaining-oriented games, deal collaborate with human students during the learning
with the utilization of the motivational power and at- process. The introduction of machine learning tech-
tractiveness of games in the educational domain in order niques facilitates to update and extend the first versions
to improve the satisfaction and performance of students of student models in order to adapt to the evolution
when acquiring new knowledge and skills. This type of each student as well as the possible changes and
of learning allows to carry out activities in complex modifications of contents and learning activities (Sison
educational environments that would be impossible to & Shimura, 1998). The most popular student model-
implement, because of budget, time, infrastructure and ing techniques are (Beck, Stern, & Haugsjaa, 1996):
security limitations, with traditional resources (Michael overlay models and bayesian network models. The first
& Chen, 2005; Corti, 2006). method consists on considering the student model as a
NetAids is an institution that develop games to subset of the knowledge of an expert in the domain on
teach concepts of global citizenship and to sensitize to which the learning is taking place. In fact, the degree
fight against poverty. One of its first games, released of learning is measured in terms of the comparison
in 2002, called NetAid World Class, consists on taking between the knowledge acquired and represented in
the identity of a real child living in India and to resolve the student model with the background initially stored
the real problems that confront the poor children in this in the expert model. The second method deals with the
region (Stokes, 2005). In 2003 the game was used by representation of the learning process as a network
40.000 students in different Schools across the United of knowledge states. Once defined, the model should
States. In the business and entertainment arena, many infer, from the tutor-student interaction, the probability
games exist that can be resorted to reach educational of the student on being in a certain state.
goals. Among the most popular ones, Brain Training of
Nintendo (Brain Training, 2007) challenges the user to Intelligent Agents and Agent-Based
improve her mental shape by doing memory, reasoning Systems
and mathematical exercises. The final goal is to reach
an optimal cerebral age after some regular training. Software agents are considered software entities, such as
software programs or robots, that present, with different
degree, three main attributes: autonomy, cooperation
and learning (Nwana, 1996). Autonomy refers to the
AI TECHNIQUES IN EDUCATION
principle that an agent can operate on their own (act-
ing and deciding upon its own representation of the
The intelligent educational systems reviewed above are
world). Cooperation refers to the ability to interact
based on a diversity of artificial intelligence techniques
with other agents via some communication language.
(Brusilovsky & Peylo, 2003). The most frequently
Finally, learning is essential to react or interact with
used in the field of education are: (1) personalization
the external environment. Teams of intelligent agents
mechanisms based on student and group models, (2)
build up MultiAgent Systems (MAS). In this type of
intelligent agents and agent-based systems, and (3)
systems each agent has either incomplete information
ontologies and semantic web techniques.
or limited capabilities for solving the problem at hand.
Other important aspect concerns with the lack of cen-
Personalization Mechanisms tralized global control; therefore, data is distributed
all over the system and computation is asynchronous
The personalization techniques, which are the basis of
(Sycara, 1998). Many important tasks can be carried
intelligent tutoring systems, involve the creation and
out by intelligent agents in the context of learning and
0
AI and Education
educational systems (Jafari, 2002, Snchez, Lama, Research in this field is very active and faces am-
Amorim, Riera, Vila & Barro, 2003): the monitoring bitious goals. In some decades it could be possible to A
of inputs, outputs, and the activity outcomes produced dream about sci-fi environments in which the students
by the students; the verification of deadlines during would have brain interfaces to directly interact with an
homework and exercise submission; automatic answer- intelligent assistant (Koch, 2006), which would play
ing of student questions; and the automatic grading of the role of a tutor with a direct connection with learn-
tests and surveys. ing areas of the brain.
Ontologies aim to capture and represent consensual In this paper we have reviewed the state-of-art of the
knowledge in a generic way, and that they may be reused application of Artificial Intelligence techniques in the
and shared across software applications (Gmez-Prez, field of Education. AI approaches seem promising to
Fernndez-Lpez & Corcho, 2004). An ontology is improve the quality of the learning process and then
composed of concepts or classes and their attributes, to satisfy the new requirements of a rapidly changing
the relationships between concepts, the properties of society. Current AI-based systems such as intelligent
these relationships, and the axioms and rules that ex- tutoring systems, computer supported collaborative
plicitly represents the knowledge of a certain domain. learning and educational games have already proved
In the educational domain, several ontologies have the possibilities of applying AI techniques. Future
been proposed: (1) to describe the learning contents applications will both facilitate personalized learning
of technical documents (Kabel, Wielinga, & de How, styles and help the tasks of teachers and students in
1999), (2) to model the elements required for the traditional classroom settings.
design, analysis, and evaluation of the interaction
between learners in computer supported cooperative
learning (Inaba, Tamura, Ohkubo, Ikeda, Mizoguchi REFERENCES
& Toyoda, 2001), (3) to specify the knowledge needed
to define new collaborative learning scenarios (Barros, Silverstein, S. (2006) Colleges see the future in tech-
Verdejo, Read & Mizoguchi, 2002), (4) to formalize the nology, Los Angeles Times.
semantics of learning objects that are based on metadata
standards (Brase & Nejdl, 2004), and (5) to describe Kennedy, K. (2002) Top 10 Smart technologies for
the semantics of learning design languages (Amorim, Schools: Artificial Intelligence. Retrieved from: http://
Lama, Snchez, Riera & Vila, 2006). www.techlearning.com/db_area/archives/TL/2002/11/
topten5.html
VanLehn, K. (2006) The Behavior of Tutoring Sys-
FUTURE TRENDS tems. International Journal of Artificial Intelligence
in Education, 16:227-265.
The next generation of adaptive environments will in-
VanLehn, K., Lynch, C., Schulze, K., Shapiro, J.A.,
tegrate pedagogical agents, enriched with data mining
Shelby, R., Taylor L., Treacy D., Weinstein A. & Win-
and machine learning techniques, capable of providing
tersgill M. (2005) The Andes Physics Tutoring System:
cognitive diagnosis of the learners that will help to
Lessons Learned. International Journal of Artificial
determine the state of the learning process and then
Intelligence in Education, 15:147-204.
optimize the selection of personalized learning designs.
Moreover, improved models of learners, facilitators, Koedinger K., Anderson J.R., Hadley, W.H. & Mark
tasks and problem-solving processes, combined with the M.A. (1997) Intelligent tutoring goes to school in the
use of Ontologies and reasoning engines, will facilitate big city. International Journal of Artificial Intelligence
the execution of learning activities on either online in Education, 8: 30-43.
platforms or traditional classroom settings.
AI and Education
Read On! (2007). Retrieved from: http://www.stot- Brain Training (2007) Retrieved from: http://
tlerhenke.com/products/index.htm es.videogames.games.yahoo.com/especiales/brain-
training
Conejo, R., Guzmn, E., Millan, E., Trella, M., Perez-
de-la-Cruz, J.L. & Rios, A. (2004) SIETTE: A Web- Brusilovsky, P. & Peylo, C. (2003) Adaptive and
Based Tool for Adaptive Testing. International Journal Intelligent Web-based Educational Systems. Interna-
of Artificial Intelligence in Education, 14:29-61. tional Journal of Artificial Intelligence in Education,
13:156169.
Tartaglia A. & Tresso E. (2002) An Automatic Evalua-
tion System for Technical Education at the University McCalla, G. (1992) The central importance of student
Level. IEEE Transactions on Education, 45(3):268- modeling to intelligent tutoring. En: New Directions for
275. Intelligent Tutoring Systems, Springer Verlag.
CELLA Comprehensive English Language Learn- Sison, R. & Shimura, M. (1998) Student modeling and
ing Assessment (2007) Retrieved from: http://www. machine learning. International Journal of Artificial
ets.org Intelligence in Education, 9:128-158.
Intellimetric (2007) Retrieved from: http://www.van- Beck, J., Stern, M. & Haugsjaa, E. (1996) Applications
tagelearning.com/intellimetric/ of AI in Education. Crossroads, 3(1):11-15.
Soller, A., Martinez, A., Jermann, P., & Muehlenbrock, Nwana, H.S. (1996) Software Agents: An Overview.
M. (2005) From Mirroring to Guiding: A Review of Knowledge Engineering Review. 11(2): 205-244.
State of the Art Technology for Supporting Collab-
Sycara, K.P. (1998) Multiagent Systems. AI Magazine,
orative Learning. International Journal of Artificial
19(2): 79-92.
Intelligence in Education, 15:261-290.
Jafari, A. (2002) Conceptualizing intelligent agents for
Barros, B. & Verdejo, M.F. (2000) Analysing student
teaching and learning. Educause Quaterly, 25(3):28-
interaction processes in order to improve collabora-
34.
tion: the DEGREE approach. International Journal of
Artificial Intelligence in Education, 11:221-241. Snchez, E., Lama, M., Amorim, R., Riera, A., Vila, J &
Barro, S. (2003) A multi-tiered agent-based architecture
Berque, D., Johnson, D., Hutcheson, A., Jovanovic,
for a cooperative learning environment. Proceedings
L., Moore, K., Singer, C. & Slattery, K. (2000) The
of IEEE Euromicro Conference on Parallel and Dis-
design of an interface for student note annotation in a
tributed Processing (PDP 2003), 2003.
networked electronic classroom. Journal of Network
and Computer Applications, 23(2):77-91. Gmez-Prez, A., Fernndez-Lpez, M. & Corcho, O.
(2004) Ontological Engineering, Springer Verlag.
Schnitzler, P. (2004) Becker bets on education software
startup. The Indianapolis Business Journal, 24(44). Kabel, S., Wielinga, B. & de How, R. (1999) Ontolo-
Retrieved from: http://www.dyknowvision.com/news/ gies for indexing Technical Manuals for Instruction.
articles/BeckerBetsOnEdSoftware.html Proceedings of the AIED-Workshop on Ontologies
for Intelligent Educational Systems, LeMans, France,
Michael, D. & Chen, S. (2005) Serious Games: Games
44-53.
That Educate, Train, and Inform. Course Technology
PTR. Inaba, A., Tamura, T., Ohkubo, R., Ikeda, M., Mizo-
guchi, R. & Toyoda, J. (2001) Design and Analysis of
Corti, K. (2006) Gamesbased Learning: a serious busi-
Learners Interaction based on Collaborative Learning
ness application. Report on PixelLearning. Retrieved
Ontology. Proceedings of the Second European Confer-
from: http://www.pixelearning.com/docs/serious-
ence on Computer-Supported Collaborative Learning
gamesbusinessapplications.pdf
(Euro-CSCL2001), 308-315.
Stokes, B. (2005) Videogames have changed: time to
Barros, B., Verdejo, M.F., Read, T. & Mizoguchi, R.
consider Serious Games? The Development Education
(2002) Applications of a Collaborative Learning Ontol-
Journal, 11(2).
AI and Education
ogy. Proceedings of the Second Mexican International Game-Based Learning: A new type of learning that
Conference on Artificial Intelligence (MICAI 2002), combines educational content and computer games in A
301-310. order to improve the satisfaction and performance of
students when acquiring new knowledge and skills.
Brase, J. & Nejdl, W. (2004) Ontologies and Metadata
for eLearning. Handbook on Ontologies, Springer- Intelligent Tutoring Systems: A computer program
Verlag. that provides personalized/adaptive instruction to stu-
dents without the intervention of human beings.
Amorim, R., Lama, M., Snchez, E., Riera, A. & Vila,
X.A. (2006) A Learning Design Ontology based on the Ontologies: A set of concepts within a domain
IMS Specification. Journal of Educational Technology that capture and represent consensual knowledge in a
& Society, 9(1):38-57. generic way, and that they may be reused and shared
across software applications.
Koch, C. (2006) Christof Koch forecast the future. New
Scientist. Retrieved from: http://www.newscientist. Software Agents: Software entities, such as
com/channel/opinion/science-forecasts/dn10626- software programs or robots, characterized by their
christof-koch-forecasts-the-future.html autonomy, cooperation and learning capabilities.
Student Models: Representation of student be-
KEy TERMS havior and degree of competence in terms of existing
background knowledge about a domain.
Automatic Evaluation Systems: Applications
focused on evaluating the strengths and weaknesses
of students in different learning activities through as-
sessment tests.
Computer Supported Collaborative Learning
(CSCL): A research topic on supporting collaborative
learning methodologies with the help of computers and
collaborative tools.
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
AI and Rubble-Mound Breakwater Stability
ity is studied in Mase et al.s (1995) pioneering work, on the seaward and leeward sides, respectively. The
focusing on a particular stability formula. Medina et armour layer consists in turn of two layers of stones A
al. (2003) train and test an Artificial Neural Network with a weight W=69 g 10%; those in the upper layer
with stability data from six laboratories. The inputs are painted in blue, red and black following horizontal
are the relative wave height, the Iribarren number and bands, while those in the lower layer are painted in
a variable representing the laboratory. Kim and Park (2005) compare different ANN models on an analysis revolving around one empirical stability formula, as did Mase et al. (1995). Yagci et al. (2005) apply different kinds of neural networks and fuzzy logic, characterising the waves by their height, period and steepness.

PHYSICAL MODEL AND ANN MODEL

The Artificial Neural Networks were trained and tested on the basis of laboratory tests carried out in a wave flume of the CITEEC Laboratory, University of La Coruña. The flume section is 4 m wide and 0.8 m high, with a length of 33 m (Figure 1). Waves are generated by means of a piston-type paddle, controlled by an Active Absorption System (AWACS) which ensures that the waves reflected by the model are absorbed at the paddle.

The model represents a typical three-layer rubble-mound breakwater in 15 m of water, crowned at +9.00 m, at a 1:30 scale. Its slopes are 1:1.50 and 1:1.25 [...] white, in order to easily identify after a test the damaged areas, i.e., the areas where the upper layer has been removed. The filter layer is made up of a gravel with a median size D50 = 15.11 mm and a thickness of 4 cm. Finally, the core consists of a finer gravel, with D50 = 6.95 mm, D15 = 5.45 mm, and D85 = 8.73 mm, and a porosity n = 42%. The density of the stones and gravel is ρ = 2700 kg/m³.

Table 1. Relative water depth (kh), wave period (T), and separation between sensors S2, S3 and S4 in the stability tests

Waves were measured at six different stations along the longitudinal, or x-axis, of the flume. With the origin of x located at the rest position of the wave paddle, the first wave gauge, S1, was located at x=7.98 m. A group of three sensors, S2, S3 and S4, was used to separate the incident and the reflected waves. The central wave gauge, S3, was placed at x=12.28 m, while the position of the others, S2 and S4, was varied according to the wave generation period of each test (Table 1). Another wave gauge, S5, was located 25 cm in front of the model breakwater toe, at x=13.47 m, and 16 cm to the right (as seen from the wave paddle) of the flume centreline, so as not to interfere with the video recording of the
AI and Rubble-Mound Breakwater Stability
tests. Finally, a wave gauge (S6) was placed to the lee of the model breakwater, at x=18.09 m.

Both regular and irregular waves were used in the stability tests. This article is concerned with the eight regular wave tests, carried out with four different wave periods. The water depth in the flume was kept constant throughout the tests (h=0.5 m). Each test consisted of a number of wave runs with a constant value of the wave period T, related to the wavenumber k by

    T = 2π [g k tanh(kh)]^(−1/2),

where g is the gravitational acceleration. The wave periods and relative water depths (kh) of the tests are shown in Table 1.

Each wave run consisted of 200 waves. In the first run of each test, the generated waves had a model height H=6 cm (corresponding to a wave height in the prototype Hp=1.80 m); in the subsequent runs, the wave height was increased in steps of 1 cm (7 cm, 8 cm, 9 cm, etc.), so that the model breakwater was subject to ever more energetic waves.

Four damage levels (Losada et al., 1986) were used to characterize the stability situation of the model breakwater after each wave run:

(0) No damage. No armour units have been moved from their positions.
(1) Initiation of damage. Five or more armour units have been displaced.
(2) Iribarren damage. The displaced units of the armour's first (outer) layer have left uncovered an area of the second layer large enough for a stone to be removed by waves.
(3) Initiation of destruction. The first unit of the armour's second layer has been removed by wave action.

As the wave height was increased through a test, the damage level also augmented from the initial no damage to initiation of damage, Iribarren damage, and eventually initiation of destruction, at which point the test was terminated and the model rebuilt for the following test. The number of wave runs in a test varied from 10 to 14.

The foregoing damage levels provide a good semi-quantitative assessment of the breakwater stability condition. However, the following nondimensional damage parameter is more adequate for the Artificial Neural Network model:

    S = n D50 / ((1 − p) b)

where D50 is the median size of the armour stones, p is the porosity of the armour layer, b is the width of the model breakwater, and n is the number of units displaced after each wave run. In this case, D50 = 2.95 cm, p = 0.40, and b = 50 cm.

The incident wave height was nondimensionalized by means of the zero-damage wave height of the SPM (1984) formulation,

    H0 = [ W KD (ρr/ρw − 1)³ cot α / ρr ]^(1/3)

where KD=4 is the stability coefficient, ρw=1000 kg/m³ is the water density (freshwater was used in the laboratory tests), and α is the breakwater slope angle. With these values, H0 = 9.1 cm. The nondimensional incident wave height is given by

    H* = H / H0

where H stands for the incident wave height.

Most of the previous applications of Artificial Neural Networks in Civil Engineering use multilayer feedforward networks trained with the backpropagation algorithm (Freeman and Skapura, 1991; Johansson et al., 1992), which will also be employed in this study; their main advantage lies in their generalisation capabilities. Thus this kind of network may be used, for instance, to predict the armour damage that a model breakwater will sustain under certain conditions, even if these conditions were not exactly part of the data set with which the network was trained. However, the parameters describing the conditions (e.g., wave height and period) must be within the parameter ranges of the stability tests with which the ANN was trained.
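The quantities just defined can be checked numerically. The sketch below is our own (not from the article); variable names are ours, and the median armour stone weight is assumed to be W = ρr·D50³, an assumption that reproduces the stated H0 = 9.1 cm:

```python
import math

# Laboratory values reported in the text; variable names are ours.
D50 = 0.0295        # median armour stone size [m]
p = 0.40            # armour layer porosity
b = 0.50            # model breakwater width [m]
rho_r = 2700.0      # stone density [kg/m^3]
rho_w = 1000.0      # freshwater density [kg/m^3]
KD = 4.0            # stability coefficient
cot_alpha = 1.5     # breakwater slope 1:1.50
g = 9.81            # gravitational acceleration [m/s^2]
h = 0.5             # water depth in the flume [m]

# Dispersion relation: wave period for an example relative depth kh = 1.0
kh = 1.0
T = 2.0 * math.pi / math.sqrt(g * (kh / h) * math.tanh(kh))

# Nondimensional damage after a run in which n units were displaced
n = 5               # example: the threshold for "initiation of damage"
S = n * D50 / ((1.0 - p) * b)

# Zero-damage wave height (Hudson/SPM form); W assumed as rho_r * D50^3
W = rho_r * D50 ** 3
H0 = (W * KD * (rho_r / rho_w - 1.0) ** 3 * cot_alpha / rho_r) ** (1.0 / 3.0)
print(round(H0 * 100, 1))   # zero-damage wave height in cm
```

With these values the script prints 9.1, matching the H0 = 9.1 cm quoted above.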
In this case, the results from the stability tests of the model rubble-mound breakwater described above were used to train and test the Artificial Neural Network. The eight stability tests comprised 96 wave runs. The input to the network was the nondimensional wave height (H*) and the relative water depth (kh) of a wave run, and the output, the resulting nondimensional damage parameter (S). Data from 49 wave runs, corresponding to the four stability tests T20, T21, T22, and T23, were used for training the network, while data from 46 wave runs, pertaining to the remaining four tests (T10, T11, T12, and T13), were used for testing it. This distribution of data made sure that each of the four wave generation periods (Table 1) was present in both the training and the testing data sets.

Figure 2. Regression analysis. Complete data set.

First, an Artificial Neural Network with 10 sigmoid neurons in the hidden layer and a linear output layer was trained and tested 10 times. The ANN was trained by means of the Bayesian Regularisation method (MacKay, 1992), known to be effective in avoiding overfitting.
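The architecture just described can be sketched as follows. This is our own minimal illustration, not the authors' code: the two inputs (H*, kh), the ten sigmoid hidden units and the linear output match the study, but the data are synthetic and a plain L2 weight penalty stands in for MacKay's Bayesian Regularisation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in data (the real wave runs are not reproduced here):
# inputs are (H*, kh); the target is a smooth toy damage curve, standardised.
X = rng.uniform([0.5, 0.5], [2.0, 2.5], size=(60, 2))
y = (X[:, 0] ** 2 / X[:, 1])[:, None]
y = (y - y.mean()) / y.std()

# 2 inputs -> 10 sigmoid hidden units -> 1 linear output, as in the study
W1 = rng.normal(0.0, 0.5, (2, 10)); b1 = np.zeros(10)
W2 = rng.normal(0.0, 0.5, (10, 1)); b2 = np.zeros(1)
lr, alpha = 0.05, 1e-4    # step size; L2 penalty in place of Bayesian Regularisation

for _ in range(5000):
    hidden = sigmoid(X @ W1 + b1)          # hidden layer activations
    out = hidden @ W2 + b2                 # linear output layer
    err = out - y
    # Backpropagation of the (half) mean squared error plus the L2 penalty
    gW2 = hidden.T @ err / len(X) + alpha * W2
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * hidden * (1.0 - hidden)
    gW1 = X.T @ dh / len(X) + alpha * W1
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float(((sigmoid(X @ W1 + b1) @ W2 + b2 - y) ** 2).mean())
```

After training, `mse` is well below the variance of the standardised target, i.e. the network has learned more than the mean predictor.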
The average MSE values were 0.2880 considering all the data, 0.2224 for the training data set, and 0.3593 for the testing data set. The standard deviations of the MSE values were 5.9651×10⁻¹⁰, 9.0962×10⁻¹⁰, and 7.7356×10⁻¹⁰ for the complete data set, the training and the testing data sets, respectively. Increasing the number of neural units in the hidden layer to 15 did not produce any significant improvement in the average MSE values (0.2879, 0.2222 and 0.3593 for all the data, the training data set and the testing data set, respectively), so the former Artificial Neural Network, with 10 neurons in the hidden layer, was retained.

The following results correspond to a training and testing run of this ANN with a global MSE of 0.2513. The linear regression analysis indicates that the ANN data fit very well to the experimental data over the whole range of the nondimensional damage parameter S. In effect, the correlation coefficient is 0.983, and the equation of the best linear fit, y = 0.938x − 0.00229, is very close to that of the diagonal line y = x (Figure 2).

The results obtained with the training data set (stability tests T20, T21, T22 and T23) show an excellent agreement between the ANN model and the physical model (Figure 3). In three of the four tests (T20, T22 and T23) the ANN data mimic the measurements on the model breakwater almost to perfection. In test T21, the physical model experiences an abrupt increase in the damage level at H* = 1.65, which is slightly softened by the ANN model. The MSE value is 0.1441.

The testing data set also comprised four stability tests (T10, T11, T12 and T13). The inherent difficulty of the problem is apparent in test T11 (Figure 4), in which the nondimensional damage parameter (S) does not increase in the wave run at H* = 1.54, but suddenly soars by about 100% in the next wave run, at H* = 1.65. Such differences from one wave run to the next are practically impossible to capture by the ANN model, given that the inputs to the ANN model either vary only slightly, by less than 7% in this case (the nondimensional wave height, H*), or do not vary at all (the relative water depth, kh). It should be remembered that, when computing the damage after a given wave run, the ANN does not have any information about the damage level before that wave run, unlike the physical model. Yet the ANN performs well, yielding an MSE value of 0.3678 with the testing data set.

FUTURE TRENDS

In this study, results from stability tests carried out with regular waves were used. Irregular wave tests should also be analyzed by means of Artificial Intelligence, and it is the authors' intention to do so in the future. Breakwater characteristics are another important aspect of the problem. The ANN cannot extrapolate beyond the ranges of wave and breakwater characteristics on which it was trained. The stability tests used for this
Figure 3. ANN and physical model results for the stability tests T20, T21, T22 and T23 (training data set)

Figure 4. ANN and physical model results for the stability tests T10, T11, T12 and T13 (testing data set)
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Artificial Intelligence for Information Retrieval
This weight is determined by the frequency of a stem within the text of a document (Savoy, 2003).

In multimedia retrieval, the context is essential for the selection of a form of query and document representation. Different media representations may be matched against each other or transformations may become necessary (e.g. to match terms against pictures or spoken language utterances against documents in written text).

As information retrieval needs to deal with vague knowledge, exact processing methods are not appropriate. Vague retrieval models like the probabilistic model are more suitable. Within these models, terms are provided with weights corresponding to their importance for a document. These weights mirror different levels of relevance.

The results of current information retrieval systems are usually sorted lists of documents where the top results are more likely to be relevant according to the system. In some approaches, the user can judge the documents returned to him and tell the system which ones are relevant for him. The system then re-sorts the result set. Documents which contain many of the words present in the relevant documents are ranked higher. This relevance feedback process is known to greatly improve the performance. Relevance feedback is also an interesting application for machine learning. Based on human decisions, the optimization step can be modeled with several approaches, e.g. with rough sets (Singh & Dey, 2005). In Web environments, a click is often interpreted as an implicit positive relevance judgment (Joachims & Radlinski, 2007).

Advanced Representation Models

In order to represent documents in natural language, the content of these documents needs to be analyzed. This is a hard task for computer systems. Robust semantic analysis for large text collections or even multimedia objects has yet to be developed. Therefore, text documents are represented by natural language terms mostly without syntactic or semantic context. This is often referred to as the bag-of-words approach. These keywords or terms can only imperfectly represent an object because their context and relations to other terms are lost.

However, great progress has been made and systems for semantic analysis are getting competitive. Advanced syntactic and semantic parsing for robust processing of mass data has been derived from computational linguistics (Hartrumpf, 2006).

For application and domain specific knowledge, another approach is taken to improve the representation of documents. The representation scheme is enriched by exploiting knowledge about concepts of the domain (Lin & Demner-Fushman, 2006).

Match Between Query and Document

Once the representation has been derived, a crucial aspect of an information retrieval system is the similarity calculation between query and document representation. Most systems use mathematical similarity functions such as the Cosine. The decision for a specific function is based on heuristics or empirical evaluations. Several approaches use machine learning for long-term optimization of the matching between term and document. E.g., one approach applies a genetic algorithm to adapt a weighting function to a collection (Almeida et al., 2007).

Neural networks have been applied widely in IR. Several network architectures have been applied for retrieval tasks; most often the so-called spreading activation networks are used. Spreading activation networks are simple Hopfield-style networks; however, they do not use the learning rule of Hopfield networks. They typically consist of two layers representing terms and documents. The weights of connections between the layers are bi-directional and initially set according to the results of the traditional indexing and weighting algorithms (Belkin, 2000). The neurons corresponding to the terms of the user's query are activated in the term layer and activation spreads along the weights into the document layer and back. Activation represents relevance or interest and reaches potentially relevant terms and documents. The most highly activated documents are presented to the user as result. A closer look at the models reveals that they very much resemble the traditional vector space model of Information Retrieval (Mandl, 2000). It is not until after the second step that the associative nature of the spreading activation process leads to results different from a vector space model. The spreading activation networks successfully tested with mass data do not take advantage of this associative property. In some systems the process is halted after only one step from the term layer into the document layer, whereas others make one more step
back to the term layer to facilitate learning (Kwok & Grunfeld, 1996).

Queries in information retrieval systems are usually short and contain few words. Longer queries have a higher probability to achieve good results. As a consequence, systems try to add good terms to a query entered by a user. Several techniques have been applied. Either these terms are taken from top ranked documents or terms similar to the original ones are used. Another technique is to use terms from documents from the same category. For this task, classification algorithms from machine learning are used (Sebastiani, 2002).

Link analysis applies well known measures from bibliometric analysis to the Web. The number of links pointing to a Web page is used as an indicator for its quality (Borodin et al., 2005). PageRank assigns an authority value to each Web page which is primarily a function of its back links. Additionally, it assumes that links from pages with high authority should be weighed higher and should result in a higher authority for the receiving page. To account for the different values each page has to distribute, the algorithm is carried out iteratively until the result converges (Borodin et al., 2005). Machine Learning approaches complement link analysis. Decisions of humans about the quality of Web pages are used to determine design features of these pages which are good indicators of their quality. Machine learning models are applied to determine the quality of pages not judged yet (Mandl, 2006; Ivory & Hearst, 2002).

Learning from users has been an important strategy to improve systems. In addition to the content, artificial intelligence methods have been used to improve the user interface.

Value Added Components for User Interfaces

Several researchers have implemented information retrieval systems based on the Kohonen self-organizing map (SOM), a neural network model for unsupervised classification. They provide an associative user interface where neighborhood of documents expresses a semantic relation. Implementations for large collections can be tested on the internet (Kohonen, 1998). The SOM consists of a usually two-dimensional grid of neurons, each associated with a weight vector. Input documents are classified according to the similarity between the input pattern and the weight vectors, and the algorithm adapts the weights of the winning neuron and its neighbors. In that way, neighboring clusters have a high similarity.

The information retrieval applications of SOMs classify documents and assign the dominant term as name for the cluster. For real world large scale collections, one two-dimensional grid is not sufficient. It would be either too big, or each node would contain too many documents. Neither would be helpful for users; therefore, a layered architecture is adopted. The highest layer consists of nodes which represent clusters of documents. The documents of these nodes are again analyzed by a SOM. For the user, the system consists of several two-dimensional maps of terms where similar terms are close to each other. After choosing one node, he may reach another two-dimensional SOM.

The information retrieval paradigm for the SOM is browsing and navigating between layers of maps. The SOM seems to be a very natural visualization. However, the SOM approach has some serious drawbacks:

- The interface for interacting with several layers of maps makes the system difficult to browse.
- Users of large text collections need primarily search mechanisms, which the SOM itself does not offer.
- The similarity of the document collection is reduced to two dimensions, omitting many potentially interesting aspects.
- The SOM unfolds its advantages for human-computer interaction better for a small number of documents. A very encouraging application would be the clustering of the result set. The neurons would fit on one screen, the number of terms would be limited and therefore the reduction to two dimensions would not omit so many aspects.

User Classification and Personalization

Adaptive information retrieval approaches intend to tailor the results of a system to one user and his interests and preferences. The most popular representation scheme relies on the representation scheme used in information retrieval, where a document-term matrix stores the importance or weight of each term for each document. When a term appears in a document, this weight should be different from zero. User interest can
also be stored like a document. Then the interest is a vector of terms. These terms can be ones that a user has entered or selected in a user interface, or which the system has extracted from documents for which the user has shown interest by viewing or downloading them (Agichtein et al., 2006).

An example for such a system is UCAIR, which can be installed as a browser plugin. UCAIR relies on a standard web search engine to obtain a search result and a primary ranking. This ranking is then modified by re-ranking the documents based on implicit feedback and a stored user interest profile (Shen et al., 2005).

Most systems use this method of storing the user interest in a term vector. However, this method has several drawbacks. The interest profile may not be stable, and the user may have a variety of diverging interests for work and leisure which are mixed in one profile.

Advanced individualization techniques personalize the underlying system functions. The results of empirical studies have shown that relevance feedback is an effective technique to improve retrieval quality. Learning methods for information retrieval need to extend the range of relevance feedback effects beyond the modification of the query in order to achieve long-term adaptation to the subjective point of view of the user. The mere change of the query often results in improved quality; however, the information is lost after the current session.

Some systems change the document representation according to the relevance feedback information. In a vector space metaphor, the relevant documents are moved toward the query representation. This approach also comprises some problems. Because only a fraction of the documents are affected by the modifications, the basic data from the indexing process is changed to a somewhat heterogeneous state. The original indexing result is not available anymore.

Certainly, this technique is inadequate for fusion approaches where several retrieval methods are combined. In this case, several basic representations would need to be changed according to the influence of the corresponding methods on the relevant documents. The indexes are usually heterogeneous, which is often considered an advantage of fusion approaches. A high computational overload would be the consequence.

The MIMOR (Multiple Indexing and Method-Object Relations) approach does not rely on changes to the document or the query representation when processing relevance feedback information for personalization. Instead, it focuses on the central aspect of a retrieval function, the calculation of the similarity between document and query. Like other fusion methods, MIMOR accepts the results of individual retrieval systems as from a black box. These results are fused by a linear combination which is stored during many sessions. The weights for the systems change through learning. They adapt according to relevance feedback information provided by users and create a long-term model for future use. That way, MIMOR learns which systems were successful in the past (Mandl & Womser-Hacker, 2004).

FUTURE TRENDS

Information retrieval systems are applied in more and more complex and diverse environments. Searching e-mail, social computing collections and other specific domains poses new challenges which lead to innovative systems. These retrieval applications require thorough and user oriented evaluation. New evaluation measures and standardized test collections are necessary to achieve reliable evaluation results.

In user adaptation, recommendation systems are an important trend for future improvement. Recommendation systems need to be seen in the context of social computing applications. System developers face the growth of user generated content, which allows new reasoning methods.

New applications like question answering, relying on more intelligent processing, can be expected to gain more market share in the near future (Hartrumpf, 2006).

CONCLUSION

Knowledge management is of main importance for the information society. Documents written in natural language contain an important share of the knowledge available. Consequently, retrieval is crucial for the success of knowledge management systems. AI technologies have been widely applied in retrieval systems. Exploiting knowledge more efficiently is a major research field. In addition, user oriented value added systems require intelligent processing and machine learning in many forms.

An important future trend for AI methods in IR will be the context specific adaptation of retrieval methods.
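The MIMOR-style fusion with learned weights described above can be sketched as follows. This is our own illustration, not the published implementation: system names, scores and the update rule are hypothetical, but the idea is the same, i.e. results of individual black-box systems are linearly combined, and relevance feedback nudges the weights toward the more successful system:

```python
# MIMOR-style linear fusion sketch (hypothetical systems and update rule).
def fuse(score_lists, weights):
    """Combine per-system document scores with a weighted sum."""
    fused = {}
    for scores, w in zip(score_lists, weights):
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return fused

def update_weights(weights, score_lists, relevant, lr=0.1):
    """Systems that scored the relevant document highly gain weight."""
    new = [w + lr * scores.get(relevant, 0.0)
           for w, scores in zip(weights, score_lists)]
    total = sum(new)
    return [w / total for w in new]    # renormalise so the weights sum to 1

# Hypothetical outputs of two retrieval systems (normalised scores per document)
sys_a = {"d1": 0.9, "d2": 0.2}
sys_b = {"d1": 0.1, "d2": 0.8}

weights = [0.5, 0.5]                                   # initial combination weights
weights = update_weights(weights, [sys_a, sys_b], relevant="d2")
fused = fuse([sys_a, sys_b], weights)                  # system B now counts more
```

After the feedback step the weight of the system that ranked the relevant document highly has grown, so the fused ranking now places that document first: the long-term model in miniature.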
Machine learning can be applied to find optimized functions for collections or queries.

REFERENCES

Agichtein, E., Brill, E., Dumais, S., & Ragno, R. (2006). Learning user interaction models for predicting web search result preferences. Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), Seattle. ACM Press. 3-10.

de Almeida, H.M., Gonçalves, M.A., Cristo, M., & Calado, P. (2007). A combined component approach for finding collection-adapted ranking functions based on genetic programming. Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), Amsterdam. ACM Press. 399-406.

Belkin, R. (2000). Finding out about: a Cognitive Perspective on Search Engine Technology and the WWW. Cambridge: Cambridge University Press.

Brusilovsky, P., Kobsa, A. & Nejdl, W. (eds.) (2007). The adaptive Web: Methods and strategies of Web personalization. Heidelberg: Springer.

Boughanem, M., & Soulé-Dupuy, C. (1998). Mercure at TREC6. In Voorhees, E. & Harman, D. (eds.). The sixth Text Retrieval Conference (TREC-6). NIST Special Publication 500-240. Gaithersburg, MD.

Borodin, A., Roberts, G., Rosenthal, J. & Tsaparas, P. (2005). Link analysis ranking: algorithms, theory, and experiments. ACM Transactions on Internet Technology (TOIT) 5(1), 231-297.

Hartrumpf, S. (2006). Extending Knowledge and Deepening Linguistic Processing for the Question Answering System InSicht. Accessing Multilingual Information Repositories, 6th Workshop of the Cross-Language Evaluation Forum, CLEF. [LNCS 4022] Springer. 361-369.

Ivory, M. & Hearst, M. (2002). Statistical Profiles of Highly-Rated Sites. ACM CHI Conference on Human Factors in Computing Systems. ACM Press. 367-374.

Joachims, T., & Radlinski, F. (2007). Search engines that learn from implicit feedback. IEEE Computer 40(8), 34-40.

Kohonen, T. (1998). Self-organization of very large document collections: state of the art. In Proceedings of the 8th International Conference on Artificial Neural Networks. Springer, 1, 65-74.

Kwok, K., & Grunfeld, L. (1996). TREC-4 ad-hoc, routing retrieval and filtering experiments using PIRCS. In Harman, D. (ed.). The fourth Text Retrieval Conference (TREC-4). NIST Special Publication 500-236. Gaithersburg, MD.

Lin, J., & Demner-Fushman, D. (2006). The role of knowledge in conceptual retrieval: a study in the domain of clinical medicine. Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), Seattle. ACM Press. 99-106.

Mandl, T. (2000). Tolerant Information Retrieval with Backpropagation Networks. Neural Computing & Applications 9(4), 280-289.

Mandl, T. (2006). Implementation and evaluation of a quality based search engine. In Proceedings of the 17th ACM Conference on Hypertext and Hypermedia (HT 06), Odense, Denmark. ACM Press. 73-84.

Mandl, T., & Womser-Hacker, C. (2004). A Framework for long-term Learning of Topical User Preferences in Information Retrieval. New Library World, 105(5/6), 184-195.

Savoy, J. (2003). Cross-language information retrieval: experiments based on CLEF 2000 corpora.

Sebastiani, F. (2002). Machine Learning in Automated Text Categorization. ACM Computing Surveys 34(1), 1-47.

Shen, X., Tan, B. & Zhai, C. (2005). Context-sensitive information retrieval using implicit feedback. Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), Seattle, WA. ACM Press. 43-50.

Singh, S., & Dey, L. (2005). A rough-fuzzy document grading system for customized text information retrieval. Information Processing & Management 41(2), 195-216.

Zhang, D., Chen, X. & Lee, W. (2005). Text classification with kernels on the multinomial manifold. Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), Seattle. ACM Press. 266-273.
AI in Computer-Aided Diagnosis
cited: rule-based reasoning, artificial neural networks, Bayesian networks, and case-based reasoning. To these, statistical methods, genetic algorithms and decision trees can be added.

A problem that affects most applications of pattern recognition is data dimensionality. The dimensionality is associated with the number of attributes that represent a pattern, that is, the dimension of the search space. When this space contains only the most relevant attributes, the classification process is faster and consumes fewer processing resources (Jain et al., 2000), and also allows for greater precision of the classifier.

In problems of medical image processing, the importance of dimensionality reduction is accentuated, because the images to be processed are normally composed of a very great number of pixels, used as basic attributes in the classification.

Feature extraction is a common approach to achieve dimensionality reduction. In general terms, an extraction algorithm creates a new set of attributes from transformations or combinations of the original set.

Several methods have been studied to perform feature extraction and, consequently, dimensionality reduction, such as statistical methods, methods based on signal theory, and artificial neural networks (Verikas & Bacauskiene, 2002).

As examples of the use of artificial neural networks in support of medical diagnosis, we can cite the research of Papadopoulos et al. (2005) and André & Rangayyan (2006).

Feature Extraction with ANNs

Feature extraction with the use of Artificial Neural Networks functions basically as a selection of characteristics that represent the original data set.

This selection of characteristics corresponds to a process in which a data set is transformed into a feature space that, in theory, describes the same information as the original data space. However, the transformation is designed in such a way that the data set is represented by a reduced set of effective features, keeping most of the information intrinsic to the data; that is, the original data set undergoes a significant dimensionality reduction (Haykin, 1999).

Dimensionality reduction is extremely useful in applications that involve digital image processing, which normally depend on a very high number of data points to be manipulated.

In summary, feature extraction with ANNs transforms the original set of pixels into a map, of reduced dimensions, that represents the original image without a significant loss of information.

For this function, self-organizing neural networks are normally used, for example Kohonen's Self-Organizing Map (SOM).

The self-organizing map seeks to transform a given pattern into a two-dimensional map, following a certain topological order (Haykin, 1999). The elements that compose the map are distributed in a single layer, forming a grid (Figure 1).
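The mapping just described can be sketched as follows. This is our own minimal SOM, not the study's implementation; the grid size, learning rate and neighbourhood radius are illustrative. Each grid element holds a weight vector with the dimension of the input; the closest element wins, and it and its grid neighbours are pulled toward the input, which preserves the topological order:

```python
import numpy as np

rng = np.random.default_rng(1)
grid_h, grid_w, dim = 4, 4, 3                  # a 4x4 map of 3-dimensional weights
weights = rng.random((grid_h, grid_w, dim))    # random initial weight vectors

def train_step(x, weights, lr=0.3, radius=1.0):
    """One SOM update: find the winning element and pull its neighbourhood toward x."""
    dists = np.linalg.norm(weights - x, axis=2)
    wi, wj = np.unravel_index(np.argmin(dists), dists.shape)   # winning element
    for i in range(grid_h):
        for j in range(grid_w):
            grid_dist = np.hypot(i - wi, j - wj)               # distance on the grid
            influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
            weights[i, j] += lr * influence * (x - weights[i, j])
    return wi, wj

for x in rng.random((200, dim)):               # toy inputs in [0, 1)^3
    winner = train_step(x, weights)
```

After training, each input pattern is represented by the grid coordinates of its winning element, so the high-dimensional data are summarised by a small two-dimensional map, which is the dimensionality-reduction role the SOM plays above.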
All the elements of the grid receive the input signal of all variables, associated with their respective weights. The output value of each element is computed by a given function, on the basis of the connection weights, and it is used to identify the winning element.

Mathematically, each element of the grid is represented by a vector composed of the connection weights, with the same dimension as the input space; that is, the number of elements that compose the vector corresponds to the number of input variables of the problem (Haykin, 1999).

Methodology

As an application example, a self-organizing neural network for the feature extraction of images of chest radiographs was developed, aiming at the characterization of normal and abnormal patterns.

Each original image was divided into 12 parts, based on the anatomical division normally used in the radiologist's diagnosis. Each part is formed by approximately 250,000 pixels.

With the use of the proposed self-organizing network, a reduction to only 240 representative elements was obtained, with satisfactory results in the final pattern classification.

A detailed description of the methodology can be found in (Ambrósio, 2007; Azevedo-Marques et al., 2007).

FUTURE TRENDS

The developed study shows the possibilities of application of self-organizing networks in feature extraction and dimensionality reduction; however, other types of neural networks can also be used for this purpose. New studies need to be carried out to compare the results and the adequacy of the methodology.

CONCLUSION

The contribution of Information Technology as a support tool for medical decision making is undeniable. Artificial Intelligence presents itself as a great source of important techniques to be used in this direction. It can be evidenced that the technique of artificial neural networks stands out for its great versatility and robustness, providing quite satisfactory results when well used and implemented.

The use of an automatic system of image analysis can assist the radiologist, when used as a tool of second opinion, or second reading, in the analysis of possibly inexact cases.

It is also observed that the use of the proposed methodology represents a significant gain in the image processing of chest radiographs, given their peculiar characteristics.

REFERENCES

Ambrósio, P.E. (2007). Self-Organizing Neural Networks in the Characterization of Interstitial Lung Diseases in Chest Radiographs. Thesis. Ribeirão Preto: School of Medicine of Ribeirão Preto, University of São Paulo. [In Portuguese]

André, T.C.S.S. & Rangayyan, R.M. (2006). Classification of breast masses in mammograms using neural networks with shape, edge-sharpness and texture features. Journal of Electronic Imaging. 15(1).

Azevedo-Marques, P.M. (2001). Diagnóstico Auxiliado por Computador na Radiologia. Radiologia Brasileira. 34(5), 285-293. [In Portuguese]

Azevedo-Marques, P.M., Ambrósio, P.E., Pereira-Junior, R.R., Valini, R.A. & Salomão, S.C. (2007). Characterization of Interstitial Lung Disease in Chest Radiographs using SOM Artificial Neural Network. International Journal of Computer Assisted Radiology and Surgery. 2 (suppl. 1), 368-370.

Doi, K. (2005). Current Status and Future Potential of Computer-Aided Diagnosis in Medical Imaging. The British Journal of Radiology. 78 (special issue), S3-S19.

Fenton, J.J., Taplin, S.H., Carney, P.A., Abraham, L., Sickles, E.A., D'Orsi, C., Berns, E.A., Cutter, G., Hendrick, R.E., Barlow, W.E. & Elmore, J.G. (2007). Influence of computer-aided detection on performance of screening mammography. New England Journal of Medicine. 356(14), 1399-1409.

Giger, M.L. (2002). Computer-Aided Diagnosis in Radiology. Academic Radiology. 9, 1-3.
AI in Computer-Aided Diagnosis
Gonzalez, R.C. & Woods, R.E. (2001). Digital Image Processing (2nd ed.). Reading, MA: Addison-Wesley.

Haykin, S. (1999). Neural Networks (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.

Jain, A.K., Duin, R.P.W. & Mao, J. (2000). Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence. 22(1), 4-37.

Jiang, Y., Nishikawa, R.M., Schmidt, R.A., Toledano, A.Y. & Doi, K. (2001). Potential of computer-aided diagnosis to reduce variability in radiologists' interpretations of mammograms depicting microcalcification. Radiology. 220(3), 787-794.

Kahn-Jr, C.E. (1994). Artificial Intelligence in Radiology: Decision Support Systems. RadioGraphics. 14(4), 849-861.

Kononenko, I. (2001). Machine Learning for Medical Diagnosis: History, State of the Art and Perspective. Artificial Intelligence in Medicine, 23, 89-109.

Papadopoulos, A., Fotiadis, D.I. & Likas, A. (2005). Characterization of clustered microcalcifications in digitized mammograms using neural networks and support vector machines. Artificial Intelligence in Medicine. 34(2), 141-150.

Theodoridis, S. & Koutroumbas, K. (2003). Pattern Recognition (2nd ed.). Amsterdam: Elsevier.

Thurfjell, E.L., Lernevall, K.A. & Taube, A.A.S. (1994). Benefit of Independent Double Reading in a Population-Based Mammography Screening Program. Radiology. 191, 241-244.

Verikas, A. & Bacauskiene, M. (2003). Feature Selection with Neural Networks. Pattern Recognition Letters. 23(11), 1323-1335.

KEY TERMS

Computer-Aided Diagnosis: Research area that encompasses the development of computational techniques and procedures to aid health professionals in the process of decision making for medical diagnosis.

Dimensionality Reduction: Finding a reduced data set with the capacity of mapping a bigger set.

Feature Extraction: Finding representative features of a given problem from samples with different characteristics.

Medical Images: Images generated by special equipment, used to aid medical diagnosis, e.g. X-ray images, computed tomography and magnetic resonance images.

Pattern Recognition: Research area that encompasses the development of automated methods and techniques for the identification and classification of samples into specific groups, in accordance with representative characteristics.

Radiological Diagnosis: Medical diagnosis based on the analysis and interpretation of patterns observed in medical images.

Self-Organizing Maps: Category of algorithms based on artificial neural networks that seek, by means of self-organization, to create a map of characteristics that represents the samples involved in a given problem.
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Artificial Neural Networks and Cognitive Modelling
of the medium frequency verbs a dip followed by a gradual improvement that resembled the U-shaped curve found in human performance.

THE STRENGTHS AND LIMITATIONS OF CONNECTIONIST COGNITIVE MODELLING

The models outlined above exhibit five typical features of connectionist models of cognition: (i) they provide an account that is related to and inspired by the operations of the brain; (ii) they can be used both to model mental processes and to simulate the actual behaviour involved; (iii) they can provide a good fit to the data from psychology experiments; (iv) the model, and its fit to the data, is achieved without explicit programming; and (v) they often provide new accounts of the data. We discuss these features in turn.

First there is the idea that a connectionist cognitive model is inspired by, and related to, the way in which brains work. Connectionism is based both on the alleged operation of the nervous system and on distributed computation. Neuron-like units are connected by means of weighted links, in a manner that resembles the synaptic connections between neurons in the brain. These weighted links capture the knowledge of the system; they may be arrived at either analytically or by training the system with repeated presentations of input-output training examples. Much of the interest in connectionist models of cognition was that they offered a new account of the way in which knowledge was represented in the brain. For instance, the behaviour of the past tense learning model can be described in terms of rule following, but its underlying mechanism does not contain any explicit rules. Knowledge about the formation of the past tense is distributed across the weights in the network.

Interest in brain-like computing was fuelled by a growing dissatisfaction with the classical symbolic processing approach to modelling mind and its relationship to the brain. Even though theories of symbol manipulation could account for many aspects of human cognition, there was concern about how such symbols might be learnt and represented in the brain. Functionalism (Putnam, 1975) explicitly insisted that details about how intelligence and reasoning were actually implemented were irrelevant. Concern about the manipulation of meaningless, ungrounded symbols is exemplified by Searle's Chinese Room thought-experiment (1980). Connectionism, by contrast, offered an approach that was based on learning, made little use of symbols, and was related to the way in which the brain worked. Arguably, one of the main contributions that connectionism has made to the study and understanding of mind has been the development of a shared vocabulary between those interested in cognition and those interested in studying the brain.

The second and third features relate to the way in which artificial neural nets can both provide a model of a cognitive process and simulate a task, and provide a good fit to the empirical data. In Cognitive Psychology, the emphasis had been on building models that could account for the empirical results from human subjects, but which did not incorporate simulations of experimental tasks. Alternatively, in Artificial Intelligence, models were developed that performed tasks in ways that resembled human behaviour, but which took little account of detailed psychological evidence. However, as in the two models described here, connectionist models both simulated the performance of the human tasks and were able to fit the data from psychological investigations.

The fourth feature is that of achieving the model, and the fit to the data, without explicit hand-wiring. It can be favourably contrasted to the symbolic programming methodology of Artificial Intelligence, where the model is programmed step by step, leaving room for ad hoc modifications and kludges. The fifth characteristic is the possibility of providing a novel explanation of the data. In their model of word pronunciation, Seidenberg and McClelland showed that their artificial neural network provided an integrated (single mechanism) account of data on both regular and exception words, where previously the old cognitive modelling conventions had forced an explanation in terms of a dual route. Similarly, the past-tense model was formulated as a challenge to rule-based accounts: although children's performance can be described in terms of rules, it was claimed that the model showed that the same behaviour could be accounted for by means of an underlying mechanism that does not use explicit rules.

In its glory days, connectionism's claims about novel explanations stimulated much debate. There was also much discussion of the extent to which connectionism could provide an adequate account of higher mental processes. Fodor and Pylyshyn (1988)
mounted an attack on the representational adequacy of connectionism. Connectionists retaliated, and in papers such as van Gelder's (1990) the argument was made that not only could they provide an account of the structure-sensitive processes underlying human language, but that connectionism did so in a novel manner: the eliminative connectionist position.

Now the dust has subsided, connectionist models do not seem as radically different to other modelling approaches as was once supposed. It was held that one of their strengths was their ability to model mental processes, simulate behaviour, and provide a good fit to data from psychology experiments without being explicitly programmed to do so. However, there is now greater awareness that decisions about factors such as the architecture of the net, the form its representations will take, and even the interpretation of its input and output, are tantamount to a form of indirect, or extensional, programming.

Controlling the content and presentation of the training sample is an important aspect of extensional programming. When Pinker and Prince (1988) criticised the past tense model, an important element of their criticisms was that the experimenters had unrealistically tailored the environment to produce the required results, and that the results were an artifact of the training data. Although the results indicated a U-shaped curve in the rate of acquisition, as occurs with children, Pinker and Prince argued that this curve occurred only because the net was exposed to the verbs in an unrealistically structured order. Further research has largely answered these criticisms, but it remains the case that selection of the input, and control of the way that it is presented to the net, affects what the net learns. A similar argument can be made about the selection of input representations.

In summary: there has been debate about the novelty of connectionism, and its ability to account for higher level cognitive processing. There is however general acknowledgement that the approach made a lasting contribution by indicating how cognitive processes could be implemented at the level of neurons. Despite this, the connectionist approach to cognitive modelling is no longer as popular as it once was. Possible reasons are considered below:

Difficult challenges: A possible reason for the diminished popularity of artificial neural nets is that, as Elman (2005) suggests, we have arrived at the point where the easy targets have been identified but the tougher problems remain. Difficult challenges to be met include the idea of scaling up models to account for wider ranges of phenomena, and building models that can account for more than one behaviour.

Greater understanding: As a result of our greater understanding of the operation and inherent limitations of artificial neural nets, some of their attraction has faded with their mystery. They have become part of the arsenal of statistical methods for pattern recognition, and much recent research on artificial neural networks has focused more on questions about whether the best level of generalisation has been efficiently achieved than on modelling cognition.

Also there is greater knowledge of the limitations of artificial neural nets, such as the problem of the catastrophic interference associated with backpropagation. Backpropagation performs impressively when all of the training data are presented to the net on each training cycle, but its results are less impressive when such training is carried out sequentially and a net is fully trained on one set of items before being trained on a new set. The newly learned information often interferes with, and overwrites, previously learned information. For instance, McCloskey and Cohen (1989) used backpropagation to train a net on the arithmetic problem of +1 addition (e.g. 1+1, 2+1, …, 9+1). They found that when they proceeded to train the same net to add 2 to a given number, it forgot how to add 1. Sequential training of this form results in catastrophic interference. Sharkey and Sharkey (1995) demonstrated that it is possible to avoid the problem if the training set is sufficiently representative of the underlying function, or there are enough sequential training sets. In terms of this example, if the function to be learned is both +1 and +2, then training sets that incorporate enough examples of each could lead to the net learning to add either 1 or 2 to a given number. However, this is at the expense of being able to discriminate between those items that have been learned from those that have not.
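McCloskey and Cohen trained a multilayer backpropagation network; the following stripped-down sketch (a single linear unit trained by gradient descent, an illustrative simplification rather than their actual model) reproduces the same sequential-training effect: after the second phase, the model answers every "+1" item as if it were a "+2" item.

```python
def train(w, b, pairs, lr=0.01, epochs=5000):
    # Gradient descent on mean squared error for the model y = w*x + b.
    for _ in range(epochs):
        dw = db = 0.0
        for x, y in pairs:
            err = (w * x + b) - y
            dw += 2 * err * x / len(pairs)
            db += 2 * err / len(pairs)
        w, b = w - lr * dw, b - lr * db
    return w, b

plus_one = [(x, x + 1) for x in range(1, 10)]   # 1+1, 2+1, ..., 9+1
plus_two = [(x, x + 2) for x in range(1, 10)]

w, b = train(0.0, 0.0, plus_one)      # phase 1: learn +1
error_before = abs((w * 3 + b) - 4)   # 3+1 recalled correctly
w, b = train(w, b, plus_two)          # phase 2: sequential training on +2
error_after = abs((w * 3 + b) - 4)    # 3+1 now answered as if it were 3+2
```

The weights that encoded "+1" are simply driven to the values that encode "+2", so the earlier knowledge is overwritten, which is the interference described above.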
This example is related to another limitation of artificial neural nets: their inability to extrapolate beyond their training set. Although humans can readily grasp the idea of adding one to any given number, it is not so straightforward to train the net to extrapolate beyond the data on which it is trained. It has been argued
(Marcus, 1998) that this inability of artificial neural nets trained using backpropagation to generalise beyond their training space provides a major limitation to the power of connectionist nets: an important one, since humans can readily generalise universal relationships to unfamiliar instances. Clearly there are certain aspects of cognition, particularly those to do with higher level human abilities, such as reasoning and planning, that are more difficult to capture within connectionist models.
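The extrapolation point can be made concrete with a small backpropagation net (an illustrative sketch with invented sizes and rates, not Marcus's own test): trained to reproduce y = x only on inputs in [-1, 1], a tanh network interpolates well, but its saturating hidden units cannot follow the relationship out to an input such as x = 10.

```python
import math, random

random.seed(0)
H = 8                                    # hidden units (arbitrary choice)
w1 = [random.gauss(0, 0.5) for _ in range(H)]
b1 = [0.0] * H
w2 = [random.gauss(0, 0.5) for _ in range(H)]
b2 = 0.0

def forward(x):
    hidden = [math.tanh(w1[i] * x + b1[i]) for i in range(H)]
    return sum(w2[i] * hidden[i] for i in range(H)) + b2, hidden

# Training data: y = x, but only sampled inside [-1, 1].
data = [(-1 + i / 10, -1 + i / 10) for i in range(21)]
lr = 0.05
for _ in range(30000):                   # plain stochastic backpropagation
    x, y = random.choice(data)
    out, hidden = forward(x)
    d_out = 2 * (out - y)
    b2 -= lr * d_out
    for i in range(H):
        d_h = d_out * w2[i] * (1 - hidden[i] ** 2)
        w2[i] -= lr * d_out * hidden[i]
        w1[i] -= lr * d_h * x
        b1[i] -= lr * d_h

in_range_error = abs(forward(0.5)[0] - 0.5)
extrapolation_error = abs(forward(10.0)[0] - 10.0)
```

Because every tanh unit is bounded, the network's output at x = 10 stays near the range it saw in training, whereas a person who has grasped "y equals x" applies it to any number.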
Changing zeitgeist: There is now an increased interest in more detailed modelling of brain function, and a concomitant dissatisfaction with the simplicity of cognitive models that often consisted of a small number of neurons connected in three rows (Hawkins, 2004). Similarly, there is greater impatience with the emphasis in connectionism on the biologically implausible backpropagation learning algorithm. At the same time, there is greater awareness of the role the body plays in cognition, and the relationships between the body, the brain, and the environment (e.g. Clark, 1997). Traditional connectionist models do not fit easily with the new emphasis on embodied cognition (e.g. Pfeifer and Scheier, 1999).

FUTURE TRENDS

One expected future trend will be to focus on the difficult challenges. It is likely that investigations will explore how models of isolated processes, such as learning the past tense of verbs, might fit into more general accounts of language learning, and of cognition. There are further questions to be addressed about the developmental origin of many aspects of cognition, and the origin and progression of developmental disorders.

Connectionist cognitive modelling is likely to change in response to the new zeitgeist. A likely scenario is that artificial neural nets will continue to be used for cognitive modelling, but not exclusively as they were before. They will continue to form part of hybrid approaches to cognition (Sun, 2003), in combination with symbolic methods. Similarly, artificial neural nets can be used in combination with evolutionary algorithms to form the basis of adaptive responses to the environment. Rather than training artificial neural nets, they can be adapted by means of evolutionary methods, and used as the basis for robotic controllers (e.g. Nolfi and Floreano, 2000). Such changes will ensure a future for connectionist modelling, and stimulate a new set of questions about the emergence of cognition in response to an organism's interaction with the environment.

CONCLUSION

In this article we have described two landmark connectionist cognitive models, and considered their characteristic features. We outlined the debates over the novelty and sufficiency of connectionism for modelling cognition, and argued that in some respects the approach shares features with the modelling approaches that preceded it. Reasons for a gradual waning of interest in connectionism were identified, and possible futures were discussed. Connectionism has had a strong impact on cognitive modelling, and although its relationship to the brain is no longer seen as a strong one, it provided an indication of the way in which cognitive processes could be accounted for in the brain. It is argued here that although the approach is no longer ubiquitous, it will continue to form an important component of future cognitive models, as they take account of the interactions between thought, brains and the environment.

REFERENCES

Clark, A. (1997) Being there: Putting brain, body and world together again. Cambridge, MA: MIT Press.

Elman, J.L. (2005) Connectionist models of cognitive development: Where next? Trends in Cognitive Sciences, 9, 111-11.

Fodor, J.A. & Pylyshyn, Z. (1988) Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3-71.

Hawkins, J. (2004) On intelligence. New York: Henry Holt and Company, Owl Books.

Marcus, G.F. (1998) Rethinking eliminative connectionism. Cognitive Psychology, 37, 243-282.

McClelland, J.L., Rumelhart, D.E. & the PDP Research Group (1986) Parallel Distributed Processing, Vol 2: Psychological and Biological Models. Cambridge, MA: MIT Press.
McCloskey, M. & Cohen, N.J. (1989) Catastrophic interference in connectionist networks: The sequential learning problem. The Psychology of Learning and Motivation, Vol 24, 109-165.

Nolfi, S. and Floreano, D. (2000) Evolutionary robotics: The biology, intelligence and technology of self-organising machines. Cambridge, MA: MIT Press.

Pinker, S. & Prince, A. (1988) On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. In Connections and symbols (S. Pinker and J. Mehler, Eds), Cambridge, MA: Bradford/MIT Press, pp 73-194.

Pfeifer, R. and Scheier, C. (1999) Understanding intelligence. Cambridge, MA: MIT Press.

Putnam, H. (1975) Philosophy and our mental life. In H. Putnam (Ed.) Mind, language and reality: Philosophical papers (vol 2). Cambridge, UK: Cambridge University Press, pp 48-73.

Rumelhart, D.E. and McClelland, J.L. (1986) On learning the past tenses of English verbs. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol 2: Psychological and Biological Models (Rumelhart, D.E. and McClelland, J.L., eds), pp 216-271, MIT Press.

Seidenberg, M.S. and McClelland, J.L. (1989) A distributed developmental model of visual word recognition and naming. Psychological Review, 96, 523-568.

Searle, J.R. (1980) Minds, brains and programs. Behavioural and Brain Sciences, 3: 417-424. Reprinted in J. Haugeland (Ed.) Mind design. Montgomery, VT: Bradford Books, 1981.

Sharkey, N.E. & Sharkey, A.J.C. (1995) An Analysis of Catastrophic Interference. Connection Science, 7, 3/4, 313-341.

Sharkey, A.J.C. and Sharkey, N. (2003) Cognitive Modelling: Psychology and Connectionism. In M.A. Arbib (Ed) The Handbook of Brain Theory and Neural Networks, A Bradford Book: MIT Press.

Sun, R. (2003) Hybrid Connectionist/Symbolic Systems. In M.A. Arbib (Ed) The Handbook of Brain Theory and Neural Networks, A Bradford Book: MIT Press, pp 543-547.

van Gelder, T. (1990) Compositionality: A Connectionist variation on a classical theme. Cognitive Science, 14, pp 355-364.

KEY TERMS

Chinese Room: In Searle's thought experiment, he asks us to imagine a man sitting in a room with a number of rule books. A set of symbols is passed into the room. The man processes the symbols according to the rule books, and passes a new set of symbols out of the room. The symbols posted into the room correspond to a Chinese question, and the symbols he passes out are the answer to the question, in Chinese. However, the man following the rules has no knowledge of Chinese. The example suggests a computer program could similarly follow rules in order to answer a question without any understanding.

Classical Symbol Processing: The classical view of cognition was that it was analogous to symbolic computation in digital computers. Information is represented as strings of symbols, and cognitive processing involves the manipulation of these strings by means of a set of rules. Under this view, the details of how such computation is implemented are not considered important.

Connectionism: Connectionism is the term used to describe the application of artificial neural networks to the study of mind. In connectionist accounts, knowledge is represented in the strength of connections between a set of artificial neurons.

Eliminative Connectionism: The eliminative connectionist is concerned to provide an account of cognition that eschews symbols, and operates at the subsymbolic level. For instance, the concept of dog could be captured in a distributed representation as a number of input features (e.g. four-footed, furry, barks, etc.) and would then exist in the net in the form of the weighted links between its neuron-like units.

Generalisation: Artificial neural networks, once trained, are able to generalise beyond the items on which they were trained and to produce a similar output in response to inputs that are similar to those encountered in training.

Implementational Connectionism: In this less extreme version of connectionism, the goal is to find
Artificial NeuroGlial Networks

INTRODUCTION

More than 50 years ago, connectionist systems (CSs) were created with the purpose of processing information in computers in the way the human brain does (McCulloch & Pitts, 1943). Since that time these systems have advanced considerably, and nowadays they allow us to solve complex problems in many disciplines (classification, clustering, regression, etc.). But this advance is not enough: there are still many limitations in the use of these systems (Dorado, 1999). Most improvements have been obtained along two different ways. Many researchers have preferred to construct artificial neural networks (ANNs) based on mathematical models with diverse equations that govern their functioning (Cortes & Vapnik, 1995; Haykin, 1999), whereas other researchers have tried to make these systems resemble the human brain as closely as possible (Rabuñal, 1999; Porto, 2004).

The systems included in this article have emerged from the second line of investigation. We introduce CSs that aim to imitate the neuroglial networks of the brain. These systems are named Artificial NeuroGlial Networks (ANGNs) (Porto, 2004). These CSs are made up not only of neurons, but also of elements that imitate glial cells known as astrocytes (Araque, 1999). These systems, which use hybrid training, have demonstrated their efficacy in solving classification problems with totally connected feed-forward multilayer networks, without backpropagation or lateral connections.

BACKGROUND

The ANNs or CSs emulate biological neural networks in that they do not require the programming of tasks but generalise and learn from experience. Current ANNs are composed of a set of very simple processing elements (PEs) that emulate the biological neurons, and of a certain number of connections between them.

Until now, researchers who set out to emulate the brain have tried to represent in ANNs the importance that neurons have in the Nervous System (NS). However, during the last decades research has advanced remarkably in the Neuroscience field, and increasingly complex neural circuits, as well as the Glial System (GS), are being observed closely. The importance of the functions of the GS leads researchers to think that their participation in the processing of information in the NS is much more relevant than previously assumed. In that case, it may be useful to integrate into the artificial models other elements that are not neurons.

Since the late 80s, the application of innovative and carefully developed cellular and physiological techniques (such as patch-clamp, fluorescent ion-sensitive imaging, confocal microscopy and molecular biology) to glial studies has challenged the classic idea that astrocytes merely provide structural and trophic support to neurons, and suggests that these elements play more active roles in the physiology of the Central Nervous System.

New discoveries are now unveiling that the glia is intimately linked to the active control of neural activity and takes part in the regulation of synaptic neurotransmission (Perea & Araque, 2007). Abundant evidence has suggested the existence of bidirectional communication between astrocytes and neurons, and the important active role of the astrocytes in the NS's
physiology (Araque et al., 2001; Perea & Araque, 2005). This evidence has led to the proposal of a new concept in synaptic physiology, the tripartite synapse, which consists of three functional elements: the presynaptic and postsynaptic elements and the surrounding astrocytes (Araque et al., 1999). The communication between these three elements has highly complex characteristics, which seem to reflect more reliably the complexity of the information processing between the elements of the NS (Martin & Araque, 2005).

So there is no question about the existence of communication between astrocytes and neurons (Perea & Araque, 2002). In order to understand the motives of this reciprocal signalling, we must know the differences and similarities that exist between their properties. Only a decade ago, it would have been absurd to suggest that these two cell types have very similar functions; now we realise that the similarities are striking from the perspective of chemical signalling. Both cell types receive chemical inputs that have an impact on ionotropic and metabotropic receptors. Following this integration, both cell types send signals to their neighbours through the release of chemical transmitters. Both the neuron-to-neuron signalling and the neuron-to-astrocyte signalling show plastic properties that depend on the activity (Pasti et al., 1997). The main difference between astrocytes and neurons is that many neurons extend their axons over large distances and conduct action potentials of short duration at high speed, whereas astrocytes do not exhibit any electric excitability but conduct calcium spikes of long duration (tens of seconds) over short distances and at low speed. The fast signalling, and the input/output functions in the central NS that require speed, seem to belong to the neural domain. But what happens with slower events, such as the induction of memories, and other abstract processes such as thought processes? Does the signalling between astrocytes contribute to their control? As long as there is no answer to these questions, research must continue; the present work offers new ways to advance through the use of Artificial Intelligence (AI) techniques.

Therefore the aim is not only to improve the CSs by incorporating elements that imitate astrocytes, but also to benefit Neuroscience with the study of brain circuits from another point of view, that of AI. The most recent works in this area are presented by Porto et al. (Porto et al., 2007; Porto et al., 2005; Porto, 2004).

MAIN FOCUS OF THE ARTICLE

All the design possibilities, for the architecture as well as for the training process of an ANN, are basically oriented towards minimising the error level or reducing the system's learning time. As such, it is in the optimisation process of a mechanism, in this case the ANN, that we must find the solution for the many parameters of the elements and the connections between them.

Considering possible future improvements that optimise an ANN with respect to minimal error and minimal training time, our models will be the brain circuits, in which the participation of elements of the GS is crucial in processing the information. In order to design the integration of these elements into the ANN, and to elaborate a learning method for the resulting ANGN that allows us to check whether there is an improvement in these systems, we have analysed the main existing training methods that will be used for the elaboration. We have analysed Non-Supervised and Supervised Training methods, and other methods that use or combine some of their characteristics and complete the analysis: Training by Reinforcement, Hybrid Training and Evolutionary Training.

Observed Limitations

Several experiments with ANNs have shown the existence of conflicts between the functioning of the CS and biological neural networks, due to the use of methods that did not reflect reality. For instance, in the case of a multilayer perceptron, which is a simple CS, the synaptic connections between the PEs have weights that can be excitatory or inhibitory, whereas in the natural NS it is the neurons that seem to represent these functions, not the connections; recent research (Perea & Araque, 2007) indicates that the cells of the GS, more concretely the astrocytes, also play an important role.

Another limitation concerns the learning algorithm known as backpropagation, which implies that the change of the connection values requires the backwards transmission of the error signal in the ANN. It was traditionally assumed that this behaviour was impossible in a natural neuron, which, according to the dynamic polarisation theory of Ramón y Cajal (1911), is unable to efficiently transmit information inversely through the axon until reaching the cellular soma; new research, however, has discovered that neurons can send information to presynaptic neurons
under certain conditions, either by means of existing mechanisms in the dendrites or else through various interventions of glial cells such as astrocytes.

If the learning is supervised, it implies the existence of an instructor, which in the context of the brain means a set of neurons that behave differently from the rest in order to guide the process. At present, the existence of this type of neurons is biologically indemonstrable, but the GS seems to be strongly implied in this orientation and may be the element that configures an instructor that until now had not been considered.

It is in this context that the present study analyses to what extent the latest discoveries in Neuroscience (Araque et al., 2001; Perea & Araque, 2002) contribute to these networks: discoveries that proceed from cerebral activity in areas that are believed to be involved in the learning and processing of information (Porto et al., 2007).

Artificial Neuroglial Networks

Many researchers have used the current potential of computers and the efficiency of computational models to elaborate biological computational models and reach a better understanding of the structure and behaviour of both pyramidal neurons, which are believed to be involved in learning and memory processes (LeRay et al., 2004; Fernández et al., 2007), and astrocytes (Porto, 2004; Perea & Araque, 2002). These models have provided a better understanding of the causes and factors that are involved in the specific functioning of biological circuits. The present work will use these new insights to progress in the field of Computer Science, and more concretely in AI.

Figure 1. Artificial NeuroGlial network scheme

We present ANGNs (Figure 1) that include both artificial neurons and processing control elements that represent the astrocytes, and whose functioning follows the steps that were successfully applied in the construction and use of CSs: design, training, testing and execution.

Also, since the computational studies of learning with ANNs are beginning to converge towards evolutionary computation methods (Dorado, 1999), we will combine the optimisation in the modification of the weights (according to the results of the biological models) with the use of Genetic Algorithms (GAs) in order to find the best solution for a given problem. This evolutionary technique was found to be very efficient in the training phase of the CS (Rabuñal, 1998), because it helps to adapt the CS to the optimal solution according to the inputs that enter the system and the outputs that must be produced by the system. This adaptation phenomenon takes place in the brain thanks to the plasticity of its elements and may be partly controlled by the GS; it is for this reason that we consider the GA as a part of the artificial glia. The result of this combination is a hybrid learning method (Porto, 2004).
elaborate biological computational models and reach The design of the ANGNs is oriented towards clas-
a better understanding of the structure and behaviour sification problems that are solved by means of simple
of both pyramidal neurons, which are believed to be networks, i.e. multilayer networks, although future
involved in learning and memory processes (LeRay research may lead to the design of models in more
et al., 2004, Fernndez et al., 2007), and astrocytes complex networks. It seems a logical approach to start
(Porto, 2004; Perea & Araque, 2002). These models the design of these new models with simple ANNs, and
have provided a better understanding of the causes and to orientate the latest discoveries on astrocytes and
factors that are involved in the specific functioning of pyramidal neurons in information processing towards
biological circuits. The present work will use these new their use in classification networks, since the control
insights to progress in the field of Computer Sciences of the reinforcement or weakening of the connections
and more concretely in AI. in the brain is related to the adaptation or plasticity
of the connections, which lead to the generation of
activation ways. This process can therefore improve
Figure 1. Artificial NeuroGlial network scheme the classification of the patterns and their recognition
by the ANGN.
A detailed description of the functioning of the
N e u ro n ANGNs and results with these systems can be found
N e u ro n
in Porto et al (Porto, 2004; Porto et al., 2005; Porto
N e u ro n et al., 2007).
N e u ro n N e u ro n
N e u ro n
N e u ro n
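The hybrid method described above, in which a Genetic Algorithm optimises the connection weights of an ANN, can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-2-1 network topology, the XOR task, and all GA settings below are illustrative assumptions.

```python
import math
import random

def forward(weights, x):
    # 2-2-1 feedforward network with tanh units; the 9 parameters are laid out
    # as [hidden weights (4), hidden biases (2), output weights (2), output bias].
    h = [math.tanh(weights[2 * i] * x[0] + weights[2 * i + 1] * x[1] + weights[4 + i])
         for i in range(2)]
    return math.tanh(weights[6] * h[0] + weights[7] * h[1] + weights[8])

def mse(weights, samples):
    # Fitness: mean squared error over the training set (lower is better).
    return sum((forward(weights, x) - t) ** 2 for x, t in samples) / len(samples)

def evolve(samples, pop_size=40, generations=200, seed=1):
    rnd = random.Random(seed)
    pop = [[rnd.uniform(-1, 1) for _ in range(9)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: mse(w, samples))   # rank by training error
        elite = pop[:pop_size // 2]               # truncation selection (elitist)
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rnd.sample(elite, 2)
            cut = rnd.randrange(1, 9)             # one-point crossover
            child = a[:cut] + b[cut:]
            child[rnd.randrange(9)] += rnd.gauss(0, 0.3)  # point mutation
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda w: mse(w, samples))

# Toy classification problem (XOR) standing in for the article's test problems.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
best = evolve(XOR)
```

Because the elite is carried over unchanged, the best error is non-increasing across generations, mirroring the role the article assigns to the GA of steadily adapting the CS towards the optimal solution.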
FUTURE TRENDS

We keep on analysing other synaptic modification possibilities based on brain behaviour, to apply them
to new CSs which can solve simple problems with simple architectures.

Moreover, given that it has been proved that the glia acts upon complex brain circuits, and that the more an individual's brain has developed, the more glia he has in his nervous system (following what Cajal said one hundred years ago (Ramón y Cajal, 1911)), we are applying the observed brain behaviour to more complex network architectures, particularly after having checked that a more complex network architecture achieved better results in the problem presented here.

For the same reason, we intend to analyse how the new CSs solve complex problems, for instance time processing ones, where totally or partially recurrent networks would play a role. These networks could combine their functioning with this new behaviour.

CONCLUSION

This article presents CSs composed of artificial neurons and artificial glial cells. The design of artificial models did not aim at obtaining a perfect copy of the natural model, but a series of behaviours whose final functioning approaches it as much as possible. Nevertheless, a close similarity between both is indispensable to improve the output, and may result in more intelligent behaviours.

The synaptic modifications introduced in the CSs, based on the modelled brain processes, enhance the training of multilayer architectures.

We must remember that the innovation of the existing ANN models towards the development of new architectures is conditioned by the need to integrate new parameters into the learning algorithms so that they can adjust their values. New parameters, which provide the processing element models of the ANNs with new functionalities, are harder to come by than optimizations of the most frequently used algorithms, which increase the calculations and basically work on the computational side of the algorithm. The ANGNs integrate new elements and, thanks to a hybrid method, this approach did not complicate the training process.

The research with these ANGNs benefits AI because it can improve information processing capabilities, which would allow us to deal with a wider range of problems. Moreover, this has indirectly benefited Neuroscience, since experiments with computational models that simulate brain circuits pave the way for difficult experiments carried out in laboratories, as well as providing new ideas for research.

REFERENCES

Araque, A., Parpura, V., Sanzgiri, R., & Haydon, P. G. (1999). Tripartite synapses: glia, the unacknowledged partner. Trends in Neuroscience, 22(5).

Araque, A., Carmignoto, G., & Haydon, P. G. (2001). Dynamic Signaling Between Astrocytes and Neurons. Annu. Rev. Physiol., 63, 795-813.

Cortes, C., & Vapnik, V. (1995). Support-Vector Networks. Machine Learning, 20.

Dorado, J. (1999). Modelo de un Sistema para la Selección Automática en Dominios Complejos, con una Estrategia Cooperativa, de Conjuntos de Entrenamiento y Arquitecturas Ideales de Redes de Neuronas Artificiales Utilizando Algoritmos Genéticos. Tesis Doctoral. Facultad de Informática, Universidade da Coruña.

Fernández, D., Fuenzalida, M., Porto, A., & Buño, W. (2007). Selective shunting of NMDA EPSP component by the slow afterhyperpolarization regulates synaptic integration at apical dendrites of CA1 pyramidal neurons. Journal of Neurophysiology, 97, 3242-3255.

Haykin, S. (1999). Neural Networks (2nd ed.). Prentice Hall.

LeRay, D., Fernández, D., Porto, A., Fuenzalida, M., & Buño, W. (2004). Heterosynaptic Metaplastic Regulation of Synaptic Efficacy in CA1 Pyramidal Neurons of Rat Hippocampus. Hippocampus, 14, 1011-1025.

Martín, E. D., & Araque, A. (2005). Astrocytes and The Biological Neural Networks. In Artificial Neural Networks in Real-Life Applications. Hershey, PA, USA: Idea Group Inc.

McCulloch, W. S., & Pitts, W. (1943). A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5, 115-133.

Pasti, L., Volterra, A., Pozzan, R., & Carmignoto, G. (1997). Intracellular calcium oscillations in astrocytes: a highly plastic, bidirectional form of communication between neurons and astrocytes in situ. Journal of Neuroscience, 17, 7817-7830.
Perea, G., & Araque, A. (2002). Communication between astrocytes and neurons: a complex language. Journal of Physiology (Paris). Elsevier Science.

Perea, G., & Araque, A. (2005). Properties of synaptically evoked astrocyte calcium signal reveal synaptic information processing by astrocytes. The Journal of Neuroscience, 25(9), 2192-2203.

Perea, G., & Araque, A. (2007). Astrocytes Potentiate Transmitter Release at Single Hippocampal Synapses. Science, 317(5841), 1083-1086.

Porto, A. (2004). Computational Models for Optimizing the Learning and the Information Processing in Adaptive Systems. Ph.D. Thesis, Faculty of Computer Science, University of A Coruña.

Porto, A., Araque, A., & Pazos, A. (2005). Artificial Neural Networks based on Brain Circuits Behaviour and Genetic Algorithms. LNCS, 3512, 99-106.

Porto, A., Araque, A., Rabuñal, J., Dorado, J., & Pazos, A. (2007). A New Hybrid Evolutionary Mechanism Based on Unsupervised Learning for Connectionist Systems. Neurocomputing, 70(16-18), 2799-2808.

Rabuñal, J. (1998). Entrenamiento de Redes de Neuronas Artificiales con Algoritmos Genéticos. Tesis de Licenciatura. Dep. Computación, Facultad de Informática, Universidade da Coruña.

Ramón y Cajal, S. (1911). Histologie du système nerveux de l'homme et des vertébrés. Maloine, Paris.

KEY TERMS

Artificial Neural Network: A network of many simple processors (units or neurons) that imitates a biological neural network. The units are connected by unidirectional communication channels, which carry numeric data. Neural networks can be trained to find nonlinear relationships in data, and are used in applications such as robotics, speech recognition, signal processing or medical diagnosis.

Astrocytes: Astrocytes are a sub-type of the glial cells in the brain. They perform many functions, including the formation of the blood-brain barrier and the provision of nutrients to the nervous tissue, and play a principal role in the repair and scarring process in the brain. They modulate synaptic transmission, and recently their crucial role in information processing was discovered.

Backpropagation Algorithm: A supervised learning technique used for training ANNs, based on minimising the error obtained from the comparison between the outputs that the network gives after the application of a set of network inputs and the outputs it should give (the desired outputs).

Evolutionary Computation: A solution approach guided by biological evolution, which begins with potential solution models, then iteratively applies algorithms to find the fittest models from the set to serve as inputs to the next iteration, ultimately leading to a model that best represents the data.

Genetic Algorithms: Genetic algorithms (GAs) are adaptive heuristic search algorithms premised on the evolutionary ideas of natural selection and genetics. The basic concept of GAs is designed to simulate processes in natural systems necessary for evolution, specifically those that follow the principles of survival of the fittest first laid down by Charles Darwin. As such, they represent an intelligent exploitation of a random search within a defined search space to solve a problem.

Glial System: Commonly called glia (Greek for "glue"): non-neuronal cells that provide support and nutrition, maintain homeostasis, form myelin, and participate in signal transmission in the nervous system. In the human brain, glial cells are estimated to outnumber neurons by about 10 to 1.

Hybrid Training: Learning method that combines the supervised and unsupervised training of Connectionist Systems.

Synapse: Specialized junctions through which the cells of the nervous system signal to each other and to non-neuronal cells such as those in muscles or glands.
Sarabjeet Kochhar
University of Delhi, India
Association rule mining, one of the fundamental techniques of data mining, aims to extract interesting correlations, frequent patterns or causal structures among sets of items in data.

An association rule is of the form X → Y and indicates that the presence of items in the antecedent of the rule (X) implies the presence of items in the consequent of the rule (Y). For example, the rule {PC, Color Printer} → {computer table} implies that people who purchase a PC (personal computer) and a color printer also tend to purchase a computer table. These associations, however, are not based on the inherent characteristics of a domain (as in a functional dependency) but on the co-occurrence

The problem of association rule mining (ARM) was introduced by Agrawal et al. (1993). Large databases of retail transactions, called market basket databases, which accumulate in departmental stores, provided the motivation for ARM. The basket corresponds to a physical retail transaction in a departmental store and consists of the set of items a customer buys. These transactions are recorded in a database called the transaction database. The goal is to analyze the buying habits of customers by finding associations between the different items that customers place in their shopping baskets. The discovered association rules can also be used by management to increase the effectiveness of
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Association Rule Mining
Figure 2(a). Example of database transactions

TID  Items
10   A, B, C
20   A, C
30   A, C, D
40   B, C, E
50   A, C, E

Figure 2(b). Itemsets of size one and two (diagram: the size-one itemsets {A} and {C} combine into the size-two itemset {A, C})

Figure 2(c). Computation of support and confidence

Sup(A): 4 (80%)
Sup(C): 5 (100%)
Sup(AB): 1 (20%)
Sup(AC): 4 (80%)
Sup(ABC): 1 (20%)
Sup(ABCD): 0 (0%)
Sup(ABCDE): 0 (0%)
Confidence(A → C) = 4/4 (100%)
Confidence(C → A) = 4/5 (80%)

With n items in I, the total number of possible association rules is very large (O(3^n)). However, the majority of these rules (associations) existing in D are not interesting for the user. Interestingness measures are employed to reduce the number of rules discovered by the algorithms. The foremost criterion of interestingness is the high prevalence of both the itemsets and the rules, which is specified by the user as a minimum support value. An itemset (rule) whose support is greater than or equal to a user-specified minimum support (minsup) threshold is called a Frequent Itemset (rule).

The second criterion of interestingness is the strength of the rule. A rule which has confidence greater than the user-specified minimum confidence threshold (minconf) is interesting to the user.

Confidence, however, can sometimes be misleading. For instance, the confidence of a rule can be high even if the antecedent and consequent of the rule are independent. Lift (also called Interest) and Conviction of a rule are other commonly used measures of rule interestingness (Dunham, 2002). A suitable measure of rule strength needs to be identified for an application.

MINING OF ASSOCIATION RULES

Since an association rule is an implication among itemsets, the brute-force approach to association rule generation requires examining relationships between all possible itemsets. Such a process would typically involve counting the co-occurrence of all itemsets in D and subsequently generating rules from them. For n items, there are 2^n possible itemsets that need to be counted. This may require not only a prohibitive amount of memory, but also complex indexing of counters. Since the user is interested only in frequent rules, one needs to count only the frequent itemsets before generating association rules. Thus, the problem of mining association rules is decomposed into two phases:

Phase I: Discovery of frequent itemsets.
Phase II: Generation of rules from the frequent itemsets discovered in Phase I.

Various algorithms for ARM differ in their approaches to optimizing the time and storage requirements for counting in the first phase. Despite reducing the search space by imposing the minsup constraint, frequent itemset generation is still a computationally expensive process. The anti-monotone property of support is an important tool to further reduce the search space and is used in most association rule algorithms (Agrawal et al., 1994). According to this property, the support of an itemset never exceeds the support of any of its subsets, i.e. if X is an itemset, then for each of its subsets Y, sup(X) <= sup(Y). This property makes the set of frequent itemsets downward closed. The task of rule
Figure: an example item taxonomy with supports — Food (support = 48%); its children Bread (support = 14%) and Beverages (support = 34%); below Beverages, Pepsi (support = 6%) and Coke (support = 12%).
3. Multidimensional Association Rules: Association rules essentially expose intra-record linkages. If links exist between different values of the same attribute (dimension), the association is called a single-dimensional association rule, while if the linkages span multiple dimensions, the associations are called multi-dimensional association rules (Kamber et al., 1997). For example, the following single-dimensional rule contains conjuncts in both the antecedent and the consequent from the single attribute buys:

Buys(X, milk) and Buys(X, butter) → Buys(X, bread)

Multidimensional associations contain two or more predicates, for example:

Age(X, 19-25) and Occupation(X, student) → Buys(X, coke)

4. Association Rules in Streaming Data: The aforementioned types of rules pertain to static data repositories. Mining rules from data streams, however, is far more complex (Hidber, 1999). The field is relatively recent and poses additional challenges such as the one-look characteristic of the data, limited main memory, continual updating of the data, and online processing so as to be able to make decisions on the fly (Cheng et al., 2007).

5. Association Rule Mining with Multiple Minimum Supports: In large departmental stores where the number of items is very large, using a single support threshold sometimes does not yield interesting rules. Since buying trends for different items often vary vastly, the use of different support thresholds for different items is recommended (Liu et al., 1999).

6. Negative Association Rules: Typical association rules discover correlations between items that are bought during the transactions and are called positive association rules. Negative association rules discover implications among all the items, irrespective of whether they were bought or not. Negative association rules are useful in market-basket analysis to identify products that conflict with each other or complement each other. The importance of negative association rules was highlighted in (Brin et al., 1997; Wu et al., 2004).
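The multiple-minimum-supports idea of item 5 — each item carries its own minimum support, and an itemset is judged against the lowest threshold among its items, following Liu et al. (1999) — can be sketched as follows. The item names and threshold values are hypothetical.

```python
# Hypothetical per-item minimum supports: rare items get lower thresholds
# so that rules involving them are not drowned out by a single global minsup.
MIS = {"bread": 0.05, "milk": 0.05, "caviar": 0.005}

def itemset_threshold(itemset):
    # Liu et al. (1999): an itemset's effective threshold is the smallest
    # minimum-item-support (MIS) value among its items.
    return min(MIS[item] for item in itemset)

def is_frequent(itemset, observed_support):
    """True when the itemset's support meets its own (lowest-MIS) threshold."""
    return observed_support >= itemset_threshold(itemset)
```

With a single global threshold of 0.05, an itemset like {bread, caviar} seen in 1% of baskets would be discarded; under per-item thresholds it survives because caviar's MIS is 0.005.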
Savasere, A., Omiecinski, E., & Navathe, S. (1995). An efficient algorithm for mining association rules in large databases. In Proceedings of the 21st VLDB Conference (pp. 432-443), Zurich, Switzerland.

Srikant, R., & Agrawal, R. (1996). Mining quantitative association rules in large relational tables. In Proceedings of the ACM SIGMOD Conference on Management of Data, Montreal, Canada.

Srikant, R., & Agrawal, R. (1995). Mining Generalized Association Rules. In Proceedings of the 21st International Conference on Very Large Databases, Zurich, Switzerland.

Brin, S., Motwani, R., & Silverstein, C. (1997). Beyond market baskets: Generalizing association rules to correlations. In Proceedings of SIGMOD.

Dunham, M. (2002). Data Mining: Introductory and Advanced Topics. Prentice Hall.

Brin, S., Motwani, R., Ullman, J. D., & Tsur, S. (1997). Dynamic itemset counting and implication rules for market basket data. SIGMOD Record (ACM Special Interest Group on Management of Data), 26(2), 255.

Lin, D., & Kedem, Z. M. (1998). Pincer search: A new algorithm for discovering the maximum frequent sets. In Proceedings of the 6th International Conference on Extending Database Technology (EDBT), Valencia, Spain.

Zaki, M. J., & Ogihara, M. (1998). Theoretical foundations of association rules. In Proceedings of the 3rd SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD'98), Seattle, Washington, USA.

Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques (2nd ed.). Morgan Kaufmann Publishers.

Han, J., & Fu, Y. (1999). Mining Multiple-Level Association Rules in Large Databases. IEEE Transactions on Knowledge and Data Engineering, 11(5), 798-804.

Kamber, M., Han, J., & Chiang, J. (1997). Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes. KDD 1997, 207-210.

Liu, B., Hsu, W., & Ma, Y. (1999). Mining Association Rules with Multiple Minimum Supports. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, San Diego, CA, USA.

Cheng, J., Ke, Y., & Ng, W. (2007). A Survey on Algorithms for Mining Frequent Itemsets Over Data Streams. International Journal of Knowledge and Information Systems.

Wu, X., Zhang, C., & Zhang, S. (2004). Efficient mining of both positive and negative association rules. ACM Transactions on Information Systems, 22(3), 381-405.

KEY TERMS

Association Rule: An implication of the form X → Y in a transactional database, with parameters support (s) and confidence (c). X and Y are sets of items, s is the fraction of transactions containing X ∪ Y, and c% of the transactions containing X also contain Y.

Classification: Data mining technique that constructs a model (classifier) from historical data (training data) and uses it to predict the category of unseen tuples.

Clustering: Data mining technique to partition data objects into a set of groups such that the intra-group similarity is maximized and the inter-group similarity is minimized.

Data Mining: Extraction of interesting, non-trivial, implicit, previously unknown and potentially useful information or patterns from data in large databases.

Data Stream: A continuous flow of data from a data source, e.g., a sensor, stock ticker, monitoring device, etc. A data stream is characterized by its unbounded size.

Descriptive Mining Technique: Data mining technique that induces a model describing the characteristics of data. These techniques are usually unsupervised and totally data driven.

Predictive Mining Technique: Data mining technique that induces a model from historical data in a supervised manner, and uses the model to predict some characteristic of new data.
Automated Cryptanalysis

Otokar Grošek
Slovak University of Technology, Slovakia
Pavol Zajac
Slovak University of Technology, Slovakia
Exhibit A. Pseudo-code of the generic brute-force algorithm with plaintext recognition.
Table 1. Performance of the three-layer decryption of a table-transposition cipher using a brute-force search. The first filter was a negative predicate, removing all decrypts whose first 4 letters do not form a valid n-gram (about 90% of the texts were removed). The score was then computed as the count of valid tetragrams in the whole text. If this count was lower than a given threshold (12), the text was removed by the score-based filter. Finally, the remaining texts were scored using dictionary words.
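The three-layer filter that Table 1 evaluates can be sketched as a simple pipeline. This is an illustrative reconstruction, not the authors' code: the candidate texts, the valid-tetragram set, the dictionary, and the threshold value below are toy stand-ins.

```python
def three_layer_filter(candidates, valid_tetragrams, dictionary, fast_threshold=12):
    """Keep candidate decrypts that pass all three layers, sorted so the
    highest precise score (the likeliest plaintext) comes first."""

    def negative_predicate(text):
        # Layer 1: cheap reject — the first 4 letters must form a valid tetragram.
        return text[:4] in valid_tetragrams

    def fast_score(text):
        # Layer 2: count of valid tetragrams in the whole text.
        return sum(text[i:i + 4] in valid_tetragrams for i in range(len(text) - 3))

    def precise_score(text):
        # Layer 3: weight by total length of dictionary words found in the text.
        return sum(len(w) for w in text.split() if w in dictionary)

    survivors = [t for t in candidates if negative_predicate(t)]
    survivors = [t for t in survivors if fast_score(t) >= fast_threshold]
    return sorted(survivors, key=precise_score, reverse=True)
```

The ordering of the layers matters: the negative predicate inspects only 4 letters, the fast score is linear in the text length, and the dictionary scoring is the most expensive, so each layer shrinks the input of the next, more costly one.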
The algorithm integrates the three layers of plaintext recognition, namely the negative test predicate, the fast scoring function and the precise scoring function, as a three-layer filter. The final scoring function is also used to sort the outputs. The first filter should be very fast, with a very low error probability. The fast score should be easy to compute, but it is not required to precisely identify the correct plaintext. Correct plaintext recognition is the role of the precise scoring function. In the algorithm, the best score is the highest one. If the score is computed with the opposite meaning, the algorithm must be rewritten accordingly.

In some cases, we can integrate a fast scoring function within the negative test or with the precise scoring, leading to two-layer filters, as in (Zajac, 2006a). It is also possible to use even more steps of predicate-based and score-based filtering, respectively. However, experiments show that the proposed three-layer architecture is the most flexible, and more layers can even lead to a performance decrease. Experimental results are shown in Table 1.

Negative Filtering

The goal of the negative test predicate is to identify candidate texts that are NOT plaintext (with very high probability, ideally with certainty). People can clearly recognize a wrong text just by looking at it; it is in the area of artificial intelligence to implement this ability in computers. However, most current AI methods (e.g. neural networks) seem to be too slow to be applicable in this stage of a brute-force algorithm, as every text must be evaluated with this predicate.

Most of the methods for fast negative text filtering are based on prohibited n-grams. As an n-gram
we would here only consider a sequence of n consecutive letters. If the alphabet size is N, then it is possible to create N^n possible n-grams. For higher n, only a small fraction of them can appear in valid text in a given language (Zajac, 2006b). By using a lexical tree or a lookup table, it is easy (and fast) to verify whether a given n-gram is valid or not. Thus a natural test is to check every n-gram in the text for validity. There are two basic problems arising with this approach: the real plaintext can contain (intentionally) misspelled, uncommon or foreign words, and thus our n-gram database can be incomplete. We can limit our test to some specific patterns, e.g. a too long run of consecutive vowels/consonants. These patterns can be checked in time dependent on the plaintext candidate length. A filter can also be based on checking only a few n-grams at a fixed or random position in the text, e.g. the first four letters.

The rule for rejecting texts should be based on the exact type of the cipher we are trying to decipher. For example, if the first four letters of the decrypted plaintext do not depend on some part of the key, a filter based only on their validity would not be effective. An interesting question is whether it is possible to create a system which can effectively learn its filter rules from existing decrypted texts, even in the process of decryption.

Scoring Functions

With the negative filter step we can eliminate around 90% of the candidate texts or more. The number of texts to be verified is still very large, and thus we need to apply more precise methods of plaintext recognition. We use a scoring function that assigns a quantitative score to every text that has survived elimination in the previous steps. Here a higher score means a higher likelihood that a given text is a valid plaintext. For each scoring function we can assign a threshold, and it should be very improbable that a valid plaintext has a score under this threshold. The actual threshold value can either be found experimentally (by evaluating a large number of real texts), or can be based on statistical analysis. The speed of the scoring function can be determined by using classical algorithm complexity estimates. The precision of the scoring can be defined by means of the separation of valid and invalid plaintexts, respectively. There is a trade-off involved in scoring, as faster scoring functions are less precise and vice-versa. Thus we apply two scoring functions: one that is fast but less precise, with a lower threshold value, and one that is very precise, but harder to compute.

An example of scoring function distributions can be found in Figures 1 and 2. The scoring function in Figure 1 is much more precise than the one in Figure 2, but the computational time required for evaluation is doubled. Moreover, the scoring function in Figure 1 was created from a reduced dictionary fitted to a given ciphertext. Evaluation based on a complete dictionary is slower, more difficult to implement, and can even be less precise.

Scoring functions can be based on dictionary words, n-gram statistics, or other specific statistics. It is difficult to provide a one-fits-all scoring function, as the decryption process for different cipher types has an impact on actual scoring function results. E.g. when trying to decrypt a transposition cipher, we already know which letters appear with which frequency, and thus letter-frequency-based statistics do not play any role in scoring. On the other hand, they are quite significant for substitution ciphers. The most common universal scoring functions are (see also Ganesan & Sherman, 1993):

1. Number of dictionary words in the text / fraction of meaningful text: Scoring based on dictionary words is very precise, if we have a large enough dictionary. Even if not every word in the hidden message is in our dictionary, it is very improbable that some incorrect decryption contains some larger fraction of a text composed of dictionary words. Removing short words from the dictionary can increase the precision. Another possibility is to use weights based on word length, as in (Russell, Clark & Stepney, 2003). Dictionary words can be found using lexical trees. In some languages, we should use a dictionary of word stems instead of whole words. The speed of evaluation depends on the length of the text and the average length of the words in the dictionary, respectively.

2. Rank distance of n-grams: The rank of an n-gram is its position in the ordering based on n-gram frequencies (merged for all n up to a given bound dependent on the language). This method is used (Cavnar & Trenkle, 1994) in fast automated language recognition: compute the ranks of the n-grams of a given text and compare them with the ranks of significant n-grams obtained from large corpora of different languages. The correct language should
have the smallest distance. Even if this method can be adapted for plaintext recognition, e.g. by creating a random corpus or an encrypted corpus, it does not seem practical.

3. Frequency distance of n-gram statistics: The score of the text can be estimated from the difference between the measured frequencies of n-grams and the frequencies estimated from a large corpus (Clark, 1998; Spillman, Janssen, Nelson & Kepner, 1993). We suppose that the correct plaintext would have the smallest distance from the corpus statistics. However, due to statistical properties of the text, this is not always true for short or specific texts. Thus the precision of this scoring function is higher for longer texts. The speed of the evaluation depends on the text size and the size of n.

4. Scoring tables for n-grams: If we consider the statistics of all n-grams, we will see that most n-grams contribute a very small value to the final score. We can consider the contribution of only the most common n-grams. For a given language (and a given ciphertext) we can prepare a table of n-gram scores with a fixed number of entries. The score is evaluated as the sum of the scores assigned to the n-grams in a given candidate text (Clark & Dawson, 1998). We can assign both positive scores for common valid n-grams, and negative scores for common invalid or supposedly rare n-grams. However, the precision of this method is very low, especially for transposition ciphers. In our experiments the scores have a normal distribution, and usually the correct plaintext does not have the highest possible values. On the other hand, this scoring function is easy to evaluate, and can be customized for a given ciphertext. Thus it can be used as a fast scoring function fastScore(X).

5. Index of coincidence (and other similar statistics): The index of coincidence (Friedman, 1920), denoted by IC, is a suitable and fast statistic applicable to ciphers that modify the letter frequency. The notion comes from the probability of the same letter occurring in two different texts at the same position. Encrypted texts are considered random, and have a (normalized) index of coincidence near 1.0. On the other hand, a plaintext has a much higher IC, near the value expected for a given language and alphabet. For the English language the expected value is 1.73. As with all statistics-based scoring functions, its precision is influenced by the length of the text. The index of coincidence is most suitable for polyalphabetic ciphers, where encryption depends on the position of the letter in the text (e.g. the Vigenère cipher). It can be adapted to other cipher types (Bagnall, McKeown & Rayward-Smith, 1997), e.g. for transposition ciphers, when considering that the alphabet is created by all possible n-grams with some n > 1.

Figure 1. Distribution of score among 9! possible decrypts (table transposition cipher with key given by a permutation of 9 columns); ciphertext size is 90 characters. Score was computed as a weighted sum of the lengths of (reduced) dictionary words found in the text. The single highest score belongs to the correct plaintext.
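The normalized index of coincidence of item 5 can be computed directly from letter counts. A minimal sketch, assuming the 26-letter English alphabet:

```python
from collections import Counter

def normalized_ic(text, alphabet_size=26):
    """Normalized index of coincidence: ~1.0 for random text over the
    alphabet, ~1.73 for English plaintext (higher for very short texts)."""
    letters = [c for c in text.lower() if c.isalpha()]
    n = len(letters)
    counts = Counter(letters)
    # Number of (ordered) pairs of equal letters, divided by all pairs,
    # scaled by the alphabet size so that uniform random text gives ~1.0.
    coincidences = sum(f * (f - 1) for f in counts.values())
    return alphabet_size * coincidences / (n * (n - 1))
```

A text in which every letter appears exactly once scores 0.0, while natural-language text scores well above 1.0, which is what makes IC usable as a fast plaintext test for ciphers that flatten letter frequencies.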
Figure 2. Distribution of score among 9! possible decrypts (table transposition cipher with key given by a permutation of 9 columns); ciphertext size is 90 characters. Score is the count of digrams with frequency higher than 1% (in the Slovak language) in a given text. The correct plaintext scored 73.
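A fast scoring function of the kind used for Figure 2 — counting the digrams of a candidate text that are common in the target language — can be sketched as follows. The digram table here is a small hypothetical English stand-in, not the Slovak table used in the reported experiments.

```python
# Hypothetical stand-in for the set of digrams with frequency above 1%
# in the target language.
COMMON_DIGRAMS = {"th", "he", "in", "er", "an", "re", "on", "at", "en", "st"}

def fast_score(text):
    """Count the digrams of `text` that appear in the common-digram table."""
    return sum(text[i:i + 2] in COMMON_DIGRAMS for i in range(len(text) - 1))
```

Evaluation is a single linear pass over the text with constant-time set lookups, which is why this kind of statistic is suited to the fast (second) layer of the filter rather than the precise one.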
Automated Cryptanalysis of Classical Ciphers
Pavol Zajac
Slovak University of Technology, Slovakia
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
On the other hand, only the most complex algorithms achieve really high accuracy of plaintext recognition. Thus a careful balance between the complexity of the plaintext recognition algorithm and its accuracy is required. It is unlikely that automated cryptanalysis produces only one possible result, but it is possible to limit the set of possible decrypts to a manageable size. Reported results should be sorted according to their probability of being the true plaintext.

A generic brute-force algorithm with plaintext recognition can be described by the pseudo-code in Exhibit A. We have identified three layers of plaintext recognition, namely the negative test predicate, the fast scoring function and the precise scoring function. All three functions are used as a three-layer filter, and the final scoring function is also used to sort the outputs. The first filter should be very fast, and should have a very low error probability. The fast score should be easy to compute, but it is not required to precisely identify the correct plaintext. Correct plaintext recognition is the role of the precise scoring function. In the algorithm, the best score is the highest one. If the score is computed with the opposite meaning, the algorithm must be rewritten accordingly.

In some cases, we can integrate the fast scoring function within the negative test or with the precise scoring, leading to two-layer filters, as in (Zajac, 2006a). It is also possible to use even more steps of predicate-based and score-based filtering, respectively. However, experiments show that the proposed three-layer architecture is the most flexible, and more layers can even lead to a performance decrease. Scoring and filtering is described in depth in the article Automated Cryptanalysis - Language Processing.

Applications of Artificial Intelligence Methods

Artificial Intelligence (AI) methods can be used in four main areas of automated cryptanalysis:

1. Plaintext recognition: The goal of the AI is to supply negative predicates that filter out wrong decrypts, and scoring functions that assess the text's likeness to natural language.
2. Key-search heuristics: The goal of the AI is to provide heuristics to speed up the decryption process, either by constraining the key-space or by guiding the selection of the next keys to be tried in the decryption. This area is the most often researched, as it can provide clear experimental results and meaningful evaluation.
3. Plaintext estimation: The goal of the AI is to estimate the meaning of the plaintext from a partial decryption, or to estimate some parts of the plaintext based on external data (e.g. the sender of a ciphertext, historical and geographic context, specific grammatical rules, etc.). Estimated parts of the plaintext can then lead to a much easier complete decryption. This area of research is mainly unexplored, and plaintext estimation is done by the cryptanalyst.
4. Automatic security evaluation: The goal of the cryptanalysis is not only to break ciphers and to
Exhibit A.
INPUT: ciphertext string Y = y0 y1 y2 ... yn
OUTPUT: ordered sequence S of possible plaintexts with their scores
1. Let S = { }
2. For each key K ∈ K do
   2.1. Let X = dK(Y) be a candidate plaintext.
   2.2. Compute negative test predicate filter(X). If the predicate is true, continue with step 2.
   2.3. Compute fast scoring function fastScore(X). If fastScore(X) < LIMITF, continue with step 2.
   2.4. Compute precise scoring function score(X). If score(X) < LIMIT, continue with step 2.
   2.5. Let S = S ∪ { <score(X), X> }
3. Sort S by key score(X) descending.
4. Return S.
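Exhibit A translates almost line for line into Python. The sketch below is illustrative: the predicate, the two scoring functions and the limits are supplied by the caller, since the article leaves their concrete form open.

```python
def brute_force_decrypt(ciphertext, keys, decrypt, filter_pred,
                        fast_score, score, limit_f, limit):
    """Generic brute-force search with three-layer plaintext
    recognition, following Exhibit A."""
    results = []
    for key in keys:                       # step 2
        x = decrypt(key, ciphertext)       # 2.1 candidate plaintext
        if filter_pred(x):                 # 2.2 negative test predicate
            continue
        if fast_score(x) < limit_f:        # 2.3 fast scoring function
            continue
        s = score(x)                       # 2.4 precise scoring function
        if s < limit:
            continue
        results.append((s, x))             # 2.5 keep the candidate
    results.sort(key=lambda pair: pair[0], reverse=True)  # step 3
    return results                         # step 4
```

For a toy Caesar shift over 26 keys, only candidates that pass all three filters survive and are returned sorted by the precise score.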
learn secrets, but it is also used when creating new ciphers to evaluate their security. Although most classical ciphers are already outdated, their cryptanalysis is still important, e.g. in teaching modern computer security principles. When teaching classical ciphers, it is useful to have an AI tool (e.g. an expert system) that can automate the evaluation of cipher security (at least under some weaker assumptions). Although much work has been done on the automatic evaluation of modern security protocols, we are unaware of tools to evaluate classical cipher designs.

The best researched area is that of key-search heuristics. This follows immediately from the fact that brute-force search through the whole key-space can be considered a very crude method of decryption. Most classical ciphers were not designed with careful consideration of the text statistics. We can assign a score to each key in the key-space that is correlated with the probability that the text decrypted by the given key is the plaintext. The score, when considered over the key-space, certainly has some local maxima, which can lead either immediately to a meaningful plaintext, or to a text from which the plaintext is easily guessed. Thus it can be useful to consider various relaxation techniques to search through the key-space with the goal of maximizing the scoring function. Some of the earliest demonstrations of relaxation techniques for breaking substitution ciphers are presented by Peleg & Rosenfeld (1979) and Hunter & McKenzie (1983). Successful attacks applicable to many classical ciphers can be implemented using basic hill climbing, tabu search, simulated annealing and applications of genetic/evolutionary algorithms (Clark & Dawson, 1998). Genetic algorithms have achieved many successes in breaking classical ciphers, as demonstrated by Matthews (1993) or Clark (1994), and can even break a rotor machine (Bagnall, McKeown & Rayward-Smith, 1997). Russell, Clark & Stepney (2003) present an anagramming attack using a solver based on an ant colony optimisation algorithm.

These types of attack try to converge to the correct key by small changes of the actual key. The success rate of the attacks is usually measured by the fraction of the reconstructed key and/or text. Relaxation methods can find, with a high probability, the keys or plaintext approximations, even if it is not feasible to search the whole key-space. The success mainly depends on the ciphertext size, since the scoring is usually statistics-based. One of the unexplored challenges is to consider the application of multiple relaxation techniques: a first heuristic can be used to shrink the key-space, and then either brute-force search or another heuristic is used with more precision to finish the decryption.

FUTURE TRENDS

The results obtained strongly depend on the size of the ciphertext, and decryptions are usually only partial. Techniques of automated cryptanalysis also need to be fitted to a given problem. E.g. attacks on substitution ciphers can use individual letter statistics, but these statistics are invariant under transposition ciphers and so make no sense for attacks on them. Automated cryptanalysis is usually studied only in the context of these two main types of ciphers, but there is a broad area of unexplored problems concerning different classical cipher types, such as running-key ciphers. Specific uses of AI techniques can fail for some cryptosystems, as pointed out by Wagner, Affenzeller & Schragl (2004). Cryptanalysis also depends on the language (Zajac, 2006b), although there are some notable exceptions when considering similar languages.

As computational power increases, even recently used ciphers, like the Data Encryption Standard (DES), are becoming subject to automated cryptanalysis (e.g. Nalini & Raghavendra Rao, 2007). Besides the application of heuristics to cryptanalysis, a lot of further research is required in the areas of plaintext estimation and automatic security evaluation. An expert system that would cover these areas and connect them with AI for plaintext recognition and search heuristics could be a strong tool to teach computer security, or to help forensic analysis or historical studies involving encrypted materials.

CONCLUSION

This article is concerned with the automated cryptanalysis of classical ciphers, where classical ciphers are understood as ciphers from before WW2, or pencil-and-paper ciphers. Optimization heuristics are quite successful in attacks targeted at these ciphers, but they usually cannot be made fully automatic. Their application usually differs according to the character of the analysed cipher systems. An important research
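A minimal hill-climbing key search of the kind surveyed above might look as follows. This is a deliberately bare sketch: the cited attacks replace the naive accept-if-not-worse rule with tabu lists, annealing schedules or genetic operators, and the decrypt and score callables stand in for a concrete cipher and language model.

```python
import random

def hill_climb(ciphertext, key_len, decrypt, score, iterations=2000,
               rng=None):
    """Hill-climbing search over permutation keys (e.g. a columnar
    transposition): start from a random permutation and keep any swap
    of two key positions that does not lower the score."""
    rng = rng or random.Random(0)
    key = list(range(key_len))
    rng.shuffle(key)
    best = score(decrypt(key, ciphertext))
    for _ in range(iterations):
        i, j = rng.sample(range(key_len), 2)
        key[i], key[j] = key[j], key[i]          # propose a small change
        s = score(decrypt(key, ciphertext))
        if s >= best:
            best = s                              # accept the move
        else:
            key[i], key[j] = key[j], key[i]       # undo it
    return key, best
```

Because only small changes to the current key are tried, the search can stall on a local maximum of the scoring function, which is exactly why the literature above explores tabu search, simulated annealing and evolutionary variants.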
direction is extending the techniques from classical cryptanalysis to the automated decryption of modern digital cryptosystems. Another important problem is to create a set of fully automatic cryptanalytic tools, or a complete expert system that can be adapted to various types of ciphers and languages.

REFERENCES

Bagnall, T., McKeown, G. P. & Rayward-Smith, V. J. (1997). The cryptanalysis of a three rotor machine using a genetic algorithm. In Thomas Back (Ed.), Proceedings of the Seventh International Conference on Genetic Algorithms (ICGA97), San Francisco, CA. Morgan Kaufmann.

Carrol, J. & Martin, S. (1986). The automated cryptanalysis of substitution ciphers. Cryptologia, 10(4), 193-209.

Clark, A. (1994). Modern optimisation algorithms for cryptanalysis. In Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, November 29 - December 2, 258-262.

Clark, A. & Dawson, E. (1998). Optimisation heuristics for the automated cryptanalysis of classical ciphers. Journal of Combinatorial Mathematics and Combinatorial Computing, 28, 63-86.

Hunter, D.G.N. & McKenzie, A. R. (1983). Experiments with relaxation algorithms for breaking simple substitution ciphers. The Computer Journal, 26(1), 68-71.

Jakobsen, T. (1995). A fast method for cryptanalysis of substitution ciphers. Cryptologia, 19(3), 265-274.

Kahn, D. (1974). The codebreakers. Weidenfeld and Nicolson, London.

Matthews, R.A.J. (1993). The use of genetic algorithms in cryptanalysis. Cryptologia, 17(4), 187-201.

Nalini, N. & Raghavendra Rao, A. (2007). Attacks of simple block ciphers via efficient heuristics. Information Sciences: an International Journal, 177(12), 2553-2569.

Peleg, S. & Rosenfeld, A. (1979). Breaking substitution ciphers using a relaxation algorithm. Communications of the ACM, 22(11), 598-605.

Ramesh, R.S., Athithan, G. & Thiruvengadam, K. (1993). An automated approach to solve simple substitution ciphers. Cryptologia, 17(2), 202-218.

Russell, M. D., Clark, J. A. & Stepney, S. (2003). Making the most of two heuristics: Breaking transposition ciphers with ants. Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2003). IEEE Press, 2653-2658.

Schatz, B. (1977). Automated analysis of cryptograms. Cryptologia, 1(2), 265-274. Also in: Cryptology: yesterday, today, and tomorrow, Artech House, 1987, ISBN 0-89006-253-6.

Wagner, S., Affenzeller, M. & Schragl, D. (2004). Traps and dangers when modelling problems for genetic algorithms. Cybernetics and Systems, 79-84.

Zajac, P. (2006a). Automated attacks on transposition ciphers. Begabtenförderung im MINT Bereich, 14, 61-76.

Zajac, P. (2006b). Ciphertext language identification. Journal of Electrical Engineering, 57(7/s), 26-29.

KEY TERMS

Brute-Force Attack: An exhaustive cryptanalytic technique that searches the whole key-space to find the correct key.

Ciphertext: The encrypted text; a string of letters from the alphabet C of a given cryptosystem, produced by a given key K ∈ K.

Classical Cipher: A classical cipher system is a five-tuple (P, C, K, E, D), where P and C define the plaintext and ciphertext alphabets, K is the set of possible keys, and for each K ∈ K there exists an encryption algorithm eK ∈ E and a corresponding decryption algorithm dK ∈ D such that dK(eK(x)) = x for every input x ∈ P and K ∈ K.

Cryptanalysis: The process of trying to decrypt a given ciphertext and/or find the key without, or with only partial, knowledge of the key. It is also the research area studying such techniques.

Key-Space: The set of all possible keys for a given ciphertext. The key-space can be limited to a subspace of the whole K by some prior knowledge.
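The five-tuple definition of a classical cipher can be made concrete with the simplest textbook instance, the Caesar shift. The sketch below is purely illustrative (it is not taken from the article); it checks the defining property dK(eK(x)) = x.

```python
import string

# A minimal instance of the five-tuple (P, C, K, E, D): the Caesar
# shift cipher over the uppercase Latin alphabet.
P = C = string.ascii_uppercase          # plaintext/ciphertext alphabet
K = range(26)                           # key-space

def e_k(key, x):
    """Encryption algorithm e_K in E."""
    return "".join(P[(P.index(ch) + key) % 26] for ch in x)

def d_k(key, y):
    """Decryption algorithm d_K in D, with d_K(e_K(x)) = x."""
    return "".join(P[(P.index(ch) - key) % 26] for ch in y)
```

Here the key-space has only 26 elements, which is why a brute-force attack on this cipher is trivial.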
Arturo Serrano
iTEAM, Polytechnic University of Valencia, Spain
Automatic Classification of Impact-Echo Spectra I
dataset (Mei, 2004). All these studies used multilayer perceptron neural networks and monosensor impact-echo systems.

In a recent work, we classified impact-echo data by neural networks using temporal and frequency features extracted from the signals, finding that the better features were the frequency features (Salazar, Uni, Serrano, & Gosalbez, 2007). Thus the present work is focused on exploiting only the spectral information of the impact-echo signals. These spectra contain a large amount of redundant information. We applied Principal Component Analysis (PCA) to the spectra for compressing them and removing noise. The proposed classification problem and the use of spectral PCA components as classification features are a new proposal in the application of neural networks to impact-echo testing.

There is evidence that the first components of PCA retain essentially all of the useful information, and this compression optimally removes noise and can be used to identify unusual spectra (Bailer-Jones, 1996), (Bailer-Jones, Irwin, & Hippel, 1998), (Xu et al., 2004). The principal components represent sources of variance in the data. The projection of the pth spectrum onto the kth principal component is known as the admixture coefficient a_{k,p}. The most significant principal components contain those features which are most strongly correlated in many of the spectra. It follows that noise (which is uncorrelated with any other features by definition) will be represented in the less significant components. Thus by retaining only the more significant components to represent the spectra we achieve a data compression that preferentially removes noise. The reduced reconstruction y_p of the pth spectrum x_p is obtained by using only the first r principal components to reconstruct the spectrum, i.e.

y_p = x̄ + Σ_{k=1}^{r} a_{k,p} u_k,   r < N,   (1)

where x̄ is the mean spectrum, which is subtracted from the spectra before the eigenvectors are calculated, and u_k is the kth principal component. x̄ can be considered as the zeroth eigenvector, although the degree of variance it explains depends on the specific data set and may be much less than that explained by the first eigenvectors.

Let e_p be the error incurred in using this reduced reconstruction. By definition x_p = y_p + e_p, so

e_p = Σ_{k=r+1}^{N} a_{k,p} u_k.   (2)

RECOGNITION OF DEFECT PATTERNS IN IMPACT-ECHO SPECTRA - SIMULATIONS

Impact-Echo Signals

Simulated signals came from full transient dynamic analysis of 100 3D finite element models of a simulated parallelepiped-shaped material of 0.07 x 0.05 x 0.22 m (width, height and length), supported at one third and two thirds of the block length (direction z). Figure 1 shows different examples of the models of defective pieces. From the transient analysis, the dynamic response of the material structure (time-varying displacements in the structure) under the action of a transient load is estimated. The transient load, i.e. the hammer impact, was simulated by applying a force-time history of a half sine wave with a period of 64 μs as a uniform pressure load on two elements at the centre of the model front face. The elastic material constants for the simulated material were: density 2700 kg/m3, elasticity modulus 69500 MPa, and Poisson's ratio 0.22.

Elements having dimensions of about 0.01 m were used in the models. These elements can accurately capture the frequency response up to 40 kHz. Surface displacement waveforms were taken from the simulation results at 7 nodes in different locations on the material surface, see Figure 1a. Signals consisted of 5000 samples recorded at a sampling frequency of 100 kHz. To make it possible to compare simulations with experiments, the second derivative of the displacement was calculated to work with accelerations, since the sensors available for experiments were mono-axial accelerometers. These accelerations were measured in the direction normal to the plane of the material surface, according to the configuration of the sensors in Figure 1a.

Feature Extraction and Selection

We investigate if the changes in the spectra, particularly in the zones of the fundamental frequencies, are related
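Equations (1) and (2) can be checked numerically in a few lines of NumPy. The data and dimensions below are toy stand-ins for the article's spectra; the decomposition itself is the standard SVD route to PCA.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))           # 100 spectra, N = 16 bins (toy data)
x_bar = X.mean(axis=0)                   # mean spectrum (zeroth eigenvector)
Xc = X - x_bar

# Principal components u_k = eigenvectors of the covariance matrix,
# obtained here from the SVD of the centred data (rows of Vt).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
A = Xc @ Vt.T                            # admixture coefficients a_{k,p}

r = 5                                    # keep the first r components
Y = x_bar + A[:, :r] @ Vt[:r]            # reduced reconstruction, Eq. (1)
E = A[:, r:] @ Vt[r:]                    # residual error, Eq. (2)
# By definition x_p = y_p + e_p, i.e. X == Y + E exactly.
```

Because the principal components form an orthonormal basis, the truncated reconstruction plus the discarded-component residual recovers each spectrum exactly.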
Figure 1. Finite element models with different defects and the 7-sensor configuration. 1a. Half-passing-through crack oriented in plane ZY. 1b. Passing-through hole oriented in axis Y.
with the shape, orientation and dimension of the defects. The spectral information for each channel consists of n/2 values, i.e. half of the number of points used to calculate the Fast Fourier Transform (FFT). Due to the 7-channel impact-echo system setup applied, the number of data values available for each impact-echo test was 7*n/2; e.g. for an FFT calculated with 256 points, 896 values would be available as entries for classifiers. This high number of entries could be unsuitable for the training stage of neural networks. Considering the redundancy of the impact-echo signal spectra, PCA was applied in two steps. In the first step, PCA was applied to the spectra of each channel as a feature extraction method. In the second step, PCA was applied to the component set (compressed spectra) obtained in the first step for all the channels and records, as a dimensionality reduction and feature selection method. Thus, a compressed and representative pattern of the spectra for the multichannel impact-echo inspection was obtained.

The size of the FFT employed was 1024 points, since with fewer points the resolution was not good enough for classification. Once the spectra were estimated for all the models, they were grouped and normalized by the maximum per channel. Three options were considered to establish the number of components at the first PCA step: select a number of components that explains a minimum of the variance in the data, a number of components such that the variance increment is minimum, or a fixed number of components. The first two options could estimate a variable number of components per channel, and they could select more components for the channels with the worst signals, i.e. signals with a low signal-to-noise ratio (SNR) due to problems in measuring (e.g. bad contact at the sensor-material interface). Thus we selected a fixed number of 20 components per channel, which explained more than 95% of the data variance for each of the channels, so the total number of components was 7*20 = 140 for one model.

The initial entries for the classification stage were then 140 features (spectral components) for the 100 simulation models. For the simulations, 20 replicates of each model were added, corresponding to the repetitions performed in the experiments. The replicates were generated using random Gaussian noise with 0.1 standard deviation of the original signals; the total number of records for the simulations was thus 2000, each with 140 spectral components.

PCA was applied again to reduce the dimensionality of the classification space and to select the best spectral features for classification. After some preliminary tests, 50 was set as the number of components for classification. Using this number of components, the explained variance was 98%. With the 50 sorted components obtained, an iterative process of classification varying the number of components was applied, using LDA and kNN as classifiers. The curve described by the set of classification error and number of components (5, 10, 15, ..., 50)
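The two-step PCA described above can be sketched with NumPy. The data here are synthetic stand-ins, but the shapes follow the article's setup: 7 channels compressed to 20 components each (140 features), then a second PCA down to 50.

```python
import numpy as np

def pca_compress(X, n_comp):
    """Project the rows of X onto the first n_comp principal
    components (SVD of the centred data)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_comp].T

rng = np.random.default_rng(1)
# Toy stand-in for the article's data: 100 models x 7 channels x 512
# spectral bins (a 1024-point FFT yields 512 one-sided bins).
spectra = rng.normal(size=(100, 7, 512))

# Step 1: PCA per channel, 20 components each -> 7*20 = 140 features.
step1 = np.hstack([pca_compress(spectra[:, ch, :], 20) for ch in range(7)])

# Step 2: PCA over the concatenated set -> 50 features for classification.
features = pca_compress(step1, 50)
```

The fixed 20-components-per-channel choice mirrors the article's decision to avoid a variable number of components being spent on noisy channels.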
values has an inflection point where the information provided by the components performs the best classification. Following this feature selection process, a reduced set of features (the better spectral components) was obtained. Those features were used as entries for the ANNs instead of all the spectral components, improving the performance of the classification. The number of selected components for ANN classification varied from 20 to 30, depending on the classification level (material condition, kind of defect, defect orientation, defect dimension).

The classification proceeded applying the Leave-One-Out method, avoiding that records of replicas or repetitions of a test piece were in the training stage of that piece, so generalization of pattern learning was forced. Thus some of the records used in training and testing corresponded to models or specimens with the same kind of defect but located in different positions, and the rest of the records corresponded to other kinds of defective pieces. Results presented in the next sections refer to the mean error in the testing stage.

Simulation Results

Figure 2a shows the results of classification by kNN and LDA with linear, Mahalanobis, and quadratic distances for simulations at level 4 of the classification tree. The best percentage of classification success (75.9) is obtained by LDA-quadratic and LDA-Mahalanobis with 25 components. Those components were selected and used as inputs for the input layer of the networks. One hidden layer was used (different numbers of neurons were tried to obtain the best configuration of the neuron number at this layer), and the number of neurons at the output layer was set to the number of classes, depending on the classification level. A validation stage and the resilient propagation training method were used in classifications with the MLP. The spread parameter was tuned for the RBF; Figure 2b shows how the spread affects the classification results at the defect dimension level, and in this case the minimum error (0.31) is for a spread value of 1.6.

Summarised general results by the different classification methods for the simulations are shown in Table 1. The best classification performance is obtained by LDA with quadratic distance, but the results of the RBF are fairly comparable. Because the classes are not equally probable at each level, general results are weighted by class probability, see Figure 3. The homogeneous class was completely distinguishable, and the multiple-defects class was the worst classified at every classification level. The percentage of success could be much higher by increasing the classification success for the multiple-defect class. This was caused by the multiple-defects models consisting of models with various cracks, which yielded confusion between the crack and multiple-defect classes. The percentage of success decreases for more complex classifications, with the lowest RBF performance of 69% for 12 classes.

FUTURE TRENDS

The proposed methodology was tested with a particular kind of material and defects and a particular configuration of multichannel testing. It could be tested using models and specimens of different materials, sizes, sensor configurations, and signal processing parameters. There exist several classification techniques and algorithms that can be explored for the proposed problem. Recently a model of independent component analysis (ICA) was proposed for impact-echo (Salazar, Vergara, Igual, Gosalbez, & Miralles, 2004), and new classifiers based on mixtures of ICAs have been proposed (Salazar, Vergara, Igual, & Gosalbez, 2005), (Salazar, Vergara, Igual, & Serrano, 2007), that include issues such as semisupervision in the training stage. The use of prior knowledge in the training stage is critical in order to obtain suitable models for different kinds of classifications. These kinds of techniques could give more understanding of how labelled and unlabelled data change the model learned by the classifier. In addition, more research is needed on the shape of the classification space (impact-echo signal spectra), outlier probability, and the decision regions of the classes for the proposed problem.

CONCLUSION

We demonstrate the feasibility of using neural networks to extract patterns of different kinds of defects from impact-echo signal spectra in simulations. The methodology used was very restrictive because there was only one piece for a defect in a certain localization in the bulk and it was not in the training stage, so the classifier had to assign the right class from the patterns of pieces of the same class in other localizations. Results could
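The leave-one-out evaluation protocol can be sketched in plain NumPy. This minimal version holds out a single record per fold; the article additionally kept every replicate of the tested piece out of the training set, which is omitted here for brevity.

```python
import numpy as np

def loo_knn_error(X, y, k=1):
    """Leave-one-out kNN test error: for each sample, classify it by a
    majority vote among its k nearest neighbours in the rest of the
    data, and return the fraction of misclassifications."""
    errors = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # leave sample i out
        nearest = np.argsort(d)[:k]
        votes = np.bincount(y[nearest])     # majority vote on labels
        if votes.argmax() != y[i]:
            errors += 1
    return errors / len(X)

# Two well-separated Gaussian classes: the LOO error should be zero.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
```

The same loop structure applies to the LDA classifier; only the per-fold decision rule changes.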
Figure 2. LDA, kNN results and tuning of the RBF parameter for simulations, level 4 of classification. 2a. LDA, kNN results for simulations at the fourth level of classification. 2b. RBF spread tuning for simulations at the fourth level of classification.
be used to implement the proposed method in real applications of quality evaluation of materials; in those applications, the database collected over a reasonable time could contain samples similar to the tested piece, making the classification process easier.

REFERENCES

Bailer-Jones, C. (1996). Neural network classification of stellar spectra. University of Cambridge.

Bailer-Jones, C., Irwin, M., & Hippel, T. (1998). Automated classification of stellar spectra - II. Two-dimensional classification with neural networks and principal components analysis. Monthly Notices of the Royal Astronomical Society, 298, 361-377.

Bishop, C.M. (2004). Neural networks for pattern recognition. Oxford: Oxford University Press.

Carino, N. J. (2001). The impact-echo method: an overview. In Structures Congress and Exposition (Ed.), (pp. 1-18).

Duda, R., Hart, P. E., & Stork, D. G. (2000). Pattern classification (2nd ed.). New York: Wiley-Interscience.

Mei, X. (2004). Neural network for rapid depth evaluation of shallow cracks in asphalt pavements. Computer-Aided Civil and Infrastructure Engineering, 19, 223-230.

Pratt, D. & Sansalone, M. (1992). Impact-echo signal interpretation using artificial intelligence. ACI Materials Journal, 89, 178-187.

Salazar, A., Uni, J., Serrano, A., & Gosalbez, J. (2007). Neural networks for defect detection in non-destructive evaluation by sonic signals. Lecture Notes in Computer Science, 4507, 638-645.

Salazar, A., Vergara, L., Igual, J., & Gosalbez, J. (2005). Blind source separation for classification and detection of flaws in impact-echo testing. Mechanical Systems and Signal Processing, 19, 1312-1325.

Salazar, A., Vergara, L., Igual, J., Gosalbez, J., & Miralles, R. (2004). ICA model applied to multichannel non-destructive evaluation by impact-echo. Lecture Notes in Computer Science, 3195, 470-477.

Salazar, A., Vergara, L., Igual, J., & Serrano, A. (2007). Learning hierarchies from ICA mixtures. In 20th International Joint Conference on Neural Networks.

Sansalone, M. & Street, W. (1997). Impact-echo: Non-destructive evaluation of concrete and masonry. New York: Bullbrier Press.

Stavroulakis, G. E. (1999). Impact-echo from a unilateral interlayer crack. LCP-BEM modelling and neural identification. Engineering Fracture Mechanics, 62, 165-184.

Xiang, Y. & Tso, S. K. (2002). Detection and classification of flaws in concrete structure using bispectra and neural networks. NDT&E International, 35, 19-27.

Xu, R., Nguyen, H., Sobol, P., Wang, S. L., Wu, A., & Johnson, K. E. (2004). Application of principal component analysis to the FTIR spectra of disk lubricant to study lube-carbon interactions. IEEE Transactions on Magnetics, 40, 3186-3189.

KEY TERMS

Artificial Neural Network (ANN): A mathematical model inspired by biological neural networks. The units, called neurons, are connected in various input, hidden and output layers. For a specific stimulus (numerical data at the input layer) some neurons are activated following an activation function, producing a numerical output. Thus the ANN is trained, storing the learned model in the weight matrices of the neurons. This kind of processing has been demonstrated to be suitable for finding nonlinear relationships in data, being more flexible in some applications than models extracted by linear decomposition techniques.

Finite Element Method (FEM): A numerical analysis technique to obtain solutions to the differential equations that describe, or approximately describe, a wide variety of problems. The underlying premise of FEM states that a complicated domain can be sub-divided into a series of smaller regions (the finite elements) in which the differential equations are approximately
solved. By assembling the set of equations for each region, the behavior over the entire problem domain is determined.

Impact-Echo Testing: A non-destructive evaluation procedure based on monitoring the surface motion resulting from a short-duration mechanical impact. From analyses of the vibrations measured by sensors, a diagnosis of the material condition can be obtained.

Non-Destructive Evaluation (NDE): NDE, ND Testing or ND Inspection techniques are used in the quality control of materials. These techniques do not destroy the test object and extract information on the internal structure of the object. To detect different defects such as cracking and corrosion, different methods of testing are available, such as X-ray (where cracks show up on the film), ultrasound (where cracks show up as an echo blip on the screen) and impact-echo (where cracks are detected by changes in the resonance modes of the object).

Pattern Recognition: An important area of research concerned with discovering or identifying automatically figures, characters, shapes, forms, and patterns without active human participation in the decision process. It is also concerned with classifying data into categories. Classification consists in learning a model for separating the data categories; this kind of machine learning can be approached using statistical (parametric or non-parametric models) or heuristic techniques. If some prior information is given in the learning process, it is called supervised or semi-supervised; otherwise it is called unsupervised.

Principal Component Analysis (PCA): A method for achieving a dimensionality reduction. It represents a set of N-dimensional data by means of their projections onto a set of r optimally defined axes (principal components). As these axes form an orthogonal set, PCA yields a linear transformation of the data. Principal components represent sources of variance in the data. Thus the most significant principal components show those data features which vary the most.

Signal Spectra: The set of frequency components decomposed from an original signal in the time domain. There exist several techniques to map a function from the time domain to the frequency domain, such as the Fourier and Wavelet transforms, whose inverse transforms allow reconstructing the original signal.
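The Signal Spectra definition corresponds directly to, e.g., NumPy's FFT routines. A small self-contained illustration, using a synthetic 8 kHz tone sampled at the article's 100 kHz over 5000 samples:

```python
import numpy as np

# An FFT maps a time-domain signal to its frequency components, and
# the inverse FFT reconstructs the original signal.
fs = 100_000                             # 100 kHz sampling, as in the article
t = np.arange(5000) / fs                 # 5000 samples
signal = np.sin(2 * np.pi * 8000 * t)    # 8 kHz tone (toy signal)

spectrum = np.fft.rfft(signal)           # one-sided spectrum (n/2 + 1 bins)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

peak = freqs[np.abs(spectrum).argmax()]  # dominant frequency bin (~8 kHz)
reconstructed = np.fft.irfft(spectrum, n=len(signal))
```

With 5000 samples at 100 kHz the frequency resolution is 20 Hz per bin, which is the kind of trade-off behind the articles' choice of a 1024-point FFT.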
Arturo Serrano
iTEAM, Polytechnic University of Valencia, Spain
Automatic Classification of Impact-Echo Spectra II
Figure 1. Classification tree with percentages of success in classification by the RBF network. Numbers in brackets are results for simulations (paper I). General results are weighted by class probability since the classes are not equally probable.

[Figure: tree over the levels of classification. At level 1 (material condition) the classes are Homogeneous, One defect and Multiple defects, with per-class success percentages and bracketed simulation results of 100 (100), 99.2 (97.5) and 92.6 (25), and a general result of 98.25 (92); a lower level shows 100 (100), 76 (96.9), 94.5 (79.2) and 77.9 (25), with a general result of 89.12 (83).]
Automatic Classification of Impact-Echo Spectra II
S6 S7 S4
x
S5 S3
z supports
impact excitation
0
Feature Extraction and Selection

The methodology followed for feature extraction, feature selection, dimensionality reduction and classification of the impact-echo signal spectra was that applied in paper I. After signal acquisition, a four-stage procedure was followed: feature extraction, dimensionality reduction, feature selection, and classification with ANNs.

In the feature extraction stage, a 1024-point FFT was applied to the measured signals and these spectra were compressed by PCA, selecting the first 20 components per channel. Thus the entries for the dimensionality reduction stage were 140 components (7 channels × 20) for the 84 lab specimens. Around 22 repetitions were performed for each experiment (specimen), so the total was 1881 records, each with 140 spectral components. In the dimensionality reduction stage, PCA reduced the 140 spectral components to 50 with a 92% explained variance. This matrix of 50 selected components by 1881 records was the input for a feature selection process whose objective was to find the best number of components for classification. Various classification tests using LDA and kNN were then applied, varying the number of components from 5 to 50 in increments of 5. The components corresponding to the best percentage of classification success with kNN and LDA were selected as entries for the classification stage with MLP and RBF. The number of spectral components varied from 10 to 30 depending on the classification level. Parameters such as the spread for RBF and the number of neurons in the hidden layer for MLP were tuned to obtain the best classification success of the ANNs.

All the classifications used the leave-one-out method. Repetitions of a piece in testing were not used in its training stage, to prevent the classifier from memorizing the pieces instead of generalizing patterns. Table 1 summarises the results for all the classifiers applied at the different levels of classification; these results refer to the mean error in the testing stage.

Experimental Results

The general classification results for the experiments in Table 1 show the RBF as the best classifier, improving its performance by nearly 20% with respect to the simulation results in paper I at the most complex level of classification (12 classes). The percentage of classification success improved for every class at each level, particularly for the multiple-defect class, from 25% up to 92.6% at the first level and 89.1% at the fourth level; see Figure 1. In the experiments, specimens with multiple defects were prepared by combining cracks and holes, so there was not much confusion between the multiple-defect and one-defect classes.

Real impact-echo experiments involve random variables in their execution, such as the force injected in the impact excitation and the position of the sensors, which can vary from piece to piece because they are manually controlled. These variables yield repetitions of the experiments whose signal spectra separate the class regions better than the Gaussian noise used to obtain replicates of the simulated model signals. The results of the experiment classifications confirm the feasibility of using neural networks for pattern recognition of defects in impact-echo signals. Table 2 contains the confusion matrix at the defect-orientation level. The homogeneous class is perfectly classified, and all the rest
of the six classes are well classified, except the hole Y class (48.8% success). This class is frequently confused with all the crack classes; this could be because the defect geometry does not produce a discernible wave pattern from the wave propagation phenomena. In addition, the multiple-defect class is sometimes confused with cracks and hole X, because particular patterns of one of the defects inside some multiple-defect specimens dominate the spectra, making the multiple-defect spectra resemble crack or hole Y spectra.

FUTURE TRENDS

The problem of material evaluation defined different levels of classification in a hierarchical outline, with different kinds of insight into the quality of the tested material. The problem could be restated to classify defects by ranges of defect size, independently of their shape or orientation; this kind of classification is very useful in industries such as marble factories. The applicability of the proposed methodology has to be confirmed by application to different materials.

The RBF neural network yielded good results for all levels of classification, but more algorithms have to be tested, taking into account the feasibility of their implementation in a real-time application and the improvement of the percentage of classification success. For instance, new classification algorithms exploit linear dependencies in the data and allow semi-supervised learning (Salazar, Vergara, Igual, Gosalbez, & Miralles, 2004), (Salazar, Vergara, Igual, & Gosalbez, 2005), (Salazar, Vergara, Igual, & Serrano, 2007). That kind of modelling and learning procedure could be suitable for the classification of materials tested by impact-echo. The training stage and the percentage of supervision are critical subjects in developing a suitable model from the data for classification. Thus, depending on the kind of defective materials used in training, a better-adapted model for a specific classification would be defined. A decision fusion made by various classifiers could then be more suitable than the decision made by one classifier.

CONCLUSION

We demonstrated the feasibility of using neural networks to extract patterns of different kinds of defects from impact-echo signal spectra in lab experiments. The general results of the applied neural networks show RBF as the most suitable technique for the impact-echo problem, even at complex levels of classification, discerning up to 12 classes of homogeneous, one-defect and multiple-defect materials.

The proposed methodology has yielded encouraging results with controlled lab experiments (same dimensions of the specimens, good wave-propagation material, and well-defined defects). The procedure has to be tested on real industry materials with a range of different dimensions, kinds of defects and microstructures, for which the impact-echo signal spectra define fuzzy regions for classification.
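The component-selection loop described in the Feature Extraction and Selection section (PCA compression followed by leave-one-out kNN tests with 5 to 50 components) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: synthetic random data stand in for the impact-echo records, and a plain 3-NN classifier stands in for the kNN/LDA pair.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the impact-echo data set: 84 specimens x ~22
# repetitions would give ~1881 records of 140 components; a toy set here.
n_records, n_features, n_classes = 120, 140, 3
labels = rng.integers(0, n_classes, n_records)
# Class-dependent means so that classification is possible at all.
X = rng.normal(size=(n_records, n_features)) + labels[:, None] * 0.8

# PCA via SVD of the centred data: principal axes are the rows of Vt.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

def knn_loo_accuracy(Z, y, k=3):
    """Leave-one-out k-NN accuracy: each record is classified by its
    nearest neighbours among all *other* records (no self-matching)."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)           # exclude the record itself
    nn = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest
    pred = np.array([np.bincount(v).argmax() for v in y[nn]])
    return (pred == y).mean()

# Vary the number of retained components (5..50 in steps of 5, as in the
# article) and keep the count with the best leave-one-out accuracy.
scores = {m: knn_loo_accuracy(Xc @ Vt[:m].T, labels) for m in range(5, 55, 5)}
best_m = max(scores, key=scores.get)
print(best_m, round(scores[best_m], 3))
```

The selected `best_m` would then feed the MLP/RBF classification stage, whose hyperparameters (RBF spread, MLP hidden neurons) are tuned separately.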
REFERENCES

Abraham, O., Leonard, C., Cote, P., & Piwakowski, B. (2000). Time-frequency Analysis of Impact-Echo Signals: Numerical Modelling and Experimental Validation. ACI Materials Journal, 97, 647-657.

Bailer-Jones, C. (1996). Neural Network Classification of Stellar Spectra. University of Cambridge.

Bailer-Jones, C., Irwin, M., & Hippel, T. (1998). Automated classification of stellar spectra - II. Two-dimensional classification with neural networks and principal components analysis. Monthly Notices of the Royal Astronomical Society, 298, 361-377.

Bishop, C. M. (2004). Neural networks for pattern recognition. Oxford: Oxford University Press.

Carino, N. J. (2001). The impact-echo method: an overview. In Structures Congress and Exposition (Ed.), (pp. 1-18).

Cheeke, J. D. (2002). Fundamentals and Applications of Ultrasonic Waves. USA: CRC Press LLC.

Duda, R., Hart, P. E., & Stork, D. G. (2000). Pattern classification (2nd ed.). New York: Wiley-Interscience.

Hill, M., McHung, J., & Turner, J. D. (2000). Cross-sectional modes in impact-echo testing of concrete structures. Journal of Structural Engineering, 126, 228-234.

Mei, X. (2004). Neural network for rapid depth evaluation of shallow cracks in asphalt pavements. Computer-Aided Civil and Infrastructure Engineering, 19, 223-230.

Pratt, D. & Sansalone, M. (1992). Impact-echo signal interpretation using artificial intelligence. ACI Materials Journal, 89, 178-187.

Salazar, A., Uni, J., Serrano, A., & Gosalbez, J. (2007). Neural Networks for Defect Detection in Non-Destructive Evaluation by Sonic Signals. Lecture Notes in Computer Science, 4507, 638-645.

Salazar, A., Vergara, L., Igual, J., & Gosalbez, J. (2005). Blind source separation for classification and detection of flaws in impact-echo testing. Mechanical Systems and Signal Processing, 19, 1312-1325.

Salazar, A., Vergara, L., Igual, J., Gosalbez, J., & Miralles, R. (2004). ICA Model Applied to Multichannel Non-destructive Evaluation by Impact-echo. Lecture Notes in Computer Science, 3195, 470-477.

Salazar, A., Vergara, L., Igual, J., & Serrano, A. (2007). Learning Hierarchies from ICA Mixtures. In 20th International Joint Conference on Neural Networks.

Sansalone, M., Carino, N. J., & Hsu, N. N. (1998). Transient stress waves interaction with planar flaws. Batim-Int-Build-Res-Pract, 16, 18-24.

Sansalone, M., Lin, Y., & Street, W. (1998). Determining the depth of surface-opening cracks using impact-generated stress waves and time-of-flight techniques. ACI Materials Journal, 95, 168-177.

Sansalone, M. & Street, W. (1997). Impact-echo: Non-destructive evaluation of concrete and masonry. New York: Bullbrier Press.

Stavroulakis, G. E. (1999). Impact-echo from a unilateral interlayer crack. LCP-BEM modelling and neural identification. Engineering Fracture Mechanics, 62, 165-184.

Xiang, Y. & Tso, S. K. (2002). Detection and classification of flaws in concrete structure using bispectra and neural networks. NDT&E International, 35, 19-27.

Xu, R., Nguyen, H., Sobol, P., Wang, S. L., Wu, A., & Johnson, K. E. (2004). Application of Principal Component Analysis to the FTIR Spectra of Disk Lubricant to Study Lube-Carbon Interactions. IEEE Transactions on Magnetics, 40, 3186-3189.

KEY TERMS

Accelerometer: A device that measures acceleration, which is converted into an electrical signal that is transmitted to signal acquisition equipment. In impact-echo testing, the measured acceleration refers to vibration displacements caused by the excitation of the short impact.

Dimensionality Reduction: A process to reduce the number of variables of a problem. The dimension of a problem is given by the number of variables (features or parameters) that represent the data. After signal feature extraction (that reduces the original signal sample
Antonio Giaquinto
Politecnico di Bari, Italy
AVI of Surface Flaws on Manufactures I
BACKGROUND

Cellular Neural Networks consist of processing units C(i, j), which are arranged in an M × N grid, as shown in Figure 1.

The generic basic unit C(i, j) is called a cell: it corresponds to a first-order nonlinear circuit, electrically connected to the cells which belong to the set Sr(i, j), named the sphere of influence of radius r of C(i, j). Such a set Sr(i, j) is defined as:

Sr(i, j) = { C(k, l) : max(|k - i|, |l - j|) <= r, 1 <= k <= M, 1 <= l <= N }

An M × N Cellular Neural Network is defined by an M × N rectangular array of cells C(i, j) located at sites (i, j), i = 1, 2, ..., M, j = 1, 2, ..., N. Each cell C(i, j) is defined mathematically by the following state and output equations:

dx_ij/dt = -x_ij + Σ_{C(k,l) ∈ Sr(i,j)} A(i, j; k, l) y_kl + Σ_{C(k,l) ∈ Sr(i,j)} B(i, j; k, l) u_kl + z_ij

y_ij = f(x_ij) = (1/2) (|x_ij + 1| - |x_ij - 1|)

A(i, j; k, l) and B(i, j; k, l) are called the feedback and the input synaptic operators and uniquely identify the network.

The reported circuit model constitutes a hardware paradigm which allows fast processing of signals. For this reason, CNNs were considered in the past as a useful framework for defect detection in industrial applications (Roska 1992). Subsequently, different CNN-based contributions working in real time and aiming at defect detection in the industrial field have been proposed (Bertucco, Fargione, Nunnari & Risitano 2000), (Occhipinti, Spoto, Branciforte & Doddo 2001), (Guinea, Gordaliza, Vicente & Garcia-Alegre 2000), (Perfetti & Terzoli 2000). In (Bertucco, Fargione, Nunnari & Risitano 2000) and (Occhipinti, Spoto, Branciforte & Doddo 2001) non-destructive control of mechanical parts in aeronautical industrial production is carried out by defining an algorithm which is implemented entirely by means of CNNs. These methods reveal themselves effective, but a complex acquisition system is required to provide information about the defectiveness. In (Guinea, Gordaliza, Vicente & Garcia-
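To make the dynamics concrete, the state equation above can be integrated numerically. The following sketch is not from the article: it assumes the classic Chua-Yang edge-detection templates with a 3 × 3 neighbourhood (r = 1), zero boundary cells, black = +1 and white = -1, and turns a binary input image into its edge map.

```python
import numpy as np

def correlate3x3(M, T):
    """Correlate an image with a 3x3 template, padding the border with 0."""
    P = np.pad(M, 1)
    out = np.zeros_like(M, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += T[di, dj] * P[di:di + M.shape[0], dj:dj + M.shape[1]]
    return out

def cnn_run(U, A, B, z, steps=400, dt=0.05):
    """Forward-Euler integration of dx/dt = -x + A*y + B*u + z with the
    piecewise-linear output y = 0.5 (|x + 1| - |x - 1|), x(0) = 0."""
    X = np.zeros_like(U, dtype=float)
    for _ in range(steps):
        Y = 0.5 * (np.abs(X + 1) - np.abs(X - 1))
        X += dt * (-X + correlate3x3(Y, A) + correlate3x3(U, B) + z)
    return 0.5 * (np.abs(X + 1) - np.abs(X - 1))

# Edge-detection templates from the CNN literature (assumed, not from
# this article): feedback A, feedforward B and bias z.
A = np.array([[0., 0., 0.], [0., 2., 0.], [0., 0., 0.]])
B = np.array([[-1., -1., -1.], [-1., 8., -1.], [-1., -1., -1.]])
z = -1.0

U = -np.ones((8, 8))
U[2:6, 2:6] = 1.0                      # a black 4x4 square on white
Y = cnn_run(U, A, B, z)
print((Y > 0).astype(int))             # black only on the square's edge
```

Each cell settles to +1 exactly where the input has a black pixel with at least one white neighbour, which is what makes the template pair act as an edge detector.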
step an output binary image, in which only defects are represented, is yielded.

The proposed solution needs neither a complex acquisition system nor feature extraction; in fact, the image is directly processed and the synthesis parameters of the system are evaluated automatically from the statistical image properties. Furthermore, the proposed system is well suited for a single-board implementation.

The scheme that represents the proposed method is shown in Figure 2. As can be observed, it is formed by three modules: a Preprocessing module, an Image Matching module and a Defect Detection one. The input images, named O and R, are acquired by means of a camera which yields 256-gray-level images, whose dimensions are m × n. The image O represents the manufacture under test, or a part of it. Such an image contains the Region of Interest (ROI), that is, the specific region of an object in which defects are to be detected.

The image R constitutes a reference image, in which a product without defects (or a part of it) is depicted. Such an image is stored in a memory and acquired off-line during the phase of system calibration. It is used to detect possible variations caused by the presence of dents, scratches or breakings on an observed surface. In order to allow a good match between the reference image and the one under test, the preprocessing blocks realize a contrast enhancement, providing the images OF and RF, which constitute the inputs for the subsequent Image Matching module. The target of this block is finding the minimum difference between the two images OF and RF. In fact, during the production process, the acquiring system could give images in which the manufacture is shifted along the four cardinal directions. This implies that the difference between OF and RF could lead to the detection of false defects. The Image Matching Module minimizes such effects, looking for the best matching between the image to be processed and the reference one. Successively, the difference image D feeds the Defect Detection module. This part aims at detecting the presence of flaws on the product under test and gives an output image containing only the defects. The output image allows activating alarm systems able to signal the presence of flaws, making this industrial task easier; in fact, it could support experts in their diagnoses. The detailed implementation of each module will be illustrated in the second part of this contribution.

Figure 2. Block diagram of the proposed CNN-based method (O, R → Pre-processing → OF, RF → Image Matching → D → Defect Detection → F)

FUTURE TRENDS

In order to provide the most information related to the defects detected by the proposed approach in industrial processes, the features of flaws should be identified. For this reason, future works will be devoted to the evaluation of different characteristics such as the dimension of defects and the kind of damage and its degree. Moreover, the advantages of applying the proposed method in various industrial fields will be investigated, and techniques minimizing eventual misclassifications in particular applications will be developed.

CONCLUSION

In this chapter a CNN-based method for the visual inspection of surface flaws of manufactures has been proposed. The approach consists of three modules: a Preprocessing Module provides images in which the contrast is enhanced; an Image Matching Module allows making up for eventual misalignment between the manufacture under test and the acquisition system; finally, the Defect Detection Module enables extracting images which contain the defects of manufactures.

The suggested method offers attractive advantages. It is general, therefore it can be introduced in different industrial fields in which the identification
of superficial anomalies like dents, corrosion or spots on manufactures is a fundamental task.

Moreover, the suggested method is finalized to the implementation by means of an architecture entirely formed by Cellular Neural Networks, exploiting the potentialities that this kind of network offers in processing signals. Therefore, the proposed approach enables automating in-line diagnosis processes, reducing the operators' burden in identifying production defects.

REFERENCES

G. Acciani, G. Brunetti & G. Fornarelli (2006), Application of Neural Networks in Optical Inspection and Classification of Solder Joints in Surface Mount Technology, IEEE Transactions on Industrial Informatics, vol. 2, no. 3, ISSN 1551-3203, pp. 200-209.

C. Bahlmann, G. Heidemann & H. Ritter (1999), Artificial neural networks for automated quality control of textile seams, Pattern Recognition, vol. 32, pp. 1049-1060.

L. Bertucco, G. Fargione, G. Nunnari & A. Risitano (2000), A Cellular Neural Network Approach for Nondestructive Control of Mechanical Parts, Proc. of the 6th IEEE Int. Workshop on Cellular Neural Networks and Their Applications, Catania, Italy, 23-27 May 2000, pp. 159-164.

C.Y. Chang, S.Y. Lin & M.D. Jeng (2005), Using a Two-layer Competitive Hopfield Neural Network for Semiconductor Wafer Defect Detection, Proc. of the 2005 IEEE International Conference on Automation, Science and Engineering, August 1-2, 2005, pp. 301-306.

L.O. Chua & T. Roska (2002), Cellular Neural Networks and Visual Computing: Foundation and Applications, Cambridge University Press, Cambridge, United Kingdom, 2002.

G. Fornarelli & A. Giaquinto (2007), A CNN-based Technique for the Identification of Superficial Anomalies on Manufactures Part II, submitted to Encyclopedia of Artificial Intelligence, Editors: J.R. Rabuñal, J. Dorado & A. Pazos, Information Science Reference, 2007.

C. Garcia (2005), Artificial intelligence applied to automatic supervision, diagnosis and control in sheet metal stamping processes, Journal of Materials Processing Technology, vol. 164-165, pp. 1351-1357.

D. Graham, P. Maas, G.B. Donaldson & C. Carr (2004), Impact damage detection in carbon fibre composites using HTS SQUIDs and neural networks, NDT&E International, vol. 37, pp. 565-570.

D. Guinea, A. Gordaliza, J. Vicente & M.C. Garcia-Alegre (2000), CNN based visual processing for industrial inspection, Proc. of The International Society for Optical Engineering (SPIE), vol. 3966, no. 45, 2000, pp. 315-322.

L. Han, X. Yue & J. Yu (1999), Inspection of Surface Defects on Engine Cylinder Wall Using Segmentation Image Processing Technique, Proc. of the IEEE International Conference on Vehicle Electronics, 6-9 September 1999, Changchun, P.R. China, vol. 1, pp. 258-260.

Y.A. Karayiannis, R. Stojanovic, P. Mitropoulos, C. Koulamas, T. Stouraitis, S. Koubias & G. Papadopoulos (1999), Defect Detection and Classification on Web Textile Fabric using Multiresolution Decomposition and Neural Networks, Proc. of the 6th IEEE International Conference on Electronics, Circuits and Systems, 5-8 September 1999, Pafos, Cyprus, pp. 765-768.

D.A. Karras (2003), Improved Defect Detection Using Support Vector Machines and Wavelet Feature Extraction Based on Vector Quantization and SVD Techniques, Proc. of the International Joint Conference on Neural Networks, 20-24 July 2003, Portland, Oregon, USA, pp. 2322-2327.

A. Kumar (2003), Neural network based detection of local textile defects, Pattern Recognition, vol. 36, pp. 1645-1659.

C. Kwak, J.A. Ventura & K. Tofang-Sazi (2000), A neural network approach for defect identification and classification on leather fabric, Journal of Intelligent Manufacturing, vol. 11, pp. 485-499.

L. Lei (2004), A Machine Vision System for Inspecting Bearing-Diameter, Proc. of the IEEE 5th World Congress on Intelligent Control and Automation, 15-19 June 2004, Hangzhou, P.R. China, pp. 3904-3906.
L. Occhipinti, G. Spoto, M. Branciforte & F. Doddo (2001), Defects Detection and Characterization by Using Cellular Neural Networks, Proc. of the 2001 IEEE International Symposium on Circuits and Systems (ISCAS 2001), 6-9 May 2001, vol. 3, pp. 481-484.

P.M. Patil, M.M. Biradar & S. Jadhav (2005), Orientated texture segmentation for detecting defects, Proc. of the International Joint Conference on Neural Networks, 31 July-4 August 2005, Montreal, Canada, pp. 2001-2005.

R. Perfetti & L. Terzoli (2000), Analogic CNN algorithms for textile applications, International Journal of Circuit Theory and Applications, no. 28, pp. 77-85.

S. Rimac-Drlje, A. Keller & Z. Hocenski (2005), Neural Network Based Detection of Defects in Texture Surfaces, IEEE International Symposium on Industrial Electronics, 20-23 June 2005, Dubrovnik, Croatia, pp. 1255-1260.

T. Roska (1992), Programmable CNN: a Hardware Accelerator for Simulation, Learning, and Real-Time Applications, Proc. of the 35th Midwest Symposium, Washington, USA, August 9-12, 1992, vol. 1, pp. 437-440.

R. Stojanovic, P. Mitropulos, C. Koullamas, Y. Karayiannis, S. Koubias & G. Papadopoulos (2001), Real-Time Vision-Based System for Textile Fabric Inspection, Real-Time Imaging, vol. 7, pp. 507-518.

X. Yang, G. Pang & N. Yung (2004), Discriminative training approaches to fabric defect classification based on wavelet transform, Pattern Recognition, vol. 37, pp. 889-899.

KEY TERMS

Artificial Neural Networks: A set of basic processing units which communicate with each other by weighted connections. These units give rise to parallel processing with particular properties, such as the ability to adapt or learn, to generalise, to cluster or organise data, and to approximate non-linear functions. Each unit receives an input from neighbours or external sources and uses it to compute an output signal. Such a signal is propagated to other units or is a component of the network output. In order to map an input set into an output one, a neural network is trained by teaching patterns, changing its weights according to proper learning rules.

Automated Visual Inspection: An automatic form of quality control, normally achieved using one or more cameras connected to a processing unit. Automated Visual Inspection has been applied to a wide range of products. Its target consists of minimizing the effects of the visual fatigue of human operators who perform defect detection in a production-line environment.

Cellular Neural Networks: A particular circuit architecture which possesses some key features of Artificial Neural Networks. Its processing units are arranged in an M × N grid. The basic unit of Cellular Neural Networks is called a cell and contains linear and non-linear circuit elements. Each cell is connected only to its neighbouring cells. The adjacent cells can interact directly with each other, whereas cells not directly connected together may affect each other indirectly because of the propagation effects of the continuous-time dynamics.

Defect Detection: Extraction of information about the presence of an instance in which a requirement is not satisfied in industrial processes. The aim of Defect Detection consists of highlighting manufactures which are incorrect or have missing functionality or specifications.

Image Matching: Establishment of the correspondence between each pair of visible homologous image points on a given pair of images, aiming at the evaluation of novelties.

Industrial Inspection: Analysis pursuing the prevention of unsatisfactory industrial products from reaching the customer, particularly in situations where failed manufactures can cause injury or even endanger life.

Region of Interest: A selected subset of samples within a dataset, identified for a particular purpose. In image processing, the Region of Interest is identified by the boundaries of an object. The encoding of a Region of Interest can be achieved by basing its choice on: (a) a value that may or may not be outside the normal range of occurring values; (b) purely separated graphic information, like drawing elements; (c) separated semantic information, such as a set of spatial and/or temporal coordinates.
Antonio Giaquinto
Politecnico di Bari, Italy
INTRODUCTION

Automatic visual inspection takes a relevant place in the defect detection of industrial production. In this field a fundamental role is played by methods for the detection of superficial anomalies on manufactures.

In particular, several systems have been proposed in order to reduce the burden of human operators, avoiding the drawbacks due to the subjectivity of judgement criteria (Kwak, Ventura & Tofang-Sazi 2000, Patil, Biradar & Jadhav 2005).

Proposed solutions are required to be able to handle and process a large amount of data. For this reason, neural network-based methods have been suggested for their ability to deal with a wide spread of data (Kumar 2003, Chang, Lin & Jeng 2005, Garcia 2005, Graham, Maas, Donaldson & Carr 2004, Acciani, Brunetti & Fornarelli 2006). Moreover, in many cases these methods must satisfy the time constraints of industrial processes, because the inclusion of the diagnosis inside the production process is needed.

To this purpose, architectures based on Cellular Neural Networks (CNNs) revealed themselves successful in the field of real-time defect detection, due to the fact that these networks guarantee a hardware implementation and massive parallelism (Bertucco, Fargione, Nunnari & Risitano 2000), (Occhipinti, Spoto, Branciforte & Doddo 2001), (Perfetti & Terzoli 2000). On the basis of these considerations, a method to identify superficial damages and anomalies in manufactures has been given in (Fornarelli & Giaquinto 2007). This method is aimed at the implementation by means of an architecture entirely formed by Cellular Neural Networks, whose synthesis is illustrated in the present work. The suggested solution reveals itself effective for the detection of defects, as shown by two test cases carried out on an injection pump and a sample textile.

BACKGROUND

In the companion paper an approach for the defect detection of surface flaws on manufactures is proposed: this approach can be divided into three modules, named the Preprocessing module, the Image Matching module and the Defect Detection module, respectively. The first one realizes a pre-processing stage which enables identifying eventual defective areas; in the second stage the matching between such a pre-processed image and a reference one is performed; finally, in the third step an output binary image, in which only defects are represented, is yielded.

The proposed solution needs neither a complex acquisition system nor feature extraction; in fact, the image is directly processed and the synthesis parameters of the networks are evaluated automatically from the statistical image properties. Furthermore, the proposed system is well suited for a single-board implementation.

CNN-BASED DIAGNOSIS ARCHITECTURE

The detailed implementation of each module will be illustrated in the following. Successively, the results obtained by testing the suggested architecture on two real cases are shown and a discussion of the numerical outcomes is reported.

Preprocessing Module

The Preprocessing is realized by a Fuzzy Contrast Enhancement block. This block consists of a Fuzzy Associative Memory (FAM), developed as the preprocessing stage of the CNN-based system considered in (Carnimeo & Giaquinto 2002). The proposed circuit enables transforming 256-gray-level images into fuzzified ones, whose contrast is enhanced, due to the fact
AVI of Surface Flaws on Manufactures II
that their histograms are stretched. To this purpose, a proper fuzzification procedure is developed to define two fuzzy subsets adequate to describe the semantic content of patterns such as images of industrial objects, which can be classified as belonging to the Object/Background class.

In an analogous way, the domain of output values has been characterized by means of two output fuzzy subsets, defined as Dark and Light. In particular, the fuzzy rules which provide the mapping from the original images (O/R) into fuzzified ones (OF/RF) can be expressed as:

IF O(i, j) ∈ Object THEN OF(i, j) ∈ Dark
IF O(i, j) ∈ Background THEN OF(i, j) ∈ Light

where O(i, j) and OF(i, j) denote the gray-level value of the (i, j)-th pixel in the original image and in the fuzzified one, respectively. As shown in (Carnimeo & Giaquinto 2002), the reported fuzzy rules can be encoded into a single FAM.

Then, a Cellular Neural Network is synthesized to behave as the codified FAM by adopting the synthesis procedure developed in (Carnimeo & Giaquinto 2002), where the synthesis of a CNN-based memory which contains the abovementioned fuzzification rules is accurately formulated.

Contrasted images present a stretched histogram. This implies that such an operation minimizes the effects of image noise caused by environmental problems like dust or dirtiness of the camera lenses. Moreover, it reduces the undesired information due to the combination between the non-uniformity of the illumination in the image and the texture of the manufacture (Jamil, Bakar, Mohd, & Sembok 2004).

Image Matching Module

In Figure 1 the block diagram corresponding to the Image Matching module is reported. The target of this module consists of finding the best matching between the images yielded by processing the acquired image and the reference one. To this purpose, the image OF is shifted by one pixel in the four cardinal directions (NORTH, SOUTH, EAST and WEST), using four space-invariant CNNs (Roska, Kek, Nemes, Zarandy & Szolgay 1999), obtaining the images OFN, OFS, OFE and OFW. Successively, the switch S1 changes its position, excluding the image OF. The reference image RF is subtracted from the images OFN, OFS, OFE, OFW and OF; then the numbers bN, bS, bE, bW and b0 of black pixels in the resulting images DN, DS, DE, DW and D0 are computed. The image which best matches the reference one presents the maximum number of black pixels. Therefore, such a value drives the switch S2, which allows feeding back the image that best matches the reference one. In this way, the image which presents the minimum difference becomes the input for a successive computational step. The processing is repeated until D0 presents the best matching. When this condition is satisfied, the difference image D between D0 and RF is computed. As can be noticed, the operations needed for each directional shift can be carried out simultaneously, reducing the computational time at each step.

Defect Detection Module

The third part of the suggested architecture is a Defect Detection module. The subsystem is synthesized with the aim of computing the output binary image F, in which only the defects are present. Such a module is composed of the sequence of a Major Voting circuit, a CNN associative memory for contrast enhancement and a Threshold circuit. The corresponding CNN-based implementation is obtained by considering space-invariant networks.

In detail, the Major Voting circuit minimizes the number of false detections caused by the presence of noise, highlighting the dents or those flaws which lead to a change in the reflectance of light in the original image. The output of the Major Voting block, DM, feeds a CNN working as the associative memory described in the previous Preprocessing Module subsection. This operation provides an output image DMF whose histogram is bimodal. In this kind of image, the selection of a threshold which highlights the defects becomes feasible. In fact, a proper value is given by the mean of the modes of the histogram. Then, this image is segmented by means of the corresponding space-invariant CNN (Roska, Kek, Nemes, Zarandy & Szolgay 1999), obtaining the binary image F. In this way, errors corresponding to the incorrect identification of defects are minimized, because only flaws are visible after the segmentation.
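The iterative shift-and-compare loop of the Image Matching module can be sketched in software. This is an illustrative reconstruction, not the CNN implementation: `np.roll` stands in for the one-pixel directional shifts, and a pixel is counted as "black" (zero difference) when the shifted image equals the reference at that position.

```python
import numpy as np

def match_shift(of, rf, max_iters=20):
    """Iterative one-pixel alignment: shift the image in the four cardinal
    directions, compare each candidate with the reference, and keep the
    shift whose difference image has the most black (zero-difference)
    pixels, until the unshifted candidate D0 matches best."""
    shifts = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
    cur = of.copy()
    for _ in range(max_iters):
        candidates = {"0": cur}
        for name, (di, dj) in shifts.items():
            candidates[name] = np.roll(cur, (di, dj), axis=(0, 1))
        # black-pixel count of each difference image (b0, bN, bS, bE, bW)
        scores = {n: int((c == rf).sum()) for n, c in candidates.items()}
        best = max(scores, key=scores.get)
        if best == "0":                      # D0 matches best: stop
            break
        cur = candidates[best]               # feed back the best candidate
    return cur, np.abs(cur - rf)             # aligned image, difference D

# Toy example: a binary "manufacture" image shifted east by two pixels.
rf = np.zeros((10, 10), dtype=int); rf[3:7, 3:7] = 1
of = np.roll(rf, (0, 2), axis=(0, 1))
aligned, D = match_shift(of, rf)
print(int(D.sum()))
```

As in the module, the four directional candidates are independent and could be evaluated in parallel at each step; note that `np.roll` wraps around the borders, a simplification of the real shift operation.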
Figure 1. Block diagram of the Image Matching module: the reference image RF is subtracted from each candidate image; the black-pixel counts bN, bS, bE, bW and b0 of the difference images DN, DS, DE, DW and D0 feed a maximum selector.
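The Major Voting circuit is described only behaviourally in the text (it suppresses isolated false detections caused by noise). One plausible reading, offered here purely as an assumption rather than the actual circuit, is a binary 3 × 3 majority filter:

```python
import numpy as np

def majority_vote(binary_img):
    """3x3 majority filter: a pixel stays black only if at least 5 of the
    9 pixels in its 3x3 neighbourhood (itself included) are black, which
    suppresses isolated noise pixels while preserving compact blobs.
    (Assumed behaviour of the Major Voting circuit, not its schematic.)"""
    P = np.pad(binary_img.astype(int), 1)            # zero border
    H, W = binary_img.shape
    votes = sum(P[i:i + H, j:j + W] for i in range(3) for j in range(3))
    return votes >= 5

img = np.zeros((7, 7), dtype=bool)
img[2:5, 2:5] = True      # a genuine 3x3 defect blob
img[0, 6] = True          # an isolated noise pixel
out = majority_vote(img)
print(int(out.sum()))
```

On this toy input, the isolated pixel is removed while the core of the blob survives (its corner pixels, with only four black votes, are also trimmed).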
Figure 2. (a) Acquired image containing the ROI (the flange of an injection pump) with two dents; (b) gray-scale histogram of the image in Figure 2(a)

Figure 3. (a) Output image of the Fuzzy Contrast Enhancement module fed by the image in Figure 2(a); (b) gray-scale histogram of the image in Figure 3(a)
Detection one are shown, respectively. It can be noticed that the central defect has been isolated effectively, even if a percentage of the areas to be identified is missed. This is due to the fact that, when details need to be detected, maximum contrast is required. Nevertheless, as the contrast is increased, the histogram of the resulting image is pushed toward extreme gray-level values with respect to the acquired image. This implies that, due to a saturation phenomenon, an information loss about details takes place (Brendel & Roska, 2002).

Finally, in the output image small white areas are misclassified as defects. This problem arises from the shift of the manufacture with respect to the acquisition system. The Image Matching module minimizes the effects of such a problem, but it cannot remove them completely when mechanical deformations of the manufacture occur, as in the textile field. As shown in Figure 5(d),
Figure 4. (a) Output of the Image Matching module; (b) output of the Major Voting block in the Defect Detection module; (c) output image containing the detected defects, represented by white pixels

Figure 5. (a) Acquired image of a textile containing a thin bar; (b) corresponding output of the Fuzzy Contrast Enhancement module; (c) output of the Image Matching module; (d) final output image
this implies the presence of false positives, which will be investigated subsequently.

FUTURE TRENDS

As can be argued from an observation of the obtained numerical results, future work will be devoted to a more detailed analysis of misclassifications. In particular, false positives could be analyzed by means of further techniques which relate the characteristics of the possibly defective zones to those of the zones containing effective defects, according to the constraints of the application. For instance, in the reported numerical examples the false positives have geometric sizes which are negligible compared to the areas of eventual flaws. Therefore, a control of area dimensions could enable discrimination between the two kinds of regions.
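The area-based discrimination suggested above can be sketched as a connected-component filter: regions smaller than a chosen area are discarded as probable false positives. A plain-Python 4-connected flood fill, with an illustrative function name and threshold parameter (the article does not specify this implementation):

```python
import numpy as np

def filter_small_regions(binary, min_area):
    """Keep only connected white (1) regions with at least `min_area`
    pixels; smaller regions are dropped as probable false positives."""
    visited = np.zeros(binary.shape, dtype=bool)
    out = np.zeros_like(binary)
    rows, cols = binary.shape
    for r in range(rows):
        for c in range(cols):
            if binary[r, c] and not visited[r, c]:
                stack, region = [(r, c)], []
                visited[r, c] = True
                while stack:                      # 4-connected flood fill
                    i, j = stack.pop()
                    region.append((i, j))
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and binary[ni, nj] and not visited[ni, nj]):
                            visited[ni, nj] = True
                            stack.append((ni, nj))
                if len(region) >= min_area:
                    for i, j in region:
                        out[i, j] = 1
    return out
```

With `min_area=3`, a 2×2 detected region survives while an isolated single pixel is removed.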
Stojanovic, R., Mitropulos, P., Koulamas, C., Karayiannis, Y., Koubias, S., & Papadopoulos, G. (2001). Real-Time Vision-Based System for Textile Fabric Inspection. Real-Time Imaging, 7, 507-518.

KEY TERMS

Automated Visual Inspection: An automatic form of quality control, normally achieved using one or more cameras connected to a processing unit. Automated Visual Inspection has been applied to a wide range of products. Its target consists of minimizing the effects of visual fatigue on human operators who perform defect detection in a production-line environment.

Cellular Neural Networks: A particular circuit architecture which possesses some key features of Artificial Neural Networks. Its processing units are arranged in an M × N grid. The basic unit of Cellular Neural Networks is called a cell and contains linear and nonlinear circuit elements. Each cell is connected only to its neighbouring cells. The adjacent cells can interact directly with each other, whereas cells not directly connected together may affect each other indirectly because of the propagation effects of the continuous-time dynamics.

Fuzzy Associative Memory: A kind of content-addressable memory in which recall occurs correctly if input data fall within a specified window consisting of an upper bound and a lower bound of the stored patterns. A Fuzzy Associative Memory is identified by a matrix of fuzzy values. It allows mapping an input fuzzy set into an output fuzzy one.

Histogram Stretching: A point process that involves the application of an appropriate transformation function to every pixel of a digital image in order to redistribute the information of the histogram toward the extremes of the grey-level range. The target of this operation consists of enhancing the contrast of digital images.

Image Matching: Establishment of the correspondence between each pair of visible homologous image points on a given pair of images, aiming at the evaluation of novelties.

Major Voting: An operation aiming at deciding whether the neighbourhood of a pixel in a digital image contains more black or more white pixels, or whether their numbers are equal. This effect is realized in two steps. The first one gives rise to an image where the sign of the rightmost pixel corresponds to the dominant colour. During the second step the grey levels of the rightmost pixels are driven to black or white values, depending on the dominant colour, or they are left unchanged otherwise.

Real Time System: A system that must satisfy explicit bounded response-time constraints to avoid failure. Equivalently, a real-time system is one whose logical correctness is based both on the correctness of the outputs and on their timeliness. The timeliness constraints or deadlines are generally a reflection of the underlying physical process being controlled.

Region of Interest: A selected subset of samples within a dataset, identified for a particular purpose. In image processing, the Region of Interest is identified by the boundaries of an object. The encoding of a Region of Interest can be achieved by basing its choice on: (a) a value that may or may not be outside the normal range of occurring values; (b) purely separated graphic information, like drawing elements; (c) separated semantic information, such as a set of spatial and/or temporal coordinates.
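The Histogram Stretching definition above describes a simple point process. A minimal NumPy sketch of the linear variant, mapping the observed minimum and maximum gray levels onto a target range (function and parameter names are illustrative):

```python
import numpy as np

def stretch_histogram(image, lo=0, hi=255):
    """Linear histogram stretching: map the minimum gray level to `lo`
    and the maximum to `hi`, redistributing the histogram toward the
    extremes of the range to enhance contrast."""
    img = image.astype(np.float64)
    mn, mx = img.min(), img.max()
    if mx == mn:                      # flat image: nothing to stretch
        return image.copy()
    out = (img - mn) * (hi - lo) / (mx - mn) + lo
    return np.round(out).astype(np.uint8)
```

An image whose gray levels span only 100–200 is expanded to the full 0–255 range, with intermediate levels scaled proportionally.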
INTRODUCTION

Since its seminal publication in 1988, the Cellular Neural Network (CNN) (Chua & Yang, 1988) paradigm has attracted the research community's attention, mainly because of its ability to integrate complex computing processes into compact, real-time programmable analogic VLSI circuits (Rodríguez et al., 2004).

Unlike cellular automata, the CNN model hosts nonlinear processors which, from analogic array inputs, in continuous time, generate analogic array outputs using a simple, repetitive scheme controlled by just a few real-valued parameters. CNN is the core of the revolutionary Analogic Cellular Computer, a programmable system whose structure is the so-called CNN Universal Machine (CNN-UM) (Roska & Chua, 1993). Analogic CNN computers mimic the anatomy and physiology of many sensory and processing organs, with the additional capability of data and program storing (Chua & Roska, 2002).

This article reviews the main features of this Artifi-

$$S_r(i,j) = \left\{ C(k,l) \;\middle|\; \max\{|k-i|,\,|l-j|\} \le r,\; 1 \le k \le M,\; 1 \le l \le N \right\} \qquad (1)$$

This set is sometimes referred to as a (2r + 1) × (2r + 1) neighbourhood; e.g., for a 3 × 3 neighbourhood, r should be 1. Thus, the parameter r controls the connectivity of a cell, i.e. the number of active synapses that connect the cell with its immediate neighbours.

When r > N/2 and M = N, a fully connected CNN is obtained, where every neuron is connected to every other cell in the network and S_r(i,j) is the entire array. This extreme case corresponds to the classic Hopfield ANN model (Chua & Roska, 2002).

The state equation of any cell C(i,j) in the M × N array structure of the standard CNN may be described mathematically by:

$$C\,\frac{dz_{ij}(t)}{dt} = -\frac{1}{R}\, z_{ij}(t) + \sum_{C(k,l) \in S_r(i,j)} \left[ A(i,j;k,l)\, y_{kl}(t) + B(i,j;k,l)\, x_{kl} \right] + I_{ij}$$
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Basic Cellular Neural Networks Image Processing
pixel (i,j), with constant weighted circuits defined by the feedback template A that connects the cell with the output plane Y, and by the control template B, which connects the neuron to the neighbouring pixels of

In other words,

$$\sum_{C(k,l) \in S_r(i,j)} A(i,j;k,l)\, y_{kl} = \sum_{|k-i| \le r,\; |l-j| \le r} A(i-k,\, j-l)\, y_{kl}$$
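The state equation and the space-invariant 3×3 templates above can be simulated numerically. Below is a didactic forward-Euler sketch, assuming C = R = 1, zero initial state, zero padding at the array boundary, and the usual piecewise-linear output y = ½(|z+1| − |z−1|); it is demonstrated with the classic edge-detection template, whose values are taken from the standard CNN template library and are not given in this article (black = +1, white = −1).

```python
import numpy as np

def cnn_run(x, A, B, I, steps=400, dt=0.05):
    """Forward-Euler simulation of the standard CNN state equation
    (C = R = 1) with space-invariant 3x3 templates. A sketch, not the
    analogic-hardware realization."""
    def weigh(img, T):
        # Sum of T-weighted neighbours (correlation; the templates used
        # here are symmetric, so the convolution flip is immaterial).
        p = np.pad(img, 1)                     # zero boundary condition
        out = np.zeros_like(img)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                out += T[dr + 1, dc + 1] * p[1 + dr:1 + dr + img.shape[0],
                                             1 + dc:1 + dc + img.shape[1]]
        return out

    x = x.astype(float)
    z = np.zeros_like(x)                       # zero initial state
    Bx = weigh(x, B) + I                       # control term, constant in t
    for _ in range(steps):
        y = 0.5 * (np.abs(z + 1) - np.abs(z - 1))   # output nonlinearity
        z = z + dt * (-z + weigh(y, A) + Bx)
    return 0.5 * (np.abs(z + 1) - np.abs(z - 1))

# Classic edge-detection template (assumed library values, see lead-in).
A_EDGE = np.array([[0., 0., 0.], [0., 2., 0.], [0., 0., 0.]])
B_EDGE = np.array([[-1., -1., -1.], [-1., 8., -1.], [-1., -1., -1.]])
```

Run on a white field containing a 3×3 black square, the network settles so that only the boundary pixels of the square remain black and its interior pixel turns white, i.e. the edge is extracted.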
Apart from edges, convex corners (i.e. black pixels with at least five white neighbours) may also be detected with the following modification of its parameters:

$$A = 2, \quad B = B_{\text{EDGE}}, \quad I = -8.5 \qquad (7)$$

This example illustrates the important role played by the threshold parameter I. This parameter may be viewed as a bias index that relocates the origin z0 of the output function (3) (Fernández et al., 2006).

Basic Logic Operators

In order to perform pixel-wise logic operations between two binary images X1 and X2, the initial state Z(0) of the

then two possible situations may occur: the activation of cells in the opposite part of only one of the saturation regions (partial inversion), or wave-propagating cell inversions in both binary states.

The first kind of these feedback-driven CNNs is said to have the mono-activation property if cells in only one saturated region can enter the linear region. Thus, if cells can enter the linear region from the positive saturation region, then those cells saturated in the negative part must fulfil that the overall contribution of A, B and I in their sphere of influence S_r is less than 1. That is,

$$w_{ij}(t) = \sum_{S_r(i,j)} \left[ a_{kl}\, y_{kl}(t) + b_{kl}\, x_{kl} \right] + I_{ij} < 1$$
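The pixel-wise logic operations mentioned above can be written directly in the CNN binary convention (+1 = black, −1 = white). The NumPy sketch below only mirrors the logical result; it is not the template-based CNN realization (whose recipe is given in the text in terms of the initial state Z(0)), and the function names are illustrative:

```python
import numpy as np

def logic_and(X1, X2):
    """Pixel-wise AND of two binary images in the CNN convention
    (+1 = black, -1 = white): black only where both images are black."""
    return np.where((X1 > 0) & (X2 > 0), 1, -1)

def logic_or(X1, X2):
    """Pixel-wise OR: black where at least one image is black."""
    return np.where((X1 > 0) | (X2 > 0), 1, -1)
```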
$$A_{\text{Shadow}} = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad B = 0, \qquad I = 0 \qquad (12)$$

FUTURE TRENDS

There is a continuous quest by engineers and specialists to compete with and imitate nature, especially some smart animals. Vision is one particular area in which computer engineers are interested. In this context, the so-called Bionic Eye (Werblin et al., 1995) embedded in the CNN-UM architecture is ideal for implementing many spatio-temporal neuromorphic models.

With its powerful image processing toolbox and a compact VLSI implementation (Rodríguez et al., 2004), the CNN-UM can be used to program or mimic different models of retinas and even combinations of them (Lázár et al., 2004). Moreover, it can combine biologically based models, biologically inspired models, and analogic artificial image processing algorithms. This combination will surely bring a broader kind of applications and developments.

CONCLUSION

A number of other advances in the definition and characterization of CNN have been researched in the past decade. These include the definition of methods for designing and implementing larger than 3 × 3 neighbourhoods in the CNN-UM (Kék & Zarándy, 1998), the efficient implementation of halftoning techniques (Crounse et al., 1993), the CNN implementation of some image compression techniques (Venetianer et al., 1995) and the design of a CNN-based Fast Fourier Transform algorithm over analogic signals (Perko et al., 1998), among many others. Some of them have also been described in this book in the article entitled "Advanced Cellular Neural Networks Image Processing".

In this article, a general review of the main properties and features of the Cellular Neural Network model has been addressed, focusing on its DIP capabilities from a basic viewpoint. CNN is now a fundamental and powerful toolkit for real-time nonlinear image processing tasks, mainly due to its versatile programmability, which has powered its hardware development for visual sensing applications.

REFERENCES

Chua, L.O., & Roska, T. (2002). Cellular Neural Networks and Visual Computing. Foundations and Applications. Cambridge, UK: Cambridge University Press.

Chua, L.O., & Roska, T. (1993). The CNN Paradigm. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 40, 147-156.

Chua, L.O., & Yang, L. (1988). Cellular Neural Networks: Theory and Applications. IEEE Transactions on Circuits and Systems, 35, 1257-1290.

Crounse, K.R., Roska, T., & Chua, L.O. (1993). Image Halftoning with Cellular Neural Networks. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 40, 267-283.

Fernández, J.A., Preciado, V.M., & Jaramillo, M.A. (2006). Nonlinear Mappings with Cellular Neural Networks. Lecture Notes in Computer Science, 4177, 350-359.

Jain, A.K. (1989). Fundamentals of Digital Image Processing. Englewood Cliffs, NJ, USA: Prentice-Hall.

Kék, L., & Zarándy, A. (1998). Implementation of Large Neighborhood Non-Linear Templates on the CNN Universal Machine. International Journal of Circuit Theory and Applications, 26, 551-566.

Lázár, A.K., Wagner, R., Bálya, D., & Roska, T. (2004). Functional Representations of Retina Channels via the RefineC Retina Simulator. International Workshop on Cellular Neural Networks and their Applications CNNA 2004, 333-338.

Matsumoto, T., Chua, L.O., & Suzuki, H. (1990a). CNN Cloning Template: Connected Component Detector. IEEE Transactions on Circuits and Systems, 37, 633-635.

Matsumoto, T., Chua, L.O., & Suzuki, H. (1990b). CNN Cloning Template: Shadow Detector. IEEE Transactions on Circuits and Systems, 37, 1070-1073.

Perko, M., Fajfar, I., Tuma, T., & Puhan, J. (1998). Fast Fourier Transform Computation Using a Digital CNN Simulator. Fifth IEEE International Workshop on Cellular Neural Networks and Their Applications Proceedings, 230-236.
Rodríguez, A., Liñán, G., Carranza, L., Roca, E., Carmona, R., Jiménez, F., Domínguez, R., & Espejo, S. (2004). ACE16k: The Third Generation of Mixed-Signal SIMD-CNN ACE Chips Toward VSoCs. IEEE Transactions on Circuits and Systems I: Regular Papers, 51, 851-863.

Roska, T., & Chua, L.O. (1993). The CNN Universal Machine: An Analogic Array Computer. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 40, 163-173.

Torralba, A.B., & Hérault, J. (1999). An Efficient Neuromorphic Analog Network for Motion Estimation. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 46, 269-280.

Venetianer, P.L., Werblin, F., Roska, T., & Chua, L.O. (1995). Analogic CNN Algorithms for Some Image Compression and Restoration Tasks. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 42, 278-284.

Werblin, F., Roska, T., & Chua, L.O. (1995). The Analogic Cellular Neural Network as a Bionic Eye. International Journal of Circuit Theory and Applications, 23, 541-569.

KEY TERMS

Artificial Neural Network (ANN): A system made up of interconnected artificial neurons or nodes (usually simplified neurons) which may share some properties of biological neural networks. They may either be used to gain an understanding of biological neural networks, or for solving traditional artificial intelligence tasks without necessarily attempting to model a real biological system. Well-known examples of ANN are the Hopfield, Kohonen and Cellular (CNN) models.

Feedback: The signal that is looped back to control a system within itself. When the output of the system is fed back as a part of the system input, it is called a feedback loop. A simple electronic device based on feedback is the electronic oscillator. The Phase-Locked Loop (PLL) is an example of a complex feedback system.

Neuromorphic: A term coined by Carver Mead in the late 1980s to describe VLSI systems containing electronic analogue circuits that mimic neuro-biological architectures present in the nervous system. More recently, its definition has been extended to include analogue, digital and mixed-mode A/D VLSI systems that implement models of neural systems, as well as software algorithms.

Piecewise Linear Function: A function f(x) that can be split into a number of linear segments, each of which is defined for a non-overlapping interval of x.

Spatial Convolution: A term used to identify the linear combination of a series of discrete 2D data (a digital image) with a few coefficients or weights. In Fourier theory, a convolution in space is equivalent to (spatial) frequency filtering.

Template: Also known as kernel, or convolution kernel; the set of coefficients used to perform a spatial filter operation over a digital image via the spatial convolution operator.

Transient: In electronics, a transient is a short-lived oscillation in a system caused by a sudden change of voltage, current, or load. Transients are mostly found as the result of the operation of switches. The signal produced by the transient process is called the transient signal or simply the transient. Also, the transient of a dynamic system can be viewed as its path to a stable final output.

VLSI: Acronym that stands for Very Large Scale Integration: the process of creating integrated circuits by combining thousands (nowadays hundreds of millions) of transistor-based circuits into a single chip. A typical VLSI device is the microprocessor.
Bayesian Neural Networks for Image Restoration
inference and clear axioms. Yet both approaches aim to create models in agreement with data. Bayesian decision theory provides intrinsic means to model ranking. Bayesian inference for ANNs can be implemented numerically by deterministic methods involving Gaussian approximations (MacKay, 1992), or by Monte-Carlo methods (Neal, 1996). Two features distinguish the Bayesian approach to learning models from data. First, beliefs derived from background knowledge are used to select a prior probability distribution for model parameters. Secondly, predictions of future observations are performed by integrating the model's predictions with respect to the posterior parameter distribution obtained by updating this prior with new data. Both aspects are difficult in neural modeling: the prior over network parameters has no obvious relation to prior knowledge, and integration over the posterior is computationally demanding. The properties of priors can be elucidated by defining classes of prior distributions for net parameters that reach sensible limits as the net size goes to infinity (Neal, 1994). The problem of integrating over the posterior can be solved using Markov chain Monte Carlo methods (Neal, 1996).

Bayesian Image Modeling

The fundamental concept of Bayesian analysis is that the plausibility of alternative hypotheses {H_i} is represented by probabilities {P_i}, and inference is performed by evaluating these probabilities. Inference may operate on various propositions, related in neural modeling to different paradigms. Bayes' theorem makes no reference to any sample or hypothesis space, nor does it determine the numerical value of any probability directly from available information. As a prerequisite to applying Bayes' theorem, a principle to cast available information into numerical values is needed.

In statistical restoration of gray-level digital images, the basic assumption is that there exists a scene adequately represented by an orderly array of N pixels. The task is to infer reliable statistical descriptions of images, which are gray-scale digitized pictures stored as arrays of integers representing the gray-level intensity in each pixel. Then the shape of any positive, additive image can be directly identified with a probability distribution. The image is conceived as an outcome of a random vector f = {f_1, f_2, ..., f_N}, given in the form of a positive, additive probability density function. Likewise, the measured data g = {g_1, g_2, ..., g_M} are expressed in the form of a probability distribution (Fig. 1). A further assumption refers to image data as a linear function of physical intensity, and to the errors (noise) b as data-independent, additive, and Gaussian with zero mean and known standard deviation σ_m, m = 1, 2, ..., M, in each pixel. The concept of image entropy and the alternative entropy expressions used in image restoration are discussed by Gull and Skilling (1985). A brief review of different approaches based on the ME principle, as well as a full Bayesian approach for solving inverse problems, are due to Djafari (1995).

Image models are derived on the basis of intuitive ideas and observations of real images, and have to comply with certain criteria of invariance, that is, operations on images should not affect their likelihood. Each model comprises a hypothesis H with some free parameters w = (α, β, ...) that assign a probability density P(f | w, H) over the entire image space, normalized to integrate to unity. Prior beliefs about the validity of H before data acquisition are embedded in P(H). Only extreme choices for P(H) may exceed the evidence P(f | H); thus the plausibility P(H | f) of H is given essentially by the evidence P(f | H) of the image f. Consequently, objective means for comparing various hypotheses exist.

Initially, the free parameters w are either unknown or they are assigned very wide prior distributions. The task is to search for the best-fit parameter set w_MP, which has the largest likelihood given the image. Following Bayes' theorem:

$$P(w \mid f, H) = \frac{P(f \mid w, H)\, P(w \mid H)}{P(f \mid H)} \qquad (1)$$

where P(f | w, H) is the likelihood of the image f given w, P(w | H) is the prior distribution of w, and P(f | H) is the evidence for H. A prior P(w | H) has to be assigned quite subjectively, based on our beliefs about images. Since P(w | f, H) is normalized to 1, the denominator in (1) ought to satisfy

$$P(f \mid H) = \int_w P(f \mid w, H)\, P(w \mid H)\, dw.$$
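Equation (1) can be illustrated on a discrete parameter grid, where the evidence integral becomes a sum. A toy sketch (all numbers and names are illustrative, not taken from the article):

```python
import numpy as np

def posterior(likelihood, prior):
    """Discrete illustration of Eq. (1): posterior = likelihood * prior
    normalized by the evidence, which here is a sum over the parameter
    grid (the discrete analogue of the integral over w)."""
    joint = likelihood * prior
    evidence = joint.sum()            # P(f | H)
    return joint / evidence, evidence

# Toy grid over a single parameter w with three candidate values.
prior = np.array([0.5, 0.3, 0.2])          # P(w | H)
likelihood = np.array([0.1, 0.4, 0.8])     # P(f | w, H)
post, Z = posterior(likelihood, prior)
```

The posterior sums to one by construction, and the best-fit parameter w_MP is simply the grid point maximizing the posterior.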
The ratio between the posterior accessible volume of the model's parameter space and the prior accessible volume prevents data overfitting by favoring simpler models. Further, Bayes' theorem gives the probability of H up to a constant: $P(H \mid f) \propto P(f \mid H)\, P(H)$.

LINEAR IMAGING EXPERIMENTS

In the widely spread linear case, the N-dimensional image vector f consists of the pixel values of an unobserved image, and the M-dimensional data vector g is made of the pixel values of an observed image supposed to be a degraded version of f. Assuming zero-mean Gaussian additive errors:
$$P(g \mid f, C_b, H) = \frac{1}{(2\pi)^{M/2}\prod_{m=1}^{M}\sigma_m}\, \exp\!\left(-\sum_{m=1}^{M}\frac{\left(g_m-\sum_{n=1}^{N}R_{mn}f_n\right)^2}{2\sigma_m^2}\right) \qquad (7)$$

where

$$E_b(g \mid f, H) = \frac{b^T b}{2\sigma_b^2} = \frac{1}{2\sigma_b^2}\sum_{m=1}^{M}\left(g_m-\sum_{n=1}^{N}R_{mn}f_n\right)^2$$

is the error function, and Z_b is the noise partition function.
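The Gaussian likelihood (7) and its error function can be evaluated directly. A NumPy sketch with per-pixel noise deviations matching the exponent of (7) (function and variable names are illustrative assumptions):

```python
import numpy as np

def error_and_likelihood(g, R, f, sigma):
    """For data g, response matrix R, image f and per-pixel noise
    deviations sigma_m, compute the error function
    E_b = (1/2) sum_m (g_m - sum_n R_mn f_n)^2 / sigma_m^2
    and the Gaussian likelihood exp(-E_b) / Z_b, with the noise
    partition function Z_b = (2*pi)^(M/2) * prod_m sigma_m."""
    g, f, sigma = map(np.asarray, (g, f, sigma))
    residual = g - R @ f                     # b = g - R f
    Eb = 0.5 * np.sum(residual ** 2 / sigma ** 2)
    Zb = (2 * np.pi) ** (len(g) / 2) * np.prod(sigma)
    return Eb, np.exp(-Eb) / Zb
```

With a perfect fit (g = R f) the error function vanishes and the likelihood equals 1/Z_b, its maximum possible value.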
The full joint posterior P(f, θ | g, H) of the image f and the unknown PSF parameters, denoted generically by θ, should be evaluated. Then the required inference about the posterior probability P(f | g, H) is obtained as a marginal integral of this joint posterior over the uncertainties in the PSF:

$$P(f \mid g, H) = \int P(f, \theta \mid g, H)\, d\theta = \int P(f \mid \theta, g, H)\, P(\theta \mid g, H)\, d\theta \qquad (8)$$

Now applying Bayes' theorem for the parameters θ:

$$P(\theta \mid g, H) = \frac{P(g \mid \theta, H)\, P(\theta \mid H)}{P(g \mid H)} \qquad (9)$$

and substituting in (8):

$$\int P(f, \theta \mid g, H)\, d\theta \propto \int P(f \mid \theta, g, H)\, P(g \mid \theta, H)\, P(\theta \mid H)\, d\theta \qquad (10)$$

More complex models use the intrinsic correlation function C = G G^T, where G is a convolution from an imaginary hidden image, which is uncorrelated, to the real correlated image.

If the prior probability of the image f is also Gaussian:

$$P(f \mid F_0, H) = \frac{1}{(2\pi)^{N/2}\det^{1/2}F_0}\, \exp\!\left(-\tfrac{1}{2}\, f^T F_0^{-1} f\right) \qquad (12)$$

where F_0 is the prior covariance matrix of f, and assuming a uniform standard deviation σ_f of the image, then its prior probability distribution becomes:

$$P(f \mid \alpha, H) = \frac{1}{Z_f(\alpha)}\, \exp\!\left(-\alpha\, E_f(f \mid F_0)\right) \qquad (13)$$

where the parameter α = 1/σ_f² measures the expected smoothness of f, and Z_f(α) = (2π/α)^{N/2} is the partition
function of f, with

$$E_f(f \mid F_0) = \tfrac{1}{2}\, f^T F_0^{-1} f.$$

The posterior probability of image f given data g is derived from Bayes' theorem:

$$P(f \mid g, \alpha, \beta, H) = \frac{P(g \mid f, \beta, H)\, P(f \mid \alpha, H)}{P(g \mid \alpha, \beta, H)} \qquad (14)$$

where the evidence P(g | α, β, H) is the normalizing factor. Since the denominator in (14) is a product of Gaussian functions of f, we may rewrite:

$$P(f \mid g, \alpha, \beta, H) = \frac{\exp\!\left(-\alpha E_f - \beta E_b\right)}{Z_M(\alpha, \beta)} = \frac{\exp\!\left(-M(f)\right)}{Z_M(\alpha, \beta)} \qquad (15)$$

where M(f) = αE_f + βE_b and Z_M(α, β) = ∫_f exp(−M(f)) df, with the integral covering the space of all admissible images in the partition function. Therefore, minimizing M(f) maximizes the posterior, and the solution follows from setting its derivative to zero:

$$f_{MP} = \left(R^T R + \frac{\sigma_b^2}{\sigma_f^2}\, C\right)^{-1} R^T g \qquad (16)$$

The term (σ_b²/σ_f²) C regularizes the ill-conditioned inversion. When this term is negligible, the optimal linear filter (R^T R + (σ_b²/σ_f²) C)^{-1} R^T equates to the pseudoinverse R^{-1} = (R^T R)^{-1} R^T.

Entropic Prior of Images

Invoking the ME principle requires the prior knowledge to be stated as a set of constraints on f, though affecting the amount by which the image reconstruction is offset from reality. The prior information about f may be expressed as a probability distribution (Djafari, 1995):

$$P(f \mid \alpha, H) = \frac{1}{Z(\alpha)}\, \exp\!\left(-\alpha\, \Phi(f)\right) \qquad (17)$$

where α is generally a positive parameter and Z(α) is the normalizing factor. The entropic prior in the discrete case may correspond to potential functions like:

$$\Phi(f) = \sum_{n=1}^{N} f_n \ln\frac{f_n}{U} \qquad (18)$$

or related alternative expressions (19) reviewed by Djafari (1995).

An estimation rule, such as posterior mean or maximum a posteriori (MAP), is needed in order to choose an optimal, unique, and stable solution f̃ for the estimated image. The posterior probability is assumed to summarize the full state of knowledge on a given scene. Producing a single image as the best restoration naturally leads to the most likely one, which maximizes the posterior probability P(f | g, α, C, H),
along with some statement of reliability derived from the spread of all admissible images.

In variational problems with linear constraints, Agmon et al. (1979) showed that the potential function associated with a positive, additive image is always concave for any set of Lagrange multipliers, and it possesses a unique minimum which coincides with the solution of the nonlinear system of constraints. As a prerequisite, the linear independence of the constraints is checked, and then the necessary and sufficient conditions for a feasible solution are formulated. Wilczek and Drapatz (1985) suggested the Newton-Raphson iteration method as offering high-accuracy results. Ortega and Rheinboldt (1970) adopted a continuation technique for the very few cases where Newton's method fails to converge. These techniques are nevertheless successful in practice for relatively small data sets only, and assume a symmetric positive definite Hessian matrix of the potential function.

Quality Assessment of Image Restoration

In all digital imaging systems, quality degradation is inevitable due to various sources like photon shot noise, finite acquisition time, readout noise, dark current noise, and quantization noise. Some noise sources can be effectively suppressed, yet some cannot. The combined effect of these degradation sources is often modeled by Gaussian additive noise (Pham et al., 2005).

In order to quantitatively estimate the restoration quality in the case of similar size (M = N) for both the measured data g and the restored image f̃, the mean energy of restoration error may be used, together with a metric called blurred signal-to-noise ratio, redefined here by using the noise variance in each pixel:

$$\mathrm{BSNR} = 10 \lg \left( \frac{1}{N} \sum_{n=1}^{N} \frac{\left[ y_n - \bar{y} \right]^2}{\sigma_n^2} \right) \qquad (21)$$

where y = g − b is the difference between the measured data g and the noise b.

In simulations, where the original image f of the measured data g is available, the objectivity of testing the performance of image restoration algorithms may be assessed by the improvement-of-signal-to-noise-ratio metric defined as:

$$\mathrm{ISNR} = 10 \lg \frac{\sum_{n=1}^{N} \left[ f_n - g_n \right]^2}{\sum_{n=1}^{N} \left[ f_n - \tilde{f}_n \right]^2} \qquad (22)$$

where f̃ is the best statistical estimation of the correct solution f.

While mean-squared-error metrics like ISNR do not always reflect the perceptual properties of the human visual system, they may provide an objective standard by which to compare different image processing techniques. Nevertheless, it is of major significance that the behavior of various algorithms be analyzed from the point of view of ringing and noise amplification, which can be a key indicator of improvement in quality for subjective comparisons of restoration algorithms (Banham and Katsaggelos, 1997).

FUTURE TRENDS
An efficient MAP procedure has to be implemented in a recursive, supervised-trained neural net to restore (reconstruct) the best image in compliance with the existing constraints and with the measuring and modeling errors.

CONCLUSION

A major intrinsic difficulty in Bayesian image restoration resides in the determination of a prior law for images. The ME principle solves this problem in a self-consistent way. The ME model for image deconvolution enforces the restored image to be positive. The spurious negative areas and complementary spurious positive areas are wiped off, and the dynamic range of the restored image is substantially enhanced.

Image restoration based on image entropy is effective even in the presence of significant noise and missing or corrupted data. This is due to the appropriate regularization of the inverse problem of image restoration, introduced in a coherent way by the ME principle. It satisfies all consistency requirements when combining the prior knowledge and the information contained in experimental data. A major result is that no artifacts are added, since no structure is enforced by entropic priors.

The Bayesian ME approach is a statistical method which operates directly in the spatial domain, thus eliminating the inherent errors coming from direct and inverse numerical Fourier transformations and from the truncation of signals.

REFERENCES

Agmon, N., Alhassid, Y., & Levine, R. D. (1979). An algorithm for finding the distribution of maximal entropy. Journal of Computational Physics, 30, 250-258.

Banham, M. R. & Katsaggelos, A. K. (1997, March). Digital image restoration. IEEE Signal Processing Magazine, 24-41.

Berger, J. (1985). Statistical decision theory and Bayesian analysis. Springer-Verlag.

Candes, E. J. (1998). Ridgelets: Theory and applications. PhD Thesis, Department of Statistics, Stanford University.

Daniell, G. J. (1994). Of maps and monkeys: An introduction to the maximum entropy method. In B. Buck & V. A. Macaulay (Eds.), Maximum entropy in action (pp. 1-18). Oxford: Clarendon Press.

Djafari, A. M. (1995). A full Bayesian approach for inverse problems. In K. M. Hanson & R. N. Silver (Eds.), Maximum entropy and Bayesian methods (pp. 135-144).

Donoho, D. L. (1996). Unconditional bases and bit-level compression. Applied and Computational Harmonic Analysis, 1(1), 100-105.

Gull, S. F. & Skilling, J. (1985). The entropy of an image. In C. R. Smith & W. T. Grandy Jr. (Eds.), Maximum entropy and Bayesian methods in inverse problems (pp. 287-302). Dordrecht: Kluwer Academic Publishers.

MacKay, D. J. C. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation, 4, 448-472.

MacKay, D. J. C. (2003). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.

Mallat, S. (2000). Une exploration des signaux en ondelettes. Éditions de l'École Polytechnique.

Mayers, K. J. & Hanson, K. M. (1990). Comparison of the algebraic reconstruction technique with the maximum entropy reconstruction technique for a variety of detection tasks. Proceedings of SPIE, 1231, 176-187.

Meyer, Y. (1993). Review of "An introduction to wavelets" and "Ten lectures on wavelets". Bulletin of the American Mathematical Society, 28, 350-359.

Mutihac, R., Colavita, A. A., Cicuttin, A., & Cerdeira, A. E. (1997). Bayesian modeling of feed-forward neural networks. Fuzzy Systems & Artificial Intelligence, 6(1-3), 31-40.

Neal, R. M. (1994). Priors for infinite networks. Technical Report CRG-TR-94-1, Department of Computer Science, University of Toronto.

Neal, R. M. (1996). Bayesian learning for neural networks. Lecture Notes in Statistics, 118. New York: Springer-Verlag.

Ortega, J. M. & Rheinboldt, W. C. (1970). Iterative solution of nonlinear equations in several variables. New York: Academic Press.
Pham, T. Q., van Vliet, L. J., & Schutte K. (2005). Deconvolution: An algorithmic method for elimi-
Influence of SNR and PSF on limits of super-resolu- nating noise and improving the resolution of digital
tion. Proceedings of SPIE-IS&T Electronic Imaging, data by reversing the effects of convolution on recorded
5672, 169-180. data.
Skilling, J. (1989). Classic maximum entropy. In J. Skill- Digital Image: A representation of a 2D/3D image
ing (Ed.), Maximum entropy and Bayesian methods (pp. as a finite set of digital values called pixels/voxels
45-52), Dordrecht: Kluwer Academic Publishers. typically stored in computer memory as a raster image
or raster map.
Wilczek, R. & Drapatz, S. (1985). A high accuracy
algorithm for maximum entropy image restoration in Entropy: A measure of the uncertainty associated
the case of small data sets. Astronomy and Astrophys- with a random variable. Entropy quantifies information
ics, 142, 9-12. in a piece of data.
Image Restoration: A blurred image can be signifi-
KEy TERMS cantly improved by deconvolving its PSF in such a way
that the result is a sharper and more detailed image.
Artificial Neural Networks (ANNs): Highly
Point Spread Function (PSF): The output of the
parallel nets of interconnected simple computational
imaging system for an input point source.
elements, which perform elementary operations like
summing the incoming inputs (afferent signals) and Probabilistic Inference: An effective approach to
amplifying/thresholding the sum. approximate reasoning and empirical learning in AI.
Bayesian Inference: An approach to statistics in
which all forms of uncertainty are expressed in terms
of probability.
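As a small numerical illustration of the Entropy key term above, the following sketch computes the Shannon entropy of a discrete distribution; the helper name is illustrative and not part of the article.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits.
    Terms with p == 0 contribute nothing (0 * log 0 is taken as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform distribution over 4 outcomes is maximally uncertain: 2 bits.
# A deterministic outcome carries no uncertainty: 0 bits.
```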
Behaviour-Based Clustering of Neural Networks

Salvador España-Boquera
Universidad Politécnica de Valencia, Spain

Francisco Zamora-Martínez
Universidad Politécnica de Valencia, Spain
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

The field of off-line optical character recognition (OCR) has been a topic of intensive research for many years (Bozinovic, 1989; Bunke, 2003; Plamondon, 2000; Toselli, 2004). One of the first steps in the classical architecture of a text recognizer is preprocessing, where noise reduction and normalization take place. Many systems do not require a binarization step, so the images are maintained in gray-level quality. Document enhancement not only influences the overall performance of OCR systems, but it can also significantly improve document readability for human readers. In many cases, the noise of document images is heterogeneous, and a technique fitted for one type of noise may not be valid for the overall set of documents. One possible solution to this problem is to use several filters or techniques and to provide a classifier to select the appropriate one.

Neural networks have been used for document enhancement (see (Egmont-Petersen, 2002) for a review of image processing with neural networks). One advantage of neural network filters for image enhancement and denoising is that a different neural filter can be automatically trained for each type of noise.

This work proposes the clustering of neural network filters to avoid having to label training data and to reduce the number of filters needed by the enhancement system. An agglomerative hierarchical clustering algorithm of supervised classifiers is proposed to do this. The technique has been applied to filter out background noise typical of an office (coffee stains and footprints on documents, folded sheets with degraded printed text, etc.).

BACKGROUND

Multilayer Perceptrons (MLPs) have been used in previous works for image restoration: the input to the MLP is the pixels in a moving window, and the output is the restored value of the current pixel (Egmont-Petersen, 2000; Hidalgo, 2005; Stubberud, 1995; Suzuki, 2003). We have also used neural network filters to estimate the gray level of one pixel at a time (Hidalgo, 2005): the input to the MLP consisted of a square of pixels that was centered at the pixel to be cleaned, and there were four output units to gain resolution (see Figure 1). Given a set of noisy images and their corresponding clean counterparts, a neural network was trained. With the trained network, the entire image was cleaned by scanning all the pixels with the MLP. The MLP, therefore, functions like a nonlinear convolution kernel. The universal approximation property of an MLP guarantees the capability of the neural network to approximate any continuous mapping (Bishop, 1996).

This approach clearly outperforms other classic spatial filters for reducing or eliminating noise from images (the mean filter, the median filter, and the closing/opening filter (Gonzalez, 1993)) when applied to enhance and clean a homogeneous background noise (Hidalgo, 2005).

BEHAVIOUR-BASED CLUSTERING OF NEURAL NETWORKS

Agglomerative Hierarchical Clustering

Agglomerative hierarchical clustering is considered to be a more convenient approach than other clustering
algorithms, mainly because it makes very few assumptions about the data (Jain, 1999; Mollineda, 2000). Instead of looking for a single partition (based on finding a local minimum), this clustering algorithm constructs a hierarchical structure by iteratively merging clusters according to a certain dissimilarity measure, starting from singletons until no further merging is possible (one general cluster). The hierarchical clustering process can be illustrated with a tree called a dendrogram, which shows how the samples are merged and the degree of dissimilarity of each union (see Figure 2). The dendrogram can be easily broken at a given level to obtain clusters of the desired cardinality or with a specific dissimilarity measure. A general hierarchical clustering algorithm can be informally described as follows: starting from singleton clusters, repeat these steps:

a) Determine the closest pair of clusters i and j.
b) Merge the two closest clusters into a new cluster i+j.
c) Update the dissimilarity distances from the new cluster i+j to all the other clusters.
d) If more than one cluster remains, go to step a).

Finally, select the number N of clusters for a given criterion.

Figure 1. An example of document enhancement with an artificial neural network. A cleaned image (right) is obtained by scanning the entire noisy image (left) with the neural network.

Behaviour-Based Clustering of Supervised Classifiers

When the points of the set to be clustered are supervised classifiers, both a dissimilarity distance and the way to merge two classifiers must be defined (see Figure 2).
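The clustering loop and a behaviour-based dissimilarity (the fraction of validation samples on which two classifiers disagree, following the article's idea of comparing classifiers by their behaviour on a validation set) can be sketched as follows. This is a minimal sketch, not the authors' implementation: a toy nearest-class-mean learner stands in for the trained MLP filters, and all names are illustrative.

```python
import numpy as np

def disagreement(pred_a, pred_b):
    """Behaviour-based dissimilarity: fraction of validation samples
    on which two classifiers produce different outputs."""
    return float(np.mean(pred_a != pred_b))

def train(data):
    """Toy stand-in for training an MLP filter: a nearest-class-mean
    classifier over 1-D inputs (purely illustrative)."""
    m0 = np.mean([x for x, y in data if y == 0])
    m1 = np.mean([x for x, y in data if y == 1])
    return lambda X: (np.abs(X - m1) < np.abs(X - m0)).astype(int)

def behaviour_clustering(datasets, X_val, n_clusters):
    """Agglomerative clustering of classifiers: start with one singleton
    cluster per training set, repeatedly merge the two clusters whose
    classifiers behave most alike on the validation data, and retrain
    a new classifier on the joint training data of the merged pair."""
    clusters = [(list(d), train(d)) for d in datasets]
    while len(clusters) > n_clusters:
        # a) determine the closest pair of clusters
        pairs = [(disagreement(ci[1](X_val), cj[1](X_val)), i, j)
                 for i, ci in enumerate(clusters)
                 for j, cj in enumerate(clusters) if i < j]
        _, i, j = min(pairs)
        # b)-c) merge the pair and retrain on the joint data; distances
        # to the new cluster are recomputed on the next iteration
        merged = clusters[i][0] + clusters[j][0]
        clusters = ([c for k, c in enumerate(clusters) if k not in (i, j)]
                    + [(merged, train(merged))])
    return clusters
```

With three toy training sets, two of which induce the same behaviour, clustering down to two clusters merges exactly the two look-alike classifiers.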
Figure 2. Behaviour-based clustering of supervised classifiers. An example of the dendrogram obtained for M=5 points: A, B, C, D, E. If N=3, three clusters are selected: A+B, C, D+E. In this work, to merge two clusters, a new classifier is trained. For example, cluster D+E is trained with the data used to train the classifiers D and E.

To merge two classifiers, a new classifier can be trained with the training data of both clusters. Another possibility is to build an ensemble of the two classifiers.

An Application of Behaviour-Based Clustering of MLPs to Document Enhancement

In this work, MLPs are used as supervised classifiers. When two clusters are merged, a new MLP is trained with the associated training data of the two merged MLPs.

This behaviour-based clustering algorithm has been applied to enhance printed documents with typical office noise (folded sheets, wrinkled sheets, coffee stains, etc.). Figure 1 shows an example of a noisy printed document (wrinkled sheet) from the corpus.

A set of MLPs is trained as neural filters for different types of noise and then clustered into groups to obtain a reduced set of neural clustered filters. In order to automatically determine which clustered filter is the most suitable to clean and enhance a real noisy image, an image classifier is also trained using MLPs. Experimental results using this enhancement system show excellent results in cleaning noisy documents (Zamora-Martínez, 2007).

FUTURE TRENDS

Document enhancement is becoming more and more relevant due to the huge amount of scanned documents. Besides, it not only influences the overall performance of OCR systems, but it can also significantly improve document readability for human readers.

The method proposed in this work can be improved twofold: by using ensembles of MLPs when two MLPs are merged, and by improving the method to select the neural clustered filter that is the most suitable to enhance a given noisy image.

CONCLUSION

An agglomerative hierarchical clustering of supervised-learning classifiers that uses a measure of similarity among classifiers based on their behaviour on a validation dataset has been proposed. As an application of this clustering procedure, we have designed an enhancement system for document images using neural network filters. Both objective and subjective evaluations of the cleaning method show excellent results in cleaning noisy documents. This method could also be used to clean and restore other types of images, such as noisy backgrounds in scanned documents, stained paper of historical documents, vehicle license recognition, etc.

REFERENCES

Bishop, C.M. (1996). Neural Networks for Pattern Recognition. Oxford University Press.

Bozinovic, R.M., & Srihari, S.N. (1989). Off-Line Cursive Script Word Recognition. IEEE Trans. on PAMI, 11(1), 68-83.
Bunke, H. (2003). Recognition of Cursive Roman Handwriting: Past, Present and Future. In: Proc. ICDAR, 448-461.

Egmont-Petersen, M., de Ridder, D., & Handels, H. (2002). Image processing with neural networks: a review. Pattern Recognition, 35(10), 2279-2301.

Gonzalez, R., & Woods, R. (1993). Digital Image Processing. Addison-Wesley Pub. Co.

Hidalgo, J.L., España, S., Castro, M.J., & Pérez, J.A. (2005). Enhancement and cleaning of handwritten data by using neural networks. In: Pattern Recognition and Image Analysis. Volume 3522 of LNCS. Springer-Verlag, 376-383.

Jain, A.K., Murty, M.N., & Flynn, P.J. (1999). Data clustering: a review. ACM Comput. Surv., 31(3), 264-323.

Kanungo, T., & Zheng, Q. (2004). Estimating Degradation Model Parameters Using Neighborhood Pattern Distributions: An Optimization Approach. IEEE Trans. on PAMI, 26(4), 520-524.

Mollineda, R.A., & Vidal, E. (2000). A relative approach to hierarchical clustering. In: Pattern Recognition and Applications. Volume 56. IOS Press, 19-28.

Plamondon, R., & Srihari, S.N. (2000). On-line and off-line handwriting recognition: A comprehensive survey. IEEE Trans. on PAMI, 22(1), 63-84.

Stubberud, P., Kanai, J., & Kalluri, V. (1995). Adaptive Image Restoration of Text Images that Contain Touching or Broken Characters. In: Proc. ICDAR. Volume 2, 778-781.

Suzuki, K., Horiba, I., & Sugie, N. (2003). Neural Edge Enhancer for Supervised Edge Enhancement from Noisy Images. IEEE Trans. on PAMI, 25(12), 1582-1596.

Toselli, A.H., Juan, A., González, J., Salvador, I., Vidal, E., Casacuberta, F., Keysers, D., & Ney, H. (2004). Integrated Handwriting Recognition and Interpretation using Finite-State Models. Int. Journal of Pattern Recognition and Artificial Intelligence, 18(4), 519-539.

Zamora-Martínez, F., España-Boquera, S., & Castro-Bleda, M.J. (2007). Behaviour-based Clustering of Neural Networks applied to Document Enhancement. In: Computational and Ambient Intelligence. Volume 4507 of LNCS. Springer-Verlag, 144-151.

http://en.wikipedia.org

KEY TERMS

Artificial Neural Network: An artificial neural network (ANN), often just called a neural network (NN), is an interconnected group of artificial neurons that uses a mathematical model or computational model for information processing based on a connectionist approach to computation.

Backpropagation Algorithm: A supervised learning technique used for training artificial neural networks. It was first described by Paul Werbos in 1974, and further developed by David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams in 1986. It is most useful for feed-forward networks (networks that have no feedback, or simply, that have no connections that loop).

Clustering: The classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. Data clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.

Document Enhancement: Accentuation of certain desired features, which may facilitate later processing steps such as segmentation or object recognition.

Hierarchical Agglomerative Clustering: Hierarchical clustering algorithms find successive clusters using previously established clusters. Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.

Multilayer Perceptron (MLP): This class of artificial neural networks consists of multiple layers of computational units, usually interconnected in a feed-forward way. Each neuron in one layer has directed connections to the neurons of the subsequent layer. In many applications the units of these networks apply a sigmoid function as an activation function.
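The neural filtering scheme at the heart of this article (an MLP scanned over every pixel of a noisy image, acting as a nonlinear convolution kernel) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fixed-weight network stands in for an MLP trained on noisy/clean image pairs, it uses a single output unit rather than the four output units used in the article, and all names are hypothetical.

```python
import numpy as np

def clean_image(noisy, mlp_forward, w=5):
    """Scan an MLP over every pixel: the input is the w x w window of
    gray levels centered at the pixel, the output is its restored value.
    (The article's network used four output units to gain resolution;
    a single output unit is used here for simplicity.)"""
    r = w // 2
    padded = np.pad(noisy, r, mode="edge")   # replicate borders
    out = np.empty_like(noisy, dtype=float)
    for i in range(noisy.shape[0]):
        for j in range(noisy.shape[1]):
            window = padded[i:i + w, j:j + w].ravel()
            out[i, j] = mlp_forward(window)
    return out

def make_mlp(w1, b1, w2, b2):
    """A fixed-weight one-hidden-layer MLP with sigmoid units; the
    weights would normally come from training on noisy/clean pairs."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return lambda x: float(sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2))
```

Because the same network is applied at every pixel position, the trained MLP behaves exactly like the nonlinear convolution kernel described in the BACKGROUND section.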
Bio-Inspired Algorithms in Bioinformatics I
INTRODUCTION

Large worldwide projects like the Human Genome Project, which in 2003 successfully concluded the sequencing of the human genome, and the recently completed HapMap Project, have opened new perspectives in the study of complex multigene illnesses: they have provided us with new information to tackle the complex mechanisms and relationships between genes and environmental factors that generate complex illnesses (Lopez, 2004; Dominguez, 2006).

Thanks to these new genomic and proteomic data, it becomes increasingly possible to develop new medicines and therapies, establish early diagnoses, and even discover new solutions for old problems. These tasks however inevitably require the analysis, filtering, and comparison of a large amount of data generated in a laboratory with an enormous amount of data stored in public databases, such as the NCBI and the EBI.

Computer science equips biomedicine with an environment that simplifies our understanding of the biological processes that take place in each and every organizational level of living matter (molecular level, genetic level, cell, tissue, organ, individual, and population) and the intrinsic relationships between them.

Bioinformatics can be described as the application of computational methods to biological discoveries (Baldi, 1998). It is a multidisciplinary area that includes computer science, biology, chemistry, mathematics, and statistics. The three main tasks of bioinformatics are the following: develop algorithms and mathematical models to test the relationships between the members of large biological datasets, analyze and interpret heterogeneous data types, and implement tools that allow the storage, retrieval, and management of large amounts of biological data.

BACKGROUND

The following section describes some of the problems that are most commonly found in bioinformatics.

Interpretation of Gene Expression

Gene expression is the process by which the codified information of a gene is transformed into the proteins necessary for the development and functioning of the cell. In the course of this process, small sequences of RNA, called messenger RNA, are formed by transcription and subsequently translated into proteins.

The amount of expressed mRNA can be measured with various methods, such as gel electrophoresis, but large numbers of simultaneous expression analyses are usually carried out with microarrays (Quackenbush, 2001), which make it possible to obtain the simultaneous expression of tens of thousands of genes; such an amount of data can only be analyzed with computational help.

Among the most common tasks in this type of analysis is finding the differences between, for instance, a patient and a control: a test that determines whether a gene is expressed or not. These tasks can be divided into classical problems of classification and clustering. Clustering is used in microarray experiments not only to identify groups of genes with similar expressions, but also to suggest functional relationships between the members of the cluster.

Alignment of DNA, RNA, and Protein Sequences

Sequence alignment consists of superposing two or more sequences of nucleotides (DNA and RNA) or amino acids (proteins) in order to compare them and analyze the sequence parts that are alike and unalike.
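A common way to score such an alignment is dynamic programming in the style of the Smith-Waterman algorithm (Smith, 1981). The following is a minimal sketch computing only the best local alignment score, without traceback, and with illustrative match/mismatch/gap parameters.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score by dynamic programming: H[i][j] is the best
    score of an alignment ending at a[i-1] and b[j-1]; negative scores
    are clipped to zero, which is what makes the alignment local."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match/mismatch
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best
```

The quadratic table is what makes the exhaustive exploration of all comparisons tractable for a pair of sequences, and also why the cost grows so quickly for the multiple-alignment problem discussed next.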
The optimal alignment is the one that mainly shows correspondences between the nucleotides or amino acids, and it is therefore said to have the highest score. This alignment may or may not have a biological meaning. There are two types of alignment: global alignment, which maximizes the number of coincidences in the entire sequence, and local alignment, which looks for similar regions in large sequences that are normally highly divergent. The most commonly used technique to implement alignments is dynamic programming by means of the Smith-Waterman algorithm (Smith, 1981), which explores all the possible comparisons in the sequences.

Another problem in sequence alignment is multiple alignment (Wallace, 2005), which consists of aligning three or more sequences of DNA, RNA, or proteins, and is generally used to search for evolutionary relationships between these sequences. The problem is equivalent to that of simple sequence alignment, but takes into consideration the n sequences that are to be compared. The complexity of the algorithm increases exponentially with the number of sequences to compare.

Identification of the Gene Regulatory Network

All the information of a living organism's genome is stored in each and every one of its cells. Whereas the genome is used to synthesize information on all the body cells, the regulatory network is in charge of guiding the expression of a given set of genes in one cell rather than another so as to form certain types of cells (cellular differentiation) or carry out specific functions related to spatial and temporal localization; in other words, it makes the genes express themselves when and where necessary. The role of a gene regulatory network therefore consists in integrating the dynamic behaviour of the cell and the external signals with the environment of the cell, and in guiding the interaction of all the cells so as to control the process of cellular differentiation (Geard, 2004). Inferring this regulatory network from the cellular expression data is considered to be one of the most complex problems in bioinformatics (Akutsu, 1999).

Construction of Phylogenetic Trees

A phylogenetic tree (Setubal, 1999) is a tree that shows the evolutionary relationships between various species of individuals that are believed to have common descent. Whereas traditionally morphological characteristics are used to carry out such analyses, in the present case we will study molecular phylogenetic trees, which use sequences of nucleotides or amino acids for classification. The construction of these trees is initially based on algorithms for multiple sequence alignment, which allow us to classify the evolutionary relationships between homologue genes present in various species. In a second phase, we must calculate the genetic distance between each pair of sequences in order to represent them correctly in the tree.

Gene Finding and Mapping

Gene finding (Fickett, 1996) basically consists of identifying genes in a DNA chain by recognizing the sequence that initiates the codification of the gene, or gene promoter. When the protein that will interpret the gene finds the sequence of that promoter, we know that the next step is the recognition of the gene.

Gene mapping (Setubal, 1999) consists of creating a genetic map by assigning genes to a position inside the chromosome and by indicating the relative distance between them. There are two types of mapping. Physical or cytogenetic mapping, on the one hand, consists in dividing the chromosome into small labelled fragments. Once divided, they must be ordered and situated in their correct position in the chromosome. Linkage mapping, on the other hand, shows the position of some genes with respect to others. The latter mapping type has two inconveniences: it does not provide the distance between the genes, and it is unable to provide the correct order if the genes are very close to each other.

Prediction of DNA, RNA, and Protein Structure

The DNA and RNA sequences are folded into a tridimensional structure that is determined by the order of the nucleotides within the sequence. Under the same environmental conditions, the tridimensional structure of these sequences implies a diverging behaviour. Since the secondary structure of the nucleic acids is a factor that affects the binding of both DNA molecules and RNA molecules, it is essential to know these structures in order to analyze a sequence.

The prediction of the folds that determine the RNA structure is an important factor in the understanding of
many biological processes, such as the translation of messenger RNA, the replication of RNA chains in viruses, and the function of structural RNA and RNA/protein complexes.

The tridimensional structure of proteins is extremely diverse, going from completely fibrous to globular. Predicting the folds of proteins is important, because a protein's structure is closely related to its function. The experimental determination of the protein structure as such helps us to find the protein's function and allows us to design synthetic proteins that can be used as medicines.

BIO-INSPIRED ALGORITHMS

The basic principle of bio-inspired algorithms is to use analogies with natural systems in order to solve problems. By simulating the behaviour of natural systems, these algorithms provide heuristic, non-deterministic methods for searching, learning, behaviour, etc. (Forbes, 2004).

Artificial Neural Networks

Artificial neural networks (ANNs) (McCulloch, 1943)(Hertz, 1991)(Bishop, 1995)(Rumelhart, 1986) are computational models inspired by the behaviour of the nervous system. Even though their development is based on the modelling of biological processes in the brain, there are considerable differences between the processing elements of ANNs and actual neurons.

ANNs consist of networks of interconnected units, organized in layers, that evolve in the course of time. The main features of these systems are the following. Self-organization and adaptability: they allow robust and adaptive processing, adaptive training, and self-organizing networks. Non-linear processing: it increases the network's capacity to approximate, classify, and be immune to noise. Parallel processing: they use a large number of processing units with a high level of interconnectivity.

ANNs can be classified according to their learning type. Supervised learning neural networks: the network learns relationships between the input and output data. The input data are passed on to the input layer and propagate through the network architecture until they reach the output layer. The output obtained in this output layer is compared to the expected output, and subsequently the weights of the interconnections are modified so as to minimize the error between the obtained and the expected output. Non-supervised learning networks: in this type of learning, no expected output is passed on to the network; the network itself searches for the differences between the inputs and separates the data accordingly.

Evolutionary Computation

Evolutionary computation (Rechenberg, 1971)(Holland, 1975) is a technique inspired by evolutionary biological strategies: genetic algorithms, for example, use biological techniques of cross-over, mutation, and selection to solve searching and optimization problems. Each of these operators has an impact on one or more chromosomes, i.e. possible solutions to the problem, and generates another series of chromosomes, i.e. the following generation of solutions. The algorithm is executed iteratively and as such takes the population through the generations until it finds an optimal solution. Another strategy of evolutionary computation is genetic programming (Koza, 1990), which uses the same operators as the genetic algorithms to develop the optimal program to solve a problem.

Swarm Intelligence

Swarm intelligence (Beni, 1989)(Bonabeau, 2001)(Engelbrecht, 2005) is a recent family of bio-inspired techniques based on the social or collective behaviour of groups such as ants, bees, etc.: insects which have very limited capacities as individuals, but form groups to carry out complex tasks.

Artificial Immune Systems

Artificial immune systems (Farmer, 1986)(Dasgupta, 1999) are a new computational paradigm that has appeared in recent years and is based on the immune system of vertebrates. The biological immune system is a parallel and distributed adaptive system that uses learning, memory, and associative retrieval to solve problems of recognition and classification. It particularly learns to recognize patterns, remember them, and use their combinations to build efficient pattern detectors. From the point of view of information processing, these interesting features are used
in artificial immune systems to successfully solve complex problems.

CONCLUSION

This article describes the main problems that are presently found in the field of bioinformatics. It also presents some of the bio-inspired computation techniques that provide solutions for problems related to classification, clustering, minimization, modelling, etc. The following article will describe a series of techniques that allow researchers to solve the above problems with bio-inspired models.

ACKNOWLEDGMENT

This work was partially supported by the Spanish Ministry of Education and Culture (Ref TIN2006-13274) and the European Regional Development Funds (ERDF), grant (Ref. PIO61524) funded by the Carlos III Health Institute, grant (Ref. PGIDIT 05 SIN 10501PR) from the General Directorate of Research of the Xunta de Galicia, and grants (File 2006/60, 2007/127 and 2007/144) from the General Directorate of Scientific and Technologic Promotion of the Galician University System of the Xunta de Galicia.

REFERENCES

Bonabeau, E., Dorigo, M., & Theraulaz, G. (2001). Swarm intelligence: From natural to artificial systems. Journal of Artificial Societies and Social Simulation, 4(1).

Dasgupta, D. (1999). Artificial immune systems and their applications. Berlin: Springer-Verlag.

Domínguez, E., Loza, M.I., Padín, J.F., Gesteira, A., Paz, E., Páramo, M., Brenlla, J., Pumar, E., Iglesias, F., Cibeira, A., Castro, M., Caruncho, H., Carracedo, A., & Costas, J. (2006). Extensive linkage disequilibrium mapping at HTR2A and DRD3 for schizophrenia susceptibility genes in the Galician population. Schizophrenia Research, 2006.

Engelbrecht, A.P. (2005). Fundamentals of computational swarm intelligence. Wiley.

Farmer, J.D., Packard, N.H., & Perelson, A.S. (1986). The immune system, adaptation and machine learning. Physica D, 22, 187-204.

Fickett, J.W. (1996). Finding genes by computer: The state of the art. Trends in Genetics, 12(8), 316-320.

Forbes, N. (2004). Imitation of Life: How Biology Is Inspiring Computing. MIT Press.

Geard, N. (2004). Modelling Gene Regulatory Networks: Systems Biology to Complex Systems. ACCS Draft Technical Report. ITEE, University of Queensland.

Holland, J. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.
Mullis, K. (1990). The unusual origin of the polymerase chain reaction. Scientific American, 262(4), 56-61.

Quackenbush, J. (2001). Computational analysis of microarray data. Nature Reviews Genetics, 2, 418-427.

Rechenberg, I. (1973). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. PhD Thesis.

Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.

Setubal, J., & Meidanis, J. (1999). Introduction to Computational Molecular Biology. PWS Publishing.

Smith, T.F., & Waterman, M.S. (1981). Identification of common molecular subsequences. Journal of Molecular Biology, 147, 195-197.

Wallace, I.M., Blackshields, G., & Higgins, D.G. (2005). Multiple sequence alignments. Current Opinion in Structural Biology, 15(3), 231-267.

KEY TERMS

Electrophoresis: The use of an external electric field to separate large biomolecules on the basis of their charge by running them through an acrylamide or agarose gel.

Messenger RNA: The complementary copy of DNA formed from a single-stranded DNA template during transcription that migrates from the nucleus to the cytoplasm, where it is processed into a sequence carrying the information to code for a polypeptide domain.

Microarray: A 2D array, typically on a glass, filter, or silicon wafer, upon which genes or gene fragments are deposited or synthesized in a predetermined spatial order, allowing them to be made available as probes in a high-throughput, parallel manner.

Nucleotide: A nucleic acid unit composed of a five-carbon sugar joined to a phosphate group and a nitrogen base.

Swarm Intelligence: An artificial intelligence technique based on the study of collective behaviour in decentralised, self-organised systems.
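The evolutionary-computation loop described in this article (selection, cross-over, and mutation applied generation after generation to a population of chromosomes) can be sketched as follows. This is a minimal illustration on the toy OneMax problem (maximize the number of 1-bits in a bit string); the operators and parameters are generic textbook choices, not taken from any work cited here.

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=60,
                      p_mut=0.02, seed=1):
    """Minimal genetic algorithm: tournament selection, one-point
    cross-over, and bit-flip mutation over a population of bit-string
    chromosomes (candidate solutions)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # tournament selection of size 2
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_bits)          # one-point cross-over
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit
                     for bit in child]              # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Example: maximize the number of 1-bits (the "OneMax" toy problem).
best = genetic_algorithm(fitness=sum)
```

Swapping in a different `fitness` function (and chromosome encoding) is all that changes between applications such as alignment, gene mapping, or structure prediction.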
Bio-Inspired Algorithms in Bioinformatics II
use non-supervised learning. This type of analysis is very useful to discover gene groups that are potentially related or associated with the illness. A comparison between the most commonly applied methods, using both real and simulated data, can be found in the works of Thalamuthu (2006), Handl (2005), and Sheng (2005). Even though these methods have provided good results in certain cases (Spellman, 1998; Tamayo, 1999; Mavroudi, 2002), some of their inherent problems, such as the identification of the number of clusters, the clustering of the outliers, and the complexity associated with the large amount of data being analysed, often complicate their use for expression analysis (Sherlock, 2001). These deficiencies were tackled in a series of second-generation clustering algorithms, among which are the self-organising trees (Herrero, 2001; Hsu, 2003).

Another interesting approach for expression analysis is the use of the artificial immune system, which can be observed in the work of Ando (Ando, 2003), who applies immune recognition to classification by making the system select the most significant genes and optimize their weights in order to obtain classification rules. Finally, de Sousa, de Castro, and Bezerra apply this technique to clustering (de Sousa, 2004)(de Castro, 2001)(Bezerra, 2003).

Sequence Alignment

Solutions based on genetic algorithms, such as SAGA (Notredame, 1996), RAGA and PRAGA (Notredame, 1997, 2002), and others (O'Sullivan, 2004; Nguyen, 2002; Yokohama, 2001), have been applied to sequence alignment since the very beginning. The most common method consists in codifying the alignments as individuals inside the genetic algorithm. There are also hybrid solutions that use not only GAs but also dynamic programming (Zhang, 1997, 1998); and finally, there is the application of artificial life algorithms, in particular the ant colony algorithm (Chen, 2006; Moss, 2003).

Genetic Networks

In order to tackle the problem of the inference of genetic networks, the structure of the regulatory network and the interactions between the participating genes must be predicted. The expression of the genes is regulated by transitions of states in which the levels of expression of the involved genes are updated simultaneously.

ANNs have been used to model these networks. Examples of such approaches can be found in the works of Krishna, Keedwell, and Narayanan (Keedwell, 2003)(Krishna, 2005). Genetic algorithms (Ando, 2001)(Tominaga, 2001) and hybrid neural-genetic approaches (Keedwell, 2005) have also been used for the same purpose.

Phylogenetic Trees

Normally, exhaustive search techniques for the creation of phylogenetic trees are computationally unfeasible for more than 10 comparisons, because the number of possible solutions increases exponentially with the number of objects in the comparisons. In order to optimize these searches, researchers have used heuristics based on genetic algorithms (Skourikhine, 2000)(Katoh, 2001)(Lemmon, 2002) that allow the reconstruction of the optimal trees with less computational load. Other techniques, such as the ant colony algorithm, have also been used to reconstruct phylogenetic trees (Ando, 2002)(Kummorkaew, 2004)(Perretto, 2005).

Gene Finding and Mapping

Gene mapping has been approached by methods that use only genetic algorithms (Fickett, 1996)(Murao, 2002) as well as by hybrid methods that combine genetic algorithms and statistical techniques (Gaspin, 1997).

The problem of gene searching, and in particular promoter searching, has been approached by means of neural networks (Liu, 2006), neural networks optimized with genetic algorithms (Knudsen, 1999), conventional genetic algorithms (Kel, 1998)(Levitsky, 2003), and fuzzy genetic algorithms (Jacob, 2005).

Structure Prediction

The tridimensional structure of DNA was predicted with genetic algorithms (Beckers, 1997) by codifying the torsional angles between the atoms of the DNA molecule as solutions of the genetic algorithm. Another approach was the development of hybrid strategies of ANNs and GAs (Parbhane, 2000), in which the network approximates the non-linear relations between the inputs and outputs of the data set, and the genetic algorithm searches within the network input space to optimize the output. In order to predict the secondary structure of the RNA, the system calculates the minimum free
Bio-Inspired Algorithms in Bioinformatics II
energy of the structure for all the different combinations of the hydrogen bonds. There are approaches that use genetic algorithms (Shapiro, 2001) (Wiese, 2003) and artificial neural networks (Steeg, 1997).

Artificial neural networks have been applied to the prediction of protein structures (Qian, 1988) (Sasagawa, 1992), and so have genetic algorithms. A compilation of the applications of evolutionary computation to protein structure prediction can be found in (Schulze-Kremer, 2000). Swarm intelligence, and optimization by ant colony in particular, has been applied to structure prediction (Shmygelska, 2005) (Chu, 2005), and so has the artificial immune system (Nicosia, 2004) (Cutello, 2007).

CONCLUSION

This article presents a compendium of the most recent references on the application of bio-inspired solutions, such as evolutionary computation, artificial neural networks, swarm intelligence, and the artificial immune system, to the most common problems in bioinformatics.

ACKNOWLEDGMENT

This work was partially supported by the Spanish Ministry of Education and Culture (Ref TIN2006-13274) and the European Regional Development Funds (ERDF), grant (Ref. PIO61524) funded by the Carlos III Health Institute, grant (Ref. PGIDIT 05 SIN 10501PR) from the General Directorate of Research of the Xunta de Galicia, and grants (File 2006/60, 2007/127 and 2007/144) from the General Directorate of Scientific and Technologic Promotion of the Galician University System of the Xunta de Galicia.

REFERENCES

Ando S and Iba H. (2001). Inference of gene regulatory model by genetic algorithms. Proceedings of Congress of Evolutionary Computation 1:712-719.

Ando S and Iba H. (2002). Ant algorithms for construction of evolutionary tree. Proceedings of Congress of Evolutionary Computation (CEC 2002).

Ando S and Iba H. (2003). Artificial Immune System for Classification of Gene Expression Data. Genetic and Evolutionary Computation (GECCO 2003). LNCS 2724/2003, pp 205. Springer Berlin.

Baldi P and Brunak S. (2001). Bioinformatics: The Machine Learning Approach. MIT Press.

Beckers ML, Muydens LM, Pikkermaat JA and Altona C. (1997). Applications of a genetic algorithm in the conformational analysis of methylene acetal-linked thymine dimers in DNA: Comparison with distance geometry calculations. Journal of Biomolecular NMR 9(1):25-34.

Bezerra GB and De Castro LN. (2003). Bioinformatics data analysis using an artificial immune network. In International Conference on Artificial Immune Systems 2003. LNCS 2787/2003, pp. 22-33. Springer Berlin.

Chen Y, Pan Y, Chen L and Chen J. (2006). Partitioned optimization algorithms for multiple sequence alignment. Second IEEE Workshop on High Performance Computing in Medicine and Biology (HiPCoMB-2006).

Chu D, Till M and Zomaya A. (2006). Parallel ant colony optimization for 3D protein structure prediction using HP lattice model. 19th Congress on Evolutionary Computation (CEC 2006).

Cutello V, Nicosia G, Pavone M and Timmis J. (2007). An immune algorithm for protein structure prediction on lattice models. IEEE Transactions on Evolutionary Computation 11(1):101-117.

Das S, Abraham A and Konar A. (2007). Swarm intelligence algorithms in bioinformatics. Computational Intelligence in Bioinformatics. A. Kelemen et al., editors. Springer Verlag Berlin.

De Smet F, Mathys J, Marchal K. (2002). Adaptive quality-based clustering of gene expression profiles. Bioinformatics 20(5):660-667.

De Sousa JS, Gomes L, Bezerra GB, de Castro LN and Von Zuben FJ. (2004). An immune evolutionary algorithm for multiple rearrangements of gene expression data. Genetic Programming and Evolvable Machines, Vol 5, pp. 157-179.

De Castro LN & Von Zuben FJ. (2001). aiNet: An artificial Immune Network for Data Analysis. Data Mining: A Heuristic Approach. Idea Group Publishing.

Driscol JA, Worzel B and MacLean D. (2003). Classification of gene expression data with genetic programming. Genetic Programming Theory and Practice. Kluwer Academic Publishers, pp 25-42.

Fickett J and Cinkosky M. (1993). A genetic algorithm for assembling chromosome physical maps. Proceedings of the 2nd International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis 2:272-285.

Gaspin C and Schiex T. (1997). Genetic algorithms for genetic mapping. Proceedings of the 3rd European Conference on Artificial Evolution, pp. 145-156.

Gilbert RJ, Rowland JJ and Kell DB. (2000). Genomic computing: Explanatory modelling for functional genomics. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2000). Morgan Kaufmann, pp 551-557.

Handl J, Knowles J and Kell D. (2005). Computational cluster validation in post-genomic data analysis. Bioinformatics 21(15):3201-3212.

Herrero J, Valencia A and Dopazo J. (2001). A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics 17(2):126-162.

Hong JH and Cho SB. (2004). Lymphoma cancer classification using genetic programming with SNR features. Proceedings of EuroGP 2004, Coimbra, pp 78-88.

Hong JH and Cho SB. (2006). The classification of cancer based on DNA microarray data that uses diverse ensemble genetic programming. Artificial Intelligence in Medicine 36(1):43-58.

Hsu AL, Tang S and Halgamuge SK. (2003). An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data. Bioinformatics 19(16):2131-2140.

Jacob E, Sasikumar R and Nair KNR. (2005). A fuzzy guided genetic algorithm for operon prediction. Bioinformatics 21(8):1403-1410.

Katoh K, Kuma K and Miyata T. (2001). Genetic algorithm-based maximum likelihood analysis for molecular phylogeny. Journal of Molecular Evolution 53(4-5):477-484.

Keedwell E and Narayanan A. (2005). Intelligent Bioinformatics. Wiley.

Keedwell E and Narayanan A. (2005). Discovering gene regulatory networks with a neural genetic hybrid. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2(3):231-243.

Kel A, Ptitsyn A, Babenko V, Meier-Ewert S and Lehrach H. (1998). A genetic algorithm for designing gene family-specific oligonucleotide sets used for hybridization: The G protein-coupled receptor protein superfamily. Bioinformatics 14(3):259-270.

Kim KJ and Cho SB. (2004). Prediction of colon cancer using an evolutionary neural network. Neurocomputing 61:361-379.

Korf I, Yandell M and Bedell J. (2003). BLAST. O'Reilly.

Knudsen S. (1999). Promoter 2.0: for the recognition of Pol II promoter sequences. Bioinformatics 15(5):356-417.

Krishna A, Narayanan A and Keedwell EC. (2005). Neural networks and temporal gene expression data. Applications of Evolutionary Computing (EvoBIO 2005). LNCS 3449, Springer Verlag.

Kummorkaew M, Ku K and Ruenglertpanyakul P. (2004). Application of ant colony optimization to evolutionary tree construction. Proceedings of the 15th Annual Meeting of the Thai Society for Biotechnology, Thailand.

Larrañaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano J, Armañanzas R, Santafé G, Pérez A and Robles V. (2006). Machine learning in bioinformatics. Briefings in Bioinformatics 7(1):86-112.

Langdon W and Buxton B. (2004). Genetic programming for mining DNA chip data from cancer patients. Genetic Programming and Evolvable Machines 5(3).

Lemmon AR and Milinkovitch MC. (2002). The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation. Proceedings of the National Academy of Sciences 99(16):10516-10521.

Lee JW, Lee JB, Park M and Song SH. (2005). An extensive comparison of recent classification tools applied to microarray data. Journal of Computational Statistics and Data Analysis 48(4):869-885.

Levitsky VG, Katokhin AV. (2003). Recognition of eukaryotic promoters using genetic algorithm based
on interactive discriminant analysis. In Silico Biology 3(1-2):81-87.

Liu DR, Xiong X, DasGupta B, Zhang HG. (2006). Motif discoveries in unaligned molecular sequences using self-organizing neural networks. IEEE Transactions on Neural Networks 17(4):919-928.

Mavroudi S, Papadimitriou S and Bezerianos A. (2002). Gene expression data analysis with a dynamically extended self-organized map that exploits class information. Bioinformatics 18(11):1446-1453.

Moss J and Johnson C. (2003). An ant colony algorithm for multiple sequence alignment in bioinformatics. Artificial Neural Networks and Genetic Algorithms, pp 182-186. Springer.

Muni DP, Pal NR and Das J. (2006). Genetic programming for simultaneous feature selection and classifier design. Systems, Man and Cybernetics 36(1):106-117.

Murao H, Tamaki H and Kitamura S. (2002). A coevolutionary approach to adapt the genotype-phenotype map in genetic algorithms. Proceedings of Congress of Evolutionary Computation 2:1612-1617.

Narayanan A, Keedwell E, Tatineni SS. (2004). Single-layer artificial neural networks for gene expression analysis. Neurocomputing 61:217-240.

Nguyen H, Yoshihara I, Yamamori K and Yasunaga M. (2002). A parallel hybrid genetic algorithm for multiple protein sequence alignment. Congress of Evolutionary Computation 1:309-314.

Nicosia G. (2004). Immune Algorithms for Optimization and Protein Structure Prediction. PhD Thesis. Department of Mathematics and Computer Science, University of Catania, Italy.

Notredame C and Higgins D. (1996). SAGA: Sequence alignment by genetic algorithm. Nucleic Acids Research 24(8):1515-1524.

Notredame C, O'Brien EA and Higgins DG. (1997). RAGA: RNA sequence alignment by genetic algorithm. Nucleic Acids Research 25(22):4570-4580.

Notredame C. (2002). Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3(1):131-144.

O'Sullivan O, Suhre K, Abergel C, Higgins D and Notredame C. (2004). 3DCoffee: Combining protein sequences and structures within multiple sequence alignments. Journal of Molecular Biology 340(2):385-395.

Pal S, Bandyopadhyay S and Ray S. (2006). Evolutionary computation in bioinformatics: A review. IEEE Transactions on Systems, Man and Cybernetics 36(5):601-615.

Parbhane R, Unniraman S, Tambe S, Nagaraja V and Kulkarni B. (2000). Optimum DNA curvature using a hybrid approach involving an artificial neural network and genetic algorithm. Journal of Biomolecular Structure and Dynamics 17(4):665-672.

Perretto M and Lopes HS. (2005). Reconstruction of phylogenetic trees using the ant colony optimization paradigm. Genetics and Molecular Research 4(3):581-589.

Prelic A, Bleuler S, Zimmermann P, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L and Zitzler E. (2006). A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22(9):1122-1129.

Qian N, Sejnowski TJ. (1988). Predicting the secondary structure of globular proteins using neural network models. Journal of Molecular Biology 202:865-884.

Ritchie MD, Coffey CS and Moore JH. (2004). Genetic programming neural networks as a bioinformatics tool for human genetics. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2004). LNCS 3012, Vol 2, pp 438-448.

Sasagawa F and Tajima K. (1992). Prediction of protein secondary structures by a neural network. Bioinformatics 9(2):147-152.

Schulze-Kremer S. (2000). Genetic algorithms and protein folding. Methods in Molecular Biology, Protein Structure Prediction: Methods and Protocols 143:175-222.

Shapiro BA, Wu JC, Bengali D and Potts MJ. (2001). The massively parallel genetic algorithm for RNA folding: MIMD implementation and population variation. Bioinformatics 17(2):137-148.

Sheng Q, Moreau Y, De Smet G and Zhang MQ. (2005). Advances in cluster analysis of microarray
data. Data Analysis and Visualization in Genomics and Proteomics, John Wiley, pp. 153-226.

Sherlock G. (2001). Analysis of large-scale gene expression data. Briefings in Bioinformatics 2(4):350-412.

Shmygelska A and Hoos H. (2005). An ant colony optimization algorithm for the 2D and 3D hydrophobic polar protein folding problem. BMC Bioinformatics 6:30.

Skourikhine A. (2000). Phylogenetic tree reconstruction using a self-adaptive genetic algorithm. IEEE International Symposium on Bioinformatics and Biomedical Engineering, pp. 129-134.

Spellman PT, Sherlock G, Zhang MQ. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9:3271-3378.

Statnikov A, Aliferis CF, Tsamardinos I. (2005). A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 21(5):631-643.

Steeg E. (1997). Neural networks, adaptive optimization and RNA secondary structure prediction. Artificial Intelligence and Molecular Biology. MIT Press.

Tamayo P, Slonim D, Mesirov J. (1999). Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proceedings of the National Academy of Sciences 96:2907-2929.

Thalamuthu A, Mukhopadhyay I, Zheng X and Tseng G. (2006). Evaluation and comparison of gene clustering methods in microarray analysis. Bioinformatics 22(19):2405-2412.

Tominaga D, Okamoto M, Maki Y, Watanabe S and Eguchi Y. (1999). Non-linear numeric optimization technique based on genetic algorithm for inverse problems: Towards the inference of genetic networks. Computational Science and Biology (Proceedings of the German Conference on Bioinformatics), pp 127-140.

Wang Z, Wang Y, Xuan J, Dong Y, Bakay M, Feng Y, Clarke R and Hoffman E. (2006). Optimized multilayer perceptrons for molecular classification and diagnosis using genomic data. Bioinformatics 22(6):755-761.

Wei JS. (2004). Prediction of clinical outcome using gene expression profiling and artificial neural networks for patients with neuroblastoma. Cancer Research 65:374.

Wiese KC and Glen E. (2003). A permutation-based genetic algorithm for the RNA folding problem: A critical look at selection strategies, crossover operators, and representation issues. Biosystems 72(1-2):29-41.

Yokohama T, Watanabe T, Teneda A and Shimizu T. (2001). A web server for multiple sequence alignment using genetic algorithm. Genome Informatics 12:382-383.

Zhang C and Wong AK. (1997). Toward efficient multiple molecular sequence alignments: A system of genetic algorithm and dynamic programming. IEEE Transactions on Systems, Man and Cybernetics B 27(6):918-932.

Zhang C and Wong AK. (1998). A technique of genetic algorithm and sequence synthesis for multiple molecular sequence alignment. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics 3:2442-2447.

KEY TERMS

Bioinformatics: The use of applied mathematics, informatics, statistics, and computer science to study biological systems.

Gene Expression: The conversion of information from gene to protein via transcription and translation.

Gene Mapping: Any method used for determining the location of, and relative distances between, genes on a chromosome.

Gene Regulatory Network: Genes that regulate or circumscribe the activity of other genes; specifically, genes that code for proteins (repressors or activators) that regulate the genetic transcription of the structural and/or regulatory genes.

Phylogeny: The evolutionary relationships among organisms; the patterns of lineage branching produced by the true evolutionary history of the organisms being considered.

Sequence Alignment: The result of comparing two or more gene or protein sequences in order to determine
Bioinspired Associative Memories

Humberto Sossa
Center for Computing Research, IPN, Mexico
INTRODUCTION

An associative memory (AM) is a special kind of neural network that allows recalling one output pattern given an input pattern as a key, which might be altered by some kind of noise (additive, subtractive or mixed). Most of these models have several constraints that limit their applicability in complex problems such as face recognition (FR) and 3D object recognition (3DOR).

Despite the power of these approaches, they cannot reach their full potential without applying new mechanisms based on current and future studies of biological neural networks. In this direction, we would like to present a brief summary of a new associative model based on some neurobiological aspects of the human brain. In addition, we would like to describe how this dynamic associative memory (DAM), combined with some aspects of the infant vision system, could be applied to solve some of the most important problems of pattern recognition: FR and 3DOR.

BACKGROUND

Humans possess several capabilities such as learning, recognition and memorization. In the last 60 years, scientists of different communities have been trying to implement these capabilities in a computer. Along these years, several approaches have emerged; one common example is neural networks (McCulloch & Pitts, 1943) (Hebb, 1949) (Rosenblatt, 1958). Since the rebirth of neural networks, several models inspired by neurobiological processes have emerged. Among these models, perhaps the most popular is the feed-forward multilayer perceptron trained with the back-propagation algorithm (Rumelhart & McClelland, 1986). Other neural models are associative memories, for example (Anderson, 1972) (Hopfield, 1982) (Sussner, 2003) (Sossa, Barron & Vazquez, 2004). On the other hand, the brain is not a huge fixed neural network, as had been previously thought, but a dynamic, changing neural network. In this direction, several models have emerged, for example (Grossberg, 1967) (Hopfield, 1982).

In most of these classical neural network approaches, synapses are only adjusted during the training phase. After this phase, synapses are no longer adjusted. Modern brain theory uses continuous-time models based on the current study of biological neural networks (Hecht-Nielsen, 2003). In this direction, the next section describes a new dynamic model based on some aspects of biological neural networks.

Dynamic Associative Memories (DAMs)

The dynamic associative model is not an iterative model like Hopfield's model. It emerges as an improvement of the model and results presented in (Sossa, Barron & Vazquez, 2007).

Let x ∈ R^n and y ∈ R^m be an input and output pattern, respectively. An association between input pattern x and output pattern y is denoted as (x^k, y^k), where k is the corresponding association. The associative memory W is represented by a matrix whose components w_ij can be seen as the synapses of the neural network.

If x^k = y^k for all k = 1, ..., p then W is auto-associative; otherwise it is hetero-associative. A distorted version of a pattern x to be recalled will be denoted as x̃. If an associative memory W is fed with a distorted version of x^k and the output obtained is exactly y^k, we say that recalling is robust.

Because several regions of the brain interact in the process of learning and recognition (Laughlin & Sejnowski, 2003), several interacting areas are defined in the dynamic model; it also integrates the capability to adjust synapses in response to an input stimulus. Before the brain processes an input pattern, it is hypothesized that the pattern is transformed and codified by the brain. This process is simulated [...]
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
In short, building of the associative memory can be performed in three stages:

1. Transform the fundamental set of associations into coded and de-coding patterns.
2. Compute simplified versions of the input patterns by using equation 1.
3. Build W in terms of the coded patterns by using equation 3.

There are synapses that can be drastically modified without altering the behavior of the associative memory. On the contrary, there are synapses that can only be slightly modified so as not to alter the behavior of the associative memory; we call this set of synapses the kernel of the associative memory, and it is denoted by K_W.

In short, pattern y can be recalled by using its corresponding key vector x or x̃ in six stages:

1. Obtain the index of the active region ar by using equation 2.
2. Transform x^k using the de-coding pattern x̂^ar by applying the transformation x̄^k = x^k + x̂^ar.
3. Compute the adjustment factor Δw from x̄ by using equation 5.
4. Modify the synapses of the associative memory W that belong to K_W by using equation 6.
5. Recall pattern ȳ^k by using equation 7.
6. Obtain y^k by transforming ȳ^k using the de-coding pattern ŷ^ar, applying the transformation y^k = ȳ^k − ŷ^ar.
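The build-then-recall pattern above depends on equations 1-7, which are not reproduced in this excerpt. As a purely illustrative stand-in, the following sketch implements the same two-stage pattern with a classical correlation-matrix (Hebbian) associative memory rather than the authors' dynamic model; the function names and the bipolar test pattern are our own assumptions, not taken from the article.

```python
import numpy as np

def build_memory(X, Y):
    """Build stage: W is the sum of the outer products y_k x_k^T (Hebbian learning)."""
    return sum(np.outer(y, x) for x, y in zip(X, Y))

def recall(W, x):
    """Recall stage: project the key through W and threshold to bipolar values."""
    return np.sign(W @ x)

x1 = np.array([1, -1, 1, -1, 1, -1, 1, -1])
W = build_memory([x1], [x1])   # auto-associative case: x^k = y^k

noisy = x1.copy()
noisy[0] = -noisy[0]           # distort one component of the key
print(recall(W, noisy))        # the stored pattern is recovered exactly
```

In the article's terminology, recall in this small example is robust: feeding the memory a distorted version of x^1 still retrieves x^1 exactly.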
The formal set of propositions that support the correct functioning of this dynamic model, its main advantages over other classical models, and some interesting applications of the model are described in (Vazquez, Sossa & Garro, 2006) and (Vazquez & Sossa, 2007).

In general, we distinguish two main parts in this model: a part concerning the determination of the active region (PAR) and a part concerning pattern recall (PPR). PAR (the first step of the recall procedure) sends a signal to PPR (the remaining steps of the recall procedure) indicating the region activated by the input pattern.

[...] a low-pass filter is first used to remove high-frequency components from the image. After that, we divide the image into different parts (sub-patterns). Then, over each sub-pattern, we detect subtle features by means of a random selection of SPs. Preprocessing the images to remove high frequencies and the random selection of SPs contribute to eliminating redundant information and help the DAMs to learn the faces or the objects efficiently. At last, each DAM is fed with these sub-patterns for training and recognition.

Response to Low Frequencies
Figure 1. Images filtered with masks of different sizes. Each group could be associated with different stages of the infant vision system.
corresponding AR could be wrongly determined because some patterns do not satisfy the propositions that guarantee perfect recall. To avoid this, we use an integrator. Each DAM determines an AR, the index of the AR is sent to the integrator, the integrator determines the most voted region, and it sends the index of the most voted region (the new AR) to the DAMs.

Let I_x^k (of size a×b) and I_y^k (of size c×d) be an association of images, and let r be the number of DAMs. Building of the nDAMs is done as follows:

1. Select a filter size and apply it to the images.
2. Transform the images into vectors (x^k, y^k) by means of the standard image scan method, where the vectors are of size a·b and c·d respectively.
3. Decompose x^k and y^k into r sub-patterns of the same size.
4. Take each sub-pattern (from the first one to the last one (r)), then take at random a SP sp_i, i = 1, ..., r, and extract the value at that position.
5. Train the r DAMs as in the building procedure, taking each sub-pattern (from the first one to the last one (r)) using the rand operator.

Pattern I_y^k can be recalled by using its corresponding key image I_x^k or Ĩ_x^k as follows:

1. Select the filter size and apply it to the images.
2. Transform the images into a vector by means of the standard image scan method and decompose x̃^k into r sub-patterns of the same size.
3. Use the SPs sp_i, i = 1, ..., r, computed during the building phase and extract the value of each sub-pattern.
4. Determine the most voted active region using the integrator.
5. Substitute the mid operator with the rand operator in the recalling procedure and apply steps two to six as described in the recalling procedure on each DAM.
6. Finally, put together the recalled sub-patterns to form the output pattern.

A schematic representation of the building and recalling phases is shown in Figure 2.

Some Experimental Results

To test the accuracy of the proposal, we performed two experiments. In experiment 1, we used a benchmark (Spacek, 1996) of faces of 15 different people. In experiment 2, we used a benchmark (Nene, 1996) of 100 objects. During the training process in both experiments, the DAM performed with 100% accuracy using only one image of each person and object. During testing, the DAM performed on average with 99% accuracy for the remaining 285 images of faces (experiment 1) and 95% accuracy for the remaining 1900 images of
Figure 2. (a) Schematic representation of the building phase. (b) Schematic representation of the recalling phase.
Figure 3. Accuracy of the proposal using different filter sizes. The reader can verify that the accuracy of the proposal diminishes after applying a filter of size greater than 25.
Figure 4. Average accuracy of the proposal. Maximum, average and minimum accuracy are sketched.
distance classifiers) and make possible its application to complex problems such as FR and 3DOR.

In order to recognize different images of faces or objects, we have used several DAMs. As the infant vision system responds to low frequencies of the signal, a low-pass filter is first used to remove high-frequency components from the image. Then we detected subtle features in the image by means of a random selection of SPs. At last, each DAM was fed with this information for training and recognition.

Through several experiments, we have shown the accuracy and the stability of the proposal, even under occlusions. On average, the accuracy of the proposal oscillates between 95% and 99%.

The results obtained with the proposal were comparable with those obtained by means of a PCA-based method (99%). Although PCA is a powerful technique, it consumes a lot of time to reduce the dimensionality of the data. Our proposal, because of its simplicity in operations, is not a computationally
expensive technique and the results obtained are comparable to those provided by PCA.

REFERENCES

Acerra, F., Burnod, Y. & Schonen, S. (2002). Modelling aspects of face processing in early infancy. Developmental Science, 5(1), 98-117.

Anderson, J. A. (1972). A simple neural network generating an interactive memory. Mathematical Biosciences, 14(3-4), 197-220.

Barlow, H. B. (2001). Redundancy reduction revisited. Network: Computation in Neural Systems, 12, 241-253.

Grossberg, S. (1967). Nonlinear difference-differential equations in prediction and learning theory. Proceedings of the National Academy of Sciences, 58(4), 1329-1334.

Hebb, D. O. (1949). The Organization of Behavior. New York: Wiley.

Hecht-Nielsen, et al. (2003). A theory of the thalamocortex. Computational Models for Neuroscience, pp 85-124. Springer-Verlag, London.

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554-2558.

Laughlin, S. B. & Sejnowski, T. J. (2003). Communication in neuronal networks. Science, 301(5641), 1870-1874.

McCulloch, W. S. & Pitts, W. H. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(1-2), 115-133.

Mondloch, C. J. et al. (1999). Face perception during early infancy. Psychological Science, 10(5), 419-422.

Nene, S. A. et al. (1996). Columbia Object Image Library (COIL-100). Technical Report No. CUCS-006-96. Department of Computer Science, Columbia University.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408.

Rumelhart, D. & McClelland, J. (1986). Parallel Distributed Processing. MIT Press.

Sossa, H., Barron, R. & Vazquez, R. A. (2004). Transforming fundamental set of patterns to a canonical form to improve pattern recall. Lecture Notes in Artificial Intelligence, 3315, 687-696.

Sossa, H., Barron, R. & Vazquez, R. A. (2007). Study of the influence of noise in the values of a median associative memory. Lecture Notes in Computer Science, 4432, 55-62.

Spacek, L. (1996). Collection of facial images: Grimace. Available from http://cswww.essex.ac.uk/mv/allfaces/grimace.html

Sussner, P. (2003). Generalizing operations of binary auto-associative morphological memories using fuzzy set theory. Journal of Mathematical Imaging and Vision, 19(2), 81-93.

Turk, M. & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71-86.

Vazquez, R. A., Sossa, H. & Garro, B. A. (2006). A new bi-directional associative memory. Lecture Notes in Artificial Intelligence, 4293, 367-380.

Vazquez, R. A. & Sossa, H. (2007). A computational approach for modeling the infant vision system in object and face recognition. BMC Neuroscience, 8(Suppl 2), P204.

KEY TERMS

Associative Memory: A mathematical device specially designed to recall output patterns from input patterns that might be altered by noise.

Dynamic Associative Memory: A special type of associative memory composed of dynamical synapses. This memory adjusts the values of its synapses during the recalling phase in response to input stimuli.

Dynamical Synapses: Synapses that modify their values in response to an input stimulus, including during the recalling phase.
Low-Pass Filter: Filter which removes high fre- Random Selection: Selection of one or more com-
quencies from an image or signal. This type of filters ponents of a vector at randomly manner. Random selec- B
is used to simulate the infant vision system at early tion techniques are used to reduce multidimensional
stages. Examples of these filters are the average filter data sets to lower dimensions for analysis.
or the median filter.
Stimulating Points: Characteristic points of an
PCA: Principal component analysis is a technique object in an image used during learning and recog-
used to reduce multidimensional data sets to lower nition, which capture the attention of a child. These
dimensions for analysis. PCA involves the computation stimulating points are used to train the dynamic as-
of the eigenvalue decomposition of a data set, usually sociative memory.
after mean centering the data for each attribute.
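The mean-centering and eigendecomposition steps named in the PCA definition can be sketched in a few lines. The following NumPy sketch is illustrative only (the function name and the random data are ours, not from the article):

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project `data` (n_samples x n_features) onto its `n_components`
    leading principal components."""
    # Mean-center each attribute, as the definition above requires.
    centered = data - data.mean(axis=0)
    # Eigendecomposition of the (symmetric) covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    # Keep the eigenvectors with the largest eigenvalues (most variance).
    order = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, order]

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 5))      # 100 samples, 5 attributes
reduced = pca_project(points, n_components=2)
print(reduced.shape)  # (100, 2)
```

The projection keeps the directions of greatest variance, which is why the same idea can also be used, as noted above, to reduce multidimensional data sets to lower dimensions for analysis.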
Juan M. Corchado
University of Salamanca, Spain
Luis F. Castillo
National University, Colombia
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Bio-Inspired Dynamical Tools
SPATIO-TEMPORAL NEURAL CODING GENERATOR

The ability to process sequential information has long been seen as one of the most important functions of intelligent systems (Huerta et al., 2004). As will be shown below, the winnerless competition principle appears as a major mechanism of sequential memory processing. The underlying concept is that sequential memory can be encoded in a (multidimensional) dynamical system by means of heteroclinic trajectories connecting several saddle points. Each of the saddle points is assumed to be remembered for further action (Afraimovich et al., 2004).

Computation over Neural Networks

Digital computers are considered universal in the sense of being capable of implementing any symbolic algorithm. If artificial neural networks, which have a great influence on the field of computation, are considered as a paradigm of computation, one may ask what the relation between neural networks and the classical computing paradigm is. For this question it is necessary to consider, on the one hand, discrete (digital) computation and, on the other hand, nondiscrete (analog) computation. In terms of the first, the traditional paradigm is the Turing Machine with the Von Neumann architecture. A decade ago it was shown that artificial neural networks of analog neurons and rational weights are computationally equivalent to Turing machines. In terms of analog computation, it was also shown that three-layer feedforward nets can approximate any smooth function with arbitrary precision (Hornik et al., 1990). This result was extended to show how continuous recurrent neural nets (CRNN) can approximate an arbitrary dynamical system as given by a system of n coupled first-order differential equations (Tsung, 1994; Chow and Li, 2000).

Neural Network Computation from a Dynamical Viewpoint

Modern dynamical systems theory is concerned with the qualitative understanding of asymptotic behaviors of systems that evolve in time. With complex non-linear systems, defined by coupled differential, difference or functional equations, it is often impossible to obtain closed-form (or asymptotically closed-form) solutions. Even if such solutions are obtained, their functional forms are usually too complicated to give an understanding of the overall behavior of the system. In such situations, qualitative analysis of the limit sets (fixed points, cycles or chaos) of the system can often offer better insights. Qualitative means that this type of analysis is not concerned with the quantitative changes but rather with what the limiting behavior will be (Tsung, 1994).

Spatio-Temporal Neural Coding and Winnerless Competition Networks

It is important to understand how information is processed from a dynamical viewpoint (in terms of steady states, limit cycles and strange attractors) because it gives us the possibility of managing sequential processes (Freeman, 1990). This section presents a new direction in information dynamics, namely the Winnerless Competition (WLC) behavior. The main point of this principle is the transformation of the incoming spatial inputs into an identity-temporal output based on the intrinsic switching dynamics of a dynamical system. In the presence of stimuli, the sequence of the switching, whose geometrical image in the phase space is a heteroclinic contour, uniquely depends on the incoming information.

Consider the generalized Lotka-Volterra system (N=3):

ȧ1 = a1 [1 − (a1 + ρ12 a2 + ρ13 a3)]
ȧ2 = a2 [1 − (a2 + ρ21 a1 + ρ23 a3)]
ȧ3 = a3 [1 − (a3 + ρ31 a1 + ρ32 a2)]

If the following matrix and parameter conditions are satisfied,

         | 1    α1   β1 |
(ρij) =  | β2   1    α2 |
         | α3   β3   1  |

and the coefficients fulfill α1 = α2 = α3 < 1 and β1 = β2 = β3 > 1, we have three cases:

1. Stable equilibrium with all three components simultaneously present/working.
2. Three equilibria (1,0,0), (0,1,0) and (0,0,1), all stable, each one attainable depending on initial conditions.
Figure 1. Topology of the behavior in the phase space of a WLC neural net in 3D. The axes represent the addresses of the oscillatory connections between the processing units (a = 0.5, b = 1.8, x(0) = 1.01, y(0) = 1.01, z(0) = 0.01).
3. Neither equilibrium points nor periodic solutions are asymptotically stable, and we have wandering trajectories defining Winnerless Competition (WLC) behaviour.

The advantages of dealing with Lotka-Volterra systems are important. It has been shown above how a winnerless competition process can emerge in a generalized Lotka-Volterra system. It is also known that this type of process is generalizable to any dynamical system and that any dynamical system can be represented by using recurrent neural networks (Hornik et al., 1990). From this point of view, winnerless competition processes can be obtained whenever a boundary condition holds: the Lotka-Volterra system must be of dimension n greater than three to find winnerless competition behavior. In the following, it is assumed that Lotka-Volterra systems approximate arbitrarily closely the dynamics of any finite-dimensional dynamical system for any finite time, and we will concentrate on presenting them as a type of neural net of great interest for applications (Hopfield, 2001). Various attempts at modeling the complex dynamics in insect brains have been made (Nowotny et al., 2004; Rabinovich et al., 2001), and it has been suggested that simple CRNN (continuous recurrent neural network) systems could be an alternative framework to implement competing processes between neurons that generate spatio-temporal patterns to codify memory in a way similar to how the simplest living systems do (Rabinovich et al., 2006). Recurrent neural networks of competing neurons (inspired in how higher brain centres in insects work) would allow exploring how to build sequential memory and might suggest control architectures for insect-inspired robotic systems.

Winnerless Competition Systems Generate Adaptive Behavior

Some features of winnerless competition systems seem very promising for using these systems to model the activity and the design of intelligent artefacts. We focus on some of the results of previous theoretical studies by some authors on systems of n elements coordinated with excitation-inhibition relations (Rabinovich et al., 2001). These systems show:

Large Capacity: A heteroclinic (spatiotemporal) representation provides greatly increased capacity to the system. Because sequences of activity are combinatorial across elements and time, overlap between representations can be reduced, and the distance in phase space between orbits can be increased.

Sensitivity (to similar stimuli) and, simultaneously, capacity for categorization: This is because the heteroclinic linking of a specific set of saddle points is always unique. Two like stimuli, activating greatly overlapping subsets of a network, may become easily separated because small initial differences will become amplified in time.

Robustness: In the following sense, the attractor of a perturbed system remains in a small neighborhood of the unperturbed attractor (robustness as topological similarity of the perturbed pattern).

All these important features emerge from the dynamics of the Lotka-Volterra system. There are more examples: in (Nepomnyashchikh et al., 2003) a simple chaotic system of coupled oscillators is described that shows a complex and fruitful adaptive behaviour; the interaction among the activity of elements in the model and external inputs gives rise to an emergence of searching rules from basic properties of nonlinear systems (rules which have not been pre-programmed explicitly) and with obvious adaptive value. In more detail: the adaptive rules are autonomous (the system selects an appropriate rule with no instructions from outside), and they are the result of interaction between the intrinsic dynamics of the system and the dynamics of the environment. These rules emerge, in a spontaneous way, because of the non-linearity in the simple system.

Winnerless Competition for Computing and Interests in Robotics

The suggestion of using heteroclinic trajectories for computing purposes shows advantages for robotics. It is known that very simple dynamical systems are equivalent to Turing machines, and also that computing with heteroclinic orbits adds to classical computing the feature of high sensitivity to initial conditions. If we consider artefacts with computation processes ordered by winnerless competition behaviour, the artefacts will have a great ability to process, manage and store sequential information. In spite of the history of studies of sequential learning and memory, little is known about the dynamical principles of storing and remembering multiple events and their temporal order by neural networks. This principle, called winnerless competition, can be a very useful mechanism to explore and model sequential and scheduled processes in industrial and robotic problems.

FUTURE TRENDS

Computation by heteroclinic orbits provides new perspectives to traditional computing. Because of its features, it could be interesting to build bio-inspired systems based on winnerless competition processes. Evolution has chosen nonlinear dynamical phenomena as the basis of the adaptive behaviour patterns of living organisms, and these systems show, on one hand, the coexistence of sensitivity (the ability to distinguish distinct, albeit similar, inputs) and robustness (the ability to classify similar signal receptions as the same one). If we are able to reproduce the same characteristics in artificial intelligent architectures, it will be easier to go beyond the actual limitations of intelligent systems applied to real problems.

CONCLUSION

It has been summarized how a system architecture whose stimulus-dependent dynamics reproduces spatio-temporal features could be able to code and build a memory inspired in the higher brain centres of insects (Nowotny et al., 2004). Beyond the biological observations which suggested these investigations, recurrent neural networks where winnerless competition processes can emerge provide an attractive model for computation because of their large capacity as well as their robustness to noise contamination. An interesting tool has been presented (using control and synchronization of spatio-temporal patterns) to transfer and process information between different neural assemblies for classification problems in, eventually, several industrial environments. For example, winnerless competition processes could be able to solve the fundamental contradiction between sensitivity and generalization of the recognition, and between multistability and robustness to noise in real processes (Rabinovich et al., 2000). For classification tasks, it is useful to get models that are reproducible. In the language of non-linearity, this is possible only if the system is strongly dissipative (in other words, if it can rapidly forget its initial state). On the other hand, a useful classifier system should be sensitive to small variations in the inputs, so that fine discriminations between similar but not identical stimuli are possible. The winnerless competition principle shows both features.

REFERENCES

Afraimovich, V.S., Rabinovich, M.I. and Varona, P. (2004). Heteroclinic contours in neural ensembles and the winnerless competition principle. International Journal of Bifurcation and Chaos, 14, 1195-1208.
Bazhenov, M., Stopfer, M., Rabinovich, M., Abarbanel, H., Sejnowski, T. J. and Laurent, G. (2001). Model of cellular and network mechanisms for odor-evoked temporal patterning in the locust antennal lobe. Neuron, 30, 569-581.

Chow, T.M., Ho, J.K.L. and Li, X.D. (2005). Approximation of dynamical time-variant systems by continuous-time recurrent neural networks. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 52(10), 656-660.

Freeman, W.J. and Holmes, M.D. (2005). Metastability, instability, and state transition in neocortex. Neural Networks, 18(5-6), 497-504.

Gerber, B., Tanimoto, H., and Heisenberg, M. (2004). An engram found? Evaluating the evidence from fruit flies. Current Opinion in Neurobiology, 14, 737-768.

Hopfield, J. J. and Brody, C. D. (2001). What is a moment? Transient synchrony as a collective mechanism for spatio-temporal integration. Proceedings of the National Academy of Sciences, 98(3), 1282-1287.

Hornik, K., Stinchcombe, M. and White, H. (1990). Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 3, 551-560.

Huerta, R. and Rabinovich, M. (2004). Reproducible sequence generation in random neural ensembles. Physical Review Letters, 93(23), 238104.

McGuire, S., Le, P., and Davis, R. (2001). The role of Drosophila mushroom body signaling in olfactory memory. Science, 293, 1330-1333.

Nepomnyashchikh, V. and Podgornyj, K. (2003). Emergence of adaptive searching rules from the dynamics of a simple nonlinear system. Adaptive Behavior, 11(4), 245-265.

Nowotny, T., Huerta, R., Abarbanel, H. D. I. and Rabinovich, M. I. (2004). Learning classification and association in the olfactory system of insects. Neural Computation, 16(8), 1601-1640.

Rabinovich, M.I., Varona, P. and Abarbanel, H. (2000). Nonlinear cooperative dynamics of living neurons. International Journal of Bifurcation and Chaos, 10(5), 913-933.

Rabinovich, M.I., Volkovskii, A., Lecanda, P., Huerta, R., Abarbanel, H., and Laurent, G. (2001). Dynamical encoding by networks of competing neuron groups: winnerless competition. Physical Review Letters, 87, 068102.

Rabinovich, M.I., Huerta, R. and Afraimovich, V. (2006). Dynamics of sequential decision making. Physical Review Letters, 97(18), 188103.

Tsung, Fu-Sheng (1994). Modeling dynamical systems with recurrent neural networks. PhD thesis, Department of Computer Science, University of California, San Diego.

KEY TERMS

Adaptive Behaviour: Type of behavior that allows an individual to substitute a disruptive behavior with something more constructive and to adapt to a given situation.

Bio-Inspired Techniques: Bio-inspired systems and tools are able to bring together results from different areas of knowledge, including biology, engineering and other physical sciences, interested in studying and using models and techniques inspired from or applied to biological systems.

Computational System: Computation is a general term for any type of information processing that can be represented mathematically. This includes phenomena ranging from simple calculations to human thinking. A device able to make computations is called a computational system.

Dynamical Recurrent Networks: Complex nonlinear dynamic systems described by a set of nonlinear differential or difference equations with extensive connection weights.

Heteroclinic Orbits: In the phase portrait of a dynamical system, a heteroclinic orbit (sometimes called a heteroclinic connection) is a path in phase space which joins two different equilibrium points. If the equilibrium points at the start and end of the orbit are the same, the orbit is a homoclinic orbit.

Stability-Plasticity Dilemma: It explores how a learning system remains adaptive (plastic) in response to significant input, yet remains stable in response to irrelevant input.
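The winnerless competition dynamics of the generalized Lotka-Volterra system discussed in this article can be explored numerically. The sketch below is an illustration under assumed values (α = 0.5, β = 1.8, chosen to satisfy the stated conditions α < 1 < β, forward-Euler integration, and initial conditions as quoted in the Figure 1 caption); it is not code from the article:

```python
import numpy as np

# Connection matrix (rho_ij) of the three-unit generalized Lotka-Volterra
# system, with alpha < 1 < beta as required for case 3 (winnerless
# competition). alpha, beta, dt and the integration scheme are
# illustrative assumptions.
alpha, beta = 0.5, 1.8
rho = np.array([[1.0,   alpha, beta],
                [beta,  1.0,   alpha],
                [alpha, beta,  1.0]])

dt = 0.01
a = np.array([1.01, 1.01, 0.01])  # initial state, as in the Figure 1 caption

switches = []  # sequence of transiently dominant units
for _ in range(50_000):
    # Forward-Euler step of  da_i/dt = a_i * (1 - sum_j rho_ij * a_j)
    a = a + dt * a * (1.0 - rho @ a)
    winner = int(np.argmax(a))
    if not switches or switches[-1] != winner:
        switches.append(winner)

# The dominant unit keeps changing: the trajectory visits the three saddle
# points (1,0,0), (0,1,0), (0,0,1) in sequence instead of settling on a
# single winner.
print(switches)
```

The printed sequence cycles through all three units, with dwell times near each saddle that grow as the orbit approaches the heteroclinic contour, which is the switching behavior that the article proposes as a substrate for sequential memory.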
Table 1. Advantages and drawbacks of the three main authentication method approaches

Handheld tokens (card, ID, passport, etc.)
  Advantages: A new one can be issued. It is quite standard, although moving to a different country, facility, etc.
  Drawbacks: It can be stolen. A fake one can be issued. It can be shared. One person can be registered with different identities.

Knowledge based (password, PIN, etc.)
  Advantages: It is a simple and economical method. If there are problems, it can be replaced by a new one quite easily.
  Drawbacks: It can be guessed or cracked. Good passwords are difficult to remember. It can be shared. One person can be registered with different identities.

Biometrics
  Advantages: It cannot be lost, forgotten, guessed, stolen, shared, etc. It is quite easy to check if one person has several identities. It can provide a greater degree of security than the other ones.
  Drawbacks: In some cases a fake one can be issued. It is neither replaceable nor secret. If a person's biometric data is stolen, it is not possible to replace it.
Biometric Security Technology
(password or handheld token) by a biometric one, for sure, we are potential users of these systems, which will even be mandatory for new passport models. For this reason, it is useful to be familiarized with the possibilities of biometric security technology.

BIOMETRIC TRAITS

The first question is: which characteristic can be used for biometric recognition? As common sense says, a good biometric trait must accomplish a set of properties. Mainly they are (Clarke, 1994), (Mansfield & Wayman, 2002):

Universality: Every person should have the characteristic.

Distinctiveness: Any two persons should be different enough to distinguish each other based on this characteristic.

Permanence: The characteristic should be stable enough (with respect to the matching criterion) along time, different environment conditions, etc.

Collectability: The characteristic should be acquirable and quantitatively measurable.

Acceptability: People should be willing to accept the biometric system, and not feel that it is annoying, invasive, etc.

Performance: The identification accuracy and required time for a successful recognition must be reasonably good.

Circumvention: The ability of fraudulent people and techniques to fool the biometric system should be negligible.

Biometric traits can be split into two main categories:

Physiological biometrics: based on direct measurements of a part of the human body. Fingerprint (Maltoni et al., 2003), face, iris and hand-scan (Faundez-Zanuy & Navarro-Mérida, 2005) recognition belong to this group.

Behavioral biometrics: based on measurements and data derived from an action performed by the user, and thus indirectly measuring some characteristics of the human body. Signature (Faundez-Zanuy, 2005c), gait, gesture and keystroke recognition belong to this group.

However, this classification is quite artificial. For instance, the speech signal (Faundez-Zanuy & Monte, 2005) depends on behavioral traits such as semantics, diction, pronunciation, idiosyncrasy, etc. (related to socio-economic status, education, place of birth, etc.) (Furui, 1989). However, it also depends on the speaker's physiology, such as the shape of the vocal tract. On the other hand, physiological traits are also influenced by user behavior, such as the manner in which a user presents a finger, looks at a camera, etc.

Verification and Identification

Biometric systems can be operated in two modes, named identification and verification. We will refer to recognition for the general case, when we do not want to differentiate between them. However, some authors consider recognition and identification synonymous.

Identification: In this approach no identity is claimed by the user. The automatic system must determine who the user is. If he/she belongs to a predefined set of known users, it is referred to as closed-set identification. However, for sure the set of users known (learnt) by the system is much smaller than the potential number of people that can attempt to enter. The more general situation, where the system has to deal with users that perhaps are not modeled inside the database, is referred to as open-set identification. Adding a "none-of-the-above" option to closed-set identification gives open-set identification. The system performance can be evaluated using an identification rate.

Verification: In this approach the goal of the system is to determine whether the person is the one he/she claims to be. This implies that the user must provide an identity, and the system just accepts or rejects the user according to a successful or unsuccessful verification. Sometimes this operation mode is named authentication or detection. The system performance can be evaluated using the False Acceptance Rate (FAR, those situations where an impostor is accepted) and the False Rejection Rate (FRR, those situations where a user is incorrectly rejected), also known in detection
Figure 1. On the left: example of a DET plot for a user verification system (dotted line). The Equal Error Rate (EER) line shows the situation where False Alarm equals Miss Probability (balanced performance). Of course, one of both error rates can be more important (high security applications versus those where we do not want to annoy the user with a high rejection/miss rate). If the system curve is moved towards the origin, smaller error rates are achieved (better performance). If the decision threshold is reduced, higher False Acceptance/Alarm rates are achieved. On the right: example of a ROC plot for a user verification system (dotted line). The Equal Error Rate (EER) line shows the situation where False Alarm equals Miss Probability (balanced performance).

theory as False Alarm and Miss, respectively. There is a trade-off between both errors, which usually has to be established by adjusting a decision threshold. The performance can be plotted in a ROC (Receiver Operating Characteristic) or in a DET (Detection Error Trade-off) plot (Martin et al., 1997). The DET curve gives uniform treatment to both types of error, and uses a logarithmic scale for both axes, which spreads out the plot, better distinguishes different well-performing systems, and usually produces plots that are close to linear. Note also that the ROC curve has symmetry with respect to the DET, i.e. it plots the hit rate instead of the miss probability. The DET plot uses a logarithmic scale that expands the extreme parts of the curve, which are the parts that give the most information about the system performance. Figure 1, on the left, shows an example of a DET plot and, on the right, a classical ROC plot.

For systems working in verification mode, the evolution of FAR and FRR versus the threshold setting is an interesting plot. Using a high threshold, no impostor can fool the system, but a lot of genuine users will be rejected. Contrarily, using a low threshold, there would not be inconveniences for the genuine users, but it would be reasonably easy for a hacker to crack the system. According to security requirements, one of both rates will be more important than the other one.

Is Identification Mode More Appropriate Than Verification Mode?

Certain applications lend themselves to verification, such as PC and network security, where, for instance, you replace your password by your fingerprint, but you still use your login. However, in forensic applications it is mandatory to use identification because, for instance, latent prints lifted from crime scenes never provide their claimed identity.

In some cases, such as room access (Faundez-Zanuy, 2004c), (Faundez-Zanuy & Fabregas, 2005), it can be more convenient for the user to operate in identification mode. However, verification systems are faster because they just require a one-to-one comparison (identification requires one-to-N, where N is the number of users in the database). In addition, verification systems also provide higher accuracies. For instance, a hacker has almost N times (Maltoni et al., 2003) more chance to fool an
Figure 2. General scheme of a biometric recognition system: sensor, feature extraction, matching, decision-maker.
identification system than a verification one, because in identification he/she just needs to match one of the N genuine users. For this reason, commercial applications operating in identification mode are restricted to small scale (at most, a few hundred users). Forensic systems (Faundez-Zanuy, 2005a), (Faundez-Zanuy, 2005b) operate in a different mode, because they provide a list of candidates, and a human supervisor checks the automatic result provided by the machine. This is related to the following classification, which is also associated to the application.

BIOMETRIC TECHNOLOGIES

Several biometric traits have been proven useful for biometric recognition. Nevertheless, the general scheme of a biometric recognition system is similar, in all the cases, to that shown in figure 2.

The scheme shown in figure 2 is also interesting for vulnerability studies (Faundez-Zanuy, 2004b) and for improvements by means of data fusion (Faundez-Zanuy, 2004d) analysis. In this paper, we will restrict ourselves to block number one, the other ones being related to signal processing and pattern recognition. Although common sense points out that good acquisition is enough for performing good recognition, at least for humans, this is not true. It must be taken into account that the next blocks, numbered 2 to 4 in figure 2, are indeed fundamental. A good image or audio recording is not enough. Even for human beings, a rare disorder named agnosia exists. Those individuals suffering agnosia are unable to recognize and identify objects or persons despite having knowledge of the characteristics of the objects or persons. People with agnosia may have difficulty recognizing the geometric features of an object or face, or may be able to perceive the geometric features but not know what the object is used for or whether a face is familiar or not. Agnosia can be limited to one sensory modality such as vision or hearing. A particular case is named face blindness or prosopagnosia (http://www.faceblind.org/research). Prosopagnosics often have difficulty recognizing family members, close friends, and even themselves. Agnosia can result from strokes, dementia, or other neurological disorders. More information about agnosia can be obtained from the National Organization for Rare Disorders (NORD, http://www.rarediseases.org).

Table 2 summarizes some possibilities for different biometric trait acquisition. Obviously, properly speaking, some sensors require a digitizer connected at their output, which is beyond the scope of this paper. We will consider that block number one produces a digital signal which can be processed by a Digital Signal Processor (DSP) or Personal Computer (PC).

Figure 3 shows some biometric traits and their corresponding biometric scanners.

Security and Privacy

A nice property of biometric security systems is that the security level is almost equal for all users in a system. This is not true for other security technologies. For instance, in an access control based on passwords, a hacker just needs to break only one password among those of all employees to gain access. In this case, a weak password compromises the overall security of every system that user has access to. Thus, the entire system's security is only as good as the weakest password (Prabhakar, Pankanti & Jain, 2003). This is especially important because good passwords are nonsense combinations of characters and letters, which
Figure 3. Some biometric traits and their corresponding scanners: face acquired with a digital camera; 2D hand geometry acquired with a document scanner and a 3D hand-geometry scanner; fingerprint acquired with an optical sensor; iris acquired with an iris desktop camera; speech acquired with a microphone; signature acquired with a graphics tablet.
are difficult to remember (for instance, Jh2pz6R+). Unfortunately, some users still use passwords such as "password", "Homer Simpson" or their own name.

FUTURE TRENDS

Although biometrics offers a good set of advantages, it has not been massively adopted yet (Faundez-Zanuy, 2005d). One of its main drawbacks is that biometric data is not secret and cannot be replaced after being compromised by a third party. For those applications with a human supervisor (such as border entrance control), this can be a minor problem, because the operator can check whether the presented biometric trait is original or fake. However, for remote applications such as the internet, some kind of liveness detection and anti-replay attack mechanisms should be provided. This is an emerging research topic. As a general rule, concerning security matters, a constant update is necessary in order to remain protected. A suitable system for the present time can become obsolete if it is not periodically improved. For this reason, nobody can claim that he/she has a perfect security system, and even less that it will last forever.

Another interesting topic is privacy, which is beyond the scope of this article. It has been recently discussed in (Faundez-Zanuy, 2005a).

REFERENCES

Clarke, R. (1994). Human identification in information systems: management challenges and public information issues. December. Available at http://www.anu.edu.au/people/Roger.Clarke/DV/HumanID.html

Faundez-Zanuy, M. (2004a). Are inkless fingerprint sensors suitable for mobile use? IEEE Aerospace and Electronic Systems Magazine, pp. 17-21, April.

Faundez-Zanuy, M. (2004b). On the vulnerability of biometric security systems. IEEE Aerospace and Electronic Systems Magazine, 19(6), 3-8, June.

Faundez-Zanuy, M. (2004c). Door-opening system using a low-cost fingerprint scanner and a PC. IEEE Aerospace and Electronic Systems Magazine, 19(8), 23-26, August.

Faundez-Zanuy, M. (2004d). Data fusion in biometrics. IEEE Aerospace and Electronic Systems Magazine, 20(1), 34-38, January.

Faundez-Zanuy, M. (2005a). Privacy issues on biometric systems. IEEE Aerospace and Electronic Systems Magazine, 20(2), 13-15, February.

Faundez-Zanuy, M. (2005b). Technological evaluation of two AFIS systems. IEEE Aerospace and Electronic Systems Magazine, 20(4), 13-17, April.

Faundez-Zanuy, M. (2005c). State-of-the-art in signature recognition. IEEE Aerospace and Electronic Systems Magazine, July.

Faundez-Zanuy, M. (2005d). Biometric recognition: why not massively adopted yet? IEEE Aerospace and Electronic Systems Magazine, 20(8), 25-28, August.

Faundez-Zanuy, M. & Fabregas, J. (2005). Testing report of a fingerprint-based door-opening system. IEEE Aerospace and Electronic Systems Magazine, June.

Faundez-Zanuy, M. & Monte, E. (2005). State-of-the-art in speaker recognition. IEEE Aerospace and Electronic Systems Magazine, 20(5), 7-12, May.

Faundez-Zanuy, M. & Navarro-Mérida, G. M. (2005). Biometric identification by means of hand geometry and a neural net classifier. Lecture Notes in Computer Science, LNCS 3512, pp. 1172-1179, IWANN'05, June.

Furui, S. (1989). Digital Speech Processing, Synthesis, and Recognition. Marcel Dekker.

Li, Stan Z. & Jain, A. K. (2003). Handbook of Face Recognition. Springer.

Maltoni, D., Maio, D., Jain, A. K. & Prabhakar, S. (2003). Handbook of Fingerprint Recognition. Springer Professional Computing.

Mansfield, A. J. & Wayman, J. L. (2002). Best Practices in Testing and Reporting Performance of Biometric Devices. Version 2.01. National Physical Laboratory Report CMSC 14/02, August.

Martin, A., Doddington, G., Kamm, T., Ordowski, M. & Przybocki, M. (1997). The DET curve in assessment of detection performance. Vol. 4, pp. 1895-1898, European Speech Processing Conference, Eurospeech 1997.
Biometric Security Technology
KEY TERMS
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Blind Source Separation by ICA
designed for convolutive mixtures must solve the problem of the linear mixtures in theory. In this case, we have decided to describe some linear methods separately, because there are some important methods specialized in this case.

There are different methods such as Infomax, which is based on the maximization of the information (Bell & Sejnowski, 1995), the one based on minimization of the mutual information (Hyvärinen, Karhunen & Oja, 2001), or methods which use tensors. The methods that are going to be described in this paper are FastICA and JADE; they have been selected because they are two of the most famous methods for the linear case of BSS, and because they are a good first step into BSS.

The first one, and maybe the best known, is FastICA (Hyvärinen, Karhunen & Oja, 2001). The FastICA algorithm is a computationally highly efficient method for performing the estimation of ICA. It uses a fixed-point iteration scheme that has been found in independent experiments to be 10-100 times faster than conventional gradient descent methods for ICA. Another advantage of the FastICA algorithm is that it can be used to perform projection pursuit as well, thus providing a general-purpose data analysis method that can be used both in an exploratory fashion and for estimation of independent components (or sources) (Hyvärinen, Karhunen & Oja, 2001; Parra, 2002). This algorithm is available in a toolbox that is very easy to use (Helsinki University of Technology, 2006).

Another method that is very common in blind source separation of linear mixtures is JADE (Cardoso, 1993; Hyvärinen, Karhunen & Oja, 2001). JADE (joint approximate diagonalization of eigenmatrices) refers to one principle of solving the problem of equal eigenvalues of the cumulant tensor. In this algorithm, the tensor EVD is considered more as a preprocessing step. A method closely related to JADE is given by the eigenvalue decomposition of the weighted correlation matrix. For historical reasons, the basic method is simply called fourth-order blind identification (FOBI) (Hyvärinen, Karhunen & Oja, 2001).

Blind Source Separation for Convolutive Mixtures

Once the linear problem has been described, it is time to explain how to solve the problem with convolutive mixtures. This case is more complex than the linear one, as was presented in the Background. When the mixtures are convolutive, the problem is also called blind deconvolution.

To solve this problem, several methods have been designed. They can be divided into three groups depending on the domain: time domain, frequency domain, or both.

When the algorithm works in the frequency domain, the convolutive mixtures can be simplified into simultaneous ones by means of the frequency transform. This makes the convergence of the separation filter easier (Choi, Cichocki, Park & Lee, 2005), so these algorithms can increase the speed of convergence and reduce the computational load. But this has a cost: to maintain the computational efficiency, these algorithms need to increase the length of the frame when the window frame increases. The data is reduced, which can provoke insufficiency of the learning data, so the efficiency of the algorithm is degraded.

Some examples of these algorithms are: blind source separation in the wavelet domain, the recursive method (Ding, Hikichi, Niitsuma, Hamatsu & Sugai, 2003), time delay decorrelation (Lee, Ziehe, Orglmeister & Sejnowski, 1998), the ICA algorithm with beamforming (Saruwatari, Kawamura & Shikano, 2001), or the FIR matrix toolbox 5.0 (Lambert, 1996).

If the algorithm works in the time domain, we can work with wide band audio signals, where the assumption of independence is kept. The disadvantages are that they produce a high computational load when they work with huge separation matrices, and the convergence is slow, especially when the signals are voices.

Some algorithms that work in the time domain are SIMO ICA (Saruwatari, Sawai, Lee, Kawamura, Sakata & Shikano, 2003), filter banks (Choi, Cichocki, Park & Lee, 2005), or time-domain fast fixed-point algorithms for convolutive ICA (Thomas, Deville & Hosseini, 2006).

But we can also combine the two previous methods to compensate the advantages and disadvantages of each other, for example in Multistage ICA (Nishikawa, Saruwatari, Shikano, Araki & Makino, 2003); FDICA (frequency domain ICA) and TDICA (time domain ICA) are combined with the objective of attaining the best efficiency possible. This algorithm has a stable behaviour, but the computational load is high.

FUTURE TRENDS

Blind source separation has a wide scope that is in continuous development, and several authors are working on the design or modification of different methods. In this work, different algorithms have been presented to solve the blind source separation problem.

Future trends will be the design of on-line methods that allow the implementation of these algorithms in real-life applications such as voice recognition or hands-free devices. The algorithms can also be improved with the aim of reaching more efficient and accurate separation of the mixtures, linear or convolutive. This will help in different systems as a first step for identifying speakers, for example in conferences, videoconferences or similar settings.

CONCLUSION

This article shows different ways to solve blind source separation. It also illustrates that the problem can take two forms depending on the mixtures: if we have linear mixtures, the system will behave differently than if we have convolutive ones.

As has been described above, the methods can also be divided into two types, the frequency domain and the time domain algorithms. The first type has a faster convergence than the time domain type, but it can work incorrectly if data are insufficient. Time domain methods have a more stable behaviour than the frequency algorithms, but a higher computational load. To trade off the advantages and disadvantages of each type of method, multistage algorithms have been proposed.

REFERENCES

Cardoso, JF. (1999). High-Order Contrasts for Independent Component Analysis. Neural Computation, 11, 157-192.

Choi, S., Cichocki, A., Park, HM., Lee, SY. (2005). Blind Source Separation and Independent Component Analysis: A Review. Neural Information Processing Letters and Reviews, Volume 6, number 1.

Cichocki, A., Amari, S. I. (2002). Adaptive Blind Signal and Image Processing. John Wiley & Sons.

Ding, S., Hikichi, T., Niitsuma, T., Hamatsu, M., Sugai, K. (2003). Recursive method for Blind Source Separation and its applications to real-time separations of acoustic signals. Research group, Advanced Technology Development Department, R&D Division, Clarion Co., Ltd.

Helsinki University of Technology, Laboratory of Computer and Information Science. http://www.cis.hut.fi/projects/ica/fastica/ (last update 2006, last visit 10/04/07).

Hyvärinen, A., Karhunen, J., Oja, E. (2001). Independent Component Analysis. John Wiley & Sons.

Lambert, R.H. (1996). Multichannel blind deconvolution: FIR matrix algebra and separation of multipath mixture. Thesis, University of Southern California, Department of Electrical Engineering.

Lee, TW., Ziehe, A., Orglmeister, R., Sejnowski, T. (1998). Combining Time-Delayed Decorrelation and ICA: towards solving the Cocktail Party Problem. In proceedings of the International Conference on Acoustics, Speech and Signal Processing, Volume 2, 1249-1252.

Murata, N., Ikeda, S., Ziehe, A. (2001). An approach to blind source separation based on temporal structure of speech signals. Neurocomputing, volume 41, 1-24.
Nishikawa, T., Saruwatari, H., Shikano, K., Araki, S., Makino, S. (2003). Multistage ICA for blind source separation of real acoustic convolutive mixture. In proceedings of the International Conference on ICA and BSS, 523-528.

Parra, L. (2002). Tutorial on Blind Source Separation and Independent Component Analysis. Adaptive Image & Signal Processing Group, Sarnoff Corporation.

Saruwatari, H., Sawai, K., Lee, A., Kawamura, T., Sakata, M., Shikano, K. (2003). Speech enhancement and recognition in car environment using Blind Source Separation and subband elimination processing. In proceedings of the International Workshop on Independent Component Analysis and Signal Separation, 367-372.

Saruwatari, H., Kawamura, T., Shikano, K. (2001). Blind Source Separation for speech based on fast convergence algorithm with ICA and beamforming. In proceedings of EUROSPEECH 2001, 2603-2606.

Smith, L. I. (2006). A tutorial on Principal Component Analysis. http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

Thomas, J., Deville, Y., Hosseini, S. (2006). Time-domain fast fixed-point algorithms for convolutive ICA. IEEE Signal Processing Letters, Volume 13, Issue 4, 228-231.

Bell, A.J., Sejnowski, T.J. (1995). An information maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6), 1129-1159.

KEY TERMS

Blind Source Separation: The problem of separating, from mixtures, the source signals that compose them.

Cocktail Party Problem: A particular case of blind source separation where the mixtures are convolutive.

Convolutive Mixtures: Mixtures that are not linear due to echoes and reverberations; they are not totally independent because of the signal propagation through dynamic environments.

HOS (Higher Order Statistics): Higher order statistics is a field of statistical signal processing that uses more information than autocorrelation functions and the spectrum. It uses moments, cumulants and polyspectra. They can be used to get better estimates of parameters in noisy situations, or to detect nonlinearities in the signals.

ICA (Independent Component Analysis): Techniques based on statistical concepts such as higher order statistics.

Linear Mixtures: Mixtures that are linear combinations of the different sources that compose them.

Statistically Independent: Given two events, the occurrence of one event makes it neither more nor less probable that the other occurs.
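As a concrete illustration of the linear, instantaneous-mixture case discussed in this article, the sketch below implements a minimal one-unit FastICA-style fixed-point iteration (centering, whitening, tanh nonlinearity, deflation) in NumPy. It is an illustrative sketch, not the Helsinki toolbox implementation; the sources, mixing matrix and iteration counts are assumptions chosen only for the demonstration.

```python
import numpy as np

# Two independent, non-Gaussian sources (deterministic seed).
rng = np.random.default_rng(0)
n = 2000
s1 = np.sign(np.sin(2 * np.pi * np.arange(n) / 50))   # square wave
s2 = rng.uniform(-1, 1, n)                            # uniform noise
S = np.vstack([s1, s2])

# Linear (instantaneous) mixing, as in the linear BSS problem.
A = np.array([[1.0, 0.6], [0.5, 1.0]])
X = A @ S

# Centering and whitening via eigendecomposition of the covariance.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = E @ np.diag(d ** -0.5) @ E.T @ X

# One-unit fixed-point iterations with a tanh nonlinearity; the second
# component is kept orthogonal to the first by deflation.
W = np.zeros((2, 2))
for i in range(2):
    w = rng.normal(size=2)
    for _ in range(200):
        y = w @ Z
        g, g_prime = np.tanh(y), 1 - np.tanh(y) ** 2
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
        w_new -= W[:i].T @ (W[:i] @ w_new)   # deflation (Gram-Schmidt)
        w = w_new / np.linalg.norm(w_new)
    W[i] = w

Y = W @ Z  # estimated sources (up to order, sign and scale)

# Each estimate should be strongly correlated with one true source.
corr = np.abs(np.corrcoef(np.vstack([Y, S]))[:2, 2:])
print(corr.max(axis=1))
```

The permutation and sign ambiguities inherent to ICA are why the check uses the absolute correlation of each estimate against each true source.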
Chaotic Neural Networks
As discussed ahead, non-linearity is an essential ingredient for complex functionality and for complex dynamics; there is a clear contrast between linear and non-linear dynamic systems with respect to their potential for the production of rich and diverse behaviour.

Role of Non-Linear Dynamics in the Production of Rich Behaviour

In linear dynamical systems, both in continuous time and in discrete time, the autonomous dynamical behaviour is completely characterized through the system's natural modes, either the harmonic oscillatory modes or the exponentially decaying modes (in the theory of linear dynamical systems, these are represented by frequencies and complex frequencies). The possible dynamic outcomes in linear systems are thus limited to the universe of linear combinations of these natural modes. These modes can have their properties of amplitudes and frequencies controlled through parameters of the system, but not their central properties such as the nature of the produced waveforms. Since the number of natural modes of linear systems is closely related to the number of state variables, small networks (of linear dynamic elements) can produce only a limited diversity of dynamical behaviour.

The scenario becomes completely different in non-linear systems. Non-linearity promotes rich dynamic behaviour, obtained by changing the stability and instability of different attractors. These changes give place to bifurcation phenomena (transitions between dynamic modalities with distinct characteristics) and therefore to diversity of dynamic behaviour. In non-linear systems, we can have a large diversity of dynamical behaviours, with the potential production of an infinite number of distinct waveforms (or time series, for discrete time systems). This can happen for systems with a very reduced number of state variables: just three in continuous time, or just one state variable in discrete time, are enough to allow bifurcation among different attractors and potential cascades of infinite bifurcations leading to chaos. In our context, this means obtaining rich attractor behaviour even from very simple neural networks (i.e., networks with a small number of neurons).

In summary, the operation of chaotic neural networks explores the concepts of attractors, repellers, limit cycles, and stability (see the topic Terms and Definitions for details on these concepts) of trajectories in the multidimensional state space of the neural network, and more specifically, the dense production of destabilization of cyclic trajectories with cascading to chaotic behaviour. This scenario allows for the blend of ordered behaviour and chaotic dynamics, and the presence of fractal structure and self-similarity in the rich landscape of dynamic attractors.

MODEL NEURONS WITH RICH DYNAMICS, BIFURCATION AND CHAOS

We can look at the chaotic elements that compose neuro-like architectures from several different perspectives. They can be looked at as emergent units with rich dynamics that are produced by the interaction of classical model neurons, such as the sigmoidal model neurons based on frequency coding (Haykin, 1999), or the integrate-and-fire spiking model neurons (Farhat, 1994). They can also correspond to the modelling of the dynamical behaviour of neural assemblies, approached as a unity (Freeman, 1992). Finally, they can be tools for approximate representation of aspects of complex dynamics in the nervous system, paying attention mainly to the richness of attractors and the blend of ordered and erratic dynamics, and not exactly to the details of the biological dynamics (DelMoral, 2005; Kaneko, 2001). Ahead we describe briefly some of the relevant model neurons in the context of chaotic neural networks.

Aihara's Chaotic Model Neuron. One important work in the context of chaotic neural networks is the model neuron proposed by Kazuyuki Aihara and collaborators (1990). In it, we have self-feedback of the neuron's state variable, for representing the refractory period in real neurons. This makes possible rich bifurcation and cascading to chaos. His work extends previous models in which some elements of dynamics were already present. In particular, we have to mention the work by Caianiello (1961), in which the past inputs have impact on the value of the present state of the neuron, and the work by Nagumo and Sato (1972), which incorporates an exponential decay memory. Aihara's model included memory for the inputs of the model neuron as well as for its internal state. It also included continuous transfer functions, an essential ingredient for rich bifurcation, fractal structure and cascading to chaos. Equation 1 shows a simplified form
of this model: xi is the node state, while the xj regard neighboring nodes, t is discrete time, f a continuous function, kf and kr decay constants, and wij generic coupling strengths:

$$x_i(t+1) \;=\; f\!\left(\sum_{j=1}^{M} w_{ij} \sum_{d=0}^{t} k_f^{d}\, x_j(t-d) \;-\; \sum_{d=0}^{t} k_r^{d}\, x_i(t-d)\right) \qquad (1)$$

article: chaotic neural networks based on Recursive Processing Elements, or RPEs (DelMoral, 2005; DelMoral, 2007). RPEs are parametric recursions that are coupled through modulation of their bifurcation parameters (we say we have parametric coupling), for the formation of meaningful collective patterns. Node dynamics is mathematically defined through a first order parametric recursion:
Figure 1. Bifurcation diagram for the logistic recursion, where Rp(x) = p·x·(1−x)
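The logistic recursion of Figure 1 can be explored numerically. The pure-Python sketch below iterates Rp for three illustrative values of p (our choices, not taken from the article), landing on a fixed point, a period-2 cycle, and a chaotic orbit respectively:

```python
# Minimal numerical exploration of the logistic recursion
# R_p(x) = p * x * (1 - x) from Figure 1.

def orbit(p, x0=0.2, discard=1000, keep=64):
    """Iterate x <- R_p(x), drop transients, return attractor samples."""
    x = x0
    for _ in range(discard):
        x = p * x * (1 - x)
    samples = []
    for _ in range(keep):
        x = p * x * (1 - x)
        samples.append(round(x, 6))
    return samples

fixed = orbit(2.5)     # converges to the fixed point 1 - 1/p = 0.6
period2 = orbit(3.2)   # settles on a period-2 cycle
chaotic = orbit(3.9)   # no low-period repetition: chaotic regime

print(len(set(fixed)), len(set(period2)), len(set(chaotic)))
```

Counting the distinct values on the attractor (one, two, or many) is exactly what each vertical slice of the bifurcation diagram in Figure 1 displays.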
(see bifurcation and diverse dynamics ahead). For some of the tools related to chaotic dynamics, see the related topic in the main text: Non-linear dynamics tools.

Chaotic Model Neurons: Model neurons that incorporate aspects of complex dynamics observed either in the isolated biological neuron or in assemblies of several biological neurons. Some of the models with complex dynamics, mentioned in the main text of this article, are Aihara's model neuron, the Bifurcation Neuron proposed by Nabil Farhat, RPE networks, Kaneko's CMLs, and Walter Freeman's K sets.

Spatio-Temporal Collective Patterns: The observed dynamic configurations of the collective state variable in a multi-neuron arrangement (network). The temporal aspect comes from the fact that in chaotic neural networks the model neurons' states evolve in time. The spatial aspect comes from the fact that the neurons that compose the network can be viewed as sites of a discrete (grid-like) spatial structure.

Stability: The study of repellers and attractors is done through stability analysis, which quantifies how infinitesimal perturbations in a given trajectory performed by the system are either attenuated or amplified with time.
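Equation (1) in the main text can also be simulated directly. The sketch below runs a single self-coupled unit (M = 1) with a steep sigmoid as the continuous transfer function f; all parameter values are illustrative assumptions, not values from Aihara's paper:

```python
import math

# Direct simulation of the simplified Aihara-type update of Equation (1)
# for a single self-coupled unit (M = 1). Parameter values below are
# illustrative assumptions only.

def f(u, eps=0.02):
    """Continuous sigmoidal transfer function with steepness 1/eps."""
    return 1.0 / (1.0 + math.exp(-u / eps))

k_f, k_r, w = 0.2, 0.9, 1.0   # feedback decay, refractory decay, coupling
x = [0.5]                     # initial state

for t in range(1, 200):
    # Decaying memory of past activity: excitation minus refractoriness.
    excitation = sum(k_f ** d * x[t - 1 - d] for d in range(t))
    refractoriness = sum(k_r ** d * x[t - 1 - d] for d in range(t))
    x.append(f(w * excitation - refractoriness))

# The slowly decaying refractory term (k_r close to 1) keeps suppressing
# activity after each burst, so the trajectory stays bounded in (0, 1)
# without settling on a constant value.
print(min(x), max(x))
```

The self-feedback term with decay constant k_r is what plays the role of the refractory period described in the main text.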
Manuel Sánchez-Montañés
Universidad Autónoma de Madrid, Spain
Class Prediction in Test Sets with Shifted Distributions
Figure 1. Changes across time of the statistics of clients in a car insurance company. The histograms of two different variables (a, b) related to the clients' use of their insurance are shown. Dashed: data collected four months later than the data shown in solid.
flow of new and leaving clients (figure 1). If we are interested in constructing a model for prediction, the statistics of the clients when the model is exploited will differ from the statistics in training. Finally, often the concept to be learned is not static but evolves in time (for example, predicting which emails are spam or not), causing $p_{Tr}(x, c) \neq p_{Test}(x, c)$. This problem is known as concept drift, and different algorithms have been proposed to cope with it (Black & Hickey, 1999; Wang, Fan, Yu, & Han, 2003; Widmer & Kubat, 1996).

A NEW ALGORITHM FOR CONSTRUCTING CLASSIFIERS WHEN TRAINING AND TEST SETS HAVE DIFFERENT DISTRIBUTIONS

Here we present a new learning strategy for problems where the statistical distributions of the training and test sets are different. This technique can be used in problems where concept drift, sampling biases, or any other phenomena exist that cause the statistical structure of the training and test sets to be different. On the other hand, our strategy constructs an explicit estimation of the statistical structure of the problem in the test data set. This allows us to construct a classifier that is optimized with respect to the new test statistics, and provides the user with relevant information about which aspects of the problem have changed.

Algorithm

1. Construct a statistical model $\{\tilde{P}(x \mid c), \tilde{P}(c)\}$ for the training set using a standard procedure (for example, using the standard EM algorithm).
2. Re-estimate this statistical model using the non-labelled patterns x of the test set. For this purpose, we have developed a semi-supervised extension of EM.
Figure 2. Re-estimation of the statistical model of the problem in a synthetic example. The two different classes are shown as two different clouds (grey and black). a, b: the statistical structure of the problem in training is different from that in test. c: statistical model constructed from the training set (dashed: model for class grey; solid: model for class black). d: recalculation of the statistical model using the unlabelled test set (note that we have drawn all patterns in grey since the re-estimation algorithm does not have access to the classes of the test patterns). e: re-estimated statistical model superposed with the labelled test set (dashed: model for class grey; solid: model for class black).
of class 1, 148 of class 2), and 294 from the Hungarian Institute of Cardiology (188 of class 1, 106 of class 2). First, we use PCA to reduce the dimensionality of the problem in the training set. Therefore the statistical model of our problem now consists of the average and covariance matrix of each class in the principal axes. Then, a simple classifier that assigns to pattern x the class with the nearest center is considered. The number of principal components is automatically selected on the basis of the performance of the classifier in a random subset of the training dataset which was reserved for this task. Once the statistical model of the training set is constructed, we evaluated the performance of the classifier in the test set, obtaining an error rate of 18.7%.

In a second step we used our algorithm for re-estimating the statistical model using a parameter value of 0.01, and then classified the test patterns using the same procedure as before, that is, assigning to each test pattern the class with the nearest center. The error was reduced in this case to 15.3%. If we repeat the same strategy using a more elaborate classifier such as one based on Linear Discriminant Analysis, we observe a reduction in the error from 17.0% to 13.9% after using our algorithm.

FUTURE TRENDS

In the examples we have presented here we have used a simplified version of our algorithm that assumes a statistical model consisting of one Gaussian for each class, and that only the averages need to be re-estimated with the test set. However, our semi-supervised extension of EM is a general algorithm that is applicable with arbitrary parametric statistical models (e.g. mixtures of an arbitrary number of non-Gaussian models), and allows any parameter of the model to be re-estimated. Future work will include the study of the performance and robustness of different types of statistical models in practical problems. On the other hand, since our approach is based on an extension of the extensively studied standard EM algorithm, we expect that analytical results such as generalization error bounds can be derived, making the algorithm attractive also from a theoretical perspective.

CONCLUSION

We have presented a new learning strategy for classification problems where the statistical structures of the training and test sets are different. In this kind of problem, the performance of traditional machine learning algorithms can be severely degraded. Different techniques have been proposed to cope with different aspects of the problem, such as strategies for addressing the sample selection bias (Heckman, 1979; Salganicoff, 1997; Shimodaira, 2000; Zadrozny, 2004; Sugiyama, Krauledat & Müller, 2007), strategies for addressing concept drift (Black & Hickey, 1999; Wang, Fan, Yu & Han, 2003; Widmer & Kubat, 1996) and transductive learning (Vapnik, 1998; Wu, Bennett, Cristianini & Shawe-Taylor, 1999).

The learning strategy we have presented allows all these different aspects to be addressed, so that it can be used in problems where concept drift, sampling biases, or any other phenomena exist that cause the statistical structure of training and test sets to be different. Moreover, our algorithm constructs an explicit statistical model of the new structure in the test dataset, which is of great value for understanding the dynamics of the problem and exploiting this knowledge in practical applications. To achieve this we have extended the EM algorithm, obtaining a semi-supervised version that allows the statistical model constructed in training to be re-estimated using the attribute distribution of the unlabelled test set patterns.

REFERENCES

Asuncion, A., & Newman, D.J. (2007). UCI Machine Learning Repository. [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

Black, M.M., & Hickey, R.J. (1999). Maintaining the Performance of a Learned Classifier under Concept Drift. Intelligent Data Analysis, (3), 453-474.

Chen, Y., Wang, G., & Dong, S. (2003). Learning with Progressive Transductive Support Vector Machine. Pattern Recognition Letters, (24), 1845-1855.

Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. J. Royal Statist. Soc. Ser. B, (39), 1-38.

Duda, R.O., Hart, P.E., & Stork, D.G. (2001). Pattern Classification (2nd ed.). John Wiley & Sons.

Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.

Heckman, J.J. (1979). Sample Selection Bias as a Specification Error. Econometrica, (47), 153-161.

Salganicoff, M. (1997). Tolerating Concept and Sampling Shift in Lazy Learning Using Prediction Error Context Switching. Artificial Intelligence Review, (11), 133-155.

Shimodaira, H. (2000). Improving Predictive Inference under Covariate Shift by Weighting the Log-Likelihood Function. Journal of Statistical Planning and Inference, (90), 227-244.
Ngai-Fong Law
The Hong Kong Polytechnic University, Hong Kong
Hong Yan
City University of Hong Kong, Hong Kong
University of Sydney, Australia
Cluster Analysis of Gene Expression Data
points that segregate clusters of biological relevance. In addition, once a data point is assigned to a node in the tree, it cannot be reassigned to a different node even though it is later found to be closer to that node.

In partition-based clustering algorithms, such as K-means clustering (Jain & Dubes, 1988), the number of clusters is arbitrarily fixed by the users at the start. Setting the correct number of clusters can be a difficult problem and many heuristics are used. The basic idea of K-means clustering is to partition the data into a predefined number of clusters such that the variability of the data within each cluster is minimized. Clustering is achieved by first generating K random cluster centroids, then alternatingly updating the cluster assignment of each data vector and the cluster centroids. The Euclidean distance is usually employed in K-means clustering to measure the closeness of a data vector to the cluster centroids. However, such a distance metric inevitably imposes an ellipsoidal structure on the resulting clusters. Hence, data that do not conform to this structure are poorly clustered by the K-means algorithm.

Another approach to clustering is the model-based approach. In contrast to model-free partition-based algorithms, model-based clustering uses certain distribution models for clusters and attempts to optimize the fit between the data and the model. Each cluster is represented by a parametric distribution, like a Gaussian, and the entire data set is modeled by a mixture of these distributions. The most widely used clustering method of this kind is the one based on a mixture of Gaussians (McLachlan & Basford, 1988; Yeung et al., 2001).

CLUSTERING OF GENE EXPRESSION DATA

Binary Hierarchical Clustering (BHC): In Szeto et al. (2003), we proposed the BHC algorithm for clustering gene expression data based on the hierarchical binary subdivision framework of Clausi (2002). Consider the dataset with three distinct classes as shown in Fig. 1. The algorithm starts by assuming that the data consists of one class. The first application of binary subdivision generates two clusters, A and BC. As the projections of class A and class BC have a large enough Fisher criterion on the A-BC discriminant line, the algorithm splits the original dataset into two clusters. Then, the binary subdivision is applied onto each of the two clusters.
Figure 1. The binary subdivision framework. (a) Original data treated as one class. (b) Partition into two clusters, A and BC. (c) Cluster A cannot be split further, but cluster BC is split into two clusters, B and C. (d) Both clusters B and C cannot be split any more, and we have three clusters A, B, and C. (Figure adapted from Clausi (2002))
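The alternating assignment/update scheme of K-means described above can be written in a few lines. The following pure-Python sketch (K = 2, with toy 2-D data and fixed initial centroids chosen here purely for reproducibility) is illustrative only:

```python
def kmeans(data, centroids, iters=50):
    """Plain K-means: alternate nearest-centroid assignment and mean update."""
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid (Euclidean).
        clusters = [[] for _ in centroids]
        for x, y in data:
            d = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[d.index(min(d))].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            for c in clusters
        ]
    return centroids, clusters

# Two well-separated blobs.
blob_a = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2)]
blob_b = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2)]
centroids, clusters = kmeans(blob_a + blob_b, centroids=[(1.0, 1.0), (4.0, 4.0)])
print(centroids)
```

Because the only geometry used is the Euclidean distance to a centroid, the partition boundaries are hyperplanes between centroids, which is the ellipsoidal bias that the text notes as a limitation of K-means.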
The BC cluster is separated into B and C clusters because its Fisher criterion is large. However, the Fisher criterion of cluster A is too small to allow further division, so it remains as a single cluster. Such a subdivision process is repeated until all clusters have a Fisher criterion too low for further splitting, and the process stops.

The BHC algorithm proceeds in two steps: (1) binary partition using a mixed FCM-HC algorithm to partition the data into two classes; (2) Fisher discriminant analysis on the two classes, where if the Fisher criterion exceeds a set threshold, we accept the partition; otherwise we do not subdivide the data any further.

A novel aspect of the BHC algorithm is the use of the FCM-HC algorithm to partition the data. The idea is illustrated in Fig. 2a-2c. In Fig. 2a, there are three clusters. The desirable two-class partition would be for the two closer clusters to be classified as one partition and the remaining cluster as the second partition. However, a K-means partitioning of the data would favor the split of the middle cluster into two classes. With this wrong partition, subsequent Fisher discriminant analysis would conclude that the splitting is not valid and the data should be considered as just one class. To overcome this problem, we first over-cluster the data into several clusters with the Fuzzy C-means (FCM) algorithm. Then, the clusters are merged together by the average linkage hierarchical clustering (HC) algorithm until only two classes are left. In Fig. 2a, the data is over-clustered into six clusters by the FCM algorithm. A hierarchical tree is constructed from the six clusters using the average linkage clustering algorithm, as in Fig. 2b. Using the cutoff shown in Fig. 2b, a final partitioning of the data into two classes is shown in Fig. 2c. We can see that A, B are merged into one class, and C, E, F, D are merged into the second class. FCM-HC also allows non-ellipsoidal clusters to be partitioned correctly. Fig. 2d shows a dataset containing two non-ellipsoidal clusters that is over-clustered into six clusters. After merging, we obtain the correct two-class partition as shown in Fig. 2e.

The BHC algorithm makes no assumption about the class distributions, and the number of clusters is determined automatically. The only parameter required is the Fisher threshold. The binary hierarchical framework naturally leads to a tree structure representation where similar clusters are placed adjacent to each other in the tree. Near the root of the tree, only gross structures in the data are shown, whereas near the end branches, fine details in the data can be visualized. Figure 3 shows the BHC clustering results on Spellman's yeast dataset (http://cellcycle-www.stanford.edu). The dataset contains expression profiles for 6178 genes under different experimental conditions, i.e., cdc15, cdc28, alpha factor and elutriation experiments.

Self-Splitting and Merging Competitive Learning (SSMCL) Clustering: In Wu, Liew, & Yan (2004), we proposed a new clustering framework based on the one-prototype-take-one-cluster (OPTOC) learning paradigm (Zhang & Liu, 2002) for clustering gene expression data. The new algorithm is able to identify natural clusters in the data as well as provide a reliable estimate of the number of distinct clusters in the data. In conventional clustering, if the number of
Figure 2. Partition of the data into two classes, where the original data is clustered into six clusters (a) with the resulting hierarchical tree (b), and finally the two-class partition after merging (c). For two non-ellipsoidal clusters, FCM-HC over-clusters them into 6 classes (d), and the resulting two-class partition after merging (e).
Cluster Analysis of Gene Expression Data
Figure 3. BHC clustering of (a) alpha factor experiment dataset, (b) cdc15 experiment dataset, (c) elutriation
experiment dataset, and (d) cdc28 experiment dataset.
prototypes is less than that of the natural clusters in the data, then at least one prototype will win patterns from two or more clusters. In contrast, the OPTOC idea allows one prototype to characterize only one natural cluster in the data, regardless of the actual number of clusters in the data. This is achieved by constructing a dynamic neighborhood such that patterns inside the neighborhood contribute more to its learning than those outside. Eventually, the prototype settles at the center of a natural cluster, while ignoring competition from other clusters, as shown in Fig. 4b.

The SSMCL algorithm starts with a single cluster. If the actual number of clusters in the data is more than one, additional prototypes are generated to search for the remaining clusters. Let Ci denote the center of all the patterns that Pi wins according to the minimum-distance rule. The distortion |Pi − Ci| measures the discrepancy between the prototype Pi found by OPTOC learning and the actual cluster structure in the data. For example, in Fig. 4b, C1 would be located at the center of the three clusters S1, S2 and S3 (since there is only one prototype, it wins all input patterns), while P1 eventually settles at the center of S3. After the prototypes have all settled down, a large |Pi − Ci| indicates the presence of other natural clusters in the data. A new prototype would be generated from the prototype with the largest distortion when this distortion exceeds a certain threshold.

Ideally, with a suitable threshold, the algorithm will find all natural clusters in the data. Unfortunately, the complex structure exhibited by gene expression data makes setting an appropriate threshold difficult. Instead,
Figure 4. Two learning methods: conventional versus OPTOC. (a) One prototype takes the arithmetic center of three clusters (conventional learning). (b) One prototype takes one cluster (OPTOC learning) and ignores the other two clusters.
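The OPTOC learning rule illustrated in Figure 4b can be sketched as follows. This is a heavily simplified sketch, not the exact update of Zhang & Liu (2002): the quadratic discounting weight, the learning rate, and the auxiliary-vector update are illustrative choices.

```python
import numpy as np

def optoc_prototype(X, epochs=30, lr=0.05, seed=0):
    # One prototype P plus an auxiliary vector A defining a dynamic
    # neighborhood of radius |A - P|: patterns inside it pull P strongly,
    # patterns outside are discounted, so P settles on one natural cluster
    # instead of the arithmetic centre of all the data.
    rng = np.random.default_rng(seed)
    P = X[rng.integers(len(X))].astype(float)
    A = X[rng.integers(len(X))].astype(float)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            d_xP = np.linalg.norm(x - P)
            d_AP = np.linalg.norm(A - P)
            w = (d_AP / (d_xP + d_AP + 1e-12)) ** 2  # ~1 inside, small outside
            P += lr * w * (x - P)
            if d_xP <= d_AP:        # winning pattern: shrink the neighborhood
                A += lr * (x - A)
    return P
```

On data with two well-separated clusters, the prototype ends near one cluster centre rather than at the overall mean, which is the conventional outcome sketched in Figure 4a.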
Figure 5. (a) The 22 distinct clusters found by SSMCL for the yeast cell cycle data. (b) Five patterns that correspond to the five cell cycle phases. The genes presented here are only those that belong to these clusters and are biologically characterized to a specific cell cycle phase.
we proposed an over-clustering and merging strategy. The over-clustering step minimizes the chance of missing any natural clusters in the data, while the merging step ensures that the final clusters are all visually distinct from each other. In over-clustering, a natural cluster might be split into more than one cluster. However, no one cluster may contain data from several natural clusters, since the OPTOC paradigm discourages a cluster from winning data from more than one natural cluster. The clusters that are visually similar would be merged during the merging step. Together with the OPTOC framework, the over-clustering and merging framework allows a systematic estimation of the correct number of natural clusters in the data.

Fig. 5a shows the SSMCL clustering result for the yeast cell cycle expression data (http://genomics.stanford.edu). We observe that the 22 clusters found have no apparent visual similarity. We checked the 22 clusters against the study of Cho et al. (1998), where 416 genes have been interpreted biologically. Those gene expression profiles include five fundamental patterns that correspond to five cell cycle phases: early G1, late G1, S, G2, and M phase. Fig. 5b shows the five clusters that contain most of the genes belonging to these five different patterns. It is obvious that these five clusters correspond to the five cell cycle phases.

FUTURE TRENDS

Microarray data is usually represented as a matrix with rows and columns corresponding to genes and conditions, respectively. Conventional clustering algorithms can be applied to either rows or columns, but not simultaneously. However, an interesting cellular process is often active only in a subset of conditions, or a single gene may participate in multiple pathways that may not be co-active under all conditions. Biclustering methods allow the clustering of rows and columns simultaneously. Existing biclustering algorithms often iteratively search for the best possible sub-grouping of the data by permuting rows and columns of the data matrix such that an appropriate merit function is improved (Madeira & Oliveira, 2004). When different bicluster patterns co-exist in the data, no single merit function can adequately cater for all possible patterns. Although recent approaches such as geometric biclustering (Gan, Liew & Yan, 2008) have shown great potential, much work is still needed here. It is also important to develop visualization tools for the high-dimensional gene expression clusters. The parallel coordinate plot is a well-known visualization technique for high-dimensional data and we are currently investigating it for bicluster visualization (Cheng et al., 2007).

CONCLUSION

Cluster analysis is an important exploratory tool for gene expression data analysis. This article describes two recently proposed clustering algorithms for gene expression data. In the BHC algorithm, the data are successively partitioned into two classes using the Fisher criterion. The binary partitioning leads to a tree structure representation which facilitates easy visualization. In the SSMCL algorithm, the OPTOC learning framework allows the detection of natural clusters. The subsequent over-clustering and merging step then allows a systematic estimation of the correct number of clusters in the data. Finally, we discuss some possible avenues of future research in this area. The problem of biclustering of gene expression data is a particularly interesting topic that warrants further investigation.

ACKNOWLEDGMENT

This work is supported by the Hong Kong Research Grant Council (Project CityU 122506).

REFERENCES

Armstrong, S.A., Staunton, J.E., Silverman, L.B., Pieters, R., den Boer, M.L., Minden, M.D., Sallan, S.E., Lander, E.S., Golub, T.R., & Korsmeyer, S.J. (2002). MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30, 41-47.

Chen, T., Filkov, V., & Skiena, S.S. (1999). Identifying gene regulatory networks from experimental data. Proceedings of the Third Annual International Conference on Computational Molecular Biology (RECOMB99), Lyon, France, 94-103.

Cheng, K.O., Law, N.F., Siu, W.C., & Lau, T.H. (2007). BiVisu: Software Tool for Bicluster Detection and
… also the features of the objects are clustered. If the data is represented in a data matrix, the rows and columns are clustered simultaneously.

Cluster Analysis: An exploratory data analysis technique that aims at finding groups in data such that objects in the same group are similar to each other while objects in different groups are dissimilar.

DNA Microarray Technology: A technology that allows massively parallel, high-throughput, genome-wide profiling of gene expression in a single hybridization experiment. The method is based on the complementary hybridization of DNA sequences.

Gene Expression: Gene expression is the process by which a gene's DNA sequence is converted into functional proteins. Some genes are turned on (expressed) or turned off (repressed) when there is a change in external conditions or stimuli.

Hierarchical Clustering: A clustering method that finds successive clusters using previously established clusters. Hierarchical algorithms can be agglomerative (bottom-up) or divisive (top-down). Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters. Divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.

K-Means Clustering: K-means clustering is the most well-known partition-based clustering algorithm. The algorithm starts by choosing k initial centroids, usually at random. Then the algorithm alternates between updating the cluster assignment of each data point by associating it with the closest centroid and updating the centroids based on the new clusters, until convergence.

SSMCL Clustering: A partition-based clustering algorithm that is based on the one-prototype-take-one-cluster (OPTOC) learning paradigm. OPTOC learning is achieved by constructing a dynamic neighborhood that favors patterns inside the neighborhood. Eventually, the prototype settles at the center of a natural cluster, while ignoring competitions from other clusters. Together with the over-clustering and merging process, SSMCL is able to find all natural clusters in the data.
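The assign/update alternation described in the K-Means Clustering entry can be sketched in a few lines (a minimal NumPy sketch, not a production implementation):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    # Minimal k-means: alternate cluster assignment and centroid updates
    # until the centroids stop moving.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]   # k initial centroids
    for _ in range(iters):
        # assign each point to its closest centroid
        labels = np.argmin(((X[:, None] - C) ** 2).sum(-1), axis=1)
        # recompute each centroid as the mean of its assigned points
        newC = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                         else C[j] for j in range(k)])
        if np.allclose(newC, C):                  # converged
            break
        C = newC
    return labels, C
```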
Hai-Dong Meng
Inner Mongolia University of Science and Technology, China
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Clustering Algorithm for Arbitrary Data Sets
K-means, a partitioning method, is one of the most commonly used clustering algorithms, but it does not perform well on data with outliers or with clusters of different sizes or non-globular shapes. This clustering method is best suited to capturing clusters with globular shapes, but it is very sensitive to noise and cannot handle clusters of varying density.

Hierarchical Method

Hierarchical clustering techniques are a second important category of clustering methods; the most commonly used methods are agglomerative hierarchical algorithms. Agglomerative hierarchical algorithms are expensive in terms of their computational and storage requirements, and their space and time complexity severely restricts the size of the data sets that can be used. Agglomerative hierarchical algorithms identify globular clusters but cannot find non-globular clusters, and also cannot identify outliers or noise points. CURE relies on an agglomerative hierarchical scheme to perform the actual clustering. The distance between two clusters is defined as the minimum distance between any two of their representative points (after the representative points are shrunk toward their …

Density-based clustering locates regions of high density that are separated from one another by regions of low density. DBSCAN is a typical and effective density-based clustering algorithm which can find different types of clusters and identify outliers and noise, but it cannot handle clusters of varying density. DBSCAN can have difficulty if the density of the clusters and noise varies widely. Consider Figure 1, which illustrates three clusters embedded in noise of unequal densities. The noise around clusters A and B has the same density as cluster C. If the Eps threshold is high enough that DBSCAN finds A and B as separate clusters, and the points surrounding them are marked as noise, then C and the points surrounding it will also be marked as noise.

An Algorithm Based on Density and Density-Reachable

Based on the notions of density and density-reachable (Meng & Zhang, 2006; Meng & Song, 2005), a clustering algorithm can be designed that finds clusters of arbitrary shapes and sizes, handles clusters and noise of varying density, minimizes the reliance on domain knowledge, and identifies outliers efficiently. A prerequisite to this is the provision of some key definitions:
Figure 1. Three clusters embedded in noise of unequal densities.

… of each data point on the density of point xi; … is the density adjustment parameter which …
Exhibit A.

Algorithm: Clustering algorithm based on density and density-reachable (CADD)
Input: Data set; adjustment coefficient of density-reachable distance.
Output: Number of clusters; the members of each cluster; outliers or noise points.
Method:
1. Compute the densities of each data point and construct the original data chain table of clustering objects.
2. i ← 1
3. repeat
4. Seek the maximum density attractor ODensityMax_i in the original data chain table of clustering objects as the first cluster center of Ci.
5. Assign the objects in the data chain which are density-reachable from ODensityMax_i to cluster Ci, and at the same time delete the clustered objects from the original data chain table.
6. i ← i + 1
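The loop in Exhibit A can be sketched as follows. This is an illustrative sketch of the CADD idea only, not the published algorithm: the eps-ball density estimate, the `min_density` cutoff, and the reachability rule are simplified stand-ins for the paper's density and density-reachable definitions.

```python
import numpy as np

def cadd(X, eps=1.0, min_density=3):
    # Repeatedly take the highest-density unclustered point as a density
    # attractor (cluster centre), absorb everything density-reachable from
    # it, and remove those points; whatever remains is outliers/noise (-1).
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    density = (D <= eps).sum(axis=1) - 1        # neighbours within eps
    unassigned = set(range(len(X)))
    labels = np.full(len(X), -1)                # -1 = outlier / noise
    cid = 0
    while unassigned:
        seed = max(unassigned, key=lambda i: density[i])  # density attractor
        if density[seed] < min_density:
            break                               # only low-density points left
        frontier = {seed}
        while frontier:                         # grow by density-reachability
            i = frontier.pop()
            labels[i] = cid
            unassigned.discard(i)
            frontier |= {j for j in unassigned
                         if D[i, j] <= eps and density[j] >= min_density}
        cid += 1
    return labels
```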
Figure 2. Clustering result of winded clusters.

Figure 3. Clustering result of complex clusters.

… storage, and 80GB hard disk. All objects in the data set possessed 117 attributes.

The basic time complexity of DBSCAN is O(n × time to find objects in the Eps-neighborhood), where n is the number of data objects. In the worst case, this complexity is O(n²). However, in low-dimensional spaces, there are data structures, such as kd-trees, that allow efficient retrieval of all objects within a given distance of a specified point, and the time complexity …

As many different clustering algorithms have been developed in a variety of domains for different types of applications, details of experimental clustering results
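The kd-tree speed-up mentioned above can be illustrated with SciPy's `cKDTree` (a small sketch; the data and radius are arbitrary):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.random((10_000, 2))          # low-dimensional data
tree = cKDTree(X)                    # build the kd-tree once
eps = 0.05
# all Eps-neighbourhoods retrieved via the tree instead of an O(n^2)
# all-pairs distance scan
neighbours = tree.query_ball_point(X, r=eps)
```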
are discussed in many academic texts and research papers. None of these algorithms is suitable for every kind of data, cluster, and application, and so there is significant scope for developing new clustering algorithms that are more efficient or better suited to a particular type of data, cluster, or application. This trend is likely to continue, as new sources of data become increasingly available. Wireless Sensor Networks (WSNs) are an example of a new technology that can be deployed in a multitude of domains, and such networks will almost invariably give rise to significant quantities of data that will require sophisticated analysis if meaningful interpretations are to be attained.

CONCLUSION

Clustering forms an important component in many applications, and the need for sophisticated, robust clustering algorithms is likely to increase over time. One example of such an algorithm is CADD. Based on the concepts of density and density-reachable, it overcomes some of the intrinsic limitations of traditional clustering mechanisms, and its improved computational efficiency and scalability have been verified experimentally.

ACKNOWLEDGMENT

This material is based upon work supported by The National Funded Project of China (No. 06XTQ011) and the China Scholarship Council.

REFERENCES

Adami, G., Avesani, P., & Sona, D. (2005). Clustering documents into a web directory for bootstrapping a supervised classification. Data & Knowledge Engineering, 54, 301-325.

Ayad, H., & Kamel, M. (2003). Finding Natural Clusters Using Multi-clusterer Combiner Based on Shared Nearest Neighbors. Springer Berlin, 2709, 159-175.

Bader, G.D., & Hogue, C.W. (2003). An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 4. http://www.biomedcentral.com/1471-2105/4/2

Ben-Dor, A., Shamir, R., & Yakhini, Z. (1999). Clustering gene expression patterns. Journal of Computational Biology, 6, 281-297.

Ertz, L., Steinbach, M., & Kumar, V. (2003). Finding Clusters of Different Size, Shapes and Densities in Noisy High Dimensional Data. In Proc. of the 3rd SIAM International Conference on Data Mining.

Ertz, L., Steinbach, M., & Kumar, V. (2002). A New Shared Nearest Neighbor Clustering Algorithm and its Applications. In Proceedings of the Workshop on Clustering High Dimensional Data and its Applications, Second SIAM International Conference on Data Mining, Arlington, VA, USA.

Karypis, G., Han, E-H., & Kumar, V. (1999). CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. IEEE Computer, 32(8), 68-75.

Meng, H-D., & Zhang, Y-Y. (2006). Improved Clustering Algorithm Based on Density and Direction. Computer Engineering and Applications, 42(20), 154-156.

Meng, H-D., & Song, Y-C. (2005). The implementation and application of data mining system based on campus network. Journal on Communications, 26(1A), 185-187.

Qiu, B.Z., Zhang, X-Z., & Shen, J-Y. (2005). Grid-based clustering algorithm for multi-density. In Proceedings of 2005 International Conference on Machine Learning and Cybernetics, 3, 1509-1512.

Sander, J., Ester, M., Kriegel, H-P., & Xu, X.W. (1998). Density-Based Clustering in Spatial Data Sets: The Algorithm GDBSCAN and Its Applications. Data Mining and Knowledge Discovery, 2, 169-194.

Song, Y-C., & Meng, H-D. (2005, July). The design of expert system of market basket analysis based on data mining. Market Modernization, 7, 184-185.

Song, Y-C., & Meng, H-D. (2005, June). The construction of knowledge base based on data mining for market basket analysis. Market Modernization, 6, 152-153.

Steinbach, M., Tan, P-N., Kumar, V., Klooster, S., & Potter, C. (2003). Discovery of climate indices using clustering. Proceedings of the ninth ACM SIGKDD
international conference on Knowledge discovery and data mining, Washington, D.C., 446-455.

Tan, P-N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Post & Telecom Press of P.R. China (Chinese Edition), 359-371.

Zhao, Y-C., Song, M., Xie, F., & Song, J-D. (2003). Clustering Datasets Containing Clusters of Various Densities. Journal of Beijing University of Posts and Telecommunications, 26, 42-47.

KEY TERMS

Cluster Analysis: Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships.

CADD: A clustering algorithm based on the concepts of density and density-reachable.

Centre-Based Clusters: Each object in a centre-based cluster is closer to the centre of the cluster than to the centres of any other clusters.

Contiguity-Based Clusters: Each object in a contiguity-based cluster is closer to some other object in the cluster than to any point in a different cluster.

Density-Based Clusters: Each object in a density-based cluster is closer to some other object within its Eps neighbourhood than to any object not in the cluster, resulting in dense regions of objects being surrounded by regions of lower density.

Eps: Maximum radius of the neighbourhood.

Well-Separated Cluster: A cluster is a set of objects in which each object is significantly closer (or more similar) to every other object in the cluster than to any object not in the cluster.
Khan M. Iftekharuddin
University of Memphis, USA
E. Olusegun George
University of Memphis, USA
David J. Russomanno
University of Memphis, USA
CNS Tumor Prediction Using Gene Expression Data Part I
false positive statement) through the use of P-value adjustments. Pollard et al. (2003) and Van der Laan et al. (2004a, 2004b, 2005c) proposed methods to control family-wise error rates based on the bootstrap resampling technique of Westfall & Young (1993). Benjamini & Hochberg (1995), Efron et al. (2001) and Storey et al. (2002, 2003a, 2003b, 2004) introduced various techniques for controlling the false discovery rate (FDR), which is defined as the expected rate of falsely rejecting the null hypotheses of no differential gene expression. These adjustment techniques have gained prominence in statistical research relating to microarray data analysis. Here, we use FDR control because it is less conservative than family-wise error rates for adjusting the observed P-values for false discovery. In addition, we propose a novel marker gene visualization technique to explore appropriate cutoff selection in the marker gene selection process.

Before performing formal analysis, one should identify the actual gene expression levels associated with different tissue groups and discard or minimize other sources of variation. Such an approach has been proposed by Townsend & Hartl (2002), who use a Bayesian model with both multiplicative and additive small error terms to detect small but significant differences in gene expression. As an alternative, an ANOVA model appears to be a natural choice for estimating true gene expression (Kerr et al., 2000, Pavlidis et al., 2001, Wolfinger et al., 2001). In the context of cDNA microarray data, the ANOVA model was first proposed by Kerr et al. (2000).

TUMOR-SPECIFIC GENE EXPRESSION ESTIMATION AND VISUALIZATION TECHNIQUES

Data Set

To illustrate our procedure, we use a microarray data set by Pomeroy et al. (2002) of patients with different types of embryonal tumors. The patients include 60 children with medulloblastomas, 10 young adults with malignant gliomas, 5 children with AT/RTs, 5 with renal/extra-renal rhabdoid tumors, and 8 children with supratentorial PNETs. First, we preprocess the data to remove extraneous background noise and array effects. To facilitate our analysis, we divide the dataset into groups as shown in Fig. 1. We rescale the raw expression data obtained from Affymetrix's GeneChip to account for different chip intensities.

Microarray data typically suffer from unwanted sources of variation, such as large- and small-scale intensity fluctuations within spots, non-additive background, fabrication artifacts, probe coupling and processing procedures, target and array preparation in the hybridization process, background and over-shining effects, and scanner settings (McLachlan & Ambroise, 2005). To model these variations, a number of methods have been reported in the literature (Kerr et al., 2000, Lee et al., 2000, Pavlidis, 2001, Wolfinger et al., 2001, Ranz et al., 2003, Townsend, 2004, Tadesse et al., 2005). An ANOVA model similar to the one used by Kerr et al. (2000) is adopted in our current work and facilitates obtaining the tumor-specific gene expression measures from the preprocessed microarray data. Our two-way ANOVA model is given as:

y_jgk = μ + α_g + β_j + γ_jg + ε_jgk
where y_jgk denotes the log of the gene expression measure for the kth replicate of the gth gene in the jth tumor group (k = 1, …, K_j; g = 1, …, G; j = 1, …, J); μ, α_g, β_j and γ_jg refer to the overall effect, the gth gene main effect, the jth tumor-group main effect, and the gth gene–jth tumor group interaction effect, respectively; and ε_jgk is a random error with zero mean and constant variance. We assume that these terms are independent; however, we do not make any assumption about their distribution.

This model assumes that background noise and array effects have been eliminated at previous preprocessing steps. This assumption fits well with our preprocessed data. A reason for selecting the model is that it fits well with our goal to build suitable tumor prototypes for prediction. We believe that tumor prototypes should be built based only on the tumor-specific gene expression measures. In this model, the interaction term γ_jg constitutes the actual gene expression of gene g attributed to the tumor type j (McLachlan & Ambroise, 2005). Hence, the value of the contribution to the tumor-specific gene expression value by the kth replication of the measurement on gene g in tissue j may be written as given in (McLachlan & Ambroise, 2005), as:

We use the Wilcoxon rank-sum test (Wilcoxon, 1945) for the two-category problem and the Kruskal-Wallis test (NIST, 2007) for the five-category problem in our dataset to identify significantly differentially expressed genes among the tissue sample types. We adjust for multiplicity of the tests involved by controlling the False Discovery Rate (FDR) using q-values as proposed by Storey et al. (2004).

In the selection of differentially expressed genes, a tight cutoff may miss some of the important marker genes, while a generous threshold increases the number of false positives. To address this issue, we use the parallel coordinate plot (Inselberg et al., 1990) of the group-wise average gene expressions. In this plot, the parallel and equally spaced axes represent individual genes. Separate polylines are drawn for each group of tumor samples. The more the average gene expression levels differ between groups, the more space appears among the polylines in the plot. To effectively visualize the differentially expressed genes, we first obtain the average of the tumor-specific gene expression values γ̄_jg within any specific tissue sample type, as specified in Equation (3). We then standardize the average gene expression values γ̄_jg obtained in Equation (3) as follows:
A Self-Organizing Map (SOM) (Kohonen, 1987) analysis of variance approach is exploited for this partitioning. Then, within each of the subgroups, genes are ordered according to γ̂_1g − γ̂_2g. This further partitioning and ordering method confers suitable shapes to the tumor-type representing lines such that the user can quickly visualize the tumor-type discriminating power of the selected genes. Before generating the final plot, we normalize the standardized average expression values γ̂_jg as follows:

γ̃_jg = (γ̂_jg − min_{j,g} γ̂_jg) / (max_{j,g} γ̂_jg − min_{j,g} γ̂_jg)    (7)

Finally, the normalized expression values γ̃_jg are plotted using parallel coordinates, where each parallel axis corresponds to a specific gene and each polyline corresponds to a specific tumor type. Each gene subgroup is plotted in a separate plot. Algorithm 1 specifies our formalized approach to gene visualization. The purpose of such plots is to qualitatively measure the performance of the gene selection process and to find the appropriate cutoff that reduces the number of false positives while keeping the number of true positives high.

The following results illustrate the usefulness of our visualization method provided in Algorithm 1. The expression patterns of the marker genes associated with medulloblastoma survivor and failure groups are shown in Figs. 2 and 3. The solid and dotted lines represent the failure (death) and survivor (alive) groups, respectively. Genes are selected using the Wilcoxon method, which was previously described, wherein depending on the q-values, different numbers of genes are selected. In both Figs. 2 and 3, each individual graph represents a group of similarly expressed genes clustered together as specified in step 5 of Algorithm 1. Figs. 2 and 3 show 280 and 54 selected marker genes, respectively. We observe that within the 280 selected genes in Fig. 2, many show similar expression patterns in both failure and survivor sample groups, indicating that the two sample groups are close to one another on the parallel axes. Since each axis represents a different gene, we conclude that many of the genes in Fig. 2 are falsely identified as marker genes (false positives). In comparison, the solid and dotted lines are far apart on most of the parallel axes in Fig. 3. This indicates that the average gene expression values of the selected 54 genes are quite different between the two groups of tissue samples. Thus, this visualization aids in selecting the correct threshold in the marker gene selection process.
Algorithm 1 (DATA): to visualize the expression pattern of the selected marker genes.

DATA contains the expression levels of the marker genes for all the patient samples, where each row represents different gene expression values and each column represents different patient samples.

1. Sort the genes in descending order according to their q-values and select the top G genes from the sorted list.
2. Estimate the average of the tumor-specific gene expression values γ̄_jg within any specific tissue sample type using Eq. (3).
3. Obtain the standardized average gene expression values γ̂_jg using Eq. (4).
4. Partition the genes into two groups: (i) C1, where γ̂_1g ≥ γ̂_2g, and (ii) C2, where γ̂_1g < γ̂_2g.
5. Within each group Cc (c = 1, 2), again partition the gene expression values γ̂_jg into P clusters {C_c1, …, C_cP} (c = 1, 2) exploiting SOM, where each cluster consists of a group of similarly expressed genes.
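The standardize, normalize (Eq. (7)) and order steps of Algorithm 1 can be sketched numerically. This is a sketch only: since Eqs. (3)-(4) are not reproduced in the text, the per-gene z-score used below is an assumed stand-in for the standardization of Eq. (4).

```python
import numpy as np

def normalize(z):
    # Eq. (7): min-max rescaling of the standardized averages over all (j, g)
    return (z - z.min()) / (z.max() - z.min())

def marker_gene_polylines(gamma_bar):
    # gamma_bar: rows = tumor groups (j), columns = genes (g), holding the
    # average tumor-specific expression values of Eq. (3).
    # Per-gene z-score (assumed form of Eq. 4), then Eq. (7) rescaling.
    z = (gamma_bar - gamma_bar.mean(axis=0)) / (gamma_bar.std(axis=0) + 1e-12)
    z = normalize(z)
    order = np.argsort(z[0] - z[1])[::-1]   # order genes by group difference
    return z[:, order]                       # one polyline per tumor group
```

Each row of the result is drawn as a polyline across the parallel gene axes; widely separated polylines flag genuinely discriminating marker genes, while overlapping polylines suggest false positives.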
Figure 2. Expression patterns of the marker genes associated with medulloblastoma survivor and failure groups. Genes are selected using the Wilcoxon method and FDR is controlled using q-values, where depending on the q-values, different numbers of genes are selected (280 genes). (Each panel plots Avg. Gene Exp. against genes.)
Figure 3. Expression patterns of the marker genes associated with medulloblastoma survivor and failure groups. Genes are selected using the Wilcoxon method and FDR is controlled using q-values, where depending on the q-values, different numbers of genes are selected (54 genes). (Each panel plots Avg. Gene Exp. against genes; solid lines: death, dotted lines: alive.)
… of adopting a hierarchical Bayesian approach to our work, following a few more recent relevant works (Ibrahim et al., 2002, Lewin et al., 2006).

CONCLUSION

We attempted to estimate tumor-specific gene expression measures using an ANOVA model. These estimates are then used to identify differentially expressed marker genes. For evaluating the marker gene identification, we proposed a novel approach to visualize the average gene expression values for a specific tissue type. The proposed visualization plot is useful to qualitatively evaluate the performance of marker gene selection methods, as well as to locate the appropriate cutoffs in the selection process. The research in this chapter was supported in part through research grants [RG-01-0125, TG-04-0026] provided by the Whitaker Foundation with Khan M. Iftekharuddin as the principal investigator.

REFERENCES

Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B, 57(1), 289-300.

Boom, J., Wolter, M., Kuick, R., Misek, D.E., Youkilis, A.S., Wechsler, D.S., Sommer, C., Reifenberger, G., & Hanash, S.M. (2003). Characterization of Gene Expression Profiles Associated with Glioma Progression Using Oligonucleotide-Based Microarray Analysis and Real-Time Reverse Transcription-Polymerase Chain Reaction. American Journal of Pathology, 163(3), 1033-1043.

Dettling, M., & Buhlmann, P. (2002). Supervised Clustering of Genes. Genome Biology, 3(12), 1-15.

Efron, B., Tibshirani, R., Storey, J.D., & Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment. Journal of the American Statistical Association, 96, 1151-1160.

Ibrahim, J.G., Chen, M.H., & Gray, R.J. (2002). Bayesian Models for Gene Expression with DNA Microarray Data. Journal of the American Statistical Association, Applications and Case Studies, 97(457), 88-99.

Inselberg, A., & Dimsdale, B. (1990). Parallel Coordinates: A Tool for Visualizing Multidimensional Geometry. In Proceedings of IEEE Conference on Visualization, 361-378.

Islam, A., & Iftekharuddin, K. (2005). Identification of Differentially Expressed Genes for CNS Tumor Groups and Their Utilization in Prognosis. Memphis Bio-Imaging Symposium.

Kerr, M.K., Martin, M., & Churchill, G.A. (2000). Analysis of Variance for Gene Expression Microarray Data. Journal of Computational Biology, 7, 819-837.

Kohonen, T. (1987). Self-Organization and Associative Memory. Springer-Verlag, 2nd Edition.

Lee, M.L.T., Kuo, F.C., Whitmore, G.A., & Sklar, J. (2000). Importance of Replication in Microarray Gene Expression Studies: Statistical Methods and Evidence from Repetitive cDNA Hybridizations. Proceedings of the National Academy of Sciences, 97, 9834-9838.

Lewin, A., Richardson, S., Marshall, C., Glazier, A., & Aitman, T. (2006). Bayesian Modeling of Differential Gene Expression. Biometrics, 62(1), 10-18.

McLachlan, G.J., & Ambroise, K.D. (2005). Analyzing Microarray Gene Expression Data. John Wiley.

NIST/SEMATECH. (2007). e-Handbook of Statistical Methods. http://www.itl.nist.gov/div898/handbook.

Park, P., Pagano, M., & Bonetti, M. (2001). A Nonparametric Scoring Algorithm for Identifying Informative Genes from Microarray Data. Pacific Symposium on Biocomputing, 6, 52-53.

Pavlidis, P., & Noble, W.S. (2001). Analysis of Strain and Regional Variation of Gene Expression in Mouse Brain. Genome Biology, 2, 0042.1-0042.15.

Pollard, K.S., & Van der Laan, M.J. (2005). Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data. Journal of Statistical Planning and Inference, 125, 85-100.

Pomeroy, S.L., Tamayo, P., Gaasenbeek, M., Sturla, L.M., Angelo, M., McLaughlin, M.E., Kim, J.Y., Goumnerova, L.C., Black, P.M., Lau, C., Allen, J.C., Zagzag, D., Olson, J.M., Curran, T., Wetmore, C., Biegel, J.A., Poggio, T., Mukherjee, S., Rifkin, R., Califano, A., Stolovitzky, G., Louis, D.N., Mesirov,
309
CNS Tumor Prediction Using Gene Expression Data Part I
J.P., Lander, E.S., & Golub, T.R. (2002). Prediction of the Proportion of False Positives. Statistical Applica-
Central Nervous System Embryonal Tumor Outcome tions in Genetics and Molecular Biology, 3(1), 15.
Based on Gene Expression. Nature, 415, 436-442.
Van der Laan, M.J., Dudoit, S., & Pollard, K.S. (2004b).
Ranz, J.M., Castillo-Davis, C.I., Meiklejohn, C.D., & Multiple Testing. Part I. Single-Step Procedures for
Hartl, D.L. (2003). Sex-Dependent Gene Expression Control of General Type I Error Rates. Statistical Appli-
and Evolution of the Drosophila Transcription. Science, cations in Genetics and Molecular Biology, 3(1), 13.
300, 1742-1745.
Van der Laan, M.J., Dudoit, S., & Pollard, K.S. (2004c).
Shapiro, S.S. & Wilk. M.B. (1965). An Analysis of Multiple Testing. Part II. Step-Down Procedures for
Variance Test for Normality (Complete Samples). Control of the Family-Wise Error Rate. Statistical
Biometrika, 52(3), 591-611. Applications in Genetics and Molecular Biology,
3(1), 14.
Storey, J.D. (2002). A Direct Approach to False Dis-
covery Rates. Journal of the Royal Statistical Society, Westfall, P.H., & Young, S.S. (1993). Resampling Based
Series B, 64, 479-498. Multiple Testing: Examples and Methods for P-Value
Adjustment. New York: Wiley.
Storey, J.D., & Tibshirani, R. (2003a). Statistical Signifi-
cance for Genome-Wide Experiments. Proceedings of Wilcoxon, F. (1945). Individual Comparisons by Rank-
the National Academy of Sciences, 100, 9440-9445. ing Methods. Biometrics, 1, 80-83.
Storey, J.D. (2003b). The Positive False Discovery Wolfinger, R.D., Gibson, G., Wolfinger, E .D., Ben-
Rate: A Bayesian Interpretation and the Q-Value. An- nett, L., Hamadeh, H., Bushel, P., Afshari, C., & Paules,
nals of Statistics, 31, 2013-2035. R.S. (2001). Assessing Gene Significance from cDNA
Microarray Expression Data via Mixed Models. Journal
Storey, J.D., Taylor, J.E., & Siegmund, D. (2004).
of Computational Biology, 8, 625-637.
Strong Control, Conservative Point Estimation, and
Simultaneous Conservative Consistency of False Yang, Y., Guccione, S., & Bednarski, M.D. (2003).
Discovery Rates: A Unified Approach. Journal of the Comparing Genomic and Histologic Correlations to
Royal Statistical Society, Series B, 66, 187-205. Radiographic Changes in Tumors: A Murine SCC
Vll Model Study. Academic Radiology, 10(10), 1165-
Tadesse, M.G., Ibrahim, J.G., Gentleman, R., & Chi-
1175.
aretti, S. (2005). Bayesian Error-In-Variable Survival
Model for the Analysis of Genechip Arrays. Biometrics,
61, 488-497. KEy TERMS
Townsend, J.P., & Hartl, D.L. (2002). Bayesian Analysis
DNA Microarray: A collection of microscopic
of Gene Expression Levels: Statistical Quantification
DNA spots, commonly representing single genes,
of Relative mRNA Level across Multiple Strains or
arrayed on a solid surface by covalent attachment to
Treatments. Genome Biology, 3, 1-71.
chemically suitable matrices.
Townsend, J.P. (2003). Multifactorial Experimental
False Discovery Rate (FDR): Controls the expected
Design and the Transitivity of Ratios with Spotted
proportion of false positives instead of controlling
DNA Microarrays. BMC Genomics, 4, 41.
the chance of any false positives. An FDR threshold
Townsend J.P. (2004). Resolution of Large and Small is determined from the observed p-value distribution
Differences in Gene Expression Using Models for the from multiple single hypothesis tests.
Bayesian Analysis of Gene Expression Levels and Spot-
Histologic Examination: The examination of tissue
ted DNA Microarrays. BMC Bioinformatics, 5, 54.
specimens under a microscope.
Van der Laan, M.J., Dudoit, S., & Pollard, K.S. (2004a).
Kruskal-Wallis Test: A nonparametric mean test
Augmentation Procedures for Control of the General-
which can be applied if the number of sample groups is
ized Family-Wise Error Rate and Tail Probabilities for
more than two,unlike the Wilcoxon Rank Sum Test.
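The Kruskal-Wallis statistic defined above can be sketched in pure Python. The helper name and the toy expression values below are illustrative assumptions, not part of the chapter:

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for k independent samples:
    H = 12 / (N (N + 1)) * sum_i (R_i^2 / n_i) - 3 (N + 1),
    where R_i is the rank sum of group i and N the pooled size."""
    pooled = sorted((v, gi) for gi, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks over tied values
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (value, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r
    return 12.0 / (n * (n + 1)) * sum(
        rs * rs / len(grp) for rs, grp in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)

# Hypothetical expression values of one gene across three tissue groups.
h = kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(round(h, 3))  # prints 7.2 for these fully separated groups
```

In practice one would compare H against a chi-square reference distribution with (number of groups minus one) degrees of freedom to obtain a p-value.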
CNS Tumor Prediction Using Gene Expression Data Part II

Khan M. Iftekharuddin
University of Memphis, USA

E. Olusegun George
University of Memphis, USA

David J. Russomanno
University of Memphis, USA

Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
1999, Alizadeh et al., 2000) considered the correlation among the genes in classifying and/or predicting tissue samples. Moreover, none of these provided any systematic way of handling the probable subgroups within the known groups. In this chapter, we consider both correlations among the genes and probable subgroups within the known groups by forming appropriate tumor prototypes. Further, a major drawback of these analyses (Dettling et al., 2002, Eisen et al., 1998, Herwig et al., 1999, Tamayo et al., 1999, Dembele et al., 2003, Golub et al., 1999, Alon et al., 1999, Alizadeh et al., 2000) is insufficient normalization. Although most of these methods normalize the dataset to remove the array effects, they do not concentrate on removing other sources of variation present in the microarray data.

Our primary objective in this chapter is to develop an automated prediction scheme for CNS tumors, based on DNA microarray gene expressions of tissue samples. We propose a novel algorithm for deriving prototypes for different CNS tumor types, based on Affymetrix HuGeneFL microarray gene expression data from Pomeroy et al. (2002). In classifying the CNS tumor samples based on gene expression, we consider molecular information, such as the correlations among gene expressions and probable subgroupings within the known histological tumor types. We demonstrate how the model can be utilized in CNS tumor prediction.

CNS TUMOR PROTOTYPE FOR AUTOMATIC TUMOR DETECTION

The workflow to build the tumor prototypes is shown in Fig. 1. In the first step, we obtain the tumor-type-specific gene expression measures. Then, we identify the marker genes that are significantly differentially expressed among tissue types. Next, a visualization technique is used to analyze the appropriateness of the marker gene selection process. We organize the marker genes in groups so that highly correlated genes are grouped together. In this clustering process, genes are grouped based on their tumor-type-specific gene expression measures. Then, we obtain eigengene expression measures from each individual gene group by projection of the gene expressions onto the first few principal components. At the end of this step, we replace the gene expression measurements with eigengene expression values that conserve correlations between strongly correlated genes. We then divide the tissue samples of known tumor types into subgroups. The centroids of these subgroups of tissue samples with eigengene expressions represent the prototype of the corresponding tumor type. Finally, any new tissue sample is predicted as the tumor type of the closest centroid. This proposed novel prediction scheme considers both the correlation among the highly correlated genes and the probable phenotypic subgrouping within the known tumor types. These issues are often ignored in the literature for predicting tumor categories. The details of the steps up to the identification of marker genes are provided in the previous chapter, entitled CNS Tumor Prediction Using Gene Expression Data Part I. In this section, we provide the details of the subsequent steps.

Now, we discuss the creation of the tumor prototypes using the tumor-specific expression values of our significantly differentially expressed marker genes identified in the previous step. Many of the marker genes are likely to be highly correlated. Such correlations of the genes affect successful tumor classification. However, this gene-to-gene correlation may provide important biological information. Hence, the inclusion of the appropriate gene-to-gene correlations in the tumor model may help to obtain a more biologically meaningful tumor prediction. To address this non-trivial need, we first group the highly correlated genes using the complete linkage hierarchical approach, wherein the correlation coefficient is considered as the pair-wise similarity measure of the genes. Next, for each of the clusters, we compute the principal components (PCs) and project the genes of the corresponding cluster onto the first 3 PCs to obtain eigengene expressions (Speed, 2003). Note that the PCs and the eigengene expressions are computed separately for each cluster. Such eigengenes encode the correlation information among
Figure 1. Workflow for building the tumor prototypes: preprocessing, rescaling, and filtering; estimation of tumor-specific gene expression; identification of marker genes; correlated gene clustering; derivation of eigengene expressions from clusters; clustering within the same histologic tumor type; and building the tumor prototypes.
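The per-cluster eigengene computation just described (grouping correlated genes, then projecting onto leading principal components) can be sketched in pure Python. For brevity, this sketch extracts only the first principal component by power iteration rather than the first 3 PCs used in the chapter; the function name and toy profiles are illustrative assumptions:

```python
import math

def first_eigengene(cluster, iters=200):
    """First 'eigengene' of a gene cluster: the leading principal
    component direction in sample space, found by power iteration.

    cluster: list of gene profiles (one list of expression values per
    gene), all measured over the same tissue samples."""
    n_genes, n_samples = len(cluster), len(cluster[0])
    # Center each gene profile across the samples.
    centered = []
    for g in cluster:
        mu = sum(g) / n_samples
        centered.append([v - mu for v in g])
    # Sample-by-sample scatter matrix, treating genes as observations.
    scatter = [[sum(g[i] * g[j] for g in centered) / n_genes
                for j in range(n_samples)] for i in range(n_samples)]
    # Power iteration for the leading eigenvector (the first PC).
    v = [1.0] + [0.0] * (n_samples - 1)   # arbitrary starting vector
    for _ in range(iters):
        w = [sum(scatter[i][j] * v[j] for j in range(n_samples))
             for i in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Two perfectly correlated hypothetical genes over four samples: the
# eigengene aligns (up to sign) with their shared centered profile.
v = first_eigengene([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]])
```

A full implementation would keep the first three eigenvectors per cluster, as the chapter does, and replace each cluster's gene measurements with those eigengene values.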
the highly correlated genes that are clustered together. Recently, molecularly distinct sub-grouping within the same histological tumor type has been reported (Taylor et al., 2005). To find a subgrouping within the same histological tumor type, we again use self-organizing maps (SOMs) (Kohonen, 1987) to cluster the tissue samples within each tumor group. This subgrouping within each group captures the possible genotypic variations within the same histological tumor type. Now, the prototype of any specific histological tumor type is composed of the centroids obtained from the corresponding SOM grid. Algorithm 1 shows our steps for building the tumor prototype.

To predict the tumor category of any new sample, we calculate the distances between the new sample and each of the prototype subgroups obtained using Algorithm 1. The category of the sample is predicted as that of the closest subgroup. The distance between the new sample and the xth subgroup, dx, is calculated based on Euclidean distance as follows:

d_x = sqrt( Σ_{k=1}^{N} ( ḡ_{xk} − g_k )² )    (1)

where ḡ_{xk} is the center value of the kth eigengene in the xth subgroup, g_k is the expression measure of the kth eigengene of the new sample, and N is the total number of eigengenes. This distance measure deliberately ignores the non-representative correlations among the eigengene expressions since they are not natural and hence difficult to interpret.

Table 1 shows the efficacy of our model in classifying five categories of tissues simultaneously. Table 2 shows the performance comparison between our proposed prediction scheme and the method adopted in (Pomeroy et al., 2002). We observe that our prediction scheme outperforms the other prediction method in all three cases. The most noticeable difference is with data group C, where we obtain 100% prediction accuracy compared with 78% accuracy. More detailed results and discussion can be found in (Islam et al., 2006a, 2006b, 2006c).

FUTURE TRENDS

In this work, we estimated the tumor-specific gene expression measure exploiting an ANOVA model with the parameters as fixed effects. As discussed in the future trends section of the chapter entitled CNS Tumor Prediction Using Gene Expression Data Part I, we may consider all of the effects in our ANOVA model as random and specify the random effect parameters in a Bayesian framework. Once the distributions of tumor-specific gene expression measures are obtained with satisfactory confidence, more representative tumor prototypes may be obtained and a more accurate tumor prediction scheme can be formalized. Representing the tumor prototypes with a mixture of Gaussian models may provide a better representation. In that case, finding the number of components in the mixture is another research question left for future work.

CONCLUSION

For automatic tumor prediction, we have proposed a novel algorithm for building CNS tumor prototypes
Algorithm 1. Building the tumor prototype

1. Cluster genes into K partitions, C = {C1, C2, …, CK}, using the complete linkage hierarchical approach and the correlation coefficient as the pair-wise similarity measure.
2. For each cluster, compute the principal components and project the genes of the cluster onto the first 3 PCs to obtain the eigengene expressions.
3. Considering the eigengene expressions as feature vectors, cluster each histological tumor group into subgroups exploiting SOM.
4. The set of centroids of the corresponding SOM grid is designated as the tumor prototype of that particular histological tumor type.
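Equation (1) together with the prediction step built on Algorithm 1 amounts to nearest-centroid classification in eigengene space. A minimal sketch follows; the function name, tumor labels, and toy centroid values are illustrative assumptions:

```python
import math

def predict_tumor_type(sample, prototypes):
    """Assign a new sample the label of the closest prototype centroid.

    sample: eigengene expression vector [g_1, ..., g_N] of the new sample.
    prototypes: list of (label, centroid) pairs, one per SOM subgroup;
    several subgroups may share the same histological label."""
    best_label, best_dist = None, float("inf")
    for label, centroid in prototypes:
        # Equation (1): Euclidean distance in eigengene space.
        d = math.sqrt(sum((c - g) ** 2 for c, g in zip(centroid, sample)))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Toy prototypes for two hypothetical tumor types, two subgroups each.
prototypes = [
    ("medulloblastoma", [0.9, 0.1, 0.2]),
    ("medulloblastoma", [0.8, 0.3, 0.1]),
    ("glioma", [0.1, 0.9, 0.8]),
    ("glioma", [0.2, 0.7, 0.9]),
]
print(predict_tumor_type([0.15, 0.85, 0.8], prototypes))  # prints: glioma
```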
REFERENCES

Eisen, M., Spellman, P., Brown, P., & Botstein, D. (1998). Cluster Analysis and Display of Genome-Wide Expression Patterns. Proceedings of the National Academy of Sciences, 95, 14863-14868.

Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J., Caligiuri, M., Bloomfield, C., & Lander, E. (1999). Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 286, 531-537.

Herwig, R., Poustka, A., Müller, C., Bull, C., Lehrach, H., & O'Brien, J. (1999). Large-Scale Clustering of cDNA-Fingerprinting Data. Genome Research, 9(11), 1093-1105.

Islam, A., & Iftekharuddin, K. (2005). Identification of Differentially Expressed Genes for CNS Tumor Groups and Their Utilization in Prognosis. Memphis Bio-Imaging Symposium.

Islam, A., Iftekharuddin, K.M., & Russomanno, D. (2006a). Visualizing the Differentially Expressed Genes. Proceedings of the Photonic Devices and Algorithms for Computing VIII, SPIE, 6310, 63100O.

Islam, A., Iftekharuddin, K.M., & Russomanno, D. (2006b). Marker Gene Selection Evaluation in Brain Tumor Patients Using Parallel Coordinates. Proceedings of the International Conference on Bioinformatics & Computational Biology, 172-174.

Islam, A., Iftekharuddin, K.M., & George, E.O. (2006c). Gene Expression Based CNS Tumor Prototype for Automatic Tumor Detection. Proceedings of the Fortieth Asilomar Conference on Signals, Systems and Computers, 846-850.

Kohonen, T. (1987). Self-Organization and Associative Memory (2nd ed.). Springer-Verlag.

Pomeroy, S.L., Tamayo, P., Gaasenbeek, M., Sturla, L.M., Angelo, M., McLaughlin, M.E., Kim, J.Y., Goumnerova, L.C., Black, P.M., Lau, C., Allen, J.C., Zagzag, D., Olson, J.M., Curran, T., Wetmore, C., Biegel, J.A., Poggio, T., Mukherjee, S., Rifkin, R., Califano, A., Stolovitzky, G., Louis, D.N., Mesirov, J.P., Lander, E.S., & Golub, T.R. (2002). Prediction of Central Nervous System Embryonal Tumor Outcome Based on Gene Expression. Nature, 415, 436-442.

Speed, T. (2003). Statistical Analysis of Gene Expression Microarray Data. Chapman & Hall/CRC.

Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E., & Golub, T. (1999). Interpreting Patterns of Gene Expression with Self-Organizing Maps: Methods and Application to Hematopoietic Differentiation. Proceedings of the National Academy of Sciences, 96(6), 2907-2912.

Taylor, M.D., Poppleton, H., Fuller, C., Su, X., Liu, Y., Jensen, P., Magdaleno, S., Dalton, J., Calabrese, C., Board, J., MacDonald, T., Rutka, J., Guha, A., Gajjar, A., Curran, T., & Gilbertson, R.J. (2005). Radial Glia Cells are Candidate Stem Cells of Ependymoma. Cancer Cell, 8(4), 323-335.

KEY TERMS

DNA Microarray: Also known as a DNA chip, it is a collection of microscopic DNA spots, commonly representing single genes, arrayed on a solid surface by covalent attachment to chemically suitable matrices.

False Discovery Rate (FDR): FDR controls the expected proportion of false positives instead of controlling the chance of any false positives. An FDR threshold is determined from the observed p-value distribution from multiple single hypothesis tests.

Histologic Examination: The examination of tissue specimens under a microscope.

Kruskal-Wallis Test: This test is a nonparametric mean test which can be applied if the number of sample groups is more than two, unlike the Wilcoxon Rank Sum Test.

Parallel Coordinates: A multidimensional data visualization scheme that exploits the 2D pattern recognition capabilities of humans. In this plot, the axes are equally spaced and are arranged parallel to one another rather than being arranged mutually perpendicular as in the Cartesian scenario.

q-Values: A means to measure the proportion of FDR when any particular test is called significant.

Self-Organizing Maps (SOMs): A method to learn to cluster input vectors according to how they are naturally grouped in the input space. In its simplest form, the map consists of a regular grid of units, and the units learn to represent statistical data described by model vectors. Each map unit contains a vector used to represent the data. During the training process, the
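A minimal one-dimensional SOM of the kind described above can be sketched in pure Python; the unit count, neighborhood weight, learning-rate schedule, and toy data below are illustrative assumptions rather than the settings used in the chapters citing Kohonen (1987):

```python
import random

def train_som_1d(data, n_units=2, epochs=50, lr=0.5):
    """Minimal 1-D self-organizing map over scalar inputs.

    Each unit holds a model vector (here a scalar); the best-matching
    unit and, more weakly, its neighbors move toward each input."""
    random.seed(0)  # fixed seed so the sketch is reproducible
    units = [random.uniform(min(data), max(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)   # decaying learning rate
        for x in data:
            # Best-matching unit: model vector closest to the input.
            bmu = min(range(n_units), key=lambda i: abs(units[i] - x))
            for i in range(n_units):
                # Neighborhood weight drops away from the BMU.
                h = 1.0 if i == bmu else 0.1
                units[i] += rate * h * (x - units[i])
    return units

# Two hypothetical tissue subgroups in one eigengene dimension: the
# trained units settle near the two clusters.
units = sorted(train_som_1d([0.1, 0.2, 0.15, 9.8, 10.1, 9.9]))
```

In the tumor-prototype setting, the trained unit vectors over a grid play the role of the subgroup centroids designated in Algorithm 1.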
Combining Classifiers and Learning Mixture-of-Experts

Shun-ichi Amari
Brain Science Institute, Japan
with a number of typical topics or directions. Thereafter, further studies have been conducted on each of these typical directions. First, theoretical analyses have been made for deep insights and improved performance. For example, convergence analyses of the EM algorithm for mixture based learning are conducted in (Jordan & Xu, 1995; Xu & Jordan, 1996). In Tumer & Ghosh (1996), the additive errors of posteriori probabilities by classifiers or experts are considered, with variances and correlations of these errors investigated for improving the performance of a sum based combination. In Kittler, et al (1998), the effect of these errors on the sensitivity of the sum rule vs the product rule is further investigated, with the conclusion that summation is much preferred. Also, a theoretical framework is suggested for taking several combining rules as special cases (Kittler, 1998), being unaware that this framework is actually the mixture-of-experts model that was proposed firstly for combining multiple function regressions in (Jacobs, et al, 1991) and then for combining multiple classifiers in (Xu & Jordan, 1993). In addition, another theoretical study is made on six classifier fusion strategies in (Kuncheva, 2002). Second, there are further studies on the Dempster-Shafer rule (Al-Ania, 2002) and other combining methods such as rank based, boosting based, as well as local accuracy estimates (Woods, Kegelmeyer, & Bowyer, 1997). Third, there are a large number of applications. Due to space limits, details are referred to Ranawana & Palade (2006) and Sharkey & Sharkey (1999).

A GENERAL ARCHITECTURE, TWO TASKS, AND THREE INGREDIENTS

We consider a general architecture shown in Fig.1. There are k experts {e_j(x)}_{j=1..k}, with each e_j(x) as either a classifier or an estimator. As shown in Tab.2, a classifier outputs one of three types of information, on which we have three levels of combination. The first two can be regarded as special cases of the third one that outputs a vector of measurements. A typical example is [p_j(1|x), …, p_j(m|x)]^T, with each 0 ≤ p_j(ℓ|x) ≤ 1 expressing a posteriori probability that x is classified to the ℓ-th class. Also, p_j(ℓ|x) = p_j(y = ℓ|x) can be further extended to p_j(y|x) that describes a distribution for a regression x → y ∈ R^m. In Figure 1, there is also a gating net that generates signals {α_j(x)}_{j=1..k} to modulate the experts through a combining mechanism M(x).
Based on this architecture, two essential tasks of expert combination could still be quoted from (Xu, Krzyzak & Suen, 1992) with a slight modification on Task 1, as shown in Tab.1: the phrase "for a specific application" should be deleted in consideration of the previously introduced studies (Hansen & Salamon, 1990; Xu, Krzyzak, & Suen, 1991; Wolpert, 1992; Baxt, 1992; Breiman, 1992 & 1994; Drucker, et al, 1994; Tumer & Ghosh, 1996).

Insights can be obtained by considering three basic ingredients of the two streams of studies, as shown in Fig.2. Combinatorial choices of different ingredients lead to different specific models for expert combination, and differences in the roles of each ingredient highlight the different focuses of the two streams. In the stream of neural networks and machine learning, provided with a structure for each e_j(x), a gating structure, and a combining structure M(x), all the remaining unknowns are determined under the guidance of a learning theory in terms of minimizing an error cost. Such a minimization is implemented via an optimizing procedure by a learning algorithm, based on a training set {x_t, y_t}_{t=1..N} that teaches a target y_t for each mapping x_t ∈ R^m. While in the stream of combining classifiers, all {p_j(y|x)}_{j=1..k} are known, without unknowns left to be specified. Also, M is designed according to certain heuristics or principles, with or without the help of a training set, and studies are mainly placed on developing and analyzing different combining mechanisms, for which we will further dis-
where f(r) is a monotonic scalar function, and

a_j > 0,  Σ_{j=1}^{k} a_j = 1

(Hardy, Littlewood, & Polya, 1952). We can further generalize this f-mean to the general architecture shown in Fig.1, resulting in the following f-combination:

which was proposed firstly for combining multiple regressions in (Jacobs, et al, 1991) and then for combining classifiers in (Xu & Jordan, 1993). For different special cases of a_j(x), we are led to a number of existing typical examples. As already pointed out in (Kittler, et al, 1998), the first three rows are four typical classifier combining rules (the 2nd row directly applies to the min-rule too). The next three rows are three types of ME learning models, and a rather systematic summary
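A weighted f-mean combination, M(x) = f⁻¹(Σ_j a_j f(p_j(y|x))), can be sketched as follows; choosing f(r) = r recovers a weighted sum rule, while f(r) = log r gives a weighted geometric (product-like) combination. The function and variable names are illustrative assumptions:

```python
import math

def f_combine(posteriors, weights, f, f_inv):
    """Weighted f-mean combination: M = f_inv(sum_j a_j * f(p_j))."""
    return f_inv(sum(a * f(p) for a, p in zip(weights, posteriors)))

p = [0.2, 0.8]   # two experts' posteriors for one class
a = [0.5, 0.5]   # gating weights, a_j > 0, summing to 1

# f(r) = r: arithmetic mean, i.e., a weighted sum rule.
arith = f_combine(p, a, lambda r: r, lambda r: r)

# f(r) = log r: geometric mean, a product-like combination.
geo = f_combine(p, a, math.log, math.exp)

print(arith, geo)  # arithmetic 0.5, geometric ~0.4
```

Because an f-mean is unchanged by an affine rescaling of f, many familiar rules arise simply from different choices of f and of the gating weights a_j.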
Table 3. Typical examples (including several existing rules and potential topics)

which is equal to the product rule (Xu, Krzyzak and Suen, 1992; Kittler, et al, 1998; Hinton, 2002) if each a priori is equal, i.e., a_j(x) = 1/m. Generally, if a_j(x) ≠ 1/m, there is a difference by a scaling factor a_j(x)^{1/k−1}. The product rule works in a probability theory sense.

On the other hand, it can be observed that the sum

M(x) = Σ_{j=1}^{k} α_j(x) p_j(y|x) = Σ_{j=1}^{k} p(y, j|x),

which is already in the framework of probability theory. That is, both the sum rule and the product rule already coexist in the framework of probability theory.
be regarded as a relaxed logical AND that is beyond the framework of probability theory when a_j(x) ≠ 1/m. However, staying within the framework of probability theory does not mean that it is better, not only because it requires that classifiers are mutually independent, but also because there lacks theoretical analysis on both rules in the sense of classification errors, for which further investigations are needed.

In Tab.2, the 2nd row of the third column is the harmonic mean. It can be observed that the problem of combining the degrees of support is changed into a problem of combining the degrees of disagreement. This is interesting. Unfortunately, efforts of this kind are seldom found yet. Exceptionally, there are also examples that can not be included in the f-combination, such as the Dempster-Shafer rule (Xu, Krzyzak and Suen, 1992; Al-Ania, 2002) and the rank based rule (Ho, Hull, Srihari, 1994).

Thus, the discussions on the examples in Tab.2 are applicable to this f_α(r). Moreover, the first row in Tab.2 holds when α = −∞ and α = +∞ for whatever gating net, which thus includes two typical operators of the fuzzy system as special cases too. Also, the family is systematically modulated by a parameter −∞ ≤ α ≤ +∞, which provides not only a spectrum from the most optimistic integration to the most pessimistic integration as α varies from α = −∞ to α = +∞, but also the possibility of adapting α for a best combining performance.

Furthermore, Amari (2007) also provides a theoretical justification that α-integration is optimal in the sense of minimizing a weighted average of α-divergence. Moreover, it provides a potential road for studies on combining classifiers and learning mixture models from the perspective of information geometry.
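The α-modulated family just discussed can be sketched as a weighted α-mean, following Amari (2007)'s α-representation f_α(p) = p^{(1−α)/2} (up to an inessential scale factor, with f_α(p) = log p at α = 1). The helper name and toy values are illustrative assumptions:

```python
import math

def alpha_mean(values, weights, alpha):
    """Weighted α-mean: f_α^{-1}( Σ_j w_j f_α(p_j) ), where
    f_α(p) = p^((1 - α)/2) for α != 1 and f_α(p) = log(p) for α = 1."""
    if alpha == 1:
        # α = 1: geometric mean (the log representation).
        return math.exp(sum(w * math.log(p)
                            for w, p in zip(weights, values)))
    e = (1.0 - alpha) / 2.0
    return sum(w * p ** e for w, p in zip(weights, values)) ** (1.0 / e)

p, w = [0.2, 0.8], [0.5, 0.5]
# α = -1 gives the arithmetic mean; as α → +∞ the mean approaches the
# minimum (pessimistic), and as α → -∞ it approaches the maximum.
```

Varying the single parameter α thus sweeps through the spectrum of integrations described above, from the most optimistic to the most pessimistic.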
FUTURE TRENDS
on combining classifiers and learning mixture models, but also a general combination framework to unify a number of classifier combination rules and mixture based learning models, as well as a number of directions for further investigations.

ACKNOWLEDGMENT

The work is supported by the Chang Jiang Scholars Program of the Chinese Ministry of Education for a Chang Jiang Chair Professorship at Peking University.

REFERENCES

Al-Ania, A. (2002). A New Technique for Combining Multiple Classifiers using the Dempster-Shafer Theory of Evidence. Journal of Artificial Intelligence Research, 17, 333-361.

Amari, S. (2007). Integration of stochastic models by minimizing α-divergence. Neural Computation, 19(10), 2780-2796.

Baxt, W.G. (1992). Improving the accuracy of an artificial neural network using multiple differently trained networks. Neural Computation, 4, 772-780.

Breiman, L. (1992). Stacked Regression. Department of Statistics, Berkeley, TR-367.

Breiman, L. (1994). Bagging Predictors. Department of Statistics, Berkeley, TR-421.

Drucker, H., Cortes, C., Jackel, L.D., LeCun, Y., & Vapnik, V. (1994). Boosting and other ensemble methods. Neural Computation, 6, 1289-1301.

Hansen, L.K., & Salamon, P. (1990). Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), 993-1001.

Hardy, G.H., Littlewood, J.E., & Polya, G. (1952). Inequalities (2nd ed.). Cambridge University Press.

Hinton, G.E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14, 1771-1800.

Ho, T.K., Hull, J.J., & Srihari, S.N. (1994). Decision combination in multiple classifier systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(1), 66-75.

Jacobs, R.A., Jordan, M.I., Nowlan, S.J., & Hinton, G.E. (1991). Adaptive mixtures of local experts. Neural Computation, 3, 79-87.

Jordan, M.I., & Jacobs, R.A. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6, 181-214.

Jordan, M.I., & Xu, L. (1995). Convergence Results for the EM Approach to Mixtures of Experts Architectures. Neural Networks, 8, 1409-1431.

Kittler, J., Hatef, M., Duin, R.P.W., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226-239.

Kittler, J. (1998). Combining classifiers: A theoretical framework. Pattern Analysis and Applications, 1, 18-27.

Kuncheva, L.I. (2002). A theoretical study on six classifier fusion strategies. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 281-286.

Lee, C., Greiner, R., & Wang, S. (2006). Using query-specific variance estimates to combine Bayesian classifiers. Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh, June, 529-536.

Perrone, M.P., & Cooper, L.N. (1993). When networks disagree: Ensemble methods for neural networks. In R.J. Mammone (Ed.), Neural Networks for Speech and Image Processing. Chapman Hall.

Ranawana, R., & Palade, V. (2006). Multi-Classifier Systems: Review and a roadmap for developers. International Journal of Hybrid Intelligent Systems, 3(1), 35-61.

Shafer, G. (1976). A mathematical theory of evidence. Princeton University Press.

Sharkey, A.J., & Sharkey, N.E. (1997). Combining diverse neural nets. Knowledge Engineering Review, 12(3), 231-247.

Sharkey, A.J., & Sharkey, N.E. (1999). Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems. Springer-Verlag, New York.

Tumer, K., & Ghosh, J. (1996). Error correlation and error reduction in ensemble classifiers. Connection Science, 8(3/4), 385-404.
Wolpert, D.,H., (1992), Stacked generalization, Neural Xu, L. Oja, E., & Kultanen, P., (1990), A New
KEY TERMS

Conditional Distribution p(y|x): Describes the uncertainty with which an input x is mapped into an output y. When y simply takes one of several labels, x is classified into the class label y with probability p(y|x). Alternatively, y can be a real-valued vector, in which case x is mapped into y according to the density p(y|x).

Classifier Combination: Given a number of classifiers, each of which classifies the same input x into a class label (the labels may differ from classifier to classifier), we seek a rule M(x) that combines these classifiers into a new one that performs better than any one of them.

Sum Rule (Bayes Voting): A classifier that assigns x to a label y can be regarded as casting one vote for that label, and the simplest combination is to count the votes received by every candidate label. The j-th classifier assigning x to label y with probability p_j(y|x) means that its single vote is divided among the candidates in fractions. We can sum up

    Σ_j p_j(y|x)

to count the votes on a candidate label y; this is called Bayes voting, since p(y|x) is usually called the Bayes posterior probability.

Product Rule: When k classifiers {e_j(x)}, j = 1, ..., k, are mutually independent, a combination is given by

    p(x ∈ C_y | x) = p(x ∈ C_y) · [ Π_{j=1}^{k} p(x ∈ C_y | e_j(x)) ] / [ Π_{j=1}^{k} p(x ∈ C_y) ]
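As a concrete illustration of the two rules above, here is a small sketch (invented for this entry, with made-up posterior values and uniform priors) that combines three classifiers' label posteriors by the sum rule and, under the mutual-independence assumption, by the product rule:

```python
# Illustrative only: combining k classifiers' posterior estimates p_j(y|x).

def sum_rule(posteriors):
    """posteriors: one dict per classifier, mapping label -> p_j(y|x).
    Each classifier's unit vote is split across labels; votes are summed."""
    votes = {}
    for p in posteriors:
        for label, prob in p.items():
            votes[label] = votes.get(label, 0.0) + prob
    return max(votes, key=votes.get)

def product_rule(posteriors, priors):
    """Product rule for k mutually independent classifiers:
    score(y) = prod_j p_j(y|x) / p(y)^(k-1), up to a normalizing constant."""
    k = len(posteriors)
    scores = {}
    for label, prior in priors.items():
        prod = 1.0
        for p in posteriors:
            prod *= p.get(label, 0.0)
        scores[label] = prod / (prior ** (k - 1)) if prior > 0 else 0.0
    return max(scores, key=scores.get)

# Three classifiers, two candidate labels (made-up numbers).
ps = [{'a': 0.6, 'b': 0.4}, {'a': 0.3, 'b': 0.7}, {'a': 0.8, 'b': 0.2}]
print(sum_rule(ps))                                # 'a' (1.7 votes vs 1.3)
print(product_rule(ps, {'a': 0.5, 'b': 0.5}))      # 'a' (0.576 vs 0.224)
```

With uniform priors the product rule reduces to multiplying the posteriors, so both rules agree here; they can disagree when one classifier is very confident and wrong.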
Combining Classifiers and Learning Mixture-of-Experts
Significant advances in artificial intelligence, including machines that play master-level chess or make medical diagnoses, highlight an intriguing paradox. While systems can compete with highly qualified experts in many fields, there has been much less progress in constructing machines that exhibit simple commonsense, the kind expected of any normally intelligent child. As a result, commonsense has been identified as one of the most difficult and important problems in AI (Doyle, 1984; Waltz, 1982).

BACKGROUND

The Importance of Commonsense

It may be useful to begin by listing a number of reasons why commonsense is so important:

1. Any general natural language processor must possess the commonsense that is assumed in the text.
2. In building computerized systems, many assumptions are made about the way in which they will be used and the user's background knowledge. The more commonsense that can explicitly be built into systems, the less will depend on the implicit concurrence of the designer's commonsense with that of the user.
3. Many expert systems have some commonsense knowledge built into them, much of it reformulated time and again for similar systems. It would be advantageous if commonsense knowledge could be standardized for use in different systems.
4. Commonsense has a large element that is environment and culture specific. A study and formalization of commonsense knowledge may permit people of different cultures to better understand one another's assumptions.

No attempt will be made here to define commonsense rigorously. Intuitively, however, commonsense is generally meant to include the following capabilities, as defined for any given culture:

a. knowing the generally known facts about the world,
b. knowing, and being able to perform, generally performed behaviors, and to predict their outcomes,
c. being able to interpret or identify commonly occurring situations in terms of the generally known facts, i.e. to understand what happens,
d. the ability to relate causes and effects,
e. the ability to recognize inconsistencies in descriptions of common situations and behaviors, and between behaviors and their situational contexts,
f. the ability to solve everyday problems.

In summary, commonsense is the knowledge that any participant in a culture expects any other participant in that culture to possess, as distinct from specialized knowledge that is possessed only by specialists.

The necessary conditions for a formalization to lay claim to representing commonsense are implicit in the above definition; a formalism must exhibit at least one of the attributes listed there. Virtually all work in the field has attempted to satisfy only some subset of the commonsense criteria.

COMMONSENSE REPRESENTATION FORMALISMS

In AI research, work on common sense is generally subsumed under the heading of Knowledge Representation. The objective of this article is to survey the various formalisms that have been suggested for representing commonsense knowledge.
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Commonsense Knowledge Representation I
Four major knowledge representation schemes are discussed in the literature - production rules, semantic nets, frames, and logic. Production systems are frequently adopted in building expert systems. Virtually all the discussions of commonsense representations, however, are in terms of semantic net, frame-like, or logic systems. These schemes are applied within three main paradigms for commonsense representation: propositional, truth maintenance, and dispositional (see Figure 1). Very briefly, propositional models are descriptions of representations of things or concrete facts. When the knowledge represented is imprecise or variable, propositional formalisms are no longer sufficient and one needs to consider the beliefs about the world engendered by the system's current state of knowledge, and to allow for changes in those beliefs as circumstances dictate; this is the nature of belief or truth maintenance systems. Finally, when the knowledge is both imprecise and not factual, but relates rather to feelings, insights and understandings, the dispositional representations are evoked.

Within each representational paradigm, there are a number of specific formalisms. Figure 1 indicates the existence of eight different knowledge representation formalisms. Each of these formalisms is presented via discussion of one or more representatives.

Figure 1. Paradigms and formalisms of commonsense knowledge representation

The need for different types of formalisms, the difficulty in representing multiple domain knowledge, psychological theories of various levels of consciousness, the physiological evidence of different levels of the brain and their association with specific functions, the functional specialization of specific areas of the brain, and similar evidence concerning the two sides of the brain all support the view of the mind, or self, as composed of a considerable number of cooperating subagents, to which Minsky (1981) refers as a society of mind. It is useful to keep this concept in mind while studying the variety of representation schemes; it suggests that a number of such formalisms may coexist in any rational agent and little can be gained by attempts to choose the right formalism in any general sense.

PROPOSITIONAL MODELS

Virtually all the propositional models of commonsense knowledge are perceived as consisting of nodes that are associated with words or tokens representing concepts. The nodes are hierarchically structured, with lower-level nodes elaborating or representing instantiations of higher-level nodes; the higher-level nodes impart their properties to those below them, which are said to inherit those properties. Thus, all the propositional models are hierarchically structured networks consisting of nodes and arcs joining the nodes. From this point, the representational structures begin to diverge according to the distribution of information between the arcs and the nodes. At one extreme, nodes are self-contained
descriptions of concepts with only hierarchical relations between concepts expressed in the arcs; frames and scripts are of this nature. At the other extreme, nodes contain only names and the descriptive content resides in a multiplicity of types of relations, which give meaning to the names. This last form is generally referred to as a semantic net. Between the extremes lie representations in which the descriptive knowledge is distributed between nodes and relations.

The representations described may be discussed as theoretical models or as computer implementations, which generally derive from one of the theoretical models (see Commonsense Knowledge Representation II - Implementation).

Semantic Nets

Semantic nets were developed to represent propositions and their syntactic structure rather than the knowledge contained in the propositions. A semantic net consists of triples of nodes, arcs joining nodes, and labels (Rich, 1983). The words of a proposition are contained in the nodes while its syntactic structure is captured by the labeled arcs.

Frames

Within the community utilizing frames (or schema), it seems to be universally agreed that frames define concepts via the contents of slots, which specify components of the concepts. The content of a slot may be an instantiation, a default or usual value, a condition, or a sub-schema. The last of these slot contents generates the generally accepted hierarchical nature of concept representations.

Scripts

In his seminal paper on frames, Minsky (1975) suggests that frames are not exactly the same as Schank's (1981) scripts. The major difference is in the concepts represented. Scripts describe temporal sequences of events, represented by the sequence of slots in a script. Here the slot structure is significant. For frames describing objects, the exact order of slots is probably not significant. The temporal ordering of slots is necessary in many other representations, including those for most behaviors.

In summary, frames, scripts, and semantic nets are logically equivalent. All three are hierarchies consisting of nodes and relations; they are differentiated by the location of information in the network. Scripts are further differentiated by the temporal and causal ordering of slots within nodes.

Predicate Calculus

The first-order predicate calculus is a method for representing and processing propositional knowledge and was designed expressly for that purpose. There are many workers in the field who view this formalism as the most appropriate for commonsense knowledge representation.

In discussing logic as a vehicle for knowledge representation, one should distinguish between the use of logic as the representational formalism and the use of logistic languages for implementation. Logistic languages can implement any representational formalism, logistic or other. Another distinction that needs to be made is between the use of logic for inference and its use for representation (Hayes, 1979). Thus, one might apply logical inference to any knowledge representation, provided that it is amenable to that kind of manipulation; knowledge expressed in predicate calculus is the most amenable to such manipulation.

A vigorous debate in AI centered on the choice of frames or logic for commonsense knowledge representation. In favor of frames were those who attempted to implement knowledge representation schemes (cf. Hayes, 1979). The argument is twofold: first, logistic formalisms do not have sufficient expressive power to adequately represent commonsense knowledge. This applies particularly to the dynamic adjustment of what is held to be true as new information is acquired. Classical logic does not allow for reinterpretation as new facts are learned - it is monotonic. Secondly, this argument posits that human knowledge representation does not follow the rules of formal logic, so that logic is psychologically inadequate in addition to being expressively inadequate.

The logicians' reply to the first argument is that anything can be expressed in logic, but this may sometimes be difficult; this school does not agree with the claim of greater expressive power for the frame paradigm. Hayes (1979) claimed that most of frames is just a new syntax for parts of first-order logic. Thus, all the representational formalisms seem to have much in common, even if they are not completely equivalent. The reply to the second argument is that the object of study is Artificial Intelligence, so that the formalisms adopted need not resemble human mechanisms.

One point of agreement that seems to have emerged between the two sides is that the classical predicate calculus indeed lacks sufficient expressive power. Evidence of this is a number of attempts to expand the logical formalisms in order to provide sufficient expressiveness, especially by alleviating some of the restrictions of monotonicity.

The essence of monotonicity is that if a theory A is expanded by additional axioms to become B, then all the theorems of A are still theorems of B. Thus, conclusions in monotonic logic are irreversible, and this constraint must be relaxed if a logical inference system is to be able to reevaluate its conclusions on learning new facts. The conclusions of a non-monotonic logic are regarded as beliefs or dispositions based on what is currently known, and amenable to change if necessary.

The attempts to overcome the limitations of monotonicity include non-monotonic reasoning (McCarthy, 1980), non-monotonic logics (McDermott and Doyle, 1980; Reiter, 1980), fuzzy logic (Zadeh, 1983), and a functional approach (Levesque, 1984) represented by the KL-One (Woods, 1983) and KRYPTON languages (Brachman et al., 1983). It should be noted that non-monotonic logic proceeds not by changing the logic representation, but rather by strengthening the reasoning by which representations are evaluated.

1. A proposition is regarded as a collection of (usually implicit) fuzzy constraints.
2. An explanatory database contains lists of samples of the subject of the proposition together with the degree to which each predicate or constraint is fulfilled. These data are used to compute test scores of the degree to which each constraint is met. These test-scores then become the meaning of the fuzzy constraints.

Circumscription (McCarthy, 1980)

Circumscription is a form of non-monotonic reasoning (as distinct from non-monotonic logic) which reduces the context of a sentence to anything that is deducible from the sentence itself, and no more. Without such a mechanism, any statement may invoke all the knowledge the system possesses that is associated with whatever the topic of the statement is, much of which would probably be irrelevant. The question

Crows and canaries are both birds; why is Tweetie afraid of crows?

could give rise to consideration of the facts that ostriches and penguins are also birds, and that ostriches don't fly and put their heads in the sand while penguins swim and eat fish, all of which is irrelevant to the problem in hand. The purpose of circumscription is to limit evaluation of problem statements to the facts of the statement - what McCarthy describes as the ability to jump to conclusions.
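The non-monotonic behaviour described above - a conclusion held by default and then withdrawn when a new fact arrives - can be sketched in a few lines. This is an invented illustration, not from the article:

```python
# Toy default reasoning (illustrative only): the default rule "birds fly"
# yields a belief that is retracted when a contradicting fact is learned,
# exactly what monotonic (classical) logic forbids.

def believed_to_fly(facts):
    """Default: believe 'Tweetie flies' if Tweetie is a bird and nothing
    known contradicts flying (here, being a penguin)."""
    return 'bird(tweetie)' in facts and 'penguin(tweetie)' not in facts

theory_a = {'bird(tweetie)'}
theory_b = theory_a | {'penguin(tweetie)'}   # theory A expanded with a new axiom

print(believed_to_fly(theory_a))   # True: the conclusion is held by default
print(believed_to_fly(theory_b))   # False: the belief is retracted
```

In a monotonic logic every theorem of theory A would remain a theorem of the expanded theory B; here the "theorem" disappears, which is the defining feature of non-monotonic reasoning.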
Non-Monotonic Logic (McDermott & Doyle, 1980)

Non-monotonic logic attempts to formalize the representation of knowledge in such a way that the models maintained on the basis of the premises supplied can change, if necessary, as new premises are added. Thus, rather than determining the truth or falsity of statements as in classical logic, non-monotonic logics hold beliefs based on their current level of knowledge.

DISPOSITIONAL MODELS

Dispositional models attempt to establish the relationship between knowledge representation and memory. To paraphrase Schank (1981), understanding sentences involves adding information, generally commonsense, not explicit in the original sentence. Thus, a memory full of facts about the commonsense world is necessary in order to understand language. Furthermore, a model, or set of beliefs, is necessary to provide expectations or explanations of an actor's behavior. Adding beliefs to a representation changes the idea of inference from simply adding information to permit parsing sentences to integrating data with a given memory model; this leads to the study of facts and beliefs in memory.

K-Lines (Minsky, 1981)

Minsky's major thesis is that the function of memory is to recreate a state of mind. Each memory must embody information that can later serve to reassemble the mechanisms that were active when the memory was formed.

Memory is posited to consist, at the highest level, of a number of loosely interconnected specialists - a society of mind. Each of the specialists, in turn, comprises three lattices with connections between them.

The most basic lattice comprises mental agents or P-nodes. Some of these agents participate in any memorable event - an experience, an idea, or a problem solution - and become associated with that event. Reactivation of some of those agents recreates a partial mental state resembling the original.

Reactivation of P-nodes and consequent recreation of partial mental states is performed by K-Lines attached to nodes. The K-Lines originate in K-nodes embedded in a second lattice, the K-Pyramid. The establishment of a K-Line between a K-node and some P-nodes occurs when a memorable mental event takes place. Information in the K-Pyramid flows downward, but not upward.

The third structure in Minsky's model is the N-Pyramid. Its function is to permit learning to take place, and it does so by constructing new K-nodes for P. It may be useful to think of this structure as analogous to an ivory mandarin ball - spheres within spheres.

Memory Organization Packets: MOPS (Schank, 1981)

While Minsky presents a general theory of memory without reference to the representation of concepts, Schank's model is highly dependent on the inter-relationship of representations and memory. This is, perhaps, the result of the particular domain explored by Schank - temporal sequences of events.

The scripts, plans, goals, and themes of Schank's model are reminiscent of Minsky's P, K, and N pyramids, although including an additional level. An attempt to specify the relationship more precisely suggests the equivalence of scripts with the P-Pyramid, of plans with the K-Pyramid, and of goals with the N-Pyramid. The level of themes does not seem to be included in Minsky's model, although this may be a reflection of the society of mind, with each theme representing a member of the society.

In addition to differentiating levels of description, Schank's model distinguishes four levels of memory by the degree of detail represented. These levels are:

1. Event Memory (EM) - specific remembrances of particular situations. After a while, the less salient aspects of an event fade away, leaving generalized events plus the unusual or interesting parts of the original event.
2. Generalized Event Memory (GEM) - collocations of events whose common features have been abstracted. This is where general information is held about situations that have been experienced numerous times - e.g. dentists' waiting rooms.
3. Situational Memory (SM) - contains information about specific situations - e.g. going to medical practitioners' offices.
4. Intentional Memory (IM) - remembrances of generalizations at a higher level than SM - e.g.
Commonsense Knowledge Representation II
represent general purpose common sense is primarily a function of the generality of commonsense versus the specificity of expert systems.

From a positive point of view, one of the major aims of commonsense systems must be to represent knowledge in such a way that it can be useful in any domain; i.e. when storage strategies cannot be based on prior information about the uses to which the knowledge will be put.

This, then, is the major difference between expert systems and commonsense systems; while the former deal mainly with the particular knowledge and behaviors of a strictly bounded activity, common sense must deal with all areas of knowledge and behavior not specifically claimed by a body of experts. An expert system that knows about internal medicine does not know about skin diseases or toxicology, and certainly not about drilling rigs or coal mining. Common sense systems, on the other hand, should know about colds and headaches and cars and the weather and supermarkets and restaurants and chalk and cheese and sealing wax and cabbages and kings (Carroll, 1872).

COMMONSENSE KNOWLEDGE BASE IMPLEMENTATIONS

Given the importance of commonsense knowledge, and because such knowledge is necessary for a wide range of applications, a number of efforts have been made to construct universally applicable commonsense knowledge bases. Three of the most prominent are Cyc, ConceptNet, and WordNet.

Cyc

2007). OpenCyc, a freely available version of Cyc, may be downloaded from http://www.opencyc.org/.

ConceptNet

ConceptNet (Liu and Singh, 2004) is a commonsense knowledge base and natural-language-processing toolkit that supports many practical textual-reasoning tasks. Rather than assertions being registered manually as in Cyc, in ConceptNet they are generated automatically from 700,000 sentences of the Open Mind Common Sense Project (Singh, 2002), provided by over 14,000 authors. There is a concise version with 200,000 assertions and a full version with 1.6 million assertions. ConceptNet is constructed as a semantic net.

A freely available version of the system may be downloaded at http://web.media.mit.edu/~hugo/conceptnet/#download.

WordNet

WordNet (Fellbaum, 1998) is described as follows (WordNet, 2007): Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet's structure makes it a useful tool for computational linguistics and natural language processing.

WordNet contains about 155,000 words, 118,000 synsets, and 207,000 word-sense pairs. WordNet is available for free download at http://wordnet.princeton.edu/obtain.
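To make the "semantic net of assertions" idea concrete, here is a minimal sketch, invented for illustration (the real ConceptNet toolkit exposes a far richer API), of a store for ConceptNet-style (concept, relation, concept) triples:

```python
# Miniature semantic net (illustrative only): concepts are nodes, and
# labeled relations such as UsedFor or AtLocation are the arcs that
# carry the descriptive content, as in ConceptNet-style assertions.
from collections import defaultdict

class SemanticNet:
    def __init__(self):
        # concept -> list of (relation, concept) arcs leaving that node
        self.edges = defaultdict(list)

    def add(self, subj, relation, obj):
        self.edges[subj].append((relation, obj))

    def related(self, subj, relation):
        """All concepts linked to subj by the given relation label."""
        return [o for r, o in self.edges[subj] if r == relation]

net = SemanticNet()
net.add('umbrella', 'UsedFor', 'staying dry')
net.add('umbrella', 'AtLocation', 'closet')
net.add('rain', 'CausesDesire', 'open umbrella')

print(net.related('umbrella', 'UsedFor'))   # ['staying dry']
```

The relation names (UsedFor, AtLocation, CausesDesire) follow the general flavour of ConceptNet's relation vocabulary but are used here only as labels on arcs.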
Dimitris Rigas
University of Bradford, UK
INTRODUCTION

In all walks of life, individuals are involved in a cumulative and incremental process of knowledge acquisition. This involves the accessing, processing and understanding of information, which can be gained in many different forms: deliberately, by picking up a book, or passively, by listening to someone. The content of knowledge is translated by individuals and often recorded through the skill of note-taking, which differs in method from one person to another. This article presents an investigation into note-taking techniques, including the most popular, the Cornell method. A comparative analysis with the Outlining and Mapping methods is carried out, stating the strengths and weaknesses of each in terms of simplicity, usefulness and effectiveness. The processes of developing such skills are not easy or straightforward, and performance is much influenced by cognition. Such associations regarding cognitive conceptions therefore involve exploring the note-taking processes of encoding and storage, attention and concentration, memory, and other stimulus factors such as multimedia.

The social change within education, from the traditional manner of study to electronic study, is being adopted by institutes. This change varies from computerising a sub-component of learning to simulating an entire lecture environment. It has enabled students to explore academia more conveniently; however, its feasibility is still arguable. The article discusses the underlying pedagogical principles, deriving guidelines for the development of an e-learning environment. Furthermore, the use of Tablet PCs to replace the blackboard, in combination with annotation applications, is investigated. Semantic analysis of the paradigm shift in e-learning and knowledge management replacing classroom interaction presents its potential in the learning domain. The article concludes with ideas for the design and development of an electronic note-taking platform.

BACKGROUND

Over the years, research into note-taking has been carried out intensively. This article aims to comparatively analyse the various note-taking techniques, providing an explanation of their effectiveness and simplicity. The relationship between cognition and note-taking is studied, presenting a breakdown of the processes involved; given the vast amount of research into cognition, its relevance is imperative. Although much research in both areas has been undertaken towards designing an electronic note-taking tool, an analysis of existing applications has also been conducted, with Microsoft OneNote being the most favourable. This is an annotation application that has no predefined technique to record notes or annotations and saves handwriting as an image. Throughout the article, many authors' work contributing to this study is presented.

COMPARATIVE STUDY

This article presents an insight into note-taking: the various methods, cognitive psychology, and the paradigm shift from the traditional manner of study to electronic study.

Note-Taking Techniques

Theoretically, note-taking is perceived as the transfer of information from one mind to another. Today, the most popular note-taking technique is the Cornell note-taking method, also referred to as Do-it-right-in-the-first-place. This note-taking method was developed over 40 years ago by Professor Walter Pauk at Cornell University (Pauk & Owens, 2005). The main purpose of developing this method was to assist students to organise their notes in a meaningful manner. The technique involves a systematic approach for arranging and condensing notes without the need for multiple recopying. The method is simple and
A Comparative Study on E-Note-Taking
effective, specifying only three areas: Area A, keywords; Area B, note-taking; and Area C, summary.

Area A is assigned to keywords or phrases, which are formed by students towards the end of the lecture. Over the years an alternative has been questions, aiding recall over recognition. These cues are known to assist memory and serve as reminders, alongside helping to identify relationships; this is also referred to as the Q-System (Pauk & Owens, 2005). Area B remains for the recording of notes during the lecture. Here the student attempts to capture as much information as possible. Finally, Area C is left for the student to summarise the notes and reflect upon the main ideas of the lecture (Pauk & Owens, 2005).

The main advantage of this technique is its clear-cut and organised structure. The technique is also suitable for technical modules, including Mathematics and Physics, and non-technical modules such as English and History. During an engineering and applied sciences workshop, experiments involving 70 student participants revealed this note-taking method to be straightforward (Anderson-Rowland, Aroz, Blaisdell, Cosgrove, Fussell, McCartney & Reyes, 1996). Anderson-Rowland et al. (1996) state that this method enables the organisation of notes and entails interaction and concentration; a scheduled review can therefore be conducted immediately, highlighting keywords. Moreover, as students can summarise content, this facilitates learning by increasing understanding. Additionally, a strength of the technique is the ability to take notes instantaneously, saving time and effort due to its systematic structure.

Figure 1. The Cornell note-taking method (adapted from Pauk & Owens, 2005)

In comparison, the Outlining method (see Figure 2) has a more spatial and meaningful layout, implicitly encoding conceptual relations. For example, indentation may imply grouping, and proximity conceptual closeness (Ward & Tatsukawa, 2003).

The method consists of dashes or indentation and is not suitable for technical modules such as Mathematics or Physics. The technique requires indentation with spaces towards the right for specific facts; relationships are represented through indentation. Note-takers are required to specify points in an organised format, arranging a pattern and sequence built by space indentation. Important points are separated and kept furthest to the left, with more specific points brought in towards the right. Distance from the major point indicates the level of importance. The main advantage of this technique is its neatly organised structure, allowing reviewing to be conducted without difficulty. However, the Outlining method requires the student's full concentration to achieve maximum organisation of notes. Consequently, the technique is not appropriate if the lecturer is going at a fast pace. The method has also been disapproved of by Fox (1959) because of its confusing organisational structure, mainly due to the arrangement of numerals, capitalised letters and so forth.

Figure 2. An example of the Outlining note-taking method

In contrast to the Cornell and Outlining methods, the Mapping method (see Figure 3) is a graphical representation of the lecture content. Students are stimulated to visually determine links illustrating relationships between facts and concepts. Concept maps enable brainstorming, the breakdown and representation of complex scenarios, identifying and providing solutions for flaws, and summarising information. To enhance accuracy, students must actively participate and initiate critical thinking. However, a drawback arguably has been the structural organisation and relationship of
nodes, especially questioning the hierarchical structure (Hibberd, Jones, & Morris, 2002). Alternative structures have been discussed, including chains, spider maps and networks (Derbentseva & Safayeni, 2004).

E-Learning

E-learning, also known as distance education, employs numerous technological devices. Today, educational institutes are well known to deliver academia over the Internet. The growth of the internet is ever-increasing, with great potential not only for learning material but also as a collaborative learning environment. The major difference between traditional learning and electronic learning is the mode of instruction by which information is communicated.

To have a successful e-learning system, all sub-components and interrelated processes must be considered, because if one process fails then the entire system fails. Therefore, it is necessary to derive a series of pedagogical principles. Underlying pedagogical principles include considering the user's behaviour towards the system, as it is an isolated activity that can result in users becoming frustrated. As the internet holds a great deal of knowledge, it can be presented in a biased manner, providing users with partial information. Considerations for the environment, and the user actions to be performed to achieve a specific goal, must be outlined. Moreover, users' interpersonal skills, including their attitudes, perceptions and behaviour, are central to influencing the effectiveness of the system. It has been claimed that e-learning reduces teaching time, increases proficiency and improves retention. However, this is not always correct: in one study, results of online lecture notes showed students performed worse (Barnett, 2003). From the students' perspective, they continuously pursue communication and support to influence their learning. They also welcome constructive feedback (Mason, 2001).

The major challenges faced by the e-learning society are the culture clash and a lack of motivation towards an electronic learning environment; people are simply not prepared to accept the change. A major factor determining the successful outcome of such systems is the level of interactivity provided. Other significant pedagogical principles include the level of control users feel, in comparison to the traditional manner of learning, where the lecturer has full control over the lecture environment. Moreover, the development of a suitable interface is necessary, considering usability factors such as efficiency, error rates, memorability, learnability and subjective satisfaction. For a tutor, the planning behind the course structure is important, paying close attention to the structure of content. Tutors must ensure students receive feedback in an appropriate time, maintaining time management. Overall, before considering the design and development of an e-learning environment, the main factors to consider include the learners, content, technology, teaching techniques, and so forth (Hamid, 2002).

Technologies associated with e-learning have increased the usage of bandwidth and internet access. Presently, there are two key technologies used to deliver e-learning: scheduled delivery platforms and on-demand delivery platforms. Scheduled delivery platforms include multicasts, virtual libraries and remote laboratories, yet their constraints include time and area restraints. To further improve on these systems, on-demand delivery platforms provide 24-hour support, maintaining flexibility of learning (Hamilton, Richards & Sharp, 2001).

The efficacy of electronic notes in one particular study showed improvement in students' understanding of lecture content, resulting in an overwhelming 96% of users feeling e-notes are an effective tool (Wirth, 2003). Users are able to annotate, collaborate and discuss subject content. This increases learning efficiency by

class, students spend approximately 80% of their time listening to the lecture (Armbruster, 2000), and the reason students take notes is because of their usefulness towards learning and due to social pressures (Tran & Lawson, 2001). Students vary their note-taking technique according to personal experience, existing knowledge, and appropriateness to the lecture format. Problems within the classroom are caused by students' inability to copy information presented by the lecturer (Komagata, Ohira, Kurakawa & Nakakoji, 2001). Whilst engaging in reading or listening metacognition, the human mind has a tendency to wander off, thinking about thoughts other than what is being taught or learnt.

During the learning process, effects on learning
allowing the user to engage into the text, improving occur during the encoding and storage processes.
their comprehension and supporting memorisation. The encoding stage is when students attend lecture
Recall of specific details is also enhanced (Wolfe, and record lecture notes whereas, the storage phase is
2000). Numerous annotation applications have been when students review their notes. To achieve optimum
introduced including Microsoft Word and OneNote performance, both the encoding and storage processes
primarily concerned with annotation. SharePoint should be combined. The reviewing of lecture notes
allows manipulation editing and annotation simultane- is significantly important especially when conducted
ously as well as Re:Mark. Microsoft OneNote as an in close proximity to an exam. Additionally, students
annotation tool is more popular however, the end-user should also monitor their individual progress and un-
is required to possess a copy of it in order to use it, derstanding of information before an exam. This can be
unlike Microsoft Journal which can be exported as a achieved by carrying out self-testing. This method can
.mhtml file (Cicchino, 2003). Additionally, handwriting also be encouraged by the lecturer providing relevant
is translated as an image rather then text that can be material for example, past exam papers.
explored (McCall, 2005). The benefits include ability Students with greater memory-ability benefit from
to record a lecture and annotate, referencing to specific note-taking with studies finding students with lower
points within the recording. These can then be used memory-ability record a lower number of words and
later (McCall, 2005). complete ideas (Kiewra & Benton, 1988). This is due
Tablet PCs are being used to replace the blackboard to variations within the working memory as information
presenting course material through a data projector. is stored and manipulated there. Therefore, the ability
Many academic institutes are adopting these as a teach- of note-takers to pick out relevant details, maintain
ing tool (Clark, 2004) or are providing students with knowledge and combine new knowledge to existing
them as teaching devices (Lowe, 2004). The major knowledge are essential factors. Human memory can
strength of a Tablet PC is its interactivity between be broken down into three types; Short-term; long-term
face-to-face instructions (Cicchino, 2003). However, a and sensory memory. Short-term memory allows in-
study that provided students with pen-based computers formation to be stored for a short period before being
for the purpose of taking notes presented the project forgotten or transferred to long-term memory. Long-
as unsuccessful because the devices where unsuitable term memory endures information for a longer period
in terms of performance, poor resolution, and network into the memory circuit. The brain circuit includes the
factors (Truong, Abowd & Brotherton, 1999). cerebral cortex and hippocampus. Finally, the sensory
memory is the initial storage of information lasting an
Cognition instant, consisting of visual and auditory memories.
Information here is typically gathered through the sight
The most common mode of instruction in higher edu- and sound senses.
cation is lectures where attending students take notes The incorporation of multimedia within education
based on the lecture content (Tran & Lawson, 2001). In can enhance the learning experience in a number of
tion during concept map construction. Concept Maps: Theory, Methodology, Technology. Proceedings of the First International Conference on Concept Mapping, Spain.

Dryden, G., & Vos, J. (1999). The Learning Revolution. The Learning Web, Torrance, CA, USA.

Fox, E. W. (1959). Syllabus for History. Ithaca, NY: Cornell University Press.

Hamid, A. A. (2002). e-Learning: Is it the "e" or the learning that matters? Internet and Higher Education, 4, 311-316.

Hamilton, R., Richards, C., & Sharp, C. (2001). An Examination of E-Learning and E-Books. Available at: www.dcs.napier.ac.uk.

Hibberd, R., Jones, A., & Morris, E. (2002). The Use of Concept Mapping as a Means to Promote and Assess Knowledge Acquisition. CALRG Report No. 202.

Kiewra, K. A., & Benton, S. L. (1988). The relationship between information-processing ability and notetaking. Contemporary Educational Psychology, 13, 33-44.

Komagata, N., Ohira, M., Kurakawa, K., & Nakakoji, K. (2001). A cognitive system that supports note-taking in a real-time class lecture. In 9th Human Interface Workshop Notes (SIG-HI-94-7). Information Processing Society of Japan, 35-40.

Lowe, P. (2004, March). Bentley College students evaluate Tablet PCs. Retrieved 3/10/2004 from http://www.hp.com/hpinfo/newsroom/feature_stories/2004/04bentley.html.

Mason, R. (2001). Time is the New Distance? An Inaugural Lecture. The Open University, Milton Keynes.

McCall, K. (2005). Digital Tools for the Digital Classroom: Digital Pens, Tablet Mice, and Tablet PCs. 20th Annual Conference on Distance Teaching and Learning.

Monty, M. L. (1990). Issues for supporting notetaking and note using in the computer environment. Ph.D. thesis, University of California, San Diego.

Pauk, W., & Ross, O. (2005). How to Study in College (8th ed.). Houghton Mifflin, 208.

Titsworth, S. B., & Kiewra, K. A. (2004). Spoken organizational lecture cues and student notetaking as facilitators of student learning. Contemporary Educational Psychology, 29, 447-461.

Tran, T. A. T., & Lawson, M. (2001). Students' perception for reviewing lecture notes. International Education Journal, 2(4). Educational Research Conference 2001 Special Issue.

Truong, K. N., Abowd, G. D., & Brotherton, J. A. (1999). Personalizing the capture of public experiences. In UIST, Asheville, NC, 121-130.

Van Meter, P., Yokoi, L., & Pressley, M. (1994). College students' theory of note-taking derived from their perceptions of note-taking. Journal of Educational Psychology, 86, 323-338.

Ward, N., & Tatsukawa, H. (2003). A tool for taking class notes. University of Tokyo, Tokyo, Japan.

Wirth, M. A. (2003). E-notes: Using electronic lecture notes to support active learning in computer science. The SIGCSE Bulletin, 35(2), 57-60.

Wolfe, J. L. (2000). Effects of annotations on student readers and writers. Digital Libraries, San Antonio, TX, ACM.

KEY TERMS

Annotation: The activity of briefly describing or explaining information. It can also involve summarising or evaluating content.

Cognitive Psychology: The study of cognition, i.e. the mental processes underlying human behaviour: understanding perceptions and examining memory, attention span, concentration, and forgetfulness. Its purpose is understanding humans and the way they mentally function.

Information Processing: The ability to capture, store and manipulate information. This consists of two main processes: encoding and storage. Students record notes during the encoding stage and conduct reviewing thereafter, in the storage phase.

Metacognition: The ability and skills of learners to be aware of and monitor their learning processes.

Multimodality: An electronic system that enhances interactivity by amalgamating audio, visual and speech metaphors.
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
A Comparison of Cooling Schedules
function of random k-neighbour feasible solutions, sj = g(si, random[0,1[).

4. If the new solution sj is better than the current one, f(sj) < f(si), then sj is set as the current solution, si = sj. Otherwise, if

e^((f(si) − f(sj)) / T) > random[0,1[

sj is equally accepted as the current solution, si = sj.

5. If the number of executed state transitions (steps 3 and 4) for the current value of temperature is equal to L, then the temperature T is decreased.

6. If there are some k-neighbour feasible solutions near the current one that have not been processed yet, steps 3 to 5 must be repeated. The algorithm ends when the set of k-neighbour solutions near the current one has been processed completely, with a probability close to 1, without obtaining any improvement in the quality of the solutions.

The most important feature of the simulated annealing algorithm is that, besides accepting transitions that imply an improvement in the solution cost, it also accepts a decreasing number of transitions that imply a quality loss of the solution.

The simulated annealing algorithm converges asymptotically towards the set of global optimal solutions of the problem. E. Aarts and J. Korst provide a complete proof in their book Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing (1989). Essentially, the convergence condition towards the global optimum states that the temperature T of the system must be decreased logarithmically according to the equation:

Tk = T0 / (1 + Log(1 + k))     (1)

where k = 0,1,...,n indicates the temperature cycle. However, this system cooling function requires a prohibitive computing time, so it is necessary to consider faster methods of temperature decrease.

Cooling Schedules

A practical simulated annealing implementation requires generating a finite sequence of decreasing values of temperature T, and a finite number L of state transitions for each temperature value. To achieve this aim, a cooling schedule must be specified.

The following cooling schedule, frequently used in the literature, was proposed by Kirkpatrick, Gelatt and Vecchi (1983), and it consists of three parameters:

Initial temperature, T0. The initial value of temperature must be high enough so that any new solution generated in a state transition is accepted with a certain probability close to 1.

Temperature decrease function. Generally, an exponential decrease function is used, such as Tk = T0·a^k, where a is a constant smaller than one. Usual values of a fluctuate between 0.8 and 0.99.

Number of state transitions, L, for each temperature value. Intuitively, the number of transitions for each temperature must be high enough so that, if no solution changes were accepted, the whole set of k-neighbour feasible solutions near the current one could be gone round with a probability close to 1.

The initial temperature, T0, and the number of state transitions, L, can be easily obtained. On the other hand, the temperature decrease function has been studied in numerous research works (Laarhoven & Aarts, 1987; Dowsland, 2001; Luke, 2005; Locatelli, 2000).

MAIN FOCUS OF THE CHAPTER

In this section nine different cooling schedules used in the comparison of the simulated annealing algorithm are described. They all consist of, at least, three parameters: initial temperature T0, temperature decrease function, and number of state transitions L for each temperature.

Multiplicative Monotonic Cooling

In multiplicative monotonic cooling, the system temperature T at cycle k is computed by multiplying the initial temperature T0 by a factor that decreases with respect to cycle k. Four variants are considered:
Exponential multiplicative cooling (Figure 1-A), proposed by Kirkpatrick, Gelatt and Vecchi (1983), and used as the reference in the comparison among the different cooling criteria. The temperature decrease is made by multiplying the initial temperature T0 by a factor that decreases exponentially with respect to the temperature cycle k:

Tk = T0·a^k  (0.8 ≤ a ≤ 0.9)     (2)

Logarithmical multiplicative cooling (Figure 1-B), based on the asymptotical convergence condition of simulated annealing (Aarts & Korst, 1989), but incorporating a cooling speed-up factor a that makes its use possible in practice. The temperature decrease is made by multiplying the initial temperature T0 by a factor that decreases in inverse proportion to the natural logarithm of the temperature cycle k:

Tk = T0 / (1 + a·Log(1 + k))     (3)

Linear multiplicative cooling (Figure 1-C). The temperature decrease is made by multiplying the initial temperature T0 by a factor that decreases in inverse proportion to the temperature cycle k:

Tk = T0 / (1 + a·k)  (a > 0)     (4)

Quadratic multiplicative cooling (Figure 1-D). The temperature decrease is made by multiplying the initial temperature T0 by a factor that decreases in inverse proportion to the square of the temperature cycle k:

Tk = T0 / (1 + a·k²)  (a > 0)     (5)
Figure 1. Multiplicative cooling curves: (A) exponential, (B) logarithmical, (C) linear, (D) quadratic. Each curve starts at T0 and decreases to Tn over cycles k = 0…n.
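The iterative scheme of steps 3–6, combined with the multiplicative schedules of Eqs. (2)–(5), can be sketched in Python (a minimal illustration; the toy objective and neighbour generator are our own placeholders, not from the chapter):

```python
import math
import random

# Multiplicative monotonic cooling schedules, Eqs. (2)-(5)
def exponential_cooling(T0, a, k):   # Eq. (2), 0.8 <= a <= 0.9
    return T0 * a ** k

def logarithmic_cooling(T0, a, k):   # Eq. (3)
    return T0 / (1 + a * math.log(1 + k))

def linear_cooling(T0, a, k):        # Eq. (4), a > 0
    return T0 / (1 + a * k)

def quadratic_cooling(T0, a, k):     # Eq. (5), a > 0
    return T0 / (1 + a * k ** 2)

def simulated_annealing(f, neighbour, s0, cooling, T0=100.0, a=0.95,
                        L=50, n_cycles=100):
    """Minimise f: L state transitions per temperature, n_cycles cooling steps."""
    s = s0
    for k in range(n_cycles):
        T = cooling(T0, a, k)
        for _ in range(L):
            s_new = neighbour(s)      # random k-neighbour feasible solution
            # Step 4: accept improvements, or worse solutions with
            # probability e^((f(s) - f(s_new)) / T)
            if (f(s_new) < f(s)
                    or math.exp((f(s) - f(s_new)) / T) > random.random()):
                s = s_new
    return s

# Toy usage: minimise (x - 7)^2 over the integers
best = simulated_annealing(lambda x: (x - 7) ** 2,
                           lambda x: x + random.choice([-1, 1]),
                           s0=0, cooling=exponential_cooling)
```

Any of the four cooling functions can be passed in unchanged, which is how the schedules are compared under otherwise identical conditions.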
Additive Monotonic Cooling

In additive monotonic cooling, the system temperature at cycle k is computed by adding to the final temperature Tn a term that decreases with respect to cycle k. Four variants based on the formulae proposed by B. T. Luke (2005) are considered; the exponential additive variant (Figure 2-C), for instance, adds to the final temperature Tn a term that decreases in inverse proportion to the number e raised to the power of the temperature cycle k.

Figure 2. Additive cooling curves: (A) linear, (B) quadratic, (C) exponential, (D) trigonometric. Each curve starts at T0 and decreases to Tn over cycles k = 0…n.
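The additive formulas themselves are not reproduced in this extract; as an illustration of the family, the linear additive variant of Figure 2-A is commonly written as follows (an assumed form after Luke (2005), not taken from the chapter text):

```python
def linear_additive_cooling(T0, Tn, n, k):
    """Linear additive cooling: adds to the final temperature Tn a term
    proportional to (T0 - Tn) that shrinks linearly as cycle k approaches n.
    Assumed form (cf. B. T. Luke, 2005); the exact formula is not in the text.
    """
    return Tn + (T0 - Tn) * (n - k) / n

# The curve runs from T0 at k = 0 down to Tn at k = n:
print(linear_additive_cooling(100.0, 1.0, 10, 0))   # 100.0
print(linear_additive_cooling(100.0, 1.0, 10, 10))  # 1.0
```

Unlike the multiplicative schedules, the additive ones are parameterized directly by the final temperature Tn and the total number of cycles n.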
Figure 3. Curve of non-monotonic adaptive cooling: the temperature fluctuates randomly between the exponential curve Tk and its double 2Tk, decreasing from T0 towards Tn over cycles k = 0…n.

f(si), and the best objective achieved until that moment by the algorithm, noted f*:

T = m·Tk = (1 + (f(si) − f*) / f(si)) · Tk     (10)

Note that the inequality 1 ≤ m < 2 is verified. This factor m means that the greater the distance between the current solution and the best achieved solution, the greater the temperature, and consequently the allowed energy hops. This criterion is a variant of the one proposed by M. Locatelli (2000), and it can be used in combination with any of the former criteria to compute Tk. In the comparison, the standard exponential multiplicative cooling has been used for this purpose, so the cooling curve is characterized by a fluctuant random behaviour comprised between the exponential curve defined by Tk and its double value 2Tk (Figure 3).

Combinatorial Optimization Problems Used in the Comparison

Travelling Salesman Problem. The Travelling Salesman Problem, TSP, consists of finding the shortest cyclic path to travel round n cities so that each city is visited only once. The objective function f to minimize is given by the expression:

f(s) = Σi=1…n d(xi, x(i mod n)+1),

where each variable xi means the city that is visited at position i of the tour, and d(xi, xj) is the distance between the cities xi and xj. The tests have been made using two different instances of the Euclidean TSP. The first one obtains a tour of 47 European cities and the second one a tour of 80 cities.

Quadratic Assignment Problem. The Quadratic Assignment Problem, QAP, consists of finding the optimal location of n workshops in p available places (p ≥ n), considering that between each two shops a specific amount of goods must be transported, with a cost per unit that is different depending on where the shops are. The objective is minimizing the total cost of goods transport among the workshops. The objective function f to minimize is given by the expression:

f(s) = Σi=1…n−1 Σj=i+1…n c(xi, xj)·q(i, j),

where each variable xi means the place in which workshop i is located, c(xi, xj) is the cost per unit of goods transport between the places where shops i and j are, and q(i, j) is the amount of goods that must be transported between these shops. The tests have been made using two different instances of the QAP: the first one with 47 workshops to be located in 47 European cities, and the second one with 80 workshops to be located in 80 cities.

Selection of Parameters

For each problem instance, all variants of the algorithm use the same values of initial temperature T0 and number L of state transitions for each temperature. The initial temperature T0 must be high enough to accept any state transition to a worse solution. The number L of state transitions must guarantee with a probability close to 1 that, if no solution changes are accepted, any k-neighbour solution near the current one could be processed.

In order to determine the other temperature decrease parameters of each cooling schedule under similar conditions of execution time, we consider the mean final temperature T̄ and the temperature standard error σ of the exponential multiplicative cooling of Kirkpatrick, Gelatt and Vecchi with decreasing factor a = 0.95. The objective is to determine the temperature decrease parameters in such a way that a temperature in the interval [T̄ − σ, T̄ + σ] is reached in the same number of cycles as the exponential multiplicative cooling. We distinguish three cases:
Multiplicative monotonic cooling. The decrease factor a that appears in the temperature decrease equations must allow reaching the temperature T̄ + σ, the highest temperature from which the end of the algorithm is very probable, in the same number n of cycles as the exponential multiplicative cooling. Knowing that for this cooling the number n of cycles is:

T̄ + σ = T0·(0.95)^n  ⟹  n = Log((T̄ + σ) / T0) / Log(0.95)     (11)

it results, for the logarithmic multiplicative cooling:

T̄ + σ = T0 / (1 + a·Log(1 + n))  ⟹  a = (T0/(T̄ + σ) − 1) / Log(1 + n)     (12)

for the linear multiplicative cooling:

T̄ + σ = T0 / (1 + a·n)  ⟹  a = (T0/(T̄ + σ) − 1) / n     (13)

and for the quadratic multiplicative cooling:

T̄ + σ = T0 / (1 + a·n²)  ⟹  a = (T0/(T̄ + σ) − 1) / n²     (14)

Additive monotonic cooling. The final temperature is Tn = T̄ − σ, and the number n of temperature cycles is equal to the corresponding number of cycles of the exponential multiplicative cooling for that final temperature, that is:

T̄ − σ = T0·(0.95)^n  ⟹  n = Log((T̄ − σ) / T0) / Log(0.95)     (15)

Non-monotonic adaptive cooling. As the adaptive cooling is combined in the tests with the exponential multiplicative cooling, the decrease factor a that appears in the temperature decrease equation must also be a = 0.95.

Analysis of Results

Table 1 shows the cooling parameters used on each instance of the TSP and QAP problems.

For each instance of the problems, 100 runs have been made with the nine cooling schedules, computing minimum, maximum and average values, and standard error, both of the objective function and of the number of iterations. For limited space reasons we only provide results for the QAP 80-workshops instance (Table 2). Local search results are also included as a reference.

Considering both objective quality and number of iterations, we can conclude that the best cooling schedule we have studied is the non-monotonic adaptive cooling schedule based on the one proposed by M. Locatelli (2000), although without significant differences with respect to the exponential multiplicative and quadratic multiplicative cooling schedules.

FUTURE TRENDS

In line with the former results, we propose some research continuation lines on simulated annealing cooling schedules:

Analyse the influence of the initial temperature on the performance of the algorithm. An interesting idea could be evaluating the algorithm behaviour when the temperature is initialized with a percentage between 10% and 90% of the usual estimated value for T0.

Determine an optimal monotonic temperature decrease curve, based on the exponential multiplicative cooling. A possibility could be changing the parameter a dynamically in the temperature decrease equation, with lower initial values (a = 0.8) and final values closer to 1 (a = 0.9), but achieving an average number of temperature cycles equal to the standard case with constant parameter a = 0.95.

Study new non-monotonic temperature decrease methods, combined with the exponential multiplicative cooling. These methods could be
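The closed-form parameter choices of Eqs. (11)–(14), and the adaptive factor of Eq. (10), can be checked numerically; a minimal Python sketch (function names are ours):

```python
import math

def cycles_exponential(T0, T_target, a=0.95):
    """Eq. (11): number of cycles n for T0 * a**n to reach T_target."""
    return math.log(T_target / T0) / math.log(a)

def a_logarithmic(T0, T_target, n):
    """Eq. (12): factor a such that T0 / (1 + a*log(1 + n)) == T_target."""
    return (T0 / T_target - 1) / math.log(1 + n)

def a_linear(T0, T_target, n):
    """Eq. (13): factor a such that T0 / (1 + a*n) == T_target."""
    return (T0 / T_target - 1) / n

def a_quadratic(T0, T_target, n):
    """Eq. (14): factor a such that T0 / (1 + a*n**2) == T_target."""
    return (T0 / T_target - 1) / n ** 2

def adaptive_temperature(Tk, f_current, f_best):
    """Eq. (10): T = m*Tk with m = 1 + (f(si) - f*)/f(si), so 1 <= m < 2."""
    return (1 + (f_current - f_best) / f_current) * Tk

# All three multiplicative schedules hit the same target after n cycles:
T0, T_target, n = 100.0, 1.0, 50
assert abs(T0 / (1 + a_linear(T0, T_target, n) * n) - T_target) < 1e-9
assert abs(T0 / (1 + a_quadratic(T0, T_target, n) * n ** 2) - T_target) < 1e-9
```

Here T_target stands for T̄ + σ (or T̄ − σ in the additive case), which in the chapter is estimated from runs of the reference exponential schedule.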
by Fast Computing Machines. Journal of Chemical Physics, 21, 1087-1092.

Moscato, P. (1999). Memetic algorithms: A short introduction. In D. Corne, M. Dorigo & F. Glover (Eds.), New Ideas in Optimization, 219-234. McGraw-Hill, Maidenhead, Berkshire, England, UK.

Toda, M., Kubo, R., & Saitô, N. (1983). Statistical Physics. Springer-Verlag, Berlin.

KEY TERMS

Combinatorial Optimization: The area of optimization theory whose main aim is the analysis and algorithmic solving of constrained optimization problems with discrete variables.

Cooling Schedule: The temperature control method in the simulated annealing algorithm. It must specify the initial temperature T0, the finite sequence of decreasing values of temperature, and the finite number L of state transitions for each temperature value.

Genetic and Evolutionary Algorithms: Genetic Algorithms (GAs) are approximate optimization algorithms inspired by genetics and evolution theory. The search space of solutions is seen as a set of organisms grouped into populations that evolve in time by means of two basic techniques: crossover and mutation. Evolutionary Algorithms (EAs) are special genetic algorithms that use only mutation as the organism generation technique.

Local Search: Local Search (LS) is a metaheuristic, or general class of approximate optimization algorithms, based on the deterministic heuristic search technique called hill-climbing.

Memetic Algorithms: Memetic Algorithms (MAs) are optimization techniques based on the synergistic combination of ideas taken from two other metaheuristics: genetic algorithms and local search.

Simulated Annealing: Simulated Annealing (SA) is a variant of the local search metaheuristic that incorporates a stochastic criterion for accepting worse-quality solutions, in order to prevent the algorithm from being prematurely trapped in local optima.

Taboo Search: Taboo Search (TS) is a metaheuristic superimposed on another heuristic (usually local search or simulated annealing) whose aim is to avoid search cycles by forbidding or penalizing moves that take the solution to points previously visited in the solution space.
Peter M. A. Sloot
Section Computational Science, The University of Amsterdam, The Netherlands
In recent years, the notion of complex systems proved to be a very useful concept to define, describe, and study various natural phenomena observed in a vast number of scientific disciplines. Examples of scientific disciplines that highly benefit from this concept range from physics, mathematics, and computer science through biology and medicine, as well as economy, to social sciences and psychology. Various techniques were developed to describe the natural phenomena observed in these complex systems. Among these are artificial life, evolutionary computation, swarm intelligence, neural networks, parallel computing, cellular automata, and many others. In this text, we focus our attention on one of them, i.e. cellular automata.

We present a truly discrete modelling universe, discrete in time, space, and state: Cellular Automata (CAs) (Sloot & Hoekstra, 2007; Kroc, 2007; Sloot, Chopard & Hoekstra, 2004). It is good to emphasize the importance of CAs in solving certain classes of problems which are not tractable by other techniques. CAs, despite their simplicity, are able to describe and reproduce many complex phenomena that are closely related to processes such as self-organization and emergence, which are often observed within the above-mentioned scientific disciplines.

BACKGROUND

We briefly explain the idea of complex systems and cellular automata and provide references to a number of essential publications in the field.

The concept of complex systems (CSs) emerged simultaneously and often independently in various scientific disciplines (Fishwick, 2007; Bak, 1996; Resnick, 1997). This could be interpreted as an indication of their universality. Despite the diversity of those fields, there exist a number of common features within all complex systems. Typically, a complex system consists of a vast number of simple and locally operating parts, which are mutually interacting and producing a global complex response. Self-organization (Bak, 1996) and emergence, often observed within complex systems, are driven by dissipation of energy and/or information.

Self-organization can be easily explained with ant-colony behaviour studies, where a vast number of identical processes, called ants, locally interact by physical contact or by using pheromone-marked traces. There is no leader providing every ant with information or instructions about what it should do. Despite the lack of such a leader or a hierarchy of leaders, ants are able to build complicated ant colonies, feed their larvae, protect the colony, fight against other colonies, etc. All this is done automatically through a set of simple local interactions among the ants. It is well known that ants respond to each stimulus with one out of 20 to 40 (depending on the ant species) reactions; these are enough to produce the observed complexity.

Emergence is defined as the occurrence of new processes operating at a higher level of abstraction than the level at which the local rules operate. Each level usually has its own local rules, different from the rules operating at other levels. An emergent, like an ant colony, is a product of the process of emergence. There can be a whole hierarchy of emergents, e.g. as in the hu-
Complex Systems Modeling by Cellular Automata
Figure 1. Four types of neighbourhood shown on a lattice of 5 x 5 cells: (from left) the von Neumann with r = 1 and r = 2, the Moore with r = 1, and finally a random one.

where i represents the x coordinate and j the y coordinate of the cell, and f the local rule. The updated cell has coordinates [i,j]. Figure 1 shows a 5x5 two-dimensional CA with neighbourhoods having various radiuses.

Modelling

Computational modelling is defined as a mathematical, numerical and/or computational description of a naturally observed phenomenon. It is essential in situations where the observed phenomena are not tractable by analytical means. Results are often validated against analytical solutions in special or simplified cases. Its importance has been shown in physics and chemistry, and is continuously increasing in new fields such as biology, medicine, sociology, and psychology.

CELLULAR AUTOMATA MODELLING OF COMPLEX SYSTEMS

There is a constant influx of new ideas and approaches enriching the CA method. Within CA modelling of complex systems, there are distinct streams of research and applications in various disciplines; these are briefly discussed in this section.

Classical cellular automata, with a regular lattice of cells, are used to model ferromagnetic and anti-ferromagnetic materials, solidification, static and dynamic recrystallization, laser dynamics, traffic flow, escape and pedestrian behaviour, voting processes, self-replication, self-organization, earthquakes, volcano activity, secure coding of information and cryptography, immune systems, living cells and tissue behaviour, morphological development, ecosystems, and many other natural phenomena (Sloot, Chopard & Hoekstra, 2004; Kroc, 2007; Ilachinski, 2001). CAs were first used in the modelling of excitable media, such as heart tissue. CAs often outperform other methods such as, e.g., the Monte Carlo method, especially for highly dissipative systems. The main reason why CAs represent the best choice in the modelling of many naturally observed complex phenomena is that CAs are defined on truly spatio-temporally discretized worlds. The inherent CA properties bring new qualities to models that are not principally achievable by other computational techniques.

An example of an advanced CA method is the Lattice Boltzmann method, consisting of a triangular network of vertices interconnected by edges, where generalized liquid particles move and undergo collisions according to a collision table. A model of a gas is created in which conservation of mass, momentum and energy during collisions is enforced, which produces fully discrete and simplified, yet physically correct, micro dynamics. When operated in the right limits, these models reproduce the incompressible Navier-Stokes equations and therefore are a model for fluid dynamics. Averaged quantities resulting from such simulations correspond to solutions of the Navier-Stokes equations (Sloot & Hoekstra, 2007; Rivet & Boon, 2001).

Classical CAs, using lattices, have many advantages over other approaches, but some known disadvantages have to be mentioned. One of the disadvantages of CAs could be the use of discrete variables. This restriction is removed by some authors through the use of continuous variables, leading to generalized CAs. The biggest disadvantage of classical CAs is often found in the restricted topology of the lattice. Classical regular lattices fail to reproduce properties of many naturally observed phenomena, which led to the following development in CAs.

Generalized cellular automata (Darabos, Giacobini & Tomassini in Kroc, 2007) are built on general networks, which are represented by regular, random, scale-free or small-world networks. A regular network can be created from a classical CA and its lattice, where each cell represents a node and each neighbour is linked by an edge. A random graph is made from nodes that have randomly chosen nodes as neighbours. Within scale-free networks, some nodes are highly connected to other points whereas other nodes are less connected. Their properties are independent of their size. The distribution of the degree of links at a node follows a power law relationship P(k) = k^(−g), where P(k) is the probability that a node is connected to k other nodes. The coefficient g is in most cases between 2 and 3. Those networks occur for instance in the Internet, in social networks, and in biologically produced networks such as gene regulatory networks within living cells or food chains within ecosystems (Sloot, Ivanov, Boukhanovsky, van de Vijver & Boucher, 2007).

In general, the behaviour of a given CA is unpredictable, which is often exploited in cryptography. There exist

role of corals in marine ecosystems and, e.g., its relation to global climate changes. Simulation of the growth and branching of a coral involves multiphysics processes such as nutrient diffusion, fluid flow, light absorption by the zooxanthellae that live in symbiosis with the coral polyps, as well as mechanical stress.

It is demonstrated that nutrient gradients determine the morphogenesis of branching of phototropic corals. In this specific case, we deal with diffusion-limited processes fully determining the morphological shape of the growing corals. It is known from tank experiments and simulation studies that those diffusion-dominant regions operate for relatively high flow velocities. It has been demonstrated that simulated coral morphologies are indistinguishable from real corals (Kaandorp, Sloot, Merks, Bak, Vermeij, & Maier, 2005), Figure 3.

Modelling of dynamic recrystallization represents another living application of CAs within the field of solid state physics (Kroc, 2002). Metals in polycrystalline form, composed of many single crystals, are deformed at elevated temperatures. The stored energy increases due to deformation and is in turn released by recrystallization, where nuclei grow and form new grains. Growth is driven by the release of stored energy. The response of the deformed polycrystalline material is reflected by complex changes within the microstructure and the deformation curve.

Stress-strain curves measured during deformation of metallic samples exhibit either single-peak or multiple-peak behaviour. This complex response of the deformed material is a direct result of concurrent processes operating within it. CAs, so far, represent the only computational technique able to describe such complex material behaviour (Kroc, 2002), Figure 3.
a number of mostly statistical techniques enabling to
study the behaviour of given CA but none of them is
exact. The easiest way, and often the only one, to find FUTURE TRENDS
out the state of a CA is its execution.
There is a number of distinct tracks within CAs re-
search with a constant flux of new discoveries (Kroc,
CASE STUDIES 2007, Sloot & Hoekstra, 2007). CAs are used to model
physical phenomena but they are increasingly used
Understanding morphological growth and branching of to model biological, medical and social phenomena.
stony corals with the lattice Boltzmann method is a good Most CAs are designed by hand but the future requires
example of studying natural complex system with CAs development of automatic and self-adjusting optimiza-
(Kaandorp, Lowe, Frenkel & Sloot, 1996, Kaandorp, tion techniques to design local rules according to the
Sloot, Merks, Bak, Vermeij, & Maier, 2005). A deep needs of the described natural phenomena.
insight into those processes is important to assess the
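As a concrete illustration of the classical CA scheme used throughout this article — a regular lattice of cells holding discrete variables, updated synchronously by a local rule over each cell's neighbourhood — the following sketch implements a one-dimensional binary CA (elementary rule 110). It only illustrates the update mechanism, not the coral-growth or recrystallization models of the case studies.

```python
def step(cells, rule=110):
    """One synchronous update of a 1D binary CA with periodic boundaries.
    The local rule is an 8-bit Wolfram rule number: the 3-cell
    neighbourhood (left, centre, right) indexes one bit of `rule`."""
    n = len(cells)
    new = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right
        new.append((rule >> index) & 1)
    return new

# Start from a single seed cell and iterate the local rule.
cells = [0] * 31
cells[15] = 1
for _ in range(5):
    cells = step(cells)
print(sum(cells))  # number of live cells after five steps
```

Because the rule is purely local, the state of the whole lattice at a later time can, in general, only be found by actually executing the automaton, as noted above.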
Figure 2. Morphological growth of the coral Madracis mirabilis: a 3D visualization of a CT-scan of the coral (top) and two simulated growth forms with different morphological shapes (bottom) (Kaandorp, Sloot, Merks, Bak, Vermeij, & Maier, 2005). Simulated structures are indistinguishable from real corals.
Figure 3. Simulation of dynamic recrystallization, represented by: stress-strain curves (top-left), the corresponding mean grain size D-strain curves (top-right), an abrupt change of the loading strain rate (bottom-left), and the corresponding D-strain curves (bottom-right). Strain is represented by the number of CA steps (Kroc, 2002).
for Multi-scale Modelling: Formalism and the Scale Separation Map. Lecture Notes in Computer Science, 4487, 922-930.

Illachinski, A. (2001). Cellular Automata: A Discrete Universe. World Scientific Publishing Co. Pte. Ltd.

Kaandorp, J.A., Lowe, C.P., Frenkel, D. & Sloot, P.M.A. (1996). Effect of Nutrient Diffusion and Flow on Coral Morphology. Physical Review Letters, 77, 2328-2331.

Kaandorp, J.A., Sloot, P.M.A., Merks, M.H., Bak, R.P.M., Vermeij, M.J.A. & Maier, C. (2005). Morphogenesis of the branching reef coral Madracis mirabilis. Proc. R. Soc. B, 272, 127-133.

Kroc, J. (2002). Application of cellular automata simulations to modeling of dynamic recrystallization. Lecture Notes in Computer Science, 2329, 773-782.

Kroc, J. (Ed.) (2007). Advances in Complex Systems, 10 (suppl. 1), 1-213.

Resnick, M. (1997). Turtles, Termites, and Traffic Jams: Explorations in Massively Parallel Microworlds. Cambridge, Massachusetts; London, England: A Bradford Book, The MIT Press.

Rivet, J-P. & Boon, J.P. (2001). Lattice Gas Hydrodynamics. Cambridge University Press.

Sloot, P.M.A., Chopard, B. & Hoekstra, A.G. (Eds.) (2004). Cellular Automata: 6th International Conference on Cellular Automata for Research and Industry, ACRI 2004, Amsterdam, The Netherlands. Lecture Notes in Computer Science, 3305. Heidelberg: Springer Verlag.

Sloot, P.M.A. & Hoekstra, A.G. (2007). Modeling Dynamic Systems with Cellular Automata. In P.A. Fishwick (Ed.), Handbook of Dynamic System Modeling. Chapman & Hall/CRC Computer and Information Science, Taylor & Francis Group.

Sloot, P.M.A., Ivanov, S.V., Boukhanovsky, A.V., van de Vijver, D. & Boucher, C. (2007). Stochastic Simulation of HIV Population Dynamics through Complex Network Modeling. International Journal of Computer Mathematics (accepted).

Sloot, P.M.A., Overeinder, B.J. & Schoneveld, A. (2001). Self-organized criticality in simulated correlated systems. Computer Physics Communications, 142, 76-81.

Toffoli, T. (1984). Cellular automata as an alternative to (rather than an approximation of) differential equations in modelling physics. Physica D, 10, 117-127.

Toffoli, T. & Margolus, N. (1987). Cellular Automata Machines: A New Environment for Modeling. Cambridge, Massachusetts; London, England: The MIT Press.

Vichniac, G.Y. (1984). Simulating physics with cellular automata. Physica D, 10, 96-116.

von Neumann, J. (1951). The general and logical theory of automata. In L.A. Jeffress (Ed.), Cerebral Mechanisms in Behavior: The Hixon Symposium.

von Neumann, J. (1966). Theory of Self-Reproducing Automata. University of Illinois Press.

Wolfram, S. (1994). Cellular Automata and Complexity: Collected Papers. New York: Addison-Wesley Publishing Company.

Wolfram, S. (2002). A New Kind of Science. Champaign: Wolfram Media Inc.

KEY TERMS

Cellular Automaton (plural: cellular automata): A cellular automaton is defined as a lattice (network) of cells (automata), where each automaton contains a set of discrete variables that are updated in discrete time steps according to a local rule operating over the neighbours of a given cell. Cellular automata are typically used as simplified but not simple models of complex systems.

Generalized Cellular Automaton: A cellular automaton based on the use of networks instead of regular lattices.

Complex Network: Most biological and social networks exhibit topological properties not observed in simple networks (regular, random). Two examples are small-world and scale-free networks.

Complex System: A typical complex system consists of a vast number of identical copies of several generic processes, which operate and interact only locally or with a limited number of not necessarily close neighbours. There is no global leader or controller associated with such systems, and the resulting behaviour is usually very complex.

Emergence: Emergence is defined as the occurrence of new processes operating at a higher level of abstraction than the level at which the local rules operate. A typical example is an ant colony, where a large complex structure emerges through the local interactions of ants. For example, a whole hierarchy of emergents exists and operates in the human body. An emergent is the product of an emergence process.

Lattice Gas Automata: Typically, a triangular network of vertices interconnected by edges, in which generalized liquid particles move and undergo collisions. Averaged quantities resulting from such simulations correspond to solutions of the Navier-Stokes equations.

Modelling: A description of naturally observed phenomena using analytical, numerical, and/or computational methods. Computational modelling is classically used in fields such as physics and engineering; its importance is increasing in other fields such as biology, medicine, sociology, and psychology.

Random Network: A neighbourhood of a vertex is created by a set of randomly chosen links to neighbouring vertices (elements) within a network of vertices.

Regular Lattice: A perfectly regular and uniform neighbourhood of each lattice element, called a cell, characterizes such lattices.

Self-Organization: Self-organization is a process, typically occurring within complex systems, in which a system continuously fed by energy is transformed into a new system state or operational mode by a dissipation of energy and/or information.

Self-Organized Criticality: A complex system expressing SOC is continuously fed by energy, the release of which is discrete and typically occurs in the form of avalanches. Most of the time, a SOC system operates at a critical point where avalanches occur. Earthquakes and volcanic eruptions represent prototypical examples of SOC observed in many natural phenomena.

Small-World Network: A mixture of two different types of connections within each neighbourhood characterizes small-world networks. Typically, the neighbourhood of a given vertex is composed of a greater fraction of neighbours with regular short-range connectivity (regular network) and a smaller fraction of random connections (random network). This type of neighbourhood provides unique properties to each model built on top of it.
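The mixture of regular short-range links and occasional random links that defines a small-world network can be sketched as follows (a Watts-Strogatz-style illustration; the parameters are arbitrary, not taken from the article):

```python
import random

def small_world(n=20, k=2, p=0.1, seed=1):
    """Build an edge set where each vertex links to its k nearest ring
    neighbours, but each link is rewired to a random vertex with
    probability p (mixing regular and random connections)."""
    rng = random.Random(seed)
    edges = set()
    for v in range(n):
        for offset in range(1, k + 1):
            u = (v + offset) % n          # regular short-range neighbour
            if rng.random() < p:          # occasional random rewiring
                u = rng.randrange(n)
                while u == v:
                    u = rng.randrange(n)
            edges.add((min(v, u), max(v, u)))
    return edges

edges = small_world()
print(len(edges))
```

A generalized CA would then place one automaton on each vertex and apply its local rule over the vertex's neighbours in this edge set instead of a lattice neighbourhood.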
Complex-Valued Neural Networks

Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

The usual real-valued artificial neural networks have been applied to various fields such as telecommunications, robotics, bioinformatics, image processing and speech recognition, in which complex numbers (two dimensions) are often used with the Fourier transformation. This indicates the usefulness of complex-valued neural networks, whose input and output signals and parameters such as weights and thresholds are all complex numbers, and which are an extension of the usual real-valued neural networks. In addition, in the human brain, an action potential may have different pulse patterns, and the distance between pulses may be different. This suggests that it is appropriate to introduce complex numbers representing phase and amplitude into neural networks.

Aizenberg, Ivaskiv, Pospelov and Hudiakov (1971) (former Soviet Union) proposed a complex-valued neuron model for the first time, and although it was only available in the Russian literature, their work can now be read in English (Aizenberg, Aizenberg & Vandewalle, 2000). Prior to that time, most researchers other than Russians had assumed that the first persons to propose a complex-valued neuron were Widrow, McCool and Ball (1975). Interest in the field of neural networks started to grow around 1990, and various types of complex-valued neural network models were subsequently proposed. Since then, their characteristics have been researched, making it possible to solve some problems which could not be solved with the real-valued neuron, and to solve many complicated problems more simply and efficiently.

BACKGROUND

The generic definition of a complex-valued neuron is as follows. The input signals, weights, thresholds and output signals are all complex numbers. The net input Un to a complex-valued neuron n is defined as:

Un = Σm Wnm Xm + Vn    (1)

where Wnm is the complex-valued weight connecting complex-valued neurons n and m, Xm is the complex-valued input signal from the complex-valued neuron m, and Vn is the complex-valued threshold of the neuron n. The output value of the neuron n is given by fC(Un), where fC : C → C is called the activation function (C denotes the set of complex numbers). Various types of activation functions used in the complex-valued neuron have been proposed; they influence the properties of the complex-valued neuron, and a complex-valued neural network consists of such complex-valued neurons.

For example, the component-wise activation function, or real-imaginary type activation function, is often used (Nitta & Furuya, 1991; Benvenuto & Piazza, 1992; Nitta, 1997), which is defined as follows:

fC(z) = fR(x) + i·fR(y)    (2)

where fR(u) = 1/(1+exp(−u)), u ∈ R (R denotes the set of real numbers), i denotes √−1, and the net input Un is converted into its real and imaginary parts as follows:

Un = x + iy = z.    (3)

That is, the real and imaginary parts of an output of a neuron are the sigmoid functions of the real part x and the imaginary part y of the net input z to the neuron, respectively.

Note that the component-wise activation function (eqn (2)) is bounded but non-regular as a complex-valued function because the Cauchy-Riemann equations do not hold. Here, as several researchers have pointed out (Georgiou & Koutsougeras, 1992; Nitta, 1997), in the complex region we should recall Liouville's theorem, which states that if a function g is regular at all z ∈ C and bounded, then g is a constant function. That is, we need to choose either regularity or boundedness for an activation function of complex-valued neurons. In addition, it has been proved that the complex-valued neural network with the component-wise activation function (eqn (2)) can approximate any continuous complex-valued function, whereas a network with a regular activation function (for example, fC(z) = 1/(1+exp(−z)) (Kim & Guest, 1990), or fC(z) = tanh(z) (Kim & Adali, 2003)) cannot approximate any non-regular complex-valued function (Arena, Fortuna, Re & Xibilia, 1993; Arena, Fortuna, Muscato & Xibilia, 1998). That is, the complex-valued neural network with the non-regular activation function (eqn (2)) is a universal approximator, but a network with a regular activation function is not. It should be noted here that the complex-valued neural network with a regular complex-valued activation function with poles, such as fC(z) = tanh(z), can be a universal approximator on the compact subsets of the deleted neighbourhood of the poles (Kim & Adali, 2003). This fact is very important theoretically; unfortunately, however, the complex-valued neural network used in that analysis is not the usual one, that is, the output of a hidden neuron is defined as the product of several activation functions. Thus, the statement seems insufficient for comparison with the case of the component-wise complex-valued activation function. Altogether, the ability of complex-valued neural networks to approximate complex-valued functions depends heavily on the regularity of the activation functions used.

On the other hand, several complex-valued activation functions based on polar coordinates have been proposed. For example, Hirose (1992) proposed the following amplitude-phase type activation function:

fC(z) = tanh(|z|/m)·exp(iθ),  z = |z|·exp(iθ)    (4)

where m is a constant. Although this amplitude-phase activation function is not regular, Hirose noted that the non-regularity did not cause serious problems in real applications and that the amplitude-phase framework is suitable for applications in many engineering fields such as optical information processing systems, and amplitude modulation, phase modulation and frequency modulation in electromagnetic wave communications and radar. Aizenberg et al. (2000) proposed the following activation function:

fC(z) = exp(i·2πj/k),  if 2πj/k ≤ θ < 2π(j+1)/k,  where z = exp(iθ), j = 0, 1, …, k−1    (5)

where k is a constant. Eqn (5) can be regarded as a type of amplitude-phase activation function. Only phase information is used and the amplitude information is discarded; however, many successful applications show that this activation function is sufficient.

INHERENT PROPERTIES OF THE MULTI-LAYERED TYPE COMPLEX-VALUED NEURAL NETWORK

This article presents the essential differences between multi-layered type real-valued neural networks and multi-layered type complex-valued neural networks, which are very important because they expand the real application fields of the multi-layered type complex-valued neural networks. To the authors' knowledge, the inherent properties of complex-valued neural networks with regular complex-valued activation functions have not been revealed so far, except for their learning performance. Thus, only the inherent properties of the complex-valued neural network with the non-regular complex-valued activation function (eqn (2)) are described here: (a) the learning performance, (b) the ability to transform geometric figures, and (c) the orthogonal decision boundary.

Learning Performance

In applications of multi-layered type real-valued neural networks, the error back-propagation learning algorithm (called here Real-BP) (Rumelhart, Hinton & Williams, 1986) has often been used. Naturally, the complex-valued version of the Real-BP (called here Complex-BP) can be considered, and was actually proposed by several researchers (Kim & Guest, 1990; Nitta & Furuya, 1991; Benvenuto & Piazza, 1992; Georgiou & Koutsougeras, 1992; Nitta, 1993, 1997; Kim & Adali, 2003). This algorithm enables the network to learn complex-valued patterns naturally.
It is known that the learning speed of the Complex-BP algorithm is faster than that of the Real-BP algorithm. Nitta (1991, 1997) showed in some experiments on learning complex-valued patterns that the learning speed is several times faster than that of the conventional technique, while the space complexity (i.e., the number of learnable parameters needed) is only about half that of Real-BP. Furthermore, De Azevedo, Travessa and Argoud (2005) applied the Complex-BP algorithm of the literature (Nitta, 1997) to the recognition and classification of epileptiform patterns in EEG, in particular dealing with spike and eye-blink patterns, and reconfirmed the superiority of the learning speed of the Complex-BP described above. As for the regular complex-valued activation function, Kim and Adali (2003) compared the learning speed of the Complex-BP using nine regular complex-valued activation functions with that of the Complex-BP using three non-regular complex-valued activation functions (including eqn (4)) through a computer simulation of a simple nonlinear system identification example. The experimental results suggested that the Complex-BP with the regular activation function fC(z) = arcsinh(z) was the fastest among them.

Ability to Transform Geometric Figures

The Complex-BP with the non-regular complex-valued activation function (eqn (2)) can transform geometric figures, e.g. rotation, similarity transformation and parallel displacement of straight lines, circles, etc., whereas the Real-BP cannot (Nitta, 1991, 1993, 1997). Numerical experiments suggested that the behaviour of a Complex-BP network which learned the transformation of geometric figures was related to the Identity Theorem in complex analysis.

Only an illustrative example on rotation is given below. In the computer simulation, a 1-6-1 three-layered complex-valued neural network was used, which transformed a point (x, y) into (x′, y′) in the complex plane. Although the Complex-BP network generates a value z within the range 0 < Re[z], Im[z] < 1 due to the activation function used (eqn (2)), for the sake of convenience it is presented in the figure given below as having a transformed value within the range −1 < Re[z], Im[z] < 1. The learning rate used in the experiment was 0.5. The initial real and imaginary components of the weights and the thresholds were chosen to be random real numbers between 0 and 1. The experiment consisted of two parts: a training step, followed by a test step. The training step consisted of learning a set of (complex-valued) weights and thresholds, such that the input set of (straight-line) points (indicated by black circles in Fig. 1) gave as output the (straight-line) points (indicated by white circles) rotated counterclockwise over π/2 radians. Input and output pairs were presented 1,000 times in the training step. These complex-valued weights and thresholds were then used in a test step, in which the input points lying on a straight line (indicated by black triangles in Fig. 1) would hopefully be mapped to an output set of points lying on the straight line (indicated by white triangles) rotated counterclockwise over π/2 radians. The actual output test points for the Complex-BP did, indeed, lie on the straight line (indicated by white squares). It appears that the complex-valued network has learned to generalize the transformation of each point Zk (= rk exp[iθk]) into Zk exp[iα] (= rk exp[i(θk + α)]), i.e., the angle of each complex-valued point is updated by a complex-valued factor exp[iα], while the absolute length of each input point is preserved. In the above experiment, the 11 training input points lay on the line y = −x + 1 (0 ≤ x ≤ 1) and the 11 training output points lay on the line y = x + 1 (−1 ≤ x ≤ 0). The seven test input points lay on the line y = 0.2 (−0.9 ≤ x ≤ −0.3). The desired output test points should lie on the line x = −0.2.

Watanabe, Yazawa, Miyauchi and Miyauchi (1994) applied the Complex-BP in the field of computer vision. They successfully used the ability of the Complex-BP network to transform geometric figures to complement the 2D velocity vector field on an image, which was derived from a set of images and is called an optical flow. The ability of the Complex-BP to transform geometric figures can also be used to generate fractal images. Actually, Miura and Aiyoshi (2003) applied the Complex-BP to the generation of fractal images and showed in computer simulations that some fractal images such as snow crystals could be obtained with high accuracy, where the iterated function systems (IFS) were constructed using the ability of the Complex-BP to transform geometric figures.

Orthogonal Decision Boundary

The decision boundary of the complex-valued neuron with the non-regular complex-valued activation function (eqn (2)) has a different structure from that
their generalization. Kibernetika (Cybernetics), 4, 44-51 (in Russian).

Arena, P., Fortuna, L., Muscato, G., & Xibilia, M. G. (1998). Neural networks in multidimensional domains. Lecture Notes in Control and Information Sciences, 234. London: Springer.

Arena, P., Fortuna, L., Re, R., & Xibilia, M. G. (1993). On the capability of neural networks with complex neurons in complex valued functions approximation. Proc. IEEE Int. Conf. on Circuits and Systems, 2168-2171.

Benvenuto, N., & Piazza, F. (1992). On the complex backpropagation algorithm. IEEE Trans. Signal Processing, 40(4), 967-969.

Buchholz, S., & Sommer, G. (2001). Introduction to neural computation in Clifford algebra. In Sommer, G. (Ed.), Geometric Computing with Clifford Algebras (pp. 291-314). Berlin Heidelberg: Springer.

De Azevedo, F. M., Travessa, S. S., & Argoud, F. I. M. (2005). The investigation of complex neural network on epileptiform pattern classification. Proc. 3rd European Medical and Biological Engineering Conference (EMBEC'05), 2800-2804.

Georgiou, G. M., & Koutsougeras, C. (1992). Complex domain backpropagation. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 39(5), 330-334.

Hirose, A. (1992). Continuous complex-valued back-propagation learning. Electronics Letters, 28(20), 1854-1855.

Hirose, A. (2003). Complex-Valued Neural Networks. Singapore: World Scientific Publishing.

Isokawa, T., Kusakabe, T., Matsui, N., & Peper, F. (2003). Quaternion neural network and its application. In Palade, V., Howlett, R. J., & Jain, L. C. (Eds.), Lecture Notes in Artificial Intelligence, 2774 (KES 2003) (pp. 318-324). Berlin Heidelberg: Springer.

Kim, M. S., & Guest, C. C. (1990). Modification of backpropagation networks for complex-valued signal processing in frequency domain. Proc. Int. Joint Conf. on Neural Networks, 3, 27-31.

Kim, T., & Adali, T. (2003). Approximation by fully complex multilayer perceptrons. Neural Computation, 15(7), 1641-1666.

Miura, M., & Aiyoshi, E. (2003). Approximation and designing of fractal images by complex neural networks. IEEJ Trans. on Electronics, Information and Systems, 123(8), 1465-1472 (in Japanese).

Nitta, T., & Furuya, T. (1991). A complex back-propagation learning. Transactions of Information Processing Society of Japan, 32(10), 1319-1329 (in Japanese).

Nitta, T. (1993). A complex numbered version of the back-propagation algorithm. Proc. World Congress on Neural Networks, 3, 576-579.

Nitta, T. (1997). An extension of the back-propagation algorithm to complex numbers. Neural Networks, 10(8), 1392-1415.

Nitta, T. (2004a). Orthogonality of decision boundaries in complex-valued neural networks. Neural Computation, 16(1), 73-97.

Nitta, T. (2004b). A solution to the 4-bit parity problem with a single quaternary neuron. Neural Information Processing Letters and Reviews, 5(2), 33-39.

Nitta, T. (2006). Three-dimensional vector valued neural network and its generalization ability. Neural Information Processing Letters and Reviews, 10(10), 237-242.

Nitta, T. (2007). N-dimensional vector neuron. Proc. IJCAI Workshop on Complex-Valued Neural Networks and Neuro-Computing: Novel Methods, Applications and Implementations, 2-7.

Pearson, J., & Bisset, D. (1992). Back propagation in a Clifford algebra. Proc. International Conference on Artificial Neural Networks, Brighton.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Parallel Distributed Processing (Vol. 1). MA: The MIT Press.

Watanabe, A., Yazawa, N., Miyauchi, A., & Miyauchi, M. (1994). A method to interpret 3D motions using neural networks. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E77-A(8), 1363-1370.

Widrow, B., McCool, J., & Ball, M. (1975). The complex LMS algorithm. Proceedings of the IEEE, 63(4), 719-720.
Figure 1. Rotation of a straight line. A black circle denotes an input training point, a white circle an output train-
ing point, a black triangle an input test point, a white triangle a desired output test point, and a white square an
output test point generated by the Complex-BP network.
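The transformation the Complex-BP generalizes in this experiment — each point z = r·exp(iθ) mapped to z·exp(iα) = r·exp(i(θ+α)) with α = π/2 — can be verified directly for the training and test lines quoted above (a numerical check of the geometry, not a neural network):

```python
import cmath
import math

def rotate(z, alpha=math.pi / 2):
    """Map z = r*exp(i*theta) to z*exp(i*alpha) = r*exp(i*(theta+alpha))."""
    return z * cmath.exp(1j * alpha)

# Training inputs on y = -x + 1 (0 <= x <= 1) land on y = x + 1
# (-1 <= x <= 0) after a counterclockwise rotation over pi/2 radians.
for k in range(11):
    x = k / 10
    w = rotate(complex(x, -x + 1))
    assert abs(w.imag - (w.real + 1)) < 1e-9
    assert -1 - 1e-9 <= w.real <= 1e-9

# Test inputs on y = 0.2 (-0.9 <= x <= -0.3) land on the line x = -0.2.
w = rotate(complex(-0.6, 0.2))
print(round(w.real, 6))  # -0.2
```

This confirms that the output lines described in the experiment are exactly the π/2 rotations of the input lines, which is the mapping the network is said to have generalized.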
INTRODUCTION

The typical recognition/classification framework in Artificial Vision uses a set of object features for discrimination. Features can be either numerical measures or nominal values. Once obtained, these feature values are used to classify the object. The output of the classification is a label for the object (Mitchell, 1997).

The classifier is usually built from a set of training samples. This is a set of examples that comprise feature values and their corresponding labels. Once trained, the classifier can produce labels for new samples that are not in the training set.

Obviously, the extracted features must be discriminative. Finding a good set of features, however, may not be an easy task. Consider, for example, the face recognition problem: recognize a person using the image of his/her face. This is currently a hot topic of research within the Artificial Vision community; see the surveys (Chellappa et al., 1995), (Samal & Iyengar, 1992) and (Chellappa & Zhao, 2005). In this problem, the available features are all of the pixels in the image. However, only a number of these pixels are normally useful for discrimination. Some pixels are background, hair, shoulders, etc. Even inside the head zone of the image some pixels are less useful than others. The eye zone, for example, is known to be more informative than the forehead or cheeks (Wallraven et al., 2005). This means that some features (pixels) may actually increase recognition error, for they may confuse the classifier.

Apart from performance, from a computational cost point of view it is desirable to use a minimum number of features. If fed with a large number of features, the classifier will take too long to train or classify.

Figure 1.

BACKGROUND

Feature Selection aims at identifying the most informative features. Once we have a measure of informativeness for each feature, a subset of them can be used for classifying. In this case, the features remain the same; only a selection is made. The topic of feature selection has been extensively studied within the Machine Learning community (Duda et al., 2000). Alternatively, in Feature Extraction a new set of features is created from the original set. In both cases the objective is both reducing the number of available features and using the most discriminative ones.

The following sections describe two techniques for Feature Extraction: Principal Component Analysis and Independent Component Analysis. Linear Discriminant Analysis (LDA) is a similar dimensionality reduction technique that will not be covered here for space reasons; we refer the reader to the classical text (Duda et al., 2000).
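The contrast drawn above — selection keeps a subset of the original features unchanged, while extraction builds new ones — can be sketched with a toy scoring rule (per-feature variance; purely illustrative, not the measure used in the article):

```python
def select_features(samples, k):
    """Feature selection: keep the k original features whose values vary
    most across the samples; the kept features are unchanged."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    variances = [sum((s[j] - means[j]) ** 2 for s in samples) / n
                 for j in range(d)]
    keep = sorted(range(d), key=lambda j: variances[j], reverse=True)[:k]
    return [[s[j] for j in keep] for s in samples], keep

# Three samples with three features; feature 0 varies most, feature 1 not at all.
samples = [[1.0, 5.0, 0.1], [2.0, 5.0, 0.1], [3.0, 5.0, 0.2]]
reduced, keep = select_features(samples, 1)
print(keep)  # [0]
```

Feature extraction techniques such as PCA, described next, instead replace the original features with linear combinations of all of them.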
Component Analysis in Artificial Vision
As an example problem we will consider face recognition. The face recognition problem is particularly interesting here for a number of reasons. First, it is a topic of increasingly active research in Artificial Vision, with potential applications in many domains. Second, it has images as input (see Figure 1, from the Yale Face Database (Belhumeur et al., 1997)), which means that some kind of feature processing/selection must be done previous to classification.

PRINCIPAL COMPONENT ANALYSIS

Principal Component Analysis (PCA), see (Turk & Pentland, 1991), is an orthogonal linear transformation of the input feature space. PCA transforms the data to a new coordinate system in which the data variances along the new dimensions are maximized. Figure 2 shows a 2-class set of samples in a 2-feature space. These data have a certain variance along the horizontal and vertical axes. PCA maps the samples to a new orthogonal coordinate system, shown in bold, in which the sample variances are maximized. The new coordinate system is centered on the data mean.

The new set of features (note that the coordinate axes are features) is better from a discrimination point of view, for samples of the two classes can be readily separated. Besides, the PCA transform provides an ordering of features, from the most discriminative (in terms of variance) to the least. This means that we can select and use only a subset of them. In Figure 2, for example, the coordinate with the largest axis is the most discriminative.

When the input space is an image, as in face recognition, training images are stored in a matrix T. Each row of T contains a training image (the image rows are laid consecutively, forming a vector). Thus, each image pixel is considered a feature. Let there be n training images. PCA can then be done in the following steps:

1. Subtract the mean image vector m from T, where:

   m = (1/n) Σi xi

2. Calculate the covariance matrix C:

   C = (1/n) Σi (xi − m)(xi − m)ᵀ

3. Perform Singular Value Decomposition over C, which gives an orthogonal transform matrix W

Figure 2.
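The PCA steps listed above can be sketched with numpy (a toy 2-feature example rather than face images; `T` holds one training vector per row):

```python
import numpy as np

T = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [3.0, 1.0],
              [1.0, 2.0]])
n = T.shape[0]

m = T.mean(axis=0)            # step 1: mean vector of the training set
X = T - m                     # subtract the mean from every row
C = (X.T @ X) / n             # step 2: covariance matrix
W, S, _ = np.linalg.svd(C)    # step 3: SVD gives the orthogonal transform W

Y = X @ W                     # samples expressed in the new coordinate system
print(S)                      # variances along the new axes, largest first
```

The columns of W are the new coordinate axes; since the singular values come out sorted, keeping only the first few columns of W performs the variance-based feature ordering and selection mentioned in the text.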
FUTURE TRENDS

PCA has been shown to be an invaluable tool in Artificial Vision. Since the seminal work of (Turk & Pentland, 1991), PCA-based methods are considered standard baselines in the problem of face recognition. Many other techniques have evolved from it: robust PCA, nonlinear PCA, incremental PCA, kernel PCA, probabilistic PCA, etc.

As mentioned above, ICA can be considered an extension of PCA. Some authors have shown that in certain cases the ICA transformation does not provide a performance gain over PCA when a good classifier is used (like Support Vector Machines), see (Déniz et al, 2003). This may be of practical significance, since PCA is faster than ICA. ICA is not being used as extensively within the Artificial Vision community as it is in other disciplines like signal processing, especially where the problem of interest is signal separation.

On the other hand, Graph Embedding (Yan et al, 2005) is a recently proposed framework that constitutes an elegant generalization of PCA, LDA (Linear Discriminant Analysis), LPP (Locality Preserving Projections) and other dimensionality reduction techniques. As well as providing a common formulation, it facilitates the design of new dimensionality reduction algorithms based on new criteria.

CONCLUSION

Component analysis is a useful tool for Artificial Vision researchers. PCA, in particular, can now be considered indispensable to reduce the high dimensionality of images. Both computation time and error ratios can be reduced. This article has described both PCA and the related technique ICA, focusing on their application to the face recognition problem.

Both PCA and ICA act as a feature extraction stage, previous to training and classification. ICA is computa-

REFERENCES

Bartlett, M.S. & Sejnowski, T.J. (1997). Independent components of face images: a representation for face recognition. In: Proceedings of the 4th Annual Joint Symposium on Neural Computation, Pasadena, CA.

Belhumeur, P.N., Hespanha, J.P. & Kriegman, D.J. (1997). Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711-720.

Bell, A. & Sejnowski, T. (1995). An information maximization approach to blind separation and blind deconvolution. Neural Computation, 7, 1129-1159.

Chellappa, R., Wilson, C. & Sirohey, S. (1995). Human and machine recognition of faces: a survey. Proceedings of the IEEE, 83(5), 705-740.

Chellappa, R. & Zhao, W., Eds. (2005). Face Processing: Advanced Modeling and Methods. Elsevier.

Comon, P. (1994). Independent component analysis, a new concept? Signal Processing, 36, 287-314.

Déniz, O., Castrillón, M. & Hernández, M. (2003). Face Recognition using Independent Component Analysis and Support Vector Machines. Pattern Recognition Letters, 24(13), 2153-2157.

Duda, R.O., Hart, P.E. & Stork, D.G. (2000). Pattern Classification (2nd Edition). Wiley.

Gävert, H., Hurri, J., Särelä, J. & Hyvärinen, A. (2005). The FastICA MATLAB package. Available from http://www.cis.hut.fi/projects/ica/fastica/.

Jutten, C. & Hérault, J. (1991). Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24, 1-10.

Mitchell, T. (1997). Machine Learning. McGraw Hill.
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Computational Methods in Biomedical Imaging
but are typically characterized by a spatial resolution which is lower (sometimes much lower) than the one of anatomical imaging. Emission tomographies like Single Photon Emission Computerized Tomography (SPECT) (Duncan, 1997) or Positron Emission Tomography (PET) (Valk, Bailey, Townsend & Maisey, 2004) and Magnetic Resonance Imaging in its functional setup (fMRI) (Huettel, Song & McCarthy, 2004) are examples of these dynamical techniques, together with Electro- and Magnetoencephalography (EEG and MEG) (Zschocke & Speckmann, 1993; Hamalainen, Hari, Ilmoniemi, Knuutila & Lounasmaa, 1993), which reproduce the neural activity at a millisecond time scale and in a completely non-invasive fashion.

In all these imaging modalities the correct mathematical modeling of the imaging problem, the formulation of computational algorithms for the solution of the model equations, and the application of image processing algorithms for data interpretation are the crucial steps which allow the extraction of the visual information from the measured raw data.

MAIN FOCUS

From a mathematical viewpoint the inverse problem of synthesizing the biological information in a visual form from the collected radiation is characterized by a peculiar pathology.

The concept of ill-posedness was introduced by Jacques Hadamard (Hadamard, 1923) to indicate mathematical problems whose solution does not exist for all data, or is not unique, or does not depend continuously on the data. In biomedical imaging this last feature has particularly deleterious consequences: indeed, the presence of measurement noise in the raw data may produce notable numerical instabilities in the reconstruction when naive approaches are applied.

Most (if not all) biomedical imaging problems are ill-posed inverse problems (Bertero & Boccacci, 1998) whose solution is a difficult mathematical task and often requires a notable computational effort. The first step toward the solution is represented by an accurate modeling of the mathematical relation between the biological organ to be imaged and the data provided by the imaging device. Under the most general assumptions the model equation is a non-linear integral equation, although, for several devices, the non-linear imaging equation can be reliably approximated by a linear model where the integral kernel encodes the impulse response of the instrument. Such linearization can be either performed through a precise technological realization, like in MRI, where acquisition is designed in such a way that the data are just the Fourier Transform of the object to be imaged; or obtained mathematically, by applying a sort of perturbation theory to the non-linear equation, like in diffraction tomography, whose model comes from the linearization of the scattering equation.

The second step toward image reconstruction is given by the formulation of computational methods for the reduction of the model equation. In the case of linear ill-posed inverse problems, a well-established regularization theory exists which attenuates the numerical instability related to ill-posedness while maintaining the biological reliability of the reconstructed image. Regularization theory is at the basis of most linear imaging modalities, and regularization methods can be formulated in both a probabilistic and a deterministic setting. Unfortunately an analogously well-established theory does not exist in the case of non-linear imaging problems, which therefore are often addressed by means of ad hoc techniques.

Once an image has been reconstructed from the data, a third step has to be considered, i.e. the processing of the reconstructed images for the extraction and interpretation of their information content. Three different problems are typically addressed at this stage:

Edge detection (Trucco & Verri, 1998). Computer vision techniques are applied in order to enhance the regions of the image where the luminous intensity changes sharply.

Image integration (Maintz & Viergever, 1998). In the clinical workflow several images of a patient are taken with different modalities and geometries. These images can be fused in an integrated model by recovering changes in their geometry.

Image segmentation (Acton & Ray, 2007). Partial volume effects make the interfaces between the different tissues extremely fuzzy, thus complicating the clinical interpretation of the restored images. An automatic procedure for the partitioning of the image in homogeneous pixel sets and for
the classification of the segmented regions is at the basis of any Computer Aided Diagnosis and therapy (CAD) software.

AI algorithms and, above all, machine learning play a crucial role in addressing these image processing issues. In particular, as a subfield of machine learning, pattern recognition provides a sophisticated description of the data which, in medical imaging, makes it possible to locate tumors and other pathologies, measure tissue dimensions, favor computer-aided surgery and study anatomical structures. For example, supervised approaches like backpropagation (Freeman & Skapura, 1991) or boosting (Schapire, 2003) accomplish classification tasks of the different tissues from the knowledge of previously interpreted images; while unsupervised methods like Self-Organizing Maps (SOM) (Kohonen, 2001), fuzzy clustering (De Oliveira & Pedrycz, 2007) and Expectation-Maximization (EM) (McLachlan & Krishnan, 1996) infer probabilistic information or identify clustering structures in sets of unlabeled images. From a mathematical viewpoint, several of these methods correspond more to heuristic recipes than to rigorously formulated and motivated procedures. However, since the last decade the theory of statistical learning (Vapnik, 1998) has appeared as the best candidate for a rigorous description of machine learning within a functional analysis framework.

FUTURE TRENDS

Among the main goals of recent biomedical imaging we point out the realization of tubes for data acquisition and computational methods for the reduction of beam hardening effects; electrophysiological and structural information on the brain can be collected by performing an EEG recording inside an MRI scanning, but also by using the structural information from MRI as prior information in the analysis of the EEG signal, accomplished in a Bayesian setting; finally, non-invasivity in colonoscopy can be obtained by utilizing the most recent acquisition design in X-ray tomography together with sophisticated software which allows virtual navigation within the bowel, electronic cleansing and automatic classification of cancerous and healthy tissues.

From a purely computational viewpoint, two important goals in machine learning applied to medical imaging are the development of algorithms for semi-supervised learning and for the automatic integration of genetic data with information coming from the acquired imagery.

CONCLUSION

Some aspects of recent biomedical imaging have been described from a computational science perspective. The biomedical image reconstruction problem has been discussed as an ill-posed inverse problem where the intrinsic numerical instability producing image artifacts can be reduced by applying sophisticated regularization methods. The role of image processing based on machine learning techniques has been described together with the main goals of recent biomedical imaging applications.
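As a minimal illustration of the regularization idea discussed in this article (a textbook sketch, not code from the source), Tikhonov regularization replaces the naive inversion of the linear model y = A x with the minimizer of ||A x - y||^2 + lam ||x||^2, which damps the noise amplification caused by ill-posedness:

```python
import numpy as np

def tikhonov(A, y, lam):
    """Tikhonov-regularized solution of the linear imaging equation y = A x.

    Solves the normal equations (A^T A + lam I) x = A^T y; lam > 0 trades
    data fidelity for stability, attenuating the numerical instability of
    the naive inverse when A is ill-conditioned.
    """
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)
```

In practice the regularization parameter lam is chosen from the noise level in the data, for instance by the discrepancy principle.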
sional Vector Microwave Tomography: Theory and Computational Experiments. Inverse Problems, 20, 1239-1259.

Cheney, M., Isaacson, D. & Newell, J. C. (1999). Electrical Impedance Tomography. SIAM Review, 41, 85-101.

De Oliveira, J. V. & Pedrycz, W. (2007). Advances in Fuzzy Clustering and its Applications. San Francisco: Wiley.

Duncan, R. (1997). SPECT Imaging of the Brain. Amsterdam: Kluwer.

Freeman, J. A. & Skapura, D. M. (1991). Neural Network Algorithms: Applications and Programming Techniques. Redwood City: Addison-Wesley.

Greenleaf, J., Gisvold, J. J. & Bahn, R. (1982). Computed Transmission Ultrasound Tomography. Medical Progress Through Technology, 9, 165-170.

Guo, P. & Devaney, A. J. (2005). Comparison of Reconstruction Algorithms for Optical Diffraction Tomography. Journal of the Optical Society of America A, 22, 2338-2347.

Haacke, E. M., Brown, R. W., Venkatesan, R. & Thompson, M. R. (1999). Magnetic Resonance Imaging: Physical Principles and Sequence Design. San Francisco: John Wiley.

Maintz, J. & Viergever, M. (1998). A Survey of Medical Image Registration. Medical Image Analysis, 2, 1-36.

McLachlan, G. & Krishnan, T. (1996). The EM Algorithm and Extensions. San Francisco: John Wiley.

Rost, F. & Oldfield, R. (2000). Fluorescence Microscopy: Photography with a Microscope. Cambridge: Cambridge University Press.

Schapire, R. E. (2003). The Boosting Approach to Machine Learning: an Overview. In Nonlinear Estimation and Classification, Denison, D. D., Hansen, M. H., Holmes, C., Mallick, B. & Yu, B., editors. Berlin: Springer.

Trucco, E. & Verri, A. (1998). Introductory Techniques for 3-D Computer Vision. Englewood Cliffs: Prentice Hall.

Valk, P. E., Bailey, D. L., Townsend, D. W. & Maisey, M. N. (2004). Positron Emission Tomography: Basic Science and Clinical Practice. Berlin: Springer.

Vapnik, V. (1998). Statistical Learning Theory. San Francisco: John Wiley.

Zschocke, S. & Speckmann, E. J. (1993). Basic Mechanisms of the EEG. Boston: Birkhaeuser.
Hadamard, J. (1923). Lectures on Cauchy's Problem in Partial Differential Equations. New Haven: Yale University Press.

Hamalainen, M., Hari, R., Ilmoniemi, R. J., Knuutila, J. & Lounasmaa, O. V. (1993). Magnetoencephalography: theory, instrumentation and applications to non-invasive studies of the working human brain. Reviews of Modern Physics, 65, 413-497.

Hounsfield, G. N. (1973). Computerised Transverse Axial Scanning (Tomography). I: Description of System. British Journal of Radiology, 46, 1016-1022.

Huettel, S. A., Song, A. W. & McCarthy, G. (2004). Functional Magnetic Resonance Imaging. Sunderland: Sinauer Associates.

Kohonen, T. (2001). Self-Organizing Maps. Berlin: Springer.

KEY TERMS

Computer Aided Diagnosis (CAD): The use of computers for the interpretation of medical images. Automatic segmentation is one of the crucial tasks of any CAD product.

Edge Detection: Image processing technique for enhancing the points of an image at which the luminous intensity changes sharply.

Electroencephalography (EEG): Non-invasive diagnostic tool which records the cerebral electrical activity by means of surface electrodes placed on the skull.

Ill-Posedness: Mathematical pathology of differential or integral problems, whereby the solution of the problem does not exist for all data, or is not unique, or does not depend continuously on the data. In computation, the numerical effects of ill-posedness are reduced by means of regularization methods.
Image Integration: In medical imaging, combination of different images of the same patient acquired with different modalities and/or according to different geometries.

Magnetic Resonance Imaging (MRI): Imaging modality based on the principles of nuclear magnetic resonance (NMR), a spectroscopic technique used to obtain microscopic chemical and physical information about molecules. MRI can be applied in both functional and anatomical settings.

Magnetoencephalography (MEG): Non-invasive diagnostic tool which records the cerebral magnetic activity by means of superconducting sensors placed on a helmet surrounding the brain.

Segmentation: Image processing technique for distinguishing the different homogeneous regions in an image.

Statistical Learning: Mathematical framework which utilizes functional analysis and optimization tools for studying the problem of inference.

Tomography: Imaging technique providing two-dimensional views of an object. The method is used in many disciplines and may utilize input radiation of different nature and wavelength. There exist X-ray, optical, microwave, diffraction and electrical impedance tomographies.
Computer Morphogenesis in Self-Organizing Structures
Considering the gene regulatory networks, the most relevant models are the following: the Kumar and Bentley model (Kumar S. & Bentley P.J. 2003), which uses Bentley's theory of fractal proteins (Bentley, P.J. 1999) for the calculation of protein concentration; the Eggenberger model (Eggenberger P. 1996), which uses the concepts of cellular differentiation and cellular movement to determine cell connections; and the work of Dellaert and Beer (Dellaert F. & Beer R.D. 1996), who propose a model that incorporates the idea of biological operons to control the model expression, where the function assumes the mathematical meaning of a Boolean function.

All these models can be regarded as special cellular automata. In cellular automata, a starting cell set in a certain state will turn into a different set of cells in different states when the same transition function (Conway J.H. 1971) is applied to all the cells during a determined lapse of time in order to control the message concurrence among them. The best known example of cellular automata is Conway's Game of Life, where this behaviour can be observed perfectly. Whereas the classical conception specifies the behaviour rules, the evolutionary models establish the rules by searching for a specific behaviour. This difference comes from the mathematical origin of cellular automata, whereas the models presented here are based on biology and embryology.

These models should not be confused with other concepts that might seem similar, such as Gene Expression Programming (GEP) (Ferreira C. 2006). Although GEP codifies the solution in a string, similarly to how it is done in the present work, the solution program is developed in a tree shape, as in classical genetic programming (Koza, J. et al. 1999), which has little or nothing in common with the presented models.

will be generated if the gene is transcribed, and the promoter, which identifies the proteins that are needed for gene transcription.

Another remarkable aspect of biological genes is the difference between constitutive genes and regulating genes. The latter are transcribed only when the proteins identified in the promoter part are present. The constitutive genes are always transcribed, unless inhibited by the presence of the proteins identified in the promoter part, which act then as gene repressors.

The present work has tried to partially model this structure with the aim of fitting some of its abilities into a computational model; in this way, the system would have a structure that is similar to the above, which will be detailed in the next section.

Proposed Model

Various model variants were developed on the basis of biological concepts. The proposed artificial cellular system is based on the interaction of artificial cells by means of messages that are called proteins. These cells can divide themselves, die, or generate proteins that will act as messages for themselves as well as for neighbouring cells.

The system is supposed to express a global behaviour towards the generation of structures in 2D. Such behaviour would emerge from the information encoded in a set of variables of the cell that, in analogy with the biological cells, will be named genes.

One promising application, on which we are working, could be the compact encoding of adaptive shapes, similar to the functioning of fractal growth or fractal image compression.
The central element of our model is the artificial cell. Every cell has binary string-encoded information for the regulation of its functioning. Following the biological analogy, this string will be called DNA. The cell also has a structure for the storage and management of the proteins generated by the cell itself and those received from neighbouring cells; following the biological model, this structure is called cytoplasm.

The DNA of the artificial cell consists of functional units that are called genes. Each gene encodes a protein or message (produced by the gene). The structure of a gene has four parts (see Figure 1):

Sequence: the binary string that corresponds to the protein that the gene encodes.
Promoters: the gene area that indicates the proteins that are needed for the gene's transcription.
Constituent: this bit identifies whether the gene is constituent or regulating.
Activation percentage (binary value): the minimal concentration percentage of the promoters' proteins inside the cell that causes the transcription of the gene.

The other fundamental element for keeping and managing the proteins that are received or produced by the artificial cell is the cytoplasm. The stored proteins have a certain lifetime before they are erased. The cytoplasm checks which and how many proteins are needed for the cell to activate the DNA genes, and as such responds to all the cellular requirements for the concentration of a given type of protein. The cytoplasm also extracts the proteins from the structure in case they are needed for a gene transcription.

Model Functioning

The functioning of genes is determined by their type, which can be constituent or regulating. The transcription of the encoded protein occurs when the promoters of the non-constituent genes appear at a certain rate in the cellular cytoplasm. On the other hand, the constituent genes are expressed during all the cycles until such expression is inhibited by the present rate of the promoter genes.

Protein Concentration Percentage >= (Distance + 1) * Activation Percentage    (1)

The activation of the regulating genes or the inhibition of the constituent genes is achieved if the condition expressed by Eq. (1) is fulfilled, where Protein Concentration Percentage represents the cytoplasm concentration of the protein that is being considered; Distance stands for the Hamming distance between one promoter and the considered protein; and Activation Percentage is the minimal percentage needed for the gene activation that is encoded in the gene. This equation is tested on each promoter and each protein; if the condition is fulfilled for all the promoters, that gene is activated and therefore transcribed. According to this, if gene-like promoters exist in a concentration higher than the encoded one, they can also induce its transcription, similarly to what happens in biology, therefore providing the model with higher flexibility.

After the activation of one of the genes, three things can happen: the generated protein may be stored in the cell cytoplasm, it may be communicated to the neighbouring cells, or it may induce cellular division (mitosis) and/or death (apoptosis). The different events of a tissue are managed in the cellular model by means of cellular cycles. Such cycles will contain all the actions that can be carried out by the cells, sometimes restricting their occurrence. The cellular cycles can be described as follows:

Actualisation of the lifetime of the proteins in the cytoplasm
Verification of the life status of the cell (cellular death)
Calculation of the genes that react, performing the special behaviour that may be associated to them
Communication between proteins

Solution Search

A classical approach of EC proposes the use of Genetic Algorithms (GA) (Fogel L.J., Owens A. J. & Walsh M.A. 1966; Goldberg D.E. 1989; Holland J.H. 1975) for the optimisation, in this case, of the values of the DNA genes (binary strands). Each individual of the GA population will represent a possible DNA strand for problem solving.

In order to calculate the fitness value of every individual in the GA, the DNA strand is introduced into an initial cell or zygote. After simulating during a
Figure 2. (Above) Three promoters and a PCS structure; (Below) Example of GA gene association for the encoding of cellular genes.

so that it could modify not only the number of these sections, but also the value of a given section.

The probability of executing the mutation is usually low, but this time it even had to be divided into the three possible mutation operations that the system contemplates. Various tests proved that the most suitable values for the distribution of the different mutation operations, after the selection of a position for mutation, were the following: for 20% of the opportunities, a sec-
template, will execute the NEXOR Boolean operation in order to obtain the number of differences between the template and the tissue. Each difference contributes a value of 1.0 to the tissue error.

Figure 3 illustrates the use of this method. We can observe that the error of this tissue with regard to the template is 2, since we generated a cell that is not contemplated by the template, whereas another cell that is present in the template is missing.

FUTURE TRENDS

The model could also include new characteristics such as the displacement of cells around their environment, or a specialisation operator that blocks pieces of DNA during the expression of its descendants, as happens in the natural model.

Finally, this group is currently working on one of the possible applications of this model: its use for image compression, similarly to how fractal compression works. Fractal compression searches for the parameters of a fractal formula that encodes the starting image. The present model searches for the gene sequence that might result in the starting image. In this way, the template-based method that has been presented in this paper can be used for performing that search, using the starting image as template.

CONCLUSION

Taking into account the model developed here, we can say that the use of certain properties of biological cellular systems is feasible for the creation of artificial structures that might be used to solve certain computational problems.

Some behaviours of the biological model have also been observed in the artificial model: information redundancy in DNA, stability after achieving the desired shape, or variability in gene behaviour.

REFERENCES

Bentley, P.J. & Kumar, S. (1999). Three ways to grow designs: A comparison of three embryogenies for an evolutionary design problem. In Proceedings of Genetic and Evolutionary Computation.

Conway, J.H. (1971). Regular Algebra and Finite Machines. London: Chapman and Hall.

Dellaert, F. & Beer, R.D. (1996). A Developmental Model for the Evolution of Complete Autonomous Agents. In From animals to animats: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Massachusetts, September 9-13, pp. 394-401. MIT Press.

Eggenberger, P. (1996). Cell Interactions as a Control Tool of Developmental Processes for Evolutionary Robotics. In From animals to animats: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Massachusetts, September 9-13, pp. 440-448. MIT Press.

Endo, K., Maeno, T. & Kitano, H. (2003). Co-evolution of morphology and walking pattern of biped humanoid robot using evolutionary computation: designing the real robot. ICRA 2003, 1362-1367.

Fernández-Blanco, E., Dorado, J., Rabuñal, J.R., Gestal, M. & Pedreira, N. (2007). A New Evolutionary Computation Technique for 2D Morphogenesis and Information Processing. WSEAS Transactions on Information Science & Applications, 4(3), 600-607. WSEAS Press.

Ferreira, C. (2006). Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence. Berlin: Springer.

Fogel, L.J., Owens, A.J. & Walsh, M.A. (1966). Artificial Intelligence through Simulated Evolution. New York: Wiley.

Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley.

Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor, MA: University of Michigan Press.

Kaneko, K. (2006). Life: An Introduction to Complex Systems Biology. Springer Complexity: Understanding Complex Systems. Springer Press.

Kauffman, S.A. (1969). Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology, 22, 437-467.
Kitano, H. (1994). Evolution of Metabolism for Morphogenesis. In Artificial Life IV: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, edited by Brooks, R. and Maes, P. Cambridge, MA: MIT Press.

Koza, J. et al. (1999). Genetic Programming III: Darwinian Invention and Problem Solving. Cambridge, MA: MIT Press.

Kumar, S. & Bentley, P.J. (editors) (2003). On Growth, Form and Computers. London, UK: Academic Press.

Kumar, S. (2004). Investigating Computational Models of Development for the Construction of Shape and Form. PhD Thesis, Department of Computer Science, University College London.

Lindenmayer, A. (1968). Mathematical models for cellular interaction in development: Parts I and II. Journal of Theoretical Biology, 18, 280-299 and 300-315.

Mjolsness, E., Sharp, D.H. & Reinitz, J. (1995). A Connectionist Model of Development. Journal of Theoretical Biology, 176, 291-300.

Stanley, K. & Miikkulainen, R. (2003). A Taxonomy for Artificial Embryogeny. In Proceedings Artificial Life 9, pp. 93-130. MIT Press.

Tufte, G. & Haddow, P.C. (2005). Towards Development on a Silicon-based Cellular Computing Machine. Natural Computing, 4, 387-416.

Turing, A. (1952). The chemical basis of morphogenesis. Philosophical Transactions of the Royal Society B, 237, 37-72.

Watson, J.D. & Crick, F.H. (1953). Molecular structure of Nucleic Acids. Nature, 171, 737-738.

KEY TERMS

Artificial Cell: Each of the elements that process the orders codified into the DNA.

Artificial Embryogeny: The term covers all the processing models which use biological development ideas as inspiration for their functioning.

Cellular Cycle: Cellular development time unit which limits the number of occurrences of certain cellular development actions.

Cytoplasm: Part of an artificial cell which is responsible for the management of the protein-shaped messages.

DNA: Set of rules which are responsible for the cell behaviour.

Gene: Each of the rules which codifies one action of the cell.

Protein: This term identifies every kind of message that an artificial cell receives.

Zygote: The initial cell from which a tissue is generated using the DNA information.
Computer Vision for Wave Flume Experiments
objectives, a non-intrusive method is necessary (due to the presence of objects inside the tank) and the method has to be able to differentiate between at least two different fluids, with the oil slick in view.

Other works using image analysis to measure the surface wave profile have been developed over the past ten years (e.g., Erikson and Hanson, 2005; García, Herranz, Negro, Varela & Flores, 2003; Javidi and Psaltis, 1999; Bonmarin, Rochefort & Bourguel, 1989; Zhang, 1996), but they were developed neither with a real-time approach nor as non-intrusive methods. In some of these techniques it is necessary to colour the water with a fluorescent dye (Erikson and Hanson, 2005), which is not convenient in most cases, and especially when two fluids must be used (Flores, Andreatta, Llona & Saavedra, 1998).

Figure 1. Template to image rectification. Crosses are equidistant with a 4 cm separation.
Construction and Civil Engineering (CITEEC) at the University of A Coruña, Spain. The flume section is 77 cm (height) x 59.6 cm (width). Wave generation is conducted by means of a piston-type paddle. A wave absorber is located near its end wall to prevent wave reflection. It consists of a perforated plate with a length of 3.04 m, which can be placed at different slopes. The experimental set-up is shown in fig. 2.

With the aim of validating the system, solitary waves were generated and measured on the basis of images recorded by a video camera mounted laterally, which captured a flume length of 1 m. The waves were also measured with one conductivity wave gauge located within the flume area recorded by the video camera. These gauges provide an accuracy of 1 mm at a maximum sampling frequency of 30 Hz.

A video camera, Sony DCR-HC35E, was used in turn to record the waves; it worked on the PAL Western Europe standard, with a resolution of 720 x 576 pixels, recording 25 frames per second.

The camera was mounted on a standard tripod and positioned approximately 2 m from the sidewall of the tank (see fig. 2). It remained fixed throughout the duration of a test. The procedure is as follows:

- Place one mark on the glass sidewall of the flume, on the bottom of the filmed area (see fig. 2);
- Place a template with equidistant marks (crosses) in a vertical plane parallel to the flume sidewall (see fig. 1);
- Position the camera at a distance from the target plane (i.e., tank sidewall) depending on the desired resolution;
- Adjust the camera taking into account the template marks;
- Provide uniform and frontal lighting for the template;
- Film the template;
- Provide uniform lighting on the target plane and a uniformly colored background on the opposite sidewall (to block any unwanted objects from the field of view);
- Start filming.

The mark was placed horizontally on the glass sidewall of the flume, on the bottom of the filmed area, in order to know a real distance between the bed of the tank and this mark; this avoids filming the bed of the tank and thus allows a smaller area to be filmed (leading to a better resolution).

With regard to the lighting of the laboratory, it is necessary to avoid direct lighting, so that the images are free of gleam and glints. To achieve this kind of lighting, all lights in the laboratory were turned off and two 200 W halogen lamps were placed on both sides of the filmed area, one facing the other (see fig. 2).

Video Image Post-Processing

Image capture was carried out on a PC, Pentium 4, 3.00 GHz and 1.00 GB of RAM, running the Windows XP platform. Since filming was done with the Sony DCR-HC35E^1, a high-speed interface card, IEEE 1394 FireWire(TM), was used to transfer digital data from the camcorder to the computer, and the still images were kept in the
uncompressed bitmap format so that information would not be lost. De-interlacing was not necessary because of the quality of the obtained images, and everything was done with a real-time approach.

An automatic tool for measuring waves from consecutive images was developed. The tool was developed under the .NET framework, using the C++ language and the OpenCV library (Open Source Computer Vision Library, developed by Intel^2). The computer vision procedure is as follows:

- Extract a frame from the video.
- Using different computer vision algorithms, obtain the pixel-to-mm constant for each pixel.
- Using different computer vision algorithms, obtain the crest of the wave.
- Work out the corresponding height for all the pixels in the crest of the wave.
- Supply results.
- Repeat the process until the video finishes.

With regard to obtaining the pixel-to-mm constant, a template with equidistant marks (crosses) is placed right up against the glass sidewall of the tank and is filmed. Then a C++ routine recognizes the centre of the crosses.

Results

A comparison of data extracted from video images with data measured by conventional instruments was done. The comparisons are not necessarily meant to validate the procedure, as there are inherent errors with conventional instruments as well; rather, the comparisons aim to justify the use of video images as an alternative method for measuring wave and profile change data.

Different isolated measurements with the conductivity gauge were taken at the same time the video camera was recording. Then the results from both methods were compared.

The process followed to measure with the conductivity gauge and the artificial vision system at the same time involves recognizing the point on the x-axis (in the recorded video) where the gauge is situated (a color mark was pasted around the gauge to make this task easier); after knowing the measure point of the gauge, we create a file with the height of the wave at this x point for each image in the video. While the video camera is recording, one file with the gauge measurements is created. Once we have both measurement files, we have to determine manually the same time point in both files (due to the difficulty of initializing both systems at the same time). Now we can compare both measurements.

A lot of tests were done with different values of the wave parameters (wave period and wave height), using both regular (sine-form) and irregular waves. Tests with waves between 40 mm and 200 mm in height were done.

Using the camera DCR-HC35E, figure 3 shows one example of a test done, where the used wave was an irregular one, with a maximum period of 1 s and a
maximum value for wave amplitude of 70 mm; excellent results were obtained, as can be seen in figure 4, where both measurements (conductivity sensor and video analysis) are quite similar.

The correlation between sensor and video image analysis measurements has an associated mean square error of 0.9948.

In spite of these sources of error, after several tests, the average error between conductivity sensor measurements and video analysis is 0.8 mm with camera DCR-HC35E, much better than the 5 mm average error obtained in the best work done until now (Erikson and Hanson, 2005). However, this is not an indicative error, because of the sources of error discussed in this study; the estimated real error of this video analysis system is 1 mm, that is to say, the equivalence between one pixel and a real distance: in our case (with the aforementioned video camera and distance from the tank) one pixel is equivalent to nearly 1 mm. This error could be improved with a camera which allows a better resolution, or by focusing on a smaller area.

FUTURE TRENDS

This is the first part of a bigger system which is capable of measuring the motions of a containment boom section along the vertical axis and its slope angle (Kim, Muralidharan, Kee, Jonson, & Seymour, 1998). Furthermore, the system would be capable of making a distinction between the water and a contaminant, and thus would identify the area occupied by each fluid.

Another challenge is to test this system in other work spaces with different light conditions (i.e., in a different wave flume).

CONCLUSION

An artificial vision system was developed for these targets because these systems are non-intrusive and can separate a lot of different objects or fluids (anything that a human eye can differentiate) in the image, and a non-intrusive method is necessary.

Other interesting aspects that these systems provide are:

- Cheaper price than traditional systems of measurement.
- Easier and faster to calibrate.
- It is unnecessary to mount an infrastructure to know what happens at different points of the tank (only one camera instead of an array of sensors).
- As the system is a non-intrusive one, it doesn't distort the experiments and their measurements.
- High accuracy.
- Finally, this system is an innovative idea of applying computer vision techniques to the civil engineering area, and specifically to the ports and coasts field. No similar works have been developed.

REFERENCES

Baglio, S. & Foti, E., 2003. Non-invasive measurements to analyze sandy bed evolution under sea waves action. IEEE Transactions on Instrumentation and Measurement, Vol. 52, Issue 3, pp. 762-770.

Bonmarin, P., Rochefort, R. & Bourguel, M., 1989. Surface wave profile measurement by image analysis. Experiments in Fluids, Vol. 7, No. 1, Springer Berlin / Heidelberg, pp. 17-24.

Erikson, L. H. & Hanson, H., 2005. A method to extract wave tank data using video imagery and its comparison to conventional data collection techniques. Computers & Geosciences, Vol. 31, pp. 371-384.

Flores, H., Andreatta, A., Llona, G. & Saavedra, I., 1998. Measurements of oil spill spreading in a wave tank using digital image processing. The 1998 1st International Conference on Oil and Hydrocarbon Spills, Modelling, Analysis and Control, Oil Spill, pp. 165-173.

García, J., Herranz, D., Negro, V., Varela, O. & Flores, J., 2003. Tratamiento por color y video [Processing by color and video]. VII Jornadas Españolas de Costas y Puertos.

Haralick, R. & Shapiro, L., 1992. Computer and Robot Vision. Vol. 1, Addison-Wesley Publishing Company, Chap. 5, pp. 174-185.

Harris, C. & Stephens, M. J., 1988. A combined corner and edge detector. In Alvey Vision Conference, pp. 147-152.
Holland, K.T., Holman, R.A. & Sallenger, A.H., 1991. Estimation of overwash bore velocities using video techniques. In: Proceedings of the Third International Symposium on Coastal Engineering and the Science of Coastal Sediment Processes, Seattle, WA, pp. 489-496.

Hu, M. K., 1962. Visual pattern recognition by moment invariants. IRE Trans. Inform. Theory, Vol. 8, pp. 179-187.

Hudon, T., 1992. Wave tank testing of small scale boom sections. Private communication.

Hughes, 1993. Laboratory wave reflection analysis using co-located gages. Coastal Engineering, Vol. 20, No. 3-4, pp. 223-247.

Ibáñez, O., Rabuñal, J., Castro, A., Dorado, J., Iglesias, G. & Pazos, A., 2007. A framework for measuring waves level in a wave tank with artificial vision techniques. WSEAS Transactions on Signal Processing, WSEAS Press, Volume 3, Issue 1, pp. 17-24.

Javidi, B. & Psaltis, D., 1999. Image sequence analysis of water surface waves in a hydraulic wind wave tank. Proceedings of SPIE, Vol. 3804, pp. 148-158.

Kawata, Y. & Tsuchiya, Y., 1988. Local scour around cylindrical piles due to waves and currents. Proc. 21st Coast. Eng. Conf., Vol. 2, ASCE, New York, pp. 1310-1322.

Kim, M. H., Muralidharan, S., Kee, S. T., Jonson, R. P. & Seymour, R. J., 1998. Seakeeping performance of a containment boom section in random waves and currents. Ocean Engng, Vol. 25, Nos. 2-3, pp. 143-172.

Konstantinos N. Plataniotis & Anastasios N. Venetsanopoulos, 2000. Color Image Processing and Applications. Springer.

Milgran, J. H., 1971. Forces and motions of a flexible floating barrier. Journal of Hydronautics, 5, pp. 41-51.

Mukundan, R. & Ramakrishman, K. R., 1998. Moment Functions in Image Analysis: Theory and Application.

Ojanen, H., 1999. Automatic correction of lens distortion by using digital image processing. http://www.iki.fi

Schmid, C., Mohr, R. & Bauckhage, C., 2000. Evaluation of interest point detectors. International Journal of Computer Vision, 37(2), pp. 151-172.

Tsai, R.Y., 1987. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. Journal of Robotics and Automation, RA(3), pp. 323-344.

Van Rijn, Grasmeijer & Ruessink, 2000. Measurement errors of instruments for velocity, wave height, sand concentration and bed levels in field conditions. Coast 3D, November.

Vernon, D., 1991. Machine Vision. Prentice-Hall, pp. 78-79.

Zhang, X., 1996. An algorithm for calculating water surface elevations from surface gradient image data. Experiments in Fluids, Vol. 21, No. 1, Springer Berlin / Heidelberg, pp. 43-48.

KEY TERMS

Color Spaces: (Konstantinos & Anastasios, 2000) Color spaces supply a method to specify, sort and handle colors. These representations match n-dimensional arrangements of the color sensations (n-component vectors). Colors are represented by means of points in these spaces. There are many color spaces, and all of them start from the same concept, the tri-chromatic theory of the primary colors red, green and blue.

Dilation: The dilation of an image f by a structuring element Y is defined as the maximum value of all the pixels situated under the structuring element:

\delta_Y(f)(x, y) = \max_{(s,t) \in Y} f(x + s, y + t).
As a result, regions in the image grow in size, while holes within those regions become smaller.

Erosion: The basic effect of the operator on a binary image is to reduce the size of the objects. The erosion at the point (x, y) is the minimum value of all the points situated under the window defined by the structuring element Y that travels around the image:

\epsilon_Y(f)(x, y) = \min_{(s,t) \in Y} f(x + s, y + t).

Harris Corner Detector: A popular interest point detector (Harris and Stephens, 1988), due to its strong invariance (Schmid, Mohr, & Bauckhage, 2000) to rotation, scale, illumination variation and image noise. The Harris corner detector is based on the local auto-correlation function of a signal, where the local auto-correlation function measures the local changes of the signal with patches shifted by a small amount in different directions.

Image Moments: (Hu, 1962; Mukundan and Ramakrishman, 1998) Certain particular weighted averages (moments) of the image pixel intensities, or functions of those moments, usually chosen to have some attractive property or interpretation. They are useful to describe objects after segmentation. Simple properties of the image which are found via image moments include its area (or total intensity), its centroid, and information about its orientation.

Morphological Operators: (Haralick and Shapiro, 1992; Vernon, 1991) Mathematical morphology is a set-theoretical approach to multi-dimensional digital signal or image analysis, based on shape. The signals are locally compared with so-called structuring elements of arbitrary shape with a reference point.

Videometrics: (Tsai, 1987) Videometrics can loosely be defined as the use of imaging technology to perform precise and reliable measurements of the environment.

ENDNOTES

1. http://www.sony.es/view/ShowProduct.action?product=DCR-C35E&site=odw_es_ES&pageType=Overview&category=CAM+MiniDV
2. http://www.intel.com/technology/computing/opencv/
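As an illustration of the dilation and erosion operators defined in the key terms above, the following minimal sketch (not the article's OpenCV-based code) evaluates both at a single pixel, skipping offsets of the structuring element that fall outside the image:

```cpp
#include <algorithm>
#include <limits>
#include <utility>
#include <vector>

using Image = std::vector<std::vector<int>>;
// Structuring element Y, as a list of (s, t) offsets around the reference point.
using Element = std::vector<std::pair<int, int>>;

bool inside(const Image& f, int u, int v) {
    return u >= 0 && u < static_cast<int>(f.size()) &&
           v >= 0 && v < static_cast<int>(f[0].size());
}

// Flat dilation at (x, y): max over (s, t) in Y of f(x+s, y+t).
int dilate_at(const Image& f, int x, int y, const Element& Y) {
    int best = std::numeric_limits<int>::min();
    for (const auto& st : Y)
        if (inside(f, x + st.first, y + st.second))
            best = std::max(best, f[x + st.first][y + st.second]);
    return best;
}

// Flat erosion at (x, y): min over (s, t) in Y of f(x+s, y+t).
int erode_at(const Image& f, int x, int y, const Element& Y) {
    int best = std::numeric_limits<int>::max();
    for (const auto& st : Y)
        if (inside(f, x + st.first, y + st.second))
            best = std::min(best, f[x + st.first][y + st.second]);
    return best;
}
```

On a binary image this reproduces the behaviour described above: dilation lets bright regions grow (a single bright pixel brightens its whole neighbourhood), while erosion shrinks them.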
Azzam Taktak
Royal Liverpool University Hospital, UK
Bertil Damato
Royal Liverpool University Hospital, UK
Angela Douglas
Liverpool Women's Hospital, UK
Sarah Coupland
Royal Liverpool University Hospital, UK
Conditional Hazard Estimating Neural Networks
Note that the prior is parametric, and the regularization parameters α_k (which are inverse variances) are called hyperparameters, because they control the distribution of the network parameters.

Note also that the prior is specialized for different groups of weights by using a different regularization parameter for each group; this is done to preserve the scaling properties of network mappings (Bishop, 1995), (MacKay, 1992). This prior is called Automatic Relevance Determination (ARD). This scheme defines a model whose prior over the parameters embodies the concept of relevance, so that the model is effectively able to infer which parameters are relevant based on the training data and then switch the others off (or at least reduce their influence on the overall mapping).

The ARD modeling scheme in the case of the CHENN model defines weight groups for the inputs, the output layer weights, and the biases.

Once the expressions for the prior and the noise model are given, we can evaluate the posterior:

p(w | D) = \frac{1}{Z} \exp\left( L - \frac{1}{2} \sum_k \alpha_k w_k^T w_k \right).

This distribution is usually very complex and multimodal (reflecting the nature of the underlying error function, the term -L), and the determination of the normalization factor (also called the evidence) is very difficult. Furthermore, the hyperparameters must be integrated out, since they are only used to determine the form of the distributions.

A solution is to integrate out the parameters separately from the hyperparameters by making a Gaussian approximation, and then searching for the mode with respect to the hyperparameters (Bishop, 1995), (MacKay, 1992). This procedure gives a good estimation of the probability mass attached to the posterior, in particular for distributions over high-dimensional spaces, which is the case for large networks.

The Gaussian approximation is in practice derived by finding a maximum of the posterior distribution, and then evaluating the curvature of the distribution around the maximum:

w_{MP} = \arg\max_w \left( L - \frac{1}{2} \sum_k \alpha_k w_k^T w_k \right),

A = -\nabla \nabla_w \left( L - \frac{1}{2} \sum_k \alpha_k w_k^T w_k \right) \Big|_{w = w_{MP}}.

The approximation thus is:

p(w | D) \approx \frac{1}{Z_{MP}} \exp\left( -\frac{1}{2} (w - w_{MP})^T A (w - w_{MP}) \right),

where the normalisation constant Z_{MP} is simply evaluated from the usual multivariate normal formulas.

The hyperparameters are calculated by finding the maximum of the approximate evidence Z_{MP}. Alternate maximization (by using a nonlinear optimization algorithm) of the posterior and the evidence is repeated until a self-consistent solution {w_{MP}, α_k} is found.

The full Bayesian treatment of inference implies that we do not simply get a pointwise prediction for functions f(x,t;w) of a model output, but a full distribution. Such predictive distributions have the form:

p(f(x, t) | D) = \int f(x, t | w) \, p(w | D) \, dw.

The above integrals are in general not analytically tractable, even when the posterior distribution over the parameters is Gaussian. However, it is usually enough to find the moments of the predictive distribution, in particular its mean and variance. A useful approximation is given by the delta method. Let f(w) be the function (of w) we wish to approximate. By Taylor expanding to first order around w_{MP}, we can write:

f(w) \approx f(w_{MP}) + (w - w_{MP})^T \nabla_w f(w) \big|_{w = w_{MP}}.

Since this is a linear function of w, it will still be normally distributed under the Gaussian posterior, with mean and variance:

E[f(w)] = f(w_{MP}),

Var[f(w)] = \nabla_w f^T A^{-1} \nabla_w f.

Error bars are simply obtained by taking the square root of the variance. We emphasize that it is important to evaluate first and second order information to understand the overall quality and reliability of a model's predictions. Error bars also provide hints on the distribution of the patterns (Williams et al., 1995) and can therefore be useful to understand whether a model is extrapolating its predictions. Furthermore, they can offer suggestions for the collection of future data (Williams et al., 1995; MacKay, 1992).
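The delta-method moments can be sketched in a few lines. The fragment below is an illustration, not the CHENN implementation; for simplicity it assumes the Hessian A is diagonal, so that applying A^{-1} amounts to inverting each entry, and `grad` holds the gradient of f evaluated at w_MP.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// First-order (delta-method) moments of f(w) under the Gaussian
// posterior N(w_MP, A^{-1}), with A assumed diagonal for this sketch.
struct Moments { double mean; double variance; };

Moments delta_method(double f_at_wmp,
                     const std::vector<double>& grad,
                     const std::vector<double>& A_diag) {
    double var = 0.0;
    for (std::size_t k = 0; k < grad.size(); ++k)
        var += grad[k] * grad[k] / A_diag[k];   // grad^T A^{-1} grad
    return {f_at_wmp, var};                      // mean is f(w_MP)
}

// Error bar: square root of the delta-method variance.
double error_bar(const Moments& m) { return std::sqrt(m.variance); }
```

With a full (non-diagonal) A the variance term would instead require solving the linear system A u = grad and taking grad^T u.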
392
Conditional Hazard Estimating Neural Networks
A Case Study: Ocular Melanoma

We now show an application of the CHENN model to the prognosis of all-cause mortality in ocular melanoma.

Intraocular melanoma occurs in a pigmented tissue called the uvea, with more than 90% of tumours involving the choroid, beneath the retina. About 50% of patients die of metastatic disease, which usually involves the liver.

Estimates for survival after treatment of uveal melanoma are mostly derived and reported using Cox analysis and Kaplan-Meier (KM) survival curves (Cox & Oakes, 1984). As a semiparametric model, however, the Cox method usually utilizes linear relationships between variables, and the proportionality of risks is always assumed. It is therefore worth exploring the capability of nonlinear models, which do not make any assumptions about the proportionality of risks.

The data used to test the model were selected from the database of the Liverpool Ocular Oncology Centre (Taktak et al., 2004). The dataset was split into two parts, one for training (1823 patterns), the other for testing (781 patterns). Nine prognostic factors were used: Sex, Tumour margin, Largest Ultrasound Basal Diameter, Extraocular extension, Presence of epithelioid cells, Presence of closed loops, Mitotic rate, Monosomy of chromosome 3.

The performance of survival analysis models can in general be assessed according to their discrimination and calibration aspects. Discrimination is the ability of the model to separate correctly the subjects into different groups. Calibration is the degree of correspondence between the estimated probability produced by the model and the actual observed probability (Dreiseitl & Ohno-Machado, 2002). One of the most widely used methods for assessing discrimination in survival analysis is Harrell's C index (Dreiseitl & Ohno-Machado, 2002), (Harrell et al., 1982), an extension to survival analysis of the Area Under the Receiver Operating Characteristic (AUROC). Calibration is assessed by a Kolmogorov-Smirnov (KS) goodness-of-fit test with corrections for censoring (Koziol, 1980).

The C index was evaluated for a set of years that are of interest to applications, from 1 to 7. The minimum was achieved at 7 years (0.75), the maximum at 1 year (0.8). The KS test with corrections for censoring was applied for the above set of years, and up to the maximum uncensored time (16.8 years); the confidence level was set as usual at 0.05. The null hypothesis that the modeled distributions follow the empirical estimate cannot be rejected for years 1 to 7, whereas it is rejected if we compare the distributions up to 16.8 years; the null hypothesis is always rejected for the Cox model.

FUTURE TRENDS

Neural networks are very flexible modelling tools, and in the context of survival analysis they can offer advantages with respect to the (usually linear) modelling approaches commonly found in the literature. This flexibility, however, comes at a cost: computational time and difficulty of interpretation of the model. The first aspect is due to the typically large number of parameters which characterise moderately complex networks, and to the fact that the learning process results in a nonconvex, nonlinear optimization problem.

The second aspect is in some way a result of the nonlinearity and nonconvexity of the model. Addressing the issue of nonconvexity may be the first step towards models which can be easily interpreted in terms of their parameters and are easier to train; in this respect, kernel machines (like Support Vector Machines) might be considered as the next step in flexible nonlinear modelling, although the formulation of learning algorithms for these models follows a paradigm which is not based on likelihood functions, and therefore their application to survival data is not immediate.

CONCLUSION

This article proposes a new neural network model for survival analysis in a continuous time setting, which approximates the logarithm of the hazard rate function. The model formulation allows an easy derivation of error bars on both hazard rate and survival predictions. The model is trained in the Bayesian framework to increase its robustness and to reduce the risk of overfitting. The model has been tested on real data, to predict survival from intraocular melanoma.

Formal discrimination and calibration tests have been performed, and the model shows good performance within a time horizon of 7 years, which is found useful for the application at hand.

This project has been funded by the Biopattern Network of Excellence FP6/2002/IST/1; proposal N. IST-2002-508803; Project full title: Computational
Intelligence for Biopattern Analysis in Support of eHealthcare; URL: www.biopattern.org.

REFERENCES

Bakker, B., Heskes, T. (1999). A neural-Bayesian approach to survival analysis. Proceedings IEE Artificial Neural Networks, pp. 832-837.

Biganzoli, E., Boracchi, P., Mariani, L., Marubini, E. (1998). Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach. Statistics in Medicine, 17, pp. 1169-1186.

Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press Inc., New York.

Cox, D. R., Oakes, D. (1984). Analysis of Survival Data. Chapman and Hall.

Dreiseitl, S., Ohno-Machado, L. (2002). Logistic regression and artificial neural network classification models: a methodology review. Journal of Biomedical Informatics, vol. 35, no. 5-6, pp. 352-359.

Eleuteri, A., Tagliaferri, R., Milano, L., De Placido, S., De Laurentiis, M. (2003). A novel neural network-based survival analysis model. Neural Networks, 16, pp. 855-864.

Harrell Jr., F. E., Califf, R. M., Pryor, D. B., Lee, K. L. and Rosati, R. A. (1982). Evaluating the yield of medical tests. Journal of the American Medical Association, vol. 247, no. 18, pp. 2543-2546.

Koziol, J. (1980). Goodness-of-fit tests for randomly censored data. Biometrika, 67 (3), pp. 693-696.

Lisboa, P. J. G., Wong, H., Harris, P., Swindell, R. (2003). A Bayesian neural network approach for modelling censored data with an application to prognosis after surgery for breast cancer. Artificial Intelligence in Medicine, 28, pp. 1-25.

MacKay, D. J. C. (1992). The evidence framework applied to classification networks. Neural Computation, 4 (5), pp. 720-736.

Neal, R. M. (2001). Survival Analysis Using a Bayesian Neural Network. Joint Statistical Meetings report, Atlanta.

Ripley, B. D., Ripley, R. M. (1998). Neural Networks as Statistical Methods in Survival Analysis. Artificial Neural Networks: Prospects for Medicine (R. Dybowsky and V. Gant, eds.), Landes Biosciences Publishers.

Schwarzer, G., Vach, W., Schumacher, M. (2000). On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology. Statistics in Medicine, 19, pp. 541-561.

Taktak, A. F. G., Fisher, A. C., Damato, B. (2004). Modelling survival after treatment of intraocular melanoma using artificial neural networks and Bayes theorem. Physics in Medicine and Biology, 49, pp. 87-98.

Williams, C. K. I., Qazaz, C., Bishop, C. M., Zhu, H. (1995). On the relationship between Bayesian error bars and the input data density. Proceedings of the 4th International Conference on Artificial Neural Networks, pp. 160-165, Cambridge (UK).

KEY TERMS

Bayesian Inference: Inference rules which are based on the application of Bayes' theorem and the basic laws of probability calculus.

Censoring: A mechanism which precludes observation of an event; a form of missing data.

Hyperparameter: A parameter in a hierarchical problem formulation. In Bayesian inference, the parameters of a prior.

Neural Networks: A graphical representation of a nonlinear function, usually represented as a directed acyclic graph. Neural networks can be trained to find nonlinear relationships in data, and are used in applications such as robotics, speech recognition, signal processing or medical diagnosis.

Posterior Distribution: Probabilistic representation of knowledge, resulting from the combination of prior knowledge and the observation of data.

Prior Distribution: Probabilistic representation of prior knowledge.

Random Variable: A measurable function from a sample space to the measurable space of the possible values of the variable.
Configuration
Luca Anselma
Università di Torino, Italy
Diego Magro
Università di Torino, Italy
- parts to describe the compositional structure;
- ports to model connections and compatibilities between components;
- resources that are produced, used or consumed by components;
- functions to represent functionalities;
- attributes used to describe components, ports, resources and functions;
- taxonomies in which component, port, resource and function types may be organized;
- constraints to specify conditions that configurations must satisfy.

Figure 1 depicts a simplified fragment of the domain knowledge for PC configuration. It describes all the PC variants valid for the domain. Has-part relations
[Figure 1: a has-part tree rooted at PC, with component types such as Monitor (CRT or LCD), Operating System, Keyboard, Hard Disk, CD/DVD drives (CD-RW, DVD-RW) and Motherboard, together with attributes (e.g., price, size, required_hd_size, required_ram_mem), port types (e.g., eide_port, device_eide_port, device_scsi_port) and connection relations.

Constraints:
- In any PC, if there is any SCSI device, then there must be either a SCSI Main Printed Circuit Board or a SCSI Controller.
- In any PC, there must be no more than four EIDE devices and no more than fifteen SCSI devices.
- In any PC, the total hard disk space required by all the Operating Systems must be less than the size of the hard disks.
- In any PC, the RAM required by each Operating System must be less than the available RAM amount.
- In any Motherboard, there cannot be both a SCSI Main Printed Circuit Board and a SCSI Controller.
- ...]
model the compositional structure of PCs (e.g., each PC has one or two monitors, a motherboard, etc.). Each component of a PC can be either of a basic (non-configurable) type (e.g., the monitor) or of an aggregate (possibly configurable) type (e.g., the motherboard). Some relevant taxonomic relations are reported (e.g., the hard disks are either SCSI or EIDE). The basic components can be connected through ports (only a few ports are reported): each port connects with at most one other port; for some ports the connection is optional (e.g., for eide_port), for others it is mandatory (e.g., for device_eide_port). Some attributes (e.g., the price) describe the components. A set of constraints models the interactions among the components: e.g., the third constraint specifies that hard disks must provide enough space, which is a resource consumed by operating systems.

Figure 3 describes a particular PC variant, meeting the requirements stated in Figure 2 (containing also an optimization criterion on price).

[Figure 2: Requirements. The PC should have: two SCSI Hard Disks and at least ... GB of Hard Disk; at least ... GB of RAM; a Wireless Keyboard; the cheapest price; ...]

Figure 3. A configured PC, compliant with the domain knowledge in Figure 1 and meeting the requirements in Figure 2.

[Figure 3: an instance tree for a configured PC, showing a motherboard (aggregate component instance) with RAM modules, a CPU, a SCSI Main Printed Circuit Board, SCSI hard disks, EIDE ports with a CD/DVD device and a wireless keyboard, annotated with attribute values (e.g., price, RAM amount, hard disk size) and connection relation instances.]
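The constraints listed in Figure 1 amount to a validity test over candidate configurations. The sketch below is an illustrative check of a subset of them; the struct and field names are hypothetical (not the article's formal language), and the per-operating-system RAM constraint is omitted for brevity.

```cpp
#include <vector>

// A toy PC description; names are illustrative only.
struct PC {
    int eide_devices = 0;
    int scsi_devices = 0;
    bool scsi_mpcb = false;        // SCSI Main Printed Circuit Board
    bool scsi_controller = false;
    int hd_size_gb = 0;                  // total hard disk space
    std::vector<int> os_required_hd_gb;  // per operating system
};

// Checks some of the Figure 1 constraints on a candidate configuration.
bool valid(const PC& pc) {
    // A SCSI device requires a SCSI MPCB or a SCSI controller.
    if (pc.scsi_devices > 0 && !(pc.scsi_mpcb || pc.scsi_controller)) return false;
    // At most four EIDE devices and fifteen SCSI devices.
    if (pc.eide_devices > 4 || pc.scsi_devices > 15) return false;
    // Not both a SCSI MPCB and a SCSI controller.
    if (pc.scsi_mpcb && pc.scsi_controller) return false;
    // Total required hard disk space must be less than the disk size
    // (strict "<", following the article's wording).
    int required = 0;
    for (int r : pc.os_required_hd_gb) required += r;
    return required < pc.hd_size_gb;
}

// Convenience overload for building and checking a PC in one call.
bool valid(int eide, int scsi, bool mpcb, bool ctrl, int hd_gb,
           const std::vector<int>& os_req) {
    PC pc;
    pc.eide_devices = eide;  pc.scsi_devices = scsi;
    pc.scsi_mpcb = mpcb;     pc.scsi_controller = ctrl;
    pc.hd_size_gb = hd_gb;   pc.os_required_hd_gb = os_req;
    return valid(pc);
}
```

A real configurator searches the space of such candidates under these constraints (plus an optimization criterion such as price), rather than merely checking one.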
As regards logic-based frameworks, (McGuinness, 2002) analyzes Description Logics (DL) (Baader, Calvanese, McGuinness, Nardi & Patel-Schneider, 2003) as a convenient modeling tool for configurable products. DL make possible a description of the configuration knowledge by means of expressive conceptual languages with a rigorous semantics, thus enhancing knowledge comprehensibility and facilitating knowledge re-use. Furthermore, the powerful inference mechanisms currently available can be exploited both off-line by the knowledge engineers and on-line by the configuration system. Moreover, the paper describes a commercial DL-based family of configurators developed by AT&T.

(Soininen, Niemelä, Tiihonen & Sulonen, 2000) describes an approach in which the domain knowledge is represented with a high-level language and then mapped to a set of weight constraint rules, a form of logic programs offering support for expressing choices and both cardinality and resource constraints. Configurations are computed by finding stable Herbrand models of such a logic program.

(Sinz, Kaiser & Küchlin, 2003) presents an approach particularly geared to industrial contexts (in fact, it has been developed to be used by DaimlerChrysler for the configuration of their Mercedes lines). In Sinz et al.'s approach the domain knowledge is expressed as formulae in propositional logic; then, it is validated by running a satisfiability checker, which can also provide explanations in case of failure. However, this work aims at validating the knowledge base, rather than solving configuration problems.

There are also hybrid approaches that reconcile constraint-based frameworks and logic-based frameworks. For example, both (Magro & Torasso, 2003) and (Junker & Mailharro, 2003) describe hybrid frameworks based on a logic-based description of the structure of the configurable product (taking inspiration from logical languages derived from frame-based languages such as the DL) and on a constraint-based description of the possible ways of interaction between components.

In (Junker & Mailharro, 2003) constructs of DL are translated into concepts of constraint programming in order to solve a configuration problem. On the contrary, (Magro & Torasso, 2003) adopts an inference mechanism specific to configuration, which, basically, searches for tree-structured models on finite domains for conceptual descriptions, and adapts some constraint-propagation techniques to the logical framework.

In most formalizations, the configuration task is theoretically intractable (at least NP-hard, in the worst case), and in some cases the intractability also appears in practice: solving configuration problems can require a huge amount of CPU time. There are several ways that can be explored to cope with these situations: providing the configurator with a set of domain-specific heuristics, defining general focusing mechanisms (Magro & Torasso, 2001), making use of compilation techniques (Sinz, 2002) (Narodytska & Walsh, 2006), re-using past solutions (Geneste & Ruet, 2002), and defining techniques to decompose a problem into a set of simpler subproblems (Magro, Torasso & Anselma, 2002) (Anselma & Magro, 2003).

Configuration has a growing commercial market. In fact, several configurator systems have been developed and some commercial tools are currently available (e.g., ILOG (Junker & Mailharro, 2003), Koalog, OfferIt! (Bergenti, 2004), Oracle, SAP (Haag, 2005), TACTON (Orsvarn, 2005)).

Furthermore, some Web sites have been equipped with configuration capabilities to support customers in selecting a suitable product in a wide range of domains such as cars (e.g., Porsche, Renault, Volvo), bikes (e.g., Pro-M Bike Configurator) and computers (e.g., Dell, Cisco).

FUTURE TRENDS

Many current configuration approaches and software configuration systems concern the configuration of mechanical or electronic devices/products and are conceived in order to be employed by domain experts, such as production or sales engineers.

Nowadays, the scope of configuration is growing, and the application of automatic configuration techniques to non-physical entities is gaining more and more importance. The configuration of software products and of complex services built on simpler ones are two research areas and application domains that are currently attracting the attention of researchers.

The capability of producing understandable explanations for their choices or for the inconsistencies that they encounter, and of suggesting consistency restorations, are needs that configuration systems share with many knowledge-based or expert systems. However,
the aim of making configuration systems profitably used by non-expert users too, and the deployment of configurators on the Web, all contribute to strengthening the importance of these issues.

Explanations and restorations are also related to the topic of interactive configuration, which is still posing some challenging problems to researchers. Indeed, besides these capabilities, interactive configuration also requires effective mechanisms to deal with incomplete and/or incremental requirements specifications (and with their retraction), and it is also demanding in terms of efficiency of the algorithms.

Real-world configuration knowledge bases can be very large and they are usually continually modified during their life cycle. Some research efforts are currently devoted to defining powerful techniques and to designing and implementing tools that support knowledge acquisition, knowledge-base verification and maintenance.

Furthermore, a closer integration of configuration into the business models, and of configurators into enterprise software systems, is an important goal for several companies (as well as for enterprise software providers).

Distributed configuration is another important topic, especially in an environment where a specific complex product/service is provided by different suppliers that have to cooperate in order to produce it.

Finally, it is worth mentioning the reconfiguration of existing systems, which is still mainly an open problem.

CONCLUSION

Configuration has been a prominent area of Artificial Intelligence since the early Eighties, when it started to arouse interest among researchers working in academia and industry.

This article provides a general overview of the area of configuration by introducing the problem of configuration, briefly presenting a general conceptualization of configuration tasks, and succinctly describing some representative proposals in the literature to deal with configuration problems.

As we have illustrated, during the last few years several approaches involving configuration techniques have been successfully applied in order to deal with issues pertaining to a wide range of real-world application domains, ranging from cars to computer systems, from software to travel plans.

Theoretical results achieved by the academic environment have found effective, tangible applications in industrial settings, thus contributing to the diffusion of both industrial and commercial configurators. Such applications in their turn gave rise to new challenges, engendering a significant cross-fertilization of ideas among researchers in academia and in industry.

REFERENCES

Aldanondo, M., Moynard, G., & Hamou, K. H. (2000). General configurator requirements and modeling elements, Proc. ECAI 2000 Configuration Workshop, 1-6.

Amilhastre, J., Fargier, H., & Marquis, P. (2002). Consistency restoration and explanations in dynamic CSPs - Application to configuration, Artificial Intelligence 135(1-2), 199-234.

Anselma, L., & Magro, D. (2003). Dynamic problem decomposition in configuration, Proc. IJCAI 2003 Configuration Workshop, 21-26.

Baader, F., Calvanese, D., McGuinness, D., Nardi, D., & Patel-Schneider, P. (Eds.) (2003). The Description Logic Handbook. Cambridge University Press.

Bergenti, F. (2004). Product and Service Configuration for the Masses. Proc. ECAI 2004 Configuration Workshop, 7/1-7/6.

Dechter, R. (2003). Constraint Processing, Morgan Kaufmann.

Freuder, E., Carchrae, T., & Beck, J.C. (2003). Satisfaction Guaranteed. Proc. IJCAI-03 Configuration Workshop, 1-6.

Freuder, E., Likitvivatanavong, C., & Wallace, R. (2001). Explanation and implication for configuration problems, Proc. IJCAI-01 Configuration Workshop, 31-37.

Freuder, E., & O'Sullivan, B. (2001). Modeling and Generating Tradeoffs for Constraint-Based Configuration. Proc. IJCAI-01 Configuration Workshop, 38-44.

Gelle, E., & Sabin, M. (2003). Solving Methods for Conditional Constraint Satisfaction, Proc. IJCAI-03 Configuration Workshop, 7-12.
Gelle, E., & Sabin, M. (2006). Direct and Reformulation Solving of Conditional Constraint Satisfaction Problems. Proc. ECAI-06 Configuration Workshop, 14-19.

Geneste, L., & Ruet, M. (2002). Fuzzy Case Based Configuration, Proc. ECAI 2002 Configuration Workshop, 71-76.

Haag, A. (2005). Dealing with Configurable Products in the SAP Business Suite. Proc. IJCAI-05 Configuration Workshop, 68-71.

Junker, U., & Mailharro, D. (2003). The Logic of ILOG (J)Configurator: Combining Constraint Programming with a Description Logic. Proc. IJCAI-03 Configuration Workshop, 13-20.

Magro, D., & Torasso, P. (2001). Interactive Configuration Capability in a Sale Support System: Laziness and Focusing Mechanisms, Proc. IJCAI-01 Configuration Workshop, 57-63.

Magro, D., Torasso, P., & Anselma, L. (2002). Problem Decomposition in Configuration, Proc. ECAI 2002 Configuration Workshop, 50-55.

Magro, D., & Torasso, P. (2003). Decomposition Strategies for Configuration Problems, Artificial Intelligence for Engineering Design, Analysis and Manufacturing, Special Issue on Configuration 17(1), 51-73.

McDermott, J. (1982). R1: A Rule-Based Configurer of Computer Systems. Artificial Intelligence 19, 39-88.

McDermott, J. (1993). R1 (XCON) at age 12: lessons from an elementary school achiever. Artificial Intelligence 59, 241-247.

McGuinness, D.L. (2002). Configuration. In Baader, F., McGuinness, D.L., Nardi, D. & Patel-Schneider, P.F. (Eds.). The Description Logic Handbook. Theory, implementation, and applications. Cambridge University Press.

Mittal, S., & Falkenhainer, B. (1990). Dynamic Constraint Satisfaction Problems, Proc. of the AAAI 90, 25-32.

Narodytska, N., & Walsh, T. (2006). Constraint and Variable Ordering Heuristics for Compiling Configuration Problems, Proc. ECAI 2006 Configuration Workshop, 2-7.

Orsvarn, K. (2005). Tacton Configurator - Research directions, Proc. IJCAI 2005 Configuration Workshop, 75.

Sabin, D., & Freuder, E.C. (1996). Configuration as Composite Constraint Satisfaction, Proc. Artificial Intelligence and Manufacturing Research Planning Workshop, 153-161.

Sinz, C. (2002). Knowledge Compilation for Product Configuration, Proc. ECAI 2002 Configuration Workshop, 23-26.

Sinz, C., Kaiser, A., & Küchlin, W. (2003). Formal methods for the validation of automotive product configuration data, Artificial Intelligence for Engineering Design, Analysis and Manufacturing, Special Issue on Configuration 17(1), 75-97.

Soininen, T., Gelle, E., & Niemelä, I. (1999). A Fixpoint Definition of Dynamic Constraint Satisfaction, Lecture Notes in Computer Science 1713, 419-433.

Soininen, T., Niemelä, I., Tiihonen, J., & Sulonen, R. (2000). Unified Configuration Knowledge Representation Using Weight Constraint Rules, Proc. ECAI 2000 Configuration Workshop, 79-84.

Soininen, T., Tiihonen, J., Männistö, T., & Sulonen, R. (1998). Towards a General Ontology of Configuration, Artificial Intelligence for Engineering Design, Analysis and Manufacturing 12(4), 383-397.

Stumptner, M., & Haselböck, A. (1993). A Generative Constraint Formalism for Configuration Problems. Lecture Notes in Artificial Intelligence 728, 302-313.

KEY TERMS

Constraint Satisfaction Problem (CSP): A CSP is defined by a finite set of variables, where each variable is associated with a domain, and a set of constraints over subsets of variables, restricting the possible combinations of values that the variables in each subset may assume. A solution of a CSP is an assignment of a value to each variable that is consistent with the constraints.

Description Logics (DL): Logics that are designed to describe concepts and individuals in knowledge bases. They were initially developed to provide a precise semantics for frame systems and semantic networks. The typical inference for concepts is checking if a concept is more general than (i.e., subsumes) another one. The typical inference for individuals is checking if an individual is an instance of a concept. Many DL are fragments of first-order logic, while some of them go beyond first order.

Logic Program: A logic theory (possibly containing some extra-logic operators) that can be given a procedural meaning such that the process of checking if a formula is derivable in the theory can be viewed as a program execution.

Mass Customization: A business strategy that combines the mass production paradigm with product personalization. It is closely related to modularity in product design. This design strategy makes it possible to adopt the mass production model for standard modules, facilitates the management of product families and variants, and leaves room for (various kinds and degrees of) personalization.

Production-Rule-Based System: A system where knowledge is represented by means of production rules. A production rule is a statement composed of conditions and actions. If data in working memory satisfy the conditions, the related actions can be executed, resulting in an update of the working memory.

Propositional Logic Formula Satisfiability: The task of checking whether it is possible to assign a truth value to every variable that occurs in a propositional formula such that the truth value of the whole formula equals true.

Stable Herbrand Model: A minimal set of facts satisfying a logic program (theory). Each fact in the model is a variable-free atom whose arguments are terms exclusively built from function and constant symbols occurring in the program and whose predicate symbols occur in the program as well. Facts not appearing in the model are regarded as false.
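The recognize-act cycle of a production-rule-based system, as defined above, can be sketched in a few lines. This is an illustrative example only: the facts, rule contents and Python representation are invented, not taken from the article.

```python
# Minimal sketch of a production-rule-based system: rules are
# (condition, action) pairs over a working memory of facts.
# All fact and rule names here are illustrative.

working_memory = {"order(pc)", "budget(low)"}

rules = [
    # If a PC is ordered on a low budget, select the basic motherboard.
    (lambda wm: {"order(pc)", "budget(low)"} <= wm,
     lambda wm: wm.add("part(mb_basic)")),
    # A basic motherboard implies an EIDE port.
    (lambda wm: "part(mb_basic)" in wm,
     lambda wm: wm.add("port(eide)")),
]

changed = True
while changed:                      # recognize-act cycle
    changed = False
    for condition, action in rules:
        before = set(working_memory)
        if condition(working_memory):
            action(working_memory)  # fire the rule: update working memory
        if working_memory != before:
            changed = True

print(sorted(working_memory))
```

The loop fires rules until the working memory reaches a fixed point, which is the essential control regime of such systems (real engines add conflict resolution between simultaneously applicable rules, omitted here).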
Constraint Processing
Roman Barták
Charles University in Prague, Czech Republic
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
ment that minimizes (or maximizes) the value of the objective function. This article focuses on techniques for finding a feasible assignment, but these techniques can be naturally extended to optimization problems via the well-known branch-and-bound technique (Van Hentenryck, 1989).

There are several comprehensive sources of information about constraint satisfaction, ranging from journal surveys (Kumar, 1992) (Jaffar & Maher, 1996) through on-line tutorials (Barták, 1998) to several books. Van Hentenryck's book (1989) was a pioneering work showing constraint satisfaction in the context of logic programming. Later, Tsang's book (1993) focused on constraint satisfaction techniques independently of the programming framework, and it provides full technical details of most algorithms described later in this article. Recent books cover both theoretical (Apt, 2003) and practical aspects (Marriott & Stuckey, 1998), provide good teaching material (Dechter, 2003) or in-depth surveys of individual topics (Rossi et al., 2006). We should not forget about books showing how constraint satisfaction technology is applied in particular areas; scheduling problems play a prominent role here (Baptiste et al., 2001) because constraint processing is exceptionally successful in this area.

CONSTRAINT SATISFACTION TECHNIQUES

Constraint satisfaction problems over finite domains are basically combinatorial problems, so they can be solved by exploring the space of possible (partial or complete) instantiations of decision variables. Later in this section we will present the typical search algorithms used in constraint processing. However, it should be highlighted that constraint processing is not simple enumeration, and we will also show how so-called consistency techniques contribute to solving CSPs.

Systematic Search

Search is a core technology of artificial intelligence, and many search algorithms have been developed to solve various problems. In the case of constraint processing we are searching for a feasible assignment of values to variables, where feasibility is defined by the constraints. This can be done in a backtracking manner where we assign a value to a selected variable and check whether the constraints whose scope is already instantiated are satisfied. In the positive case, we proceed to the next variable. In the negative case, we try another value for the current variable or, if there are no more values, we backtrack to the last instantiated variable and try alternative values there. The following code shows the skeleton of this procedure, historically called labelling (Waltz, 1975). Notice that the consistency check may prune the domains of individual variables, which will be discussed in the next section.

procedure labelling(V,D,C)
  if all variables from V are assigned then return V
  select a not-yet assigned variable x from V
  for each value v from Dx do
    (TestOK,D) ← consistent(V,D,C ∪ {x=v})
    if TestOK=true then
      R ← labelling(V,D,C)
      if R ≠ fail then return R
  end for
  return fail
end labelling

The above backtracking mechanism is parameterized by variable and value selection heuristics that decide the order of variables for instantiation and the order in which the values are tried. While value ordering is usually problem dependent and problem-independent heuristics are not frequently used due to their computational complexity, there are popular problem-independent variable ordering heuristics. Variable ordering is based on the so-called first-fail principle formulated by Haralick and Elliott (1980), which says that the variable whose instantiation will lead to a failure with the highest probability should be tried first. A typical instance of this principle is the dom heuristic, which prefers variables with the smallest domain for instantiation. There exist other popular variable ordering heuristics (Rossi et al., 2006), such as dom+deg or dom/deg, but their detailed description is outside the scope of this short article.

Though the heuristics (positively) influence the efficiency of search, they cannot resolve all the drawbacks of backtracking. Probably the main drawback is ignoring the information about the reason for constraint infeasibility. If the algorithm discovers that no value can be assigned to a variable, it blindly backtracks to the last instantiated variable, though the reason for the conflict may be elsewhere. There exist techniques like backjumping that can detect the variable whose instantiation caused the problem and backtrack (backjump) to this
variable (Dechter, 2003). These techniques belong to a broader class of intelligent backtracking that shares the idea of intelligent recovery from infeasibility. Though these techniques are interesting and go far beyond simple enumeration, it seems better to prevent infeasibility rather than to recover from it (even in an intelligent way).

Domain Filtering and Maintaining Consistency

Assume variables A and B with domain {1, 2, 3} and a simple constraint A < B. Clearly, value 3 can never be assigned to A because there is no way to satisfy the constraint A < B if this value is used for A. Hence, this value can be safely removed from the domain of variable A and it does not need to be assumed during search. Similarly, value 1 can be removed from the domain of B. This process is called domain filtering and it is realised by a special procedure assigned to each constraint. Domain filtering is closely related to consistency of the constraint. We say that constraint C is (arc) consistent if for any value x in the domain of any variable in the scope of C there exist values in the domains of the other variables in the scope of C such that the value tuple satisfies C. Such a value tuple is called a support for x. Domain filtering attempts to make the constraint consistent by removing values which have no support.

Domain filtering can be applied to all the constraints in the problem to remove unsupported values from the domains of variables and to make the whole problem consistent. Because the constraints are interconnected, it may be necessary to repeat the domain filtering of a constraint C if another constraint pruned the domain of a variable in the scope of C. Basically, the domain filtering is repeated until a fixed point is reached, which removes the largest number of unsupported values. There exist several procedures to realise this idea (Mackworth, 1977); the AC-3 schema is the most popular one:

procedure AC-3(V,D,C)
  Q ← C
  while non-empty Q do
    select c from Q
    D' ← c.FILTER(D)
    if any domain in D' is empty then return (fail,D')
    Q ← Q ∪ {c'∈C | ∃x∈var(c'): D'x ≠ Dx} − {c}
    D ← D'
  end while
  return (true,D)
end AC-3

We did not cover the details of the filtering procedure here. In the simplest way, it may explore the consistent tuples in the constraint to find a support for each value. There exist more advanced techniques that keep some information between the repeated calls to the filter and hence achieve better time efficiency (Bessiere, 1994). Frequently, the filtering procedure exploits the semantics of the constraint to realise filtering faster. For example, filtering for the constraint A < B can be realised by removing from the domain of A all values greater than the maximal value of B (and similarly for B).

Let us return our attention back to search. Even if we make all the constraints consistent, it does not mean that we have obtained a solution. For example, the problem ({A, B, C}, {1, 2}, {A ≠ B, B ≠ C, A ≠ C}) is consistent in the above-described sense, but it has no solution. Hence consistency techniques need to be combined with backtracking search to obtain a complete constraint solver. First, we make the constraints consistent. Then we start the backtracking search as described in the previous section and, after each variable instantiation, we make the constraints consistent again. It may happen that during the consistency procedure some domain becomes empty. This indicates inconsistency and we can backtrack immediately. Because the consistency procedure removes inconsistencies from the not yet instantiated variables, it prevents future conflicts during search. Hence this principle is called look ahead, as opposed to look back techniques that focus on recovery from discovered conflicts. The whole process is also called maintaining consistency during search, and it can be realised by substituting the consistent procedure in labelling by the procedure AC-3. Figure 1 shows the difference between simple backtracking (top) and the look-ahead technique (bottom) when solving the well-known 4-queens problem. The task is to allocate a queen to each column of the chessboard in such a way that no two queens attack each other. Notice that the look-ahead method solved the problem after four attempts, while simple backtracking is still allocating the first two queens.

Clearly, the more inconsistencies one can remove, the smaller the search tree that needs to be explored. There exist stronger consistency techniques that consider several constraints together (rather than filtering each constraint separately, as we described above), but they are usually too computationally expensive and hence they are not used in each node of the search tree. Nevertheless, there also exists a compromise between stronger and efficient
Figure 1. Solving the 4-queens problem using backtracking (top) and look-ahead (bottom) techniques; the crosses indicate positions forbidden by the current allocation of queens in the look-ahead method (values pruned by AC).
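The contrast shown in Figure 1 can be reproduced with a small sketch. This is an illustrative program, not the article's code: plain backtracking tests constraints only against already-placed queens, while a simple look-ahead (forward checking) prunes the remaining columns' domains after each assignment and fails early when a domain is wiped out.

```python
# 4-queens: chronological backtracking vs. a simple look-ahead
# (forward checking). Illustrative sketch; counters tally tried values.
N = 4

def ok(q, row):
    # queen at (len(q), row) vs. queens already placed in columns 0..len(q)-1
    col = len(q)
    return all(q[c] != row and abs(q[c] - row) != col - c for c in range(col))

def backtrack(q, stats):
    if len(q) == N:
        return q[:]
    for row in range(N):
        stats[0] += 1                       # one more tried assignment
        if ok(q, row):
            q.append(row)
            r = backtrack(q, stats)
            if r is not None:
                return r
            q.pop()
    return None

def look_ahead(q, domains, stats):
    # domains: candidate-row sets for the remaining columns len(q)..N-1
    if len(q) == N:
        return q
    for row in sorted(domains[0]):
        stats[0] += 1
        # forward check: remove attacked rows from the future domains
        pruned = [{r for r in dom if r != row and abs(r - row) != c + 1}
                  for c, dom in enumerate(domains[1:])]
        if all(pruned):                     # no future domain wiped out
            r = look_ahead(q + [row], pruned, stats)
            if r is not None:
                return r
    return None

bt_stats, la_stats = [0], [0]
bt = backtrack([], bt_stats)
la = look_ahead([], [set(range(N)) for _ in range(N)], la_stats)
print("backtracking:", bt, "assignments tried:", bt_stats[0])
print("look-ahead:  ", la, "assignments tried:", la_stats[0])
```

Both searches use the same ascending value order and so find the same first solution, but the look-ahead variant tries noticeably fewer assignments, which is the point Figure 1 illustrates.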
Figure 2. A graph representation of a constraint satisfaction problem with binary inequalities (left) and a bipartite
graph representing the same problem in the all-different constraint (right).
[Figure 2 content: variables X1 ∈ {5,6}, X2 ∈ {5,6}, X3 ∈ {5,6,7}; inequality constraints X1≠X2, X1≠X3, X2≠X3; value nodes 5, 6, 7]
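The pruning gap behind Figure 2 can be shown directly. In this illustrative sketch (not the article's code), arc consistency on each binary inequality alone removes nothing, whereas Hall-set reasoning inside a global all-different constraint, in the spirit of Régin (1994), removes 5 and 6 from X3.

```python
# Figure 2's example: X1, X2 in {5,6}, X3 in {5,6,7}, all different.
from itertools import combinations

domains = {"X1": {5, 6}, "X2": {5, 6}, "X3": {5, 6, 7}}

# Arc consistency on a single inequality X != Y: a value v of X has a
# support unless Y's domain is exactly {v}.
def ac_inequality(dx, dy):
    return {v for v in dx if dy != {v}}

for x, y in [("X1", "X2"), ("X1", "X3"), ("X2", "X3")]:
    domains[x] = ac_inequality(domains[x], domains[y])
    domains[y] = ac_inequality(domains[y], domains[x])
print(domains)   # the separate inequalities prune nothing here

# Global all-different, Hall-set style: if k variables share a union of
# exactly k values, those values can be removed from every other
# variable. (One pass suffices for this small example.)
def all_different_prune(doms):
    names = list(doms)
    for k in range(1, len(names)):
        for hall in combinations(names, k):
            union = set().union(*(doms[v] for v in hall))
            if len(union) == k:
                for v in names:
                    if v not in hall:
                        doms[v] -= union
    return doms

print(all_different_prune(domains))   # X3 loses 5 and 6
```

Here {X1, X2} is a Hall set of size 2 covering exactly {5, 6}, so the global constraint deduces X3 = 7, which no single binary inequality can see.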
Marriott, K. & Stuckey, P.J. (1998). Programming with Constraints: An Introduction. The MIT Press, Cambridge, MA.

Mohr, R. & Henderson, T.C. (1986). Arc and path consistency revisited. Artificial Intelligence 28:225–233.

Montanari, U. (1974). Networks of constraints: Fundamental properties and applications to picture processing. Information Sciences 7:95–132.

Régin, J.-C. (1994). A filtering algorithm for constraints of difference in CSPs. In Proceedings of the National Conference on Artificial Intelligence (AAAI-94). AAAI Press, pp. 362–367.

Rossi, F.; Van Beek, P.; Walsh, T. (2006). Handbook of Constraint Programming. Elsevier, Oxford.

Tsang, E. (1993). Foundations of Constraint Satisfaction. Academic Press, London.

Van Hentenryck, P. (1989). Constraint Satisfaction in Logic Programming. The MIT Press, Cambridge, MA.

Waltz, D.L. (1975). Understanding line drawings of scenes with shadows. In Psychology of Computer Vision, McGraw-Hill, New York, pp. 19–91.

KEY TERMS

Constrained Optimisation Problem (COP): A Constraint Satisfaction Problem extended by an objective function over a (subset of the) decision variables. The task is to find a solution to the CSP which minimizes or maximizes the value of the objective function.

Constraint: Any relation between a subset of decision variables. It can be expressed extensionally, as a set of value tuples satisfying the constraint, or intensionally, using an arithmetic or logical formula between the variables, for example A+B < C.

Constraint Satisfaction Problem (CSP): A problem formulated using a set of decision variables, their domains, and constraints between the variables. The task is to find an instantiation of the decision variables by values from their domains in such a way that all the constraints are satisfied.

Consistency Techniques: Techniques that remove inconsistent values (from variables' domains) or value tuples, that is, the values that cannot be assigned to a given variable in any solution. Arc consistency is the most widely used consistency technique.

Decision Variable: A variable modelling some feature of the problem, for example the start time of an activity, whose value we are looking for in such a way that the specified constraints are satisfied.

Domain of Variable: A set of possible values that can be assigned to a decision variable, for example a set of times when some activity can start. Constraint processing usually assumes finite domains only.

Domain Pruning (Filtering): A process of removing values from the domains of variables that cannot take part in any solution. Usually, due to efficiency issues, only the values locally violating some constraint are pruned. It is the most common type of consistency technique.

Global Constraint: An n-ary constraint modelling a set of simpler constraints by providing a dedicated filtering algorithm that achieves stronger or faster domain pruning in comparison to making the simpler constraints (locally) consistent. All-different is an example of a global constraint.

Look Ahead: The most common technique for integrating depth-first search with maintaining consistency. Each time a search decision is made, it is propagated in the problem model by making the model consistent.

Search Algorithms: Algorithms that explore the space of possible (partial or complete) instantiations of decision variables with the goal of finding an instantiation satisfying all the constraints (and optimizing the objective function in the case of a COP).
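The COP definition above, together with the earlier remark that feasibility techniques extend to optimization via branch-and-bound, can be sketched on a toy problem. This is an invented illustration (variables, constraint and objective are not from the article): A, B ∈ {1, 2, 3}, constraint A + B < 5, objective maximize A·B.

```python
# Minimal branch-and-bound for a toy COP: A, B in {1,2,3},
# constraint A + B < 5, maximize A * B. Illustrative sketch only.
def solve():
    best = ({}, float("-inf"))              # incumbent (assignment, value)

    def search(assign, rest):
        nonlocal best
        partial = 1
        for v in assign.values():
            partial *= v
        # optimistic bound: remaining variables take their maximum value 3
        if partial * (3 ** len(rest)) <= best[1]:
            return                          # bound cannot beat incumbent
        if not rest:
            best = (dict(assign), partial)  # feasible complete assignment
            return
        for v in (1, 2, 3):
            assign[rest[0]] = v
            # check constraints whose scope is fully instantiated
            if not ("A" in assign and "B" in assign
                    and assign["A"] + assign["B"] >= 5):
                search(assign, rest[1:])
            del assign[rest[0]]

    search({}, ["A", "B"])
    return best

print(solve())
```

The bound lets the search discard, for instance, A=2, B=1 (product 2) once the incumbent value 3 is known, which is exactly how branch-and-bound turns a feasibility search into an optimizer.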
INTRODUCTION

The effective capacity of inter-urban motorway networks is an essential component of traffic control and information systems, particularly during periods of daily peak flow. However, even slightly inaccurate capacity predictions can lead to congestion that has huge social costs in terms of travel time, fuel costs and environmental pollution. Therefore, accurate forecasting of the traffic flow during peak periods could possibly avoid or at least reduce congestion, and thereby reduce travel time, fuel costs and pollution.

However, the information on inter-urban traffic presents a challenging situation: traffic flow forecasting involves a rather complex nonlinear data pattern and unforeseen physical factors associated with road traffic situations. Artificial neural networks (ANNs) have attracted attention for forecasting traffic flow due to their general nonlinear mapping capabilities. Unlike most conventional neural network models, which are based on the empirical risk minimization principle, support vector regression (SVR) applies the structural risk minimization principle to minimize an upper bound on the generalization error, rather than minimizing the training errors. SVR has been used to deal with nonlinear regression and time series problems. This investigation presents a short-term traffic forecasting model which combines an SVR model with continuous ant colony optimization (SVRCACO) to forecast inter-urban traffic flow. A numerical example of traffic flow values from northern Taiwan is employed to elucidate the forecasting performance of the proposed model. The simulation results indicate that the proposed model yields more accurate forecasting results than the seasonal autoregressive integrated moving average (SARIMA) time-series model.

BACKGROUND

Traditionally, a wide variety of forecasting approaches have been applied to forecast the traffic flow of inter-urban motorway networks. Those approaches can be classified according to the type of data, forecast horizon, and potential end-use (Dougherty, 1996), and include historical profiling (Okutani & Stephanedes, 1984), state space models (Stathopoulos & Karlaftis, 2003), Kalman filters (Whittaker, Garside & Lindveld, 1994), and system identification models (Vythoulkas, 1993). However, traffic flow data are in the form of spatial time series, collected at specific locations at constant intervals of time. The above-mentioned studies and their empirical results have indicated that the problem of forecasting inter-urban motorway traffic flow is multi-dimensional, including relationships among measurements made at different times and geographical sites. In addition, these methods have difficulty coping with observation noise and missing values during modeling. Therefore, Danech-Pajouh and Aron (1991) employed a layered statistical approach with a mathematical clustering technique to group the traffic flow data, and a separately tuned linear regression model for each cluster. Based on multi-dimensional pattern recognition requirements, such as intervals of time and geographical sites, non-parametric regression models (Smith, Williams & Oswald, 2002) have also successfully been employed to forecast motorway traffic flow. The ARIMA model and its extensions are among the most popular approaches in traffic flow forecasting (Kamarianakis & Prastacos, 2005) (Smith et al., 2002). Due to the stochastic nature and the strongly nonlinear characteristics of inter-urban traffic flow data, artificial neural network (ANN) models have received much attention and have been considered as alternatives for traffic flow forecasting (Ledoux, 1997) (Yin, Wong, Xu & Wong, 2002). However, the training procedure of ANN models is not only time consuming but also liable to get trapped in local minima, and the selection of the model architecture is subjective.
Continuous ACO in a SVR Traffic Forecasting Model
Hence, the regression function is Eq. (6):

g(x, \alpha, \alpha^{*}) = \sum_{i=1}^{N} (\alpha_{i} - \alpha_{i}^{*})\, K(x, x_{i}) + b \qquad (6)

Here, K(x_i, x_j) is called the kernel function. The value of the kernel equals the inner product of the two vectors x_i and x_j in the feature space, f(x_i) and f(x_j); that is, K(x_i, x_j) = f(x_i) \cdot f(x_j). The Gaussian RBF kernel is not only easier to implement but is also capable of nonlinearly mapping the training data into an infinite-dimensional space; thus, it is suitable for nonlinear relationship problems. In this work, the Gaussian function, \exp(-\|x - x_{i}\|^{2} / 2\sigma^{2}), is used in the SVR model.

CACO in Selecting Parameters of the SVR Model

Ant colony optimization algorithms (Dorigo, 1992) have been used successfully to deal with combinatorial optimization problems such as job-shop scheduling (Colorni, Dorigo, Maniezzo & Trubian, 1994), the traveling salesman problem (Dorigo & Gambardella, 1997), space planning (Bland, 1999), quadratic assignment problems (Maniezzo & Colorni, 1999), and data mining (Parpinelli, Lopes & Freitas, 2002). ACO imitates the behavior of real ant colonies as they forage for food: each ant lays down pheromone on the path to the food sources or back to the nest, and paths with more pheromone are more likely to be selected by other ants. Over time, a colony of ants will select the shortest path to the food source and back to the nest. The pheromone trail is therefore the most important means by which an individual ant selects its route.

The probability P_k(i, j) that an ant k moves from city i to city j is expressed as Eq. (7):

j = \begin{cases} \arg\max_{S \in M_{k}} [\tau(i, S)]^{\alpha} [\eta(i, S)]^{\beta}, & \text{if } q \le q_{0} \\ \text{select } j \text{ by Eq. (8)}, & \text{otherwise} \end{cases} \qquad (7)

P_{k}(i, j) = \begin{cases} \dfrac{[\tau(i, j)]^{\alpha} [\eta(i, j)]^{\beta}}{\sum_{S \in M_{k}} [\tau(i, S)]^{\alpha} [\eta(i, S)]^{\beta}}, & \text{if } j \in M_{k} \\ 0, & \text{otherwise} \end{cases} \qquad (8)

where τ(i, j) is the pheromone level between city i and city j, and η(i, j) is the inverse of the distance between cities i and j; in this study, the forecasting error represents the distance between cities. The parameters α and β determine the relative importance of the pheromone level and the heuristic value, M_k is the set of cities in the next column of the city matrix for ant k, q is a uniform random variable on [0, 1], and q_0 is a parameter. The values of α, β and q_0 are set to 8, 5 and 0.2, respectively.

Once the ants have completed their tours, the largest amount of pheromone deposited on the visited paths is taken as the information identifying the best paths from the nest to the food sources. Dynamic pheromone updating therefore plays the main role in the searching behavior of real ant colonies. The local and global pheromone updating rules are expressed as Eq. (9) and Eq. (10), respectively:

\tau(i, j) = (1 - \rho)\,\tau(i, j) + \rho\,\tau_{0} \qquad (9)

\tau(i, j) = (1 - \delta)\,\tau(i, j) + \delta\,\Delta\tau(i, j) \qquad (10)
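The state-transition rule of Eqs. (7)-(8) and the pheromone updates of Eqs. (9)-(11) can be sketched in Python. This is an illustrative sketch, not the authors' implementation: the parameter values (α = 8, β = 5, q0 = 0.2, ρ = 0.01, τ0 = 1, δ = 0.2) are the ones given in the text, while the dictionary-based data structures are assumptions made for the example.

```python
import random

# Parameter values taken from the text: alpha = 8, beta = 5, q0 = 0.2,
# rho = 0.01 (local evaporation), tau0 = 1 (initial pheromone),
# delta = 0.2 (global decay). tau[(i, j)] holds the pheromone level;
# eta[(i, j)] holds the heuristic value, i.e. the inverse "distance"
# (here, inverse forecasting error).
ALPHA, BETA, Q0 = 8, 5, 0.2
RHO, TAU0, DELTA = 0.01, 1.0, 0.2

def choose_next_city(i, candidates, tau, eta, rng=random):
    """Pseudo-random proportional rule, Eqs. (7)-(8)."""
    scores = {j: (tau[(i, j)] ** ALPHA) * (eta[(i, j)] ** BETA)
              for j in candidates}
    if rng.random() <= Q0:                # exploitation: greedy choice, Eq. (7)
        return max(scores, key=scores.get)
    total = sum(scores.values())          # biased exploration, Eq. (8)
    r, acc = rng.random() * total, 0.0
    for j in candidates:
        acc += scores[j]
        if acc >= r:
            return j
    return candidates[-1]

def local_update(tau, edge):
    """Local update, Eq. (9): tau <- (1 - rho) * tau + rho * tau0."""
    tau[edge] = (1 - RHO) * tau[edge] + RHO * TAU0

def global_update(tau, best_route, best_length):
    """Global update, Eqs. (10)-(11): only edges on the global-best
    route receive the delta * (1 / L) reinforcement."""
    for edge in best_route:
        tau[edge] = (1 - DELTA) * tau[edge] + DELTA * (1.0 / best_length)
```

In a CACO-tuned SVR, one such tour through the "city matrix" corresponds to one candidate setting of the three SVR parameters, and the tour length is the resulting forecasting error.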
where ρ is the local evaporation rate of pheromone, 0 < ρ < 1, and τ_0 is the initial amount of pheromone deposited on each of the paths. In this work, the values of ρ and τ_0 are set to 0.01 and 1, respectively. In addition, the approach proposed by Dorigo and Gambardella (1994) was employed here for generating the initial amount of pheromone. Global trail updating is accomplished according to Eq. (10), where δ is the global pheromone decay parameter, 0 < δ < 1, set to 0.2 in this study. The quantity Δτ(i, j), expressed as Eq. (11), is used to increase the pheromone on the path of the solution:

\Delta\tau(i, j) = \begin{cases} 1/L, & \text{if } (i, j) \in \text{global best route} \\ 0, & \text{otherwise} \end{cases} \qquad (11)

where L is the length of the shortest route.

A Numerical Example and Experimental Results

The traffic flow data sets originated from three Civil Motorway detector sites. The Civil Motorway is the busiest inter-urban motorway network in Panchiao City, the capital of Taipei County, Taiwan. The major site was located at the center of Panchiao City, where the flow intersects an urban local street system, and it provided one-way traffic volume for each hour on weekdays. Therefore, one-way flow data for peak traffic are employed in this investigation, covering the morning peak period (MPP; from 6:00 to 10:00) and the evening peak period (EPP; from 16:00 to 20:00). Data collection was conducted from February 2005 to March 2005; the numbers of traffic flow data available for the MPP and EPP are 45 and 90 hours, respectively. For convenience, the traffic flow data are converted to equivalents of passengers (EOP), and both peak periods show the seasonality of traffic data. In addition, the traffic flow data are divided into three parts: training data (MPP 25 hours; EPP 60 hours), validation data (MPP 10 hours; EPP 15 hours) and testing data (MPP 10 hours; EPP 15 hours). The accuracy of the forecasting models is measured by the normalized root mean square error (NRMSE), as given by Eq. (12):

\mathrm{NRMSE} = \sqrt{\sum_{i=1}^{n} (a_{i} - f_{i})^{2} \Big/ \sum_{i=1}^{n} a_{i}^{2}} \qquad (12)

where n is the number of forecasting periods, a_i is the actual traffic flow value at period i, and f_i is the forecast traffic flow value at period i.

The parameter selection of forecasting models is important for obtaining good forecasting performance. For the SARIMA model, the parameters are determined by taking the first-order regular difference and the first seasonal difference to remove the non-stationarity and seasonality characteristics. Using statistical packages, with no autocorrelated residuals and approximately white-noise residuals, the most suitable models for the morning and evening peak periods of the traffic data are SARIMA(1,0,1)(0,1,1)_5 without a constant term and SARIMA(1,0,1)(1,1,1)_5 with a constant term, respectively. The equations used for the SARIMA models are presented as Eqs. (13) and (14), respectively:

(1 - 0.5167B)(1 - B^{5})\, X_{t} = (1 + 0.3306B)(1 - 0.9359B^{5})\, \varepsilon_{t} \qquad (13)

(1 - 0.5918B)(1 - B^{5})\, X_{t} = 2.305 + (1 - 0.9003B^{5})\, \varepsilon_{t} \qquad (14)

For the SVRCACO model, a rolling-based forecasting procedure was conducted and a one-hour-ahead forecasting policy adopted. Several types of data rolling are then considered to forecast traffic flow in the next hour. In this investigation, CACO is employed to determine a suitable combination of the three parameters in the SVR model. The parameters of the SVRCACO models with the minimum testing NRMSE values were selected as the most suitable for this investigation. Table 1 indicates that the SVRCACO models perform best when 15 and 35 input data are used for the morning and evening traffic forecasts, respectively. Table 2 compares the forecasting accuracy of the SARIMA and SVRCACO models in terms of NRMSE; it shows that the SVRCACO models achieve better forecasting results than the SARIMA models.

FUTURE TRENDS

In this investigation, the SVRCACO model provides a convenient and valid alternative for traffic flow forecasting. The SVRCACO model directly uses historical observations from traffic control systems and then determines suitable parameters by efficient optimization algorithms. In future research, other factors and meteorological control variables during peak periods, such as driving speed limits, important social events, the percentage of heavy vehicles, bottleneck service levels and waiting times at intersection traffic signals, can be included in the traffic forecasting model. In addition, other advanced optimization algorithms for parameter selection can be applied to the SVR model to satisfy the requirements of real-time traffic control systems.

CONCLUSION

Accurate traffic forecasting is crucial for an inter-urban traffic control system, particularly for avoiding congestion and for increasing the efficiency of limited traffic resources during peak periods. The historical traffic data of Panchiao City in northern Taiwan show a seasonal fluctuation trend that occurs in many inter-urban traffic systems; over-prediction or under-prediction of traffic flow therefore influences the transportation capability of an inter-urban system. This study introduces the forecasting technique SVRCACO and investigates its feasibility for forecasting inter-urban motorway traffic. The article indicates that the SVRCACO model has better forecasting performance than the SARIMA model. The superior performance of the SVRCACO model is due to the generalization ability of the SVR model and the proper selection of the SVR parameters by CACO.

REFERENCES

1. Bland, J. A. (1999). Space-Planning by Ant Colony Optimization. International Journal of Computer Applications in Technology. (12) 320-328.

2. Box, G. E. P., & Jenkins, G. M. (1970). Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day.

3. Colorni, A., Dorigo, M., Maniezzo, V., & Trubian, M. (1994). Ant System for Job-Shop Scheduling. JORBEL: Belgian Journal of Operations Research, Statistics and Computer Science. (34) 39-53.

4. Danech-Pajouh, M., & Aron, M. (1991). ATHENA: A Method for Short-Term Inter-Urban Motorway Traffic Forecasting. Recherche Transports Sécurité (English Issue). (6) 11-16.

5. Dorigo, M. (1992). Optimization, Learning, and Natural Algorithms. Ph.D. Thesis, Dip. Elettronica e Informazione, Politecnico di Milano, Italy.

6. Dorigo, M., & Gambardella, L. (1997). Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman Problem. IEEE Transactions on Evolutionary Computation. (1) 53-66.

7. Dougherty, M. S. (1996). Investigation of Network Performance Prediction Literature Review. Technical Note, Vol. 394, Institute for Transport Studies, University of Leeds.

8. Hong, W. C., & Pai, P. F. (2006). Predicting engine reliability by support vector machines. International Journal of Advanced Manufacturing Technology. (28) 154-161.

9. Hong, W. C., & Pai, P. F. (2007). Potential assessment of the support vector regression technique in rainfall forecasting. Water Resources Management. (21) 495-513.

10. Kamarianakis, Y., & Prastacos, P. (2005). Space-Time Modeling of Traffic Flow. Computers & Geosciences. (31) 119-133.

11. Ledoux, C. (1997). An Urban Traffic Flow Model Integrating Neural Networks. Transportation Research Part C. (5) 287-300.

12. Maniezzo, V., & Colorni, A. (1999). The Ant System Applied to the Quadratic Assignment Problem. IEEE Transactions on Knowledge and Data Engineering. (11) 769-778.

13. Mohandes, M. A., Halawani, T. O., Rehman, S., & Hussain, A. A. (2004). Support vector machines for wind speed prediction. Renewable Energy. (29) 939-947.

14. Okutani, I., & Stephanedes, Y. J. (1984). Dynamic Prediction of Traffic Volume Through Kalman Filtering Theory. Transportation Research Part B. (18) 1-11.

15. Pai, P. F., & Lin, C. S. (2005). A hybrid ARIMA and support vector machines model in stock price forecasting. Omega. (33) 497-505.

16. Pai, P. F., Lin, C. S., Hong, W. C., & Chen, C. T. (2006). A hybrid support vector machine regression
for exchange rate prediction. International Journal of Information and Management Sciences. (17) 19-32.

17. Pai, P. F., & Hong, W. C. (2006). Software reliability forecasting by support vector machines with simulated annealing algorithms. Journal of Systems and Software. (79) 747-755.

18. Pai, P. F., & Hong, W. C. (2005a). Forecasting regional electric load based on recurrent support vector machines with genetic algorithms. Electric Power Systems Research. (74) 417-425.

19. Pai, P. F., & Hong, W. C. (2005b). Support vector machines with simulated annealing algorithms in electricity load forecasting. Energy Conversion and Management. (46) 2669-2688.

20. Parpinelli, R. S., Lopes, H. S., & Freitas, A. A. (2002). Data Mining with an Ant Colony Optimization Algorithm. IEEE Transactions on Evolutionary Computation. (6) 321-332.

21. Smith, B. L., Williams, B. M., & Oswald, R. K. (2002). Comparison of Parametric and Nonparametric Models for Traffic Flow Forecasting. Transportation Research Part C. (10) 303-321.

22. Stathopoulos, A., & Karlaftis, G. M. (2003). A Multivariate State Space Approach for Urban Traffic Flow Modeling and Prediction. Transportation Research Part C. (11) 121-135.

23. Vythoulkas, P. C. (1993). Alternative Approaches to Short Term Forecasting for Use in Driver Information Systems. Transportation and Traffic Theory. Amsterdam: Elsevier.

24. Whittaker, J., Garside, S., & Lindveld, K. (1994). Tracking and Predicting a Network Traffic Process. Proceedings of the Second DRIVE II Workshop on Short-Term Traffic Forecasting, Delft.

25. Yin, H., Wong, S. C., Xu, J., & Wong, C. K. (2002). Urban Traffic Flow Prediction Using a Fuzzy-Neural Approach. Transportation Research Part C. (10) 85-98.

KEY TERMS

Ant Colony Optimization Algorithm (ACO): Inspired by the behavior of ants in finding paths from the colony to food, ACO is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs. A short path gets marched over faster, and thus its pheromone density remains high, as pheromone is laid on the path as fast as it can evaporate.

Artificial Neural Networks (ANNs): A network of many simple processors (units or neurons) that imitates a biological neural network. The units are connected by unidirectional communication channels, which carry numeric data.

Autoregressive Integrated Moving Average (ARIMA): A generalization of the autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series. The model is generally referred to as an ARIMA(p,d,q) model, where p, d, and q are integers greater than or equal to zero that refer to the order of the autoregressive, integrated, and moving average parts of the model, respectively.

Evolutionary Algorithm (EA): A generic population-based meta-heuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution: reproduction, mutation, recombination, natural selection and survival of the fittest. Evolutionary algorithms consistently perform well in approximating solutions to all types of problems because they make no assumptions about the underlying fitness landscape.

Pheromone: A chemical that triggers an innate behavioral response in another member of the same species. There are alarm pheromones, food trail pheromones, sex pheromones, and many others that affect behavior or physiology. In this article, food trail pheromones, which are common in social insects, are employed.

Seasonal Autoregressive Integrated Moving Average (SARIMA): A kind of ARIMA model for forecasting problems in which a seasonal effect is suspected. For example, consider a model of daily road traffic volumes: weekends clearly exhibit different behavior from weekdays. In this case it is often considered better to use a SARIMA (seasonal ARIMA) model than to increase the order of the AR or MA parts of the model.

Support Vector Machines (SVMs): Support vector machines (SVMs) were originally developed to solve pattern recognition and classification problems. With
Qiyang Chen
Montclair State University, USA
James Yao
Montclair State University, USA
Data Mining Fundamental Concepts and Critical Issues
tion, data/pattern analysis, data archeology, data dredging, data snooping, data fishing, information harvesting, and business intelligence (Han and Kamber, 2001).

MAIN FOCUS

Functionalities and Tasks

The common types of information that can be derived from data mining operations are associations, sequences, classifications, clusters, and forecasting. Associations happen when occurrences are linked in a single event. One of the most popular association applications deals with market basket analysis, which incorporates the use of frequency and probability functions to estimate the percentage chance of occurrences. Business strategists can leverage market basket analysis by applying techniques such as cross-selling and up-selling. In sequences, events are linked over time; this is particularly applicable in e-business for Website analysis.

Classification is probably the most common data mining activity today. It recognizes patterns that describe the group to which an item belongs. It does this by examining existing items that have already been classified and inferring a set of rules from them. Clustering is related to classification but differs in that no groups have yet been defined; using clustering, the data-mining tool discovers different groupings within the data. The resulting groups or clusters help the end user make some sense out of vast amounts of data (Kudyba & Hoptroff, 2001). All of these applications may involve predictions. The fifth application type, forecasting, is a different form of prediction: it estimates the future value of continuous variables based on patterns within the data.

Algorithms and Methodologies

Neural Networks

Also referred to as artificial intelligence (AI), neural networks utilize predictive algorithms. This technology has many characteristics similar to those of regression, because the application generally examines historical data and utilizes a functional form that best equates explanatory variables and the target variable in a manner that minimizes the error between what the model produced and what actually occurred in the past, and then applies this function to future data. Neural networks are a bit more complex, as they incorporate intensive program architectures in attempting to identify linear, non-linear and patterned relationships in historical data.

Decision Trees

Megaputer (2006) mentioned that this method can be applied to classification tasks only. As a result of applying this method to a training set, a hierarchical structure of classifying rules of the type "if…then…" is created. This structure has the form of a tree. To decide to which class an object or a situation should be assigned, one answers the questions located at the tree nodes, starting from the root. Following this procedure one eventually comes to one of the final nodes (called leaves), where the analyst finds a conclusion as to which class the considered object should be assigned.

Genetic Algorithms (or Evolutionary Programming)

Genetic algorithms, a biologically inspired search method, borrow mechanisms of inheritance to find solutions. Biological systems demonstrate flexibility, robustness and efficiency, and many biological systems are good at adapting to their environments. Some biological mechanisms (such as reproduction, crossover and mutation) can be used as an approach to computer-based problem solving. An initial population of solutions is created randomly, and only a fixed number of candidate solutions are kept from one generation to the next; those solutions that are less fit tend to die off, similar to the biological notion of survival of the fittest.

Regression Analysis

This technique involves specifying a functional form that best describes the relationship between the explanatory, driving or independent variables and the target or dependent variable the decision maker is looking to explain. Business analysts typically utilize regression to identify the quantitative relationships that exist between variables and to enable them to forecast into the future.
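The functional-form fitting described under Regression Analysis can be illustrated with a one-variable ordinary-least-squares sketch. The data and function names here are invented for the example; real analyses involve many variables and a statistics package.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares for the functional form y = b0 + b1 * x.

    Chooses b0, b1 to minimize the squared error between what the
    model produces and what actually occurred in the historical data.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                 # slope: effect of the explanatory variable
    b0 = mean_y - b1 * mean_x      # intercept
    return b0, b1

def predict(b0, b1, x):
    """Apply the fitted form to new data, e.g. to forecast a future value."""
    return b0 + b1 * x

# Toy historical data: response rises by 2 units per unit of spend.
b0, b1 = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(predict(b0, b1, 5))   # forecast for x = 5
```

The same fitted form also supports the "what if" analysis mentioned below: varying x and re-evaluating the prediction shows the sensitivity of the target to the explanatory variable.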
Regression models also enable analysts to perform "what if" or sensitivity analysis. Examples include how response rates change if a particular marketing or promotional campaign is launched, or how certain compensation policies affect employee performance, and many more.

Logistic Regression

Logistic regression should be used to predict the outcome of a dichotomous (e.g., yes/no) variable. This method is used for data that are not normally distributed (bell-shaped curve), i.e., categorical (coded) data. When a dependent variable can only have one of two answers, such as "will graduate" or "will not graduate," you cannot get a normal distribution as previously discussed.

Memory-Based Reasoning (MBR) or the Nearest Neighbor Method

To forecast a future situation, or to make a correct decision, such systems find the closest past analogs of the present situation and choose the same solution that was the right one in those past situations. The drawback of this approach is that there is no guarantee that the resulting clusters provide any value to the end user; they may simply not make any sense with regard to the overall business environment. Because of the limitations of this technique, no predictive, "what if" or variable/target connection can be implemented. The key differentiator between classification and segmentation and the regression and neural network technology mentioned above is the inability of the former to perform sensitivity analysis or forecasting.

Applications and Benefits

Data mining can be used widely in science and business for analyzing databases, gathering data and solving problems. In line with Berry and Linoff (2004), the benefits data mining can provide for businesses are limitless. Here are just a few examples:

Identify best prospects and then retain them as customers. By concentrating marketing efforts only on the best prospects, companies will save time and money, thus increasing the effectiveness of their marketing operation.

Predict cross-sell opportunities and make recommendations. Both traditional and Web-based operations can help customers quickly locate products of interest to them and simultaneously increase the value of each communication with the customers.

Learn parameters influencing trends in sales and margins. In the majority of cases we have no clue which combination of parameters influences an operation (a "black box"). In these situations data mining is the only real option.

Segment markets and personalize communications. There might be distinct groups of customers, patients, or natural phenomena that require different approaches in their handling.

The importance of collecting data that reflect specific business or scientific activities to achieve competitive advantage is widely recognized. Powerful systems for collecting data and managing it in large databases are in place in all large and mid-range companies. However, the bottleneck in turning this data into information is the difficulty of extracting knowledge about the system being studied from the collected data. Human analysts without special tools can no longer make sense of the enormous volumes of data that require processing in order to make informed business decisions (Kudyba & Hoptroff, 2001).

The applications of data mining are everywhere: from biomedical data (Hu and Xu, 2005) to mobile user data (Goh and Taniar, 2005); from data warehousing (Tjioe and Taniar, 2005) to intelligent web personalization (Zhou, Cheung, & Fong, 2005); from analyzing clinical outcomes (Hu, Song, Han, Yoo, Prestrud, Brennan, & Brooks, 2005) to mining crime patterns (Bagui, 2006).

Potential Pitfalls

Data Quality

Data quality means the accuracy and completeness of the data. Data quality is a versatile issue that represents one of the biggest challenges for data mining.
The data quality problem is of great importance due to the emergence of large volumes of data. Many business and industrial applications critically rely on the quality of information stored in diverse databases and data warehouses. Seifert (2004) emphasized that data quality can be affected by the structure and consistency of the data being analyzed. Other factors, such as the presence of duplicate records, the lack of data standards, the timeliness of updates and human error, can significantly impact the effectiveness of complex data mining techniques, which are sensitive to subtle differences in data. To improve the quality of data it is sometimes necessary to clean the data by removing duplicate records, standardizing the values or symbols used in the database to represent certain information, accounting for missing data points, removing unneeded data fields, and identifying abnormal data points.

Interoperability

Interoperability refers to the ability of a computer system and/or data to work with other systems or data using common standards or processes. Until recently, some government agencies elected not to gamble with any level of open access and operated isolated information systems. But isolated data is in many ways useless data; bits of valuable information on the September 11, 2001 hijackers' activities may have been stored in a variety of databases at the federal, state, and local government levels, but that information was not collected and made available to those who needed to see it to glimpse a complete picture of the growing threat. Seifert (2004) therefore suggested that interoperability is a critical part of the larger effort to improve interagency collaboration and information sharing. For public data mining, interoperability of databases and software is important to enable the search and analysis of multiple databases simultaneously, and it also ensures the compatibility of the data mining activities of different agencies.

Standardization

Standardization allows you to arrange customer information in a consistent format. Among the biggest challenges are inconsistent abbreviations, misspellings and variant spellings. Among the types of data that can be appended are demographic, geographic, psychographic, behavioristic, event-driven and computed data. Matching allows you to identify similar data within and across your data sources. One of the greatest challenges of matching is creating a system that incorporates your business rules, or criteria, for determining what constitutes a match.

Preventing Decay

The worst enemy of information is time, and information decays at different rates (Berry & Linoff, 2004). Cleaning your database is a large accomplishment, but it will be short-lived if you fail to implement procedures for keeping it clean at the source. According to the second law of thermodynamics, ordered systems tend toward disorder, and a database is a very ordered system. Contacts move. Companies grow. Knowledge workers enter new customer information incorrectly. Some information simply starts out wrong, the result of data input errors such as typos, transpositions, omissions and other mistakes; these are often easy to avoid. Finding ways to successfully implement these new technologies in a comprehensive data quality program not only increases the quality of your customer information, but also saves time, reduces frustration, improves customer relations, and ultimately increases revenue. Without constant attention to quality, your information quality will disintegrate.

No Generalizations to a Population

In statistics a population is defined, and then a sample is collected to make inferences about the population. This means that data cannot be re-used: statisticians define a model before looking at the data. Data mining does not attempt generalizations to a population; the database itself is considered the population. With the computing power of modern computers, data miners can use the whole database, making sampling redundant, and data can be re-used. In data mining it is common practice to try hundreds of models and find the one that fits best, which makes the interpretation of significance difficult. Machine learning is the data mining equivalent of regression: in machine learning we use a training set to train the system to find the dependent variable.
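The standardization and matching steps described above can be sketched as follows. The abbreviation table and the sample records are invented for illustration; a production system would encode its own business rules for what constitutes a match.

```python
# Hypothetical abbreviation table: expand common variants so that
# differently spelled records become comparable.
ABBREVIATIONS = {"st.": "street", "st": "street",
                 "ave.": "avenue", "ave": "avenue",
                 "rd.": "road", "rd": "road"}

def standardize(record):
    """Lower-case, trim, and expand abbreviations (the standardization step)."""
    words = record.lower().strip().split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def deduplicate(records):
    """Match records by their standardized form and keep one per match
    (a very simple matching rule: exact equality after standardization)."""
    seen = {}
    for r in records:
        key = standardize(r)
        seen.setdefault(key, r)   # first spelling encountered wins
    return list(seen.values())

# "12 Main St." and "12 main st" standardize to the same key,
# so only two distinct records remain.
print(deduplicate(["12 Main St.", "12 main st", "7 Oak Ave"]))
```

Keeping such rules in a routine that runs at the data source, rather than as a one-off cleanup, is exactly the decay-prevention measure the section above argues for.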
John Wang
Montclair State University, USA
Data Warehousing Development and Design Methodologies
This architecture uses a star schema at the atomic-level data marts. Which architecture should an enterprise follow? Is one better than the other?

Currently, the most popular data model for data warehouse design is the dimensional model (Han & Kamber, 2006; Bellatreche, 2006). Some researchers call this model the data-driven design model. Artz (2006), nevertheless, advocates the metric-driven model which, as another view of data warehouse design, begins by identifying key business processes that need to be measured and tracked over time in order for the organization to function more efficiently. There has always been the issue of top-down vs. bottom-up approaches in the design of information systems, and the same holds for data warehouse design. These have been puzzling questions for business intelligence architects and data warehouse designers and developers. The next section extends the discussion of issues related to data warehouse design and development methodologies.

DESIGN AND DEVELOPMENT METHODOLOGIES

Data Warehouse Data Modeling

Database design is typically divided into a four-stage process (Raisinghani, 2000): after requirements are collected, conceptual design, logical design, and physical design follow. Of the four stages, logical design is the key focal point of the database design process and the most critical to the design of a database. An OLTP system design usually adopts an ER data model and an application-oriented database design (Han & Kamber, 2006), and the majority of modern enterprise information systems are built using the ER model (Raisinghani, 2000). The ER data model is commonly used in relational database design, where a database schema consists of a set of entities and the relationships between them. The ER model is used to demonstrate detailed relationships between the data elements, and it focuses on removing redundancy of data elements in the database. The schema is a database design containing the logic and showing relationships between the data organized in different relations (Ahmad et al., 2004). Conversely, a data warehouse requires a concise, subject-oriented schema that facilitates online data analysis. A data warehouse schema is viewed as a dimensional model composed of a central fact table and a set of surrounding dimension tables, each corresponding to one of the components or dimensions of the fact table (Levene & Loizou, 2003). Dimensional models are oriented toward a specific business process or subject, and this approach keeps the data elements associated with the business process only one join away. The most popular data model for a data warehouse is the multidimensional model. Such a model can exist in the form of a star schema, a snowflake schema, or a starflake schema.

The star schema (see Figure 1) is the simplest database structure, containing a fact table in the center, with no redundancy, surrounded by a set of smaller dimension tables (Ahmad et al., 2004; Han & Kamber, 2006). The fact table is connected with the dimension tables using many-to-one relationships to ensure their hierarchy. The star schema can provide fast response times by allowing database optimizers to work with simple database structures and thereby yield better execution plans.

The snowflake schema (see Figure 2) is a variation of the star schema model in which all dimensional information is stored in the third normal form, thereby further splitting the data into additional tables while keeping the fact table structure the same. To take care of hierarchy, the dimension tables are connected with sub-dimension tables using many-to-one relationships. The resulting schema graph forms a shape similar to a snowflake (Ahmad et al., 2004; Han & Kamber, 2006). The snowflake schema can reduce redundancy and save storage space. However, it can also reduce the effectiveness of browsing, and system performance may be adversely impacted. Hence, the snowflake schema is not as popular as the star schema in data warehouse design (Han & Kamber, 2006). In general, the star schema requires greater storage, but it is faster to process than the snowflake schema (Kroenke, 2004).

The starflake schema (Ahmad et al., 2004), also called the galaxy schema or fact constellation schema (Han & Kamber, 2006), is a combination of the denormalized star schema and the normalized snowflake schema (see Figure 3). The starflake schema is used in situations where it is difficult to restructure all entities into a set of distinct dimensions, as it allows a degree of crossover between dimensions to answer distinct queries (Ahmad et al., 2004). Figure 3 illustrates the starflake schema.

[Figures 1 and 2, star and snowflake schema examples, appeared here: a TRANSACTION fact table (PurchaseDate, SalesPrice, AskingPrice, CustomerID, SalespersonID) linked to CUSTOMER, SALESPERSON, ART_DEALER, and WORK dimension tables.]

What needs to be differentiated is that the three schemas are normally adopted according to the differences in design requirements. A data warehouse collects information about subjects that span the entire organization, such as customers, items, and sales; its scope is enterprise-wide (Han & Kamber, 2006). The starflake schema can model multiple, interrelated subjects and is therefore usually used to model an enterprise-wide data warehouse. A data mart, on the other hand, is similar to a data warehouse but limits its focus to a single department subject; its scope is department-wide. The star schema and snowflake schema are geared toward modeling single subjects. Consequently, the star schema or snowflake schema is commonly used for data mart modeling, although the star schema is more popular and efficient (Han & Kamber, 2006).
426
Data Warehousing Development and Design Methodologies
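To make the dimensional model concrete, the following sketch (my own illustration, not taken from the article; all table and column names are invented) builds a minimal star schema in an in-memory SQLite database and runs a typical dimensional query:

```python
import sqlite3

# Hypothetical minimal star schema: one central fact table (fact_sales)
# referencing two dimension tables (dim_time, dim_item).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, name TEXT, brand TEXT, type TEXT);
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    item_id INTEGER REFERENCES dim_item(item_id),
    units_sold INTEGER,
    dollars_sold REAL
);
""")
cur.execute("INSERT INTO dim_time VALUES (1, '2008-01-15', 'Jan', 2008)")
cur.execute("INSERT INTO dim_item VALUES (1, 'Sofa', 'Acme', 'furniture')")
cur.execute("INSERT INTO fact_sales VALUES (1, 1, 3, 2400.0)")

# A typical OLAP-style query joins the fact table with its dimensions
# and aggregates a measure along dimension attributes.
cur.execute("""
SELECT t.year, i.type, SUM(f.dollars_sold)
FROM fact_sales f
JOIN dim_time t ON f.time_id = t.time_id
JOIN dim_item i ON f.item_id = i.item_id
GROUP BY t.year, i.type
""")
rows = cur.fetchall()
print(rows)  # [(2008, 'furniture', 2400.0)]
```

A snowflake schema would further normalize the dimension tables (for example, splitting dim_item into separate item and brand tables), reducing redundancy at the cost of extra query-time joins.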
Figure 3. Example of a starflake schema (galaxy schema or fact constellation) (adapted from Han & Kamber, 2006)

[Figure: two fact tables (SALES, SHIPPING) sharing the dimension tables TIME, ITEM, BRANCH, LOCATION, and SHIPPER]
derived summary data from OLTP systems may not be clear. The metric-driven design approach, on the other hand, begins first by defining key business processes that need to be measured and tracked over time. After these key business processes are identified, they are modeled in a dimensional data model. Further analysis follows to determine how the dimensional model will be populated (Artz, 2006).

According to Artz (2006), the data-driven approach to data warehouse design has little future, since information derived from a data-driven model is information about the data set. The metric-driven approach, conversely, may have some key impacts and implications, because information derived from a metric-driven model is information about the organization. The data-driven approach dominates data warehouse design in organizations at present. The metric-driven approach, on the other hand, is at the research stage, needing practical testimony of its speculated, potentially dramatic implications.

Top-Down vs. Bottom-Up

There are two general approaches to building a data warehouse, including its data marts, prior to the start of construction: the top-down approach and the bottom-up approach (Han & Kamber, 2006; Imhoff et al., 2004; Marakas, 2003). The top-down approach starts with a big picture of the overall, enterprise-wide design. The data warehouse to be built is large and integrated, with a focus on integrating the enterprise data for usage in any data mart from the very first project (Imhoff et al., 2004). It implies a strategic rather than an operational perspective of the data. It serves as the proper alignment of an organization's information systems with its business goals and objectives (Marakas, 2003). However, this approach is risky (Ponniah, 2001). In contrast, a bottom-up approach designs the warehouse around business-unit needs for operational systems. It starts with experiments and prototypes (Han & Kamber, 2006). With bottom-up, departmental data marts are built first, one by one. It offers faster and easier implementation, favorable return on investment, and less risk of failure, but with a drawback of data fragmentation and redundancy. The focus of the bottom-up approach is to meet unit-specific needs with minimal regard to the overall enterprise-wide data requirements (Imhoff et al., 2004).

An alternative to the two approaches discussed above is to use a combined approach (Han & Kamber, 2006), with which an organization can exploit the planned and strategic nature of the top-down approach while retaining the rapid implementation and opportunistic application of the bottom-up approach (p. 129), when such an approach is necessitated by the organizational and business scenarios at hand.

Data Partitioning and Parallel Processing

Data partitioning is the process of decomposing large tables (fact tables, materialized views, indexes) into multiple small tables by applying selection operators (Bellatreche, 2006). A good partitioning scheme is an essential part of designing a database that will benefit from parallelism (Singh, 1998). With well-performed partitioning, significant improvements in availability, administration, and table scan performance can be achieved.

Parallel processing is based on a parallel database, in which multiple processors are in place. Parallel databases link multiple smaller machines to achieve the same throughput as a single, larger machine, often with greater scalability and reliability than single-processor databases (Singh, 1998). In the context of relational online analytical processing (ROLAP), by partitioning the data of a ROLAP schema (star schema or snowflake schema) among a set of processors, OLAP queries can be executed in parallel, potentially achieving a linear speedup and thus significantly improving query response time (Datta et al., 1998; Tan, 2006). Given the size of contemporary data warehousing repositories, multiprocessor solutions are crucial for the massive computational demands of current and future OLAP systems (Dehne et al., 2006). Most fast computation algorithms assume that they can be applied in a parallel processing system (Dehne et al., 2006; Tan, 2006). It is also sometimes necessary to use parallel processing for data mining, because large amounts of data and massive search efforts are involved (Turban et al., 2005). Therefore, data partitioning and parallel processing are two complementary techniques for reducing query processing cost in data warehousing design and development (Bellatreche, 2006).
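The interplay of partitioning and parallel scanning can be sketched as follows (a hypothetical illustration with invented data and function names; threads stand in for the processors of a parallel database):

```python
from concurrent.futures import ThreadPoolExecutor
from collections import defaultdict

def partition(rows, key, n_parts):
    """Horizontally partition a fact table by hashing the partitioning key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts

def scan(part):
    """Per-partition aggregation: a partial result of the OLAP query."""
    totals = defaultdict(float)
    for row in part:
        totals[row["item"]] += row["amount"]
    return totals

fact_rows = [
    {"item": "sofa", "amount": 800.0},
    {"item": "lamp", "amount": 60.0},
    {"item": "sofa", "amount": 1600.0},
]

# Each partition can be scanned independently, so the scans run in parallel.
parts = partition(fact_rows, "item", n_parts=2)
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(scan, parts))

# Combine the partial aggregates into the final answer.
result = defaultdict(float)
for p in partials:
    for item, total in p.items():
        result[item] += total
print(sorted(result.items()))  # [('lamp', 60.0), ('sofa', 2400.0)]
```

A real parallel database would spread the partitions over processors or machines rather than threads, but the structure — partition, scan in parallel, merge partial aggregates — is the same.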
FUTURE TRENDS

Currently, data warehousing is largely applied in customer relationship management (CRM). However, there are to date no agreed-upon standardized rules for how to design a data warehouse to support CRM, and a taxonomy of CRM analyses needs to be developed to determine the factors that affect design decisions for CRM data warehouses (Cunningham et al., 2006).

In the data modeling area, to develop a more general solution for modeling data warehouses, the current ER model and dimensional model need to be extended to the next level to combine the simplicity of the dimensional model and the efficiency of the ER model with the support of object-oriented concepts.

CONCLUSION

Several data warehousing development and design methodologies have been reviewed and discussed. The data warehouse data model differentiates itself from the ER model with an orientation toward specific business purposes. It benefits an enterprise more if the CIF and MD architectures are both considered in the design of a data warehouse. Some of the methodologies have been practiced in the real world and accepted by today's businesses. Yet new challenging methodologies, particularly in data modeling and models for physical data warehousing design, such as the metric-driven methodology, need to be further researched and developed.

REFERENCES

Ackoff, R.I. (1967). Management misinformation systems. Management Science, 14(4), 147-156.

Ahmad, I., Azhar, S., & Lukauskis, P. (2004). Development of a decision support system using data warehousing to assist builders/developers in site selection. Automation in Construction, 13, 525-542.

Artz, J.M. (2006). Data driven vs. metric driven data warehouse design. In J. Wang (Ed.), Encyclopedia of Data Warehousing and Mining (Vol. 1, pp. 223-227). Hershey, PA: Idea Group Reference.

Banker, R.D., & Kauffman, R.J. (2004, March). The evolution of research on information systems: A fiftieth-year survey of the literature in Management Science. Management Science, 50(3), 281-298.

Bellatreche, L., & Mohania, M. (2006). Physical data warehousing design. In J. Wang (Ed.), Encyclopedia of Data Warehousing and Mining (Vol. 2, pp. 906-911). Hershey, PA: Idea Group Reference.

Bellatreche, L., Schneider, M., Mohania, M., & Bhargava, B. (2002). PartJoin: An efficient storage and query execution for data warehouses. Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery (DAWAK02), 109-132.

Cunningham, C., Song, I., & Chen, P.P. (2006, April-June). Data warehouse design to support customer relationship management analyses. Journal of Database Management, 17(2), 62-84.

Datta, A., Moon, B., & Thomas, H. (1998). A case for parallelism in data warehousing and OLAP. Proceedings of the 9th International Workshop on Database and Expert Systems Applications (DEXA98), 226-231.

Dehne, F., Eavis, T., & Rau-Chaplin, A. (2006). The cgmCUBE project: Optimizing parallel data cube generation for ROLAP. Distributed and Parallel Databases, 19, 29-62.

Gorry, G.A., & Scott Morton, M.S. (1971). A framework for management information systems. Sloan Management Review, 13(1), 1-22.

Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques (2nd ed.). San Francisco, CA: Morgan Kaufmann Publishers.

Hand, D., Mannila, H., & Smyth, P. (2001). Principles of data mining. Cambridge, MA: The MIT Press.

Hoffer, J.A., Prescott, M.B., & McFadden, F.R. (2007). Modern database management (8th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.

Imhoff, C., Galemmco, M., & Geiger, J.G. (2004). Comparing two data warehouse methodologies. (Database and network intelligence). Database and Network Journal, 34(3), 3-9.

Inmon, W.H., Terdeman, R.H., & Imhoff, C. (2000). Exploration warehousing: Turning business information into business opportunity. NY: John Wiley & Sons, Inc.

Kimball, R. (1996). The data warehouse toolkit: Practical techniques for building dimensional data warehouses. NY: John Wiley & Sons.

Kimball, R. (1997, August). A dimensional modeling manifesto. DBMS, 10(9), 58-70.

Kroenke, D.M. (2004). Database processing: Fundamentals, design and implementation (9th ed.). Upper Saddle River, NJ: Prentice Hall.

Levene, M., & Loizou, G. (2003). Why is the snowflake schema a good data warehouse design? Information Systems, 28(3), 225-240.

Marakas, G.M. (2003). Modern data warehousing, mining, and visualization: Core concepts. Upper Saddle River, NJ: Pearson Education Inc.

Ponniah, P. (2001). Data warehousing fundamentals: A comprehensive guide for IT professionals. New York: John Wiley & Sons, Inc.

Raisinghani, M.S. (2000). Adapting data modeling techniques for data warehouse design. Journal of Computer Information Systems, 4(3), 73-77.

Singh, H.S. (1998). Data warehousing: Concepts, technologies, implementations, and management. Upper Saddle River, NJ: Prentice Hall PTR.

Tan, R.B. (2006). Online analytical processing systems. In J. Wang (Ed.), Encyclopedia of Data Warehousing and Mining (Vol. 2, pp. 876-884). Hershey, PA: Idea Group Reference.

KEY TERMS

Dimensions: The perspectives or entities with respect to which an organization wants to keep records (Han & Kamber, 2006, p. 110).

Dimensional Model: A model containing a central fact table and a set of surrounding dimension tables, each corresponding to one of the components or dimensions of the fact table.

Entity-Relationship Data Model: A model that represents a database schema as a set of entities and the relationships among them.

Fact Table: The central table in a star schema, containing the names of the facts, or measures, as well as keys to each of the related dimension tables.

Metric-Driven Design: A data warehousing design approach which begins by defining key business processes that need to be measured and tracked over time. These processes are then modeled in a dimensional model.

Parallel Processing: The allocation of the operating system's processing load across several processors (Singh, 1998, p. 209).

Star Schema: A modeling diagram which contains a large central table (fact table) and a set of smaller attendant tables (dimension tables), each represented by only one table with a set of attributes.
Decision Making in Intelligent Agents
Love Ekenberg
Stockholm University, Sweden & Royal Institute of Technology, Sweden
INTRODUCTION

There are several ways of building complex distributed software systems, for example in the form of software agents. But regardless of the form, there are some common problems having to do with specification contra execution. One of the problems is the inherent dynamics in the environment many systems are exposed to. The properties of the environment are not known with any precision at the time of construction. This renders a specification of the system incomplete by definition. A traditional software agent is only prepared to handle situations conceived of and implemented at compile-time. Even though it can operate in varying contexts, its decision making abilities are static. One remedy is to prepare the distributed components for a truly dynamic environment, i.e. an environment with changing and somewhat unpredictable conditions. A rational software agent needs both a representation of a decision problem at hand and means for evaluation. AI has traditionally addressed some parts of this problem, such as representation and reasoning, but has hitherto to a lesser degree addressed the decision making abilities of independent distributed software components (Ekenberg, 2000a, 2000b). Such decision making often has to be carried out under severe uncertainty regarding several parameters. Thus, methods for independent decision making components should be able to handle uncertainties on the probabilities and utilities involved. They have mostly been studied as means of representation, but are now being developed into functional theories of decision making suitable for dynamic use by software agents and other dynamic distributed components. Such a functional theory will also benefit analytical decision support systems intended to aid humans in their decision making. Thus, the generic term agent below stands for a dynamic software component as well as a human or a group of humans assisted by intelligent software.

BACKGROUND

Ramsey (1926/78) was the first to suggest a theory that integrated ideas on subjective probability and utility, presenting (informally) a general set of axioms for preference comparisons between acts with uncertain outcomes (probabilistic decisions). von Neumann and Morgenstern (1947) established the foundations for a modern theory of utility. They stated a set of axioms that they deemed reasonable to a rational decision-maker (such as an agent), and demonstrated that the agent should prefer the alternative with the highest expected utility, given that she acted in accordance with the axioms. This is the principle of maximizing the expected utility. Savage (1954/72) published a thorough treatment of a complete theory of subjective expected utility. Savage, von Neumann, and others structured decision analysis by proposing reasonable principles governing decisions and by constructing a theory out of them. In other words, they (and later many others) formulated a set of axioms meant to justify their particular attitude towards the utility principle, cf., e.g., Herstein and Milnor (1953), Suppes (1956), Jeffrey (1965/83), and Luce and Krantz (1971). In classical decision analysis, of the types suggested by Savage and others, a widespread opinion is that utility theory captures the concept of rationality.

After Raiffa (1968), probabilistic decision models are nowadays often given a tree representation (see Fig. 1). A decision tree consists of a root, representing a decision, a set of event nodes, representing some kind of uncertainty, and consequence nodes, representing possible final outcomes. In the figure, the decision is a square, the events are circles, and final consequences are triangles. Events unfold from left to right, until final consequences are reached. There may also be more than one decision to make, in which case the sub-decisions are made before the main decision.
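The principle of maximizing the expected utility can be sketched in a few lines (an illustration with invented numbers, assuming precise point estimates):

```python
# Each alternative is a list of (probability, utility) pairs, one per
# possible consequence. Probabilities within an alternative sum to 1.
alternatives = {
    "A1": [(0.4, 70), (0.6, 20)],
    "A2": [(0.5, 45), (0.5, 40)],
}

def expected_utility(consequences):
    """Probability-weighted sum of the utilities of the consequences."""
    return sum(p * u for p, u in consequences)

# The principle prescribes choosing the alternative with the highest EU.
eus = {a: expected_utility(cs) for a, cs in alternatives.items()}
best = max(eus, key=eus.get)
print(eus)   # {'A1': 40.0, 'A2': 42.5}
print(best)  # A2
```

Note that A2 wins even though A1 contains the single best consequence (utility 70): the ranking depends on the whole probability-weighted profile, not on individual outcomes.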
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Decision Making in Intelligent Agents
In decision trees, probability distributions are assigned in the form of weights (numbers) in the probability nodes as measures of the uncertainties involved. Obviously, such a numerically precise approach puts heavy demands on the input capability of the agent. The shortcomings of this representation are many, and have to be compensated for, see, e.g., (Ekenberg, 2000a). Among other things, the question has been raised whether people are capable of providing the input information that utility theory requires (cf., e.g., (Fischhoff et al., 1983)). For instance, most people cannot clearly distinguish between probabilities ranging roughly from 0.3 to 0.7 (Shapira, 1995). Similar problems arise in the case of artificial agents, since utility-based artificial agents usually base their reasoning on human assessments, for instance in the form of induced preference functions. The so-called reactive agents, for which this does not hold true, have not been put to use in dynamic domains involving uncertainty (cf., e.g., (Russell & Norvig, 1995)). Furthermore, even if an agent would be able to discriminate between different probabilities, very often complete, adequate, and precise information is missing.

Consequently, during recent years of rather intense research activities, several alternative approaches have emerged. In particular, first-order approaches, i.e., those based on sets of probability measures, upper and lower probabilities, and interval probabilities, have prevailed. A main class of such models has been focused on expressing probabilities in terms of intervals. In 1953, the concept of capacities was introduced (Choquet, 1953/54). This representation approach was further developed in (Huber, 1973; Huber & Strassen, 1973). Capacities have subsequently been used for modelling imprecise probabilities as intervals (capacities of order 2 (Denneberg, 1994)). Since the beginning of the 1960s, the use of first-order (interval-valued) probability functions, by means of classes of probability measures, has been integrated in classical probability theory by, e.g., Smith (1961) and Good (1962). Similarly, Dempster (1967) investigated a framework for modelling upper and lower probabilities, which was further developed by Shafer (1976), where a representation of belief in states or events was provided. Within the AI community the Dempster-Shafer approach has received a good deal of attention. However, their formalism seems to be too strong to be an adequate representation of belief (Weichselberger & Pöhlmann, 1990).

Other representations in terms of upper and lower probabilities have been proposed by, i.a., Hodges and Lehmann (1952), Hurwicz (1951), Wald (1950), Kyburg (1961), Levi (1974, 1980), Walley (1991), Danielson and Ekenberg (1998, 2007), and Ekenberg et al. (2001). Upper and lower previsions have also been investigated by various authors. For instance, Shafer et al. (2003) suggest a theory for how to understand subjective probability estimates, based on Walley (1991). A few approaches have also been based on logic, e.g., Nilsson (1986), who develops methods for dealing with sentences involving upper and lower probabilities. This kind of approach has been pursued further by, among others, Wilson (1999).
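A minimal sketch of such an interval representation (my own illustration, with invented numbers): two outcomes with fixed utilities and an interval-valued probability. Because the expected utility is monotone in the probability, its bounds occur at the interval endpoints, and overlapping intervals leave the alternatives unranked:

```python
def eu_interval(p_lo, p_hi, u1, u2):
    """Expected-utility bounds for two outcomes with utilities u1, u2,
    where P(outcome 1) lies anywhere in [p_lo, p_hi]. EU is linear
    (hence monotone) in p, so only the endpoints need checking."""
    ends = [p * u1 + (1 - p) * u2 for p in (p_lo, p_hi)]
    return min(ends), max(ends)

a = eu_interval(0.3, 0.7, 100, 0)   # EU(a) in [30, 70]
b = eu_interval(0.4, 0.6, 80, 20)   # EU(b) in [44, 56]

# The intervals overlap, so neither alternative dominates the other:
overlap = not (a[1] < b[0] or b[1] < a[0])
print(a, b, overlap)
```

This is precisely the discrimination problem discussed below: with first-order intervals alone, both alternatives must be retained even though the overlap may carry very little belief.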
SECOND-ORDER REPRESENTATIONS

A common characteristic of the first-order representations above is that they typically do not include all of the strong axioms of probability theory, and thus they do not require an agent to model and evaluate a decision situation using precise probability (and, in some cases, value) estimates. An advantage of representations using upper and lower probabilities is that they do not require taking probability distributions into consideration. On the other hand, it is then often difficult to devise a reasonable decision rule that finds an admissible alternative out of a set of alternatives and at the same time fully reflects the intentions of an agent (or its owner). Since the probabilities and values are represented by intervals, the expected value range of an alternative will also be an interval. In effect, the procedure retains all alternatives with overlapping expected utility intervals, even if the overlap is very small. Furthermore, such representations do not admit discrimination between different beliefs in different values within the intervals.

All of these representations face the same trade-off. Zero-order approaches (i.e. fixed numbers representing probability and utility assessments) require unreasonable precision in the representation of input data. Even though the evaluation and discrimination between alternatives becomes simple, the results are often not a good representative of the problem, and sensitivity analyses are hard to carry out for more than a few parameters at a time. First-order approaches (e.g. intervals) offer a remedy to the representation problem by allowing imprecision, such as intervals, in the representation of probability and utility assessments, reflecting the uncertainty inherent in most real-life decision problems faced by agents. But this permissibility opens up a can of worms in the sense that evaluation and discrimination become much harder because of overlap in the evaluation results of different options for the agent, i.e. the worst case for one alternative is no better than the best case for another alternative or vice versa, rendering a total ranking order between alternatives impossible to achieve. The trade-off between realistic representation and discriminative power has not been solved within the above paradigms. For a solution, one must look at second-order approaches allowing both imprecision in representation and power of admissible discrimination.

Approaches for extending the interval representation using distributions over classes of probability and value measures have been developed into various hierarchical models, such as second-order probability theory (Gärdenfors & Sahlin, 1982, 1983; Ekenberg & Thorbiörnson, 2001; Ekenberg et al., 2005). Gärdenfors and Sahlin consider global distributions of beliefs, but restrict themselves to interval representations and only to probabilities, not utilities. Other limitations are that they neither investigate the relation between global and local distributions, nor do they introduce methods for determining the consistency of user-asserted sentences. The same applies to Hodges and Lehmann (1952), Hurwicz (1951), and Wald (1950). Some more specialized approaches have recently been suggested, such as (Jaffray, 1999), (Nau, 2002), and (Utkin & Augustin, 2003). In general, very few have addressed the problems of computational complexity when solving decision problems involving such estimates. Needless to say, it is important for dynamic agents to be able to determine, in a reasonably short time, how various evaluative principles rank the given options in a decision situation.

Ekenberg et al. (2006) and Danielson et al. (2007) provide a framework for how second-order representation can be systematically utilized to put belief information into use in order to efficiently discriminate between alternatives that evaluate into overlapping expected utility intervals when using first-order interval evaluations. The belief information is in the form of a joint belief distribution, specified as marginal belief distributions projected on each parameter. It is shown that regardless of the form of the belief distributions over the originating intervals, the distributions resulting from multiplications and additions have forms very different from their components. This warp of resulting belief demonstrates that analyses using only first-order information, such as upper and lower bounds, do not take all available information into account. The method is based on the agent's belief in different parts of the intervals, expressed or implied, being taken into consideration. It can be said to represent the beliefs in various sub-parts of the feasible intervals. As a result, total lack of overlap is not required for successful discrimination between alternatives. Rather, an overlap by interval parts carrying little belief mass, i.e. representing a small part of the agent's belief, is allowed. Then, the non-overlapping parts can be thought of as the core of the agent's appreciation of the decision situation, thus allowing discrimination.
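The warp of resulting belief can be illustrated with a small Monte Carlo sketch (my own illustration, not the authors' method, with invented intervals): even when the beliefs over two factor intervals are uniform, the belief over their product concentrates toward the lower end of the product's range:

```python
import random

random.seed(1)

# Uniform beliefs over two intervals [0.2, 0.8]; sample their product.
samples = [random.uniform(0.2, 0.8) * random.uniform(0.2, 0.8)
           for _ in range(100_000)]

# Midpoint of the product's feasible range [0.2*0.2, 0.8*0.8] = [0.04, 0.64].
mid = (0.2 * 0.2 + 0.8 * 0.8) / 2
below = sum(s < mid for s in samples) / len(samples)
print(round(below, 2))  # around 0.76: most belief mass lies below the midpoint
```

A first-order analysis sees only the bounds [0.04, 0.64]; the second-order view reveals that roughly three quarters of the belief mass sits in the lower half of that interval, which is exactly the information exploited to discriminate between overlapping alternatives.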
There are essentially three ways of evaluating (i.e. making a decision in) a second-order agent decision problem. The first way (centroid analysis) is to use the centroid as the best single-point representative of the distributions. The centroid is additive and multiplicative. Thus, the centroid of the distribution of expected utility is the expected utility of the centroids of the projections. A centroid analysis gives a good overview of a decision situation. The second way (contraction analysis) is to use the centroid as a focal point (contraction point) towards which the intervals are decreased while studying the overlap in first-order expected utility intervals. The third way (distribution analysis) is more elaborate, involving the analysis of the resulting distributions of expected utility and calculating the fraction of belief overlapping between the alternatives being evaluated.

In this article, we discuss various approaches to probabilistic decision making in agents. We point out that theories incorporating second-order belief can provide more powerful discrimination to the agent (software agent or human being) when handling aggregations of interval representations, such as in decision trees or probabilistic networks, and that interval estimates (upper and lower bounds) in themselves are not complete. This applies to all kinds of decision trees and probabilistic networks, since they all use multiplications for the evaluations. The key idea is to use the information available in efficient evaluation of decision structures. Using only interval estimates often does not provide enough discrimination power for the agent to generate a preference order among the alternatives considered.

Second-order methods are not just nice theories, but should be taken into account to provide efficient decision methods for agents, in particular when handling aggregations of imprecise representations, as is the case in decision trees or probabilistic networks.

REFERENCES

Choquet, G. (1953/54). Theory of Capacities, Ann. Inst. Fourier 5, 131-295.

Danielson, M. & Ekenberg, L. (1998). A Framework for Analysing Decisions under Risk, European Journal of Operational Research 104(3), 474-484.

Danielson, M. & Ekenberg, L. (2007). Computing Upper and Lower Bounds in Interval Decision Trees, European Journal of Operational Research 181, 808-816.

Ekenberg, L. (2000b). The Logic of Conflicts between Decision Making Agents, Journal of Logic and Computation 10(4), 583-602.

Ekenberg, L., Boman, M., & Linnerooth-Bayer, J. (2001). General Risk Constraints, Journal of Risk Research 4(1), 31-47.

Ekenberg, L., Danielson, M., & Thorbiörnson, J. (2006). Multiplicative Properties in Evaluation of Decision Trees, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 14(3), 293-316.

Ekenberg, L. & Thorbiörnson, J. (2001). Second-Order Decision Analysis, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 9(1), 13-38.
Ekenberg, L., Thorbiörnson, J., & Baidya, T. (2005). Value Differences using Second-order Distributions, International Journal of Approximate Reasoning 38(1), 81-97.

Fischhoff, B., Goitein, B., & Shapira, Z. (1983). Subjective Expected Utility: A Model of Decision Making, Decision Making under Uncertainty, R.W. Scholz, ed., Elsevier Science Publishers B.V. North-Holland, 183-207.

Gärdenfors, P. & Sahlin, N.E. (1982). Unreliable Probabilities, Risk Taking, and Decision Making, Synthese 53, 361-386.

Gärdenfors, P. & Sahlin, N.E. (1983). Decision Making with Unreliable Probabilities, British Journal of Mathematical and Statistical Psychology 36, 240-251.

Good, I.J. (1962). Subjective Probability as the Measure of a Non-measurable Set, Logic, Methodology, and the Philosophy of Science, Suppes, Nagel, & Tarski, eds., Stanford University Press, 319-329.

Herstein, I.N. & Milnor, J. (1953). An Axiomatic Approach to Measurable Utility, Econometrica 21, 291-297.

Hodges, J.L. & Lehmann, E.L. (1952). The Use of Previous Experience in Reaching Statistical Decisions, The Annals of Mathematical Statistics 23, 396-407.

Huber, P.J. (1973). The Case of Choquet Capacities in Statistics, Bulletin of the International Statistical Institute 45, 181-188.

Huber, P.J. & Strassen, V. (1973). Minimax Tests and the Neyman-Pearson Lemma for Capacities, Annals of Statistics 1, 251-263.

Hurwicz, L. (1951). Optimality Criteria for Decision Making under Ignorance, Cowles Commission Discussion Paper 370.

Imprecise Probability Project (IPP), http://ippserv.rug.ac.be/home/ipp.html.

Jeffrey, R. (1965/83). The Logic of Decision, 2nd ed., University of Chicago Press. (First edition 1965)

Kyburg, H.E. (1961). Probability and the Logic of Rational Belief, Connecticut: Wesleyan University Press.

Levi, I. (1974). On Indeterminate Probabilities, The Journal of Philosophy 71, 391-418.

Levi, I. (1980). The Enterprise of Knowledge, MIT Press.

Luce, R.D. & Krantz, D. (1971). Conditional Expected Utility, Econometrica 39, 253-271.

Nau, R.F. (2002). The aggregation of imprecise probabilities, Journal of Statistical Planning and Inference 105, 265-282.

von Neumann, J. & Morgenstern, O. (1947). Theory of Games and Economic Behaviour, 2nd ed., Princeton University Press.

Nilsson, N. (1986). Probabilistic Logic, Artificial Intelligence 28, 71-87.

Raiffa, H. (1968). Decision Analysis, Addison Wesley.

Ramsey, F.P. (1926/78). Truth and Probability, Foundations: Essays in Philosophy, Logics, Mathematics and Economics, ed. Mellor, 58-100, Routledge and Kegan Paul. (Originally from 1926)

Russell, S.J. & Norvig, P. (1995). Artificial Intelligence: A Modern Approach, Prentice-Hall.

Savage, L. (1954/72). The Foundations of Statistics, 2nd ed., John Wiley and Sons. (First edition 1954)

Shafer, G., Gillet, P.R. & Scherl, R.B. (2003). Subjective Probability and Lower and Upper Prevision: A New Understanding, Proceedings of ISIPTA 03.

Shapira, Z. (1995). Risk Taking: A Managerial Perspective, Russell Sage Foundation.
Suppes, P. (1956). The Role of Subjective Probability and Utility Maximization, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability 1954-55 5, 113-134.

Utkin, L.V. & Augustin, T. (2003). Decision Making with Imprecise Second-Order Probabilities, Proceedings of ISIPTA 03.

Wald, A. (1950). Statistical Decision Functions, John Wiley and Sons.

Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities, Chapman and Hall.

KEY TERMS

Decision Tree: A decision tree consists of a root node, representing a decision, a set of intermediate (event) nodes, representing some kind of uncertainty, and consequence nodes, representing possible final outcomes. Usually, probability distributions are assigned in the form of weights in the probability nodes as measures of the uncertainties involved.

Expected Value: Given a decision tree with r alternatives A_i for i = 1, \ldots, r, the expression

E(A_i) = \sum_{i_1=1}^{n_{i0}} \sum_{i_2=1}^{n_{i1}} \cdots \sum_{i_{m-1}=1}^{n_{i(m-2)}} \sum_{i_m=1}^{n_{i(m-1)}} p_{ii_1} \, p_{ii_1 i_2} \cdots p_{ii_1 i_2 \ldots i_{m-2} i_{m-1}} \, p_{ii_1 i_2 \ldots i_{m-1} i_m} \, v_{ii_1 i_2 \ldots i_{m-1} i_m}

is the expected value of alternative A_i, where the p's are the probabilities at the event nodes along each path and the v's are the values at the final consequences.

Projection: Given a belief distribution F over a set B and a subset A of B,

f_A(x) = \int_{B \setminus A} F(x) \, dV_{B \setminus A}(x),

where V_{B \setminus A} is some k-dimensional Lebesgue measure on B \setminus A. Then f_A is the projection of F on A. A projection of a belief distribution is also a belief distribution.
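The expected value expression above amounts to a recursive probability-weighted sum over the tree, which can be sketched as follows (illustrative numbers; the encoding of the tree is my own):

```python
def evaluate(node):
    """Recursively compute the expected value of a decision-tree branch.
    A node is either a leaf value (a number) or an event node encoded as
    a list of (probability, subtree) pairs."""
    if isinstance(node, (int, float)):
        return node
    return sum(p * evaluate(child) for p, child in node)

# A two-level alternative: the first event splits 0.5/0.5, and each
# branch leads to its own second-level event node.
A = [(0.5, [(0.2, 100), (0.8, 50)]),
     (0.5, [(0.6, 30), (0.4, 10)])]
print(evaluate(A))  # 41.0
```

Each nested event node multiplies its probabilities into the paths below it, mirroring the products p_{ii_1} p_{ii_1 i_2} ... v_{ii_1 ... i_m} in the formula; evaluating all alternatives A_1, ..., A_r and comparing the results implements the expected-value decision rule.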
Decision Tree Applications for Data Modelling
Kyriacos Chrysostomou
Brunel University, UK
Sherry Y. Chen1
Brunel University, UK
Xiaohui Liu
Brunel University, UK
Decision Tree Applications for Data Modelling
To create a decision tree model on the basis of the above-mentioned approach, the modelling processes can be divided into three stages, which are: (1) tree growing, (2) tree pruning, and (3) tree selection.

Tree Growing

The initial stage of creating a decision tree model is tree growing, which includes two steps: tree merging and tree splitting. At the beginning, the non-significant predictor categories and the significant categories within a dataset are grouped together (tree merging). As the tree grows, impurities within the model will increase. Since the existence of impurities may reduce the accuracy of the model, there is a need to purify the tree. One possible way to do this is to separate the impurities into different leaves and ramifications (tree splitting) (Chang, 2007).

Tree Pruning

Tree pruning, which is the key element of the second stage, is to remove irrelevant splitting nodes (Kirkos et al., 2007). The removal of irrelevant nodes can help reduce the chance of creating an over-fitting tree. Such a procedure is particularly useful because an over-fitting tree model may result in misclassifying data in real world applications (Breiman et al., 1984).

Tree Selection

The final stage of developing a decision tree model is tree selection. At this stage, the created decision tree model will be evaluated using either cross-validation or a testing dataset (Breiman et al., 1984). This stage is essential as it can reduce the chances of misclassifying data in real world applications and, consequently, minimise the cost of developing further applications.

These algorithms differ in their capability of modelling different types of data. As a dataset may be constructed by different types of data, e.g., categorical data, numerical data, or a combination of both, there is a need to use a suitable decision tree algorithm which can support the particular type of data used in the dataset. All of the above-mentioned algorithms can support the modelling of categorical data, whilst only the C4.5 algorithm and the CART algorithm can be used for the modelling of numerical data (see Table 1). This difference can also be used as a guideline for the selection of a suitable decision tree algorithm. The other difference amongst these algorithms is the process of model development, especially at the stages of tree growing and tree pruning. In terms of the former, the ID3 and C4.5 algorithms split a tree model into as many ramifications as necessary, whereas the CART algorithm can only support binary splits. Regarding the latter, the pruning mechanisms located within the C4.5 and CART algorithms support the removal of insignificant nodes and ramifications, but the CHAID algorithm instead halts the tree growing process before the training data is overused (see Table 1).

DECISION TREE APPLICATIONS

Business Management

In the past decades, many organizations have created their own databases to enhance their customer services. Decision trees are a possible way to extract useful information from databases, and they have already been employed in many applications in the domain of business and management. In particular, decision tree modelling is widely used in customer relationship management and fraud detection, which are presented in the subsections below.

Customer Relationship Management
…ing online shopping is used as a label to classify users into two categories: (a) users who rarely used online shopping and (b) users who frequently used online shopping. In terms of the former, the model suggests that the time customers need to spend in a transaction and how urgently customers need to purchase a product are the most important factors which need to be considered. With respect to the latter, the created model indicates that price and the degree of human resources involved (e.g. the requirements of contacts with the employees of the company in having services) are the most important factors. The created decision trees also suggest that the success of online shopping highly depends on the frequency of customers' purchases and the price of the products. Findings discovered by decision trees are useful for understanding customers' needs and preferences.

Fraudulent Statement Detection

Another widely used business application is the detection of Fraudulent Financial Statements (FFS). Such an application is particularly important because the existence of FFS may reduce the government's tax income (Spathis et al., 2003). A traditional way to identify FFS is to employ statistical methods. However, it is difficult to discover all hidden information due to the necessity of making a huge number of assumptions and predefining the relationships among the large number of variables in a financial statement. Previous research has proved that creating a decision tree is a possible way to address this issue, as it can consider all variables during the model development process. Kirkos et al. (2007) have created a decision tree model to identify and detect FFS. In their study, 76 Greek manufacturing firms were selected and their published financial statements, including balance sheets and income statements, were collected for modelling purposes. The created tree model shows that all non-fraud cases and 92% of the fraud cases have been correctly classified. Such a finding indicates that decision trees can make a significant contribution to the detection of FFS due to their high accuracy rate.

Engineering

The other important application domain that decision trees can support is engineering. In particular, decision trees are widely used in energy consumption and fault diagnosis, which are described in the subsections below.

Energy Consumption

Energy consumption concerns how much electricity has been used by individuals. The investigation of energy consumption becomes an important issue as it helps utility companies identify the amount of energy needed. Although many existing methods can be used for the investigation of energy consumption, decision trees appear to be preferred. This is due to the fact that
Lee, S., Lee, S., & Park, Y. (2007). A Prediction Model for Success of Services in E-commerce Using Decision Tree: E-customers' Attitude Towards Online Service. Expert Systems with Applications, 33(3), 572-581.

Quinlan, J.R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.

Quinlan, J.R. (1986). Induction of Decision Trees. Machine Learning, 1(1), 81-106.

Roiger, R.J., & Geatz, M.W. (2003). Data Mining: A Tutorial-Based Primer. Montreal: Addison-Wesley.

Salford Systems (2004). CART 5.0 [WWW page]. URL http://www.salfordsystems.com/cart.php

Spathis, C., Doumpos, M., & Zopounidis, C. (2003). Using Client Performance Measures to Identify Pre-engagement Factors Associated with Qualified Audit Reports in Greece. The International Journal of Accounting, 38(3), 267-284.

SPSS Inc. (2007). Target the Right People More Effectively with AnswerTree [WWW page]. URL http://www.spss.com/answertree

Sugumaran, V., & Ramachandran, K. (2007). Automatic Rule Learning Using Decision Tree for Fuzzy Classifier in Fault Diagnosis of Roller Bearing. Mechanical Systems and Signal Processing, 21(5), 2237-2247.

Sun, W., Chen, J., & Li, J. (2007). Decision Tree and PCA-based Fault Diagnosis of Rotating Machinery. Mechanical Systems and Signal Processing, 21(3), 1300-1317.

Tso, G.K., & Yau, K.K. (2007). Predicting Electricity Energy Consumption: A Comparison of Regression Analysis, Decision Tree and Neural Networks. Energy.

KEY TERMS

Attributes: Pre-defined variables in a dataset.

Classification: An allocation of items or objects to classes or categories according to their features.

Customer Relationship Management: A dynamic process to manage the relationships between a company and its customers, including collecting, storing and analysing customers' information.

Data Mining: Also known as knowledge discovery in databases (KDD); a process of knowledge discovery by analysing data and extracting information from a dataset using machine learning techniques.

Decision Tree: A predictive model which can be visualised in a hierarchical structure using leaves and ramifications.

Decision Tree Modelling: The process of creating a decision tree model.

Fault Diagnosis: An action of identifying a malfunctioning system based on observing its behaviour.

Fraud Detection Management: The detection of frauds, especially those existing in financial statements or business transactions, so as to reduce the risk of loss.

Healthcare Management: The act of preventing, treating and managing illness, including the preservation of mental and physical well-being, through the services provided by health professionals.

Prediction: A statement or a claim that a particular event will happen in the future.
The Dempster-Shafer Theory
can be assigned to the empty set, allowing m(∅) > 0. The set of mass values associated with a single piece of evidence is called a body of evidence (BOE), often denoted m(·). The mass value m(Θ) assigned to the frame of discernment Θ is considered the amount of ignorance within the BOE, since it represents the level of exact belief that cannot be discerned to any proper subsets of Θ.

D-S theory also provides a method to combine the BOEs from different pieces of evidence, using Dempster's rule of combination. This rule assumes these pieces of evidence are independent; then the function (m1 ⊕ m2): 2^Θ → [0, 1], defined by

(m1 ⊕ m2)(x) = ( ∑_{s1 ∩ s2 = x} m1(s1) m2(s2) ) / ( 1 − ∑_{s1 ∩ s2 = ∅} m1(s1) m2(s2) ),   (1)

is a mass value, where s1 and s2 are focal elements from the BOEs m1(·) and m2(·), respectively. The denominator part of the combination expression includes the term

∑_{s1 ∩ s2 = ∅} m1(s1) m2(s2),

which measures the level of conflict in the combination process (Murphy, 2000). It is the existence of the denominator part in this combination rule that separates D-S theory (which includes it) from TBM (which excludes it). Benouhiba and Nigro (2006) view this difference as whether considering the conflict mass ∑_{s1 ∩ s2 = ∅} m1(s1) m2(s2) as a further form of ignorance mass is an acceptable point of view.

D-S theory, along with TBM, also differs from the Bayesian approach in that it does not necessarily produce final results. Moreover, partial answers are present in the final BOE produced (through the combination of evidence), including focal elements with more than one element, unlike the Bayesian approach, where probabilities on only individual elements would be accrued. This restriction of the Bayesian approach to singleton elements is clearly understood through the Principle of Insufficient Reason; see Beynon et al. (2000) and Beynon (2002, 2005).

To enable final results to be created with D-S theory, a number of concomitant functions exist, including:

i) The Belief function, Bel(si) = ∑_{sj ⊆ si} m(sj), for all si ⊆ Θ, which represents the confidence that a proposition lies in si or any subset of si;

ii) The Plausibility function, Pls(si) = ∑_{sj ∩ si ≠ ∅} m(sj), for all si ⊆ Θ, which represents the extent to which we fail to disbelieve si;

iii) The Pignistic function (see Smets and Kennes, 1994), BetP(si) = ∑_{sj ⊆ Θ} m(sj) |si ∩ sj| / |sj|, for all si ⊆ Θ.

From the definitions given above, the Belief function is cautious of the ignorance incumbent in the evidence, whereas the Plausibility function is more inclusive of its presence. The Pignistic function acts more like a probability function, partitioning levels of exact belief (mass) amongst the elements of the focal element it is associated with.
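As an illustrative sketch (the encoding and the two example BOEs below are ours, not from the article), Dempster's rule (1) and the three concomitant functions can be written directly in code, with focal elements represented as frozensets:

```python
# Sketch: Dempster's rule of combination (1) and the Belief,
# Plausibility and Pignistic functions. A BOE is a dict mapping
# frozenset focal elements to mass values.

def combine(m1, m2):
    """Intersect focal elements pairwise, then renormalise by 1 - conflict."""
    raw = {}
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            inter = s1 & s2
            raw[inter] = raw.get(inter, 0.0) + v1 * v2
    conflict = raw.pop(frozenset(), 0.0)    # mass landing on the empty set
    return {s: v / (1.0 - conflict) for s, v in raw.items()}

def bel(m, s):
    return sum(v for sj, v in m.items() if sj <= s)

def pls(m, s):
    return sum(v for sj, v in m.items() if sj & s)

def betp(m, s):
    return sum(v * len(s & sj) / len(sj) for sj, v in m.items())

# Two hypothetical BOEs over a three-element frame of discernment.
FRAME = frozenset({"Henry", "Tom", "Sarah"})
m1 = {frozenset({"Henry", "Tom"}): 0.8, FRAME: 0.2}
m2 = {frozenset({"Tom", "Sarah"}): 0.6, FRAME: 0.4}

m3 = combine(m1, m2)
print(round(m3[frozenset({"Tom"})], 2))          # 0.48
print(round(betp(m3, frozenset({"Tom"})), 3))    # 0.727
```

Since the two BOEs here produce no conflict, the denominator of (1) is 1 and the combined masses are simply the pairwise products.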
A non-specificity measure N(m(·)) within D-S theory was introduced by Dubois and Prade (1985); the formula is defined as

N(m(·)) = ∑_{sj ∈ 2^Θ} m(sj) log2 |sj|,

where |sj| is the number of elements in the focal element sj. Hence, N(m(·)) is considered the weighted average of the focal elements, with m(·) the degree of evidence focusing on sj, while log2 |sj| indicates the lack of specificity of this evidential claim. The general range of this measure is [0, log2 |Θ|] (given in Klir and Wierman, 1998), where |Θ| is the number of elements in the frame of discernment Θ.

Main Thrust

Witness 2 is 60% confident that Henry was leaving on a jet plane when the murder occurred, so a BOE defined m2(·) includes m2({Tom, Sarah}) = 0.6 and m2({Henry, Tom, Sarah}) = 0.4.

The aggregation of these two sources of information (evidence from the two witnesses), using Dempster's combination rule (1), is based on the intersection and multiplication of the focal elements and mass values from the BOEs m1(·) and m2(·); see Table 1.

Table 1. Intermediate combination of the BOEs m1(·) and m2(·)

m1(·) \ m2(·)            | {Tom, Sarah}, 0.6  | {Henry, Tom, Sarah}, 0.4
{Henry, Tom}, 0.8        | {Tom}, 0.48        | {Henry, Tom}, 0.32
{Henry, Tom, Sarah}, 0.2 | {Tom, Sarah}, 0.12 | {Henry, Tom, Sarah}, 0.08

In Table 1, the intersections and multiplications of the focal elements and mass values from the BOEs m1(·) and m2(·) are presented. The new focal elements found are all non-empty; it follows that the level of conflict is zero, and the resultant BOE, defined m3(·), consists of the four focal elements and mass values shown.

The non-specificity in each witness's evidence follows from the definition above:

N(m1(·)) = m1({Henry, Tom}) log2 |{Henry, Tom}| + m1({Henry, Tom, Sarah}) log2 |{Henry, Tom, Sarah}| = 0.8 log2 2 + 0.2 log2 3 = 1.117,

and N(m2(·)) = 1.234. The non-specificity associated with the combined BOE is similarly calculated, found to be N(m3(·)) = 0.567.
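The non-specificity values quoted above (1.117, 1.234 and 0.567) can be reproduced with a few lines of code; the frozenset encoding is an implementation choice of ours, not from the article:

```python
import math

# Sketch: non-specificity N(m) = sum over focal elements s of m(s) * log2|s|.

def nonspecificity(m):
    return sum(v * math.log2(len(s)) for s, v in m.items() if s)

HT = frozenset({"Henry", "Tom"})
TS = frozenset({"Tom", "Sarah"})
FRAME = frozenset({"Henry", "Tom", "Sarah"})

m1 = {HT: 0.8, FRAME: 0.2}
m2 = {TS: 0.6, FRAME: 0.4}
m3 = {frozenset({"Tom"}): 0.48, HT: 0.32, TS: 0.12, FRAME: 0.08}

for m in (m1, m2, m3):
    print(round(nonspecificity(m), 3))   # 1.117, then 1.234, then 0.567
```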
The values further demonstrate the effect of the combination process, namely a level of concomitant non-specificity associated with the BOE m3(·), found from the combination of the other two BOEs m1(·) and m2(·).

To allow a comparison of this combination process in D-S theory with the situation for TBM, the evidence from Witness 2 is changed slightly, becoming:

Witness 2 is 60% confident that Henry and Tom were leaving on a jet plane when the murder occurred, so a BOE defined m2(·) includes m2({Sarah}) = 0.6 and m2({Henry, Tom, Sarah}) = 0.4.

The difference between the two Witness 2 statements is that, in the second statement, Tom is now also considered to be leaving on the jet plane with Henry. The new intermediate calculations when combining the evidence from the two witnesses are shown in Table 2.

In the intermediate results in Table 2, there is an occasion where the intersection of two focal elements from m1(·) and m2(·) results in an empty set (∅). It follows that

∑_{s1 ∩ s2 = ∅} m1(s1) m2(s2) = 0.48,

and the value 1 − 0.48 = 0.52 forms the denominator in the expression for the combination of this evidence (see (1)), so the resultant BOE, here defined m4(·), is:

m4({Henry, Tom}) = 0.32/0.52 = 0.615, m4({Sarah}) = 0.231 and m4({Henry, Tom, Sarah}) = 0.154.

Comparison of the results in the BOEs m3(·) and m4(·) shows how the mass value associated with m3({Tom}) = 0.48 has been spread across the three focal elements which make up the m4(·) BOE.

This approach to counter the conflict possibly present when combining evidence is often viewed as not appropriate, with TBM introduced to offer a solution. Hence, using the second Witness 2 statement, the resultant combined BOE, defined m5(·), is taken directly from Table 2:

m5(∅) = 0.48, m5({Henry, Tom}) = 0.32, m5({Sarah}) = 0.12 and m5({Henry, Tom, Sarah}) = 0.08.

The difference between the BOEs m4(·) and m5(·) is in the inclusion of the focal element m5(∅) = 0.48, allowed when employing TBM. Beyond the difference in the calculations made between D-S theory and TBM, the important point is the interpretation of the m5(∅) expression in TBM. Put succinctly, following Smets (1990), m5(∅) = 0.48 corresponds to the amount of belief allocated to none of the three suspects; taken further, it is the proposition that none of the three suspects is the murderer. Since the three individuals are only suspects, the murderer might be someone else; if the initial problem had stated that one of the three individuals is the murderer, then the D-S theory approach should be adhered to.

Returning to the analysis of the original witness statements, the partial results presented so far do not identify explicitly which suspect is most likely to have undertaken the murder of Mr. White. To achieve explicit results, the three measures Bel(si), Pls(si) and BetP(si) previously defined are considered on singleton focal elements (the si being individual suspects):

Bel({Henry}) = ∑_{sj ⊆ {Henry}} m3(sj) = 0.00; similarly, Bel({Tom}) = 0.48 and Bel({Sarah}) = 0.00.
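The D-S (renormalised) and TBM (unnormalised) combinations of the conflicting statements above can be sketched side by side; the helper names and the `normalise` flag are ours, not from the article:

```python
# Sketch: combining the conflicting second Witness 2 statement under
# D-S theory (renormalise by 1 - conflict) versus TBM (keep the
# conflict mass on the empty set). Focal elements are frozensets.

EMPTY = frozenset()
FRAME = frozenset({"Henry", "Tom", "Sarah"})

def combine(m1, m2, normalise=True):
    raw = {}
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            inter = s1 & s2
            raw[inter] = raw.get(inter, 0.0) + v1 * v2
    if not normalise:
        return raw                      # TBM: conflict mass stays on EMPTY
    conflict = raw.pop(EMPTY, 0.0)
    return {s: v / (1.0 - conflict) for s, v in raw.items()}

m1 = {frozenset({"Henry", "Tom"}): 0.8, FRAME: 0.2}
m2 = {frozenset({"Sarah"}): 0.6, FRAME: 0.4}

m4 = combine(m1, m2, normalise=True)    # D-S theory
m5 = combine(m1, m2, normalise=False)   # TBM

print(round(m4[frozenset({"Henry", "Tom"})], 3))  # 0.615
print(round(m5[EMPTY], 2))                        # 0.48
```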
Pls({Henry}) = ∑_{sj ∩ {Henry} ≠ ∅} m3(sj) = m3({Henry, Tom}) + m3({Henry, Tom, Sarah}) = 0.32 + 0.08 = 0.40; similarly, Pls({Tom}) = 1.00 and Pls({Sarah}) = 0.20.

For the Pignistic function,

BetP({Henry}) = m3({Henry, Tom}) |{Henry} ∩ {Henry, Tom}| / |{Henry, Tom}| + m3({Henry, Tom, Sarah}) |{Henry}| / |{Henry, Tom, Sarah}| = 0.16 + 0.027 = 0.187;

similarly, BetP({Tom}) = 0.727 and BetP({Sarah}) = 0.087.

In this small example, all three measures identify the suspect Tom as having the most evidence purporting to them being the murderer of Mr. White.

FUTURE TRENDS

Dempster-Shafer (D-S) theory is a methodology that offers an alternative, of possibly developed generality, to the assignment of frequency-based probability to events, in its case levels of subjective belief. However, the issues surrounding its position with respect to other methodologies, such as the more well known Bayesian approach, could be viewed as stifling its utilisation. The important point to remember when considering D-S theory is that it is a general methodology that requires subsequent pertinent utilisation when deriving nascent techniques.

Future work needs to aid in finding the position of D-S theory relative to the other methodologies. That is, unlike methodologies like fuzzy set theory, D-S theory is not able to be employed straight on top of existing techniques, to create a D-S type derivative of the technique. Such derivatives, for example, could operate on incomplete data, including when there are missing values, their reason for missing possibly due to ignorance etc.

CONCLUSION

Dempster-Shafer (D-S) theory, and its general developments, continues to form the underlying structure to an increasing number of specific techniques that attempt to solve certain problems within the context of uncertain reasoning. As mentioned in the future trends section, the difficulty with D-S theory is that it needs to be considered at the start of work on creating a new technique for analysis. It follows that articles like this, which show the rudimentary workings of D-S theory, allow researchers the opportunity to see its operation, and so may contribute to its further utilisation.

REFERENCES

Benouhiba, T., & Nigro, J.-M. (2006). An evidential cooperative multi-agent system. Expert Systems with Applications, 30, 255-264.

Beynon, M.J. (2002). DS/AHP method: A mathematical analysis, including an understanding of uncertainty. European Journal of Operational Research, 140(1), 149-165.

Beynon, M.J. (2005). Understanding Local Ignorance and Non-specificity in the DS/AHP Method of Multi-criteria Decision Making. European Journal of Operational Research, 163, 403-417.

Beynon, M., Curry, B., & Morgan, P. (2000). The Dempster-Shafer Theory of Evidence: An Alternative Approach to Multicriteria Decision Modelling. OMEGA - International Journal of Management Science, 28(1), 37-50.

Cobb, B.R., & Shenoy, P.P. (2003). A Comparison of Bayesian and Belief Function Reasoning. Information Systems Frontiers, 5(4), 345-358.

Dempster, A.P. (1967). Upper and lower probabilities induced by a multi-valued mapping. Annals of Mathematical Statistics, 38, 325-339.

Dempster, A.P., & Kong, A. (1988). Uncertain evidence and artificial analysis. Journal of Statistical Planning and Inference, 1, 355-368.

Dubois, D., & Prade, H. (1985). A note on measures of specificity for fuzzy sets. International Journal of General Systems, 10, 279-283.

Huang, Z., & Lees, B. (2005). Representing and reducing error in natural-resource classification using model combination. International Journal of Geographical Information Science, 19(5), 603-621.

Klir, G.J., & Wierman, M.J. (1998). Uncertainty-Based Information: Elements of Generalized Information Theory. Physica-Verlag, Heidelberg.

Liu, L. (1999). Local computation of Gaussian belief functions. International Journal of Approximate Reasoning, 22, 217-248.

Murphy, C.K. (2000). Combining belief functions when evidence conflicts. Decision Support Systems, 29, 1-9.
Nguyen, H.T. (1978). On random sets and belief functions. Journal of Mathematical Analysis and Applications, 65, 531-542.

Schubert, J. (1994). Cluster-based specification techniques in Dempster-Shafer theory for an evidential intelligence analysis of multiple target tracks. Department of Numerical Analysis and Computer Science, Royal Institute of Technology, S-100 44 Stockholm, Sweden.

Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press, Princeton.

Smets, P. (1990). The Combination of Evidence in the Transferable Belief Model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5), 447-458.

Smets, P., & Kennes, R. (1994). The transferable belief model. Artificial Intelligence, 66(2), 191-243.

Wadsworth, R.A., & Hall, J.R. (2007). Setting Site Specific Critical Loads: An Approach using Endorsement Theory and Dempster-Shafer. Water Air Soil Pollution: Focus, 7, 399-405.

Xia, Y., Iyengar, S.S., & Brener, N.E. (1997). An event driven integration reasoning scheme for handling dynamic threats in an unstructured environment. Artificial Intelligence, 95, 169-186.

KEY TERMS

Belief: In Dempster-Shafer theory, the level of confidence that a proposition lies in a focal element or any subset of it.

Body of Evidence: In Dempster-Shafer theory, a series of focal elements and associated mass values.

Focal Element: In Dempster-Shafer theory, a set of hypotheses with positive mass value in a body of evidence.

Frame of Discernment: In Dempster-Shafer theory, the set of all hypotheses considered.

Dempster-Shafer Theory: A general methodology, also known as the theory of belief functions, whose rudiments are closely associated with uncertain reasoning.

Ignorance: In Dempster-Shafer theory, the level of mass value not discernible among the hypotheses.

Mass Value: In Dempster-Shafer theory, the level of exact belief in a focal element.

Non-Specificity: In Dempster-Shafer theory, the weighted average of the focal elements' mass values in a body of evidence, viewed as a species of a higher uncertainty type, encapsulated by the term ambiguity.

Plausibility: In Dempster-Shafer theory, the extent to which we fail to disbelieve that a proposition lies in a focal element.
Dependency Parsing
Figure 1. Dependency tree for the sentence "The red car hit the big motorcycle"
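The tree in Figure 1 can be encoded compactly as a head array; the encoding below is ours, and the arc assignments follow the conventional dependency analysis of the sentence:

```python
# Sketch: Figure 1's dependency tree as a head array. Words are numbered
# from 1; heads[i - 1] is the position of word i's head, 0 = artificial root.

words = ["The", "red", "car", "hit", "the", "big", "motorcycle"]
heads = [3, 3, 4, 0, 7, 7, 4]   # e.g. "car" (word 3) depends on "hit" (word 4)

def dependents(pos):
    """Words whose head is the word at 1-based position pos."""
    return [w for i, w in enumerate(words, start=1) if heads[i - 1] == pos]

print(dependents(4))   # ['car', 'motorcycle'] -- the dependents of "hit"
print(dependents(3))   # ['The', 'red'] -- the modifiers of "car"
```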
Table 1. Treebank information; #T = number of tokens * 1000, #S = number of sentences * 1000, #T/#S = tokens per sentence, %NST = % of non-scoring tokens (only in CoNLL-X), %NPR = % of non-projective relations, %NPS = % of non-projective sentences, IR = has informative root labels. Paired entries give the CoNLL-X (2006) / CoNLL 2007 values; "-" marks a language not included in that shared task; "?" marks a value that is illegible in the source.

Language   | #T        | #S          | #T/#S       | %NST | %NPR      | %NPS        | IR
Arabic     | 54 / 112  | 1.5 / 2.9   | 37.2 / 38.3 | 8.8  | 0.4 / 0.4 | 11.2 / 10.1 | Y
Basque     | -  / 51   | -    / 3.2  | -    / 38.3 | -    | -   / 2.9 | -    / 26.2 | -
Bulgarian  | 190 / -   | 12.8 / -    | 14.8 / -    | 14.4 | 0.4 / -   | 5.4  / -    | N
Catalan    | -  / 431  | -    / 15   | -    / 28.8 | -    | -   / 0.1 | -    / 2.9  | -
Chinese    | 337 / 337 | 57   / 57   | 5.9  / 5.9  | 0.8  | 0.0 / 0.0 | 0.0  / 0.0  | N
Czech      | ?  / 432  | 72.7 / 25.4 | 17.2 / 17.0 | 14.9 | 1.9 / 1.9 | 23.2 / 23.2 | Y
Danish     | 94 / -    | 5.2  / -    | 18.2 / -    | 13.9 | 1.0 / -   | 15.6 / -    | N
Dutch      | 195 / -   | 13.3 / -    | 14.6 / -    | 11.3 | 5.4 / -   | 36.4 / -    | N
English    | -  / 447  | -    / 18.6 | -    / 24.0 | -    | -   / 0.3 | -    / 6.7  | -
German     | 700 / -   | 39.2 / -    | 17.8 / -    | 11.5 | 2.3 / -   | 27.8 / -    | N
Greek      | -  / 65   | -    / 2.7  | -    / 24.2 | -    | -   / 1.1 | -    / 20.3 | -
Hungarian  | -  / 132  | -    / 6.0  | -    / 21.8 | -    | -   / 2.9 | -    / 26.4 | -
Italian    | -  / 71   | -    / 3.1  | -    / 22.9 | -    | -   / 0.5 | -    / 7.4  | -
Japanese   | 151 / -   | 17   / -    | 8.9  / -    | 11.6 | 1.1 / -   | 5.3  / -    | N
Portuguese | 207 / -   | 9.1  / -    | 22.8 / -    | 14.2 | 1.3 / -   | 18.9 / -    | Y
Slovene    | 29 / -    | 1.5  / -    | 18.7 / -    | 17.3 | 1.9 / -   | 22.2 / -    | Y
Spanish    | 89 / -    | 3.3  / -    | 27   / -    | 12.6 | 0.1 / -   | 1.7  / -    | N
Swedish    | 91 / -    | 11   / -    | 17.3 / -    | 11.0 | 1.0 / -   | 9.8  / -    | N
Turkish    | 58 / 65   | 5    / 5.6  | 11.5 / 11.6 | 33.1 | 1.5 / 5.5 | 11.6 / 33.3 | N
linguistic data. Parsing was more focused on training and parsing with phrase structure trees, and specifically on the English language, because the Penn Treebank (Marcus et al., 1993) was the only available source for a long time. With the introduction of treebanks for different languages, it is now possible to explore the bounds of multilingual parsing.

The early efforts of data-driven dependency parsing were focused on translating dependency structures to phrase structure trees, for which parsers already existed. But it was realised quickly that doing this was not as trivial as previously thought. It is much easier and more intuitive to represent some phenomena, such as local and global scrambling (in other words, free word order), with dependency trees rather than phrase structure trees. Thus the incompatible translations of dependency structures to phrase structure trees resulted in varying degrees of loss of information.

Collins et al. (1999) reports results on Czech. He translates the dependency trees to phrase structure trees in the flattest way possible and names the internal nodes after the part-of-speech tag of the head word of each node. He uses Model 2 in Collins (1999) and then evaluates the attachment score on the dependencies extracted from the resulting phrase structure trees of his parser. However, crossing dependencies cannot be translated into phrase structure trees (Çakıcı and Baldridge, 2006) unless the surface order of the words is changed. But Collins et al. (1999) does not mention crossing dependencies; therefore, we do not know how he handled non-projectivity.

One of the earliest statistical systems that aims to parse dependency structures directly, without an internal representation of translation, is Eisner (1996). He proposes 3 different generative models. He evaluates them on the dependencies derived from the Wall Street Journal part of the Penn Treebank. Eisner reports 90 percent for probabilistic parsing of English samples from the WSJ. He reports a 93 percent attachment score when gold standard tags were used, which means 93 percent of all the dependencies are correct regardless of the percentage of the dependencies in each sentence. Eisner's parser is a projective parser, thus it cannot inherently predict crossing dependencies.

Discriminative dependency parsers such as Kudo and Matsumoto (2000, 2002) and Yamada and Matsumoto (2003) were also developed. They use support vector machines to predict the next action of a deterministic parser. Nivre et al. (2004) does this by memory-based learning. They are all deterministic parsers.

McDonald et al. (2005b) tried something new and applied graph spanning algorithms to dependency parsing. They formalise dependency parsing as the problem of finding a maximum spanning tree in a directed graph. MIRA is used to determine the weights of dependency links as part of this computation. This algorithm has two major advantages: it runs in O(n²) time and it can handle non-projective dependencies directly. They show that this algorithm significantly improves performance on dependency parsing for Czech, especially on sentences which contain at least one crossed dependency. Variations of this parser have been used in the CoNLL-X shared task and received the highest ranking among the participants, averaged over the results of all of the 13 languages (Buchholz and Marsi, 2006). However, when no linguistic or global constraints are applied, it may yield absurd dependency sequences, such as assigning two subjects to a verb (Riedel et al., 2006). McDonald et al. (2005a) use the MIRA learning algorithm with Eisner's parser and report results for projective parsing.

FUTURE TRENDS

There is a growing body of work on creating new treebanks for different languages. Requirements for the design of these treebanks are at least as diverse as the natural languages themselves. For instance, some languages have a much stronger morphological component or freer word order than others. Understanding and modelling these in the form of annotated linguistic data will guide the understanding of natural language, and technological advancement will hopefully make it easier to understand the inner workings of the language faculty of humans. There are challenges both for dependency parsing and for dependency theory. For instance, modelling long-distance dependencies and multiple-head dependencies is still awaiting attention, and there is much to do on the morphemic dependency approach, where heads of phrases can be morphemes rather than words in a sentence, for morphologically complex languages. Although these constitute a fraction of all the phenomena in natural languages, they are the tricky part of the NLP systems that will never be perfect as long as these natural phenomena are ignored.
…and Valency. Walter de Gruyter, pp. 593-635.

Hudson, R.A. (1990). English Word Grammar. Blackwell.

Hudson, R.A. (1984). Word Grammar. Blackwell.

Joshi, A.K. (1985). Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions? In David R. Dowty, Lauri Karttunen, and Arnold M. Zwicky, editors, Natural Language Parsing, pp. 206-250. Cambridge University Press, Cambridge, UK.

Joshi, A.K., Levy, L.S., and Takahashi, M. (1975). Tree Adjunct Grammars. Journal of Computer and System Sciences, 10(1).

Kruijff, G-J. M. (2001). A Categorial-Modal Architecture of Informativity: Dependency Grammar Logic and Information Structure. Ph.D. thesis, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic.

Kudo, T. and Matsumoto, Y. (2000). Japanese Dependency Structure Analysis Based on Support Vector Machines. Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC), pp. 18-25.

Kudo, T. and Matsumoto, Y. (2002). Japanese Dependency Analysis Using Cascaded Chunking. Proceedings of the Sixth Workshop on Computational Language Learning (CoNLL), pp. 63-69.

Marcus, M.P., Santorini, B. and Marcinkiewicz, M.A. (1993). Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19, 313-330.

McDonald, R., Crammer, K. and Pereira, F. (2005a). Online Large-margin Training of Dependency Parsers. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 91-98.

McDonald, R., Pereira, F., Ribarov, K., and Hajič, J. (2005b). Non-projective Dependency Parsing using Spanning Tree Algorithms. Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pp. 523-530.

Mel'čuk, I.A. (1988). Dependency Syntax: Theory and Practice. State University of New York Press.

Menzel, W. and Schröder, I. (1998). Decision Procedures for Dependency Parsing Using Graded Constraints. In Kahane, S. and Polguère, A. (eds), Proceedings of the Workshop on Processing of Dependency-Based Grammars, pp. 78-87.

Nivre, J., Hall, J. and Nilsson, J. (2004). Memory-based Dependency Parsing. In Ng, H.T. and Riloff, E. (eds), Proceedings of the 8th Conference on Computational Natural Language Learning (CoNLL), pp. 49-56.

Nivre, J., Hall, J., Kübler, S., McDonald, R., Yuret, D., Nilsson, J. and Riedel, S. (2007). The CoNLL 2007 Shared Task on Dependency Parsing. Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 915-932.

Petkevič, V. (1987). A New Dependency Based Specification of Underlying Representations of Sentences. Theoretical Linguistics, 14, 143-172.

Petkevič, V. (1995). A New Formal Specification of Underlying Representations. Theoretical Linguistics, 21, 7-61.

Quirk, C., Menezes, A. and Cherry, C. (2005). Dependency treelet translation: Syntactically informed phrasal SMT. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 271-279.

Riedel, S., Çakıcı, R., and Meza-Ruiz, I. (2006). Multilingual dependency parsing with incremental integer linear programming. Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL).

Robinson, J.J. (1970). Dependency Structures and Transformational Rules. Language, 46, 259-285.

Samuelsson, C. (2000). A Statistical Theory of Dependency Syntax. Proceedings of the 18th International Conference on Computational Linguistics (COLING).

Starosta, S. (1988). The Case for Lexicase: An Outline of Lexicase Grammatical Theory. Pinter Publishers.

Steedman, M. (2000). The Syntactic Process. The MIT Press, Cambridge, MA.
Dependency Parsing
Sgall, P., Nebeský, L., Goralčíková, A. and Hajičová, E. (1969). A Functional Approach to Syntax in Generative Description of Language. Elsevier, New York.

Sgall, P., Hajičová, E. and Panevová, J. (1986). The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Reidel.

Tesnière, L. (1959). Éléments de syntaxe structurale. Klincksieck, Paris, France.

Tapanainen, P. and Järvinen, T. (1997). A Non-projective Dependency Parser. Proceedings of the 5th Conference on Applied Natural Language Processing, pp. 64-71.

Vijay-Shanker, K. and Weir, D. (1994). The Equivalence of Four Extensions of Context-free Grammar. Mathematical Systems Theory, 27: 511-546.

Yamada, H. and Matsumoto, Y. (2003). Statistical Dependency Analysis with Support Vector Machines. In Van Noord, G. (ed.), Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pp. 195-206.

KEY TERMS

Corpus (plural: corpora): A collection of written or spoken material in machine-readable form.

Morpheme: The smallest unit of meaning. A word may consist of one morpheme (need), two morphemes (need/less, need/ing) or more (un/happi/ness).

Phrase Structure Tree: A structural representation of a sentence in the form of an inverted tree, with each node of the tree labelled according to the phrasal constituent it represents.

Rule-Based Parser: A parser that uses hand-written (designed) rules, as opposed to rules that are derived from the data.

Statistical Parser: A group of parsing methods within NLP. The methods have in common that they associate grammar rules with a probability.

Treebank: A text corpus in which each sentence is annotated with syntactic structure. Syntactic structure is commonly represented as a tree structure. Treebanks can be used in corpus linguistics for studying syntactic phenomena or in computational linguistics for training or testing parsers.
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Designing Unsupervised Hierarchical Fuzzy Logic Systems
A first step in the construction of a fuzzy logic system is to determine which variables are fundamentally important. Any number of these decision variables may appear, but the more that are used, the larger the rule set that must be found. It is known [Raju, S., Zhou, J. and Kisner, R. A. (1990); Raju, G. V. S. and Zhou, J. (1993); Kingham, M., Mohammadian, M. and Stonier, R. J. (1998)] that the total number of rules in a system is an exponential function of the number of system variables. In order to design a fuzzy system with the required accuracy, the number of rules increases exponentially with the number of input variables and their associated fuzzy sets. A way to avoid this explosion of fuzzy rule bases in fuzzy logic systems is to consider Hierarchical Fuzzy Logic Control (HFLC) [Raju, G. V. S. and Zhou, J. (1993)]. A learning approach based on genetic algorithms [Goldberg, D. (1989)] is discussed in this paper for the determination of the rule bases of hierarchical fuzzy logic systems.

THE GENETIC FUZZY RULE GENERATOR ARCHITECTURE

In this section we show how to learn the fuzzy rules in a fuzzy logic rule base using a genetic algorithm. The full set of fuzzy rules is encoded as a single string in the genetic algorithm population. To facilitate this we develop the genetic fuzzy rule generator, whose architecture consists of five basic steps:

1. Divide the input and output spaces of the system to be controlled into fuzzy sets (regions);
2. Encode the fuzzy rules into bit-strings of 0s and 1s;
3. Use a genetic algorithm as a learning procedure to generate sets of fuzzy rules;
4. Use a fuzzy logic controller to assess the sets of fuzzy rules and assign a value to each generated set of fuzzy rules;
5. Stop generating new sets of fuzzy rules once some performance criterion is met.

Figure 1 shows the genetic fuzzy rule generator architecture graphically. Suppose we wish to produce the fuzzy rules for a fuzzy logic control with two inputs and a single output. This simple two-input (u1, u2), single-output (y) case is chosen in order to clarify the basic ideas of our new approach. Extensions to multi-output cases are straightforward; for more information on multi-output cases refer to Mohammadian et al. [Mohammadian, M. and Stonier, R. J. (1998)].

As a first step we divide the domain intervals of u1, u2 and y into different fuzzy sets. The number of fuzzy sets is application dependent. Assume that we divide the intervals for u1, u2 and y into 5, 7 and 7 fuzzy sets respectively. For each fuzzy set we assign a fuzzy membership function. Therefore a maximum of 35 fuzzy rules can be constructed for this system. The fuzzy rule base can now be formed as a 5×7 table, with cells holding the corresponding actions that must be taken when the conditions corresponding to u1, u2 are satisfied.

In step 2 we encode the input and output fuzzy sets into bit-strings (of 0s and 1s). Each complete bit-string consists of 35 fuzzy rules for this example, and each
Figure 1. The genetic fuzzy rule generator architecture: fuzzy input and output regions feed the GA fuzzy rule generator block, whose rule base is assessed by the FLC evaluator block to yield a set of fuzzy rules
fuzzy rule base has the same input conditions but may have different output control signals assigned to it. Therefore we need only encode the output signal of the fuzzy rule bit-strings into a complete bit-string. This saves processing time in the encoding and decoding of genetic algorithm strings. In this case the length of an individual string has been reduced from 665 bits (i.e. 19×35) to 245 bits (i.e. 7×35). The choice of output control signal to be set for each fuzzy rule is made by the genetic algorithm. It randomly initialises a population of complete bit-strings. Each of these bit-strings is then decoded into fuzzy rules and evaluated by the fuzzy logic controller to determine the fitness value for that bit-string. Application of proportional selection, mutation and one-point crossover operations can now proceed. Selection and crossover are the same as in simple genetic algorithms, while the mutation operation is modified. Crossover and mutation take place based on the probability of crossover and mutation respectively. The mutation operator is changed to suit this problem: an allele is selected at random and replaced by a random number ranging from 1 to 7, which represents in this example the seven output fuzzy sets. The genetic algorithm process performs a self-directed search according to fitness value. In all applications in this paper we seek to minimise the fitness function. The process can be terminated after a desired number of generations, or when the fitness value of the best string in a generation is less than some prescribed level.

VARIABLE SELECTION AND RULE BASE DECOMPOSITION

Traffic Light Control

Traffic light control is widely used to resolve conflicts among vehicle movements at intersections. The control system at each signalised intersection consists of three control elements: cycle time, phase splits and offset. Cycle time is the duration of completing all phases of a signal; phase split is the division of the cycle time into periods of green phase for competing approaches; and offset is the time difference in the starting times of the green phases of adjacent intersections.

In [Nainar, I., Mohammadian, M., Stonier, R. J. and Millar, J. (1996)] a fuzzy logic control scheme is proposed to overcome the lack of interaction between neighbouring intersections. First, a traffic model is developed and a fuzzy control scheme for regulating the traffic flow approaching a single traffic intersection is proposed. A new fuzzy control scheme employing a supervisory fuzzy logic controller is then proposed to coordinate the three intersections based on the traffic conditions at all three intersections. Simulation results established the effectiveness of the proposed scheme. Figure 2 shows the three intersections used in the simulation.

Figure 2. Three adjacent intersections (approach volumes between 850 and 1350 veh/hr, intersections spaced 100 metres apart, with sensors on each approach)

A supervisory fuzzy logic control system is then developed to coordinate the three intersections far more effectively than the three local fuzzy logic control systems. This is because, using the supervisory fuzzy logic controller, each intersection is coordinated with all its neighbouring intersections [Nainar, I., Mohammadian, M., Stonier, R. J. and Millar, J. (1996)]. The fuzzy knowledge base of the supervisory fuzzy logic controller was learnt using genetic algorithms. The supervisory fuzzy logic controller developed to coordinate the three intersections coordinated the traffic
signals far more effectively than the three local fuzzy logic controllers. This is because, using the supervisory fuzzy logic controller, each intersection is coordinated with all its neighbouring intersections. This proposed fuzzy logic control scheme can be effectively applied to on-line traffic control because of its ability to handle extensive traffic situations. Simulation results have shown that the multi-layer fuzzy logic system, consisting of three local fuzzy logic controllers and the supervisory fuzzy logic controller, is capable of reducing the waiting time on the network of traffic intersections [Nainar, I., Mohammadian, M., Stonier, R. J. and Millar, J. (1996)].

Collision-Avoidance in a Robot System

Consider the following collision-avoidance problem in a simulated, point-mass, two-robot system. A three-level hierarchical fuzzy logic system was proposed to solve the problem; full details can be found in [Mohammadian, M. and Stonier, R. J. (1998)], see also [Mohammadian, M. and Stonier, R. J. (1995)]. In the first layer, two knowledge bases, one for each robot, are developed to find the steering angle to control each robot to its target. In the second layer two new knowledge bases are developed, using the knowledge in the first layer, to control the speed of each robot so that each robot approaches its target with near zero speed. Finally, in the third layer, a single knowledge base is developed to modify the controls of each robot to avoid collision in a restricted common workspace; see Figure 3.

In Figure 3, (x, y) gives the physical position of the robot on the plane, φ is the directional heading of the robot, q is the steering angle, D1 and D2 are the distances of the two robots from their respective targets, S1 and S2 are the speeds of the two robots, D is the distance between the two robots, and q1′, q2′, q1″, q2″, s1′ and s2′ are the updates of the variable outputs for the last two layers. An important issue in this example is that of learning knowledge in a given layer sufficient for use in higher layers. In the first layer of the hierarchical fuzzy logic system, ignoring the possibility of collision, steering angles for the control of each robot to its associated target were determined by genetic algorithms. In the second layer a genetic algorithm was used to determine adjustments to the steering angle and speed of each robot, to control the speed of the robot when arriving at its target. Next, another layer is developed to adjust the speed and steering angle of the robots to avoid collision of the robots. Consider the knowledge base of a single robot in layer one. It is not sufficient to learn a fuzzy knowledge base from an initial configuration and use this knowledge base for information on the steering angle of the robot to learn fuzzy controllers in the second layer. Quite clearly, this knowledge base is only guaranteed to be effective from this initial configuration, as not all the fuzzy rules will have fired in taking the robot to its target. We have to find a knowledge base that is effective, to some acceptable measure, in controlling the robot to its target from any initial configuration. One way is to first learn a set of local fuzzy controllers, each knowledge base learnt by a
genetic algorithm from a given initial configuration within a set of initial configurations spread uniformly over the configuration space. These knowledge bases can then be fused, through a fuzzy amalgamation process [Stonier, R. J. and Mohammadian, M. (1995)], into the global (final) fuzzy control knowledge base. An alternative approach [Mohammadian, M. and Stonier, R. J. (1996); Stonier, R. J. and Mohammadian, M. (1998)] is to develop a genetic algorithm to learn the final knowledge base directly, by itself, over the region of initial configurations.

In conclusion, the proposed hierarchical fuzzy logic system is capable of controlling the multi-robot system successfully. By using a hierarchical fuzzy logic system the number of control laws is reduced. In the first layer of the hierarchical fuzzy logic system, ignoring the possibility of collision, steering angles for the control of each robot to its associated target were determined by genetic algorithms. In the second layer a genetic algorithm was used to determine adjustments to the steering angle and speed of each robot, to control the speed of the robot when arriving at its target. Next, another layer is developed to adjust the speed and steering angle of the robots to avoid collision of the robots. If only one fuzzy logic system were used to solve this problem, with the inputs x, y, φ of each robot and D, each with the same fuzzy sets described in this paper, then 153,125 fuzzy rules would be needed for its fuzzy knowledge base. Using a hierarchical fuzzy logic system there is a total of 1,645 fuzzy rules for this system. Hierarchical concept learning using the proposed method eases the development of fuzzy logic control systems, by encouraging the development of fuzzy logic controllers where the large number of system parameters would otherwise inhibit the construction of such controllers. For more details, we refer the reader to the cited papers.

ISSUES IN RULE BASE IDENTIFICATION

knowledge base and the membership functions in the inference process. This is usually accomplished by using a genetic algorithm to produce the best fuzzy rules and membership functions/parameters with respect to an optimisation criterion. There are three main approaches in the literature for learning the rules in a fuzzy knowledge base: the Pittsburgh approach, the Michigan approach and the iterative rule-learning approach [Cordon, O., Herrera, F., Hoffmann, F. and Magdalena, L. (2001)]. The Pittsburgh and Michigan approaches are the most commonly used methods in the area.

Research by the authors, colleagues and postgraduate students has predominantly used the Pittsburgh approach, with success in learning the fuzzy rules in complex systems, across hierarchical and multi-layered structures, in problems [Stonier, R. J. and Zajaczkowski, J. (2003); Kingham, M., Mohammadian, M. and Stonier, R. J. (1998); Mohammadian, M. and Kingham, M. (2004); Mohammadian, M. (2002); Nainar, I., Mohammadian, M., Stonier, R. J. and Millar, J. (1996); Mohammadian, M. and Stonier, R. J. (1998); Stonier, R. J. and Mohammadian, M. (1995); Thomas, P. J. and Stonier, R. J. (2003); Thomas, P. J. and Stonier, R. J. (2003a)].

FUTURE TRENDS

In using the Pittsburgh approach, the coding of the fuzzy rule base as a linear string in an evolutionary algorithm has drawbacks beyond the fact that the string may be relatively large in length under decomposition into multi-layer and hierarchical structures. One is that this is a specific linear encoding of a nonlinear structure, and typical one-point crossover, when implemented, introduces bias when reversing the coding to obtain the fuzzy logic rule base. Using co-evolutionary algorithms is another option that needs further investigation.
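The rule counts quoted in this chapter can be verified with simple arithmetic. The sketch below (illustrative only; the split of fuzzy sets across the seven robot-system inputs is an assumption chosen to reproduce the quoted figure, not something stated in the chapter) checks the two-input encoding sizes and the flat rule-base count:

```python
# Two-input example: u1 has 5 fuzzy sets, u2 and y have 7 each.
n_rules = 5 * 7                    # 35 cells in the 5x7 rule table
full_bits = (5 + 7 + 7) * n_rules  # encoding whole rules: 19 x 35 = 665 bits
out_bits = 7 * n_rules             # encoding outputs only: 7 x 35 = 245 bits
assert (n_rules, full_bits, out_bits) == (35, 665, 245)

# Robot example: a flat rule base over all seven inputs grows multiplicatively.
# One assignment consistent with the quoted figure (hypothetical: five inputs
# with 5 fuzzy sets each, two with 7) gives:
flat_rules = 5**5 * 7**2
assert flat_rules == 153125   # versus 1,645 rules in the hierarchical system
```

The multiplicative growth of the flat rule base is exactly the "rule explosion" that hierarchical decomposition avoids.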
into hierarchical/multi-layered fuzzy logic sub-systems greatly reduces the number of fuzzy rules to be defined and learnt, other issues arise, such as that the decomposition is not unique and that it may give rise to variables with no physical significance. This can then raise major difficulties in obtaining a complete class of rules from experts, even when the number of variables is small.

ACKNOWLEDGMENT

The authors wish to thank those colleagues and students who have helped in this research and the associated publications.

REFERENCES

Bai, Y., Zhuang, H. and Wang, D. (2006), Advanced Fuzzy Logic Technologies in Industrial Applications, Springer Verlag, USA, ISBN 1-84628-468-6.

Cordon, O., Herrera, F. and Zwir, I. (2002), Linguistic Modeling of Hierarchical Systems of Linguistic Rules, IEEE Transactions on Fuzzy Systems, Vol. 10, No. 1, 2-20, USA.

Cordon, O., Herrera, F., Hoffmann, F. and Magdalena, L. (2001), Genetic Fuzzy Systems: Evolutionary Tuning and Learning of Fuzzy Knowledge Bases (Advances in Fuzzy Systems - Applications and Theory, Vol. 19), World Scientific Publishing, USA, ISBN 981-02-4017-1.

Goldberg, D. (1989), Genetic Algorithms in Search, Optimisation and Machine Learning, Addison-Wesley, USA.

Kingham, M., Mohammadian, M. and Stonier, R. J. (1998), Prediction of Interest Rate using Neural Networks and Fuzzy Logic, Proceedings of the ISCA 7th International Conference on Intelligent Systems, Melun, Paris, France.

Magdalena, L. (1998), Hierarchical Fuzzy Control of a Complex System using Metaknowledge, Proceedings of the 7th International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, Paris, France.

Mohammadian, M. (2002), Designing customised hierarchical fuzzy systems for modelling and prediction, Proceedings of the International Conference on Simulated Evolution and Learning (SEAL'02), Singapore, ISBN 9810475233.

Mohammadian, M. and Kingham, M. (2004), An adaptive hierarchical fuzzy logic system for modelling of financial systems, Journal of Intelligent Systems in Accounting, Finance and Management, Wiley Interscience, Vol. 12, 61-82.

Mohammadian, M. and Kingham, M. (2005), Intelligent Data Analysis, Decision Making and Modelling Adaptive Financial Systems Using Hierarchical Neural Networks, Knowledge-Based Intelligent Information and Engineering Systems, KES2005, Springer Verlag, Australia, ISBN 3540288953.

Mohammadian, M. and Stonier, R. J. (1995), Adaptive Two Layer Fuzzy Logic Control of a Mobile Robot System, Proceedings of the IEEE Conference on Evolutionary Computing, Perth, Australia.

Mohammadian, M. and Stonier, R. J. (1996), Fuzzy Rule Generation by Genetic Learning for Target Tracking, Proceedings of the 5th International Intelligent Systems Conference, Reno, Nevada.

Mohammadian, M. and Stonier, R. J. (1998), Hierarchical Fuzzy Control, Proceedings of the 7th International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, Paris, France.

Mohammadian, M., Nainar, I. and Kingham, M. (1997), Supervised and Unsupervised Concept Learning by Genetic Algorithms, Second International ICSC Symposium on Fuzzy Logic and Applications (ISFL'97), Zurich, Switzerland.

Mohammadian, M. and Stonier, R. J. (1998), Hierarchical Fuzzy Logic Control, The Seventh Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'98), Paris, France.

Nainar, I., Mohammadian, M., Stonier, R. J. and Millar, J. (1996), An adaptive fuzzy logic controller for control of traffic signals, Proceedings of the 4th International Conference on Control, Automation, Robotics and Computer Vision (ICARCV'96), Singapore.
1. Fitness - A positive measure of utility, called fitness, is determined for individuals in a population. This fitness value is a quantitative measure of how well a given individual compares to others in the population.
2. Selection - Population individuals are assigned a number of copies in a mating pool that is used to construct a new population. The higher a population individual's fitness, the more copies in the mating pool it receives.
3. Recombination - Individuals from the mating pool are recombined to form new individuals, called children. A common recombination method is one-point crossover.
4. Mutation - Each individual is mutated with some small probability << 1.0. Mutation is a mechanism for maintaining diversity in the population.

Hierarchical Fuzzy Logic Systems: The idea of hierarchical fuzzy logic control systems is to put the input variables into a collection of low-dimensional fuzzy logic control systems, instead of creating a single high-dimensional rule base for a fuzzy logic control system. Each low-dimensional fuzzy logic control system constitutes a level in the hierarchical fuzzy logic control system. Hierarchical fuzzy logic control is one approach to avoid the rule explosion problem. It has the property that the number of rules needed to construct the fuzzy system increases only linearly with the number of variables in the system.

Unsupervised Learning: In unsupervised learning there is no external teacher or critic to oversee the learning process. In other words, there are no specific examples of the function to be learned by the system. Rather, provision is made for a task-independent measure of the quality of the representation that the system is required to learn. That is, the system learns statistical regularities of the input data and develops the ability to learn the features of the input data, thereby creating new classes automatically.
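The four components above can be tied together in one compact loop. The following sketch (hypothetical Python, not from the chapter) reuses the integer-coded rule-base representation discussed earlier, with one output fuzzy-set index (1-7) per cell of a 35-cell rule table; the fitness function is a stand-in for the FLC evaluator and, for simplicity, is maximised rather than minimised:

```python
import random

N_RULES, N_OUT = 35, 7           # 5x7 rule table; 7 output fuzzy sets
POP, P_CROSS, P_MUT = 20, 0.8, 0.1

def fitness(ind):
    # 1. Fitness - stand-in for the FLC evaluator: agreement with an
    # arbitrary reference rule base (assumption for illustration only).
    ref = [(i % N_OUT) + 1 for i in range(N_RULES)]
    return sum(a == b for a, b in zip(ind, ref))

def mating_pool(pop):
    # 2. Selection - copies drawn in proportion to fitness (roulette wheel);
    # the +1 keeps every weight positive.
    return random.choices(pop, weights=[fitness(i) + 1 for i in pop], k=len(pop))

def crossover(p1, p2):
    # 3. Recombination - one-point crossover producing two children.
    if random.random() < P_CROSS:
        cut = random.randint(1, N_RULES - 1)
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1[:], p2[:]

def mutate(ind):
    # 4. Mutation - with small probability, resample one random allele
    # uniformly from 1..7 (the modified operator described in this chapter).
    if random.random() < P_MUT:
        ind = ind[:]
        ind[random.randrange(N_RULES)] = random.randint(1, N_OUT)
    return ind

random.seed(1)
pop = [[random.randint(1, N_OUT) for _ in range(N_RULES)] for _ in range(POP)]
for _ in range(40):              # or stop once a fitness threshold is reached
    pool = mating_pool(pop)
    pop = []
    for i in range(0, POP, 2):
        c1, c2 = crossover(pool[i], pool[i + 1])
        pop += [mutate(c1), mutate(c2)]
best = max(pop, key=fitness)
```

Switching to the minimisation used in the chapter only requires inverting the selection weights and taking `min` instead of `max` for the best individual.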
Developmental Robotics
Max Lungarella
University of Zurich, Switzerland
Gabriel Gómez
University of Zurich, Switzerland
short (ontogenetic) time scales and single individuals (or small groups of individuals).

AREAS OF INTEREST

The spectrum of developmental robotics research can be roughly segmented into four primary areas of interest. Although instances may exist that fall into multiple categories, the suggested grouping should provide at least some order in the large spectrum of issues addressed by developmental roboticists.

Socially oriented interaction: This category includes research on robots that communicate or learn particular skills via social interaction with humans or other robots. Examples are imitation learning, communication and language acquisition, attention sharing, turn-taking behavior, and social regulation (Dautenhahn, 2007; Steels, 2006).

Non-social interaction: Studies on robots characterized by a direct and strong coupling between sensorimotor processes and the local environment (e.g. inanimate objects), but which do not interact with other robots or humans. Examples are visually-guided grasping and manipulation, tool-use, perceptual categorization, and navigation (Fitzpatrick et al., 2007; Nabeshima et al., 2006).

Agent-centered sensorimotor control: In these studies, robots are used to investigate the exploration of bodily capabilities, the effect of morphological changes on motor skill acquisition, as well as self-supervised learning schemes not linked to any functional goal. Examples include self-exploration, categorization of motor patterns, motor babbling, and learning to walk or crawl (Demiris & Meltzoff, 2007; Lungarella, 2004).

Mechanisms and principles: This category embraces research on principles, mechanisms or processes thought to increase the adaptivity of a behaving system. Examples are: developmental and neural plasticity, mirror neurons, motivation, freezing and freeing of degrees of freedom, and synergies; characterization of complexity and emergence; study of the effects of adaptation and growth; and practical work on body construction or development (Arbib et al., 2007; Oudeyer et al., 2007; Lungarella & Sporns, 2006).
complex brain-body system, it might be a good strategy to start simple and gradually build on top of acquired abilities. The well-timed and gradual co-development of body morphology and neural system provides an incremental approach to dealing with a complex and unpredictable world. Early morphological constraints and cognitive limitations can lead to more adaptive systems, as they allow exploiting the role that experience plays in shaping the cognitive architecture. If an organism were to begin by using its full complexity, it would never be able to learn anything (Gómez et al., 2004). It follows that designers should not try to code a full-fledged ready-to-be-used intelligence module directly into an artificial system. Instead, the system should be able to discover on its own the most effective ways of assembling low-level components into novel solutions [Lungarella (2004; starting simple); Pfeifer & Bongard (2007; incremental process principle)].

The Principle of Interactive Emergence

Observations: Development is not determined by innate mechanisms alone (in other words: not everything should be pre-programmed). Cognitive structure, for instance, is largely dependent on the interaction history of the developing system with the environment in which it is embedded (Hendriks-Jansen, 1996).

Lessons for robotics: In traditional engineering the designer of the system imposes (hard-wires) the structure of the controller and the controlled system. Designers of adaptive robots, however, should avoid implementing the robot's control structure according to their understanding of the robot's physics, but should endow the robot with means to acquire its own understanding through self-exploration and interaction with the environment. Systems designed for emergence tend to be more adaptive with respect to uncertainties and perturbations. The ability to maintain performance in the face of changes (such as growth or task modifications) is a long-recognized property of living systems. Such robustness is achieved through a host of mechanisms: feedback, modularity, redundancy, structural stability, and plasticity [Dautenhahn (2007; interactive emergence); Hendriks-Jansen (1996; interactive emergence); Prince et al. (2005; ongoing emergence)].

The Principle of Cognitive Scaffolding

Observations: Development takes place among conspecifics with similar internal systems and similar external bodies (Smith & Breazeal, 2007). Human infants, for instance, are endowed from an early age with the means to engage in simple, but nevertheless crucial, social interactions; e.g. they show preferences for human faces, smell, and speech, and they imitate protruding tongues, smiles, and other facial expressions (Demiris & Meltzoff, 2007).

Lessons for robotics: Social interaction bears many potential advantages for developmental robots: (a) it increases the system's behavioral diversity through mimicry and imitation (Demiris & Meltzoff, 2007); (b) it supports the emergence of language and communication, and symbol grounding (Steels, 2006); and (c) it helps structure the robot's environment by simplifying and speeding up the learning of tasks and the acquisition of skills. Scaffolding is often employed by parents and caretakers (intentionally or not) to support, shape, and guide the development of infants. Similarly, the social world of the robot should be prepared to teach the robot progressively novel and more complex tasks without overwhelming its artificial cognitive structure [Lungarella (2004; social interaction principle); Mareschal et al. (2007; ensocialment); Smith & Breazeal (2007; coupling to intelligent others)].

FUTURE TRENDS

The further success of developmental robotics will depend on the extent to which theorists and experimentalists will be able to identify universal principles spanning the multiple levels at which developmental systems operate. Here, we briefly indicate some hot issues that need to be tackled en route to a theory of developmental systems.

Semiotics: It is necessary to address the issue of how developmental robots (and embodied agents in general) can attribute meaning to symbols and construct semiotic systems. A promising approach, explored under the label of semiotic dynamics, is that such semiotic systems and the associated information structure are continuously invented and negotiated by groups of people or agents, and are used for communication and information organization (Steels, 2006).
467
Developmental Robotics
Clark, A. (2001). Mindware: An Introduction to the Philosophy of Cognitive Science. Oxford University Press: Oxford.

Dautenhahn, K. (2007). Socially intelligent robots: dimensions of human-robot interaction. Phil. Trans. Roy. Soc. B, 362:679-704.

Demiris, Y. & Meltzoff, A. (2007). The robot in the crib: a developmental analysis of imitation skills in infants and robots. To appear in: Infant and Child Development.

Fitzpatrick, P., Needham, A., Natale, L., & Metta, G. (2007). Shared challenges in object perception for robots and infants. To appear in: Infant and Child Development.

Gómez, G., Lungarella, M., Eggenberger, P., Matsushita, K., & Pfeifer, R. (2004). Simulating development in a real robot: on the concurrent increase of sensory, motor, and neural complexity. Proc. 4th Int. Workshop on Epigenetic Robotics, pp. 119-122.

Gómez, G. & Eggenberger, P. (2007). Evolutionary synthesis of grasping through self-exploratory movements of a robotic hand. Proc. IEEE Congress on Evolutionary Computation (accepted for publication).

Kumar, S. & Bentley, P. (2003). On Growth, Form and Computers. Elsevier Academic Press: San Diego, CA.

Hendriks-Jansen, H. (1996). Catching Ourselves in the Act. MIT Press: Cambridge, MA.

Lewis, M.D. & Granic, I. (eds.) Emotion, Development, and Self-Organization: Dynamic Systems Approaches to Emotional Development. Cambridge University Press: New York.

Lungarella, M. (2004). Exploring Principles Towards a Developmental Theory of Embodied Artificial Intelligence. Unpublished PhD Thesis. University of Zurich, Switzerland.

Lungarella, M., Metta, G., Pfeifer, R., & Sandini, G. (2003). Developmental robotics: a survey. Connection Science, 15:151-190.

Lungarella, M. & Sporns, O. (2006). Mapping information flows in sensorimotor networks. PLoS Computational Biology, 2(10):e144.

Mareschal, D., Johnson, M.H., Sirois, S., Spratling, M.W., Thomas, M.S.C., & Westermann, G. (2007). Neuroconstructivism: How the Brain Constructs Cognition, Vol. 1. Oxford University Press: Oxford, UK.

Nabeshima, C., Lungarella, M., & Kuniyoshi, Y. (2006). Adaptive body schema for robotic tool-use. Advanced Robotics, 20(10):1105-1126.

Oudeyer, P.-Y., Kaplan, F., & Hafner, V.V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Trans. Evolutionary Computation, 11(2):265-286.

Pfeifer, R. & Bongard, J.C. (2007). How the Body Shapes the Way we Think. MIT Press: Cambridge, MA.

Prince, C.G., Helder, N.A., & Hollich, G.J. (2005). Ongoing emergence: A core concept in epigenetic robotics. Proc. 5th Int. Workshop on Epigenetic Robotics.

Smith, L.B. & Gasser, M. (2005). The development of embodied cognition: Six lessons from babies. Artificial Life, 11:13-30.

Smith, L.B. & Breazeal, C. (2007). The dynamic lift of developmental process. Developmental Science, 10(1):61-68.

Spelke, E. (2000). Core knowledge. American Psychologist, 55:1233-1243.

Sporns, O. (2007). What neuro-robotic models can teach us about neural and cognitive development. In: D. Mareschal et al. (eds.), Neuroconstructivism: Perspectives and Prospects, Vol. 2, pp. 179-204.

Steels, L. (2006). Semiotic dynamics for embodied agents. IEEE Intelligent Systems, 21(3):32-38.

Turing, A.M. (1950). Computing machinery and intelligence. Mind, LIX(236):433-460.

Varela, F.J., Thompson, E., & Rosch, E. (1991). The Embodied Mind. MIT Press: Cambridge, MA.

Weng, J.J., McClelland, J., Pentland, A., Sporns, O., Stockman, I., Sur, M., & Thelen, E. (2001). Autonomous mental development by robots and animals. Science, 291:599-600.

Zlatev, J. & Balkenius, C. (2001). Introduction: why epigenetic robotics? In: C. Balkenius et al. (eds.), Proc. 1st Int. Workshop on Epigenetic Robotics, pp. 1-4.
Developmental Robotics
Walid Ibrahim
United Arab Emirates University, UAE
Sanja Lazarova-Molnar
United Arab Emirates University, UAE
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Device-Level Majority von Neumann Multiplexing
(faults). The origins of these errors can be found in the manufacturing process, the physical changes appearing during operation, as well as sensitivity to internal and external noises and variations. It is not clear if emerging nanotechnologies will not require new fault models, or if multiple errors might have to be dealt with. Kuo (2006) even mentioned that "we are unsure as to whether much of the knowledge that is based on past technologies is still valid for reliability analysis." The well-known approach for fighting against errors is to incorporate redundancy: either static (in space, time, or information) or dynamic (requiring fault detection, location, containment, and recovery). Space (hardware) redundancy relies on voters (generic, inexact, mid-value, median, weighted average, analog, hybrid, etc.) and includes: modular redundancy, cascaded modular redundancy, and multiplexing, like von Neumann multiplexing vN-MUX (von Neumann, 1952), enhanced vN-MUX (Roy & Beiu, 2004), and parallel restitution (Sadek et al., 2004). Time redundancy is trading space for time, while information redundancy is based on error detection and error correction codes.

This chapter explores the performance of vN-MUX when using majority gates of fan-in Δ (MAJ-Δ). The aim is to get a clear understanding of the trade-offs between the reliability enhancements obtained when using MAJ-Δ vN-MUX at the smallest redundancy factors RF = 2Δ (see Fig. 1) on one side, versus both the fan-ins and the unreliable nanodevices on the other side. We shall start by reviewing some theoretical and simulation results for vN-MUX in the Background section. Exact gate-level simulations (based on an exhaustive counting algorithm) and accurate device-level estimates, including details of the effects played by nanodevices on MAJ-Δ vN-MUX, are introduced in the Main Focus of the Chapter section. Finally, implications and future trends are discussed in Future Trends, and conclusions and further directions of research end this chapter.

BACKGROUND

Multiplexing was introduced by von Neumann as a scheme for reliable computations (von Neumann, 1952). vN-MUX is based on successive computing stages alternating with random interconnection stages. Each computing stage contains a set of redundant gates. Although vN-MUX was originally exemplified for NAND-2, it can be implemented using any type of gate, and could be applied to any level of abstraction (subcircuits, gates, or devices). The multiplexing of each computation tries to reduce the likelihood of errors propagating further, by selecting the more-likely result(s) at each stage. Redundancy is quantified by a redundancy factor RF, which indicates the multiplicative increase in the number of gates (subcircuits, or devices). In his original study, von Neumann (1952) assumed independent (un-correlated) gate failures pf_GATE and very large RF. The performance of NAND-2 vN-MUX was compared with other fault-tolerant techniques in (Forshaw et al., 2001), and it was analyzed at lower
Figure 1. Minimum redundancy MAJ-Δ vN-MUX: (a) MAJ-3 (RF = 6); and (b) MAJ-5 (RF = 10)
RF (30 to 3,000) in (Han & Jonker, 2002), while the first exact analysis at very low RF (3 to 100) for MAJ-3 vN-MUX was done in (Roy & Beiu, 2004).

The issue of which gate one should use is debatable (Ibrahim & Beiu, 2007). It was proven that using MAJ-3 could lead to improved vN-MUX computations only for pf_MAJ-3 < 0.0197 (von Neumann, 1952), (Roy & Beiu, 2004). This outperforms the NAND-2 error threshold pf_NAND-2 < 0.0107 (von Neumann, 1952), (Sadek et al., 2004). Several other studies have shown that the error thresholds of MAJ are higher than those of NAND when used in vN-MUX. Evans (1994) proved that:

pf_MAJ-Δ < 1/2 − 2^(Δ−2) / [Δ · C(Δ−1, (Δ−1)/2)], (1)

while the error threshold for NAND-Δ was determined in (Gao et al., 2005) by solving:

(1 + 1/Δ) · [1 / (Δ(1 − 2·pf_NAND))]^(1/(Δ−1)) = 1 − pf_NAND. (2)

An approach for getting a better understanding of vN-MUX at very small RF is to use Monte Carlo simulations (Beiu, 2005), (Beiu & Sulieman, 2006), (Beiu et al., 2006). These have revealed that the reliability of NAND-2 vN-MUX is in fact better than that of MAJ-3 vN-MUX (at RF = 6) for small geometrical variations σ. As opposed to the theoretical results, where the reliability of MAJ-3 vN-MUX is always better than that of NAND-2 vN-MUX, the Monte Carlo simulations showed that MAJ-3 vN-MUX is better than NAND-2 vN-MUX only for σ > 3.4%. Such results were neither predicted (by theory) nor suggested by (gate-level) simulations.

It is to be highlighted here that all the theoretical publications discuss unreliable organs, gates, nodes, circuits, or formulas, but very few mention devices. For getting a clear picture we have started by developing an exhaustive counting algorithm which exactly calculates the reliability of MAJ-Δ vN-MUX (Beiu et al., 2007). The probability of failure of MAJ-Δ vN-MUX is:

Pf_MAJ-Δ vN-MUX = Σ_{k=0}^{2Δ} Σ_{i=1}^{C(2Δ,k)} (#vN-faults_{i,k}) · pf_MAJ-Δ^k · (1 − pf_MAJ-Δ)^(2Δ−k). (3)

The results based on exhaustive counting when varying pf_MAJ-Δ have confirmed both the theoretical and the simulation ones. MAJ-Δ vN-MUX at the minimum redundancy factor RF = 2Δ improves the reliability over MAJ-Δ when pf_MAJ-Δ ≤ 10%, and increasing Δ increases the reliability. When pf_MAJ-Δ ≥ 10%, using vN-MUX increases the reliability over that of MAJ-Δ only as long as pf_MAJ-Δ is lower than a certain error threshold. If pf_MAJ-Δ is above the error threshold, the use of vN-MUX is detrimental, as the reliability of the system is lower than that of MAJ-Δ. Still, these do not explain the Monte Carlo simulation results mentioned earlier.

MAIN FOCUS OF THE CHAPTER

Both the original vN-MUX study and the subsequent theoretical ones have considered unreliable gates. They did not consider the elementary devices, and assumed that the gates have a fixed (bounding) pf_GATE. This assumption ignores the fact that different gates are built using different (numbers of) devices, logic styles, or (novel) technological principles. While a standard CMOS inverter has 2 transistors, NAND-2 and MAJ-3 have 4 and respectively 10 transistors. Forshaw et al. (2001) suggested that pf_GATE could be estimated as:

pf_GATE = 1 − (1 − ε)^n, (4)

where ε denotes the probability of failure of a nanodevice (e.g., transistor, junction, capacitor, molecule, quantum dot, etc.), and n is the number of nanodevices a gate has.

Using eq. 4 as pf_MAJ-Δ = 1 − (1 − ε)^(2Δ), the reliabilities have been estimated by modifying the exact counting results reported in (Beiu et al., 2007). The device-level estimates can be seen in Fig. 2. They show that increasing Δ will not necessarily increase the reliability of MAJ-Δ vN-MUX (over MAJ-Δ). This is happening because MAJ-Δ with larger Δ require more nanodevices. In particular, while MAJ-11 vN-MUX is the best solution for ε ≤ 1‰ (Fig. 2(a)), it becomes the worst for ε > 2% (Fig. 2(b)). Hence, larger fan-ins are advantageous for lower ε (≤ 1‰), while small fan-ins perform better for larger ε (> 1%). Obviously, there is a swapping region where the ranking is being reversed. We detail Fig. 2(b) for ε > 1% (Fig. 2(c)) and for the swapping region 1‰ < ε < 2% (Fig. 2(d), where marks show the envelope). These results imply that increasing Δ and/or
Fig. 2. Probability of failure of MAJ-Δ vN-MUX plotted versus the probability of failure ε of the elementary (nano)device: (a) small ε (< 1‰); (b) large ε (> 1%); (c) detail for ε between 1% and 10%; (d) detailed view of the swapping region (ε between 1‰ and 2%)

RF does not necessarily improve the overall system's reliability. This is because increasing Δ and/or RF leads to increasing the number of nanodevices:

N_MAJ-Δ vN-MUX = RF_MAJ-Δ vN-MUX × n_MAJ-Δ = 2Δ × 2Δ = 4Δ². (5)

This quadratic dependence on Δ has to be accounted for. Basically, it is ε and the number of devices N, and not (only) RF and pf_GATE, which should be used when trying to accurately predict the advantages of vN-MUX, or of any other redundancy scheme.

The next step we took was to compare the device-level estimates of MAJ-Δ vN-MUX (Fig. 3(a)) with the Monte Carlo simulation ones (Fig. 3(b), adapted from (Beiu, 2005), (Beiu & Sulieman, 2006)). The two plots in Fig. 3 have the same vertical scale and exhibit similar shapes. Still, a direct comparison is not trivial as these are mapped against different variables (ε and respectively σ). This similarity makes us confident that the estimated results are accurate and supports the claim that a simple estimate for pf_GATE leads to good approximations at the system level. For other insights the interested reader should consult (Anghel & Nicolaidis, 2007) and (Martorell et al., 2007).
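The exhaustive counting of eq. (3) and the device-level estimate of eq. (4) can be illustrated on the minimum-redundancy MAJ-3 block of Fig. 1(a). The sketch below is ours, not the authors' code: the two-layer reading of the topology, the all-ones input bundle, and the final majority decision are assumptions made to keep the example self-contained.

```python
from itertools import product

DELTA = 3              # fan-in of the majority gates (MAJ-3)
GATES = 2 * DELTA      # RF = 2*DELTA gates in the minimum-redundancy scheme

def maj(bits):
    """Ideal majority vote over an odd number of bits."""
    return int(sum(bits) > len(bits) // 2)

def system_fails(faulty):
    """One minimum-redundancy MAJ-3 vN-MUX block (a two-layer reading of
    Fig. 1(a), which is our assumption): DELTA restoring gates feed DELTA
    output gates; a faulty gate inverts its output (von Neumann's model)."""
    z = [1 ^ faulty[i] for i in range(DELTA)]                  # inputs all correct (1)
    out = [maj(z) ^ faulty[DELTA + j] for j in range(DELTA)]   # second layer
    return maj(out) != 1                                       # majority decision wrong?

def pf_vn_mux(pf_gate):
    """Exact failure probability in the spirit of eq. (3): enumerate every
    subset of faulty gates and weight it by pf^k * (1 - pf)^(2*DELTA - k)."""
    total = 0.0
    for faulty in product((0, 1), repeat=GATES):
        if system_fails(faulty):
            k = sum(faulty)
            total += pf_gate ** k * (1.0 - pf_gate) ** (GATES - k)
    return total

def pf_gate_from_device(eps, n):
    """Device-level estimate, eq. (4): pf_GATE = 1 - (1 - eps)^n."""
    return 1.0 - (1.0 - eps) ** n
```

Composing the two functions, pf_vn_mux(pf_gate_from_device(eps, 2 * DELTA)) gives device-level curves in the spirit of Fig. 2; the block then uses N = RF × n = 4Δ² = 36 nanodevices, the quadratic growth of eq. (5).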
Figure 3. Probability of failure of MAJ-3 vN-MUX: (a) using device-level estimates and exact counting results; (b) C-SET Monte Carlo simulations for MAJ-3 vN-MUX
Figure 4. Probability of failure of MAJ-Δ and MAJ-Δ vN-MUX plotted versus the device probability of failure ε: (a) Δ = 3; (b) Δ = 5; (c) Δ = 7; (d) Δ = 9. For uniformity, the same interval ε ∈ [0, 0.11] was used for all the plots

of unreliable nanodevices (implicitly accounted for by Monte Carlo simulations, but neglected by theoretical approaches and gate-level simulations).

• Extending the exact gate-level simulations to device-level estimates using pf_GATE = 1 − (1 − ε)^n leads to quite accurate approximations (as compared to Monte Carlo simulations).

• Device-level estimates show much more complex behaviors than those revealed by gate-level analyses (nonlinear for large ε, leading to multiple and intricate crossings).

• Device-level estimates suggest that reliability optimizations for large ε will be more difficult than what was expected from gate-level analyses.

• One way to maximize reliability when ε is large and unknown (e.g., time varying (Srinivasan
Figure 5. Error thresholds for MAJ-Δ vN-MUX versus fan-in Δ: (a) gate-level error threshold, both theoretical (red) and exact (yellow) as obtained through exhaustive counting; (b) device-level error threshold
et al., 2005)) is to rely on adaptive gates, so neural-inspiration should play a(n important) role in future nano-IC designs (Beiu & Ibrahim, 2007).

Finally, precision is very important as small errors ... have a huge impact in estimating the required level of redundancy for achieving a specified/target reliability (Roelke et al., 2007). It seems that the current gate-level models tend to underestimate reliability, while we do not do a good job at the device level, with Monte Carlo simulation the only widely used method. More precise estimates than the ones presented in this chapter are possible using Monte Carlo simulations in combination with gate-level reliability algorithms. These are clearly needed and have just started to be investigated (Lazarova-Molnar et al., 2007).

REFERENCES

Anghel, L., & Nicolaidis, M. (2007). Defects Tolerant Logic Gates for Unreliable Future Nanotechnologies. Proceedings IWANN'07, San Sebastián, Spain, Springer-Verlag, LNCS 4507, 422-429.

Beiu, V. (2003). Neural Inspired Architectures for Nanoelectronics: Highly Reliable, Ultra Low-power, Reconfigurable, Asynchronous. Neural Information Processing Systems NIPS'03, Whistler, Canada, www.eecs.wsu.edu/~vbeiu/workshop_nips03/.

Beiu, V. (2005). The Quest for Practical Redundant Computations. NSF Workshop on Architectures for Silicon Nanoelectronics & Beyond, Portland State University, Portland, OR, USA, web.cecs.pdx.edu/~strom/beiu.pdf. [Beiu, V. (2005). The Quest for Reliable Nano Computations. Proceedings ICM'05, Islamabad, Pakistan, xix.]

Beiu, V., & Makaruk, H. E. (1998). Deeper Sparsely Nets Can Be Optimal. Neural Processing Letters, (8) 3, 201-210.

Beiu, V., & Sulieman, M. H. (2006). On Practical Multiplexing Issues. Proceedings IEEE-NANO'06, Cincinnati, USA, 310-313.

Beiu, V., & Ibrahim, W. (2007). On Computing Nano-architectures Using Unreliable Nano-devices. In Lyshevski, S.E. (Editor), Nano and Molecular Electronics Handbook, Taylor & Francis, Chapter 12, 1-49.
Beiu, V., & Rückert, U. (Editors) (2009). Emerging Brain Inspired Nano Architectures. World Scientific Press.

Beiu, V., Ibrahim, W., & Lazarova-Molnar, S. (2007). What von Neumann Did Not Say About Multiplexing. Proceedings IWANN'07, San Sebastián, Spain, Springer-Verlag, LNCS 4507, 487-496.

Beiu, V., Rückert, U., Roy, S., & Nyathi, J. (2004). On Nanoelectronic Architectural Challenges and Solutions. Proceedings IEEE-NANO'04, Munich, Germany, 628-631.

Beiu, V., Ibrahim, W., Alkhawwar, Y. A., & Sulieman, M. H. (2006). Gate Failures Effectively Shape Multiplexing. Proceedings DFT'06, Washington, USA, 29-35.

Constantinescu, C. (2003). Trends and Challenges in VLSI Circuit Reliability. IEEE Micro, (23) 4, 14-19.

Evans, W. S. (1994). Information Theory and Noisy Computation. PhD dissertation. Tech. Rep. TR-94-057, International Computer Science Institute (ICSI), Berkeley, CA, USA, ftp://ftp.icsi.berkeley.edu/pub/techreports/1994/tr-94-057.pdf. [Evans, W. S., & Schulman, L. J. (2003). On the Maximum Tolerable Noise of k-input Gates for Reliable Computations by Formulas. IEEE Transactions on Information Theory, (49) 11, 3094-3098.]

Feldkamp, U., & Niemeyer, C. M. (2006). Rational Design of DNA Nanoarchitectures. Angewandte Chemie International Edition, (45) 12, 1856-1876.

Forshaw, M., Nikolić, K., & Sadek, A. S. (2001). ANSWERS: Autonomous Nanoelectronic Systems With Extended Replication and Signaling. MEL-ARI #28667 Report, ipga.phys.ucl.ac.uk/research/answers/reports/3rd_year_UCL.pdf.

Gao, J. B., Qi, Y., & Fortes, J.A.B. (2005). Bifurcations and Fundamental Error Bounds for Fault-Tolerant Computations. IEEE Transactions on Nanotechnology, (4) 4, 395-402.

Green, J. E., Choi, J. W., Boukai, A., Bunimovich, Y., Johnston-Halperin, E., DeIonno, E., Luo, Y., Sheriff, B. A., Xu, K., Shin, Y. S., Tseng, H.-R., Stoddart, J. F., & Heath, J. R. (2007). A 160-Kilobit Molecular Electronic Memory Patterned at 10^11 Bits per Square Centimeter. Nature, (445) 7126, 414-417.

Han, J., & Jonker, P. (2002). A System Architecture Solution for Unreliable Nanoelectronic Devices. IEEE Transactions on Nanotechnology, (1) 4, 201-208.

Hutchby, J. A., Bourianoff, G. I., Zhirnov, V. V., & Brewer, J.E. (2002). Extending the Road beyond CMOS. IEEE Circuits & Devices Magazine, (18) 2, 28-41.

Ibrahim, W., & Beiu, V. (2007). Long Live Small Fan-in Majority Gates: Their Reign Looks Like Coming! Proceedings ASAP'07, Montréal, Canada (pp. 278-283).

Kuo, W. (2006). Challenges Related to Reliability in Nano Electronics. IEEE Transactions on Reliability, (55) 4, 569-570 [corrections in IEEE Transactions on Reliability, (56) 1, 169].

Lazarova-Molnar, S., Beiu, V., & Ibrahim, W. (2007). Reliability: The Fourth Optimization Pillar of Nanoelectronics. Proc. ICSPC'07, Dubai, UAE (pp. 73-76).

Likharev, K. K. (1999). Single-Electron Devices and Their Applications. Proceedings of the IEEE, (87) 4, 606-632.

Lin, C., Liu, Y., Rinker, S., & Yan, H. (2006). DNA Tile Based Self-Assembly: Building Complex Nanoarchitectures. ChemPhysChem, (7) 8, 1641-1647.

Martorell, F., Cotofana, S.D., & Rubio, A. (2007). Fault Tolerant Structures for Nanoscale Gates. Proc. IEEE-NANO'07, Hong Kong, 605-610.

Massoud, Y., & Nieuwoudt, A. (2006). Modeling and Design Challenges and Solutions for Carbon Nanotube-Based Interconnect in Future High Performance Integrated Circuits. ACM Journal on Emerging Technologies in Computing Systems, (2) 3, 155-196.

Nikolić, K., Sadek, A. S., & Forshaw, M. (2001). Architectures for Reliable Computing with Unreliable Nanodevices. Proceedings IEEE-NANO'01, Maui, USA, 254-259. [Forshaw, M., Crawley, D., Jonker, P., Han, J., & Sotomayor Torres, C. (2004). A Review of the Status of Research and Training into Architectures for Nanoelectronic and Nanophotonic Systems in the European Research Area. FP6/2002/IST/1, #507519 Report, www.ph.tn.tudelft.nl/People/albert/papers/NanoArchRev_finalV2.pdf.]

Roelke, G., Baldwin, R., & Bulutoglu, D. (2007). Analytical Models for the Performance of von Neumann Multiplexing. IEEE Transactions on Nanotechnology, (6) 1, 75-89.
Roy, S., & Beiu, V. (2004). Multiplexing Schemes for Cost-effective Fault-tolerance. Proceedings IEEE-NANO'04, Munich, Germany, 589-592. [Roy, S., & Beiu, V. (2005). Majority Multiplexing: Economical Redundant Fault-tolerant Designs for Nanoarchitectures. IEEE Transactions on Nanotechnology, (4) 4, 441-451.]

Sadek, A. S., Nikolić, K., & Forshaw, M. (2004). Parallel Information and Computation with Restitution for Noise-tolerant Nanoscale Logic Networks. Nanotechnology, (15) 1, 192-210.

SIA, Semiconductor Industry Association (2005). International Technology Roadmap for Semiconductors. SEMATECH, USA, public.itrs.net.

Srinivasan, J., Adve, S. V., Bose, P., & Rivers, J. A. (2005). Lifetime Reliability: Toward an Architectural Solution. IEEE Micro, (25) 3, 70-80.

von Neumann, J. (1952). Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components. Lecture Notes, Caltech, Pasadena, CA, USA. [In Shannon, C. E., & McCarthy, J. (Editors) (1956). Automata Studies. Princeton University Press, 43-98.]

Waser, R. (Editor) (2005). Nanoelectronics and Information Technology (2nd edition). Wiley-VCH.

KEY TERMS

Circuit: Network of devices.

Counting (Exhaustive): The mathematical action of repeated addition (exhaustive considers all possible combinations).

Device: Any physical entity deliberately affecting the information carrying particles (or their associated fields) in a desired manner, consistent with the intended function of the circuit.

Error Threshold: The probability of failure of a component (gate, device) above which the multiplexed scheme is not able to improve over the component itself.

Fan-In: Number of inputs (to a gate).

Fault-Tolerant: The ability of a system (circuit) to continue to operate rather than failing completely (possibly at a reduced performance level) in the event of the failure of some of its components.

Gate (Logic): Functional building block (in digital logic a gate performs a logical operation on its logic inputs).

Majority (Gate): A logic gate of odd fan-in which outputs a logic value equal to that of the majority of its inputs.

Monte Carlo: A class of stochastic computational algorithms (using pseudorandom numbers) for simulating the behaviour of physical and mathematical systems.

Multiplexing (von Neumann): A scheme for reliable computations based on successive computing stages alternating with random interconnection stages (introduced by von Neumann in 1952).

Redundancy (Factor): Multiplicative increase in the number of (identical) components (subsystems, blocks, gates, devices), which can (automatically) replace (or augment) failing component(s).

Reliability: The ability of a circuit (system, gate, device) to perform and maintain its function(s) under given (as well as hostile or unexpected) conditions, for a certain period of time.
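The gate-level error thresholds of eqs. (1) and (2), the theoretical curves behind Fig. 5(a), can be evaluated numerically. The sketch below is ours; note that these are the thresholds of Evans and of Gao et al. for noisy gates, not the vN-MUX thresholds 0.0197 and 0.0107 quoted earlier in the chapter.

```python
from math import comb

def maj_threshold(delta):
    """Gate-level error threshold of MAJ-delta per eq. (1) (Evans, 1994):
    1/2 - 2^(delta-2) / (delta * C(delta-1, (delta-1)/2))."""
    return 0.5 - 2 ** (delta - 2) / (delta * comb(delta - 1, (delta - 1) // 2))

def nand_threshold(delta):
    """NAND-delta error threshold per eq. (2) (Gao et al., 2005), found by
    bisection; the left-hand side minus the right-hand side of eq. (2) is
    monotone increasing in pf on [0, 0.5)."""
    def g(pf):
        lhs = (1 + 1 / delta) * (1 / (delta * (1 - 2 * pf))) ** (1 / (delta - 1))
        return lhs - (1 - pf)
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

For example, maj_threshold(3) returns 1/6 ≈ 16.7% while nand_threshold(2) ≈ 8.86%, consistent with the claim that majority gates have higher error thresholds than NAND gates.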
Mª Carmen Garrido
Universidad de Murcia, Spain
Enrique Muñoz
Universidad de Murcia, Spain
Carlos Cruz-Corona
Universidad de Granada, Spain
David A. Pelta
Universidad de Granada, Spain
José L. Verdegay
Universidad de Granada, Spain
Different Approaches for Cooperation with Metaheuristics
There are different ways of obtaining this cooperation. Ant colony systems (Dorigo & Stützle, 2003) and swarm based methods (Eberhart & Kennedy, 2001) appeared as some of the first cooperative mechanisms inspired by nature. Nevertheless, the cooperation principle they have presented to date is too rigid for a general purpose model (Crainic & Toulouse, 2003). Another way is parallel metaheuristics, where very interesting perspectives appear. This is the line we will follow.

There have been huge efforts to parallelize different metaheuristics. Thus we may find synchronous implementations of these methods where the information is shared at regular intervals: (Crainic, Toulouse & Gendreau, 1997) using Tabu Search and (Lee & Lee, 1992) using Simulated Annealing. More recently there have been multi-thread asynchronous cooperative implementations (Crainic, Gendreau, Hansen & Mladenovic, 2004) and multilevel cooperative searches (Baños, Gil, Ortega & Montoya, 2004) which, according to the reports in (Crainic & Toulouse, 2003), provide better results than the synchronous implementations.

However, it seems that a cooperative strategy based on a single metaheuristic does not cover all the possibilities, and the use of strategies which combine different metaheuristics is recommended. The paper (Le Bouthillier & Crainic, 2005) is a good example. A whole new area of research opens up, where questions such as "what will be the role of each metaheuristic?" or "what cooperation mechanisms should be used?" arise.

Within parallel metaheuristics we will focus, following the classification of (Crainic & Toulouse, 2003), on multi-search metaheuristics, where several concurrent strategies search the solution space. Among them, we concentrate on those techniques, known as cooperative multi-search metaheuristics, where each strategy exchanges information with the others during the execution.

Cooperative multi-search metaheuristics obtain better quality solutions than independent methods. But previous studies (Crainic & Toulouse, 2002), (Crainic, Toulouse & Sansò, 2004) demonstrate that cooperative methods with a non-restrictive access to shared information may experience problems of premature convergence. This seems to be due to the stabilization of the shared information, a stabilization caused by the intense exchange of the better solutions. So it would be interesting to find a way of controlling this information exchange.

In this context we propose two approaches in order to control the exchange of information, one using memory to cope with this problem, and the other using a process of knowledge extraction.

The first approach (Pelta, Cruz, Sancho-Royo & Verdegay, 2006) proposes a cooperative strategy where a coordinating agent, modelled by a set of ad hoc fuzzy rules, receives information from a set of solver agents and sends instructions to each of them telling how to continue. Each solver agent implements the Fuzzy Adaptive Neighbourhood Search (FANS) metaheuristic (Blanco, Pelta & Verdegay, 2002) as a clone. FANS is conceived as an adaptive fuzzy neighbourhood based metaheuristic. Its own characteristics allow FANS to capture the qualitative behaviour of several metaheuristics, and thus it can be considered as a framework of metaheuristics.

The second approach (Cadenas, Garrido, Liern, Muñoz & Serrano, 2007) uses the same structure but combines a set of different metaheuristics which cooperate within a single coordinated schema, where a coordinating agent modelled by a set of fuzzy rules receives information from the different metaheuristics and sends instructions to each of them. The difference with the previous system lies in the way the rules are obtained: here they are the result of a knowledge extraction process (Cadenas, Garrido, Hernández & Muñoz, 2006), (Cadenas, Díaz-Valladares, Garrido, Hernández & Serrano, 2006).

TWO COOPERATIVE MULTI-SEARCH METAHEURISTICS

A Cooperative Multi-Search Metaheuristic Using Memory and Fuzzy Rules

The idea of the first strategy can be explained with the help of the diagram in Fig. 1. Given a concrete problem to solve, we have a set of solvers to deal with it. Each solver develops a particular strategy to solve the problem independently, and the whole set of solvers works simultaneously without direct interaction. In order to coordinate the solvers there is a coordinator which knows all the general aspects of the problem concerned and the particular solver features. The coordinator receives reports from the solvers with the obtained results, and returns orders to them.
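This coordinator/solver loop can be sketched in a few lines. The sketch is our simplification, not the authors' system: crisp rules stand in for the fuzzy rule base, plain hill-climbers stand in for FANS, and synchronous rounds replace the asynchronous threads; all names and parameters are ours.

```python
import random

class Solver:
    """A stand-in solver agent (simple hill-climber); its `step` parameter
    plays the role of the behaviour the coordinator can adapt."""
    def __init__(self, objective, x0, step):
        self.f, self.x, self.step = objective, x0, step
        self.best = objective(x0)

    def run_some(self, iters=200):
        for _ in range(iters):
            cand = self.x + random.uniform(-self.step, self.step)
            if self.f(cand) > self.best:
                self.x, self.best = cand, self.f(cand)
        return self.best                      # the report sent to the coordinator

class Coordinator:
    """Keeps the last two reports of every solver and applies a crisp version
    of the rule: if a solver is working well, keep it; if it seems trapped,
    do something to alter its behaviour."""
    def __init__(self, solvers):
        self.solvers = solvers
        self.reports = {i: [] for i in range(len(solvers))}

    def round(self):
        for i, s in enumerate(self.solvers):
            r = self.reports[i]
            r.append(s.run_some())
            del r[:-2]                        # remember the last two reports only
            if len(r) == 2 and r[1] <= r[0]:  # no progress: alter the behaviour
                s.step *= 0.5                 # e.g., intensify around current point
        return max(s.best for s in self.solvers)
```

Running a few rounds over, say, f(x) = -(x - 3)² drives the best report towards the optimum while solvers that stall get their search behaviour altered.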
The inner workings of the strategy are quite simple. In the first step, the coordinator determines the initial behaviour (set of parameters) for each solver, which models a particular optimization strategy. This behaviour is passed to each solver, which is then executed. For each solver, the coordinator keeps the last two reports containing their times, and the corresponding solutions at such times. From the solver information set, the coordinator calculates performance measures that will be used to adapt the fuzzy rule base. This fuzzy rule base is generated manually following the principle that "if a solver is working well, keep it; but if a solver seems to be trapped, do something to alter its behaviour".

Solvers execute asynchronously by sending and receiving information. The coordinator checks which solver provided new information and decides whether its behaviour needs to be adapted using the fuzzy rule base. If this is the case, it will obtain a new behaviour and send it to the solvers. Solver operation is quite simple: once execution has begun, performance information is sent and adaptation orders from the coordinator are received alternately.

Each solver thread is implemented by means of the FANS metaheuristic. We use this metaheuristic for three main reasons:

• FANS is essentially a threshold-acceptance local search technique and is therefore easy to understand and implement, and does not require many computational resources.

• FANS can be used as a heuristic template. Each set of parameters implies a different behaviour of the method, and therefore the qualitative behaviour of other local search techniques can be simulated (Blanco, Pelta & Verdegay, 2002).

• The previous point enables different search schemes to be built, and diversification and intensification procedures to be driven easily. We therefore do not need to implement different algorithms but merely use FANS as a template.

A Cooperative Multi-Search Metaheuristic Using Data Mining to Obtain Fuzzy Rules

The idea of the second strategy is very similar to the first one, and can be seen in Fig. 2. Given a concrete problem to solve, we have a set of solvers to deal with it. Each solver implements a different metaheuristic, and has to solve the problem while coordinating itself with the rest of the metaheuristics. In order to perform the cooperation we use a coordinating agent which will control and modify the behaviour of the agents. To perform the communication among the different metaheuristics an adapted blackboard model is used. In this model each agent controls a part of the blackboard, where it writes its performance information and periodically updates the solution found. The coordinator consults the blackboard in order to monitor the behaviour of each metaheuristic and to decide how their behaviour has to be modified.

To give intelligence to the coordinator we propose the use of a set of fuzzy rules obtained as a result of the methodological process proposed in (Cadenas, Garrido, Hernández & Muñoz, 2006), (Cadenas, Díaz-Valladares, Garrido, Hernández & Serrano, 2006), based on a process of knowledge extraction.
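The adapted blackboard model can be sketched as follows. The Blackboard and Coordinator classes, the "intensify"/"diversify" orders, and the 0.9 ratio are illustrative assumptions of ours, a crisp stand-in for the fuzzy rule base.

```python
import threading

class Blackboard:
    """Adapted blackboard model: every agent owns a region of the board,
    where it writes its performance and periodically updates its solution."""
    def __init__(self):
        self._regions, self._lock = {}, threading.Lock()

    def post(self, agent, performance, solution):
        with self._lock:
            self._regions[agent] = (performance, solution)

    def snapshot(self):
        with self._lock:
            return dict(self._regions)

class Coordinator:
    """Consults the blackboard to monitor every metaheuristic and decides
    how its behaviour has to be modified (a crisp placeholder for the
    fuzzy rules; the 0.9 ratio is an arbitrary choice of ours)."""
    def __init__(self, board):
        self.board = board

    def orders(self):
        snap = self.board.snapshot()
        if not snap:
            return {}
        lead = max(perf for perf, _ in snap.values())
        return {agent: ("intensify" if perf >= 0.9 * lead else "diversify")
                for agent, (perf, _) in snap.items()}
```

Because agents only write to their own region and the coordinator only reads snapshots, the metaheuristics never interact directly, which mirrors the indirect communication described above.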
The process of knowledge extraction is shown in Fig. 3 and is divided into the following phases:

max Σ_{j=1}^{n} p_j x_j
is stored in the coordinator memory and how this is used to control the global search behavior of the strategy. Related to the second, the knowledge extraction process needs to be improved and tested with different data mining techniques. And for both strategies, the improvement of the fuzzy rule base is another topic to be addressed.

One last consideration is the application field. The knapsack problem is considered one of the easiest NP-hard problems, so it would be interesting to apply these strategies to more complex problems such as the p-median, p-hub median or the protein structure comparison.

CONCLUSION

This paper proposes two strategies to cope with the convergence problems shown by cooperative multi-search metaheuristics. The first strategy suggests the use of memory in order to define a set of fuzzy rules which control the exchanges of solutions, associated to a coordinated schema where similar metaheuristics cooperate. The second strategy proposes the use of a knowledge extraction process to obtain a set of fuzzy rules which control the exchanges of solutions of a coordinated system where different metaheuristics cooperate. Both approaches have been tested and have shown their good performance.

ACKNOWLEDGMENT

The authors thank the Ministerio de Educación y Ciencia of Spain and the Fondo Europeo de Desarrollo Regional (FEDER) for the support given to develop this work under the projects TIN2005-08404-C04-01 and TIN2005-08404-C04-02. The first three authors also thank the "Fundación Séneca, Agencia de Ciencia y Tecnología de la Región de Murcia", which supports this research under the Programa Séneca. The last three authors also carried out this research in part under project MINAS (TIC-00129).

REFERENCES

Baños, R., Gil, C., Ortega, J., & Montoya, F.G. (2004). A Parallel Multilevel Metaheuristic for Graph Partitioning. Journal of Heuristics, 10(3), 315-336.

Blanco, A., Pelta, D., & Verdegay, J.L. (2002). A fuzzy valuation-based local search framework for combinatorial problems. Fuzzy Optimization and Decision Making, 1(2), 177-193.

Cadenas, J.M., Garrido, M.C., Hernández, L.D., & Muñoz, E. (2006). Towards the definition of a Data Mining process based in Fuzzy Sets for Cooperative Metaheuristic systems. In Proceedings from IPMU'2006: The 11th Information Processing and Management of Uncertainty in Knowledge-Based Systems International Conference (pp. 2828-2835). Paris, France.

Cadenas, J.M., Díaz-Valladares, R.A., Garrido, M.C., Hernández, L.D., & Serrano, E. (2006). Modelado del Coordinador de un sistema Meta-Heurístico Cooperativo mediante Soft Computing. In Proceedings from CMPI-2006: Campus Multidisciplinar en Percepción e Inteligencia (pp. 679-689). Albacete, Spain.
Cadenas, J.M., Garrido, M.C., Liern, V., Muñoz, E., & Serrano, E. (2007). Un prototipo del coordinador de un Sistema Metaheurístico Cooperativo para el Problema de la Mochila. In Proceedings from MAEB07: V congreso español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (pp. 811–818). Tenerife, Spain.

Crainic, T.G., Toulouse, M., & Gendreau, M. (1997). Towards a taxonomy of parallel tabu search algorithms. INFORMS Journal on Computing, 9(1), 61–72.

Crainic, T.G., & Toulouse, M. (2002). Cooperative parallel tabu search for capacitated network design. Journal of Heuristics, 8(6), 601–627.

Crainic, T.G., & Toulouse, M. (2003). Parallel Strategies for Metaheuristics. In F. Glover & G.A. Kochenberger (Eds.), Handbook of Metaheuristics (pp. 475–514). London: Kluwer Academic Publisher.

Crainic, T.G., Toulouse, M., & Sansò, B. (2004). Systemic behavior of cooperative search algorithms. Parallel Computing, 30(1), 57–79.

Crainic, T.G., Gendreau, M., Hansen, P., & Mladenović, N. (2004). Cooperative parallel variable neighborhood search for the p-median. Journal of Heuristics, 10(3), 293–314.

Dorigo, M., & Stützle, T. (2003). Ant Colony Optimization Metaheuristic. In F. Glover & G.A. Kochenberger (Eds.), Handbook of Metaheuristics (pp. 251–286). London: Kluwer Academic Publisher.

Eberhart, R., & Kennedy, J. (Eds.) (2001). Swarm Intelligence. London: Academic Press.

Glover, F. (1986). Future paths for integer programming and links to artificial intelligence. Computers and Operations Research, 13(5), 533–549.

Glover, F., & Kochenberger, G.A. (Eds.) (2003). Handbook of Metaheuristics. London: Kluwer Academic Publishers.

Hart, W., Krasnogor, N., & Smith, J. (Eds.) (2004). Recent Advances in Memetic Algorithms. Studies in Fuzziness and Soft Computing. Würzburg: Physica-Verlag.

Le Bouthillier, A., & Crainic, T.G. (2005). A cooperative parallel meta-heuristic for the vehicle routing problem with time windows. Computers and Operations Research, 32(7), 1685–1708.

Lee, K., & Lee, S. (1992). Efficient parallelization of simulated annealing using multiple Markov chains: an application to graph partitioning. In Proceedings from ICPP02: International Conference on Parallel Processing (pp. 177–180). Vancouver, Canada.

Melián, B., Moreno, J.A., & Moreno, J.M. (2003). Metaheurísticas: una visión global. Revista Iberoamericana de Inteligencia Artificial, 19, 7–28.

Pardalos, P., & Resende, M. (Eds.) (2002). Handbook of Applied Optimization. Oxford: Oxford University Press.

Pelta, D., Cruz, C., Sancho-Royo, A., & Verdegay, J.L. (2006). Using memory and fuzzy rules in a cooperative multi-thread strategy for optimization. Information Sciences, 176(13), 1849–1868.

KEY TERMS

Blackboard: A shared repository of problems, partial solutions, suggestions, and contributed information. The blackboard can be seen as a dynamic library of contributions to the current problem that have been recently published by other knowledge sources.

Cooperative Multi-Search Metaheuristics: A parallelization strategy for metaheuristics in which parallelism is obtained from multiple concurrent explorations of the solution space and where metaheuristics exchange information during their execution in order to cooperate.

Data Mining: The most characteristic stage of the Knowledge Extraction process, where the aim is to produce new useful knowledge by constructing a model from the data gathered for this purpose.

Fuzzy Rules: Linguistic if-then constructions that have the general form "if A then B", where A and B are collections of propositions containing linguistic variables (A is called the premise and B the consequence). The use of linguistic variables and fuzzy if-then rules exploits the tolerance for imprecision and uncertainty.

Heuristic: A method or approach that tries to apply expert knowledge in the resolution of a problem with the aim of increasing the probability of solving it.
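The cooperation scheme defined by the terms above can be illustrated with a minimal sketch. All names and the toy objective below are hypothetical, not the authors' system: two simple local searchers publish their current best solutions on a shared blackboard, and a fuzzy "much better" rule decides when a searcher adopts another's solution.

```python
import random

def sphere(x):
    """Toy objective to minimize: sum of squares."""
    return sum(v * v for v in x)

def much_better(own, incoming):
    """Fuzzy membership in [0, 1] of 'incoming cost is much better than own'."""
    if incoming >= own:
        return 0.0
    return min(1.0, (own - incoming) / (abs(own) + 1e-9))

class Blackboard:
    """Shared repository: each searcher publishes its current best solution."""
    def __init__(self):
        self.entries = {}                      # searcher id -> (cost, solution)
    def post(self, who, cost, sol):
        self.entries[who] = (cost, list(sol))
    def best_other(self, who):
        others = [v for k, v in self.entries.items() if k != who]
        return min(others, default=None)

def cooperative_search(n_searchers=2, dim=3, steps=200, seed=1):
    rng = random.Random(seed)
    board = Blackboard()
    sols = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_searchers)]
    costs = [sphere(s) for s in sols]
    for _ in range(steps):
        for i in range(n_searchers):
            cand = list(sols[i])               # local move: perturb one coordinate
            j = rng.randrange(dim)
            cand[j] += rng.gauss(0, 0.5)
            if sphere(cand) < costs[i]:
                sols[i], costs[i] = cand, sphere(cand)
            board.post(i, costs[i], sols[i])
            other = board.best_other(i)        # cooperation via the blackboard:
            if other and much_better(costs[i], other[0]) > 0.5:
                costs[i], sols[i] = other[0], list(other[1])   # adopt it
    return min(costs)

print(cooperative_search())
```

A real cooperative multi-search system would run the searchers concurrently and exchange solutions through an externally controlled schema; this sequential round-robin only demonstrates the blackboard and the fuzzy acceptance rule.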
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Differential Evolution with Self-Adaptation
these values, which remain fixed during the run. The latter means that values for the parameters are changed during the run. According to Eiben et al. (Eiben et al., 1999) (Eiben & Smith, 2003), the change can be categorized into three classes:

1. Deterministic parameter control takes place when the value of a parameter is altered by some deterministic rule.
2. Adaptive parameter control is used when there is some form of feedback from the search that is used to determine the direction and/or the magnitude of the change to the parameter.
3. Self-adaptive parameter control is the idea that evolution of the evolution can be used to implement the self-adaptation of parameters. Here, the parameters to be adapted are encoded into the chromosome (individuals) and undergo the actions of genetic operators. The better values of these encoded parameters lead to better individuals which, in turn, are more likely to survive and produce offspring and, hence, propagate these better parameter values.

DE has three control parameters: the amplification factor of the difference vector, F; the crossover control parameter, CR; and the population size, NP. The original DE algorithm keeps all three control parameters fixed during the optimization process. However, there still exists a lack of knowledge about how to find reasonably good values for the control parameters of DE for a given function (Liu & Lampinen, 2005).

Although the DE algorithm has been shown to be a simple, yet powerful, evolutionary algorithm for optimizing continuous functions, users are still faced with the problem of preliminary testing and hand-tuning of its control parameters prior to commencing the actual optimization process (Teo, 2006). As a solution, self-adaptation has proved to be highly beneficial for automatically and dynamically adjusting control parameters. Self-adaptation allows an evolutionary strategy to adapt itself to any general class of problem, by reconfiguring itself accordingly, and does this without any user interaction (Bäck, 2002) (Bäck, Fogel & Michalewicz, 1997) (Eiben, Hinterding & Michalewicz, 2003).

RELATED WORK

Work Related to Differential Evolution

The DE (Storn & Price, 1995) (Storn & Price, 1997) algorithm was proposed by Storn and Price, and since then it has been used in many practical cases. The original DE was modified and many new versions have been proposed. Ali and Törn (Ali & Törn, 2004) proposed new versions of the DE algorithm, and also suggested some modifications to the classical DE, in order to improve its efficiency and robustness. They introduced an auxiliary population of NP individuals alongside the original population (in (Ali, 2004), a notation using sets is used). Next they proposed a rule for calculating the control parameter F automatically. Jiao et al. (Jiao, Dang, Leung & Hao, 2006) proposed a modification of the DE algorithm, applying a number-theoretical method for generating the initial population, and using simplified quadratic approximation with the three best points. Mezura-Montes et al. (Mezura-Montes, Velázquez-Reyes & Coello Coello, 2006) conducted a comparative study of DE variants. They proposed a rule for changing the control parameter F at random from the interval [0.4, 1.0] at generation level. They used different values of the control parameter CR for each problem. The best CR value for each problem was obtained by additional experimentation. Tvrdík (Tvrdík, 2006) proposed a DE algorithm using competition between different control parameter settings. The prominence of the DE algorithm and its applications is shown in recently published books (Price et al., 2005), (Feoktistov, 2006). Feoktistov in his book (Feoktistov, 2006, p. 18) says that the concept of differential evolution is "a spontaneous self-adaptability to the function".

Work Related to Adaptive or Self-Adaptive DE

Liu and Lampinen (Liu & Lampinen, 2005) proposed a version of DE where the mutation control parameter and the crossover control parameter are adaptive. A self-adaptive DE (SDE) is proposed by Omran et al. (Omran, Salman & Engelbrecht, 2005) (Salman, Engelbrecht & Omran, 2007), where parameter tuning is not required. Self-adapting was applied for control
parameters F and CR. Teo (Teo, 2006) made an attempt at self-adapting the population size parameter, in addition to self-adapting crossover and mutation rates. Brest et al. (Brest et al., 2006a) proposed a DE algorithm using a self-adapting mechanism on the control parameters F and CR. The performance of the self-adaptive differential evolution algorithm was evaluated using a set of benchmark functions provided for constrained real parameter optimization (Brest, Žumer & Sepesy Maučec, 2006b). Qin and Suganthan (Qin & Suganthan, 2005) proposed the Self-adaptive Differential Evolution algorithm (SaDE), where the choice of learning strategy and the two control parameters F and CR do not require pre-defining. During evolution, suitable learning strategy and parameter settings are gradually self-adapted, according to the learning experience. Brest et al. (Brest, Bošković, Greiner, Žumer & Sepesy Maučec, 2007) reported a performance comparison of certain selected DE algorithms which use different self-adaptive or adaptive control parameter mechanisms. In this paper the DE algorithms used more than one of the DE strategies (Price et al., 2005), (Feoktistov, 2006). Self-adaptation has been used extensively in evolutionary programming (Fogel, 1995) and evolution strategies (ES) (Bäck & Schwefel, 1993) to adjust the search step size for each objective variable (Liang, Yao & Newton, 2001). Abbass (Abbass, 2002) proposed a self-adaptive DE for multi-objective optimization problems.

SELF-ADAPTIVE CONTROL PARAMETERS IN DIFFERENTIAL EVOLUTION

This section presents three self-adaptive DE approaches, which have been applied to the control parameters F and CR.

The Self-Adaptive Control Parameters Using Uniform Distribution

The Self-Adaptive DE refers to the self-adapting mechanism on the control parameters, proposed by Brest et al. (Brest et al., 2006a). This self-adapting mechanism used the rand/1/bin strategy. Each individual in the population was extended with the values of two control parameters: (xi,G, Fi,G, CRi,G), i = 1, 2, ..., NP. Both of the control parameters were applied at the individual level.

Brest et al. (Brest et al., 2006a) proposed a self-adaptive DE where the new control parameters Fi,G+1 and CRi,G+1 are calculated as follows:

if rand1 < t1 then Fi,G+1 = Fl + rand2 · Fu else Fi,G+1 = Fi,G ,

if rand3 < t2 then CRi,G+1 = rand4 else CRi,G+1 = CRi,G ,

and they produce the control parameters F and CR in a new vector. The quantities randj, j ∈ {1, 2, 3, 4}, are uniform random values in [0, 1]. The quantities t1 and t2 represent the probabilities of adjusting the control parameters F and CR, respectively. The parameters t1, t2, Fl, Fu were taken as fixed values 0.1, 0.1, 0.1, 0.9, respectively. The new F takes a value from [0.1, 1.0] in a random manner. The new CR takes a value from [0, 1]. The new Fi,G+1 and CRi,G+1 are obtained before the mutation is performed, so they influence the mutation, crossover and selection operations of the new vector xi,G+1.

In (Brest et al., 2006a) a self-adaptive control mechanism was used to change the control parameters F and CR during the evolutionary process. The third control parameter NP was kept unchanged.

Abbass's Approach

Abbass (Abbass, 2002) proposed the Self-adaptive Pareto Differential Evolution (SPDE) algorithm. The SPDE was used for multi-objective optimization problems. New control parameters Fi,G+1 and CRi,G+1 are calculated as follows:

Fi,G+1 = N(0,1),

CRi,G+1 = CRr1,G + N(0,1) · (CRr2,G - CRr3,G) ,

where N(0,1) is the Gaussian distribution. If the Fi,G+1 value is not in [0,1] then a simple rule is used to repair it, and similarly for the CRi,G+1 value. Then the mutant vector vi,G is calculated:

vi,G = xr1,G + Fi,G+1 · (xr2,G - xr3,G)
and the crossover operation is performed:

if (rand ≤ CRi,G+1 or j = jrand) then ui,j,G = vi,j,G else ui,j,G = xi,j,G ,

The control parameter CR is self-adapted by encoding it into each individual.

Approach Proposed by Omran, Salman and Engelbrecht

Due to the success achieved in SPDE by self-adapting CR, Omran et al. (Omran, Salman & Engelbrecht, 2005) (Salman, Engelbrecht & Omran, 2007) proposed a self-adaptive DE (SDE), where the same mechanism is applied to self-adapt the control parameter F. The control parameter CR is generated for each individual from a normal distribution (CR ~ N(0.5, 0.15)) in SDE, and the mutation operation changes as follows:

vi,G = xr1,G + Fi,G+1 · (xr2,G - xr3,G),

where

evaluations) and robust (e.g. it does not get trapped in a local optimum) at the same time. Based on our experiences (other authors reported similar observations) with the DE algorithm, we can conclude that DE provides more robustness if the population size is higher. If the population size is increased, on the other hand, more computational power (on average) is needed. The population size is an important control parameter, and thus adaptive and/or self-adaptive approaches on this parameter are expected in the future.

In this chapter only the rand/1/bin DE strategy is used, but the DE algorithm has more strategies (Price et al., 2005), (Feoktistov, 2006). Which combination of (self-adaptive) DE strategies should someone use to get the best performance? One can use the involved strategies with the same probability, or with different probabilities, or even use self-adaptation to choose the most suitable strategy during the optimization process.

Future work may also be directed towards testing the proposed self-adaptive versions of the DE algorithm, especially on constrained optimization problems. Multi-objective optimization is also a challenge for future work.
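Read as pseudocode, the self-adaptation rule of Brest et al. described above fits in a few lines. The following is a simplified sketch that embeds it in a DE/rand/1/bin loop (t1 = t2 = 0.1, Fl = 0.1, Fu = 0.9); it is an illustration, not the authors' reference implementation:

```python
import random

def jde(f, dim, bounds, np_=20, gens=100, seed=2):
    """DE/rand/1/bin with jDE-style self-adaptive F and CR, a sketch."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(np_)]
    fit = [f(x) for x in pop]
    F, CR = [0.5] * np_, [0.9] * np_
    for _ in range(gens):
        for i in range(np_):
            # New F_i and CR_i are drawn BEFORE mutation, so they already
            # steer the mutation, crossover and selection of this trial.
            Fi = 0.1 + rng.random() * 0.9 if rng.random() < 0.1 else F[i]
            CRi = rng.random() if rng.random() < 0.1 else CR[i]
            r1, r2, r3 = rng.sample([k for k in range(np_) if k != i], 3)
            jrand = rng.randrange(dim)
            # Binomial crossover of the rand/1 mutant with the target vector.
            trial = [pop[r1][j] + Fi * (pop[r2][j] - pop[r3][j])
                     if (rng.random() <= CRi or j == jrand) else pop[i][j]
                     for j in range(dim)]
            ft = f(trial)
            if ft <= fit[i]:        # greedy selection keeps the better vector,
                pop[i], fit[i] = trial, ft
                F[i], CR[i] = Fi, CRi   # and the parameters that produced it
    return min(fit)

best = jde(lambda x: sum(v * v for v in x), dim=5, bounds=(-5, 5))
print(best)
```

On the sphere function used here, the run converges toward zero while F and CR adjust themselves per individual; no hand-tuning of the two parameters is needed.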
and Numerical Studies. Computers & Operations Research, 31(10), 1703-1725.

Bäck, T. (2002). Adaptive Business Intelligence Based on Evolution Strategies: Some Application Examples of Self-Adaptive Software. Information Sciences, 148(1-4), 113-121.

Bäck, T., Fogel, D.B., & Michalewicz, Z. (Eds.) (1997). Handbook of Evolutionary Computation. Institute of Physics Publishing and Oxford University Press.

Bäck, T. (1996). Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, New York.

Bäck, T., & Schwefel, H.-P. (1993). An Overview of Evolutionary Algorithms for Parameter Optimization. Evolutionary Computation, 1(1), 1-23.

Brest, J., Bošković, B., Greiner, S., Žumer, V., & Sepesy Maučec, M. (2007). Performance comparison of self-adaptive and adaptive differential evolution algorithms. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 11(7), 617-629. DOI 10.1007/s00500-006-0124-0.

Brest, J., Greiner, S., Bošković, B., Mernik, M., & Žumer, V. (2006). Self-Adapting Control Parameters in Differential Evolution: A Comparative Study on Numerical Benchmark Problems. IEEE Transactions on Evolutionary Computation, 10(6), 646-657. DOI 10.1109/TEVC.2006.87213.

Brest, J., Žumer, V., & Sepesy Maučec, M. (2006). Self-adaptive Differential Evolution Algorithm in Constrained Real-Parameter Optimization. The 2006 IEEE Congress on Evolutionary Computation, CEC2006, 919-926. IEEE Press.

Eiben, A.E., Hinterding, R., & Michalewicz, Z. (1999). Parameter Control in Evolutionary Algorithms. IEEE Transactions on Evolutionary Computation, 3(2), 124-141.

Eiben, A.E., & Smith, J.E. (2003). Introduction to Evolutionary Computing. Natural Computing. Springer-Verlag, Berlin.

Fan, H.-Y., & Lampinen, J. (2003). A Trigonometric Mutation Operation to Differential Evolution. Journal of Global Optimization, 27(1), 105-129.

Feoktistov, V. (2006). Differential Evolution: In Search of Solutions. Springer, New York.

Fogel, D. (1995). A Comparison of Evolutionary Programming and Genetic Algorithms on Selected Constrained Optimization Problems. Simulation, 64(6), 397-404.

Jiao, Y.-C., Dang, C., Leung, Y., & Hao, Y. (2006). A modification to the new version of the Price's algorithm for continuous global optimization problems. Journal of Global Optimization, 36(4), 609-626. DOI 10.1007/s10898-006-9030-3.

Lee, C.Y., & Yao, X. (2004). Evolutionary Programming Using Mutations Based on the Lévy Probability Distribution. IEEE Transactions on Evolutionary Computation, 8(1), 1-13.

Liang, K.-H., Yao, X., & Newton, C.S. (2001). Adapting Self-adaptive Parameters in Evolutionary Algorithms. Applied Intelligence, 15(3), 171-180.

Liu, J., & Lampinen, J. (2005). A Fuzzy Adaptive Differential Evolution Algorithm. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 9(6), 448-462.

Mezura-Montes, E., Velázquez-Reyes, J., & Coello Coello, C.A. (2006). A comparative study of differential evolution variants for global optimization. Genetic and Evolutionary Computation Conference GECCO2006, 485-492.

Ohkura, K., Matsumura, Y., & Ueda, K. (2001). Robust Evolution Strategies. Applied Intelligence, 15(3), 153-169.

Omran, M.G.H., Salman, A., & Engelbrecht, A.P. (2005). Self-adaptive Differential Evolution. Lecture Notes in Computer Science, 3801, 192-199.

Price, K.V., Storn, R.M., & Lampinen, J.A. (2005). Differential Evolution: A Practical Approach to Global Optimization. Springer.

Qin, A.K., & Suganthan, P.N. (2005). Self-adaptive Differential Evolution Algorithm for Numerical Optimization. The 2005 IEEE Congress on Evolutionary Computation CEC2005, 1785-1791. IEEE Press. DOI: 10.1109/CEC.2005.1554904.

Rönkkönen, J., Kukkonen, S., & Price, K.V. (2005). Real-Parameter Optimization with Differential Evolu-
Anutosh Maitra
Dhirubhai Ambani Institute of Information and Communication Technology, India
Discovering Mappings Between Ontologies
2001), SUMO (Niles & Pease, 2003), and OpenCyc (Sicilia et al., 2004) are examples of the same. SWEET (Raskin, 2003) provides standard Ontologies in the environmental science domain. Hence, the levels in Ontology also address an important dimension in knowledge engineering through integrating available Ontologies.

tion preparation phase (or normalization phase) of more complex semantic oriented approaches. Many of the Schema Matching techniques are directly applicable for concept level approaches.

String Level Concept Matching
The Natural Language Processing domain offers various text processing techniques such as stemming, tokenization, tagging, elimination, expansion, etc. that can improve the result of the similarity finding effort. Usage and grammar of language may result in a mismatch, for example between "Product" and "Products"; in such cases Lemmatization can be used. Tokenization, which can remove grammatical elements from a concept name, can be utilized. Unnecessary articles, prepositions, conjunctions and other features can be removed using the Elimination technique. Lexical relations can also be identified using a sense based approach by exploring hypernyms, hyponyms, etc. in WordNet (Miller, 1995). HCONE-Merge (Kotis et al., 2006) is an Ontology merging approach that uses the Latent Semantic Analysis (LSA) technique, carries out lookup in WordNet, and expands word senses by hyponymy. Domain or application specific terminology can also be integrated for disambiguation. Cupid, a schema matching approach, employs linguistic matching in its initial phase. Quick Ontology Mapping (Ehrig & Sure, 2005) employs finding of linguistic similarity of concepts. The ASCO (Le et al., 2004) technique calculates linguistic similarity as a linear combination of name, label and description similarity of concepts.

Approaches that consider the ontology as a graph structure and consider upper and lower levels of the given concepts to find similarity among concepts of different Ontologies.

Structural Similarity

The COMA (Aumueller et al., 2005) system employs path similarity as a basis for calculating similarity among concepts. The SMART tool employs an algorithm that considers the structure of the relations in the vicinity of the concept being processed. It is implemented with the PROMPT system, which can be plugged into Protégé, a widely accepted open-source knowledge engineering tool. Anchor-PROMPT (Noy & Musen, 2001) extends the simple PROMPT by calculating similarity measures based on ontology structure.

Semantic Matching

S-Match focuses on computing semantic matching between graph structures extracted from separate Ontologies. With this technique it is possible to differentiate the meaning of "Concept of node" and "Concept at node".
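As a rough illustration of string level concept matching, the sketch below normalizes concept names and scores them by token-set overlap. It makes simplified assumptions: naive plural stripping stands in for lemmatization, and stop-word removal stands in for the elimination technique.

```python
import re

def normalize(name):
    """Tokenize a concept name: split CamelCase and digits, lowercase,
    crudely lemmatize plurals, and drop stop words (elimination)."""
    stop = {"the", "a", "an", "of", "and", "or", "in"}
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    tokens = set()
    for p in parts:
        t = p.lower()
        if t in stop:
            continue
        if len(t) > 3 and t.endswith("s"):   # 'Products' -> 'product'
            t = t[:-1]
        tokens.add(t)
    return tokens

def name_similarity(a, b):
    """Jaccard similarity of the normalized token sets of two concept names."""
    ta, tb = normalize(a), normalize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(name_similarity("Products", "Product"))            # 1.0 after lemmatization
print(name_similarity("PurchaseOrder", "OrderOfPurchase"))  # 1.0
print(name_similarity("Product", "Invoice"))             # 0.0
```

Real systems would additionally consult a thesaurus such as WordNet so that synonymous tokens also match; this sketch covers only the purely syntactic normalization steps named above.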
Similarity in terms of joint probability distribution of instances of a given concept. The Ontology Mapping Enhancer (OMEN) (Mitra et al., 2005) adopts generation of a Bayesian Net on the basis of mappings defined a priori. ITTalks (Cost et al., 2002) adopts Bayesian reasoning along with text based classification technology that collects similarity information in source Ontologies. The discovered semantic similarity is codified as labels and arcs of a graph.

mapping can also benefit from the representation and reasoning capability of DL. MAFRA (Maedche et al., 2002) uses DL for representing the Semantic Bridge Ontology (SBO). OntoMapO uses DL for creating a Meta Ontology for consistent representation of Ontologies and mapping among them. CTXMatch and S-Match employ satisfiability techniques achieved using DL. The ConcepTool (Compatangelo & Meisel, 2003) system uses DL in formalizing a class-centered enhanced entity relationship model.

Community Oriented Approach
ogy. Figure 1 indicates one of the challenging open problems of investigating appropriate methods that allow integration of Ontologies defining Domain Independent, Domain Specific, Local Specific and Application Specific concepts to provide complete coverage of knowledge representation. This approach ensures that knowledge engineers can reuse the integrated part of Domain Independent and Domain Specific concepts and focus only on local specific concepts to suitably integrate with the application being built. As an example scenario, Figure 1 indicates the Ontology integration required for Disaster Management Agencies across the world. The Domain Specific and Domain Independent concepts are required by every agency and can be directly integrated with local specific and application specific concepts that can be unique to each implementing agency.

CONCLUSION

With the proliferation of knowledge representation and reasoning techniques and standards based tools, domain knowledge captured in the form of Ontology is increasingly available for reuse. The quality, quantity and level of detail with which domain concepts are defined differ considerably based on the discretion of the knowledge engineer. Reuse requires concepts defined in multiple such Ontologies to be extracted or mapped to a resulting comprehensive representation that suits the requirements of the problem on hand. This in turn introduces the problem of syntactic and semantic heterogeneity. This article provided the state of the art in present techniques targeted at resolving heterogeneity.

REFERENCES

Antoniou, G., & Harmelen, F. van. (2004). Web ontology language: OWL. In S. Staab & R. Studer (Eds.), Handbook on ontologies (pp. 67-92). Springer.

Aumueller, D., Do, H.-H., Massmann, S., & Rahm, E. (2005). Schema and ontology matching with COMA++. In SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on management of data (pp. 906–908). NY, USA: ACM Press.

Brachman, R. J., & Levesque, H. J. (2004). Knowledge representation and reasoning. Morgan Kaufmann Publishers.

Choi, N., Song, I.-Y., & Han, H. (2006). A survey on ontology mapping. SIGMOD Rec., 35(3), 34–41.

Compatangelo, E., & Meisel, H. (2003). ConcepTool: Intelligent support to the management of domain knowledge. In V. Palade et al. (Eds.), KES (Vol. 2773, pp. 81-88). Springer.

Cost, R. S., Finin, T. W., Joshi, A., Peng, Y., Nicholas, C. K., Soboroff, I., et al. (2002). ITtalks: A case study in the semantic web and DAML+OIL. IEEE Intelligent Systems, 17(1), 40-47.

Davies, J., Studer, R., & Warren, P. (2006). Semantic web technologies: Trends and research in ontology-based systems. John Wiley & Sons Ltd.

Dhamankar, R., Lee, Y., Doan, A., Halevy, A., & Domingos, P. (2004). iMAP: Discovering complex semantic matches between database schemas. In SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on management of data (pp. 383–394). NY, USA: ACM Press.

Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R. S., Peng, Y., et al. (2004). Swoogle: A search and metadata engine for the semantic web. In CIKM '04: Proceedings of the thirteenth ACM international conference on information and knowledge management (pp. 652–659). NY, USA: ACM Press.

Doan, A., Madhavan, J., Domingos, P., & Halevy, A. (2002). Learning to map between ontologies on the semantic web. In WWW '02: Proceedings of the 11th international conference on World Wide Web (pp. 662–673). NY, USA: ACM Press.

Ehrig, M., & Sure, Y. (2005). FOAM - framework for ontology alignment and mapping - results of the ontology alignment evaluation initiative. In B. Ashpole, M. Ehrig, J. Euzenat, & H. Stuckenschmidt (Eds.), Integrating ontologies (Vol. 156). CEUR-WS.org.

Ehrig, M. (2007). Ontology alignment: Bridging the semantic gap. Springer.

Euzenat, J., Guegan, P., & Valtchev, P. (2005). OLA in the OAEI 2005 alignment contest. In B. Ashpole et al. (Eds.), Integrating ontologies (Vol. 156). CEUR-WS.org.

Giunchiglia, F., Shvaiko, P., & Yatskevich, M. (2005). S-Match: An algorithm and an implementation of semantic matching. In Y. Kalfoglou et al. (Eds.), Se-
mantic interoperability and integration (Vol. 04391). IBFI, Germany.

Horrocks, I., & Patel-Schneider, P. F. (2001). The generation of DAML+OIL. In Description logics.

Kalfoglou, Y., Hu, B., Reynolds, D., & Shadbolt, N. (2005). Semantic integration technologies survey (Tech. Rep. No. 10842). University of Southampton.

Kalfoglou, Y., & Schorlemmer, W. M. (2005). Ontology mapping: The state of the art. In Y. Kalfoglou et al. (Eds.), Semantic interoperability and integration (Vol. 04391). IBFI, Germany.

Kotis, K., Vouros, G. A., & Stergiou, K. (2006). Towards automatic merging of domain ontologies: The HCONE-merge approach. J. Web Sem., 4(1), 60-79.

Lacher, M. S., & Groh, G. (2001). Enabling personal perspectives on heterogeneous information spaces. WETICE, 00, 285.

Le, B. T., Dieng-Kuntz, R., & Gandon, F. (2004). On ontology matching problems - for building a corporate semantic web in a multi-communities organization. In ICEIS (4) (pp. 236-243).

Li, J. (2004). LOM: A lexicon-based ontology mapping tool. Teknowledge Corporation, Palo Alto, CA.

Madhavan, J., Bernstein, P. A., & Rahm, E. (2001). Generic schema matching with Cupid. In P. M. G. Apers et al. (Eds.), VLDB (pp. 49-58). Morgan Kaufmann.

Maedche, A., Motik, B., Silva, N., & Volz, R. (2002). MAFRA - a mapping framework for distributed ontologies. In A. Gomez-Perez & V. R. Benjamins (Eds.), EKAW (Vol. 2473, pp. 235-250). Springer.

Manola, F., & Miller, E. (2004, February). RDF primer (W3C Recommendation). World Wide Web Consortium.

Miller, G. A. (1995). WordNet: A lexical database for English. Commun. ACM, 38(11), 39–41.

Mitra, P., Noy, N. F., & Jaiswal, A. (2005). OMEN: A probabilistic ontology mapping tool. In Y. Gil et al. (Eds.), International semantic web conference (Vol. 3729, pp. 537-547). Springer.

Niles, I., & Pease, A. (2001). Towards a standard upper ontology. In FOIS '01: Proceedings of the international conference on formal ontology in information systems (pp. 2–9). NY, USA: ACM Press.

Niles, I., & Pease, A. (2003). Linking lexicons and ontologies: Mapping WordNet to the suggested upper merged ontology. In H. R. Arabnia (Ed.), IKE (pp. 412-416). CSREA Press.

Noy, N., & Musen, M. (2000). PROMPT: Algorithm and tool for automated ontology merging and alignment. In AAAI/IAAI (pp. 450-455). AAAI Press / The MIT Press.

Noy, N., & Musen, M. (2001). Anchor-PROMPT: Using non-local context for semantic matching. In Workshop on ontologies and information sharing (IJCAI-2001), WA.

Noy, N. F. (2004a). Semantic integration: A survey of ontology based approaches. ACM SIGMOD Record, 33(4), 65-70.

Noy, N. F. (2004b). Tools for mapping and merging ontologies. In S. Staab & R. Studer (Eds.), Handbook on ontologies (pp. 365-384). Springer.

Predoiu, L., Feier, C., Scharffe, F., Bruijn, J. de, Martin-Recuerda, F., Manov, D., et al. (2006). State-of-the-art survey on ontology merging and aligning v2 (Tech. Rep.). DERI, University of Innsbruck.

Raskin, R. (2003). Semantic web for earth and environmental terminology (SWEET). In Proc. of NASA Earth Science Technology Conference.

Shvaiko, P., & Euzenat, J. (2006). A survey of schema-based matching approaches (Tech. Rep.). INRIA, France.

Sicilia, M.-A., Garcia, E., Sanchez, S., & Rodriguez, E. (2004). On integrating learning object metadata inside the OpenCyc knowledge base. In ICALT.

Stumme, G. (2005). Ontology merging with formal concept analysis. In Y. Kalfoglou et al. (Eds.), Semantic interoperability and integration (Vol. 04391). IBFI, Germany.

KEY TERMS

Articulation Ontology: An articulation ontology consists of concepts and relations that are identified
as links among the concepts defined in two separate Ontologies, also known as articulation rules.

Context Aware Techniques: Techniques that are focused on nearness of weight assigned to specific relations among concepts, considering the application context as a basis for mapping.

Extension Aware Techniques: Techniques that are focused on finding nearness among features of available instances of different Ontologies to form a basis for mapping.

Intension Aware Techniques: Based on Information Flow theory, techniques that are focused on finding two different tokens (instances) belonging to separate Ontologies that map to a single type (concept) as a basis for mapping.

Linguistic Similarity Techniques: A set of techniques that measure the linguistic nearness of concepts in the form of synonyms, hypernyms, and hyponyms by referring to related entries in a thesaurus as a basis for mapping.

Ontology Alignment: Ontology Alignment is a process to articulate similarity in the form of a one-to-one equality relation between the elements of two separate Ontologies.

Ontology Integration: Ontology Integration is a process that results in the generation of a new ontology derived as a union of two or more source Ontologies of different but related subject domains.

Ontology Mapping: Ontology Mapping is a process to articulate similarities among the concepts belonging to separate source Ontologies.

Ontology Mediation: Ontology Mediation is a process that reconciles differences between separate Ontologies to achieve semantic interoperability by performing alignment, mapping, merging and other required operations.

Ontology Merging: Ontology Merging is a process that results in the generation of a new ontology derived as a union of two or more source Ontologies of the same subject domain.

Semantic Similarity Techniques: Techniques that are focused on logic satisfiability as a basis of mapping.

String Similarity Techniques: A set of techniques that use syntactic similarity of concepts as a basis of mapping.

Structure Aware Techniques: Techniques that also consider the structural hierarchy of concepts as a basis of mapping.
Disk-Based Search
Stefan Edelkamp
University of Dortmund, Germany
Shahid Jabbar
University of Dortmund, Germany
EXTERNAL MEMORY SEARCH ALGORITHMS

External Memory Breadth-First Search

Breadth-first search (BFS) is one of the basic search algorithms. It explores a graph by first expanding the nodes that are closest to the start node. BFS for external memory has been proposed by Munagala and Ranade (1999). It only considers undirected and explicit (provided beforehand in the form of adjacency lists) graphs. The working of the algorithm is illustrated on a graph in Fig. 1. Let Open(i) be the set of nodes at BFS level i residing on disk. The algorithm builds Open(i) from Open(i-1) as follows. Let Succ(Open(i-1)) be the multi-set of successors of nodes in Open(i-1); this set is created by concatenating all adjacency lists of nodes in Open(i-1). As there can be multiple copies of the same node in this set, the next step is to remove these duplicate nodes. In an internal memory setting this can be done easily using a hash table. Unfortunately, in an external setting a hash table is not affordable due to the random accesses to its contents. Therefore, we rely on alternative methods of duplicate removal that are well-suited for large data on disk. The first step is to sort the successor set using external sorting algorithms, resulting in duplicate nodes lying adjacent to each other. By an external scanning of this sorted set, all duplicates are removed. Still, there can be nodes in this set that have already been expanded in the previous layers. Munagala and Ranade proved that for undirected graphs, it is sufficient to subtract only two layers, Open(i-1) and Open(i-2), from Open(i). Since all three lists are sorted, this can be done by a parallel external scanning. The accumulated I/O complexity of this algorithm is O(|V| + sort(|E|)) I/Os, where |V| is for the unstructured access to the adjacency lists, and sort(|E|) for duplicate removal.

An implicit-graph variant of the above algorithm has been proposed by Korf (2003). It applies O(sort(|Succ(Open(i-1))|) + scan(|Open(i-1)| + |Open(i-2)|)) I/Os in each iteration. Since no explicit access to the adjacency list is needed (as the state space is generated on-the-fly), by using Σ_i |Succ(Open(i))| = O(|E|) and Σ_i |Open(i)| = O(|V|), the total execution time is bounded by O(sort(|E|) + scan(|V|)) I/Os.

To reconstruct a solution path, we may store predecessor information with each node on disk (thus doubling the state vector size). Starting from the goal node, we recursively search for its predecessor in the previous layer through external scanning. The process continues until the first layer, containing the start node, is reached. Since breadth-first search preserves the shortest paths in a uniformly weighted graph, the
Figure 1. An example graph (left); stages of External Breadth-First Search (right). Each horizontal bar corresponds to a file. The grey-shaded A(2) and A(2) are temporary files.
constructed path is the optimal one. The complexity is bounded by the scanning time of all layers in consideration, i.e., by O(scan(|V|)) I/Os.

External Memory Heuristic Search

Heuristic search algorithms utilize some form of guidance, be it user-provided or automatically inferred from the problem structure, to home in on the goal. In practice, such algorithms are very effective when compared with blind search algorithms like BFS. A* (Hart et al., 1968) is one such algorithm; it prioritizes the nodes based on their actual distance from the start node along with their heuristic estimate to the goal state. Formally, if g(n) represents the path cost for reaching a node n, and h(n) the heuristic estimate of reaching the goal starting from n, then A* orders the nodes based on their f-value, defined as f(n) = g(n) + h(n). For an efficient implementation of A*, a priority queue data structure is required that allows us to remove the node with the minimum f-value for expansion. If the heuristic function h is consistent, then on each search path, no successor will have a smaller f-value than its predecessor. Therefore, A*, traversing the node set in f-order, expands each node at most once.

External A* by Edelkamp et al. (2004) maintains the search frontier on disk as files. Each file corresponds to an external representation of a bucket-based priority queue data structure. A bucket is a set of nodes sharing common properties. In a 1-level bucket implementation of A* by Dial (1969), each bucket addressed with index i contains all nodes n that have priority f(n) = i. A refinement proposed by Jensen et al. (2002) distinguishes between nodes with different g-values, and designates bucket Open(i,j) to all nodes n with path length g(n) = i and heuristic estimate h(n) = j. An external memory representation of this data structure memorizes each bucket in a different file. During the exploration process, only nodes from Open(i,j) with i+j = f are expanded, up to its exhaustion. Buckets are selected in lexicographic order for (i,j). By that time, the buckets Open(i',j') with i' < i and i'+j' = f are closed, whereas the buckets Open(i',j') with i'+j' > f, or with i' > i and i'+j' = f, are open. Depending on the expansion progress, nodes in the active bucket are either open or closed.

It is practical to pre-sort buffers in one bucket immediately by an efficient internal sorting algorithm to ease merging. Duplicates within an active bucket are eliminated by merging all the pre-sorted buffers corresponding to the same bucket, resulting in one sorted file. This file can then be scanned to remove the duplicate nodes from it. In fact, both the merging and the removal of duplicates can be done simultaneously. Another case of duplicate nodes appears when nodes that have already been evaluated in the upper layers are generated again. As in the algorithm of Munagala and Ranade, External A* exploits the observation that, in an undirected problem graph, duplicates of a node with BFS-level i can at most occur in levels i, i-1 and i-2. In addition, since h is a total function, we have h(n) = h(n') if n = n'. These duplicate nodes can be removed by file subtraction for the next active bucket Open(g+1, h-1); we remove any node that has appeared in buckets Open(g, h-1) and Open(g-1, h-1). This file subtraction can be done by a mere parallel scan of the pre-sorted files and by using a temporary file in which the intermediate result is stored. It suffices to perform the duplicate removal only for the bucket that is to be expanded next.

Duplicate Detection Scope

The number of previous layers that is sufficient for full duplicate detection in directed graphs depends on a property of the search graph called locality (Zhou and Hansen, 2006). In the following, we generalize their concept to weighted and directed search graphs. For a problem graph with node set V, discrete cost function c, successor set Succ, initial state s, and δ defined as the minimal cost between two states, the shortest-path locality is defined as L = max{δ(s,n) - δ(s,n') + c(n,n') | n, n' ∈ V, n' ∈ Succ(n)}. In unweighted graphs, we have c(n,n') = 1 for all n, n'. Moreover, δ(s,n) and δ(s,n') differ by at most 1, so that the locality is 2, which is consistent with the observation of Munagala and Ranade.

The locality determines the thickness of the search frontier needed to prevent duplicates from appearing in the search. While the locality is dependent on the graph, the duplicate detection scope also depends on the search algorithm applied. For BFS, the search tree is generated with increasing path lengths (number of edges), while for weighted graphs the search tree is generated with increasing path cost (this corresponds to Dijkstra's algorithm in the one-level bucket priority queue data structure).

In a positively weighted search graph, the number of buckets that need to be retained to prevent duplicate
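The layered duplicate elimination at the core of the external BFS described above can be sketched as follows. This is a toy, in-memory stand-in: sorted Python lists play the role of sorted disk files, and the set-based subtraction of the two previous layers stands in for the parallel external scan of the real algorithm.

```python
def external_bfs(adj, start):
    """Sketch of external-memory BFS (after Munagala & Ranade, 1999).

    Each layer Open(i) would live as a sorted file on disk; here plain
    sorted lists stand in for those files.
    """
    layers = [[], [start]]                  # Open(-1) = {}, Open(0) = {start}
    while layers[-1]:
        prev1, prev2 = layers[-1], layers[-2]
        # 1. Concatenate the adjacency lists of Open(i-1) to form the
        #    multi-set Succ(Open(i-1)), then sort it ("external sort").
        succ = sorted(n for u in prev1 for n in adj[u])
        # 2. One scan over the sorted set removes adjacent duplicates.
        unique = [n for k, n in enumerate(succ) if k == 0 or n != succ[k - 1]]
        # 3. Subtract Open(i-1) and Open(i-2); for undirected graphs
        #    these two layers suffice to eliminate all old nodes.
        old = set(prev1) | set(prev2)
        layers.append([n for n in unique if n not in old])
    return layers[1:-1]                     # the non-empty BFS layers
```

On the 4-cycle `{0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}` with start node 0 this yields the layers `[[0], [1, 2], [3]]`; the real gain, of course, only materializes when the layers no longer fit in RAM.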
Jabbar, S. & Edelkamp, S. (2006). Parallel external directed model checking with linear I/O. Conference on Verification, Model Checking and Abstract Interpretation (VMCAI), LNCS, 3855:237-252.

Jensen, R.M., Bryant, R.E., & Veloso, M.M. (2002). SetA*: An efficient BDD-based heuristic search algorithm. National Conference on Artificial Intelligence (AAAI), 668-673.

Korf, R.E. (2003). Breadth-first frontier search with delayed duplicate detection. Model Checking and Artificial Intelligence (MOCHART), 87-92.

KEY TERMS

Delayed Duplicate Detection: In contrast to hash tables, which eliminate duplicate states on-the-fly during the exploration, the process of duplicate detection can be delayed until a large set of states is available. It is very effective in external search, where it is efficiently achieved by external sorting and scanning.

Graph: A set of nodes connected through edges. The node at the head of an edge is called the target, and the node at the tail the source. A graph can be undirected, i.e., it is always possible to return to the source through the same edge; the converse is a directed graph. If given beforehand in the form of adjacency lists (e.g., a road network), we call it an explicit graph. Implicit graphs (another name for state spaces) are generated on-the-fly from a start node and a set of rules/actions to generate the new states (e.g., a checkers game).

Heuristic Function: A function that assigns to a node an estimated distance to the goal node. For example, in route-planning, the Euclidean distance can be used as a heuristic function. A heuristic function is admissible if it never overestimates the shortest path distance. It is also consistent if it never decreases on any edge by more than the edge weight, i.e., for a node n and its successor n', h(n) - h(n') ≤ c(n,n').

Memory Hierarchy: Modern hardware has a hierarchy of storage mediums: starting from the fast registers, the L1 and L2 caches, moving towards RAM and all the way to the slow hard disks and tapes. The latency timings on the different levels differ considerably, e.g., registers: 2 ns, cache: 20 ns, hard disk: 10 ms, tape: 1 min.

Model Checking: An automated process that, when given a model of a system and a property specification, checks if the property is satisfied by the system or not. Properties requiring that something bad will never happen are referred to as safety properties, while the ones requiring that something good will eventually happen are referred to as liveness properties.

Search Algorithm: An algorithm that, when given two graph nodes, start and goal, returns a sequence of nodes that constitutes a path from start to the goal, if such a sequence exists. A search algorithm generates the successors of a node through an expansion process, after which the node is termed a closed node. The newly generated successors are checked for duplicates, and when found to be unique, are added to the set of open nodes.

Value Iteration: Procedure that computes a policy (mapping from states to actions) for a probabilistic or non-deterministic search problem, most frequently in the form of a Markov Decision Problem (MDP).
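The bucket discipline of External A* described earlier in this article can be made concrete with a compact in-memory sketch. Plain sets stand in for the sorted files Open(g, h), a best-g dictionary stands in for duplicate removal by file subtraction, unit edge costs and a consistent heuristic are assumed, and the function names are illustrative rather than taken from the original paper.

```python
from collections import defaultdict

def external_astar(start, is_goal, succ, h):
    """Sketch of the bucket-based frontier of External A*
    (Edelkamp et al., 2004).  Buckets on the diagonal g + h = f are
    expanded in lexicographic order of (g, h) before f is increased."""
    open_ = defaultdict(set)            # (g, h) -> bucket ("one file each")
    open_[(0, h(start))].add(start)
    best_g = {start: 0}                 # stand-in for file subtraction
    f = h(start)
    while any(open_.values()):
        for g in range(f + 1):          # lexicographic order on (g, h)
            bucket = open_[(g, f - g)]
            while bucket:
                n = bucket.pop()
                if g > best_g.get(n, g):
                    continue            # a cheaper duplicate was kept
                if is_goal(n):
                    return g            # optimal cost for unit edges
                for m in succ(n):
                    if g + 1 < best_g.get(m, float("inf")):
                        best_g[m] = g + 1
                        open_[(g + 1, h(m))].add(m)
        f += 1                          # move to the next f-diagonal
```

With an inconsistent heuristic a successor could fall onto an already-passed diagonal, which this sketch would miss; the consistency assumption from the article is therefore essential here.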
Makoto Yokoo
Kyushu University, Japan
Distributed Constraint Reasoning
such problem can be translated into problems where all non-shared constraints are unary (constraints involving only one variable), also called domain constraints. Here one can assume that there exists a single unary constraint for each variable. This is due to the fact that any second unary constraint can be reformulated on a new variable, required to be equal to the original variable. The agent holding the unique domain constraint of a variable is called the owner of that variable. Due to the availability of this transformation, many solutions focus on the case where only the unary constraints are not shared by everybody (also said to be private to the agents that know them). Another common simplification consists in assuming that each agent has a single unary constraint (i.e., a single variable). This simplification does not reduce the generality of the addressable problems, since an agent can participate in a computation under several names, e.g., one instance for each unary constraint of the original agent. Such false identities for an agent are called pseudo-agents (Modi, Shen, Tambe, & Yokoo, 2005), or abstract agents (Silaghi & Faltings, 2005).

The second way of distributing a problem is in terms of who may propose instantiations of a variable. In such an approach each variable may be assigned a value solely by a subset of the agents, while the other agents are only allowed to reject the proposed assignment. This distribution is similar to restrictions seen in some societies where only the parliament may propose a referendum while the rest of the citizens can only approve or reject it. Approaches often assume the simultaneous presence of both ways of distributing the problem. They commonly assume that the only agent that can make a proposal on a variable is the agent holding the sole unary constraint on that variable, namely its owner (Yokoo, Durfee, Ishida, & Kuwabara, 1998). When several agents are allowed to propose assignments of a variable, these authorized agents are called modifiers of that variable. An example is where each holder of a constraint on a variable is a legitimate modifier of that variable (Silaghi & Faltings, 2005).

BACKGROUND

The first challenge addressed was the development of asynchronous algorithms for solving distributed problems. Synchronization forces distributed processes to run at the speed of the slowest link. Algorithms that do not use synchronizations, namely where participants are at no point aware of the current state of other participants, are flexible but more difficult to design. With the exception of a few solution detection techniques (Yokoo & Hirayama, 2005; Silaghi & Faltings, 2005), most approaches gather the answer to the problem by reading the state of agents after the system becomes idle and reaches the so-called quiescence state (Yokoo et al., 1998). Algorithms that eventually reach quiescence are also called self-stabilizing (Collin, Dechter, & Katz, 1991). A complete algorithm is an algorithm that guarantees not to miss any existing solution. A sound algorithm is a technique that never terminates in a suboptimal state.

Another challenge taken up by distributed constraint reasoning research consists of providing privacy for the sub-problems known by agents (Yokoo et al., 1998). The object of privacy can be of different types. The existence of a constraint between two variables may be secret, as well as the existence of a variable itself. Many approaches only try to ensure the secrecy of the constraints, i.e., the hiding of the identity of the valuations that are penalized by that constraint. For optimization problems one also assumes a need to keep secret the amount of the penalty induced by the constraint. As mentioned previously, it is possible to model such problems in a way where all secret constraints are unary (an approach known as having private domains). Some problems may have both secret and public constraints. Such public constraints may be used for an efficient preprocessing prior to the expensive negotiation implied by secret constraints. Solvers that support guarantees of privacy at any cost employ cryptographic multi-party computations (Yao, 1982). There exist several cryptographic technologies for such computations, and some of them can be used interchangeably by distributed problem solvers. However, some of them offer information-theoretical security guarantees (Shamir, 1979), being resistant to any amount of computation, while others offer only cryptographic security (Cramer, Damgaard, & Nielsen, 2000) and can be broken using large amounts of computation or quantum computers. The result of a computation may reveal secrets itself, and its damage can be reduced by being careful in formulating the query to the solver. For example, less information is lost by requesting the solution to be picked randomly than by requesting the first solution. The computations can be done cryptographically by a group of semi-trusted servers,
or they can be performed by the participants themselves. A third issue in solving distributed problems is raised by the size and the dynamism of the system.

DISTRIBUTED CONSTRAINT REASONING

Framework

A common definition of a distributed constraint optimization problem (DisCOP) (Modi et al., 2005) consists of a set of variables X = {x1, ..., xn} and a set of agents A = {A1, ..., An}, each agent Ai holding a set of constraints. Each variable xi can be assigned only those values which are allowed by a domain constraint Di. A constraint φj on a set of variables Xj is a function associating a positive numerical value with each combination of assignments to the variables Xj. The typical challenge is to find an assignment to the variables in X such that the sum of the values returned by the constraints of the agents is minimized.

A tuple of assignments is also called a partial solution. A restriction often used with DisCOPs requires that each agent Ai holds only constraints between xi and a subset of the previous variables, {x1, ..., xi-1}. Also, for any agent Ai, the agents {A1, ..., Ai-1} are the predecessors of Ai and the agents {Ai+1, ..., An} are its successors.

To understand the generality and limitations of this restriction, consider a conference organization problem with 3 variables x1 (time), x2 (place), and x3 (general chair) and 3 constraints φ12 (between x1 and x2), φ23 (between x2 and x3), and φ13 (between x1 and x3), where Alice has φ12, Bob enforces φ23, and Carol is interested in φ13 (Figure 3).

This problem can be modeled as a DisCOP with 4 agents. Alice uses two agents, A1 and A2. The original participant is called a physical agent and the agents of the model are called pseudo-agents. Bob uses the agent A3 and Carol uses an agent A4. The new variable x4 of the agent A4 is involved in a ternary constraint φ134 with x1 and x3. The constraint φ134 is constructed such that its projection on x1 and x3 is φ13.

However, the restricted framework cannot help general-purpose algorithms to learn and exploit the fact that agents A1 and A2 know each other's constraints. It also requires finding an optimal value for the variable x4, which is irrelevant to the query. To avoid the aforementioned limitations, some approaches remove the restriction on which variables can be involved in the constraints of an agent, and can obtain some improvements in speed
(Silaghi & Faltings, 2005). Other frameworks, typically used with hill-climbing solvers, with solvers that reorder agents, and with arc consistency, assume that each agent Ai knows all the constraints that involve the variable xi. This implies that any constraint between two variables xi and xj is known by both agents Ai and Aj. In general, a problem modeled as a DisCOP where any private constraint may be held by any agent can be converted to its dual representation in order to obtain a model with this framework. When penalties for constraint violation can only take values in {0, ∞}, corresponding to {true, false}, one obtains distributed constraint satisfaction problems.

A protocol is a set of rules about what messages may be exchanged by agents, when they may be sent, and what may be contained in their payload. A distributed algorithm is an implementation of a protocol, as it specifies an exact sequence of operations to be performed as a response to each event, such as the start of computation or the receipt of a message. Autonomous self-interested agents are more realistically expected to implement protocols rather than to strictly adhere to algorithms. Protocols can be theoretically proved correct. However, experimental validation and efficiency evaluation of a protocol is done by assuming that agents strictly follow some algorithm implementing that protocol.

Efficiency Metrics

The simplest metric for evaluating DisCOP solvers uses the time from the beginning of a distributed computation to its end. It is possible only with a real distributed system (or a very realistic simulation). The network load for benchmarks is evaluated by counting the total number of messages exchanged or the total number of bytes exchanged. The total time taken by a simulator yields the efficiency of a DisCOP solver when used as a weighted CSP solver. Another common metric is given by the highest logic clocks (Lamport, 1978) occurring during the computation. Lamport's logic clocks associate a cost with each message and another cost with each local computation. When the cost assigned to each message is 1 and the cost for local computations is 0, the obtained value gives the longest sequential chain of causally ordered messages (Silaghi & Faltings, 2005). When all message latencies are identical, this metric is equivalent to the number of rounds of a simulator where at each round an agent handles all messages received in the previous round (Yokoo et al., 1998). If the cost assigned to each message is 0 and the cost of a constraint check is 1, the obtained value gives the number of non-concurrent constraint checks (NCCC) (Meisels, Kaplansky, Razgon, & Zivan, 2002). When a constraint check is assumed to cost a fraction of a message, then the obtained value gives the equivalent NCCCs (ENCCCs). One can evaluate the actual fraction between message latencies and constraint checks in the operating point (OP) of the target application (Silaghi & Yokoo, 2007). However, many distributed solvers do not check constraints directly but via nogoods, and there is no standardized way of accounting for the handling of the latter.

Techniques

Solving algorithms span the range between full centralization, where all constraints are submitted to a central server that returns a solution, through incremental centralization (Mailler & Lesser, 2004), to very decentralized approaches (Walsh, Yokoo, Hirayama, & Wellman, 2003).

The Depth-First Search (DFS) spanning trees of the constraint graph prove useful for distributed DisCOP solvers. When used as a basis for ordering agents, the assignment of any node of the tree makes its subtrees independent (Collin, Dechter, & Katz, 2000). Such independence increases parallelism and decreases the complexity of the problem. The structure can be exploited in three ways. Subtrees can be explored in parallel for an opportunistic evaluation of the best branch, reminiscent of iterative A* (Modi et al., 2005). Alternatively, a branch-and-bound approach can systematically evaluate different values of the root for each subtree (Chechetka & Sycara, 2006). A third approach uses dynamic programming to evaluate the DFS trees from the leaves towards the root (Petcu & Faltings, 2006).

Asynchronous usage of lookahead techniques based on the maintenance of arc consistency and bound consistency requires handling of interacting data structures corresponding to different concurrent computations. Concurrent consistency achievement processes at different depths in the search tree have to be coordinated, giving priority to computations at low depths in the tree (Silaghi & Faltings, 2005).

The concept at the basis of many asynchronous algorithms is the nogood, namely a self-contained statement about a restriction to the valuations of the
variables, inferred from the problem. A generalized valued nogood has the form [R,c,T] where T specifies a set of partial solutions {N1,...,Nk} for which the set of constraints R specifies a penalty of at least c. A common simplification, called a valued nogood (Dago & Verfaille, 1996), refers to a single partial solution, [R,c,N]. Priority-induced vector clock timestamps called signatures can be used to arbitrate between conflicting assignments concurrently proposed by several modifiers (Silaghi & Faltings, 2005). They can also handle other types of conflicting proposals, such as new orderings.

ADOPT-ing is an illustrative algorithm unifying the basic DisCSP and DisCOP solvers ABT (Yokoo et al., 1998) and ADOPT (Modi et al., 2005). It works by having each agent concurrently choose for its variable the best value given the known assignments of predecessors and the cost estimations received from successors (Silaghi & Yokoo, 2007). Each agent announces its assignments to interested successors using ok? messages. Agents are interested in variables involved in their constraints or nogoods. When a nogood is received, agents announce new interests using add-link messages. A forest of DFS trees is dynamically built. Initially each agent is a tree, having no ancestors. When a constraint is first used, the agent adds its variables to its ancestors list and defines its parent in the DFS tree as the closest ancestor. Ancestors are informed of their own new ancestors. Nogoods inferred by an agent using resolution on its nogoods and constraints are sent to targeted predecessors and to its parent in the DFS tree using nogood messages, to guarantee optimality. Known costs of DFS subtrees for some values can be announced to those subtrees using threshold nogoods attached to ok? messages.

Example

An asynchronous algorithm could solve the problem in Figure 3 using the trace in Figure 4. In the messages of Figure 4, constraints are represented as Boolean values in an array. The ith value in this array set to T signifies that the constraints of Ai are used in the inference of that nogood. The agents start selecting values for their variables and announce them to interested lower-priority agents. The first exchanged messages are ok? messages sent by A1 to both successors A2 and A3, proposing the assignment x1=1. A2 sends an ok? message to A3 proposing x2=2.

A3 detects a conflict with x1, inserts A1 in its DFS tree ancestors list, and sends a nogood with cost 2 to A1 (message 3). A1 answers the received nogood by switching its assignment to a value with a lower current estimated value, x1=2 (message 4). A2 reacts by switching x2 to its lowest-cost value, x2=1 (message 5). A3 detects a conflict with x2 and inserts A2 in its ancestors list, which becomes {A1, A2}. A3 also announces the conflict to A2 using the nogood message 6. This nogood received by A2 is combined with the nogood locally inferred by A2 for its value 2, due to the constraint x1≠x2 (#4). That inference also prompts the insertion of A1 in the ancestors list of A2. The obtained nogood is therefore sent to A1 using message 7. A1 and later A2 switch their assignments to the values with the lowest cost, attaching the latest nogoods received for those values as threshold nogoods (messages 8, 9 and 10). At this moment the system reaches quiescence.

Figure 3. The constraint graph of a DisCOP. The fact that the penalty associated with not satisfying the constraint x1≠x2 is 4 is denoted by the notation (#4).

FUTURE TRENDS

The main remaining challenges with distributed constraint reasoning are related to efficient ways of achieving privacy and to handling very large problems.

CONCLUSION

The distributed constraint reasoning paradigms allow easy specification of new problems. The notion varies largely between almost any two researchers. It can refer to the distribution of subproblems, or it can refer to the distribution of authority in assigning variables. The reasons and goals of the distribution vary as well, where either privacy of constraints, parallelism in computation, or size of data are cited as major concerns. Most algorithms can be easily translated from one framework to the other, but they may not be appropriate for a new goal.
Figure 4. Simplified trace of an asynchronous solver (ADOPT-ing (Silaghi & Yokoo, 2007)) on the problem in
Figure 3
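As a concrete illustration of the DisCOP definition given in the Framework section, the toy sketch below finds the assignment minimizing the sum of constraint penalties by centralized brute force, the baseline that distributed solvers like ADOPT improve upon. The three-variable instance and the penalty values are hypothetical, loosely echoing the soft constraint x1≠x2 with penalty #4 from the example.

```python
from itertools import product

# Hypothetical toy DisCOP: two values per variable; each constraint
# maps a combination of assignments to a numeric penalty.
domains = {"x1": [1, 2], "x2": [1, 2], "x3": [1, 2]}
constraints = [
    (("x1", "x2"), lambda a, b: 4 if a == b else 0),   # soft x1 != x2 (#4)
    (("x2", "x3"), lambda a, b: 2 if a == b else 0),
    (("x1", "x3"), lambda a, b: 2 if a == b else 0),
]

def cost(assignment):
    """Sum of the penalties returned by all constraints."""
    return sum(f(*(assignment[v] for v in scope)) for scope, f in constraints)

def solve(domains):
    """Centralized brute force: minimize the total penalty."""
    names = sorted(domains)
    best = min(product(*(domains[v] for v in names)),
               key=lambda vals: cost(dict(zip(names, vals))))
    return dict(zip(names, best))
```

With only two values for three mutually "different" variables, some pair must collide; the optimum here avoids the expensive x1=x2 collision and pays a penalty of 2. Distributed algorithms reach the same optimum by exchanging assignments and cost bounds (nogoods) instead of gathering all constraints in one place.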
lem: Formalization and Algorithms. In IEEE TKDE, 10(5), 673-685.

Yokoo, M., & Hirayama, K. (2005). The Distributed Breakout Algorithm. Artificial Intelligence Journal, 161(1-2), 229-246.

KEY TERMS

Agent: A participant in a distributed computation, having its own constraints.

Constraint: A relation between variables specifying a subset of their Cartesian product that is not permitted. Optionally it can also specify numeric penalties for those tuples.

DisCOP: Distributed Constraint Optimization Problem framework (also DCOP).

DisCSP: Distributed Constraint Satisfaction Problem framework (also DCSP).

Nogood: A logic statement about combinations of assignments that are penalized due to some constraints.

Optimality: The quality of an algorithm of returning only solutions that are at least as good as any other solution.

Quiescence: The state of being inactive. The system will not change without an external stimulus.
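A valued nogood as described in the article ([R, c, N]: the constraints in R give the partial solution N a penalty of at least c) can be represented minimally as follows. The combination rule shown, summing costs of nogoods whose partial solutions agree and whose source constraint sets are disjoint, is an illustrative simplification of the inference used in ADOPT-ing, not the full algorithm, and all names are our own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ValuedNogood:
    """[R, c, N]: constraints in R give partial solution N a penalty
    of at least c.  A simplified illustration."""
    sources: frozenset      # R: whose constraints were used in the inference
    cost: int               # c: lower bound on the penalty
    assignments: tuple      # N: ((variable, value), ...)

def compatible(n1, n2):
    """Nogoods may combine only if their partial solutions agree on
    every shared variable."""
    a1, a2 = dict(n1.assignments), dict(n2.assignments)
    return all(a2[v] == x for v, x in a1.items() if v in a2)

def combine(n1, n2):
    """Sum-inference sketch: disjoint source sets let penalties add up
    on the merged partial solution."""
    assert compatible(n1, n2) and not (n1.sources & n2.sources)
    return ValuedNogood(n1.sources | n2.sources,
                        n1.cost + n2.cost,
                        tuple(sorted(set(n1.assignments) | set(n2.assignments))))
```

The source set plays the role of the Boolean array in the trace of Figure 4: it records which agents' constraints justify the bound, so the same constraint is never counted twice.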
INTRODUCTION BACKGROUND
AI models are often categorized in terms of the con- The invention of the backpropagation algorithm (Ru-
nectionist vs. symbolic distinction. In addition to being melhart, Hinton, & Williams 1986) led to a flurry of
descriptively unhelpful, these terms are also typically research in which neurally inspired models were ap-
conflated with a host of issues that may have nothing plied to tasks for which the use of traditional AI data
to do with the commitments entailed by a particular structures and algorithms were commonly assumed to
model. A more useful distinction among cognitive rep- be the only viable approach. A compelling feature of
resentations asks whether they are local or distributed these new models was that they could discover the
(van Gelder 1999). representations best suited to the modelling domain,
Traditional symbol systems (grammar, predicate unlike the manmade representations used in traditional
calculus) use local representations: a given symbol has AI. These discovered or learned representations were
no internal content and is located at a particular address typically vectors of numbers in a fixed interval like [0,
in memory. Although well understood and successful in 1], representing the values of the hidden variables. A
a number of domains, traditional representations suffer statistical technique like principal component analysis
from brittleness. The number of possible items to be could be applied to such representations, revealing inter-
represented is fixed at some arbitrary hard limit, and a single corrupt memory location or broken pointer can wreck an entire structure.

In a distributed representation, on the other hand, each entity is represented by a pattern of activity distributed over many computing elements, and each computing element is involved in representing many different entities (Hinton 1984). Such representations have a number of properties that make them attractive for knowledge representation (McClelland, Rumelhart, & Hinton 1986): they are robust to noise, degrade gracefully, and support graded comparison through distance metrics. These properties enable fast associative memory and efficient comparison of entire structures without unpacking the structures into their component parts.

This article provides an overview of distributed representations, setting the approach in its historical context. The two essential operations necessary for building distributed representations of structures, binding and bundling, are described. We present example applications of each model, and conclude by discussing the current state of the art.

esting regularities in the training data (Elman 1990). Issues concerning the nature of the representations learned by backpropagation led to criticisms of this work. The most serious of these held that neural networks could not arrive at or exploit systematic, compositional representations of the sort used in traditional cognitive science and AI (Fodor & Pylyshyn 1988). A minimum requirement noted by critics was that a model that could represent e.g. the idea John loves Mary should also be able to represent Mary loves John (systematicity) and to represent John, Mary, and loves individually in the same way in both (compositionality). Critics claimed that neural networks are in principle unable to meet this requirement.

Systematicity and compositionality can be thought of as the outcome of two essential operations: binding and bundling. Binding associates fillers (John, Mary) with roles (lover, beloved). Bundling combines role/filler bindings to produce larger structures. Crucially, representations produced by binding and bundling must support an operation to recover the fillers of roles: it must be possible to ask "Who did what to whom?" questions and get the right answer. Starting
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Distributed Representation of Compositional Structure
around 1990, several researchers began to focus their attention on building models that could perform these operations reliably.

VARIETIES OF DISTRIBUTED REPRESENTATION

This article describes the various approaches found in the recent neural network literature to implementing the binding and bundling operations. Although several different models have been developed, they fall into one of two broad categories, based on the way that roles are represented and how binding and bundling are performed.

Recursive Auto-Associative Memory

In Recursive Auto-Associative Memory, or RAAM (Pollack 1990), fillers are represented as relatively small vectors (N = 10-50 elements) of zeros and ones. Roles are represented as N×N matrices of real values, and role/filler binding as the vector/matrix product. Bundling is performed by element-wise addition of the resulting vectors. There are typically two or three role matrices, representing general role categories like agent and patient, plus another N×N matrix to represent the predicate (loves, sees, knows, etc.). Because all vectors are the same size N, vectors containing bindings can be used as fillers, supporting structures of potentially unlimited complexity (Bill knows Fred said John loves Mary.) The goal is to learn a set of matrix values (weights) to encode a set of such structures.

In order to recover the fillers, a corresponding set of matrices must be trained to decode the vectors produced by the encoder matrices. Together, the encoder and decoder matrices form an autoassociator network (Ackley, Hinton, & Sejnowski 1985) that can be trained with backpropagation. The only additional constraint needed for backprop is that the vector/matrix products be passed through a limiting function, like the sigmoidal squashing function f(x) = 1 / (1 + e^(-x)), whose output falls in the interval (0,1). Figure 1 shows an example of autoassociative learning for a simple hypothetical structure, using three roles, with N = 4. The same network is shown at different stages of training (sub-tree and full tree) during a single backprop epoch. Note that the network devises its own compositional representations on its intermediate (hidden) layer, based on arbitrary binary vectors chosen by the experimenter. Unlike these binary vectors (black and white units), the intermediate representations can have values between zero and one (greyscale).

Once the RAAM network has learned a set of structures, the decoder sub-network should be able to recursively unpack each learned representation into its constituent elements. As shown in Figure 2, decoding is a recursive process that terminates when the decoder's output is similar enough to a binary string and continues otherwise. In the original RAAM formulation, "similar enough" was determined by thresholds: if a unit's value was above 0.8, it was considered to be on, and if it was below 0.2 it was considered to be off.

RAAM answered the challenge of showing how neural networks could represent compositional structures in a systematic way. The representations discovered by RAAM could be compared directly via distance metrics,
Figure 1. Learning the structure (knows bill (loves john mary)) with RAAM
and transformed in a rule-like way, without having to recursively decompose the structure elements as in traditional localist models (Chalmers 1990). Nevertheless, the model failed to scale up reliably to data sets of more than a few dozen different structures. This limitation arose from the termination test, which created a variety of halting problem: decoding often terminated too soon or continued indefinitely. In addition, encodings of novel structures were typically decoded to already-learned structures, a failure in generalization.

A number of solutions were developed to deal with this problem. One solution (Levy and Pollack 2001) built on the insight that the RAAM decoder is essentially an iterated function system, or IFS (Barnsley 1993). Use of the sigmoidal squashing function ensures that this IFS has an attractor, which is the infinite set (Cantor dust) of vectors reachable on an infinite number of feedback iterations from any initial vector input to the decoder. A more natural termination test is to check whether the output is a member of the set of vectors that make up this attractor. Fixing the numerical precision of the decoder results in a finite number of representable vectors, and a finite time to reach the attractor, so that membership in the attractor can be determined efficiently.

This approach to the termination test produced a RAAM decoder that could store a provably infinite number of related structures (Melnik, Levy, & Pollack 2001). Because the RAAM network was no longer an autoassociator, however, it was not clear what sort of algorithm could replace backpropagation for learning a specific, finite set of structures.

The other solution to the RAAM scaling problem discarded the nonlinear sigmoidal squashing function and replaced backprop with principal components analysis (PCA) as a means of learning internal representations (Callan 1996). This approach yielded the ability to learn a much larger set of structures in many fewer iterations (Voegtlin & Dominey 2005), and showed generalization similar in some respects to what has been observed for children acquiring a first language (Tomasello 1992).

Vector Symbolic Architectures

Vector Symbolic Architectures is a term coined by Gayler (2003) for a general class of distributed representation models that implement binding and bundling directly, without an iterative learning algorithm of the sort used by RAAM. These models can trace their origin to the Tensor Product model of Smolensky (1990). Tensor-product models represent both fillers and roles as vectors of binary or real-valued numbers. Binding is implemented by taking the tensor (outer) product of a role vector and a filler vector, resulting in a mathematical object (matrix) having one more dimension than the filler. Given vectors of sufficient length, each tensor product will be unique. As with RAAM, bundling can then be implemented as element-wise addition (Figure 3), and bundled structures can be used as roles, opening the door to recursion. To recover a filler (role) from a bundled tensor product representation, the product is simply divided by the role (filler) vector.

Because the dimension of the tensor product increases with each binding operation, this method suffers from the well-known curse of dimensionality (Bellman 1961). As more recursive embedding is performed, the size of the representation grows exponentially. The solution is to collapse the N×N role/filler matrix back into a length-N vector. As shown
in Figure 4, there are two ways of doing this. In Binary Spatter Coding, or BSC (Kanerva 1994), only the elements along the main diagonal are kept, and the rest are discarded. If bit vectors are used, this operation is the same as taking the exclusive or (XOR) of the two vectors. In Holographic Reduced Representations, or HRR (Plate 1991), the sum of each diagonal is taken, with wraparound (circular convolution) keeping the length of all diagonals equal. Both approaches use very large (N > 1000 elements) vectors of random values drawn from a fixed set or interval.

Despite the size of the vectors, VSA approaches are computationally efficient, requiring no costly backpropagation or other iterative algorithm, and can be done in parallel. Even in a serial implementation, the BSC approach is O(N) for a vector of length N, and the HRR approach can be implemented using the Fast Fourier Transform, which is O(N log N). The price paid is that most of the crucial operations (circular convolution, vector addition) are a form of lossy compression that introduces noise into the representations. The introduction of noise requires that the unbinding process employ a cleanup memory to restore the fillers to their original form. The cleanup memory can be implemented using Hebbian auto-association, like a Hopfield Network (Hopfield 1982) or Brain-State-in-a-Box model (Anderson, Silverstein, Ritz, & Jones 1977). In such models the original fillers are attractor basins in the network's dynamical state space. These methods can be simulated by using a table that stores the original vectors and returns the one closest to the noisy version.
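As an illustration of the HRR operations just described (binding by circular convolution, approximate unbinding by circular correlation, and a cleanup memory simulated as a table of stored vectors), the following is a minimal pure-Python sketch, not code from the literature. It uses the naive O(N²) convolution for clarity; the vector names and size are assumptions of the example.

```python
import random

random.seed(0)
N = 512  # real HRRs use N > 1000; smaller here to keep the naive loops fast

def rand_vec():
    # Elements drawn i.i.d. from N(0, 1/N), the usual choice for HRR vectors
    return [random.gauss(0.0, (1.0 / N) ** 0.5) for _ in range(N)]

def cconv(a, b):
    """Binding: circular convolution, c[i] = sum_j a[j] * b[(i - j) mod N]."""
    return [sum(a[j] * b[(i - j) % N] for j in range(N)) for i in range(N)]

def ccorr(a, b):
    """Approximate unbinding: circular correlation,
    c[i] = sum_j a[j] * b[(i + j) mod N]."""
    return [sum(a[j] * b[(i + j) % N] for j in range(N)) for i in range(N)]

def bundle(u, v):
    """Bundling: element-wise addition."""
    return [x + y for x, y in zip(u, v)]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sum(x * x for x in u) ** 0.5 * sum(x * x for x in v) ** 0.5)

def cleanup(noisy, memory):
    """Cleanup memory simulated as a table: return the name of the stored
    vector closest to the noisy unbinding result."""
    return max(memory, key=lambda name: cosine(noisy, memory[name]))

# Encode "John loves Mary" as lover (*) john + beloved (*) mary
vocab = {name: rand_vec() for name in ("lover", "beloved", "john", "mary")}
trace = bundle(cconv(vocab["lover"], vocab["john"]),
               cconv(vocab["beloved"], vocab["mary"]))

# Who is the lover?  Correlation yields a noisy copy of the bound filler,
# which the cleanup memory restores to "john".
who = cleanup(ccorr(vocab["lover"], trace),
              {"john": vocab["john"], "mary": vocab["mary"]})
```

Replacing the naive loops with an FFT-based convolution (e.g. the inverse FFT of the product of the two FFTs) gives the O(N log N) version mentioned above.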
Both the RAAM and VSA approaches have provided a basis for a significant amount of research in AI and cognitive science. The invention of the simplified linear RAAM, trained by a relatively fast iterative method using principal components (Voegtlin & Dominey 2005), makes RAAM a practical approach to representing structure in distributed representation. Recent work in modelling language acquisition (Dominey, Hoen, & Inui 2006) hints at the possibility that RAAM may continue to thrive in its linear incarnation.

On the other hand, the simplicity of the VSA approach, using vector arithmetic operations available as primitives in many programming environments, has made it increasingly popular for modelling a variety of cognitive tasks, especially those requiring the integration of different types of information. For example, the DRAMA analogy model of Eliasmith & Thagard (2001) shows how HRR can be used to integrate semantic and structural information in a way that closely parallels results seen in analogy experiments with human subjects. HRR has also been successfully applied to representing word order and meaning in a unified way, based on exposure to language data (Jones & Mewhort 2007).

CONCLUSION

This article has presented two popular methods for representing structured information in a distributed way: Recursive Auto-Associative Memory (RAAM) and Vector Symbolic Architectures (VSA). The main difference between the two approaches lies in the way that representations are learned: RAAM uses an iterative algorithm to converge on a single stable representation for a set of structures, whereas VSA assembles structures in a one-shot learning stage, but the representation of each such structure must be stored separately. While no single approach is likely to solve all or even most of the challenges of AI, there is little doubt that the advantages provided by distributed representations (robustness to noise, fast comparison without decomposition, and other psychologically realistic features) will continue to make them attractive to anyone interested in how the mind works.

REFERENCES

Ackley, D. H., Hinton, G., & Sejnowski, T. (1985). A Learning Algorithm for Boltzmann Machines. Cognitive Science 9, 147-169.

Anderson, J., Silverstein, J., Ritz, S., & Jones, R. (1977). Distinctive Features, Categorical Perception, and Probability Learning: Some Applications of a Neural Model. Psychological Review 84(5), 413-451.

Barnsley, M. F. (1993). Fractals Everywhere. New York: Academic Press.

Bellman, R. (1961). Adaptive Control Processes. Princeton University Press.

Callan, R. E. (1996). Netting the Symbol: Analytically Deriving Recursive Connectionist Representations of Symbol Structures. Ph.D. Dissertation, Systems Engineering Faculty, Southampton Institute, England.

Chalmers, D. (1990). Syntactic Transformations on Distributed Representations. Connection Science 2, 53-62.

Dominey, P. F., Hoen, M., & Inui, T. (2006). A Neurolinguistic Model of Grammatical Construction Processing. Journal of Cognitive Neuroscience 18, 2088-2107.

Eliasmith, C. & Thagard, P. (2001). Integrating Structure and Meaning: A Distributed Model of Analogical Mapping. Cognitive Science 25(2), 245-286.

Elman, J. (1990). Finding Structure in Time. Cognitive Science 14, 179-211.

Fodor, J. & Pylyshyn, Z. (1988). Connectionism and Cognitive Architecture: A Critical Analysis. Cognition 28, 3-71.

Gayler, R. (2003). Vector Symbolic Architectures Answer Jackendoff's Challenges for Cognitive Neuroscience. In Slezak, P., ed.: ICCS/ASCS International Conference on Cognitive Science. CogPrints, Sydney, Australia, University of New South Wales, 133-138.

Hinton, G. (1984). Distributed Representations. Technical Report CMU-CS-84-157, Computer Science Department, Carnegie Mellon University.

Hopfield, J. (1982). Neural Networks and Physical Systems with Emergent Collective Computational
Abilities. Proceedings of the National Academy of Sciences 79, 2554-2558.

Jones, M. N. & Mewhort, D. J. K. (2007). Representing Word Meaning and Order Information in a Composite Holographic Lexicon. Psychological Review 114(1), 1-37.

Kanerva, P. (1994). The Binary Spatter Code for Encoding Concepts at Many Levels. In Marinaro, M. & Morasso, P., eds.: ICANN '94: Proceedings of the International Conference on Artificial Neural Networks, Volume 1. London: Springer-Verlag, 226-229.

Levy, S. & Pollack, J. (2001). Infinite RAAM: A Principled Connectionist Substrate for Cognitive Modeling. Proceedings of ICCM2001. Lawrence Erlbaum Associates.

McClelland, J., Rumelhart, D., & Hinton, G. (1986). The Appeal of Parallel Distributed Processing. In Rumelhart, D., McClelland, J., eds.: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1. MIT Press, 3-44.

Melnik, O., Levy, S. & Pollack, J. B. (2000). RAAM for Infinite Context-Free Languages. Proceedings of IJCNN 2000, IEEE Press.

Plate, T. (1991). Holographic Reduced Representations. Technical Report CRG-TR-91-1, Department of Computer Science, University of Toronto.

Rumelhart, D. E., Hinton, G., & Williams, R. J. (1986). Learning Internal Representations by Error Propagation. In Rumelhart, D., McClelland, J., eds.: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1. MIT Press, 318-362.

Pollack, J. (1990). Recursive Distributed Representations. Artificial Intelligence 36, 77-105.

Smolensky, P. (1990). Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems. Artificial Intelligence 46, 159-216.

Tomasello, M. (1992). First Verbs: A Case Study in Early Grammatical Development. Cambridge University Press.

van Gelder, T. (1999). Distributed Versus Local Representation. In: The MIT Encyclopedia of Cognitive Sciences. MIT Press, 236-238.

Voegtlin, T. & Dominey, P. F. (2005). Linear Recursive Distributed Representations. Neural Networks 18, 878-895.

KEY TERMS

Binary Spatter Codes (BSC): VSA using bit vectors and element-wise exclusive-or (XOR) or multiplication for role/filler binding.

Binding: In its most general sense, a term used to describe the association of values with variables. In AI and cognitive science, the variables are usually a closed set of roles (AGENT, PATIENT, INSTRUMENT) and the values an open set of fillers (entities).

Bundling: VSA operation for combining several items into a single item, through vector addition.

Cleanup Memory: Mechanism required to compensate for noise introduced by lossy compression in VSA.

Distributed Representation: A general method of representing and storing information in which the representation of each item is spread across the entire memory, each memory element simultaneously stores the components of more than one item, and items are retrieved by their content rather than their address.

Holographic Reduced Representation (HRR): The most popular variety of VSA; uses circular convolution to bind fillers to roles and circular correlation to recover the fillers or roles from the bindings.

Recursive Auto-Associative Memory (RAAM): Neural network architecture that uses vector/matrix multiplication for binding and iterative learning to encode structures.

Tensor Products: Early form of VSA that uses the outer (tensor) product as the binding operation, thereby increasing the dimensionality of the representations without bound.

Vector Symbolic Architecture (VSA): General term for representations that use large vectors of random numbers for roles and fillers, and fast, lossy compression operations to bind fillers to roles.
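To make the BSC terms above concrete, here is a small illustrative sketch (not code from the article): binding is element-wise XOR, bundling is a bit-wise majority vote, and a cleanup table recovers the nearest stored filler by Hamming distance. The vector names and the odd-item tie-breaker are assumptions of the example.

```python
import random

random.seed(1)
N = 1000  # BSC uses large random bit vectors

def rand_bits():
    return [random.randint(0, 1) for _ in range(N)]

def bind(a, b):
    """BSC binding: element-wise XOR.  XOR is its own inverse, so the
    same operation also unbinds."""
    return [x ^ y for x, y in zip(a, b)]

def bundle(vectors):
    """Bundling: bit-wise majority vote over the bundled bindings."""
    return [1 if 2 * sum(bits) > len(vectors) else 0 for bits in zip(*vectors)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def cleanup(noisy, memory):
    """Cleanup memory as a table: nearest stored vector by Hamming distance."""
    return min(memory, key=lambda name: hamming(noisy, memory[name]))

vocab = {name: rand_bits() for name in ("lover", "beloved", "john", "mary")}

# Majority voting needs an odd number of items; a third random vector
# breaks ties (a common trick, assumed here for illustration).
trace = bundle([bind(vocab["lover"], vocab["john"]),
                bind(vocab["beloved"], vocab["mary"]),
                rand_bits()])

# XOR the role back out, then clean up the noisy filler.
who = cleanup(bind(vocab["lover"], trace),
              {"john": vocab["john"], "mary": vocab["mary"]})
```

The unbound result agrees with the correct filler on roughly three quarters of its bits and with an unrelated vector on about half, so the cleanup step recovers the filler reliably.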
Olivier Lezoray
University of Caen Basse-Normandie, France
Christophe Charrier
University of Caen Basse-Normandie, France
Hubert Cardot
University François-Rabelais of Tours, France
EA Multi-Model Selection for SVM
order to have a global low multi-class error rate with the combination of all binary decision functions involved in. The search for efficient values of hyper-parameters is commonly designated by the term model selection. The classical way to achieve optimization of multi-class schemes is an individual model selection for each related binary sub-problem. This methodology presumes that a multi-class scheme based on SVM combination is optimal when each binary classifier involved in that combination scheme is optimal on the dedicated binary problem. But, if it is supposed that a decoding strategy can more or less easily correct binary classifiers' errors, then individual binary model selection on each binary sub-problem cannot take into account error correcting possibilities. For this main reason, we think that another way to achieve optimization of multi-class schemes is a global multi-model selection for all binary problems together. In fact, the goal is to have a minimum of errors on a multi-class problem. The selection of all sub-problem models (multi-model selection) has to be globally performed to achieve that goal, even if that means that error rates are not optimal on all binary sub-problems when they are observed individually. EA is an efficient meta-heuristic approach to realize that multi-model selection.

EA MULTI-MODEL SELECTION

This section is decomposed into 3 subsections. In the first subsection, the multi-model optimization problem for multi-class combination schemes is exposed. More details than in the previous section and useful notations for the next subsections are introduced. In the second subsection, our EA multi-model selection is exposed. Details on the fitness estimation of multi-models and the crossover operator over them are described. In the third subsection, the experimental protocol and results with our EA multi-model selection are provided.

Multi-Model Optimization Problem

A multi-class combination scheme induces several binary sub-problems. The number k and the nature of the binary sub-problems depend on the decomposition involved in the combination scheme. For each binary sub-problem, a SVM must be trained to produce an appropriate binary decision function hi (1 ≤ i ≤ k). The quality of hi is greatly dependent on the selected model θi and is characterized by the expected error rate ei for new datasets with the same binary decomposition. Each model θi contains all hyper-parameter values for training a SVM on the dedicated binary sub-problem. The expected error rate ei associated to a model θi is commonly determined by cross-validation techniques. All the θi models constitute the multi-model θ = (θ1, ..., θk). The expected error rate e of a SVM multi-class combination scheme is directly dependent on the selected multi-model θ. Let Θ denote the multi-model space for a multi-class problem (i.e. θ ∈ Θ) and Θi the model space for the ith binary sub-problem. The best multi-model θ* is the one for which the expected error e is minimum and corresponds to the following optimization problem:

θ* = arg min_{θ ∈ Θ} e(θ)   (1)

where e(θ) denotes the expected error e of a multi-class combination scheme with the multi-model θ. The huge size of the multi-model space (Θ = ∏_{i ∈ [1,k]} Θi) makes the optimization problem (1) very hard. To reduce the optimization problem complexity, it is classic to use the following approximation:

θ̂* = {arg min_{θi ∈ Θi} ei(θi) | i ∈ [1, k]}   (2)

The hypothesis is made that

e(θ̂*) ≈ e(θ*).

This hypothesis also supposes that

ei(θ̂i) ≈ ei(θi*).

If it is evident that each individual model θi in the best multi-model θ* must correspond to an efficient SVM (i.e. a low value of ei) on the corresponding ith binary sub-problem, all best individual models (θ1*, ..., θk*) do not necessarily define the best multi-model θ*. The first reason is that all error rates ei are estimated with some tolerance and the combination of all these deviations can have a great impact on the final multi-class error rate e. The second reason is that even if all the binary classifiers of a combination scheme have identical ei error rates for different multi-models, these binary classifiers can have different binary class predictions for a same
example according to the used multi-model. Indeed, multi-class predictions obtained by combining these binary classifiers could be different for a same feature vector example, since the correction involved in a given decoding strategy depends on the nature of the internal errors of the binary classifiers (mainly, the number of errors). Then, multi-class classification schemes with the same internal errors ei, but different multi-models θ, can have different capacities of generalization. For all these reasons, we claim that solving the multi-model optimization problem (1) can outperform individual model optimization (2).

Evolutionary Optimization Method

Within our EA multi-model selection method, a fitness measure f is associated to a multi-model θ which is all the larger as the error e associated to θ is small; this enables solving the optimization problem (1). The fitness value is normalized in order to have f = 1 when the error e is zero and f = 0 when the error e corresponds to a random draw. Moreover, the number of examples in each class is not always well balanced for many multi-class datasets; to overcome this, the error e corresponds to a Balanced Error Rate (BER). As regards these two key points, the proposed fitness formulation is:

f = 1 − (1 / (1 − 1/nc)) e   (3)

with nc denoting the number of classes in a multi-class problem. In the same way, the internal fitness fi is defined as fi = 1 − 2ei for the ith binary classifier with corresponding BER ei.

The EA crossover operator for the combination of two multi-models θ1 and θ2 must favor the selection of the most efficient models in these two multi-models. It is worth noting that one should not systematically select all the best models to produce an efficient child multi-model, as explained in the previous sub-section. For each sub-problem, the internal fitnesses fi1 and fi2 are used to determine the probability

pi = (fi1)² / ((fi1)² + (fi2)²)   (4)

to select the ith model in θ1 as the ith model in the child multi-model θ. fij denotes the internal fitness of the ith binary classifier with the multi-model θj. For the child multi-models generated by the crossover operator, an important advantage is that no new SVM training is necessary if all the related binary classifiers were already trained. In contrast, only the BER error rates of all child multi-models have to be evaluated. SVM training is only necessary for the first step of the EA and when models go through a mutation operator.

The rest of our EA for multi-model selection is similar to other EA approaches. First, at the initialization step, a population of λ multi-models is generated at random. Each model θij (1 ≤ i ≤ λ, 1 ≤ j ≤ k) corresponds to a uniform random draw within all possible values of SVM hyper-parameters. New multi-models are produced by combination of multi-model couples selected by a Stochastic Universal Sampling (SUS) strategy. A fixed selective pressure ps is used for the SUS selection. Each model θij has a probability of pm/k to mutate (uniform random draw, as for the initialization step of the EA). The fitness f of all child multi-models is then evaluated. A second selection step is used to define the population of the next iteration of our EA: λ individuals are selected by a SUS strategy (the same selective pressure ps is used) from both the parents and the children. These become the multi-model population of the next iteration. The number of iterations of the EA is fixed to nmax. At the end of the EA, the multi-model with the best fitness f over all these iterations is selected as θ*.

Experimental Results

In this section, three well known multi-class datasets are used: Satimage (nc = 6) and Letter (nc = 26) from the Statlog collection (Blake & Merz, 1998), and the USPS (nc = 10) dataset (Vapnik, 1998). In (Wu, Lin & Weng, 2004), two sampling sizes of 300/500 and 800/1000 are used to constitute training/testing datasets. For each sampling size, 20 random splits are generated. We have used the same sampling sizes and the same splits for the 3 datasets: Satimage, Letter and USPS. Two optimization methods are used for the selection of the best multi-model θ* for each training dataset. The first one is the classical individual model selection and the second one is our EA multi-model selection. For both methods, two combination schemes are used:
Table 1. Average BER with individual model selection (column classic) and our EA multi-model selection (column EA). Negative values in column Δ (Δ = EA − classic) correspond to an improvement of the performance of a multi-class combination scheme when our EA multi-model selection method is used.
one-versus-one and one-versus-all (Rifkin & Klautau, 2004)¹. For each binary problem, a SVM with Gaussian kernel K(u,v) = exp(−γ||u − v||²) is trained (Vapnik, 1998). The possible values of the SVM hyper-parameters for a model are the trade-off SVM constant C (Vapnik, 1998) and the bandwidth γ of the Gaussian kernel function (θi ≡ (Ci, γi)). For all binary problems: ∀i, Θi = [2^−5, 2^−3, ..., 2^15] × [2^−5, 2^−3, ..., 2^15]. The search in the individual model space Θi is based on grid search techniques (Chang & Lin, 2001). The BER e on a multi-class problem and the BERs ei on binary sub-problems are estimated by five-fold cross-validation (CV). These BER values are used by our EA for the multi-model selection. The final BER e of a multi-model selected by our EA is estimated on a test dataset not used during the multi-model selection process. Our EA has several constants that must be fixed and we have made the following choices: ps = 2, λ = 50, nmax = 100, pm = 0.01.

Table 1 gives the average BER over all 20 split sets of the previously mentioned datasets for each training set size (row size of Table 1). This is done for the two combination schemes (one-versus-one and one-versus-all), and for the two above mentioned selection methods (columns classic and EA). Column Δ provides the average variation of BER between our multi-model selection and the classical one. The results of that column are particularly important. For two datasets (USPS and Letter) our EA optimization method produces SVM combination schemes with better generalization capacities than the classical one. That effect appears to be more marked when the number of classes in the multi-class problem increases. A reason is that the multi-model search space size increases exponentially with the number k of binary problems involved in a combination scheme (121^k for those experiments). This effect is directly linked to the number of classes nc and could explain why improvements are not measurable with the Satimage dataset. In some way, a classical optimization method explores the multi-model space in blind mode, because the cumulative effect of the combination of k SVM decision functions cannot be determined without estimation of e. That effect is emphasized when the estimated BERs ei are poor (i.e. when training and testing data sizes are low). The comparison of Δ values when the training/testing dataset size changes in Table 1 illustrates this.

FUTURE TRENDS

The proposed EA multi-model selection method has to be tested with other combination schemes (Rifkin & Klautau, 2004), like error-correcting output codes, in order to measure their influence. The effect with other datasets, which have a great range in number of classes,
must also be tested. Adding feature selection (Fröhlich, Chapelle & Schölkopf, 2004) abilities to our EA multi-model selection is also of importance.

Another key point to take into account is the reduction of the learning time of our EA method, which is currently expensive. One way to explore this is to use a fast CV error estimation technique (Lebrun, Charrier, Lezoray & Cardot, 2006) for the estimation of BER.

CONCLUSION

In this paper, a new EA multi-model selection method is proposed to optimize the generalization capacities of SVM combination schemes. The definition of a cross-over operator based on the internal fitness of the SVM on each binary problem is the core of our EA method. Experimental results show that our method increases the generalization capacities of one-versus-one and one-versus-all combination schemes when compared with the individual model selection method.

REFERENCES

Beasley, D. (1997). Possible applications of evolutionary computation. Handbook of Evolutionary Computation. 97/1, A1.2. IOP Publishing Ltd. and Oxford University Press.

Blake, C., & Merz, C. (1998). UCI repository of machine learning databases. Advances in Kernel Methods, Support Vector Learning. University of California, Irvine, Dept. of Information and Computer Sciences.

Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Dietterich, T. G., & Bakiri, G. (1995). Solving Multiclass Learning Problems via Error-Correcting Output Codes. Journal of AI Research. (2) 263-286.

Duan, K.-B., & Keerthi, S. S. (2005). Which Is the Best Multiclass SVM Method? An Empirical Study. Multiple Classifier Systems. 278-285.

Fröhlich, F., Chapelle, O., & Schölkopf, B. (2004). Feature Selection for Support Vector Machines Using Genetic Algorithms. International Journal on Artificial Intelligence Tools. 13(4) 791-800.

Lebrun, G., Charrier, C., Lezoray, O., & Cardot, H. (2005). Fast Pixel Classification by SVM Using Vector Quantization, Tabu Search and Hybrid Color Space. Computer Analysis of Images and Patterns. (LNCS, Vol. 3691) 685-692.

Lebrun, G., Charrier, C., Lezoray, O., & Cardot, H. (2006). Speed-up LOO CV with SVM classifier. Intelligent Data Engineering and Automated Learning. (LNCS, Vol. 4224) 108-115.

Lebrun, G., Charrier, C., Lezoray, O., & Cardot, H. (2007). An EA multi-model selection for SVM multiclass schemes. Computational and Ambient Intelligence. 260-267 (LNCS, Vol. 4507).

Moreira, M., & Mayoraz, E. (1998). Improved Pairwise Coupling Classification with Correcting Classifiers. European Conference on Machine Learning. 160-171.

Price, D., Knerr, S., Personnaz, L., & Dreyfus, G. (1994). Pairwise Neural Network Classifiers with Probabilistic Outputs. Neural Information Processing Systems. 1109-1116.

Quost, B., Denoeux, T., & Masson, M. (2006). One-against-all classifier combination in the framework of belief functions. Information Processing and Management of Uncertainty in Knowledge-Based Systems. (1) 356-363.

Rechenberg, I. (1965). Cybernetic Solution Path of an Experimental Problem. Royal Aircraft Establishment Library Translation.

Rifkin, R., & Klautau, A. (2004). In Defense of One-Vs-All Classification. Journal of Machine Learning Research. (5) 101-141.

Vapnik, V. N. (1998). Statistical Learning Theory. Wiley Edition.

Wu, T.-F., Lin, C.-J., & Weng, R. C. (2004). Probability Estimates for Multi-class Classification by Pairwise Coupling. Journal of Machine Learning Research. (5) 975-1005.

KEY TERMS

Cross-Validation: A method of estimating the predictive error of inducers. The cross-validation procedure splits
524
EA Multi-Model Selection for SVM
that dataset into k equal-sized pieces called folds. k Support Vector Machine (SVM): SVM maps input
predictive function are built, each tested on a distinct data in a higher dimensional feature space by using a E
fold after being trained on the remaining folds. non linear function and finds in that feature space the
optimal separating hyperplane maximizing the margin
Evolutionary Algorithm (EA): Meta-heuristic
(that is the distance between the hyperplane and the
optimization approach inspired by natural evolution,
closest points of the training set) and minimizing the
which begins with potential solution models, then it-
number of misclassified patterns.
eratively applies algorithms to find the fittest models
from the set to serve as inputs to the next iteration, Trade-Off Constant of SVM: The trade-off con-
ultimately leading to a sub-optimal solution which is stant, noted C, permit to fix the importance to increase
close to the optimal one. the margin for the selection of optimal hyper-plan
in comparison with reducing predictive errors (i.e.
Model Selection: Model Selection for Support
examples which not respect margin distance from
Vector Machines concerns the tuning of SVM hyper-
hyper-plan separator).
parameters as C trade-off constant and the kernel
parameters.
Multi-Class Combination Scheme: A combination
ENDNOTE
of several binary classifiers to solve a given multiclas
problem. 1
More details on used combinations schemes are
Search Space: Set of all possible situations of the given in (Lebrun, Lezoray, Charrier & Cardot,
problem that we want to solve could ever be in. 2007).
525
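The k-fold procedure in the Cross-Validation entry can be sketched in a few lines. The toy nearest-class-mean inducer and the synthetic data below are illustrative assumptions, not part of the article.

```python
# Minimal k-fold cross-validation sketch (illustrative only; the
# nearest-class-mean "inducer" and toy data are assumptions).

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k near-equal folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validation_error(xs, ys, k, train, predict):
    """Average error over k folds: train on k-1 folds, test on the held-out one."""
    folds = k_fold_indices(len(xs), k)
    errors = 0
    for fold in folds:
        test = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in test]
        tr_y = [y for i, y in enumerate(ys) if i not in test]
        model = train(tr_x, tr_y)
        errors += sum(1 for i in fold if predict(model, xs[i]) != ys[i])
    return errors / len(xs)

# Toy inducer: classify a 1-D value by the nearest class mean.
def train_nearest_mean(xs, ys):
    means = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(vals) / len(vals)
    return means

def predict_nearest_mean(means, x):
    return min(means, key=lambda c: abs(x - means[c]))

if __name__ == "__main__":
    xs = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
    ys = [0, 0, 0, 1, 1, 1]
    print(cross_validation_error(xs, ys, 3, train_nearest_mean,
                                 predict_nearest_mean))
```

Any inducer can be plugged in through the `train`/`predict` pair; the estimate is simply the fraction of held-out examples misclassified across all folds.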
526
Beln Gonzlez-Fonteboa
University of A Coroa, Spain
Fernando Martnez-Abella
University of A Coroa, Spain
Copyright 2008, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
EC Techniques
to be coded with the same number of bits. Each one of the bits in a gene is usually called an allele. Once the individual's genotype (the structure for the creation of a particular individual) is defined, we are ready to carry out the evolutionary process that will reach the solution to the problem posed.

We start with a random set of individuals, called a population. Each of these individuals is a potential solution to the problem. This would be the initial population or zero generation, and successive generations will be created from it until the solution is reached. The mechanisms used in the individuals' evolution are analogous to the functioning of living beings:

Selection of individuals for reproduction. All selection algorithms are based on choosing individuals by giving higher survival probabilities to those which offer a better solution to the problem, while also allowing worse individuals to be selected so that genetic diversity is not lost. Unselected individuals are copied through to the following generation.

Once the individuals have been selected, crossover is performed. Typically, two individuals (parents) are crossed to produce two new ones. A position is established before which the bits correspond to one parent, with the rest belonging to the other. This crossover is named single-point crossover, but a number of points could be used, where bit subchains separated by the points would belong alternately to one or the other parent.

Once the new individuals have been obtained, small variations are performed in a low percentage of them. This is called mutation, and its goal is to carry out an exploration of the state space.

Once the process is over, the new individuals are inserted in the new population, constituting the next generation. New generations will be produced until the population has created a sufficiently adequate solution, the maximum number of generations has been reached, or the population has converged and all individuals are equal.

GENETIC PROGRAMMING

Genetic Programming (GP), like GAs, is a search mechanism inspired by the functioning of living beings. The greatest difference between the two methods lies in the way solutions are coded: in GP, coding is carried out as a tree structure (Koza, 1990) (Koza, 1992). The main goal of GP is to produce solutions to problems through program and algorithm induction.

The general functioning is similar to that of GAs. Nevertheless, due to the difference in solution coding, great variations exist in the genetic operations of initial solution generation, crossover and mutation. The rest of the operations, the selection and replacement algorithms, remain the same, as do the metrics used to evaluate individuals (fitness).

We will now describe two cases where both techniques have been applied. They refer to questions related to Structural Concrete, approached as both a material and a structure.

EXAMPLE 1: Procedure to determine optimal mixture proportions in High Performance Concrete.

High Performance Concrete (HPC) is a type of concrete designed to attain greater durability together with high resistance and good properties in its fresh state to allow ease of mixing, placing and curing (Forster, 1994) (Neville and Aitcin, 1998). Its basic components are the same as those of ordinary concrete, with the intervention in diverse quantities of additions (fly ash or silica fume, byproducts of other industries that display pozzolanic resistance capacities) plus air-entraining and/or fluidifying admixtures. Indeed, an adequate proportion of these components with quality cement, water and aggregates produces concrete with excellent behavior, generally after an experimental process to adjust the optimal content of each material. When very high resistance is not a requirement, the addition introduced is fly ash (FA); air-entraining admixtures (AE) are used to improve behavior in frost/defrost situations. When high resistance is needed, FA is substituted by a silica fume (SF) addition, eliminating AE altogether. In every case high cement contents and low water/binder (W/B) ratios are used (both cement and pozzolanic additions are considered binders).
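The selection, crossover and mutation operators described at the start of this article can be sketched with a minimal binary-coded GA. Everything problem-specific here is an assumption for illustration: the OneMax fitness (count of 1-bits), the roulette-wheel selection, and all parameter values.

```python
import random

def fitness(ind):
    """Toy fitness: number of 1-bits (OneMax); a stand-in for any problem measure."""
    return sum(ind)

def select(pop):
    """Fitness-proportional (roulette) selection: fitter individuals get
    higher survival probability, but worse ones can still be chosen."""
    total = sum(fitness(i) for i in pop) or 1
    r = random.uniform(0, total)
    acc = 0.0
    for ind in pop:
        acc += fitness(ind)
        if acc >= r:
            return ind
    return pop[-1]

def single_point_crossover(a, b):
    """Bits before a random point come from one parent, the rest from the other."""
    p = random.randint(1, len(a) - 1)
    return a[:p] + b[p:], b[:p] + a[p:]

def mutate(ind, rate=0.01):
    """Flip each allele with a small probability to explore the state space."""
    return [1 - g if random.random() < rate else g for g in ind]

def evolve(n_bits=20, pop_size=30, generations=60, seed=0):
    random.seed(seed)
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            c1, c2 = single_point_crossover(select(pop), select(pop))
            new_pop += [mutate(c1), mutate(c2)]
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print(fitness(best))  # should be close to the optimum of 20
```

The loop mirrors the text exactly: select two parents, cross them at a single point, mutate the offspring, and repeat until a new generation is full.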
A number of mixture proportioning methods exist, based on experimental approaches and developed by different authors. The product of such mixtures can be controlled through various tests, two of which are particularly representative of the fresh and hardened states of concrete: the measurement of workability and the evaluation of compressive strength, respectively. The first one is carried out through the slump test, where the subsidence of a molded specimen after removal of the mold is measured. A high value ensures adequate placing of the concrete. The second test consists of the compression of hardened concrete until failure is produced; HPC can resist compressive stresses in the range of 40 to 120 MPa.

The goal of mixture proportioning methods is to adjust the amount of each component to obtain slump and strength values within a chosen range. There is a large body of experience, and basic mixtures which require some experimental adjustment are available. It is difficult to develop a theoretical model to predict the slump and resistance of a particular specimen, though roughly adequate fitted curves do exist (Neville, 1981). It is even harder to approach the reverse problem theoretically, that is, to unveil the mixture proportioning starting from the goal results (slump and strength).

Chul-Hyun Lim et al. (Chul-Hyun, Young-Soo and Joong-Hoon, 2004) have developed a GA-based application to determine the relationship between the different parameters used to characterize a HPC mixture. Two types of mixture are considered regarding their goal strength: mixtures that reach values between 40 and 80 MPa and mixtures that reach between 80 and 120 MPa. If a good database is available, it is not hard to obtain correlations between the different variables to predict slump and strength values for a particular specimen. Nevertheless, the authors use GAs to solve the reverse problem, that is, to obtain the mixture parameters when the input data are slump and strength.

To that effect they use a database with 104 mixtures in the 40 to 80 MPa range and 77 in the 80 to 120 MPa range. In the first group, the essential parameters are, according to prior knowledge, the W/B ratio (%), the amount of water (W, kg/m3), the fine aggregate to total aggregate ratio (s/a, %), the amount of AE admixture (kg/m3), the amount of cement replaced by FA (%), and the amount of high-range water-reducing admixture (superplasticizer, SP, kg/m3). Tests on the different mixtures give slump values (between 0 and 300 mm) and compressive strength. In the second group, the essential parameters are W/B, W, s/a, the amount of cement replaced by silica fume (SF, %) and SP. Tests give slump and resistance values.

Using multiple regression techniques, the authors first obtain two fitted curves for each group that predict slump and strength from the starting variables. GAs are used to solve the reverse problem. For each group, they first reach an individual which determines the optimal W/B, W, s/a, FA and AE, or W/B, W, s/a and SF, for a specific compressive strength. From these parameters, using the prediction curve previously obtained, the optimal SP value for a particular slump is calculated.

The development of the genetic algorithms used is based on programs by Houck et al. (Houck, Joines, and Kay, 1996). In this case, ranking selection based on a normalized geometric distribution has been used for individual selection. One-point, two-point, and uniform crossover algorithms are used. Different strategies are used for the mutation operations: boundary mutation, multi-nonuniform mutation, nonuniform mutation, and uniform mutation. The first trials were carried out with an initial population consisting of only 15 individuals, which led to local minima only. Optimal results are obtained by increasing the population to 75 individuals.

It should be pointed out that the reverse problem is thus solved in two phases. In the first phase all component amounts are fixed except for SP, with compressive strength as a target; following this, SP is fixed to attain the desired slump. The initial fitting functions are used as a simple but accurate approach to the real database, to avoid using it directly, which would make solving the reverse problem a more difficult task.

For the first group, the highest errors correspond to AE and SP determination, up to 12.5% and 15% respectively. The errors committed in the second group are smaller. In any case, the errors are relatively minor since these materials are included as admixtures in very small amounts. As a conclusion for this example, it is interesting to point out that the procedure is not only a useful application of GAs to concrete mixture proportioning, but also constitutes by itself a new mixture proportioning method that requires GAs for its application.

EXAMPLE 2: Determination of shear strength in reinforced concrete deep beams

Deep beams are those that can be considered short (span, L) in relation to their depth (h). There is no consensus in international codes as to what span-to-depth threshold value divides conventional and deep beams. In the example shown here, L/h ratios between 0.9 and
4.14 are considered for simply supported elements working under two point loads as shown in Figure 1. Shear failure is a beam collapse mode produced by the generation of excessive shear stresses in the vicinity of the supports. Its value depends on various parameters commonly used for beam design (see Figure 1): a/h, L/h, the section area (Asv and Ash), strength (fyv and fyh) and distances between the vertical and horizontal steel rebars (sv and sh) placed in the failure zone, which will be respectively called vertical and horizontal transverse reinforcement; the section area (Asb and Ast) and strength (fyb and fyt) of the longitudinal rebars placed in the lower and higher zones of the beam, which will be respectively called bottom and top reinforcement; the compressive strength of the concrete used in the beam; and the width (b) of the latter.

Figure 1. Deep beam geometry and reinforcement layout (depth h; vertical and horizontal web reinforcement Asv, Ash with spacings sv, sh; bottom and top longitudinal reinforcement Asb, Ast)

The dependence of shear strength on these parameters is known through multiple studies (Leonhardt and Walther, 1970) and it can be presented in a normalized form through simple relationships. The relevant parameters are reduced to the following:

X1 = a/h
X2 = L/h
X3 = (Asv fyv)/(b sv fc)
X4 = (Ash fyh)/(b sh fc)
X5 = (Asb fyb)/(b h fc)
X6 = (Ast fyt)/(b h fc)

The normalized shear strength can be written as R = P/(b h fc), where P is the failure load (Figure 1).

Ashour et al. (Ashour, Alvarez and Toropov, 2003) undertake the task of finding an expression that can predict shear strength from these variables. It is not easy to develop an accurate mathematical model capable of predicting shear strength. Usual curve-fitting techniques do not produce good results either, though they have been the basis of diverse international codes that include design prescriptions for these structural elements. GP appears to be a tool that can reach consistent results. The authors develop various expressions with different complexity levels, obtaining different degrees of accuracy. A database of 141 tests available in the scientific literature is used.

A remarkable feature of GP techniques is that if the possibility of using complex operators is introduced, the fitting capacity is impressive when the variables are well distributed and their range is wide. Notwithstanding, if the goal of the process is to obtain an expression that can be used by engineers, one of its requirements is simplicity. The authors of this study take this premise into account and choose as operators only addition, multiplication, division and squaring. A first application leads to a complex expression that reveals the very low influence of parameter X2, which is thus eliminated. With the 5 remaining variables, a simple and accurate expression is obtained (root mean square (RMS) training error equal to 0.033; average ratio between predicted and real R equal to 1.008 and standard deviation equal
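The normalized variables X1 to X6 and the normalized shear strength R defined above translate directly into code; the helper name and the example numbers below are illustrative only, and consistent units are assumed.

```python
def normalized_shear_parameters(a, L, h, b, fc, P,
                                Asv, fyv, sv, Ash, fyh, sh,
                                Asb, fyb, Ast, fyt):
    """Normalized variables X1..X6 and shear strength R = P/(b*h*fc),
    following the relationships given in the text (consistent units assumed)."""
    return {
        "X1": a / h,
        "X2": L / h,
        "X3": (Asv * fyv) / (b * sv * fc),
        "X4": (Ash * fyh) / (b * sh * fc),
        "X5": (Asb * fyb) / (b * h * fc),
        "X6": (Ast * fyt) / (b * h * fc),
        "R":  P / (b * h * fc),
    }

if __name__ == "__main__":
    # Invented example values (N and mm units) purely for illustration.
    p = normalized_shear_parameters(a=500.0, L=1800.0, h=1000.0, b=120.0,
                                    fc=30.0, P=360000.0,
                                    Asv=100.0, fyv=400.0, sv=150.0,
                                    Ash=100.0, fyh=400.0, sh=150.0,
                                    Asb=800.0, fyb=500.0, Ast=200.0, fyt=500.0)
    print(p["X1"], p["X2"], round(p["R"], 3))
```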
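Returning to Example 1, the reverse problem, searching for mixture parameters whose predicted strength matches a target value under a previously fitted curve, can be sketched as below. The linear "fitted curve", its coefficients, the parameter bounds and the GA settings are all invented for illustration; they are not the model of Chul-Hyun Lim et al.

```python
import random

# Hypothetical fitted curve: strength (MPa) as a linear function of
# W/B (%), W (kg/m3), s/a (%) and FA replacement (%). Coefficients invented.
def predicted_strength(wb, w, sa, fa):
    return 150.0 - 1.1 * wb - 0.05 * w - 0.2 * sa - 0.15 * fa

BOUNDS = [(25, 45), (150, 200), (35, 50), (0, 20)]  # (low, high) per parameter

def random_individual():
    return [random.uniform(lo, hi) for lo, hi in BOUNDS]

def error(ind, target):
    return abs(predicted_strength(*ind) - target)

def reverse_mix(target, pop_size=75, generations=200, seed=0):
    """GA search for mixture parameters whose predicted strength hits `target`."""
    random.seed(seed)
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: error(ind, target))
        parents = pop[:pop_size // 2]                    # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # arithmetic crossover
            j = random.randrange(len(child))             # mutate one random gene
            lo, hi = BOUNDS[j]
            child[j] = random.uniform(lo, hi)
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda ind: error(ind, target))

if __name__ == "__main__":
    best = reverse_mix(target=90.0)
    print([round(v, 1) for v in best], round(predicted_strength(*best), 2))
```

The two-phase procedure in the text would then repeat this kind of search a second time, with the other amounts frozen, to tune SP for the desired slump.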
E-Learning in New Technologies
New Technologies Feature 2

In order to achieve this, some intelligent agents can take charge of selecting and showing, in each case, the suitable information (from the information repository) according to the preferences and level of each student. These agents can perform different tasks, dividing them between the users' computers and the server where the Institutional Memory is stored.

Pedagogical Characteristic 3

An e-learning model must offer the students all the available information, in different formats and coming from different sources, for the students to learn how to choose the most relevant elements for their learning.

New Technologies Feature 3

To achieve this, a global ontology can be used, establishing a classification in levels and the relationships between the available information, managed through the Knowledge Management System.

Pedagogical Characteristic 4

Computer e-learning models ought to propose this apprenticeship by means of works and problems to solve, so that the students' knowledge grows as they go on with the resolution of their works.

It is necessary to establish many different kinds of works, at different levels, for each unit the student must prepare. The tasks, and the available information as well, will be found in the Institutional Memory organized under an ontology, making access to the relevant information easy at any moment. Carrying out the tasks, the learner will build his own knowledge (Nonaka, 1995).

Pedagogical Characteristic 5

An e-learning model should incorporate strategies to obtain and raise the students' motivation and encourage their inquisitiveness, relating the available information to their interests and proposing the possibility of exploring the same or related subjects more deeply, using computer game strategies that encourage investigation.

New Technologies Feature 5

When the contents of the course are part of an Institutional Memory, the existence of a global ontology can facilitate the display of the elements, remarking the connections between them. Besides, as stated previously, the use of intelligent agents allows us to show these connections according to individual preferences. The strategies utilized in computer games, including apprenticeship through action, will help to attract and maintain the students' interest.

FUTURE WORKS

Some prototypes for the aspects mentioned in the previous point have been developed in our research laboratory for testing the proposed features (Pedreira, 2004a, 2004b, 2005a, 2005b). Each of them has reached quite good results. These approximations show that the use of New Technologies in education allows the students to extend or improve their problem-solving methods and their abilities to transfer knowledge (Friss de Kereki, 2004). After these first approaches, we are working on joining the prototypes and extending them with some characteristics that have not yet been tested.

In this article we suggest several features that e-learning systems should have in order to improve online learning, which can be achieved by using New Technologies. In short, we propose a computer model based on a Knowledge Management System which, by using a global ontology, maintains the highest quantity of relationships between the available information and its classification at different levels. By means of this support of knowledge, apprenticeship can be established through task proposals based on computer game strategies. Using the philosophy of intelligent agents, these systems can interact with the students, showing them the information according to their preferences, in order to motivate them and to stimulate their capacity for raising questions.
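A minimal sketch of the kind of agent behaviour described above: filtering repository items by student level and ranking them by preference overlap. The data layout and scoring rule are simplifying assumptions, not a description of the actual prototypes.

```python
def select_materials(repository, student, limit=3):
    """Toy sketch of an agent choosing repository items that match a
    student's level and ranking them by overlap with the student's
    preferences. `repository` is a list of dicts with 'title', 'level'
    and 'topics' (a set); `student` has 'level' and 'preferences'."""
    suitable = [item for item in repository if item["level"] <= student["level"]]
    # Rank by how many preferred topics each item covers; title breaks ties.
    suitable.sort(key=lambda item: (-len(item["topics"] & student["preferences"]),
                                    item["title"]))
    return [item["title"] for item in suitable[:limit]]

if __name__ == "__main__":
    repo = [{"title": "A", "level": 1, "topics": {"ai", "games"}},
            {"title": "B", "level": 3, "topics": {"ai"}},
            {"title": "C", "level": 1, "topics": {"db"}}]
    student = {"level": 2, "preferences": {"ai", "games"}}
    print(select_materials(repo, student))
```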
REFERENCES

Nonaka, I., & Takeuchi, H. (1995). The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation. New York: Oxford University Press.

Pedreira, N., & Rabuñal, J. (2005a). La Gestión del Conocimiento como Instrumento de e-Learning [Knowledge Management as an e-Learning Instrument]. E-World, 129-141. Fundación Alfredo Brañas. Santiago de Compostela.

Pedreira, N., Dorado, J., Rabuñal, J., & Pazos, A. (2005b). Knowledge Management as the Future of E-learning. Encyclopedia of Distance Learning. Vol. 1 - Distance Learning Technologies and Applications, 1189-1194. Idea Group Reference.

Pedreira, N., Dorado, J., Rabuñal, J., Pazos, A., & Silva, A. (2004a). A Model of Virtual Learning to Learn. Proceedings of the IEEE International Conference on Advanced Learning Technologies (ICALT04), 838-839. IEEE Computer Society, Washington, DC, USA.

Pedreira, N., Dorado, J., Rabuñal, J., Pazos, A., & Silva, A. (2004b). Knowledge Management and Interactive Learning. Lecture Notes in Computer Science. Vol. 3257, 481-482. Springer-Verlag, Heidelberg.

Piaget, J. (1999). De la pedagogía [On Pedagogy]. Buenos Aires: Paidós.

Wiig, K. M. (1995). Knowledge Management Methods.

KEY TERMS

Best Practice: A management idea which asserts that there is a technique, method, process, activity, incentive or reward that is more effective at delivering a particular outcome than any other technique, method or process.

Computer Game: A video game played on a personal computer, rather than on a video game console or arcade machine.

Computer Model: A computer program that attempts to simulate an abstract model of a particular system.

E-Learning: Learning that is accomplished over the Internet, a computer network, via CD-ROM, interactive TV, or satellite broadcast.

Intelligent Agent: A real-time software system that interacts with its environment to perform non-repetitive computer-related tasks.

Knowledge Management: The collection, organization, analysis, and sharing of information held by workers and groups within an organization.

Learn to Learn: In this context, to learn to manage (select, extract, classify) the great amount of information existing in today's society, in order to identify real and significant knowledge.

New Technologies: In this context, Computer, Information and Communication Technologies.

Virtual: Not physical.
Paul M. Chapman
University of Hull, UK
Emerging Applications in Immersive Technologies
The marine biologist may modify parameters relating to an individual krill's field of view, foraging speed, collision avoidance, exhaustion speed, desire for food, etc. The researcher may also modify more global parameters relating to sea currents, temperature, and the quantity and density of algae (food).

Upon execution, the biologist may interact with the model by changing the 3D viewpoint and modifying the time-step which controls the run speed of the model. Krill states are represented using different colours. For example, a red krill represents a starved, unhealthy krill, green means active and healthy, and blue represents a digestion period where krill activity is at a minimum. Recent advances in processor and graphics technology mean that these sorts of simulations are possible using high-specification desktop computers. Marine biologists have been very interested in seeing how making minor changes to certain variables, such as the field of view for krill, can have major consequences for the flocking behaviour over time of the entire swarm.

Figure 3. VR paragliding simulator
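The field-of-view effect mentioned above can be illustrated with a minimal agent-based sketch; the 2-D setting, the alignment-only steering rule and all numbers are simplifying assumptions, and the actual simulator is far richer.

```python
import math

def step(krill, view_radius, blend=0.5):
    """One update of an alignment-only flocking rule: each krill turns its
    heading (radians) toward the mean heading of the neighbours it can see.
    `krill` is a list of dicts with keys x, y, heading; speed is fixed at 1."""
    new = []
    for k in krill:
        vx = vy = 0.0
        for other in krill:
            if other is k:
                continue
            if math.hypot(other["x"] - k["x"], other["y"] - k["y"]) <= view_radius:
                vx += math.cos(other["heading"])
                vy += math.sin(other["heading"])
        heading = k["heading"]
        if vx or vy:
            target = math.atan2(vy, vx)
            # Turn part-way toward the neighbours' mean heading.
            heading += blend * math.atan2(math.sin(target - heading),
                                          math.cos(target - heading))
        new.append({"x": k["x"] + math.cos(heading),
                    "y": k["y"] + math.sin(heading),
                    "heading": heading})
    return new

if __name__ == "__main__":
    swarm = [{"x": 0.0, "y": 0.0, "heading": 0.0},
             {"x": 1.0, "y": 0.0, "heading": math.pi / 2}]
    # A larger view radius lets the two agents see and align with each other;
    # a tiny one leaves their headings unchanged.
    print(step(swarm, view_radius=5.0)[0]["heading"],
          step(swarm, view_radius=0.5)[0]["heading"])
```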
PARAGLIDING SIMULATOR
Figure 4. Real-time pilot's view from the paragliding simulator, including other AI-controlled pilots
Santiago Planet
Universitat Ramon Llull, Spain

Francesc Alías
Universitat Ramon Llull, Spain

Joan-Claudi Socoró
Universitat Ramon Llull, Spain

Elisa Martínez
Universitat Ramon Llull, Spain
Emulating Subjective Criteria in Corpus Validation
observed in Cowie et al. (2001). Some studies have been carried out to observe the influence of emotion on speech signals, like the work presented by Rodríguez et al. (1999) but, more recently, due to the increasing power of modern computers, which allows the analysis of huge amounts of data in relatively small time lapses, machine learning techniques have been used to recognise emotions automatically by using labelled expressive speech corpora. Most of these studies have been centred on a few algorithms and small sets of parameters.

However, recent works have performed more exhaustive experiments testing different machine learning techniques and datasets, as described by Oudeyer (2003). All these studies had the goal of achieving the best possible recognition rate, obtaining, in many cases, better results than those obtained in subjective tests ((Oudeyer, 2003), (Planet, Morán & Formiga, 2006), (Iriondo, Planet, Socoró & Alías, 2007)). Nevertheless, many differences can be found when analyzing the results obtained from objective and subjective classifications and, to our knowledge, there were no studies with the goal of emulating these subjective criteria before those carried out by Iriondo, Planet, Alías, Socoró & Martínez (2007).

VALIDATION OF AN EXPRESSIVE SPEECH CORPUS BY MAPPING SUBJECTIVE CRITERIA

The creation of a speech corpus with authentic emotional content is one of the most important challenges in the study of expressive speech. Once the corpus is recorded, a validation process is required to prune those utterances that show a distinct emotion from their label. This article is based on the work exposed by Iriondo, Planet, Alías, Socoró & Martínez (2007) and presents the production of an expressive speech corpus in Spanish with the goal of being used in a synthesis system, validating it by automatically pruning bad utterances, emulating the criteria of human listeners.

The Production of the Corpus

The recording of the corpus has been carried out by a female professional speaker. There is a high consensus in the scientific community on obtaining emotional speech by means of this strategy for synthesis purposes (Cowie et al., 2005), although other authors argue in favor of constructing enormous corpora gathered from recordings of daily life (Campbell, 2005). For the design of texts semantically related to different expressive styles, we have made use of an existing textual database of advertisements extracted from newspapers and magazines. Based on a study of the voice in audio-visual publicity (Montoya, 1998), five categories of the textual corpus have been chosen and the most suitable emotion/style has been assigned to each of them: new technologies (neutral-mature), education (joy-elation), cosmetics (sensual-sweet style), automobiles (aggressive-hard) and trips (sad-melancholic). The recorded database has 4638 sentences and is 5 hours 12 minutes long.

From these categories, a set of sentences has been chosen by means of a greedy algorithm (François & Boëffard, 2002) that has allowed us to select phonetically balanced sentences. In addition to looking for phonetic balance, phrases that contain foreign words and abbreviations have been discarded because they make the automatic process of phonetic transcription and labeling difficult.

The corpus has been segmented into phrases and then into phonemes by means of a semiautomatic process based on forced alignment with Hidden Markov Models.

Acoustic Analysis

Cowie et al. (2001) show how the prosodic features of speech (fundamental frequency (F0), energy, duration of phones, and frequency of pauses) are related to the vocal expression of emotion. The analysis of F0 performed in this work is based on the result of the pitch marks algorithm described by Alías, Monzo & Socoró (2006). This system can assign marks over the whole signal, interpolating values from the neighbouring phonemes in unvoiced segments and silences. Energy is measured with 20 ms rectangular windows and 50% overlap, computing the mean energy in decibels (dB) every 10 ms. Also, rhythm parameters have been incorporated using the z-score as a means to analyze the temporal structure of speech (Schweitzer & Möbius, 2003). Moreover, for each utterance, two parameters related to the number of pauses per time unit and the percentage of silence with respect to the total time are considered.
Subjective Test

A subjective test allows validating the expressiveness of a corpus of acted speech from a user's point of view. Nevertheless, an extensive evaluation of the complete corpus would be very tedious due to the big amount of elements in it. For this reason, only 10 percent of the corpus utterances have been selected for this test. A forced-answer test has been designed using the TRUE platform (Planet, Iriondo, Martínez, & Montero, 2008) with the question: What emotional state do you recognize from the voice of the speaker in this phrase? The possible answers are the 5 emotional styles of the corpus and one more option, Don't know / Another (Dk/A), in order to minimize biasing of the results due to confusing cases. The addition of this option carries the risk of allowing some users to use this answer excessively to accelerate the end of the test (Navas, Hernáez & Luengo, 2006). However, this effect has not been observed in this experiment. The evaluators were 30 volunteers with a quite heterogeneous profile.

The achieved average classification accuracy in the subjective test is 87%. The test also reveals that the sad style (SAD) is the best rated (98.5% on average). The second and third best rated styles are sensual (SEN) (87.2%) and neutral (NEU) (86.1%), followed by happy (HAP) (81.9%) and aggressive (AGR) (81.6%). Aggressive and happy styles are often confused with each other. Moreover, sensual is slightly misclassified as sad or neutral. The Dk/A option is hardly used, although it is more present in neutral and sensual than in the rest of the styles.

To decide if utterances are not optimally performed by the speaker, two simple rules have been created from the subjective test results by empirically adjusting two thresholds. These rules remove utterances whose identification percentage is lower than 50% or whose Dk/A percentage is larger than 12%. There are 33 out of the 480 utterances of the subjective test that satisfy at least one rule.

Statistical Analysis, Datasets and Supervised Classification

Iriondo, Planet, Socoró & Alías (2007) present an experiment of emotion recognition covering different datasets and algorithms. The experiment is done with the same corpus that is being considered in this article. Each utterance is defined by 464 attributes representing the speech signal characteristics, but this first dataset is divided into different subsets to reduce its dimensionality. The experiments show almost the same results with the full dataset and with a dataset reduced to 68 parameters, so the reduced dataset is used in this work. In this dataset, the prosody of an utterance is represented by the vectors of logarithmic F0, energy in dB and normalized durations. For each sequence, the first derivative is also calculated. Some statistics are obtained from these sequences: mean, variance, maximum, minimum, range, skewness, kurtosis, quartiles, and interquartile range. Thus, 68 parameters per utterance are computed, considering also the two pausing-related parameters previously described.

In the referenced work, twelve machine learning algorithms are tested considering different datasets. All the experiments are carried out using the Weka software (Witten & Frank, 2005) by means of ten-fold cross-validation. Very high recognition rates are achieved, as in other previously referenced works: SMO (the SVM of Weka) obtained the best results (~97%), followed by Naïve-Bayes (NB) with 94.6% and J48 (the Weka decision tree based on C4.5) with 93.5%, considering the average of all the results. The conclusion is that, in general, the styles of the developed speech corpus can be clearly differentiated. Moreover, the results of the subjective test showed a good authenticity of the expressive speech content. However, it is considered necessary to go one step further by developing a method to validate each utterance of the corpus following subjective criteria, and not only the automatic classification from the space of attributes.

Subjective-Based Attribute Selection

The proposed approach to find the optimum classifier schema able to follow the subjective criteria consists of the development of an attribute selection method guided by the results obtained in the subjective test. Once the best schema is found, it will be applied to the whole corpus in order to generate a list of candidate utterances to be pruned. Two methods for attribute selection have been developed in order to determine the subset of attributes that allows a better mapping of the subjective test results. As previously mentioned,
the original set has 68 attributes per utterance, so an exhaustive exploration of the subsets is not viable and greedy search procedures will be used. On the one hand, a Forward Selection (FW) process is chosen, which starts without any attribute and adds one at a time; on the other hand, the Backward Elimination (BW) technique is considered, which starts from the full set and deletes one attribute at a time. At each step, the classifier is tested with the 480 utterances that are considered in the subjective test, having previously been trained with the 4158 remaining utterances. The wrongly classified cases take part in the evaluation process of the subset of attributes involved. The novelty of this process is the use of a subjective-based measure to evaluate the expected performance of the subset at each iteration. The measure used is the F1-score computed from the precision and the recall of the wrongly classified utterances compared with the 33 utterances rejected by the subjective test.

Results

The process consists of six iterations: one per algorithm (SMO, NB and J48) and attribute selection technique (FW and BW). The SMO algorithm obtains practically the same F1-score (~0.50) with both the FW and BW attribute selection techniques. The results for NB are also similar to each other (F1 ≈ 0.43). The main difference is for J48, which obtains a better F1 with the FW (0.45) than with the BW (0.37) process. Moreover, SMO-FW seems to be the most stable configuration, as it achieves almost the same result (F1 = 0.49) with a broad range of at-

the final candidates by a stacking technique (Witten & Frank, 2005). Also, we will evaluate the suitability of the proposed method by performing acoustic modeling of the resulting emotional speech after pruning, with respect to the results obtained with the whole corpus. A lower error in the prosodic estimation could confirm the hypothesis that it is advisable to eliminate the bad utterances of the corpus beforehand.

CONCLUSION

This article exposes the need for an automatic validation of an expressive speech corpus due to the impossibility of carrying out a subjective test on a large corpus. Also, an approach to achieve this goal has been presented, performing a subjective test with 30 subjects and approximately 10 percent of the utterances. The result of this test has shown that some utterances are perceived with a dissimilar or poor expressive content with respect to their labeling. The proposed automatic classification tries to learn from the result of the subjective test in order to generalize the solution to the rest of the corpus, by means of a suitable attribute selection carried out by two different strategies (Forward Selection and Backward Elimination), using the F1-measure computed taking into account the misclassifications resulting from the subjective test as a reference.

REFERENCES
tribute subsets (the results are very similar with a range Alas, F., Monzo, C. & Socor, J. C. (2006). A pitch
of attributes between 18 and 35). Results show that marks filtering algorithm based on restricted dynamic
J48-FW has the best recall (18/33), which implies the programming. InterSpeech2006 - International Confer-
highest number of coincidences; however the precision ence on Spoken Language Processing, pp. 1698-1701.
measure is quite low (18/51), indicating an excessive Pittsburgh.
number of general misclassifications.
Bolt, R. A. (1980). Put-that-there: Voice and gesture
at the graphics interface. Proceedings of the 7th an-
nual conference on Computer graphics and interactive
FUTURE TRENDS
techniques pp. 262-270. Seattle, Washington: ACM
Press.
Future work will consist on applying this automatic
process over the full corpus. For instance, a ten-fold Campbell, N. (2000). Databases of emotional speech.
cross validation would be a good technique to cover Proceedings of the ISCA Workshop, pp. 34-38.
the whole corpus. The misclassified utterances would
be candidate to be pruned. A first approach would Campbell, N. (2005). Developments in corpus-based
consist on running different classifiers and selecting speech synthesis: approaching natural conversational
speech. IEICE - Trans. Inf. Syst. , E88-D (3), 376-
383.
544
Emulating Subjective Criteria in Corpus Validation
Cowie, R., Douglas-Cowie, E. & Cox, C. (2005). Beyond emotion archetypes: databases for emotion modelling using neural networks. Neural Networks, 18, 371-388.

Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., et al. (2001). Emotion Recognition in Human-Computer Interaction. IEEE Signal Processing Magazine, 18 (1), 32-80.

François, H. & Boëffard, O. (2002). The greedy algorithm and its application to the construction of a continuous speech database. Proceedings of LREC, 5, pp. 1420-1426.

Iriondo, I., Planet, S., Alías, F., Socoró, J.-C. & Martínez, E. (2007). Validation of an expressive speech corpus by mapping automatic classification to subjective evaluation. Lecture Notes in Computer Science, 4507. Computational and Ambient Intelligence, 9th International Work-Conference on Artificial Neural Networks. San Sebastián: Springer-Verlag.

Iriondo, I., Planet, S., Socoró, J.-C. & Alías, F. (2007). Objective and subjective evaluation of an expressive speech corpus. ISCA Tutorial and Research Workshop on NOn LInear Speech Processing. Paris.

Montoya, N. (1998). El papel de la voz en la publicidad audiovisual dirigida a los niños. Zer. Revista de estudios de comunicación, 161-177.

Navas, E., Hernáez, I. & Luengo, I. (2006). An objective and subjective study of the role of semantics and prosodic features in building corpora for emotional TTS. IEEE Transactions on Audio, Speech and Language Processing (14).

Oudeyer, P.-Y. (2003). The production and recognition of emotions in speech: features and algorithms. Special issue on Affective Computing. International Journal of Human Computer Interaction, 59 (1-2), 157-183.

Planet, S., Iriondo, I., Martínez, E. & Montero, J. A. (2008). TRUE: an online testing platform for multimedia evaluation. In Proceedings of the Second International Workshop on EMOTION: Corpora for Research on Emotion and Affect at the 6th Conference on Language Resources & Evaluation (LREC 2008). Marrakech, Morocco.

Planet, S., Morán, J. A. & Formiga, L. (2006). Reconocimiento de emociones basado en el análisis de la señal de voz parametrizada. Sistemas e Tecnologias de Informação no Espaço Ibérico, II, pp. 837-854. Esposende, Portugal.

Rodríguez, A., Lázaro, P., Montoya, N., Blanco, J. M., Bernadas, D., Oliver, J. M., et al. (1999). Modelización acústica de la expresión emocional en el español. Procesamiento del Lenguaje Natural (25), 152-159.

Schröder, M. (2004). Speech and emotion research: An overview of research frameworks and a dimensional approach to emotional speech synthesis. PhD thesis, Saarland University.

Schweitzer, A. & Möbius, B. (2003). On the structure of internal prosodic models. Proceedings of the 15th ICPhS, pp. 1301-1304. Barcelona.

Ververidis, D. & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features and methods. Speech Communication, 48 (9), 1162-1181.

Witten, I. & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques (2nd ed.). San Francisco: Morgan Kaufmann.

KEY TERMS

Backward Elimination Strategy: Greedy attribute selection method that evaluates the effect of removing one attribute from a dataset. The attribute whose deletion most improves performance is removed for the next iteration. The process begins with the full set of attributes and stops when no attribute removal improves performance.

Decision Trees: Classifier consisting of a tree structure. A test sample is classified by evaluating it at each node, starting at the top one and choosing a specific branch depending on this evaluation. The classification of the sample is the class assigned at the bottom node reached.

F1-Measure: An approach to combining the precision and recall measures of a classifier by means of an evenly weighted harmonic mean of both. Its expression is F1 = (2 · precision · recall) / (precision + recall).

Forward Selection Strategy: Greedy attribute selection method that evaluates the effect of adding one attribute to a dataset. The attribute that most improves performance is added to the dataset for the next iteration. The process begins with no attributes and stops when adding new attributes provides no performance improvement.

Greedy Algorithm: Algorithm, usually applied to optimization problems, based on the idea of finding a global solution for a problem (even if it is not the optimal one) by choosing locally optimal solutions in successive iterations.

Naïve-Bayes: Probabilistic classifier based on Bayes' rule that assumes that all the parameter-value pairs that define a case are independent.

Precision: Measure that indicates the percentage of correctly classified cases of one class with regard to the number of cases that are classified (correctly or not) as members of that class. This measure shows whether the classifier is assuming as members of one specific class cases that belong to other classes.

Recall: Measure that indicates the percentage of correctly classified cases of one class with regard to the total number of cases that actually belong to this class. This measure shows whether the classifier is ignoring cases that should be classified as members of one specific class.

SVM: Acronym of Support Vector Machines. SVMs are models able to distinguish members of classes whose boundaries are not linear. This is possible through a non-linear transformation of the input data, mapping it into a higher-dimensional space where the data can be separated by a maximum-margin hyperplane.

ENDNOTE

1. Considering as valid utterances those with the adequate expressiveness.
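The Forward Selection strategy and the F1-measure defined above can be sketched together. This is a minimal illustration, not the authors' implementation: `evaluate` is a hypothetical callback standing in for training a Weka classifier (SMO, NB or J48) and returning the set of misclassified test utterance identifiers.

```python
# Sketch of the subjective-guided greedy search: candidate attribute subsets
# are scored by the F1-measure between the classifier's misclassified
# utterances and the 33 utterances rejected by the subjective test.

def f1_against_rejected(misclassified, rejected):
    """F1 = 2PR/(P+R): precision and recall compare the classifier's
    misclassifications with the subjectively rejected utterances."""
    hits = len(misclassified & rejected)
    if hits == 0:
        return 0.0
    precision = hits / len(misclassified)
    recall = hits / len(rejected)
    return 2 * precision * recall / (precision + recall)

def forward_selection(attributes, evaluate, rejected):
    """Greedy FW search: start empty, add the attribute that most improves
    the subjective-based F1, and stop when no addition helps."""
    selected, best_f1 = [], -1.0
    candidates = set(attributes)
    while candidates:
        scored = [(f1_against_rejected(evaluate(selected + [a]), rejected), a)
                  for a in candidates]
        f1, best_attr = max(scored)
        if f1 <= best_f1:          # no attribute improves the score: stop
            break
        best_f1 = f1
        selected.append(best_attr)
        candidates.remove(best_attr)
    return selected, best_f1
```

For instance, 18 coincidences out of 51 misclassifications, against the 33 rejected utterances, give precision 18/51, recall 18/33 and F1 = 2·18/(51+33) ≈ 0.43. Backward Elimination follows the same loop but starts from the full set and removes one attribute per iteration.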
Antonio Martínez
University of Castilla-La Mancha, Spain

Roberto González
University of Castilla-La Mancha, Spain

Manuel Torres
University of Castilla-La Mancha, Spain
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Energy Minimizing Active Models in Artificial Vision
and stiffness, to be integrated across the ROIs. This last property makes them different from the previous models and therefore more suitable for some artificial vision applications. The next sections contain a brief introduction to the mathematical foundations of deformable models.

[...] energy). Therefore, it can be expressed as a weighted combination of energy functions. To apply snakes to images, external potentials are designed whose local minima coincide with intensity extrema, edges and other image features of interest. For example, the contour will be attracted to intensity edges in an image by choosing a potential [...]
[...] flows and geometrical distance conditions (Solaiman, 1999), (Caselles, 1997).

GEODESIC ACTIVE CONTOUR MODEL

The geodesic active contour model is based on the computation of a minimal-distance curve; thereby, the AC evolves following the geometric heat flow equation. Let us consider a particular class of snake models in which the rigidity coefficient is set to zero, that is, β = 0. Two main reasons motivate this selection: i) it will allow us to derive the relation between these energy-based active contours and geometric curve evolution ones; ii) the regularization effect on the geodesic active contours comes from curvature-based curve flows, obtained only from the other terms in equation (1). This will allow smooth curves to be achieved in the proposed approach without the high-order smoothness given by β ≠ 0 in energy-based approaches.

Moreover, this smoothness component in equation (1) appears in order to minimize the total squared curvature. It is possible to prove that the curvature flow used in the geodesic model decreases the total curvature. The use of curvature-driven curve motions as smoothing terms was proven to be very efficient. Therefore, curve smoothing will also be obtained with β = 0, having only the first regularization term. Assuming this, equation (1) may be reduced to:

$E_{geo}(s,t) = \frac{1}{2}\int_0^1 |v_s(s,t)|^2\,ds - c\int_0^1 |\nabla I[v(s,t)]|\,ds$    (2)

Observe that, by minimizing the functional of equation (2), we are trying to locate the curve at the points of maxima of |∇I| (acting as an edge detector) while keeping a certain smoothness in the curve (object boundary). This is actually the goal in the general formulation (equation 1) as well.

It is possible to extend equation (2), generalizing the edge detector part in the following way: let g: [0, ∞[ → R+ be a strictly decreasing function, which acts as a function of the image gradient used for the stopping criterion. Hence we can replace −|∇I| with g(|∇I|)², obtaining a general energy function:

$E_{geo}(s,t) = \frac{1}{2}\int_0^1 |v_s(s,t)|^2\,ds + c\int_0^1 g(|\nabla I[v(s,t)]|)^2\,ds$    (3)

The solution of the particular energy snake model of equation (3) is given by a geodesic curve in a Riemannian space induced from the image I(x,y), where a geodesic curve is a locally minimal-distance path between given points. To show this, the classical Maupertuis principle from dynamical systems is used together with Fermat's principle.

By assuming E_internal = E_ext_potential, it is possible to reduce the minimization of equation (1) to the following form:

$\min \int_0^1 g(|\nabla I(v(s))|)\,|v_s(s)|\,ds$

This is done by using Euler-Lagrange, and by defining an embedding function Ψ(s,t) of the set of curves v(s). That is, assuming an implicit representation in which v(s) is a level set of a function Ψ(s,t): [0,a] × [0,b] → R, the following equation for curve/surface evolution is derived:

$\frac{\partial \Psi}{\partial t} = g(I)\,(C + K)\,|\nabla \Psi|$

where C is a positive real constant and K is the Euclidean curvature in the direction of the normal.

To summarize, the force (C + K) acts as the internal force in the classical energy-based snake model, smoothness being provided by the curvature part of the flow. The heat-flow K is the regularization curvature flow that replaces the second-order smoothness term in equation (1). The external, image-dependent force is given by the stopping function g(I); thus this function stops the evolving curve when it arrives at the object's boundaries.

The advantage of using a level set is that one can perform numerical computations involving curves and surfaces on a fixed Cartesian grid without having to parameterize these objects (this is called the Eulerian approach). Moreover, the level set representation can handle topological shape changes as the surface evolves, and it is less sensitive to initialisation. Figure 1 shows
the results of the level set approach applied to a biomedical colour image. Figure 1.a) shows the original image, and b) shows the initialization, which is composed of multiple curves. This is also an advantage over the classical ACM. Figure 1.c) shows the results after 50 iterations, and d) shows the final ROI detection.

Figure 2 shows different results for ROI detection in biomedical images with the geodesic active contour model. Columns (a) and (c) show the original images, and (b) and (d) the final segmentation results.

The model is efficient in time and accuracy. However, there are also some drawbacks in terms of efficiency and convergence: it has non-free parameters and depends on the time step, Δt, and the spatial one.

DEFORMABLE FINITE ELEMENT MODEL

A powerful approach to computing the local minima of a functional such as equation (1) is to construct a dynamical system that is governed by the energy minimization function and allows the system to evolve to equilibrium.
Figure 2. Results of ROI detection with the geodesic active contour model
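As a rough numerical illustration of the evolution equation ∂Ψ/∂t = g(I)(C + K)|∇Ψ| above, the sketch below uses a naive explicit Euler update with central differences. This is an assumption for illustration only, not the article's implementation: practical level-set codes use upwind schemes, narrow bands and reinitialization (Osher & Sethian, 1988).

```python
# Toy level-set evolution: the zero level set of phi advances at speed
# g(I) * (C + K), so the contour slows down where g is small (strong edges).
import numpy as np

def stopping_function(image):
    """g(|grad I|): strictly decreasing in the image gradient magnitude."""
    gy, gx = np.gradient(image.astype(float))
    return 1.0 / (1.0 + gx**2 + gy**2)

def evolve(phi, image, c=1.0, dt=0.1, steps=10, eps=1e-8):
    g = stopping_function(image)
    for _ in range(steps):
        py, px = np.gradient(phi)
        mag = np.sqrt(px**2 + py**2) + eps
        # Euclidean curvature K = div(grad phi / |grad phi|)
        ky, _ = np.gradient(py / mag)
        _, kx = np.gradient(px / mag)
        K = kx + ky
        phi = phi + dt * g * (c + K) * mag
    return phi
```

Because the update acts on the whole embedding function Ψ on a fixed grid, the represented contour may split or merge freely as it evolves, which is the topological flexibility discussed above.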
In order to compute a minimum energy solution numerically, it is necessary to discretize the energy (1). Thus, the continuous model v is represented in discrete form by a vector U of shape parameters associated with the local-support basis functions. The discretized version of the Lagrangian dynamics equation may be written as a set of second-order ordinary differential equations for the discrete nodal point displacements U, that is, a vector of the (x, y, z) displacements of the n nodal points that represent the ROI, thus:

$M\ddot{U} + C\dot{U} + KU = F$    (4)

This is the governing equation of the physical model, which is characterized by its mass matrix M, its stiffness matrix K and its damping matrix C; the vector on the right-hand side describes the x, y and z components of the forces acting on the nodes. Equation (4) is known as the governing equation in the FEM method, and may be interpreted as assigning a certain mass to each nodal point and a certain material stiffness and damping between nodal points.

The main drawback of the FEM is the large computational expense. Another important problem when using the FEM for vision is that all the degrees of freedom are coupled; therefore, closed-form solutions are not possible. In order to solve the problem, the system may be diagonalized by means of a linear transform into the vibration modal space of the mesh modelling the ROI (Pentland, 1991). Thus, the vector U is transformed by a matrix P derived from the free vibration modes of the equilibrium equation. Therefore, an eigenvalue problem can be derived with the basis set of eigensolutions composed of {ω_i, φ_i}. The eigenvector φ_i is called the i-th mode's shape vector, and ω_i is the corresponding frequency of vibration. Each eigenvector φ_i consists of the (x, y, z) displacement for each mode that parameterizes the ROI. Usually the basis set is reduced by its Karhunen-Loève (KL) expansion.

The model may also be extended to contain a set of ROIs. Thus, for a given image in a training set, a vector containing the largest vibration modes describing the different deformable ROI surfaces is created. This random vector may be statistically constrained by retaining the most significant variation modes of its KL expansion on the image data set. By these means, the conjunction of ROI surfaces may be deformed according to the variability observed in the training set.

Figure 3 shows the results of the FEM model applied to a brain magnetic resonance image and a lung computed tomography image. Figure 3.a) shows the original structure inside the initial mesh surface, and b) shows the model obtained after evolving the spherical mesh by means of equation (3). Figure 3.c) shows the original lung structure, and d) the final model when also using a spherical mesh as the initial surface.
already been suggested (Ray, 2001), (Bueno, 2004), (Yu, 2002), (Bridson, 2005).

Moreover, these techniques have been broadly used to model real-life phenomena, like fire, water, cloth, fracturing materials, etc. Results and models from these research areas may be of practical significance if they are applied in Artificial Vision.

CONCLUSION

Energy minimizing active models have been shown to be a powerful technique to detect, track and model interfaces and shapes of ROIs. Since the work on ACM by (Kass, 1987), energy minimizing active models have been applied and improved by using different mathematical and physical techniques. The most promising models are those presented here: the geodesic and the FEM models, for ROI tracking and modeling respectively.

The models are suitable for colour and 3D images. The geodesic model may handle topological changes on the ROI surface and is not sensitive to initialization. The FEM model may represent the relative location of different ROI surfaces, and it is able to accommodate their significant variability across different images of the ROI. The surfaces of each ROI are parameterized by the amplitudes of the vibration modes of a deformable geometrical mesh, which can handle small rotations and translations. However, as mentioned before, there is still room for further improvements within these models.

REFERENCES

Bridson, R., Teran, J., Molino, N. & Fedkiw, R. (2005). Adaptive Physics Based Tetrahedral Mesh Generation Using Level Sets. Engineering with Computers, 21, 2-18.

Bueno, G. & Martínez, A. (2004). Fuzzy-Snake Segmentation of Anatomical Structures Applied to CT Images. Lecture Notes in Computer Science, vol. 3212, pp. 33-42.

Caselles, V., Kimmel, R. & Sapiro, G. (1997). Geodesic Active Contours. Int. J. Computer Vision, vol. 22(1), pp. 61-79.

Cootes, T.F., Cooper, D., Taylor, C. & Graham, J. (1992). A Trainable Method of Parametric Shape Description. Image and Vision Computing, vol. 10, pp. 289-294.

Duncan, J.S. & Ayache, N. (2000). Medical Image Analysis: Progress over Two Decades and the Challenges Ahead. IEEE Trans. on PAMI, vol. 22, pp. 85-106.

Ivins, J. & Porrill, J. (1994). Active Region Models for Segmenting Medical Images. IEEE Trans. on Image Processing, pp. 227-231.

Kass, M., Witkin, A. & Terzopoulos, D. (1987). Snakes: Active Contour Models. Int. J. of Computer Vision, pp. 321-331.

Malladi, R., Sethian, J. A. & Vemuri, B.C. (1995). Shape Modelling with Front Propagation: A Level Set Approach. IEEE Trans. on PAMI, vol. 17, pp. 158-175.

McInerney, T. & Terzopoulos, D. (1996). Deformable models in medical image analysis: A survey. Medical Image Analysis, vol. 2(1), pp. 91-108.

Osher, S. & Sethian, J. A. (1988). Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics, vol. 79, pp. 12-49.

Pentland, A. & Sclaroff, S. (1991). Closed-form solutions for physically based shape modelling and recognition. IEEE Trans. Pattern Anal. Machine Intelligence, vol. 13, pp. 730-742.

Ray, N., Havlicek, J., Acton, S. T. & Pattichis, M. (2001). Active Contour Segmentation Guided by AM-FM Dominant Component Analysis. IEEE Int. Conference on Image Processing, pp. 78-81.

Solaiman, B., Debon, R., Pipelier, F., Cauvin, J.-M. & Roux, C. (1999). Information Fusion: Application to Data and Model Fusion for Ultrasound Image Segmentation. IEEE Trans. on BioMedical Engineering, vol. 46(10), pp. 1171-1175.

Terzopoulos, D. & Metaxas, D. (1991). Dynamic 3D Models with Local and Global Deformations: Deformable Superquadrics. IEEE Transactions on PAMI, vol. 13(7), pp. 703-714.

Wang, H. & Ghosh, B. (2000). Geometric Active Deformable Models in Shape Modelling. IEEE Transactions on Image Processing, vol. 9(2), pp. 302-308.

Yu, Z. & Bajaj, C. (2002). Image Segmentation Using Gradient Vector Diffusion and Region Merging.
IEEE Int. Conference on Pattern Recognition, vol. 2, pp. 20941.

KEY TERMS

Active Model: A numerical technique for tracking interfaces and shapes based on partial differential equations. The model is a curve or surface which iteratively deforms to fit an object in an image.

Conventional Mathematical Modeling: The applied science of creating computerized models; that is, a theoretical construct that represents a system composed of a set of regions of interest, with a set of parameters, both variables together with logical and quantitative relationships between them, expressed in mathematical language to describe the behavior of the system. Parameters are determined by finding a curve in 2D, or a surface in 3D each patch of which is defined by a net of curves in two parametric directions, that matches a series of data points and possibly other constraints.

Finite Element Method: Numerical technique used for finding approximate solutions of partial differential equations (PDE) as well as of integral equations. The solution approach is based either on eliminating the differential equation completely (steady-state problems), or on rendering the PDE into an equivalent ordinary differential equation, which is then solved using standard techniques such as finite differences.

Geodesic Curve: In the presence of a metric, geodesics are defined to be (locally) the shortest paths between points on the space. In the presence of an affine connection, geodesics are defined to be curves whose tangent vectors remain parallel if they are transported along them. Geodesics describe the motion of point particles.

Karhunen-Loève: Mathematical technique, equivalent to the Principal Component Analysis transform, aiming to reduce multidimensional data sets to lower dimensions for analysis of their variance.

Modal Analysis: Study of the dynamic properties and response of structures and/or fluids under vibrational excitation. Typical excitation signals can be classed as impulse, broadband, swept sine, chirp, and possibly others. The resulting response will show one or more resonances, whose characteristic mass, frequency and damping can be estimated from the measurements.

Tracking: Tracking is the process of locating a moving object (or several ones) in time. An algorithm analyses the image sequence and outputs the location of moving targets within the image. There are two major components of a visual tracking system: target representation and localization, and filtering and data association. The first is mostly a bottom-up process which involves segmentation and matching. The second is mostly a top-down process, which involves incorporating prior information about the scene or object, dealing with object dynamics, and evaluating different hypotheses.
Ensemble of ANN for Traffic Sign Recognition
[Figure: system architecture. Feature selection modules X1…Xk (DPM) map two-class binary patterns to k binary ANNs NN1…NNk (CLM); a decision module combines their outputs into the multi-class system output.]
The proposed architecture adopts a model in which the feature subset that describes an example is not unique but depends on the task associated with each classifier. In other words, since the classification problem is divided into k binary sub-problems, k feature selection procedures are necessary.

In this work, the feature selection module has been built using the Weka tool (Witten & Frank, 2005). At first, several feature selection algorithms among those included in Weka were considered (Sesmero, Alonso-Weber, Gutiérrez, Ledezma & Sanchis, 2007). After analyzing both the feature set size and the experimental results, a combination of Best First (Russell & Norvig, 2003) and Correlation-based Feature Selection (Hall, 1998) was selected as the base for the DPM construction.

Classification Module

The Classification Module is based on a One-Against-All (OAA) model. In this modeling, the final classifier is composed of a collection of binary classifiers, each of which is specialized in discriminating a specific road sign type from the others. Therefore, for a classification problem where k road sign types have to be separated, this approach results in a system in which a different NN is used for each existing class.

Decomposing the global classifier into a set of independent NNs not only reduces the complexity problem but also allows the DPM to select the most significant attribute set for each binary classification task. In addition, each NN can have its own architecture (number of hidden nodes, activation function, learning rate, etc.), and, since there is no connection between the individual networks, the training can be performed by distributing the work over several processors.

ANNs Output Combination

Once the binary NNs have been trained, the global classifier system can be generated. However, since each classifier makes its own prediction, a decision module that integrates the results from the set of classifiers and produces a unique final classification is required. Experimentally it is found that, for the proposed classification task, the most efficient decision criterion is selecting the NN with the highest output value. Therefore, the formula used in the decision module is:

$f(x, f_1, f_2, \ldots, f_k) = \arg\max_{i=1,\ldots,k}(f_i)$    (1)

where f_i is the output value of the neural network associated with the i-th class.

Classification Process

When the system receives an unlabeled road sign to be classified into one of the fixed categories, the sign is sent to each classifier's input module. The DPM selects the pixel subset according to its relevant attribute list. The chosen pixels are used as the input for the associated ANN.
This ANN applies its knowledge to make a prediction. The individual predictions are sent to the decision module, which carries out an integration of the received information and produces a unique final classification. This process is shown in Figure 2.

[Figure 2: classification process. An unlabeled road sign is fed to the networks NN1…NNk, whose outputs f1…fk are combined by the decision module.]

Empirical Evaluation

The proposed system has been validated over 5000 examples arranged in ten generic kinds of prohibition road signs: no pedestrians, no left/right turn ahead, no stopping and no parking, no overtaking, and 20-30-40-50-60 and 100 km/h speed limits.

In order to evaluate our approach, three classification methods have been compared:

The direct multi-class approach,
The OAA approach with the full feature space, and
The OAA approach with feature selection.

In the direct multi-class approach (experiment 1), the classification problem has been solved with an MLP with 1024 (32 × 32) input nodes, one hidden layer with 50 neurons and one output layer with 10 neurons. In this approach the class associated with each learning pattern is encoded using a vector C, which has as many components ci as existing classes (10). The component value ci will be 1 if the sign belongs to class i, and 0 in any other case.

In the OAA approach with the full feature space (experiment 2), the previous net is split into ten binary ANNs. In other words, this approach uses 10 binary MLPs with 1024 input nodes, 36 hidden nodes and 1 output node.

Finally, in the OAA approach with feature selection (experiment 3), the problem is solved with an ensemble containing 10 binary MLPs with 36 hidden nodes each. The number of input units and, therefore, the feature space used by each ANN is determined by the DPM. This number is shown in Table 1.

In order to build the binary classifiers used in the last two experiments, a new class encoding schema is necessary. In both cases, the class associated with each pattern is encoded using a single bit. Since in both experiments the i-th binary classifier is trained to distinguish class i from all the other classes, the new encoding is equivalent to selecting the ci component from the previous codification.

In Table 2 we show the estimated classification accuracy for the described experiments when a 10-fold cross-validation process is used.

The experimental evaluation reflects that splitting the classification task into binary subtasks (experiment 2) increases the classification accuracy. On the other hand, the loss of classification accuracy when the feature selection process is performed (experiment 3) is not very significant compared with the benefits of the drastic input data reduction.

FUTURE TRENDS

The future work will be mainly focused on extending the system in order to cope with regulatory, warning, indication, etc., signs, i.e., with a bigger number of classes. This task will allow us to investigate and de- [...]
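The decision rule of equation (1) is simply an arg max over the k binary network outputs. A minimal sketch (with made-up output values, not the trained MLPs):

```python
def decide(outputs):
    """Return the 1-based class index i maximizing f_i, as in equation (1)."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return best + 1

# e.g. ten binary outputs f_1..f_10; the network with the highest value wins
scores = [0.05, 0.10, 0.92, 0.08, 0.11, 0.02, 0.30, 0.07, 0.20, 0.01]
print(decide(scores))  # prints 3
```

Because each binary network is trained independently, the outputs are not calibrated probabilities; the experimental finding reported above is that the raw maximum is nevertheless the most effective criterion for this task.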
Table 1. Number of selected features; in the first column appears the label of each class
fication. IEEE Transactions on Industrial Electronics, (44) 6, 848-859.

Franke, U., Gavrila, D., Görzig, S., Lindner, F., Paetzold, F. & Wöhler, C. (1998). Autonomous Driving Goes Downtown. IEEE Intelligent Systems, (13) 6, 40-48.

Hall, M.A. (1998). Correlation-based Feature Selection for Machine Learning. Ph.D. diss. Hamilton, NZ: Waikato University Department of Computer Science.

Handmann, U., Kalinke, T., Tzomakas, C., Werner, M. & Seelen, W. (1999). An Image Processing System for Driver Assistance. Image and Vision Computing, (18) 5, 367-376.

Hsien, J.C. & Chen, S.Y. (2003). Road Sign Detection and Recognition Using Markov Model. 14th Workshop on Object-Oriented Technology and Applications, 529-536.

Hsu, S.H. & Huang, C.L. (2001). Road sign detection and recognition using matching pursuit method. Image and Vision Computing, (19) 3, 119-129.

Kubat, M., Bratko, I. & Michalski, R. (1998). A Review of Machine Learning Methods. In Michalski, R.S., Bratko, I. & Kubat, M. (Eds.), Machine Learning and Data Mining: Methods and Applications, John Wiley and Sons, 3-70.

Lalonde, M. & Li, Y. (1995). Road Sign Recognition. Collection scientifique et technique, CRIM-IIT-95/09-35.

Lu, B.L. & Ito, M. (1999). Task Decomposition and Module Combination Based on Class Relations: A Modular Neural Network for Pattern Classification. IEEE Transactions on Neural Networks, (10) 5, 1244-1256.

Masulli, F. & Valentini, G. (2000). Comparing Decomposition Methods for Classification. 4th International Conference on Knowledge-Based Intelligent Engineering Systems & Allied Technologies, 788-791.

Ou, G. & Murphey, Y.L. (2007). Multi-Class Pattern Classification Using Neural Networks. Pattern Recognition, (40) 1, 4-18.

Paclík, P., Novovičová, J., Pudil, P. & Somol, P. (1999). Road sign classification using the Laplace Kernel Classifier. Proceedings of the 11th Scandinavian Conference on Image Analysis, (1), 275-282.

Pomerleau, D. & Jochem, T. (1996). Rapidly Adapting Machine Vision for Automated Vehicle Steering. IEEE Expert Magazine, (11) 2, 19-27.

Russell, S.J. & Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Prentice Hall.

Sesmero, M.P., Alonso-Weber, J.M., Gutiérrez, G., Ledezma, A. & Sanchis, A. (2007). Testing Feature Selection in Traffic Signs. Proceedings of the 11th International Conference on Computer Aided Systems Theory, 396-398.

Soetedjo, A. & Yamada, K. (2005). Traffic Sign Classification Using Ring Partitioned Method. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, (E88-A) 9, 2419-2426.

Witten, I.H. & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.

Yang, H.M., Liu, C.L. & Huang, S.M. (2003). Traffic Sign Recognition in Disturbing Environments. Proceedings of the 14th International Symposium on Methodologies for Intelligent Systems, 28-31.

Zhu, J. & Sutton, P. (2003). FPGA Implementations of Neural Networks: A Survey of a Decade of Progress. In Field-Programmable Logic and Applications. Lecture Notes in Computer Science, (2778), 1062-1066.

KEY TERMS

Artificial Neural Network: Structure composed of a group of interconnected artificial neurons or units. The objective of a NN is to transform the inputs into meaningful outputs.

Correlation-based Feature Selection: Feature selection algorithm whose heuristic measures the correlation between attributes and rewards those feature subsets in which each feature is highly correlated with the class and uncorrelated with the other subset features.

Feature Selection: Process, commonly used in machine learning, of identifying and removing as much of the irrelevant and redundant information as possible.
Feature Space: n-dimensional space where each example (pattern) is represented as a point. The dimension of this space is equal to the number of features used to describe the patterns.

Field Programmable Gate Array (FPGA): An FPGA is an integrated circuit that can be programmed in the field after manufacture.

K-Cross-Validation: Method to estimate the accuracy of a classifier system. In this approach, the dataset, D, is randomly split into K mutually exclusive subsets (folds) of equal size (D1, D2, ..., DK) and K classifiers are built. The i-th classifier is trained on the union of all Dj, j != i, and tested on Di. The estimated accuracy is the overall number of correct classifications divided by the number of instances in the dataset.

Machine Learning: Scientific field of computer science focused on the design, analysis and implementation of algorithms that learn from experience.

One Against All: Approach to solve multi-class classification problems which creates one binary problem for each of the K classes. The classifier for class i is trained to distinguish examples in class i from all other examples.

Weka: Collection of machine learning algorithms for solving data mining problems, implemented in Java and open sourced under the GPL.
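The K-cross-validation estimate defined above can be sketched as follows (a minimal illustration; the striding fold assignment and function names are our own choices):

```python
import random

def k_fold_accuracy(data, labels, k, train_fn, predict_fn, seed=0):
    """K-cross-validation: split the dataset into K mutually exclusive
    folds, build K classifiers (the i-th trained on all folds but Di and
    tested on Di), and pool the correct classifications."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # K disjoint subsets
    correct = 0
    for i in range(k):
        train = [j for f in range(k) if f != i for j in folds[f]]
        model = train_fn([data[j] for j in train], [labels[j] for j in train])
        correct += sum(predict_fn(model, data[j]) == labels[j] for j in folds[i])
    # overall correct classifications / number of instances
    return correct / len(data)
```

Every instance is tested exactly once, by the one classifier that never saw it during training.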
Manuel Martín-Merino
Universidad Pontificia de Salamanca, Spain
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Ensemble of SVM Classifiers for Spam Filtering
When the emails are codified in high dimensional and noisy spaces, the dissimilarities mentioned above are affected by the 'curse of dimensionality' (Aggarwal, 2001; Martín-Merino, 2004). Hence, most of the dissimilarities become almost constant and the differences among dissimilarities are lost (Hinneburg, 2000; Martín-Merino, 2005). This problem can be avoided by selecting a small number of features before the dissimilarities are computed.

COMBINING DISSIMILARITY BASED CLASSIFIERS

In this section, we explain how the SVM can be extended to work directly from a dissimilarity measure. Next, the ensemble of classifiers based on multiple dissimilarities is presented. Finally, we comment briefly on the related work.

The SVM is a powerful machine learning technique that is able to deal with high dimensional and noisy data (Vapnik, 1998). In spite of this, the original SVM algorithm is not able to work directly from a dissimilarity matrix. To overcome this problem, we follow the approach of (Pekalska, 2001). First, the dissimilarities are embedded into a Euclidean space such that the inter-pattern distances reflect approximately the original dissimilarity matrix. Next, the test points are embedded via a linear algebra operation, and finally the SVM is trained and evaluated. We comment briefly on the mathematical details.

Let D ∈ R^(n×n) be the dissimilarity matrix made up of the object proximities for the training set. A configuration in a low dimensional Euclidean space can be found via a metric multidimensional scaling (MDS) algorithm (Cox, 2001) such that the original dissimilarities are approximately preserved. Let X = [x1, ..., xn]^T ∈ R^(n×p) be the matrix of the object coordinates for the training patterns. Define B = X X^T as the matrix of inner products, which is related to the dissimilarity matrix via the following equation:

B = -1/2 J D^(2) J    (1)

The object coordinates can then be recovered through a singular value decomposition (Golub, 1996):

Xk = Vk Λk^(1/2),    (2)

where Vk ∈ R^(n×k) is an orthogonal matrix whose columns are the first k eigenvectors of X X^T and Λk = diag(λ1, ..., λk) ∈ R^(k×k) is a diagonal matrix with λi the i-th eigenvalue. Several dissimilarities introduced in section 2 generate inner product matrices B that are not positive semi-definite. Fortunately, the negative values are small in our application and can therefore be neglected without losing relevant information about the data (Pekalska, 2001).

Once the training patterns have been embedded into a low dimensional Euclidean space, the test patterns can be added to this space via a linear projection (Pekalska, 2001). Next we comment briefly on the derivation.

Let Xk ∈ R^(n×k) be the object configuration for the training patterns in R^k and Xn = [x1, ..., xs]^T ∈ R^(s×k) the matrix of the object coordinates sought for the test patterns. Let Dn^(2) ∈ R^(s×n) be the matrix of the square dissimilarities between the s test patterns and the n training patterns that have already been projected. The matrix Bn ∈ R^(s×n) of inner products among the test and training patterns can be found as:

Bn = -1/2 (Dn^(2) J - U D^(2) J),    (3)

where J ∈ R^(n×n) is the centering matrix and U = (1/n) 1 1^T ∈ R^(s×n). The derivation of this equation is detailed in (Pekalska, 2001). Since the matrix of inner products verifies

Bn = Xn Xk^T,    (4)

Xn can be found as the least mean-square error solution to (4), that is:

Xn = Bn Xk (Xk^T Xk)^(-1)    (5)

Given that Xk^T Xk = Λk and considering that Xk = Vk Λk^(1/2), the coordinates for the test points can be obtained as:
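The embedding pipeline of equations (1)-(5) can be sketched numerically as follows (a minimal illustration with NumPy; the function names are ours, and an eigendecomposition of the centered inner-product matrix stands in for the SVD):

```python
import numpy as np

def mds_embed_train(D2, k):
    """Classical MDS embedding of squared dissimilarities (eqs. 1-2):
    B = -1/2 J D2 J, then Xk = Vk Lk^(1/2) from the top-k eigenpairs."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J
    w, V = np.linalg.eigh(B)                 # ascending eigenvalues
    order = np.argsort(w)[::-1][:k]
    Lk = np.clip(w[order], 0.0, None)        # neglect small negative values
    Vk = V[:, order]
    return Vk * np.sqrt(Lk)

def mds_embed_test(D2n, D2, Xk):
    """Linear projection of test patterns (eqs. 3-5):
    Bn = -1/2 (Dn2 J - U D2 J), then Xn = Bn Xk (Xk^T Xk)^(-1)."""
    s, n = D2n.shape
    J = np.eye(n) - np.ones((n, n)) / n
    U = np.ones((s, n)) / n
    Bn = -0.5 * (D2n @ J - U @ D2 @ J)
    return Bn @ Xk @ np.linalg.inv(Xk.T @ Xk)
```

For an exact Euclidean squared-distance matrix this embedding preserves the original distances, and projecting a training pattern as if it were a test pattern returns its own embedded coordinates.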
Figure 1. Aggregation of classifiers using a voting strategy. Bold patterns are misclassified by a single hyperplane but not by the combination.

First, our method generates the diversity of classifiers by considering different dissimilarities and thus will induce a stronger diversity among classifiers. A second advantage of our method is that it is able to work directly with a dissimilarity matrix. Finally, the combination of several dissimilarities avoids the problem of choosing a particular dissimilarity for the application we are dealing with. This is a difficult and time-consuming task.

Notice that the algorithm proposed earlier can be easily applied to other distance-based classifiers, such as the k-nearest neighbor algorithm.
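The voting aggregation of Figure 1 can be written compactly as follows (a minimal illustration; treating each classifier as a callable is our assumption, not the article's interface):

```python
from collections import Counter

def vote(classifiers, pattern):
    """Aggregate classifiers trained with different dissimilarities by
    majority voting over their individual predictions; ties resolve in
    favor of the prediction encountered first."""
    return Counter(clf(pattern) for clf in classifiers).most_common(1)[0][0]
```

A pattern misclassified by one hyperplane is still labeled correctly as long as the majority of the ensemble gets it right.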
EXPERIMENTAL RESULTS
The dissimilarities have been computed without normalizing the variables, because this may increase the correlation among them. Once the dissimilarities have been embedded in a Euclidean space, the variables are normalized to unit variance and zero mean. This preprocessing improves the SVM accuracy and the speed of convergence.

Regarding the ensemble of classifiers, an important issue is the dimensionality in which the dissimilarity matrix is embedded. To this aim, a metric Multidimensional Scaling algorithm is first run. The number of eigenvectors considered is determined by the curve induced by the eigenvalues. For the dataset considered, Figure 2 shows that the first twenty eigenvalues preserve the main structure of the dataset.

The combination strategy proposed in this paper has also been applied to the k-nearest neighbor classifier. An important parameter in this algorithm is the number of neighbors, which has been estimated using 20% of the patterns as a validation set.

The classifiers have been evaluated from two different points of view: on the one hand, we have computed the misclassification errors. But in our application, false positive errors are very expensive and should be avoided. Therefore false positive errors are also computed.
Table 1. Experimental results for the ensemble of SVM classifiers. Classifiers based solely on a single dissimilar-
ity and Bagging have been taken as reference
Table 2. Experimental results for the ensemble of k-NN classifiers. Classifiers based solely on a single dissimi-
larity and Bagging have been taken as reference
Finally, the errors have been evaluated considering a subset of 20% of the patterns drawn randomly without replacement from the original dataset.

Table 1 shows the experimental results for the ensemble of classifiers using the SVM. The method proposed has been compared with bagging, introduced in section 3, and with classifiers based on a single dissimilarity.

From the analysis of Table 1, the following conclusions can be drawn:

• The combination strategy improves significantly on the Euclidean distance, which is usually considered by most SVM algorithms.

• The combination strategy with polynomial kernel reduces significantly the false positive errors of the best single classifier. The improvement is smaller for the linear kernel. This can be explained because the non-linear kernel allows us to build classifiers with larger variance, and therefore the combination strategy can achieve a larger improvement of the false positive errors. We also report that, for the combination strategy, as the C parameter increases the false positive errors converge to 0, although the false negative errors increase.

• The combination strategy proposed outperforms a widely used aggregation method such as Bagging. The improvement is particularly important for the polynomial kernel.

Table 2 shows the experimental results for the ensemble of k-NN classifiers. As in the previous case, the combination strategy proposed improves particularly the false positive errors of classifiers based on a single distance. We also report that Bagging is not able to reduce the false positive errors of the Euclidean distance. Besides, our combination strategy improves significantly on the Bagging algorithm. Finally, we observe that the misclassification errors are larger for k-NN than for the SVM. This can be explained because the SVM has a higher generalization ability.

CONCLUSIONS AND FUTURE RESEARCH TRENDS

In this paper, we have proposed an ensemble of classifiers based on a diversity of dissimilarities. Our approach aims to reduce particularly the false positive errors of classifiers based solely on a single distance. Besides, the algorithm is able to work directly from a dissimilarity matrix. The algorithm has been applied to the identification of spam messages.

The experimental results suggest that the method proposed helps to improve both misclassification errors and false positive errors. We also report that our algorithm outperforms classifiers based on a single dissimilarity and other combination strategies such as bagging.

As future research trends, we will try to apply other combination strategies that assign a different weight to each classifier.

REFERENCES

Aggarwal, C. C. Re-designing distance functions and distance-based applications for high dimensional applications. Proc. of the ACM International Conference on Management of Data and Symposium on Principles of Database Systems (SIGMOD-PODS), vol. 1, March 2001, pp. 13-18.

Androutsopoulos, I., Koutsias, J., Chandrinos, K. V., Spyropoulos, C. D. An experimental comparison of naive bayesian and keyword-based anti-spam filtering with personal e-mail messages. 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 160-167, Athens, Greece, 2000.

Bauer, E., Kohavi, R. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, vol. 36, pp. 105-139, 1999.

Breiman, L. Bagging predictors. Machine Learning, vol. 24, pp. 123-140, 1996.

Carreras, X., Màrquez, L. Boosting Trees for Anti-Spam Email Filtering. RANLP-01, Fourth International Conference on Recent Advances in Natural Language Processing, Tzigov Chark, BG, 2001.

Cox, T., Cox, M. Multidimensional Scaling, 2nd ed. New York: Chapman & Hall/CRC Press, 2001.

Drucker, H., Wu, D., Vapnik, V. N. Support Vector Machines for Spam Categorization. IEEE Transactions on Neural Networks, 10(5), 1048-1054, September, 1999.
Fawcett, T. "In vivo" spam filtering: A challenge problem for KDD. ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), 5(2), 140-148, December, 2003.

Golub, G.H., Loan, C.F.V. Matrix Computations, 3rd ed. Baltimore, Maryland, USA: Johns Hopkins University Press, 1996.

Hastie, T., Tibshirani, R., Friedman, J. H. The Elements of Statistical Learning. Springer Verlag, Berlin, 2001.

UCI Machine Learning Database. Available from: www.ics.uci.edu/~mlearn/MLRepository.html.

Hershkop, S., Salvatore, J. S. Combining Email Models for False Positive Reduction. ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), 98-107, August, Chicago, Illinois, 2005.

Hinneburg, A., Aggarwal, C. C., Keim, D. A. What is the nearest neighbor in high dimensional spaces? Proc. of the International Conference on Database Theory (ICDT). Cairo, Egypt: Morgan Kaufmann, September 2000, pp. 506-515.

Kittler, J., Hatef, M., Duin, R., Matas, J. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226-239, March 1998.

Kolcz, A., Alspector, J. SVM-based filtering of e-mail spam with content-specific misclassification costs. Workshop on Text Mining (TextDM'2001), 1-14, San Jose, California, 2001.

Martín-Merino, M., Muñoz, A. A new Sammon algorithm for sparse data visualization. International Conference on Pattern Recognition (ICPR), vol. 1. Cambridge (UK): IEEE Press, August 2004, pp. 477-481.

Martín-Merino, M., Muñoz, A. Self organizing map and Sammon mapping for asymmetric proximities. Neurocomputing, vol. 63, pp. 171-192, 2005.

Pekalska, E., Paclík, P., Duin, R. A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research, vol. 2, pp. 175-211, 2001.

Provost, F., Fawcett, T. Robust Classification for Imprecise Environments. Machine Learning, 42, 203-231, 2001.

Valentini, G., Dietterich, T. Bias-variance analysis of support vector machines for the development of SVM-based ensemble methods. Journal of Machine Learning Research, vol. 5, pp. 725-775, 2004.

Vapnik, V. Statistical Learning Theory. New York: John Wiley & Sons, 1998.

KEY TERMS

Bootstrap: Resampling technique based on several random samples drawn with replacement.

Dissimilarity: A measure of proximity that does not necessarily obey the triangle inequality.

Kernel: Non-linear transformation to a high dimensional feature space.

K-NN: K-Nearest Neighbor algorithm for classification purposes.

MDS: Multidimensional Scaling algorithm, applied for the visualization of high dimensional data.

SVD: Singular Value Decomposition. Linear algebra operation that is used by many optimization algorithms.

SVM: Support Vector Machines classifier.

UCE: Unsolicited Commercial Email, also known as Spam.
David Klimanek
Czech Technical University in Prague, Czech Republic
Evolutionary Algorithms in Discredibility Detection
(1) is evaluated. Also, the average value of the residual function values is computed.

2. New set of parameter vectors: after starting the iteration process, a new set of parameter vectors is generated (in ET, a new population) when the selection operator is employed. In selecting the set of new parameter vectors, the following algorithm is used: the parameter vector of the sensor model that provides residual values lower than the average value is replicated into the next subset for generating new parameter candidates in more copies than in the original set, and the individuals with above-average residual values are rejected (Witczak, Obuchowicz, Korbicz, 2002).

3. Crossover and mutation (ET): next comes the crossover operation over randomly selected pairs of the parameter vectors of the sensor model from the topical set. In the presented application of the standard genetic algorithm, the selected sensor model parameters are coded into binary strings and the standard one-point crossover operator is used. The mutation operator mimics random mutations (Fleming, Purshouse, 1995): the newly created parameters are coded into binary strings and one bit of each string is switched with random probability. The value of the residual function (1) for the current run is evaluated. Also, the average value of the residual function values is computed.

4. Stopping-criterion decision: if the stopping criterion is not met, a return to step 2 repeats the process. The stopping criterion is met, e.g., when the size of the difference between the average of the residual values from the current run and the average of the residual values from the previous run is lower than the given size (Fleming, Purshouse, 2002).

in step 3. A vector of random values of the sensor model parameters is selected and the value of the residual function is computed.

6. New set of parameter vectors: the iteration index is increased and, using a stochastic strategy, a new vector of the sensor model parameters is randomly generated (in ET, one speaks about generating new individuals) and the corresponding value of the residual variable is obtained (in ET, the value of the cost function).

7. Boltzmann criterion computation: the difference between the residual value obtained in step 2 and the residual value from the previous iteration is evaluated. If the difference is negative, then the new parameter vector is accepted automatically. Otherwise, the algorithm may accept the new parameter vector based on the Boltzmann criterion. The control parameter is weighted with a coefficient (in ET, gradual temperature reduction). If the control parameter is less than or equal to the given final control parameter, then the stop criterion is met and the current vector of the sensor model parameters is accepted. Otherwise, returning to step 3 repeats the process of the optimizing search.

Comparison of Usability of the Algorithms for Discredibility Detection

The comparison of both evolutionary algorithms applied to controlled variable sensor discredibility detection

Figure 2. A flow chart of the simulated annealing algorithm applied for discredibility detection
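The Boltzmann acceptance rule of step 7 can be sketched as follows (a minimal illustration; the function name, temperature handling and example values are ours, not from the article):

```python
import math
import random

def boltzmann_accept(delta, temperature, rng=random.random):
    """Boltzmann criterion: a candidate whose residual difference is
    negative (an improvement) is accepted automatically; a worse
    candidate is accepted with probability exp(-delta / T), which
    shrinks as the annealing temperature T is gradually reduced."""
    if delta < 0:
        return True
    return rng() < math.exp(-delta / temperature)
```

At high temperature the search still accepts worsening steps often (helping it escape local minima of the residual function); as the temperature is reduced by the cooling coefficient, acceptance of worse candidates becomes increasingly rare.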
is shown by Figure 5. This comparison represents a part of the results from the paper Sulc, Klimanek (2006). It is evident that the simulated annealing algorithm needs more evaluation time for one evaluation period: a period for simulated annealing required 80 iterations, while the genetic algorithm needed 40 iterations. This difference arises because the genetic algorithm works with a group of potential solutions, while simulated annealing compares only two potential solutions and accepts the better one.

No difference was found between the two evolutionary algorithms used here; their good convergence depends mainly on the algorithm settings. Although evolutionary algorithms are generally much more time consuming than other optimizing procedures, this consideration does not matter in control variable sensor discredibility detection. This is because control variable sensor discredibility has no conclusive impacts on the control results, and the time needed for the detection does not affect the control process.

Testing Model-Based Sensor Discredibility Detection Method

The model-based control variable sensor discredibility detection method using evolutionary algorithms was tested to find whether the method is able to detect control variable sensor property changes via the presented evolutionary algorithms. The simulation experiments are described in more detail in (Klimanek, Sulc 2006a; Klimanek, Sulc 2006b). Results from the simulated experiments were summarized and are graphically demonstrated in the next paragraph.
Figure 3. Detection of gradual changes of the level sensor gain via genetic algorithm and the simulated anneal-
ing algorithm
Results and Findings from the Tests

Figure 4 depicts a simulation run during which the sensor gain has been gradually decreased from a starting (correct) value. It can be seen that after the sensor properties have been changed, the measured value of the controlled variable (in this case the water level) is different from the correct value.

It is apparent that the algorithm used for sensor model parameter detection (in this case the genetic algorithm) is able to find the sensor model gain km, because the sensor model parameter development corresponds to the simulated real sensor parameter changing.
Figure 4. Detection of gradual changes of the level sensor gain via genetic algorithm
The sensor level discredibility detection results obtained using the simulated annealing algorithm were similar. Figure 5 shows the results obtained when the model-based method using the simulated annealing algorithm was tested. A step change of sensor gain was simulated, and it is obvious that the algorithm was able to capture the change.

By this method the operator is informed about the estimated time remaining before sensor discredibility occurs. If the time is critical, the operator also receives a timely warning about the situation.
Figure 5. Detection of the step change of the level sensor gain via simulated annealing method
Venkatasubramanian, V., Rengaswamy, R. (2003). A Review of Process Fault Detection and Diagnosis. Quantitative Model-based Methods. Computers & Chemical Engineering, Vol. 27, No. 3, 293-311.

Witczak, M. (2003). Identification and Fault Detection of Non-Linear Dynamic Systems. Poland, University of Zielona Gora Press.

Witczak, M., Obuchowicz, A., Korbicz, J. (2002). Genetic Programming Based Approaches to Identification and Fault Diagnosis of Non-linear Dynamic Systems. International Journal of Control, Vol. 75, No. 13, 1012-1031.

KEY TERMS

The following terms and definitions introduce a fault detection engineering interpretation of the terms usual in the evolutionary algorithm vocabulary. This should facilitate orientation in the presented engineering problem.

Chromosome: A particular sensor model parameter vector (a term for individuals used in evolutionary terminology).

Cost Function: A criterion evaluating the level of congruence between the sensor model output and the real sensor output. In the fault detection terminology, the cost function corresponds to the term residuum (or residual function).

Evolutionary Time: The number assigned to steps in the sequence of iterations performed during a search for sensor model parameters based on evolutionary development.

Individual: A vector of the sensor model parameters in a set of possible values (see Population).

Initial Annealing Temperature: An initial algorithm parameter. The annealing temperature is used as a measure of evolutionary progress during the simulated annealing algorithm run.

Population: A set of the vectors of the sensor model parameters with which the sensor model has a chance to approach the minimum of the residual function.

Population Size: The number of the sensor model parameter vectors taken into consideration in the population.

Sensor Discredibility: A stage of the controlled variable sensor at which the sensor is not completely out of function yet, but its properties have gradually changed to the extent that the data provided by the sensor are so biased that the tolerated inaccuracy of the controlled variable is over-ranged, usually with possible side effects.
Andrea G. B. Tettamanzi
University of Milan, Italy
Evolutionary Approaches for ANNs Design
optimum in a neighbourhood of the initial solution. EANNs provide a solution to these problems and an alternative for controlling network complexity.

ANN design can be regarded as an optimization problem. Tettamanzi and Tomassini (2001) presented a discussion about evolutionary systems and their interaction with neural and fuzzy systems, and Cantu-Paz and Kamath (2005) also described an empirical comparison of EAs and ANNs for classification problems.

EVOLUTIONARY ARTIFICIAL NEURAL NETWORKS

There are several approaches to evolving ANNs, which usually fall into two broad categories: problem-independent and problem-dependent representations for EAs. The former are based on a general representation, independent of the type and structure of the ANN sought for, and require the definition of an encoding scheme suitable for Genetic Algorithms (GAs). They can include mapping between ANNs and a binary representation, taking care of decoders or repair algorithms, but this task is not usually easy. The latter are EAs where the chromosome representation is a specific data structure that naturally maps to an ANN, to which appropriate genetic operators apply.

EAs are used to perform various tasks, such as connection weight training, architecture design, learning rule adaptation, input feature selection, connection weight initialization, rule extraction from ANNs, etc. Three of them are considered the most popular, at the following levels:

• Connection weights: concentrates just on weight optimization, assuming that the architecture of the network is given. The evolution of weights introduces an adaptive and global approach to training, especially in the reinforcement learning and recurrent network learning paradigms, where gradient-based training algorithms often experience great difficulties.

• Learning rules: can be regarded as a process of "learning how to learn" in ANNs, where the adaptation of learning rules is achieved through evolution. It can also be regarded as an adaptive process of automatic discovery of novel learning rules.

• Architecture: enables ANNs to adapt their topologies to different tasks without human intervention. It also provides an approach to automatic ANN design, as both weights and structures can be evolved. In this case a further subdivision can be made by defining a pure architecture evolution and a simultaneous evolution of both architecture and weights.

Other approaches consider the evolution of transfer functions of an ANN and input feature selection, but they are usually applied in conjunction with one of the three methods above in order to obtain better results.

The use of evolutionary learning for ANN design is no more than two decades old. However, substantial work has been done in these years, whose main outcomes are presented below.

Weight Optimization

Evolution of weights may be regarded as an alternative training algorithm. The primary motivation for using evolutionary techniques instead of traditional gradient-descent techniques such as BP, as reported by Rumelhart et al. (1986), lies in avoiding trapping in local minima and the requirement that the activation function be differentiable. For this reason, rather than adapting weights based on local improvement only, EAs evolve weights based on the fitness of the whole network.

Some approaches use GAs with real encodings for biases and weights, as in the work presented by Montana and Davis (1989); others used binary weight encodings at first, and then implemented a modified version with real encodings, as Whitley et al. (1990). Mordaunt and Zalzala (2002) implemented a real number representation to evolve weights, analyzing evolution with mutation and a multi-point crossover, while Seiffert (2001) described an approach to completely substitute a traditional gradient descent algorithm with a GA in the training phase.

Often, during the application of GAs, some problems, e.g., premature convergence and stagnation of the solution, can occur, as reported by Goldberg (1992). In order to solve this problem, an improved algorithm was proposed by Yang et al. (2002), where a genetic algorithm based on an evolutionarily stable strategy was implemented to keep the balance between population diversity and convergence speed during evolution.
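As a concrete illustration of evolving real-encoded weights by whole-network fitness, the following sketch uses truncation selection, uniform crossover and Gaussian mutation (the operators and parameter choices are our own, not those of any of the cited works):

```python
import random

def evolve_weights(fitness, n_weights, pop_size=20, generations=50,
                   sigma=0.1, seed=0):
    """Evolve weight vectors by the fitness of the whole network
    (higher is better) instead of gradient descent: keep the best
    half of the population, recombine pairs of parents by uniform
    crossover, and perturb the children with Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.gauss(0.0, 1.0) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            children.append([w + rng.gauss(0.0, sigma) for w in child])
        pop = parents + children                 # elitist replacement
    return max(pop, key=fitness)
```

Because the fitness is evaluated on the whole weight vector, the scheme needs no differentiable activation function, which is exactly the motivation given above for evolutionary training.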
Recently, a new GA was proposed by Pai (2004), where a genetic inheritance operator was implemented to determine the weights of the EANN, without considering mutation operators but only two-point crossover for reproduction, applied to decimal chromosomes.

Learning Rule Optimization

In supervised learning algorithms, standard BP is the most popular method for training multilayer networks. The design of training algorithms, in particular the learning rules used to adjust connection weights, depends on the type of ANN architecture considered. Several standard learning rules have been proposed, but designing an optimal learning rule becomes very difficult when there is little prior knowledge about the network topology, producing a very complex relationship between evolution and learning. The evolutionary approach becomes important in modelling the creative process, since newly evolved learning rules can deal with a complex and dynamic environment.

The first kind of optimization considers the adjustment of learning parameters and can be seen as the first attempt to evolve learning rules. These comprise BP parameters, like the learning rate and momentum, and genetic parameters, like mutation and crossover probabilities. Work in this area includes Merelo et al. (2002), who presented several solutions for the optimal learning parameters of multilayer competitive-learning neural networks.

Considering learning-rule optimization, one of the first studies was conducted by Chalmers (1990). He noticed that discovering complex learning rules with GAs is not easy: a highly complex genetic coding makes the search space large and hard to explore, while a simpler coding that admits known learning rules as a possibility makes the search very biased. In order to overcome these limitations, Chalmers suggested applying GP, a particular kind of GA. Several studies have been carried out in this direction, and some of them are described along with a new approach presented by Poli and Radi (2002).

Architecture Optimization

The design of an optimal architecture can be formulated as a search problem in the architecture space, where each point represents an ANN topology. As pointed out by Yao (1999), given some performance (optimality) criteria about architectures, e.g., minimum error, learning speed, or lower complexity, the performance level of all these forms a surface in the design space. Several approaches have been carried out in this direction. A neuro-evolutionary approach was presented by Miikkulainen and Stanley (2002), using augmenting topologies. It was designed specifically to outperform fixed-topology solutions by employing a principled method of crossover of different topologies, protecting structural innovation using speciation, and incrementally growing from minimal structure.

Another work, carried out by Wang et al. (2002), considered the definition of an optimal network based on the combination of constructing and pruning by GAs, while, more recently, Bevilacqua et al. (2006) presented a multi-objective GA approach to optimize the search for the optimal topology, based on Schema Theory.

One of the most important forms of deception in ANN structure optimization arises from the many-to-one and one-to-many mappings from genotypes in the representation space to phenotypes in the evaluation space. The existence of functionally equivalent networks with different encodings makes evolution inefficient. This problem is termed the competing conventions problem. Other important issues involve representation and the definition of the EA. In the encoding phase, an important aspect is deciding how much information about the architecture should be encoded into the genotype. Moreover, the performance of ANNs strongly depends on their topology, considering size and structure, and, consequently, its definition characterizes network features like learning speed, learning precision, noise tolerance, and generalization capability.

Transfer Function Optimization

Transfer function perturbations can begin with a fixed function, such as a linear, sigmoidal or Gaussian one, and allow the GA to adapt it to a useful combination according to the situation. Work in this direction includes Yao and Liu (1996), who applied transfer function adaptation over generations, and Figueira and Poli (1999), with a GP algorithm evolving functions.

To improve solutions, this kind of evolution is often carried out together with the other kinds of ANN optimization described here.
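The competing conventions problem mentioned under architecture optimization can be made concrete: permuting the hidden neurons of a network (together with their incoming and outgoing weights) yields a different genotype but a functionally identical phenotype, and crossover between two such equivalent parents can destroy the solution. A small illustration (the two-hidden-unit network and its weights are made up):

```python
import math

# genotype = [w1, b1, w2, b2, v1, v2]: two tanh hidden units feeding
# a linear output v1*h1 + v2*h2.
def forward(genotype, x):
    w1, b1, w2, b2, v1, v2 = genotype
    return v1 * math.tanh(w1 * x + b1) + v2 * math.tanh(w2 * x + b2)

g = [0.5, -0.2, 1.5, 0.3, 2.0, -1.0]
# Swapping the two hidden units gives a different genotype...
g_perm = [1.5, 0.3, 0.5, -0.2, -1.0, 2.0]

# ...but a functionally identical phenotype:
for x in (-1.0, 0.0, 0.7, 2.0):
    assert abs(forward(g, x) - forward(g_perm, x)) < 1e-12

# One-point crossover of these two equivalent parents can pair the
# hidden units of one parent with the output weights of the other,
# producing an offspring that computes a different function:
child = g[:4] + g_perm[4:]
assert abs(forward(child, 0.7) - forward(g, 0.7)) > 1e-6
```

Many functionally equivalent genotypes thus compete during evolution, which is exactly why speciation and historical markings, as in the augmenting-topologies approach above, were introduced.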
One of the most important factors for training neural networks is the availability and integrity of data. The data should represent all possible states of the problem considered, and there should be enough patterns to also build the test and validation sets. The consistency of all data has to be guaranteed, and the training data must be representative of the problem, in order to avoid overfitting.

Input data selection can be regarded as a search problem in the discrete space of the subsets of data attributes. The solution requires the removal of unnecessary, conflicting, overlapping and redundant features in order to maximize the classifier's accuracy, compactness, and learning capabilities. Input data reduction has been approached by Reeves and Taylor (1998), who applied genetic algorithms to select training sets for a kind of ANN, and by Castellani (2006), who embedded the search for the optimal feature set into the training phase.

Joint Evolution of Architecture and Weights

The drawbacks related to the separate evolution of architecture and weights can be overcome with approaches that consider them jointly. The advantage of combining these two basic elements of an ANN is that a completely functioning network can be evolved without any intervention by an expert. Several methods that evolve both the network structure and the connection weights have been proposed in the literature.

Castillo et al. (1999) presented a method to search for the optimal set of weights, the optimal topology and the learning parameters, using a GA for the network evolution and BP for network training, while Yao et al. (1997, 2003) implemented, respectively, an evolutionary system for evolving feedforward ANNs based on evolutionary programming and, more recently, a novel constructive algorithm for training cooperative NN ensembles. Azzini and Tettamanzi (2006) presented a neuro-genetic approach for the joint optimization of network structures and weights, taking advantage of BP as a specialized decoder, and Pedrajas et al. (2003) proposed a cooperative co-evolutionary method for ANN design.

There are still several open issues in EANN research. Regarding connection weights, a critical aspect is that the structure has to be predetermined, which raises problems when such a topology is difficult to define in the first place. Also in learning rule evolution, the design of training algorithms, in particular the learning rules, depends on the type of network architecture. Therefore, the design of such rules can become very difficult when there is little prior knowledge about the network topology, giving a complex relationship between evolution and learning.

The architecture evolution has an important impact on the neural network evolution, and the evolution of pure architectures presents difficulties in evaluating fitness accurately. The simultaneous evolution of architecture and weights is one of the most interesting evolutionary ANN techniques, and it nowadays yields useful solutions for ANN design. Different works are carried out in these directions and several issues remain open: some concern the application of cooperative or competitive co-evolutionary approaches, others the design of NN ensembles.

CONCLUSION

This work presents a survey of the state of the art of evolutionary systems investigated in recent decades and presented in the literature. In particular, it focuses on the application of evolutionary algorithms to neural network design optimization. Several approaches for NN evolution are presented, together with some related works; for each method, the most important features are described, together with their main advantages and shortcomings.

REFERENCES

Abraham, A. (2004). Meta learning evolutionary artificial neural networks. Neurocomputing, 56, 1-38.

Azzini, A. & Tettamanzi, A.G.B. (2006). A neural evolutionary approach to financial modeling. In Maarten
Keijzer et al. (Eds.), Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2006. ACM Press, New York, NY.

Bevilacqua, V., Mastronardi, G., Menolascina, F., Pannarale, P., & Pedone, A. (2006). A novel multi-objective algorithm approach to artificial neural network topology optimisation: The breast cancer classification problem. Proceedings of the International Joint Conference on Neural Networks, IJCNN06, 1958-1965, Vancouver, Canada, IEEE Press.

Cantu-Paz, E. & Kamath, C. (2005). An empirical comparison of combinations of evolutionary algorithms and neural networks for classification problems. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(5), 915-927.

Castellani, M. (2006). ANNE - A new algorithm for evolution of artificial neural network classifier systems. Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2006, 3294-3301, Vancouver, Canada.

Castillo, P.A., Rivas, V., Merelo, J.J., Gonzalez, J., Prieto, A., & Romero, G. (1999). G-Prop III: Global optimization of multilayer perceptrons using an evolutionary algorithm. Proceedings of the Genetic and Evolutionary Computation Conference, 1, 942.

Chalmers, D.J. (1990). The evolution of learning: An experiment in genetic connectionism. Proceedings of the Connectionist Models Summer School, 81-90.

Figueira, J.C. & Poli, R. (1999). Evolution of neural networks using weight mapping. Proceedings of the Genetic and Evolutionary Computation Conference, 2, 1170-1177.

Goldberg, D.E. (1992). Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley.

Merelo, J.J., Castillo, P.A., Prieto, A., Rojas, I., & Romero, G. (2002). Statistical analysis of the parameters of a neuro-genetic algorithm. IEEE Transactions on Neural Networks, 13(6), 1374-1394.

Miikkulainen, R. & Stanley, K. (2002). Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2), 99-127.

Montana, D. & Davis, L. (1989). Training feedforward neural networks using genetic algorithms. Proceedings of the 11th International Joint Conference on Artificial Intelligence, 762-767.

Mordaunt, P. & Zalzala, A.M.S. (2002). Towards an evolutionary neural network for gait analysis. Proceedings of the Congress on Evolutionary Computation, 2, 1238-1243.

Pai, G.A. (2004). A fast converging evolutionary neural network for the prediction of uplift capacity of suction caissons. Proceedings of the IEEE Conference on Cybernetics and Intelligent Systems, 654-659.

Pedrajas, N.G., Martinez, C.H., & Pérez, J.M. (2003). COVNET: A cooperative coevolutionary model for evolving artificial neural networks. IEEE Transactions on Neural Networks, 14(3), 575-596.

Poli, R. & Radi, A. (2002). Discovering efficient learning rules for feedforward neural networks using genetic programming. Technical Report, University of Essex.

Reeves, C.R. & Taylor, S.J. (1998). Selection of training data for neural networks by a genetic algorithm. Parallel Problem Solving from Nature, Lecture Notes in Computer Science, 1498, 633-642.

Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.

Seiffert, U. (2001). Multiple layer perceptron training using genetic algorithms. Proceedings of the European Symposium on Artificial Neural Networks, ESANN 2001, 159-164.

Tettamanzi, A.G.B. & Tomassini, M. (2001). Soft Computing: Integrating Evolutionary, Neural, and Fuzzy Systems. Springer-Verlag.

Wang, W., Lu, W., Leung, A.Y.T., Lo, S., Xu, Z., & Wang, X. (2002). Optimal feed-forward neural networks based on the combination of constructing and pruning by genetic algorithms. Proceedings of the International Joint Conference on Neural Networks, 636-641.

Whitley, D., Starkweather, T., & Bogart, C. (1990). Genetic algorithms and neural networks: Optimizing connections and connectivity. Parallel Computing, 14, 347-361.
Yang, B., Su, X.H., & Wang, Y.D. (2002). BP neural network optimization based on an improved genetic algorithm. Proceedings of the IEEE First International Conference on Machine Learning and Cybernetics, 64-68.

Yao, X. & Liu, Y. (1996). Evolving artificial neural networks through evolutionary programming. Evolutionary Programming V: Proceedings of the Conference on Evolutionary Programming, MIT Press, 257-266.

Yao, X. & Liu, Y. (1997). A new evolutionary system for evolving artificial neural networks. IEEE Transactions on Neural Networks, 8(3), 694-713.

Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87, 1423-1447.

Yao, X., Murase, K., & Islam, M.M. (2003). A constructive algorithm for training cooperative neural network ensembles. IEEE Transactions on Neural Networks, 14(4), 820-834.

KEY TERMS

Adaptive System: A system able to adapt its behavior according to changes in its environment or in parts of the system itself.

Artificial Neural Networks: Models inspired by the working of the brain, considered as a combination of neurons and synaptic connections, which are capable of transmitting data through multiple layers, yielding a system able to solve different problems like pattern recognition and classification.

Backpropagation Algorithm: A supervised learning technique used for training ANNs. It is based on a set of recursive formulas for computing the gradient vector of the error function, which can be used in a first-order method like gradient descent.

Error Backpropagation: Essentially a search procedure that attempts to minimize a whole-network error function, such as the sum of the squared errors of the network output over a set of training input/output pairs.

Evolutionary Algorithms: Algorithms based on models that consider artificial or simulated genetic evolution of individuals in a defined environment. They are a broad class of stochastic optimization algorithms, inspired by biology and in particular by those biological processes that allow populations of organisms to adapt to their surrounding environment: genetic inheritance and survival of the fittest.

Evolutionary Artificial Neural Networks: A special class of artificial neural networks in which evolution is another fundamental form of adaptation, in addition to learning. They are represented by biologically inspired computational models that use evolutionary algorithms in conjunction with neural networks to solve problems.

Evolutionary Computation: In computer science, a subfield of artificial intelligence (more particularly, computational intelligence) involving combinatorial optimization problems. Evolutionary computation defines the quite young field of the study of computational systems based on the ideas of natural evolution and adaptation.

Multi-Layer Perceptrons (MLPs): A class of neural networks consisting of a feed-forward, fully connected network with an input layer of neurons, one or more hidden layers, and an output layer. The output value is obtained through the sequence of activation functions defined in each hidden layer. Usually, in this kind of network, the supervised learning process is the backpropagation algorithm.

Search Space: The set of all the possible situations in which the problem we want to solve could ever be.

Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Evolutionary Approaches to Variable Selection
time for processing it will be required); understanding improvement (if two models resolve the same task, but one of them uses less information, the latter can be more thoroughly interpreted; therefore, the simpler the model, the easier the knowledge extraction, and the easier the understanding, the easier the validation).

In addition, it was proved that the analysis of IR data involved a highly multimodal problem, as many combinations of variables (each obtained using a different method) led to similar results when the samples were classified.

GENETIC ALGORITHMS

A GA (Holland, 1975; Goldberg, 1989) is a recurrent and stochastic process that operates with a group of potential solutions to a problem, known as the genetic population, based on one of Darwin's principles: the survival of the best individuals (Darwin, 1859). Briefly, a GA works as follows. Initially, a population of solutions is generated randomly, and the solutions evolve continuously through consecutive stages of crossovers and mutations. Every individual in the population has an associated value that quantifies its usefulness (adjustment or fitness), in accordance with its adequacy to solve the problem. This value has to be obtained for each potential solution and constitutes the quantitative information that the evolutionary algorithm will use to guide the search. The process will continue until a predetermined stopping criterion is reached; this might be a particular threshold error for the solution or a certain number of generations (populations).

Therefore, different basic steps will be required to implement a GA: codification of the problem, which results in a population structure; initialisation of the first population; definition of a fitness function to evaluate how good each individual is at solving the problem; and, finally, a cyclic procedure of reproductions and replacements (Michalewicz, 1999; Goldberg, 2002).

DATA DESCRIPTION

In the present practical application, the spectral range measured by IR spectrometry (wavenumbers from 1250 cm-1 to 900 cm-1) provided 176 absorbances (which measured light absorption) (Gómez-Carracedo, Gestal, Dorado & Andrade, 2007).

The main goal of the application consisted in the prediction of the amount of pure juice in a sample using the absorbance values returned by the IR measurements. But the amount of data obtained for a sample by IR spectrometry is huge, so the direct application of mathematical and/or computational methods (although possible) requires a lot of time. Accordingly, it is important to establish whether all the raw data provided relevant information for sample differentiation. Hence, the problem was an appropriate case for the use of variable selection techniques.

Prior to variable selection, the construction of data sets for both model development and validation was required. Thus, samples with different amounts of pure apple juice were prepared at the laboratory. Besides, 23 apple juice-based beverages sold in Spain were analysed (the declared amount of juice printed out on
Table 1. Distribution of samples (last column: totals)

             Low-range samples (<20% juice)    Total
Training     19  17  16  22  21  20  19         134
Validation    1   1  13   6   6   6   6          39
Commercial    0   0   0   0   1   1   0           2

             High-range samples (>20% juice)   Total
Training     20  19  16  14   7                  86
Validation    6  18  13   1   6                  44
Commercial    0   2   0   0  19                  21
their labels was used as input data). The samples were distributed in two ranges: samples containing less than 20% of pure juice and samples with more than 20% of pure apple juice (Table 1). IR spectra were obtained for all the samples.

These data were split into two datasets, one to extract the rules (ANN training) and one to validate them. The commercial samples were used to further check the performance of the model. It is worth noting that whenever a predicted value does not match that given on the label of a commercial product, it might be owing either to a truly wrong performance of the model (classification error) or to an inaccurate labelling of the commercial beverage.

set of samples with low (2-20%) percentages of juice was far more complex and more difficult to classify than the samples with higher percentages (25-100%). Indeed, the number of errors was usually higher for the 2-20% range, both in calibration and validation. Classification of the commercial samples agreed quite well with the percentages of juice declared on the labels, except for one particular sample. When that sample was studied in detail, it was observed that its spectrum was slightly different from the usual ones. This suggested that the juice contained an unusually high amount of added sugar(s).

ANN configuration: topology 176 / 8 / 5 / 5; learning rate: 0.0005; stop criterion: MSE = 1 or epochs = 500,000
apple juice-based beverages properly (according to the amount of pure apple juice they contain).

Any time a subset of IR variables is proposed, the associated absorbance values are used as input patterns to the ANN. So, the ANN will have as many input processing elements (PEs) as variables. The output layer has one PE per category (6 for the lower range and 5 for the higher one). After several preliminary trials considering several hidden layers (from 1 to 4), each with different numbers of PEs (from 1 to 50), a compromise was established between the final fitness level reached by the ANN and the time required for its training. This compromise was essential because, although better results were obtained with more hidden layers, the time required for training was much higher as well. It was therefore decided not to extensively train the net, but to get a good approximation of its real performance and elucidate whether the input variables are really suitable to accurately classify the samples.

The goal is to determine which solutions, among those provided by the GA, represent good starting points for more exhaustive training. Therefore, it would be enough to extend the ANN learning up to the point where it starts to converge. For this particular problem, convergence started after 800 cycles; to warrant it, 1000 iterations were fixed. Next, each of the most promising solutions is used as input to an external ANN intended to provide the final classification results.

Pruned Search

This approach starts by considering all variables, wherefrom groups of variables are gradually discarded. The GA will steadily reduce the number of variables that characterise the objects, until an optimal subset that allows for an overall satisfactory classification is obtained. This subset is used to classify the samples, and the results are used to determine how relevant the discarded wavenumbers were for the classification. The process can be continued as long as the classification results are equal, or at least similar, to those obtained using the overall set of variables. Therefore, the GA determines how many and which wavenumbers will be selected for the classification.

In this approach, each individual in the genetic population is described by n genes, each representing one variable. With a binary encoding, each gene might be either 0 or 1, indicating whether the gene is active or not and, therefore, whether the variable should be considered for classification.

The evaluation function has to guide the pruning process towards individuals with a low number of variables. To achieve this, the function should favour those individuals that, besides classifying accurately, make use of fewer variables. In this particular case, a factor proportional to the percentage of active genes was defined to multiply the MSE obtained by the ANN, so that individuals with fewer active genes and a similar classification performance have a higher fitness and, consequently, a higher probability of survival.

Table 3 shows the results of several runs of the Pruned Search approach. It is worth noting that each solution was extracted from a different execution and that, within the same execution, only one solution provided valid classification rates. As can be noted, the classification results were very similar, although the variables used to perform the classifications were different. The ANN models obtained were slightly worse than those obtained using 176 wavenumbers, although the generalization capabilities of the best ANN model were quite satisfactory, as there was only one error when the commercial beverages were classified.

Fixed Search

This approach uses a real codification in the chromosome of the genetic individuals. The genetic population consists of individuals with n genes, where n is the number of wavenumbers considered sufficient for the classification according to some external criterion. Each gene represents one of the 176 wavenumbers considered in the IR spectra. The GA will find the n-variable subsets yielding the best classification models; the number of variables is predefined in the genotype.

As the final number of variables has to be decided in advance, some external criterion is needed. In order to simplify comparison of the results, the final number of variables was defined as the minimum number of principal components which can describe our data set, which was two.

Since the number of wavenumbers remains constant in the course of the selection process, this approach defines the fitness of each genetic individual as the mean square error reached by the ANN at the end of the training process.

High concentrations: 86 / 43 / 21
ANN configuration: topology 2 / 10 / 60 / 5; learning rate: 0.0001; stop criterion: MSE = 1 or epochs = 500,000
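The two encodings described above, a binary mask whose fitness is the ANN's MSE scaled by the fraction of active genes (Pruned Search) and a fixed-length list of wavenumber indices (Fixed Search), can be sketched as follows. The stand-in error function and its "informative" wavenumbers are invented for illustration; the original work obtains the MSE by actually training an ANN:

```python
N_VARS = 176  # wavenumbers in the measured IR range

def ann_mse(selected):
    # Stand-in for training an ANN on the selected wavenumbers and
    # returning its mean squared error; here, a toy error that drops
    # when a few hypothetical 'informative' wavenumbers are included.
    informative = {10, 42, 88, 130}
    hits = len(informative & set(selected))
    return 1.0 - 0.2 * hits  # in [0.2, 1.0]

def pruned_score(mask):
    # Pruned Search: one binary gene per variable; the MSE is multiplied
    # by a factor proportional to the fraction of active genes, so equally
    # accurate but smaller subsets are fitter (lower score is better).
    active = [i for i, g in enumerate(mask) if g]
    if not active:
        return float("inf")
    return ann_mse(active) * (len(active) / N_VARS)

def fixed_score(genes):
    # Fixed Search: n genes, each naming one wavenumber; the subset size
    # is constant, so the fitness is just the MSE itself.
    return ann_mse(genes)

full = [1] * N_VARS
small = [1 if i in (10, 42, 88, 130) else 0 for i in range(N_VARS)]
assert pruned_score(small) < pruned_score(full)  # same MSE, fewer genes
assert fixed_score([10, 42]) < fixed_score([3, 7])
```

The parsimony factor is what steers the Pruned Search toward compact subsets without an explicit limit on their size.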
Table 4 shows the results of several runs of the Fixed Search approach. Again, note that each run provides only one solution. A problem was that the genetic individuals contain only two genes. Hence, any crossover operator will use the only available crossing point (between the two genes), and only half of the information from each parent will be transmitted to its offspring. This turns the Fixed Search approach into a random search when only two genes constitute the chromosome.

Hybrid Two-Population Genetic Algorithm

The main disadvantage of non-multimodal approaches is that they discard locally optimal solutions because a final, global solution is preferred. But there are situations where the final model has to be extracted after analysing different, similar solutions of the same problem.

For example, after analysing the different solutions provided by one execution of the classification task with the Hybrid Two-Population Genetic Algorithm (Rabuñal, Dorado, Gestal & Pedreira, 2005), three valid (and similar) models were obtained (Table 5). Furthermore, the results were clearly superior to those obtained with the previous alternatives. Most importantly, it was observed that the solutions concentrated along specific spectral areas (around wavenumber 88); this would not have been possible with the previous approaches. This approach is studied in depth in a dedicated chapter (Finding multiple solutions with GA in multimodal problems), so no more details are presented here.

High concentrations: 86 / 43 / 21
ANN configuration: topology 2 / 10 / 60 / 5; learning rate: 0.0001; stop criterion: MSE = 1 or epochs = 500,000
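The degenerate behaviour of crossover on two-gene chromosomes noted above is easy to see: the single interior crossing point merely swaps the second gene between parents, so no new gene values are ever created and exploration must come entirely from mutation. A toy illustration (the wavenumber values are made up):

```python
def one_point_crossover(parent_a, parent_b, cut):
    # Standard one-point crossover: exchange the tails after 'cut'.
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

# Two-gene chromosomes: each gene is a selected wavenumber index.
a, b = [12, 88], [130, 42]

# The only interior crossing point is cut=1.
c1, c2 = one_point_crossover(a, b, 1)
assert (c1, c2) == ([12, 42], [130, 88])

# Every offspring gene already existed in a parent: crossover only
# re-pairs the four values, so it introduces no new information.
offspring_genes = set(c1) | set(c2)
assert offspring_genes == set(a) | set(b)
```

With longer chromosomes the same operator mixes many genes per mating, which is why the limitation only bites in the two-variable Fixed Search setting.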
A natural next stage would be to consider multimodal approaches. Evolutionary Computation provides useful tools, like fitness sharing and crowding, which should be compared with the hybrid two-population approach. Research is needed to implement criteria that allow the GA to stop when a satisfactorily low number of variables is found.

Another suitable option would be to include more scientific information within the system to guide the search. For example, a final user could provide a description of the ideal solution in terms of efficiency, data acquisition cost, simplicity, etc. All these parameters may be used as targets in a multiobjective Genetic Algorithm intended to provide the best variable subset complying with all the requirements.

Several conclusions can be drawn for variable selection tasks. First, satisfactory classification results can be obtained from reduced sets of variables extracted using quite different techniques, all based on the combination of GA and ANN. The best results were obtained using a multimodal GA (the hybrid two-population approach), as expected, thanks to its ability to keep the genetic individuals homogeneously distributed over the search space. Such diversity not only induces the appearance of optimal solutions, but also prevents the search from stopping at a local minimum. This option provides not just one solution but a group of them, with similar fitness, which allows scientists to select a solution with a sound chemical background and to extract additional information.

High concentrations: 86 / 43 / 21
ANN configuration: topology 2 / 10 / 60 / 5; learning rate: 0.001; stop criterion: MSE = 2 or epochs = 500,000
Michalewicz, Z. (1999). Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag.

Rabuñal, J.R., Dorado, J., Gestal, M., & Pedreira, N. (2005). Diversity and multimodal search with a hybrid two-population GA: An application to ANN development. Lecture Notes in Computer Science, 382-390.

Rodriguez-Saona, L.E., Fry, F.S., McLaughlin, M.A., & Calvey, E.M. (2001). Rapid analysis of sugars in fruit juices by FT-NIR spectroscopy. Carbohydrate Research, (336), 63-74.

Saavedra, L., García, A., & Barbas, C. (2000). Development and validation of a capillary electrophoresis method for direct measurement of isocitric, citric, tartaric and malic acids as adulteration markers in orange juice. Journal of Chromatography, (881) 1-2, 395-401.

Stöber, P., Martin, G.G., & Peppard, T.L. (1998). Quantitation of the undeclared addition of industrially produced sugar syrups to fruit juices by capillary gas chromatography. Deutsche Lebensmittel-Rundschau, (94), 309-316.

Yuan, J.P. & Chen, F. (1999). Simultaneous separation and determination of sugars, ascorbic acid and furanic compounds by HPLC-dual detection. Food Chemistry, (64), 423-427.

KEY TERMS

Absorbance: A function (usually logarithmic) of the percentage of transmission of a wavelength of light through a liquid.

Artificial Neural Network: An interconnected group of artificial neurons that uses a mathematical or computational model for information processing, based on the function of biological neurons. It involves a group of simple processing elements (the neurons) which can exhibit complex global behaviour as a result of the connections between the neurons.

Evolutionary Technique: A technique which provides solutions for a problem guided by biological principles such as the survival of the fittest. These techniques start from a randomly generated population which evolves by means of crossover and mutation operations to provide the final solution.

Knowledge Extraction: Making the internal knowledge of a system or set of data explicit, in a way that is easily interpretable by the user.

Search Space: The set of all the possible situations in which the problem we want to solve could ever be; the combination of all the possible values for all the variables related to the problem.

Spectroscopy (Spectrometry): The production, measurement and analysis of electromagnetic spectra produced as a result of the interactions between electromagnetic radiation and matter, such as emission or absorption of energy.

Spectrum: The intensity of an electromagnetic radiation across a range of wavelengths. It represents the intensity of emitted or transmitted energy versus the energy of the received light.

Variable Selection: Selection of a subset of relevant variables (features) which can describe a set of data.
Sudip Misra
Yale University, USA
Evolutionary Computing Approach for Ad-Hoc Networks
the current day ad-hoc networks, such as flow and error control over multi-hop communication routes, deriving and maintaining network topology information, and mechanisms to handle router mobility and power and size requirements, among others. However, since the electronic devices were huge then, the packet radios were not easily movable, leading to limited mobility. In addition, the network coverage was slow and, since the Bellman-Ford shortest path algorithm was used for routing, transient loops were present. Since then, a lot of research has been done on ad-hoc networks and a number of routing algorithms have been developed which provide far greater performance and are loop free. The rapid development of silicon technology has also led to ever-shrinking devices with increasing computation power. Ad-hoc networks are intended for use in medical, relief-and-rescue, office, personal networking, and many other daily-life applications.

Bio-inspired algorithms, to which evolutionary computing approaches such as genetic algorithms belong, have been around for more than 50 years. In fact, there is evidence suggesting that Artificial Neural Networks are rooted in the unpublished works of A. Turing (related to the popular Turing Machine) (Paun, 2004). Finite Automata Theory was developed about a decade after that, based on neural modeling. This ultimately led to the area currently known as Neural Computing, which can effectively be called the initiation of bio-inspired computing. Since then, techniques such as GP, Swarm Intelligence, Ant Colony Optimization and DNA computing have been researched and developed, as nature continues to inspire us and show us the way to solve the most complex problems known to man.

MAIN FOCUS OF THE CHAPTER

This chapter and the chapter entitled "Swarm Intelligence Approaches for Wireless Ad Hoc Networks" of this book, in combination, present an introduction to various bio-inspired algorithms and describe their implementation in the area of wireless ad-hoc networks. This chapter primarily presents the GP approach to ad-hoc networks. We first give a general introduction to GP and explain the concepts of genes and chromosomes. We also explain the stochastic nature of GP and the process of mutation and crossover to provide an optimal solution to the problem using the GP approach. We then present the Weighted Clustering Algorithm given by Chatterjee, Das and Turgut (2002), which is used for clustering of nodes in mobile ad-hoc networks, as an instantiation of this approach.

GP

GP is a popular bio-inspired computing method. The concepts in genetic algorithms are inspired by the phenomenon of life and evolution itself. Life is a problem whose solution includes retaining those who have strong enough characteristics to survive in the environment and discarding the others. This exquisite process can provide solutions to complex analytical problems awaiting the most fitting result.

The basics include the role of chromosomes and genes. Chromosomes carry the genes, which contain the parameters/characteristics that need to be optimized. Hence, GP starts with the declaration of data structures that form the digital chromosomes. These digital chromosomes contain genetic information in which each gene represents a parameter to be optimized. A gene could be represented as a single bit, which could be 1 (ON) or 0 (OFF); a chromosome is then a sequence of 1s and 0s, and a parameter is either totally present or totally absent. Other abstractions could represent the presence of a parameter in relative levels. For instance, a gene could be represented using 5 bits, where the magnitude of the binary number indicates the magnitude of the presence of a parameter, in the range 0 (00000) to 31 (11111).

First, these digital chromosomes are created by stochastic means. Their fitness is then tested, either by a static calculation using some method or dynamically by modelling fights between the chromosomes. The chromosomes with a set level of fitness are retained and allowed to produce a new generation of chromosomes. This can be done either by genetic recombination, that is, producing new chromosomes by combining present chromosomes, or by mutation, that is, producing new chromosomes by randomly introducing changes into present chromosomes. This process of testing for fitness and creating new generations is repeated until the fittest chromosomes are deemed optimized enough for the task for which the genetic algorithm was created. The process is described in Figure 1.
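The generational loop just described — random initial chromosomes, fitness testing, retention of the fittest, recombination and mutation — can be sketched as follows. This is an illustrative bit-string genetic algorithm, not code from the chapter; the fitness function, rates and population sizes are placeholder assumptions.

```python
import random

def run_ga(fitness, gene_bits=5, genes=4, pop_size=20, generations=50,
           crossover_rate=0.8, mutation_rate=0.1, rng=None):
    """Minimal generational GA over fixed-length bit-string chromosomes.

    Each chromosome packs `genes` parameters of `gene_bits` bits each
    (values 0 (00000) to 31 (11111) for 5 bits, as in the text).
    """
    rng = rng or random.Random(0)
    length = gene_bits * genes
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Retain the fittest half as parents for the next generation.
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            if rng.random() < crossover_rate:      # one-point recombination
                cut = rng.randrange(1, length)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            for i in range(length):                # random bit-flip mutation
                if rng.random() < mutation_rate:
                    child[i] ^= 1
            children.append(child)
        pop = children
    return max(pop, key=fitness)

def decode(chromosome, gene_bits=5):
    """Split a chromosome into genes and read each as a binary magnitude."""
    return [int("".join(map(str, chromosome[i:i + gene_bits])), 2)
            for i in range(0, len(chromosome), gene_bits)]
```

For example, `run_ga(sum)` favours chromosomes with many 1-bits, so the decoded gene magnitudes drift toward the upper end of their range.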
Genetic algorithms begin with a stochastic process, arrive at an optimized solution, and are time consuming. Hence, they are generally used for solving complex problems. As mentioned by Ashby (1962), self-organization is one of these complex problems. Self-organization is the problem where the components of a multi-component system achieve (or try to achieve) a common goal without any centralized or distributed control. The organization is generally done by changing the direct environment, which can be adapted to by the various system components and hence affects the behaviour of these components.

As has been mentioned, self-organization is especially important in ad-hoc networking because of the spontaneous interaction of multiple heterogeneous components over wireless radio connections without human interaction (Murthy & Manoj, 2004).

Dressler (2006) gives the following list of self-organization capabilities:

Self-healing: The system should be able to detect and repair the failures caused by overloading, component malfunctioning or system breakdown.

Self-configuration: The system should be able to generate adequate configurations, including connectivity, quality of service, etc., as required in the existing scenario.

Self-management: The system should be able to maintain devices, and in turn the network, depending on the set configuration parameters.

Self-optimization: The system should be able to make an optimal choice of methods depending on the system behaviour.
Adaptation: The system should dynamically adapt to changing environment conditions, for example, a change in node positions, a change in the number of nodes in the network, etc.

Genetic algorithms have been used extensively in robotics, electronic circuit design, natural language processing, game theory, multi-modal search and computer network topology design, among many other applications.

In the context of wireless ad-hoc networks, genetic algorithms have been used in solving the shortest path routing problem (Ahn & Ramakrishna, 2002), QoS path discovery (Fu, Li & Zhang, 2005), developing broadcasting strategies (Alba, Dorronsoro, Luna, Nebro & Bouvry, 2005) and QoS routing (Barolli, Koyama & Shiratori, 2003), among others. In the interest of brevity, we present below only one representative application of the use of genetic algorithms to solve problems in ad-hoc networks.

Weighted Clustering Using Genetic Algorithm

Nodes in ad-hoc networks are sometimes grouped into various clusters, with each cluster having a cluster-head. Clustering aims at introducing a kind of organisation in ad-hoc networks. This leads to better scalability of networks, resulting in better utilisation of resources in larger networks. A cluster-head is responsible for the formation and maintenance of clusters in ad-hoc networks. Several clustering mechanisms have been proposed for ad-hoc networks, like Lowest-ID (Ephremides, Wieselthier & Baker, 1987), Highest Connectivity (Gerla & Tsai, 1995), Distributed Mobility-Adaptive Clustering (DMAC) (Basagni, 1999), the Distributed Dynamic Clustering Algorithm (McDonald & Znati, 1999) and the Weight-Based Adaptive Clustering Algorithm (WBACA) (Dhurandher & Singh, 2007).

The Weighted Clustering Algorithm (WCA) (Chatterjee, Das and Turgut, 2002) is a popular clustering algorithm which selects a cluster-head on the basis of node mobility, battery power, connectivity, distance from neighbours and degree of connectivity. Weights are assigned to each parameter and a combined weighted metric Wv is calculated as shown by Equation (1) (Chatterjee, Das and Turgut, 2002). Within certain constraints, nodes with minimum Wv are selected as the cluster-heads.

Wv = w1Δv + w2Dv + w3Mv + w4Pv (1)

In Equation (1),

Δv signifies the difference between the optimal and actual connectivity of node v.
Dv signifies the sum of the distances to all the neighbouring nodes.
Mv is the running average of the speed of the node.
Pv signifies the time that the node has acted as a cluster-head.
w1, w2, w3 and w4 are the relative weights given to the different parameters.

It should be noted that (Chatterjee, Das and Turgut, 2002):

w1 + w2 + w3 + w4 = 1 (2)

A genetic algorithm-based approach was presented by the designers of WCA (Chatterjee, Das, & Turgut, 2002). The sub-optimal solutions are mapped to chromosomes and given as input to produce the best solutions using genetic techniques. This leads to better performance and more evenly balanced load sharing. The basic building blocks of the algorithm are given below:

Initial Population: A candidate solution which acts as a chromosome can be represented as shown in Figure 2. This initial population set is generated randomly by arranging the nodes in strings and then traversing each string. A node which is not a cluster-head and is not a neighbour of a cluster-head (and hence part of an existing cluster) is chosen as a cluster-head if it has fewer than a pre-defined constant number of neighbours. This constant is chosen so as to prevent a cluster-head with more than the optimal number of neighbours causing over-loading at cluster-heads.

Objective Fitness Function: The fitness value of a chromosome can be calculated as the sum of the Wv values of the contained genes. All nodes present at a gene [1] are analysed. If a node is not a cluster-head or a member of a cluster-head and has a node degree less than MAX_DEGREE, it is added to the cluster-head list and its Wv value is added to the total sum. For the remaining nodes in the network, if the node is not a cluster-head or a member of another cluster-head, its Wv value is added to the sum and
the node is added to the cluster-head list. The lesser the sum of the Wv values of the genes, the higher the fitness value of the chromosome.

Crossover: The crossover rate is 80%. The authors used a technique called X_Over1 (Chatterjee, Das, & Turgut, 2002) as the crossover technique for the genetic implementation.

Mutation: Mutation introduces randomness into the solution space. If the mutation rate is low, there is a chance of the solution converging to a non-optimal
Figure 2. Candidate solution as chromosome and its single gene (Based on: Turgut, Das, Elmasri, & Turgut, 2002, Fig. 3)
Algo_genetic_cluster {
    Generate initial population randomly, with population size = number of nodes;
    do {
        while (new_pool_size < old_pool_size) {
            select chromosomes using Roulette wheel method;
            apply crossover using X_Over1;
            apply mutation by swapping genes;
            find fitness value of all the chromosomes;
            replace through appending;
        }
    } while (termination criterion not met);
}
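To make the weighted metric concrete, here is a small sketch of Equations (1) and (2) and of the minimum-Wv preference used when choosing cluster-heads. The weight values and node attribute names are hypothetical, chosen only for illustration.

```python
def wv(node, w=(0.7, 0.2, 0.05, 0.05)):
    """Combined weighted metric Wv of Equation (1).

    Equation (2) requires the four weights to sum to 1; the node fields
    below stand in for the Delta_v, D_v, M_v and P_v terms.
    """
    w1, w2, w3, w4 = w
    assert abs(sum(w) - 1.0) < 1e-9               # Equation (2)
    return (w1 * node["delta"]       # gap between optimal and actual degree
            + w2 * node["dist_sum"]  # sum of distances to neighbours
            + w3 * node["speed"]     # running average of node speed
            + w4 * node["ch_time"])  # time already spent as cluster-head

def pick_cluster_head(nodes):
    """Within WCA's constraints (not modelled here), nodes with minimum
    Wv are preferred as cluster-heads."""
    return min(nodes, key=wv)
```

A low-mobility, well-connected node that has not yet served as cluster-head thus scores a small Wv and is preferred.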
solution space. Hence, mutation is a very important step in genetic computations. In the stated algorithm, the inventors used swapping for the process of mutation. Two genes of a chromosome were selected randomly and swapped. The mutation rate used is 10%.

Selection Criteria: As mentioned earlier, chromosomes with lower values of Wv are considered fitter. The Roulette wheel method is used for selection in accordance with the fitness values of these chromosomes.

Elitism: If the new generation produced has a fitness value better than the best fitness value of the previous generation, then the best solution is replaced by the new generation. Since the best solutions of a generation are replaced, this step helps avoid the local maxima of the solution space and move towards the global maximum.

Replacement: This method states that, during replacement, the best solution of a generation is appended to the solution set of the next generation. This step helps in preserving the best solution during genetic operations.

Using these building blocks, the genetic algorithm for the weighted clustering algorithm was given. This algorithm is given in Figure 3.

CONCLUSION

This article presented an overview of an evolutionary computing approach using GP and its application to wireless ad-hoc networks, both of which are currently hot topics amongst the computer science and networking research community. The intention of writing this article was to show how one could marry together concepts from GP with ad-hoc networks to arrive at interesting results. In particular, we reviewed the WCA algorithm (Chatterjee, Das and Turgut, 2002), which uses GP for clustering of nodes in ad-hoc networks.

REFERENCES

Ahn, C. W., & Ramakrishna, R. S. (2002). A Genetic Algorithm for Shortest Path Routing Problem and the Sizing of Populations. IEEE Transactions on Evolutionary Computation, Vol. 6, No. 6. IEEE Computer Society.

Alba, E., Dorronsoro, B., Luna, F., Nebro, A. J., & Bouvry, P. (2005). A Cellular Multi-Objective Genetic Algorithm for Optimal Broadcasting Strategy in Metropolitan MANETs. 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), IEEE Computer Society Press.

Ashby, W. R. (1962). Principles of the Self-organizing System. In H. V. Foerster & G. W. Zopf (Eds.), Principles of Self-Organization (pp. 255-278). Pergamon Press.

Barolli, L., Koyama, A., & Shiratori, N. (2003). A QoS Routing Method for Ad-Hoc Networks Based on Genetic Algorithm. 14th International Workshop on Database and Expert Systems Applications (DEXA) (pp. 175-179). Prague: IEEE Computer Society.

Basagni, S. (1999). Distributed Clustering for Ad Hoc Networks. In Proceedings of the International Symposium on Parallel Architectures, Algorithms, and Networks (IS-PAN), Perth/Fremantle, Australia.

Chatterjee, M., Das, S., & Turgut, D. (2002). WCA: A Weighted Clustering Algorithm for Mobile Ad Hoc Networks. Cluster Computing, 193-204.

Dhurandher, S. K., & Singh, G. V. (2007). Stable Clustering with Efficient Routing in Wireless Ad Hoc Networks. In Proceedings of the IEEE International Conference on COMmunication System softWAre and MiddlewaRE (IEEE COMSWARE 2007).

Dressler, F. (2006). Self-Organization in Ad Hoc Networks: Overview and Classification. University of Erlangen, Dept. of Computer Science.

Ephremides, A., Wieselthier, J. E., & Baker, D. J. (1987). A Design Concept of Reliable Mobile Radio Networks with Frequency Hopping Signaling. In Proceedings of the IEEE, Vol. 75, No. 1, pp. 56-73.

Fu, P., Li, J., & Zhang, D. (2005). Heuristic and Distributed QoS Route Discovery for Mobile Ad hoc Networks. The Fifth International Conference on Computer and Information Technology (CIT'05). IEEE Computer Society.
McDonald, A. B., & Znati, T. F. (1999). A Mobility-based Framework for Adaptive Clustering in Wireless Ad Hoc Networks. IEEE Journal on Selected Areas in Communications, Vol. 17, No. 8, pp. 1466-1487.

McQuillan, J. M., & Walden, D. C. (1977). The ARPA Network Design Decisions. Computer Networks, 1, 243-289.

Murthy, C. S. R., & Manoj, B. S. (2004). Ad Hoc Wireless Networks. Upper Saddle River, NJ: Prentice Hall.

Olariu, S., & Zomaya, A. Y. (2006). Handbook of Bioinspired Algorithms and Applications. Chapman and Hall / CRC Press.

Paun, G. (2004). Bio-Inspired Computing Paradigms (Natural Computing). Pre-Proc. Unconventional Programming Paradigms (pp. 155-160). Berlin: Springer.

Toh, C. K. (2002). Ad Hoc Mobile Wireless Systems. Prentice Hall PTR.

Turgut, D., Das, S., Elmasri, R., & Turgut, B. (2002). Optimizing Parameters of a Mobile Ad Hoc Network Protocol with a Genetic Algorithm. Global Telecommunications Conference (pp. 62-66). IEEE Computer Society.

KEY TERMS

Bio-Inspired Algorithms: Group of algorithms modelled on observed natural phenomena which are employed to solve mathematical problems.

Chromosome: A proposed solution to a problem which is represented as a string of bits and can mutate and cross over to create a new solution.

Clustering: Grouping the nodes of an ad hoc network such that each group is a self-organized entity having a cluster-head which is responsible for the formation and management of its cluster.

Cross-Over: Genetic operation which produces a new chromosome by combining the genes of two or more parent chromosomes.

Fitness Function: A function which maps the subjective property of the fitness of a solution to an objective value which can be used to arrange different solutions in the order of their suitability as a final or intermediate solution.

Genes: Genes are the building blocks of chromosomes and represent the parameters that need to be optimized.

Genetic Algorithms: Algorithms that are modelled on the natural process of evolution. These algorithms employ methods such as crossover, mutation and natural selection and provide the best possible solutions after analyzing a group of sub-optimal solutions which are provided as inputs.

Initial Population: Set of sub-optimal solutions which are provided as inputs to a genetic algorithm and from which an optimal solution evolves.

Mobile Ad-Hoc Network: A multi-hop network formed by a group of mobile nodes which co-operate among each other to achieve communication, without requiring any supporting infrastructure.

Mutation: Genetic operation which randomly alters a chromosome to produce a new chromosome, adding a new solution to the solution-set.

ENDNOTE

1 Based on Chatterjee, Das and Turgut, 2002.
Evolutionary Grammatical Inference
Javed and his colleagues (2004) proposed a Genetic Programming (GP) approach with grammar-specific heuristic operators and non-random construction of the initial grammar population. Their approach succeeded in inducing small context-free grammars.

More recently, Rodrigues and Lopes (2006) proposed a hybrid GP approach that uses a confusion matrix to compute the fitness. They also proposed a local search mechanism that uses information obtained from the sentence parsing to generate a set of useful productions. The system was used with success for the parenthesis and palindrome languages.

BACKGROUND

A formal language is usually defined as follows. Given a finite alphabet Σ of symbols, we define the set of all strings (including the empty string ε) over Σ as Σ*. Thus, we want to learn a language L ⊆ Σ*. The alphabet could be a set of characters or a set of words. The most common way to define a language is based on grammars, which give rules for combining symbols to produce all the sentences of a language.

A grammar is defined by a quadruple G = (N, Σ, P, S), where N is an alphabet of nonterminal symbols; Σ is an alphabet of terminal symbols such that N ∩ Σ = ∅; P is a finite set of production rules of the form α → β for α, β ∈ (Σ ∪ N)*, where (Σ ∪ N)* represents the set of strings that can be formed by taking any number of these symbols, possibly with repetitions; and S is a special nonterminal symbol called the start symbol.

The language L(G) produced from grammar G is the set of all strings consisting only of terminal symbols that can be derived from the start symbol S by the application of production rules. The process of deriving strings by applying productions requires the definition of a new relation symbol ⇒. Let αXβ be a string of terminals and nonterminals, where X is a nonterminal. That is, α and β are strings in (Σ ∪ N)*, and X ∈ N. If X → γ is a production of G, we can say αXβ ⇒ αγβ. It is important to say that one derivation step can replace any nonterminal anywhere in the string.

We may extend the ⇒ relationship to represent one or many derivation steps, using ⇒* to denote a sequence of such steps. Therefore, we formally define the language L(G) produced from grammar G as L(G) = { w | w ∈ Σ*, S ⇒* w }.

More details about formal languages and grammars can be found in textbooks such as Hopcroft et al (2001).

The Chomsky Hierarchy

Grammars are classified according to the form of the production rules used. They are commonly grouped into a hierarchy of four classes, known as the Chomsky hierarchy (Chomsky, 1957).

Recursively enumerable languages: the grammar is unrestricted, and its productions may replace any number of grammar symbols by any other number of grammar symbols. The productions are of the form α → β with α, β ∈ (Σ ∪ N)*.

Context-sensitive languages: they have grammars with productions that replace a single nonterminal by a string of symbols whenever the nonterminal occurs in a specific context, i.e., has certain left and right neighbors. These productions are of the form αAγ → αβγ, with A ∈ N and α, β, γ ∈ (Σ ∪ N)*. A is replaced by β if it occurs between α and γ.

Context-free languages: in this type, grammars have productions that replace a single nonterminal by a string of symbols, regardless of this nonterminal's context. The productions are of the form A → α for A ∈ N and α ∈ (Σ ∪ N)*; thus A has no context.

Regular languages: they have grammars in which a production may only replace a single nonterminal by another nonterminal and a terminal. The productions are of the form A → Ba or A → a for A, B ∈ N and a ∈ Σ*.

It is sometimes useful to write a grammar in a particular form. The one most commonly used in grammatical inference is the Chomsky Normal Form. A CFG G is in Chomsky Normal Form (CNF) if all production rules are of the form A → BC or A → a for A, B, C ∈ N and a ∈ Σ.

The Cocke-Younger-Kasami Algorithm

To determine whether a string can be generated by a given context-free grammar in CNF, the Cocke-Younger-Kasami (CYK) algorithm can be used. This
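The CYK membership test can be sketched as follows. This is an illustrative implementation, not code from the article: productions are given as (left-hand side, right-hand side) pairs, with a CNF right-hand side being either a terminal string or a pair of nonterminals.

```python
def cyk_accepts(word, productions, start="S"):
    """CYK membership test for a grammar in Chomsky Normal Form.

    table[i][j] holds the set of nonterminals deriving word[i:i+j+1].
    """
    n = len(word)
    if n == 0:
        return False  # CNF as defined here cannot derive the empty string
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):                 # length-1 spans: A -> a
        for lhs, rhs in productions:
            if rhs == ch:
                table[i][0].add(lhs)
    for span in range(2, n + 1):                  # longer spans: A -> BC
        for i in range(n - span + 1):
            for split in range(1, span):
                for lhs, rhs in productions:
                    if (isinstance(rhs, tuple)
                            and rhs[0] in table[i][split - 1]
                            and rhs[1] in table[i + split][span - split - 1]):
                        table[i][span - 1].add(lhs)
    return start in table[0][n - 1]
```

With the toy CNF grammar P = {S → AS; S → b; A → SA; A → a}, encoded as `[("S", ("A", "S")), ("S", "b"), ("A", ("S", "A")), ("A", "a")]`, `cyk_accepts("ab", ...)` returns True and `cyk_accepts("ba", ...)` returns False.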
There are two main selection methods used in GP: fitness proportionate and tournament selection. In fitness proportionate selection, programs are selected randomly with probability proportional to their fitness. In tournament selection, a fixed number of programs are taken randomly from the population and the one with the best fitness in this group is chosen. In this work, we use tournament selection.

Reproduction is a genetic operator that simply copies a program to the next generation. Crossover, on the other hand, combines parts of two individuals to create two new ones. Mutation randomly changes a small part of an individual.

Each run of the main loop of GP creates a new generation of computer programs that substitutes the previous one. The evolution is stopped when a satisfactory solution is achieved or a predefined maximum number of generations is reached.

A GRAMMAR GENETIC PROGRAMMING APPROACH

We present how a GP approach can be applied to the inference of context-free grammars. First, we discuss the representation of the grammars. The modifications needed in the genetic operators are also presented. In the last section, the grammar evaluation is discussed.

Initial Population

It is possible to represent a CFG as a list of structured trees. Each tree represents a production with its left-hand side as a root and the derivations as leaves. Figure 3 shows the grammar G = (N, Σ, P, S) with Σ = {a, b}, N = {S, A} and P = {S → AS; S → b; A → SA; A → a}.

The initial population can be created with random productions, provided that all the productions are reachable directly or indirectly starting from S.

Genetic Operators

The crossover operator is applied over a pair of grammars and works as follows. First, a production is chosen using a tournament selection. If the second grammar has no production with the same left-hand side as the production chosen, the crossover is rejected. Otherwise, the productions are swapped.

The mutation operation is applied to a single selected grammar. A production is then chosen using the same mechanism as crossover. A new production, with the same left-hand side and with a randomly generated right-hand side, replaces the production chosen.

The crossover probability is usually high (90%) and the mutation probability is usually low (10%).

Unfortunately, using only the genetic operators mentioned, the convergence of the algorithm is not guaranteed. In our recent work, we demonstrated that the use of two local search operators is needed: an incremental learning operator (Rodrigues & Lopes, 2006) and an expansion operator (Rodrigues & Lopes, 2007). The first uses the information obtained from a CYK table to discover which production is missing to cover the sentence. The latter can expand the set of productions dynamically, providing diversity.
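The two operators can be sketched on a production-list representation of a grammar. This is a simplified illustration, not the authors' code: a uniform random choice stands in for the tournament selection described above, and the randomly generated right-hand sides are restricted to CNF form.

```python
import random

# A grammar is a list of (lhs, rhs) productions; rhs is a terminal string
# or a pair of nonterminals (illustrative CNF-style encoding).

def crossover(g1, g2, rng=None):
    """Swap one production between two grammars; reject the crossover if
    the second grammar has no production with the same left-hand side."""
    rng = rng or random.Random()
    i = rng.randrange(len(g1))           # stand-in for tournament selection
    lhs = g1[i][0]
    matches = [j for j, (l, _) in enumerate(g2) if l == lhs]
    if not matches:
        return None                      # crossover rejected
    j = rng.choice(matches)
    c1, c2 = list(g1), list(g2)
    c1[i], c2[j] = g2[j], g1[i]          # swap the two productions
    return c1, c2

def mutate(g, nonterminals, terminals, rng=None):
    """Replace the right-hand side of one production with a random one,
    keeping the same left-hand side."""
    rng = rng or random.Random()
    i = rng.randrange(len(g))
    lhs = g[i][0]
    if rng.random() < 0.5:
        rhs = (rng.choice(nonterminals), rng.choice(nonterminals))
    else:
        rhs = rng.choice(terminals)
    child = list(g)
    child[i] = (lhs, rhs)
    return child
```

Rejecting a crossover when the left-hand sides cannot be matched keeps every offspring a well-formed set of productions.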
This operator is applied before the evaluation of each grammar in the population. It uses the CYK table obtained from the parsing of positive examples to allow the creation of a useful new production. The pseudocode is in Figure 4.

Once this process is completed with success, hopefully there will be a set of positive examples (possibly all) recognized by the grammar. However, there is no warranty that the negative examples will still remain rejected by the grammar.

The Expansion Operator

This operator adds a new nonterminal to the grammar and generates a new production with this new nonterminal as a left-hand side. This new approach allows grammars to grow dynamically in size. To avoid a new useless production, a production with another non-terminal on the left-hand side and the new non-terminal on the right-hand side is generated. It is important to emphasize that the new operator adds two productions to the grammar.

This operator promotes the diversity in the population that is required in the beginning of the evolutionary process.

In grammatical inference, we need to train the system with both positive and negative examples to avoid overgeneralization. Usually the evaluation is done by counting the positive examples covered by a grammar in proportion to the total of positive examples. If the grammar covers some negative examples, it is penalized in some way.

In our recent work, we use a confusion matrix of the kind typically used in supervised learning (Witten & Frank, 2005). Each column of the matrix represents the number of instances predicted either positively or negatively, while each row represents the real classification of the instances. The entries in the confusion matrix have the following meaning in the context of our study:

TP is the number of positive instances recognized by the grammar.
TN is the number of negative instances rejected by the grammar.
FP is the number of negative instances recognized by the grammar.
FN is the number of positive instances rejected by the grammar.
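These four counts can be tallied directly, and the sketch below then folds them into a single score. The excerpt does not give the exact formula the authors combine them with, so plain accuracy over all examples is used here as an assumption.

```python
def confusion_counts(accepts, positives, negatives):
    """Build the confusion-matrix entries for a candidate grammar.

    accepts: predicate returning True if the grammar covers a sentence.
    """
    tp = sum(1 for s in positives if accepts(s))      # positives recognized
    fn = len(positives) - tp                          # positives rejected
    fp = sum(1 for s in negatives if accepts(s))      # negatives recognized
    tn = len(negatives) - fp                          # negatives rejected
    return tp, tn, fp, fn

def fitness(tp, tn, fp, fn):
    """Accuracy: covered positives plus rejected negatives, over all examples."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0
```

A grammar that covers every positive example and rejects every negative one thus scores 1.0, while covered negatives (FP) and rejected positives (FN) both pull the score down.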
using genetic programming. Proceedings of the 42nd Annual ACM Southeast Conference '04, 404-405.

Korkmaz, E.E. & Ucoluk, G. (2001) Genetic programming for grammar induction. Proceedings of 2001 Genetic and Evolutionary Computation Conference Late Breaking Papers, 245-251.

Koza, J.R. (1992) Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge: MIT Press.

Lopes, H.S., Coutinho, M.S. & Lima, W.C. (1998) An evolutionary approach to simulate cognitive feedback learning in medical domain. Proceedings of the Genetic Algorithms and Fuzzy Logic Systems: Soft Computing Perspectives, 193-207.

Monsieurs, P. & Flerackers, E. (2001) Reducing bloat in genetic programming. Proceedings of the International Conference, 7th Fuzzy Days on Computational Intelligence, Theory and Application, LNCS v. 2206, Springer-Verlag, 471-478.

Rodrigues, E. & Lopes, H.S. (2007) Genetic programming for induction of context-free grammars. Proceedings of the 7th International Conference on Intelligent Systems Design and Applications (ISDA'07) (to be published).

Rodrigues, E. & Lopes, H.S. (2006) Genetic programming with incremental learning for grammatical inference. Proceedings of the 6th International Conference on Hybrid Intelligent Systems (HIS'06), IEEE Press, Auckland, 47.

Sen, S. & Janakiraman, J. (1992) Learning to construct pushdown automata for accepting deterministic context-free languages. Proceedings of the Applications of Artificial Intelligence X: Knowledge-Based Systems, SPIE v. 1707, 207-213.

Starkie, B., Coste, F. & van Zaanen, M. (2004) The Omphalos context-free grammar learning competition. Proceedings of the International Colloquium on Grammatical Inference, 16-27.

Witten, I. H. & Frank, E. (2005) Data Mining. San Francisco: Morgan Kaufmann.

Wyard, P. (1994) Representational issues for context free grammar induction using genetic algorithms. Proceedings of the Second International Colloquium in Grammatical Inference (ICGI-94). LNAI no. 862. Springer-Verlag, 222-235.

Wyard, P. (1991) Context free grammar induction using genetic algorithms. Proceedings of the Fourth International Conference on Genetic Algorithms (ICGA'91). Morgan Kaufmann, 514-518.

KEY TERMS

CYK: The Cocke-Younger-Kasami algorithm, used to determine whether a sentence can be generated by a grammar.

Evolutionary Computation: Large and diverse class of population-based search algorithms inspired by the process of biological evolution through selection, mutation and recombination. They are iterative algorithms that start with an initial population of candidate solutions and then repeatedly apply a series of genetic operators.

Finite Automata: A model of behavior composed of a finite number of states, transitions between those states, and actions. They are used to recognize regular languages.

Genetic Algorithm: A type of evolutionary computation algorithm in which candidate solutions are typically represented by vectors of integers or bit strings, that is, by vectors of binary values 0 and 1.

Heuristic: Function used for making certain decisions within an algorithm; in the context of search algorithms, typically used for guiding the search process.

Local Search: A type of search method that starts at some point in the search space and iteratively moves from position to neighbouring position using heuristics.

Pushdown Automata: A finite automaton that can make use of a stack containing data. They are used to recognize context-free languages.

Search Space: Set of all candidate solutions of a given problem instance.
Evolutionary Robotics

J. A. Becerra
Universidade da Coruña, Spain

R. J. Duro
Universidade da Coruña, Spain
tion we will talk about those problems and, in general, about the main aspects to take into account when using evolution in Autonomous Robotics.

EVOLUTIONARY ROBOTICS

The basis of Evolutionary Robotics is to use evolutionary algorithms to automatically obtain robot controllers. In order to do that, there are many decisions to be made. First of all, one must decide what to evolve (controllers, morphology, both?). Then, whatever is to be evolved has to be encoded in chromosomes. An evolutionary algorithm must be chosen. It has to be decided where and how to evaluate each individual, etc. These issues will be addressed in the following sections.

What to Evolve

The first decision to be made is what to evolve. The most common choice is to evolve controllers for a given robot, but we can also evolve the morphology, or both together. If we choose to evolve only the controllers, we also have to decide how they will be implemented. The most usual choices are artificial neural networks, fuzzy logic systems and classifier systems.

Classifier systems are made up of rules (the classifier set). Each rule consists of a set of conditions and a message. If the conditions are satisfied, the message can produce an action on an actuator and is stored in a message list. Sensor values are also stored in this message list. Messages in the message list may change the state of conditions, leading to a different set of activated rules. There is an apportionment-of-credit system that changes the strength of each rule, and a rule discovery system, where a genetic algorithm generates new rules using the existing rules in the classifier set and their strengths. An example of classifier systems is the work of Dorigo and Colombetti (Colombetti et al, 1996), (Dorigo and Colombetti, 1993, 1995, 1998).

Fuzzy logic has also been used to encode controllers. Possible sensed values and acting values are encoded into predefined fuzzy sets, and the rules that relate both can be evolved. Examples: (Cooper, 1995), (Hoffmann and Pfister, 1994), (Vicente Matellán et al, 1998).

Artificial neural networks are the most common way of implementing controllers in evolutionary robotics. On one hand, they are noise and failure tolerant and, on the other, they can be used as universal function approximators and can be easily integrated with an evolutionary algorithm to obtain a controller from scratch. Many researchers have used ANNs, just to mention some of them: (Beer and Gallagher, 1992), (Cliff et al, 1992), (Floreano and Mondada, 1998), (Harvey et al, 1993), (Kodjabachian and Meyer, 1995), (Lund and Hallam, 1996), (Nolfi et al, 1994) and (Santos and Duro, 1998).

How to Encode What We are Evolving

When encoding a controller into the chromosome, the most obvious choice, and the most common one, is to make a direct encoding. That is, each controller parameter becomes a gene in the chromosome. For instance, if the controller is an ANN, each synaptic weight, as well as the biases and other possible parameters that describe the ANN topology, corresponds to a gene (Mataric and Cliff, 1996), (Miglino et al, 1995a). This can lead to very large chromosomes, as the chromosome size grows proportionally to the square of the network size (in the case of feedforward networks), increasing the dimensionality of the search space and making it more difficult to obtain a solution in reasonable time. Another problem is that the designer has to predefine the full topology (size, number of neurons, etc.) of the ANN, which is, in general, not obvious, usually leading to a trial-and-error procedure. To address this problem, some researchers employ encoding schemes where the chromosome length may vary in time (Cliff et al., 1993b).

Another possibility is to encode elements that, following a set of rules, encode the development of the individual (Guillot and Meyer, 1997), (Angelo Cangelosi et al, 1994), (Kodjabachian and Meyer, 1998). Some authors even simultaneously evolve with this system both the controller and the morphology, but mostly for virtual organisms (Sims, 1994) or very simplified real robots.

Where to Carry Out the Evolution Process

To determine how good an individual is, it is necessary to evaluate this individual in an environment during a given time interval. This evaluation has to be performed more than once in order to make the process independent from the initial conditions. The more complex the

it is very difficult to decide beforehand the fitness of
behaviour is the more time that is required to evaluate each action towards a final objective. Sometimes the E
an individual. The evaluation of the individual is usually same action may be good or bad depending on what
the most time consuming phase, by far, in the whole happened before or after in the context. In addition, this
evolutionary process for evolutionary robotics. The approach implies an external determination of good-
evaluation can be carried out in a real environment, in ness as it is the designer who is imposing through these
a simulated environment or both. action-fitness pairs how the robot must act.
Evaluation in a real environment has the obvious The global approach, on the other hand, implies
advantage that the controllers obtained will work with- defining a fitness criteria based on how good the robot
out problems in the real robot. But it is much slower was at achieving its final goal. The designer does not
than evaluation in simulated environments, it presents specify how good each action is in order to maximize
the danger of harming the robot and many limitations fitness, and the evolutionary algorithm has a lot more
on the evaluation functions that may be used. These freedom for discovering the best final controller. The
is why researchers that consider evaluation in a real main problem here is that there is a lot less knowledge
environment mostly use small robots in controlled en- injected in the evolutionary process, and thus the evo-
vironments (Dario Floreano and Francesco Mondada, lution may find loopholes in the problem specification
1995, 1996, 1998). and thus maximize fitness without really achieving the
The alternative is to perform the evaluation in a function we seek.
simulated environment (Beer and Gallagher, 1992), There are two main ways in which global fitness
(Cliff et al, 1993a), (Meeden, 1996), (Miglino et al, can be obtained: external and internal. By external we
1995a, b). This is faster and it permits parallelizing the mean a fitness assigned by someone or some process
algorithm and using fitness functions that are impossible outside the robot. An extreme example of external
in a real environment. Nevertheless, it has the additional global evaluation of the robot behaviour is presented
problem of how to carry out this simulation in order to in (Lund et al,1998).
obtain controllers that work in the real world. Jakobi The other possible approach is to employ an in-
(Jakobi 1997) formalized this problem and established ternal representation of fitness. This is, employ clues
the conditions to be taken into account in order to suc- in the environment the robot is conscious of and that
cessfully transfer the controllers obtained in simulation it can use in order to judge its level of fitness without
to the real robot. the help from any external evaluator (apart from the
Some researchers choose to perform a simulated environment itself). A concept often used in order to
evaluation for the different generations of the evolu- implement this approach is that of internal energy.
tionary process except in the last generations where The robot has an internal energy level and this energy
a real evaluation is performed to make sure that the level increases or decreases according to a set of rules
controllers work in the real robot. Nevertheless, this ap- or functions related to robot perceptions. The final
proach presents the same problems, although somehow fitness of the robot is given by the level of energy at
reduced, as the case of evolution in a real environment the end of its life.
(Miglino et al, 1995a).
Once the previous choices have been made, it is nec- The main problem of the evolutionary approach to
essary to decide how the fitness will be calculated. robotics is that, although it is easy to obtain simple
There are two different perspectives for doing this: a behaviours, it does not scale well to really complex
local perspective or a global perspective (Mondada and behaviours without introducing some human knowledge
Floreano, 1995). The first one consists in establishing in the system. An obvious solution to this problem is the
for each step of the robot life a fitness of its actions in reduction of evaluation time, which is slowly happen-
relation to its goal. The final fitness will be the sum of the ing thanks to the increasing computational capabilities.
fitness values corresponding to each step. This strategy Another way of controlling this problem is to try to
presents two main drawbacks. Except in toy problems, obtain a better encoding scheme using development.
605
Evolutionary Robotics
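The direct encoding scheme discussed above can be sketched in a few lines. This is an illustrative sketch only: the layer layout, gene ordering and tanh activation are our assumptions, not taken from the cited works.

```python
import math

def genome_length(layers):
    # One gene per synaptic weight plus one per bias, for a fully
    # connected feedforward network with the given layer sizes.
    return sum(layers[i] * layers[i + 1] + layers[i + 1]
               for i in range(len(layers) - 1))

def decode(genome, layers):
    # Slice the flat chromosome back into per-layer weight matrices
    # and bias vectors.
    net, pos = [], 0
    for i in range(len(layers) - 1):
        n_in, n_out = layers[i], layers[i + 1]
        w = [genome[pos + j * n_in: pos + (j + 1) * n_in] for j in range(n_out)]
        pos += n_in * n_out
        b = genome[pos: pos + n_out]
        pos += n_out
        net.append((w, b))
    return net

def forward(net, x):
    # Plain feedforward pass with tanh activations.
    for w, b in net:
        x = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
             for row, bi in zip(w, b)]
    return x
```

For layers of similar width n, each layer pair contributes on the order of n² genes, which is exactly the quadratic growth of the chromosome noted above.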
Another problem that arises when trying to minimize human intervention is that the evolutionary algorithm can lead to a behaviour that optimizes the defined fitness function, but the result does not correspond to the desired behaviour due to an incomplete or inadequate fitness function. This problem must be addressed from a theoretical and formal point of view so that appropriate fitness functions can be methodologically obtained.

Finally, there is a problem in common with Autonomous Robotics: benchmarking. It is not easy to compare different approaches, and usually we can only say whether a behaviour is satisfactory or not. It is hard to compare it with other similar behaviours, or to know whether the results would be better by changing something in the approach followed.

CONCLUSION

Evolutionary Robotics has quickly developed since its birth at the beginning of the nineties into the most common way of obtaining behaviours in Behaviour Based Robotics. It has shown that it is the easiest way to automatically obtain controllers for autonomous robots while trying to minimize the human factor, at least when the behaviours are not too complex, because it is easy to encode every tool used to implement controllers and to obtain a solution by just defining a fitness function. Inside Evolutionary Robotics, simulated evaluation is the preferred way of evaluating how good a candidate solution is, and Artificial Neural Networks are the preferred tool to implement controllers, due to their noise and failure tolerance and their nature as universal function approximators. Due to the time required to evaluate individuals and the number of individuals that must be evaluated, the selection of the correct evolutionary algorithm and its parallelization are critical factors.

REFERENCES

Beer, R.D. and Gallagher, J.C. (1992), Evolving Dynamical Neural Networks for Adaptive Behavior, Adaptive Behavior, Vol. 1, No. 1, pp. 91-122.

Cangelosi, A., Parisi, D., and Nolfi, S. (1994), Cell Division and Migration in a Genotype for Neural Networks, Network, Vol. 5, pp. 497-515.

Cliff, D., Harvey, I., and Husbands, P. (1992), Incremental Evolution of Neural Network Architectures for Adaptive Behaviour, Tech. Rep. No. CSRP256, Brighton, School of Cognitive and Computing Sciences, University of Sussex, UK.

Cliff, D., Harvey, I., and Husbands, P. (1993a), Explorations in Evolutionary Robotics, Adaptive Behavior, Vol. 2, pp. 73-110.

Cliff, D., Husbands, P. and Harvey, I. (1993b), Evolving Visually Guided Robots, From Animals to Animats 2, Proceedings of the Second International Conference on Simulation of Adaptive Behaviour (SAB 92), J.A. Meyer, H. Roitblat and S. Wilson (Eds.), MIT Press Bradford Books, Cambridge, MA, pp. 374-383.

Colombetti, M., Dorigo, M., and Borghi, G. (1996), Behavior Analysis and Training: A Methodology for Behavior Engineering, IEEE Transactions on Systems, Man and Cybernetics, Vol. 26, No. 3, pp. 365-380.

Cooper, M.G. (1995), Evolving a Rule-Based Fuzzy Controller, Simulation, Vol. 65, No. 1.

Dorigo, M. and Colombetti, M. (1993), Robot Shaping: Developing Autonomous Agents through Learning, Artificial Intelligence, Vol. 71, pp. 321-370.

Dorigo, M. (1995), ALECSYS and the AutonoMouse: Learning to Control a Real Robot by Distributed Classifier Systems, Machine Learning Journal, Vol. 19, No. 3, pp. 209-240.

Dorigo, M. and Colombetti, M. (1998), Robot Shaping: An Experiment in Behavior Engineering, MIT Press.

Floreano, D. and Mondada, F. (1996), Evolution of Homing Navigation in a Real Mobile Robot, IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 26, pp. 396-407.

Floreano, D. and Mondada, F. (1998), Evolutionary Neurocontrollers for Autonomous Mobile Robots, Neural Networks, Vol. 11, pp. 1461-1478.

Guillot, A. and Meyer, J.A. (1997), Synthetic Animals in Synthetic Worlds, in Kunii and Luciani (Eds.), Synthetic Worlds, John Wiley and Sons.

Harvey, I., Husbands, P., and Cliff, D. (1993), Issues in Evolutionary Robotics, From Animals to Animats 2, Proceedings of the Second International Conference on Simulation of Adaptive Behavior (SAB 92), J.A. Meyer, H. Roitblat, and S. Wilson (Eds.), MIT Press, Cambridge, MA, pp. 364-373.

Hoffmann, F. and Pfister, G. (1994), Automatic Design of Hierarchical Fuzzy Controllers Using Genetic Algorithms, Proceedings of EUFIT 94, Aachen, Germany.

Husbands, P., Harvey, I., Cliff, D. and Miller, G. (1994), The Use of Genetic Algorithms for the Development of Sensorimotor Control Systems, P. Gaussier and J.D. Nicoud (Eds.), From Perception to Action, IEEE Computer Society Press, Los Alamitos, CA, pp. 110-121.

Jakobi, N. (1997), Half-Baked, Ad-Hoc and Noisy Minimal Simulations for Evolutionary Robotics, Fourth European Conference on Artificial Life, P. Husbands and I. Harvey (Eds.), MIT Press, pp. 247-269.

Kodjabachian, J. and Meyer, J.A. (1995), Evolution and Development of Control Architectures in Animats, Robotics and Autonomous Systems, Vol. 16, pp. 161-182.

Kodjabachian, J. and Meyer, J.A. (1998), Evolution and Development of Modular Control Architectures for 1D Locomotion in Six-Legged Animats, Connection Science, Vol. 10, No. 3-4, pp. 211-237.

Lund, H.H. and Hallam, J.C. (1996), Sufficient Neurocontrollers can Be Surprisingly Simple, Research Paper 824, Department of Artificial Intelligence, University of Edinburgh.

Lund, H.H., Miglino, O., Pagliarini, L., Billard, A., and Ijspeert, A. (1998), Evolutionary Robotics: A Children's Game, Proceedings of the IEEE 5th International Conference on Evolutionary Computation.

Mataric, M.J. and Cliff, D. (1996), Challenges in Evolving Controllers for Physical Robots, Robotics and Autonomous Systems, Vol. 19, No. 1, pp. 67-83.

Matellán, V., Fernández, C., and Molina, J.M. (1998), Genetic Learning of Fuzzy Reactive Controllers, Robotics and Autonomous Systems, Vol. 25, pp. 33-41.

Meeden, L. (1996), An Incremental Approach to Developing Intelligent Neural Network Controllers for Robots, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, Vol. 26, No. 3, pp. 474-485.

Miglino, O., Lund, H.H., and Nolfi, S. (1995a), Evolving Mobile Robots in Simulated and Real Environments, Artificial Life, Vol. 2, No. 4, pp. 417-434.

Miglino, O., Nafasi, K., and Taylor, C. (1995b), Selection for Wandering Behavior in a Small Robot, Artificial Life, Vol. 2, pp. 101-116.

Mondada, F. and Floreano, D. (1995), Evolution of Neural Control Structures: Some Experiments on Mobile Robots, Robotics and Autonomous Systems, Vol. 16, pp. 183-195.

Nolfi, S., Elman, J., and Parisi, D. (1994), Learning and Evolution in Neural Networks, Adaptive Behavior, Vol. 1, pp. 5-28.

Santos, J. and Duro, R.J. (1998), Evolving Neural Controllers for Temporally Dependent Behaviors in Autonomous Robots, Tasks and Methods in Applied Artificial Intelligence, A.P. del Pobil, J. Mira and M. Ali (Eds.), Lecture Notes in Artificial Intelligence, Vol. 1416, Springer-Verlag, Berlin, pp. 319-328.

Sims, K. (1994), Evolving 3D Morphology and Behavior by Competition, R. Brooks and P. Maes (Eds.), Alife IV, MIT Press, Cambridge, MA, pp. 28-39.

KEY TERMS

Artificial Neural Network: An interconnected group of artificial neurons, which are elements that use a mathematical model reproducing, through great simplification, the behaviour of a real neuron, used for distributed information processing. They are inspired by nature in order to achieve some characteristics present in real neural networks, such as error and noise tolerance, generalization capabilities, etc.

Autonomous Robotics: The field of Robotics that tries to obtain controllers for robots so that they are tolerant and may adapt to changes in the environment.

Behaviour Based Robotics: The field of Autonomous Robotics that proposes not to pay attention to the knowledge that leads to behaviours, but just to implement them somehow. It also proposes a direct connection between sensors and actuators for every controller running in the robot, eliminating the typical sensor interpretation, world modelling and planning stages.

Evolutionary Algorithm: A stochastic population-based search algorithm inspired by natural evolution. The problem is encoded in an n-dimensional search space where individuals represent candidate solutions. Better individuals have higher reproduction probabilities than worse individuals, thus allowing the fitness of the population to increase through the generations.

Evolutionary Robotics: The field of Autonomous Robotics, usually also considered a field of Behaviour Based Robotics, that obtains the controllers using some kind of evolutionary algorithm.

Knowledge Based Robotics: The field of Autonomous Robotics that tries to achieve intelligent behaviours through the modelling of the environment and a process of planning over that model, that is, by modelling the knowledge that generates the behaviour.

Macroevolutionary Algorithm: An evolutionary algorithm using the concept of species instead of individuals. Low-fitness species become extinct, and new species appear to fill their place. Its evolution is smoother and slower, but less inclined to fall into local optima compared to other evolutionary algorithms.
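The generational loop described in the Evolutionary Algorithm key term can be sketched minimally as follows. The concrete operators and parameters (tournament selection, one-point crossover, per-bit mutation, and a bit-counting toy fitness standing in for an expensive robot evaluation) are illustrative assumptions, not a specific method from the text.

```python
import random

def evolve(fitness, n_bits=20, pop_size=30, generations=60, seed=1):
    # Minimal generational evolutionary algorithm: better individuals
    # reproduce with higher probability (tournament selection), then
    # one-point crossover and per-bit mutation produce the next generation.
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            nxt.append([bit ^ (rng.random() < 1.0 / n_bits) for bit in child])
        pop = nxt
    return max(pop, key=fitness)

# Toy stand-in fitness: count of ones in the bit string.
best = evolve(sum)
```

In evolutionary robotics the `fitness` call would hide a full simulated or real evaluation of the robot, which is why, as noted above, evaluation dominates the run time.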
Evolved Synthesis of Digital Circuits

Alin Mazare
University of Pitesti, Romania

Gheorghe Şerban
University of Pitesti, Romania

Emil Sofron
University of Pitesti, Romania
The market has been dominated by complex programmable logic devices and by fine-grained field programmable gate arrays, for which the main problems are the number of Boolean cells available on chip and the delay time. The fast evolution of the technologies keeps increasing the performance of programmable circuits. Today it is possible to implement a high-speed central processing unit core that is comparable with application-specific integrated circuit implementations. At the same time, low product costs mean that a modern programmable digital circuit can be purchased by end users such as students and researchers. Thus, an evolution in design, synthesis and implementation techniques for programmable logic circuits is required. One very attractive direction of research is the implementation of hardware bio-inspired systems, such as neural networks or evolutionary algorithms (e.g. genetic algorithms), on programmable logic circuits.

The first research direction in this area is to find solutions for improving genetic algorithm performance through hardware implementation. Software implementations have the advantage of flexibility and easy configuration. However, the convergence speed is low because of the serial execution of the steps while the algorithm runs. To increase the speed, a parallel implementation of the modules is required (Goldberg, 1995). This is done by hardware implementation on programmable logic circuits. Many investigations have been carried out in this area.

The second research direction is to join the concept of assemble-and-test with an evolutionary algorithm in order to gradually improve the quality of a design; this approach has largely been adopted in the nascent field of evolvable hardware, where the task is to build an electronic circuit.

Thompson (Thompson, 1999) carried out the first research, using a reconfigurable platform, and showed that it is possible to evolve a circuit that can discriminate between two square wave signals. He demonstrated that it is possible to design digital circuits using an evolutionary algorithm. The evolutionary process exploited physical properties of the underlying silicon substrate to produce an efficient circuit. He argued that artificial evolution can produce designs for electronic circuits that lie outside the scope of conventional methods. Koza (Koza, 1997) pioneered the extrinsic evolution of analogue circuits using SPICE and automatically generated circuits that are competitive with human designs.

Most workers are content with extrinsic evolution, because the evolutionary algorithm is software based. But Scott and others have achieved solutions for the hardware implementation of evolutionary algorithms. In (Scott, 1997) a hardware genetic algorithm is designed, implemented as a pipelined hardware structure in an FPGA. His work demonstrates that a fully integrated (intrinsic) evolvable hardware solution can be implemented. To design hardware modules for crossover, selection and mutation, combinational networks such as the systolic arrays presented by (Bland, 2001) are used.

Miller et al. (Miller, 2000) provide a reference with their work on the evolution of combinational digital circuits that implement arithmetic functions. They first use gate networks that evolve into arithmetic circuits, but also demonstrate that it is possible to use the evolution of sub-circuits to achieve more complex circuits.

Another example of extrinsic evolved synthesis is given by Martin's work (Martin, 2001), where the phenotype is given by hardware description language (Handel-C) sequences.

Genetic algorithms are also used in many more areas of hardware design. In (Shaaban, 2001) they are applied to integrated circuit design for semiconductor detectors. Yasunaga (Yasunaga, 2001) uses hardware synthesis of digital circuits on a reconfigurable programmable logic array structure to implement a speech recognition system. Recently, evolvable synthesis has been used for sequential circuit design (Ali, 2004) and for digital filter design (Iana, 2006).

Evolved combinational and sequential synthesis of digital circuits is used as the control module of a self-contained mobile system executing several tasks, such as obstacle avoidance and target hitting, in (Sharabi, 2006). The solving of multi-objective (multi-task) problems in hardware systems by evolution is treated by Coello (Coello, 2002) and used in more recent works (Zhao, 2006).

The question is: is it possible to design digital circuits using an evolutionary algorithm such as a genetic algorithm (GA)?

To answer this question, first the design of a reconfigurable circuit that can be programmed by an evolutionary algorithm is required. A solution can be a reconfigurable multilayer gate network, in which each gate in layer x can be connected or not with a gate in layer x+1.
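The multilayer gate network can be mimicked in software to show how a chromosome programs the connections. The sketch below rests on our own assumptions (each gate simply ORs whatever previous-layer signals the configuration bits route to it; a real generic gate would also select its Boolean function):

```python
def eval_network(bits, inputs, layer_sizes):
    # bits: one connection bit per (gate, previous-layer signal) pair,
    # laid out layer by layer -- the chromosome the GA would evolve.
    signals, pos = list(inputs), 0
    for size in layer_sizes:
        nxt = []
        for _ in range(size):
            conn = bits[pos:pos + len(signals)]
            pos += len(signals)
            # This toy gate ORs the signals routed to it.
            nxt.append(int(any(c and s for c, s in zip(conn, signals))))
        signals = nxt
    return signals
```

With two inputs and a single two-gate layer, the chromosome has 2 × 2 = 4 connection bits; the chromosome length grows with the product of adjacent layer sizes.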
Figure 1. a) The generic gate cell of the hardware reconfigurable structure; b) the gate network coded as a configuration vector (fields i1 … in, f1, f2, o1 … on), with cell outputs y_o1 … y_op and their complements.
Figure 2. The recursive sorting and permutation circuit used for the hardware implementation of the genetic algorithm modules: a sorter RS_n built from a smaller sorter RS_{n-1} and permutation blocks RP_n and RP_{n/2}, mapping inputs x1 … xn to sorted outputs y1 … yn.
Design of Hardware Genetic Algorithm, Dynamic Reconfigurable Circuit and Application Description

This section covers the techniques used for the design of evolvable hardware microstructures. First, it presents the design of the hardware genetic algorithm using the models from the preceding section. Each module is designed individually, described with combinational networks. The hardware genetic algorithm is a circuit that connects all the modules in a fully hardware solution. The block diagram of the HGA (hardware genetic algorithm) is presented in fig. 3.

The main benefit of this solution is that the modules can be used in other algorithms and can be interconnected in other ways without being redesigned. Each module can work individually; therefore the structure can process several generations at the same time. On the other side, each module is designed as a network of elementary functions such as min/max or permutation. Thus, any change required, such as increasing the size or number of individuals, is very easy to make by adding elementary function blocks.

In the selection module, individuals (bit strings) with their computed fitness enter at the left side of the array. Each cell compares the fitness values from its two inputs, x_in and y_in. The output x_out receives the individual with the smaller fitness value of the two, and the output y_out the individual with the larger one. So, the individuals with poor (small) fitness cross the array horizontally, from left to right, and individuals with good fitness cross the array vertically, from top to bottom. Finally, at the output side of the array we obtain the individuals with the best fitness.

The same concept is used for the crossover circuit. Individuals sorted by the selection module enter the crossover module. The operator is applied to a certain number of pairs of individuals. Usually, the individuals with the best fitness are the parents of the offspring generation produced by this module.

For the mutation module, we also use a single column of logical structures (mutation cells). The array is designed with the following restriction: mutation must affect only a single bit of an individual, and only a single individual per generation (all individuals of one iteration step). The structure can easily be changed if another behavior is wanted.

Evaluation is done by comparing the response of the partial solution provided by each individual with the desired response. A target Boolean function is defined here, the function that must be implemented in hardware. Another evaluation criterion is given by the minimization of the digital resources used to implement the target function. The fitness value is computed by combining both evaluation criteria.
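The compare-exchange behaviour of the selection array can be sketched in software. The odd-even transposition arrangement below is our own stand-in for the hardware array; the cell semantics (smaller fitness out one port, larger out the other) follow the description above.

```python
def cell(x, y):
    # Compare-exchange cell: the first output carries the individual
    # with the smaller fitness, the second the larger one.  Individuals
    # travel as (fitness, bit_string) pairs.
    return (x, y) if x[0] <= y[0] else (y, x)

def sort_array(pop):
    # An n-stage odd-even transposition network built only from cells,
    # mirroring a fully parallel hardware array of compare-exchange units.
    pop = list(pop)
    n = len(pop)
    for stage in range(n):
        for i in range(stage % 2, n - 1, 2):
            pop[i], pop[i + 1] = cell(pop[i], pop[i + 1])
    return pop
```

Within one stage every cell works on a disjoint pair, so in hardware all cells of a stage operate in the same clock cycle, which is the point of the array layout.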
Figure 3. Block diagram of the hardware genetic algorithm. Input strings (individuals with their fitness) flow through the EVALUATION, selection, CROSSOVER and mutation modules, driven by a PSEUDO-RANDOM NUMBER GENERATOR (RNG). The selection cell outputs x_out = x_in if x_in <= y_in, else y_in, and y_out the other string. The crossover cell outputs x_out1 = x_in1 if y_in < p, else x_in2 (and x_out2 symmetrically), where p is a threshold compared against the random draw y_in. The mutation cell flips bit z_in of an individual only when its random index y_in equals n.
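The crossover and mutation modules can be sketched the same way. The threshold value and random-stream handling here are illustrative assumptions; the single-bit, single-individual mutation restriction is the one stated in the text.

```python
import random

def crossover_cell(p1, p2, rng, p=0.7):
    # With probability p the parent strings exchange tails at a random
    # cut point; otherwise they pass through unchanged.
    if rng.random() < p:
        cut = rng.randrange(1, len(p1))
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1[:], p2[:]

def mutate_generation(pop, rng):
    # Restriction from the text: flip exactly one bit of exactly one
    # individual per generation.
    i = rng.randrange(len(pop))
    j = rng.randrange(len(pop[i]))
    pop[i][j] ^= 1
    return pop
```

Note that the tail swap conserves the multiset of bits across the pair, so crossover alone never creates new gene values; only the mutation cell introduces them.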
The combined fitness weights functional correctness against circuit size:

f_total = 0.9 · f_eval_correct + 0.1 · f_minim

Three dynamic reconfigurable circuits were designed and tested. All are based on the hardware reconfigurable structure presented in figure 1. The first scheme, the min-max terms reconfigurable circuit, uses the same principles as a programmable logic array. The scheme is composed of three layers: an INV layer, an AND layer and an OR layer. The genetic algorithm commands the connections between the INV layer and the AND layer. This reconfigurable circuit has the fastest convergence speed and the smallest individuals, but it explores only the traditional solution space, and its size grows exponentially with the number of inputs and linearly with the number of outputs.

The second circuit is the reconfigurable INV-AND-OR circuit. Like the first circuit, it has three layers: INV, AND and OR. In this case, the genetic algorithm configures the connections between the INV and AND layers and between the AND and OR layers. This scheme reduces the growth of size with the number of outputs, but the exponential growth with the number of inputs remains, and the individuals are bigger than in the first circuit.

The last reconfigurable circuit is the elementary functions reconfigurable circuit (e-reconfigurable). It contains more layers, and each layer contains a number of generic gates. A generic gate can implement the elementary Boolean functions (AND, OR, XOR) and more complex circuits such as a MUX. This solution increases the size of the individuals and the complexity of the reconfigurable circuit, but it is almost invariant with the number of inputs and outputs. This last reconfigurable circuit explores the largest solution space, beyond the bounds of traditional design methods.

The evolvable hardware is used in three applications. In the first, the target function is static, and the algorithm must find a hardware solution that implements it. Each individual represents a potential solution for the hardware implementation of the target function. The evolution loop is repeated until an optimal solution is found; the hardware solution found in this way is named evolved hardware. At that point, the evolution loop is stopped.

In the second application the target function is also static, but here an individual codes only a sub-circuit: one generic gate or one gate layer. The individuals evolve, the offspring replace the parents at different positions, and evaluation is done on the entire circuit until the new solution is better than the old one. The evolution loop is repeated until an optimal solution is found. This approach is used to design circuits with large numbers of inputs, outputs and sub-circuits.
Figure 4. The elementary functions reconfigurable circuit: layers of generic gates, each configurable as one of the elementary functions f(x) among & (AND), | (OR), ^ (XOR) and ~ (NOT), connect the inputs x1, x2, x3 to the outputs F1 and F2; the configuration fields c1, c2, c3 select which gate functions e_ij contribute to each layer function f_ij(x1, x2, x3).
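The weighted fitness f_total = 0.9 · f_eval_correct + 0.1 · f_minim can be written directly. The function names and the gate-count normalization below are illustrative assumptions; only the 0.9/0.1 weighting comes from the text.

```python
def fitness(outputs, target, gates_used, gates_available):
    # 90% functional correctness (fraction of matching truth-table rows),
    # 10% resource minimization, following the weighted f_total formula.
    f_correct = sum(o == t for o, t in zip(outputs, target)) / len(target)
    f_min = 1.0 - gates_used / gates_available
    return 0.9 * f_correct + 0.1 * f_min
```

The dominant 0.9 weight makes correctness the primary objective, while the 0.1 term breaks ties among correct circuits in favour of the ones using fewer gates.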
Figure 5. Application schema: finding the optimal solution for target function implementation. The hardware genetic algorithm evaluates eight individuals (Indiv. 1-8), each configuring a candidate circuit (Circuit 1-8); the optimal solution yields the final circuit.
The last application uses dynamic target functions. Here, each individual represents a complete solution for the circuit. The evolution loop has two steps. The first step is the same as in the first application: loop until a solution is found. After the solution is found in an individual, named the main individual, the evolution continues for the other individuals. The target of the second step of the evolution loop is to obtain individuals that differ from the main individual. When the target function changes, the evolution loop passes back to the first step, and the individuals, with their high degree of dispersion, evolve towards the new solution.

CONCLUSIONS

In this paper we have presented the concept of evolvable hardware and shown a practical implementation of a hardware genetic algorithm and of a reconfigurable hardware structure.

The hardware genetic algorithm increases the convergence speed towards solutions that represent configurations for the reconfigurable circuit. It can be used for the evolvable synthesis of digital circuits in intrinsic evolvable hardware: the bit-string solutions produced by the genetic algorithm serve as the connection configurations of a dynamic reconfigurable hardware circuit.

We have presented three architectures of reconfigurable circuits that can be dynamically programmed by the same hardware genetic algorithm module. The structure was implemented on a Xilinx Spartan-3 FPGA.

FUTURE TRENDS

Several directions of research follow from this paper. The first is the design of reconfigurable circuits using FPGA primitives. The new generation of FPGAs (Virtex-5) allows dynamic reconfiguration using primitives; in this case, the generic gate is replaced by physical cells of the FPGA. Another direction is the implementation of a hybrid neuro-genetic structure. A hardware-implemented neural network can be used to store the best solutions from the genetic algorithm; this configuration can improve the convergence of the genetic algorithm. Evolved hardware can also be used to design analog circuits. In this case, the Boolean reconfigurable circuit is replaced by an analog reconfigurable circuit (such as a Field Programmable Transistor Array).

REFERENCES

Ali, B., Almaini, A. and Kalganova, T. (2004), Evolutionary Algorithms and Their Use in the Design of Sequential Logic Circuits, Genetic Programming and Evolvable Machines, Vol. 5, pp. 11-29, Kluwer Academic Publishers.

Bland, I.M. and Megson, G.M. (2001), Systolic Array Library for Hardware Genetic Algorithms, Parallel, Emergent and Distributed Architectures Laboratory, Department of Computer Science, University of Reading.

Coello, C.A., Van Veldhuizen, D.A. and Lamont, G.B. (2002), Evolutionary Algorithms for Solving Multi-Objective Problems, Kluwer Academic Publishers, New York.

Goldberg, D.E., Kargupta, H., Horn, J. and Cantu-Paz, E. (1995), Critical Deme Size for Serial and Parallel Genetic Algorithms, IlliGAL, University of Illinois, Jan. 1995.

Iana, G.V., Serban, G., Angelescu, P., Ionescu, L. and Mazare, A. (2006), Aspects on Sigma-Delta Modulators Implementation in Hardware Structures, Advances in Intelligent Systems and Technologies, Proceedings ECIT2006, European Conference on Intelligent Systems and Technologies, Iasi.

Ionescu, L., Serban, G., Ionescu, V., Anghelescu, P. and Iana, G. (2004), Implementation of GAs in Reconfigurable Gates Network, Third European Conference on Intelligent Systems and Technologies ECIT 2004, ISBN 973-7994-78-7.

Koza, J.R., Bennett III, F.H., Hutchings, J.L., Bade, S.L., Keane, M.A. and Andre, D. (1997), Evolving Sorting Networks Using Genetic Programming and the Rapidly Reconfigurable Xilinx 6216 Field Programmable Gate Array, Proc. 31st Asilomar Conf. Signals, Systems, and Computers, IEEE Press, New York.

Martin, P. (2001), A Hardware Implementation of a Genetic Programming System Using FPGAs and Handel-C, Genetic Programming and Evolvable Machines, Vol. 2, No. 4, pp. 317-343.

Miller, J., Job, D. and Vassiliev, V. (2000), Principles in the Evolutionary Design of Digital Circuits, Parts 1 and 2, Genetic Programming and Evolvable Machines, Vol. 1, pp. 7-35 and pp. 259-288, Kluwer Academic Publishers.

Scott, S. (1997), Tech. Rep. 97-001, Dept. of Computer Science and Engineering, University of Nebraska-Lincoln, 4 July 1997.

Shaaban, N., Hasegawa, S. and Suzuki, A. (2001), Improvement of Energy Characteristics of CdZnTe Semiconductor Detectors, Genetic Programming and Evolvable Machines, Vol. 2, No. 3, pp. 289-299, Kluwer Academic Publishers.

Sharabi, S. and Sipper, M. (2006), GP-Sumo: Using Genetic Programming to Evolve Sumobots, Genetic Programming and Evolvable Machines, Vol. 7, pp. 211-230, Springer Science+Business Media.

Thompson, A. and Layzell, P. (1999), Analysis of Unconventional Evolved Electronics, Communications of the ACM, 42(4), pp. 71-79.

Yasunaga, M., Kim, J. and Yoshihara, I. (2001), Evolvable Reasoning Hardware: Its Prototyping and Performance Evaluation, Genetic Programming and Evolvable Machines, Vol. 2, pp. 211-230, Kluwer Academic Publishers.

Zhao, S. and Jiao, L. (2006), Multi-Objective Evolutionary Design and Knowledge Discovery of Logic Circuits Based on an Adaptive Genetic Algorithm, Genetic Programming and Evolvable Machines, Vol. 7, pp. 195-210, Springer Science+Business Media.

KEY TERMS

Evolvable Hardware: A reconfigurable circuit that is programmed by an evolutionary algorithm such as a GA. In extrinsic evolvable hardware, the evolutionary algorithm runs on a host station (e.g. a PC) outside the reconfigurable circuit. In intrinsic evolvable hardware, the evolutionary algorithm runs inside the same system as the reconfigurable circuit (even on the same chip).

Genetic Algorithm (GA): A genetic algorithm (or GA) is a search technique used in computing to find true or approximate solutions to optimization and search problems. Genetic algorithms are categorized as a stochastic local search technique.
are a particular class of evolutionary algorithms that
Publishers, 2000.
use techniques inspired by evolutionary biology such
Scott D., Seth S. and A. Samal, A hardware engine as inheritance, mutation, selection, and crossover (also
for genetic algorithms, Technical Report UNL-CSE- called recombination).Individuals, with coding schema,
are initial random values. All individuals are, in the first
616
Evolved Synthesis of Digital Circuits
step of algorithm evaluated to get the fitness value. In Microstructure: Integration of structure in same
the next step, they are sorted by fitness and selected chip. Evolvable hardware microstructure is an intrinsic E
for genetic operators. The parents are the individuals evolvable hardware with all modules in same chip.
involved in genetic operators like crossover or muta-
Phenotype: Describe one of the traits of an indi-
tion. The offspring resulted are evaluated together
vidual that is measurable and that is expressed in only
with parents and the algorithm resume with the first
a subset of the individuals within that population. In
step. The loop is repeated until the solution is find or
evolvable hardware phenotype consist in the circuit
the number of generation reach the limit given by the
coded by an individual.
programmer.
Reconfigurable Circuit: Hardware structure con-
Genotype: Describe the genetic constitution of
sist in logical cell network which allow configuration
an individual, that is the specific allelic makeup of an
of the interconnections between cells
individual. In evolvable hardware it consist in a vector
of configuration bits.
HGA: Hardware genetic algorithm is a hardware
implementation of genetic algorithm. Hardware imple-
mentation increases the performance of the algorithm
by replacing serial software modules with parallel
hardware.
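The generational loop described in the Genetic Algorithms entry above can be sketched in a few lines of Python. This is an illustrative one-max example: the bit-string encoding matches the entry, but the fitness function, the parameter values, and all names are placeholder assumptions, not the hardware GA discussed in this article.

```python
import random

def evolve(fitness, n_bits=16, pop_size=20, generations=50, p_mut=0.05,
           target=None, seed=0):
    """Minimal generational GA over bit strings (illustrative sketch only)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # evaluate and sort by fitness
        if target is not None and fitness(pop[0]) >= target:
            break                             # a solution has been found
        parents = pop[: pop_size // 2]        # select the fittest half
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            a, b = rng.sample(parents, 2)     # one-point crossover of two parents
            cut = rng.randrange(1, n_bits)
            child = [bit ^ 1 if rng.random() < p_mut else bit
                     for bit in a[:cut] + b[cut:]]   # point mutation
            offspring.append(child)
        pop = parents + offspring             # offspring compete with parents
    return max(pop, key=fitness)

best = evolve(sum, target=16)  # one-max: the fitness is the number of 1 bits
```

With the parents kept in the population, the loop is elitist: the best fitness found so far never decreases between generations.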
Evolving Graphs for ANN Development and Simplification
David Periscal
University of A Coruña, Spain
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
of connection between two nodes (Alba, Aldana & Troya, 1993).
In comparison with direct encoding, there are some indirect encoding methods. In these methods, only some characteristics of the architecture are encoded in the chromosome. These methods have several types of representation.
Firstly, the parametric representations describe the network as a group of parameters such as the number of hidden layers, the number of nodes of each layer, the number of connections between two layers, etc. (Harp, Samad & Guha, 1989). Another indirect representation type is based on a representation system that uses grammatical rules (Kitano, 1990), shaped as production rules that make a matrix that represents the network.
Another type of encoding is the growing methods. In this case, the genotype contains a group of instructions for building up the network (Nolfi & Parisi, 2002).
All of these methods evolve architectures, either alone (most commonly) or together with the weights. The transfer function for every node of the architecture is supposed to have been previously fixed by a human expert and is the same for all the nodes of the network or, at least, for all the nodes of the same layer. Only a few methods that also induce the evolution of the transfer function have been developed (Hwang, Choi & Park, 1997).

ANN DEVELOPMENT WITH GENETIC PROGRAMMING

This section very briefly shows an example of how to develop ANNs using an AI tool, Genetic Programming (GP), which performs an evolutionary algorithm, and how it can be applied to Data Mining tasks.

Genetic Programming

GP (Koza, 1992) is based on the evolution of a given population. Its working is similar to a GA. In this population, every individual represents a solution for a problem that is intended to be solved. The evolution is achieved by means of the selection of the best individuals (although the worst ones also have a small chance of being selected) and their mutual combination for creating new solutions. After several generations, the population is expected to contain some good solutions for the problem.
The GP encoding for the solutions is tree-shaped, so the user must specify which are the terminals (leaves of the tree) and the functions (nodes capable of having descendants) to be used by the evolutionary algorithm in order to build complex expressions.
The wide application of GP to various environments and its consequent success are due to its capability of being adapted to many different problems. Although the main and most direct application is the generation of mathematical expressions (Rivero, Rabuñal, Dorado & Pazos, 2005), GP has also been used in other fields such as filter design (Rabuñal, Dorado, Puertas, Pazos, Santos & Rivero, 2003), knowledge extraction, image processing (Rivero, Rabuñal, Dorado & Pazos, 2004), etc.

Model Overview

This work will use a graph-based codification to represent ANNs in the genotype. These graphs will not contain any cycles. Due to this type of codification, the genetic operators had to be changed in order to be able to use the GP algorithm. The operators were changed in this way:

The creation algorithm must allow the creation of graphs. This means that, at the moment of the creation of a node's child, this algorithm must allow not only the creation of this node, but also a link to an existing one in the same graph, without making cycles inside the graph.
The crossover algorithm must allow the crossing of graphs. This algorithm works very similarly to the existing one for trees, i.e. a node is chosen on each individual to exchange the whole subgraph it represents with the other individual. Special care has to be taken with graphs, because before the crossover there may be links from outside this subgraph to nodes on it. In this case, after the crossover these links are updated and changed to point to random nodes in the new subgraph.
The mutation algorithm has been changed too, and also works very similarly to the GP tree-based mutation algorithm. A node is chosen from the individual and its subgraph is deleted and replaced with a new one. Before the mutation occurs, there may be nodes in the individual pointing to other nodes in the subgraph. These links are updated
and made to point to random nodes in the new subgraph.

These algorithms must also follow two restrictions in GP: typing and maximum height. The GP typing property (Montana, 1995) means that each node will have a type and will also specify which type each of its children will have. This property provides the ability to develop structures that follow a specific grammar.
In order to be able to use GP to develop any kind of system, it is necessary to specify the set of operators that will be in the tree. With them, the evolutionary system must be able to build correct trees that represent ANNs. An overview of the operators used can be seen in Table 1.
This table shows a summary of the operators that can be used in the tree. This set of terminals and functions is used to build a tree that represents an ANN.
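The link-update rule shared by the crossover and mutation operators (links arriving from outside the replaced subgraph are redirected to random nodes of the new subgraph) can be sketched as follows. The adjacency-list representation and the function name are assumptions for illustration, not the authors' implementation.

```python
import random

def repair_links(children, removed, replacement, rng=random):
    """Redirect links that point into a removed subgraph.

    children: dict mapping each node id to the list of node ids it links to.
    removed: set of node ids of the subgraph that was cut out.
    replacement: non-empty list of node ids of the subgraph inserted instead.
    """
    for node, outgoing in children.items():
        if node in removed:
            continue  # links inside the removed subgraph disappear with it
        for i, target in enumerate(outgoing):
            if target in removed:
                # redirect the dangling link to a random node of the new subgraph
                outgoing[i] = rng.choice(replacement)
    return children
```

Because the graphs are acyclic and the replacement subgraph is generated without links back to the rest of the individual, this repair step cannot introduce cycles.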
Figure 1. Example of a GP graph representing an ANN, built from the 1-Input to 4-Input and 2-Neuron / 3-Neuron operators together with arithmetic operators (+, -, %) and weight values, over the inputs x1 to x4.
Although these sets are not explained in the text, Fig. 1 shows an example of how they can be used to represent an ANN.
These operators are used to build GP trees. These trees have to be evaluated and, once the tree has been evaluated, the genotype turns into the phenotype. In other words, it is converted into an ANN with its weights already set (thus it does not need to be trained) and therefore can be evaluated. The evolutionary process demands the assignation of a fitness value to every genotype. Such value is the result of the evaluation of the network with the pattern set that represents the problem. This result is the Mean Square Error (MSE) of the difference between the network outputs and the desired outputs. Nevertheless, this value has been modified in order to induce the system to generate simple networks. The modification has been made by adding a penalization value multiplied by the number of neurons of the network. In this way, and given that the evolutionary system has been designed in order to minimise an error value, a larger network has a worse fitness value. Therefore, simple networks are preferred, as the penalization value that is added is proportional to the number of neurons of the ANN. The calculation of the final fitness is as follows:

fitness = MSE + N * P

where N is the number of neurons of the network and P is the penalization value for such number.

Example of Applications

This technique has been used for solving problems of different complexity taken from the UCI repository (Mertz & Murphy, 2002). All these problems are knowledge-extraction problems from databases where, taking certain features as a basis, it is intended to perform a prediction about another attribute of the database. A small description of the problems to be solved can be seen in Table 2, along with other ANN parameters used later in this work.
All these databases have been normalised between 0 and 1 and divided into two parts, taking 70% of the database for training and using the remaining 30% for performing tests.

Results and Comparison with Other Methods

Several experiments have been performed in order to evaluate the system performance. The values taken for the parameters in these experiments were the following:

Population size: 1000 individuals.
Crossover rate: 95%.
Mutation probability: 4%.
Selection algorithm: 2-individual tournament.
Graph maximum height: 5.
Maximum inputs for each neuron: 9.
Penalization value: 0.00001.

To arrive at these values, several experiments had to be done in order to obtain parameter values that would return good results on all of the problems. These problems are very different in complexity, so it is expected that these parameters give good results on many different problems.
In order to evaluate its performance, the system presented here has been compared with other ANN generation and training methods.
The 5x2cv method was used by Cantú-Paz and Kamath (2005) for the comparison of different ANN generation and training techniques based on EC tools. This work presents as results the average precisions obtained in the 10 test results generated by this method. Such values are the basis for the comparison of the technique described here with other well-known ones, described in detail by Cantú-Paz and Kamath (2005). Such work shows the average times needed to achieve the results. Not having the same processor that was used, the computational effort needed for achieving the results can be estimated. This effort represents the number of times that the pattern file was evaluated. The computational effort for every technique can be measured using the population size, the number of generations, the number of times that the BP algorithm was applied, etc. This calculation varies for every algorithm used. All the techniques that are compared with this work are related to the use of evolutionary algorithms for ANN design. Five iterations of a two-fold cross-validation test were performed for all these techniques in order to evaluate the accuracy of the networks. These techniques are connectivity matrix, pruning, parameter search and graph-rewriting grammar.
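The penalized fitness formula above (fitness = MSE + N * P) can be computed directly. This is a minimal sketch assuming the outputs and desired outputs are given as flat lists of numbers; the function and parameter names are illustrative.

```python
def fitness(outputs, desired, n_neurons, penalty=0.00001):
    """fitness = MSE + N * P, as defined in the text.

    outputs, desired: network outputs and desired outputs over the pattern set.
    n_neurons: number of neurons of the network (N).
    penalty: penalization value (P); 0.00001 is the value used in the experiments.
    """
    mse = sum((o - d) ** 2 for o, d in zip(outputs, desired)) / len(outputs)
    return mse + n_neurons * penalty
```

Since the system minimises this value, a perfect network with more neurons still scores worse than a perfect smaller one, which is what drives the search toward simple networks.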
Table 2 shows a summary of the number of neurons used by Cantú-Paz and Kamath (2005) in order to solve the problems with the connectivity matrix and pruning techniques. The epoch number of the BP algorithm, when used, is also indicated here.
Table 3 shows the parameter configuration used by these techniques. The execution was stopped after 5 generations with no improvement or after 50 total generations.
The results obtained with these 4 methods are shown in Table 4. Every box of the table indicates 3 different values: the precision value obtained by Cantú-Paz and Kamath (2005) (left), the computational effort needed for obtaining such value with that technique (below) and
the precision value obtained with the technique described here for the previously mentioned computational effort value (right).
Looking at this table, it is clear that the results obtained with the method proposed here are not only similar to the ones presented by Cantú-Paz and Kamath (2005), but better in most of the cases. The reason for this lies in the fact that those methods need a high computational load, since training is necessary for every network (individual) evaluation, which therefore turns out to be time-consuming. In the work described here, the procedures for design and training are performed simultaneously, and therefore the times needed for designing as well as for evaluating the network are combined.

1. The graph evolution algorithm explained in this work performs the evolution of the architectures.
2. The GA takes those architectures and optimizes the weights of the connections.

With this architecture, the evolution of ANNs can be seen as a Lamarckian strategy.

CONCLUSION

This work describes a technique in which an evolutionary algorithm is used to automatically develop ANNs. This evolutionary algorithm performs graph evolution, and it is based on the GP algorithm, although it had to be modified in order to make it operate with graphs instead of trees.
Results show that the networks returned by this algorithm give, in most of the cases, an error lower than the error given by the rest of the ANN development systems used for the comparison. Only one technique (pruning) performs better than the one described here. However, that technique still needs some work from the expert to design the initial network.
Most of the techniques used for ANN development are quite costly, due in some cases to the combination of training with architecture evolution. The technique described here is able to achieve good results with a low computational cost and, besides, has the added advantage that not only the architecture and the connectivity of the network are evolved, but the network itself also undergoes an optimization process.

ACKNOWLEDGMENT

REFERENCES

Alba E., Aldana J.F. & Troya J.M. (1993). Fully automatic ANN design: A genetic approach. Proc. Int. Workshop Artificial Neural Networks (IWANN'93), Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 686, 399-404.

Cantú-Paz E. & Kamath C. (2005). An Empirical Comparison of Combinations of Evolutionary Algorithms and Neural Networks for Classification Problems. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics. 915-927.

Greenwood G.W. (1997). Training partially recurrent neural networks using evolutionary strategies. IEEE Trans. Speech Audio Processing, 5, 192-194.

Harp S.A., Samad T. & Guha A. (1989). Toward the genetic synthesis of neural networks. Proc. 3rd Int. Conf. Genetic Algorithms and Their Applications, J.D. Schafer, Ed. San Mateo, CA: Morgan Kaufmann. 360-369.
Haykin, S. (1999). Neural Networks (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.

Hwang M.W., Choi J.Y. & Park J. (1997). Evolutionary projection neural networks. Proc. 1997 IEEE Int. Conf. Evolutionary Computation, ICEC'97. 667-671.

Jung-Hwan Kim, Sung-Soon Choi & Byung-Ro Moon (2005). Normalization for neural network in genetic search. Genetic and Evolutionary Computation Conference. 1-10.

Kitano H. (1990). Designing neural networks using genetic algorithms with graph generation system. Complex Systems, 4, 461-476.

Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.

Mertz C.J. & Murphy P.M. (2002). UCI repository of machine learning databases. http://www-old.ics.uci.edu/pub/machine-learning-databases.

Montana D.J. (1995). Strongly typed genetic programming. Evolutionary Computation, 3(2), 199-230.

Nolfi S. & Parisi D. (2002). Evolution of Artificial Neural Networks. Handbook of Brain Theory and Neural Networks, Second Edition. Cambridge, MA: MIT Press. 418-421.

Rabuñal J.R., Dorado J., Puertas J., Pazos A., Santos A. & Rivero D. (2003). Prediction and Modelling of the Rainfall-Runoff Transformation of a Typical Urban Basin using ANN and GP. Applied Artificial Intelligence.

Rivero D., Rabuñal J.R., Dorado J. & Pazos A. (2004). Using Genetic Programming for Character Discrimination in Damaged Documents. Applications of Evolutionary Computing, EvoWorkshops2004: EvoBIO, EvoCOMNET, EvoHOT, EvoIASP, EvoMUSART, EvoSTOC. 349-358.

Rivero D., Rabuñal J.R., Dorado J. & Pazos A. (2005). Time Series Forecast with Anticipation using Genetic Programming. IWANN 2005. 968-975.

Rumelhart D.E., Hinton G.E. & Williams R.J. (1986). Learning internal representations by error propagation. Parallel Distributed Processing: Explorations in the Microstructures of Cognition. D.E. Rumelhart & J.L. McClelland, Eds. Cambridge, MA: MIT Press. 1, 318-362.

KEY TERMS

Artificial Neural Networks: An interconnected set of many simple processing units, commonly called neurons, that use a mathematical model representing an input/output relation.

Back-Propagation Algorithm: Supervised learning technique used by ANNs that iteratively modifies the weights of the connections of the network so that the error given by the network after the comparison of the outputs with the desired ones decreases.

Evolutionary Computation: Set of Artificial Intelligence techniques used in optimization problems, which are inspired by biologic mechanisms such as natural evolution.

Genetic Programming: Machine learning technique that uses an evolutionary algorithm in order to optimise a population of computer programs according to a fitness function which determines the capability of a program for performing a given task.

Genotype: The representation of an individual as an entire collection of genes, to which the crossover and mutation operators are applied.

Phenotype: Expression of the properties coded by the individual's genotype.

Population: Pool of individuals exhibiting equal or similar genome structures, which allows the application of genetic operators.

Search Space: Set of all possible situations that the problem we want to solve could ever be in.
Facial Expression Recognition for HCI Applications
Bogdan Raducanu
Computer Vision Center, Spain
the incoming warped frame and the current appearance of the face. Since the facial actions, encoded by the vector a, are highly correlated with the facial expressions, their time-series representation can be utilized for inferring the facial expression in videos. This will be explained in the sequel.

EFFICIENT FACIAL EXPRESSION DETECTION AND RECOGNITION

In (Dornaika & Raducanu, 2006), we have proposed a facial expression recognition method that is based on the time-series representation of the tracked facial actions a. An analysis-synthesis scheme based on learned auto-regressive models was proposed. In this paper, we introduce a process able to detect keyframes in videos. Once a keyframe is detected, the temporal recognition scheme described in (Dornaika & Raducanu, 2006) will be invoked on the detected keyframe. The proposed scheme has two advantages. First, the CPU time corresponding to the recognition part will be considerably reduced since only a few keyframes are considered. Second, since a keyframe and its neighboring frames characterize the expression, the discrimination performance of the recognition scheme will be boosted. In our case, the keyframes are defined as the frames where the facial actions change abruptly. Thus, a keyframe can be detected by looking for a local positive maximum in the temporal derivatives of the facial actions. To this end, two entities will be computed from the sequence of facial actions a that arrive in a sequential fashion: (i) the L1 norm ||a||_1, and (ii) the temporal derivative given by:
D_t = ∂||a||_1/∂t = Σ_{i=1}^{F} ∂a(i)/∂t        (2)

In the above equation, we have used the fact that the facial actions are positive. Let W be the size of a temporal segment defining the temporal granulometry of the system. In other words, the system will detect and recognize at most one expression every W frames. In practice, W belongs to [0.5 s, 1 s]. The whole scheme is depicted in Figure 1.
In this figure, we can see that the system has three levels: the tracking level, the keyframe detection level, and the recognition level. The tracker provides the facial actions for every frame. Whenever the current video segment size reaches W frames, the keyframe detection is invoked to select a keyframe in the current segment, if any. A given frame is considered as a keyframe if it meets three conditions: (1) the corresponding D_t is a positive local maximum (within the segment), (2) the corresponding norm ||a||_1 is greater than a predefined threshold, and (3) it is far from the previous keyframe by at least W frames. Once a keyframe is found in the current segment, the dynamical classifier described in (Dornaika & Raducanu, 2006) will be invoked.
Figure 2 shows the results of applying the proposed detection scheme on a 1600-frame sequence containing 23 played expressions. Some images are shown in Figure 4. The solid curve corresponds to the norm ||a||_1, the dotted curve to the derivative D_t, and the vertical bars correspond to the detected keyframes. In this example, the value of W is set to 30 frames. As can be seen, out of 1600 frames only 23 keyframes will be processed by the expression classifier.

EXPERIMENTAL RESULTS

Recognition results: We used a 300-frame video sequence. For this sequence, we asked a subject to display several expressions arbitrarily (see Figure 3). The middle of this figure shows the normalized similarities associated with each universal expression, where the recognition is performed for every frame in the sequence. As can be seen, the temporal classifier (Dornaika & Raducanu, 2006) has correctly detected the presence of the surprise, joy, and sadness expressions. Note that the mixture of expressions at transitions is normal since the recognition is performed in a framewise manner. The lower part of this figure shows the results of applying the proposed keyframe detection scheme. On a 3.2 GHz PC, a non-optimized C implementation of the developed approach carries out the tracking and recognition in about 60 ms.
Performance study: In order to quantify the recognition rate, we used 35 test videos retrieved from the CMU database. Table 1 shows the confusion matrix associated with the 35 test videos featuring 7 persons. As can be seen, although the recognition rate was good (80%), it is not equal to 100%. This can be explained by the fact that the expression dynamics are highly subject-dependent. Recall that the auto-regressive models used are built using data associated with one subject. Notice that the human ceiling in correctly classifying facial expressions into the six basic emotions has been established at 91.7%.
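The three keyframe conditions, applied once per W-frame segment, can be sketched as follows. The backward-difference estimate of D_t, the threshold value, and all names are illustrative assumptions, not the authors' implementation.

```python
def detect_keyframes(actions, window=30, norm_threshold=0.5):
    """Pick at most one keyframe per window of facial-action vectors.

    actions: list of facial-action vectors a (non-negative components).
    window: segment size W in frames.
    norm_threshold: minimum L1 norm ||a||_1 required for a keyframe.
    """
    norms = [sum(a) for a in actions]                            # ||a||_1 (a is positive)
    deriv = [0.0] + [n1 - n0 for n0, n1 in zip(norms, norms[1:])]  # D_t per frame
    keyframes, last = [], -window
    for start in range(0, len(actions), window):
        end = min(start + window, len(actions) - 1)
        # condition (1): D_t is a positive local maximum within the segment
        candidates = [t for t in range(max(start, 1), end)
                      if deriv[t] > 0
                      and deriv[t] >= deriv[t - 1] and deriv[t] >= deriv[t + 1]]
        if not candidates:
            continue
        t = max(candidates, key=lambda i: deriv[i])
        # condition (2): norm above threshold; (3): at least W frames since last
        if norms[t] > norm_threshold and t - last >= window:
            keyframes.append(t)
            last = t
    return keyframes
```

With W = 30, a sequence processed this way yields at most one keyframe per 30-frame segment, matching the behavior reported for the 1600-frame example.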
Table 1. Confusion matrix for the facial expression classifier associated with 35 test videos (CMU data). The model is built using one unseen person. Rows: recognized expression; columns: displayed expression.

            Surprise (7)  Sadness (7)  Joy (7)  Disgust (7)  Anger (7)
Surprise         7             0          0          0           0
Sadness          0             7          0          5           0
Joy              0             0          7          0           0
Disgust          0             0          0          2           2
Anger            0             0          0          0           5
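The 80% recognition rate quoted in the text follows from Table 1: it is the fraction of the 35 test videos that fall on the matrix diagonal.

```python
# Rows: recognized expression; columns: displayed expression (7 videos per class).
confusion = [
    [7, 0, 0, 0, 0],  # Surprise
    [0, 7, 0, 5, 0],  # Sadness
    [0, 0, 7, 0, 0],  # Joy
    [0, 0, 0, 2, 2],  # Disgust
    [0, 0, 0, 0, 5],  # Anger
]
correct = sum(confusion[i][i] for i in range(5))   # 7 + 7 + 7 + 2 + 5 = 28
total = sum(sum(row) for row in confusion)         # 35 videos in total
rate = correct / total                             # 28 / 35 = 0.8
```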
Figure 3. Top: Four frames (50, 110, 150, and 250) associated with a 300-frame test sequence. Middle: The similarity measure computed for each universal expression and for each non-neutral frame of the sequence (the framewise recognition). Bottom: The recognition based on keyframe detection.

HUMAN-MACHINE INTERACTION

Interpreting non-verbal face gestures is used in a wide range of applications. An intelligent user interface should not only interpret the face movements but also interpret the user's emotional state (Breazeal, 2002). Knowing the emotional state of the user makes machines communicate and interact with humans in a natural way: intelligent entertaining systems for kids, interactive computers, intelligent sensors, and social robots, to mention a few. In the sequel, we will show how our proposed technique lends itself nicely to such applications. Without loss of generality, we use the AIBO robot, which has the advantage of being especially designed for Human Computer Interaction. The input to the system is a video stream capturing the user's face.
The AIBO robot: AIBO is a biologically-inspired robot that is able to show its emotions through an array of LEDs situated in the frontal part of the head. In addition to the LED configuration, the robot response
Figure 4. Top: Some detected keyframes associated with the 1600-frame video. Middle: The recognized expression. Bottom: The corresponding robot's response.

contains some small head and body movements. From its concept design, AIBO's affective states are triggered by the Emotion Generator engine. This occurs as a response to its internal state representation, captured through multi-modal interaction (vision, audio and touch). For instance, it can display the happiness feeling when it detects a face (through the vision system) or when it hears a voice. But it does not possess a built-in system for vision-based automatic facial-expression recognition. For this reason, with the scheme proposed in this paper (see Section 3), we created an application for AIBO whose purpose is to endow it with this capability.
This application is a very simple one, in which the robot is just imitating the expression of a human subject. Usually, the response of the robot occurs slightly after the apex of the human expression. The results of this application were recorded in a 2-minute video which can be downloaded from the following address: http://www.cvc.uab.es/~bogdan/AIBO-emotions.avi. In order to be able to display simultaneously in the video the correspondence between the subject's and the robot's expressions, we put them side by side.
Figure 4 illustrates five detected keyframes from the 1600-frame video depicted in Figure 2. These are shown in correspondence with the robot's response. The middle row shows the recognized expression. The bottom row shows a snapshot of the robot head when it interacts with the detected and recognized expression.

CONCLUSION

This paper described a view- and texture-independent approach to facial expression analysis and recognition. The paper presented two contributions. First, we proposed an efficient facial expression recognition scheme based on the detection of keyframes in videos. Second, we applied the proposed method in a Human Computer Interaction scenario, in which an AIBO robot mirrors the user's recognized facial expression.

ACKNOWLEDGMENT

This work has been partially supported by MCYT Grant TIN2006-15308-C02, Ministerio de Educación y Ciencia, Spain. Bogdan Raducanu is supported by the Ramón y Cajal research program, Ministerio de Educación y Ciencia, Spain. The authors thank Dr. Franck Davoine from CNRS, Compiègne, France, for providing the video sequence shown in Figure 4.

REFERENCES

Ahlberg, J. (2001). CANDIDE-3: An Updated Parameterized Face. Technical Report LiTH-ISY-R-2326, Dept. of Electrical Engineering, Linköping University, Sweden.
Bartlett, M., Littleworth, G., Lainscsek, C., Fasel I. & International Conference on Pattern Recognition, Vol.
Movellan, J. (2004). Machine Learning Methods for I, pp. 275-278, Hong-Kong.
Fully Automatic Recognition of Facial Expressions and
Tian, Y., Kanade T. & Cohn, J. (2001). Recognizing
Facial Actions. Proc. of IEEE Conference on Systems,
Action Units for Facial Expression Analysis. IEEE
Man and Cybernetics, Vol. I, The Hague, The Nether-
Transactions on Pattern Analysis and Machine Intel-
lands, pp.592-597.
ligence, Vol. 23, pp. 97-115.
Black, M.J. & Yacoob, Y. (1997). Recognizing facial
Yeasin M., Bullot, B. & Sharma, R. (2006). Recognition
expressions in image sequences using local parameter-
of Facial Expressions and Measurement of Levels of
ized models of image motion. International Journal of
Interest from Video. IEEE Transactions on Multimedia
Computer Vision, 25(1):23-48.
8(3):500-508.
Breazeal, C. & Scassellati, B. (2002). Robots that
Zhang, Y. & Ji, Q. (2005). Active and Dynamic Infor-
Imitate Humans. Trends in Cognitive Science, Vol. 6,
mation Fusion for Facial Expression Understanding
pp. 481-487.
from Image Sequences. IEEE Transactions on Pattern
Caamero, L. & Gaussier, P. (2005). Emotion Under- Analysis and Machine Intelligence, 27(5):699-714.
standing: Robots as Tools and Models. In Emotional
Development: Recent Research Advances, pp. 235-
258.
KEY TERMS

3D Deformable Model: A model which is able to modify its shape while being acted upon by an external influence. In consequence, the relative position of any point on a deformable body can change.

Active Appearance Models (AAM): A Computer Vision algorithm for matching a statistical model of object shape and appearance to a new image. The approach is widely used for matching and tracking faces.

AIBO: One of several types of robotic pets designed and manufactured by Sony. Able to walk, see their environment via camera, and recognize spoken commands, they are considered to be autonomous robots, since they are able to learn and mature based on external stimuli from their owner or environment, or from other AIBOs.

Autoregressive Models: A group of linear prediction formulas that attempt to predict the output of a system based on the previous outputs and inputs.

Facial Expression Recognition System: A computer-driven application for automatically identifying a person's facial expression from a digital still or video image. It does so by comparing selected facial features in the live image against a facial database.

Hidden Markov Model (HMM): A statistical model in which the system being modeled is assumed to be
Facial Expression Recognition for HCI Applications
a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters. The extracted model parameters can then be used to perform further analysis, for example for pattern recognition applications.

Human-Computer Interaction (HCI): The study of interaction between people (users) and computers. It is an interdisciplinary subject, relating computer science with many other fields of study and research (Artificial Intelligence, Psychology, Computer Graphics, Design).

Social Robot: An autonomous robot that interacts and communicates with humans by following the social rules attached to its role. This definition implies that a social robot has a physical embodiment. A consequence of the previous statements is that a robot that only interacts and communicates with other robots would not be considered to be a social robot.

Wireframe Model: The representation of all surfaces of a three-dimensional object in outline form.
Feature Selection
Noelia Sánchez-Maroño
University of A Coruña, Spain
Amparo Alonso-Betanzos
University of A Coruña, Spain
INTRODUCTION

Many scientific disciplines use modelling and simulation processes and techniques in order to implement non-linear mappings between the input and the output variables for a given system under study. Any variable that helps to solve the problem may be considered as input. Ideally, any classifier or regressor should be able to detect important features and discard irrelevant features, and consequently, a pre-processing step to reduce dimensionality should not be necessary. Nonetheless, in many cases, reducing the dimensionality of a problem has certain advantages (Alpaydin, 2004; Guyon & Elisseeff, 2003), as follows:

• Performance improvement. The complexity of most learning algorithms depends on the number of samples and features (curse of dimensionality). By reducing the number of features, dimensionality is also decreased, and this may save on computational resources, such as memory and time, and shorten training and testing times.
• Data compression. There is no need to retrieve and store a feature that is not required.
• Data comprehension. Dimensionality reduction facilitates the comprehension and visualisation of data.
• Simplicity. Simpler models tend to be more robust when small datasets are used.

There are two main methods for reducing dimensionality: feature extraction and feature selection. In this chapter we present a review of different feature selection (FS) algorithms, including their main approaches: filter, wrapper and hybrid (a filter/wrapper combination).

BACKGROUND

Feature extraction and feature selection are the main methods for reducing dimensionality. In feature extraction, the aim is to find a new set of r dimensions that are a combination of the n original ones. The best known and most widely used unsupervised feature extraction method is principal component analysis (PCA); commonly used supervised methods are linear discriminant analysis (LDA) and partial least squares (PLS).

In feature selection, a subset of r relevant features is selected from a set of n, and the remaining features are ignored. As for the evaluation function used, FS approaches can be mainly classified as filter or wrapper models (Kohavi & John, 1997). Filter models rely on the general characteristics of the training data to select features, whereas wrapper models require a predetermined learning algorithm to identify the features to be selected. Wrapper models tend to give better results, but when the number of features is large, filter models are usually chosen because of their computational efficiency. In order to combine the advantages of both models, hybrid algorithms have recently been proposed (Guyon et al., 2006).

FEATURE SELECTION

The advantages described in the Introduction section denote the importance of dimensionality reduction. Feature selection is also useful when the following assumptions are made:

• There are inputs that are not required to obtain the output.
• There is a high correlation between some of the input features.
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
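To make the feature-extraction route concrete, the following sketch (our own illustration in NumPy, not part of the chapter) builds r new dimensions as linear combinations of the n original features via PCA, the unsupervised method named above:

```python
import numpy as np

def pca_extract(X, r):
    """Feature extraction with PCA: return r new features, each a linear
    combination of the n original ones (scores on the top r components)."""
    Xc = X - X.mean(axis=0)                       # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:r].T                          # project onto the top r directions

# toy data: 3 original features, the third an exact copy of the first,
# so r = 2 extracted dimensions lose no information
rng = np.random.default_rng(0)
X2 = rng.normal(size=(100, 2))
X = np.column_stack([X2, X2[:, 0]])
Z = pca_extract(X, r=2)
```

Because the toy data has rank 2, the two extracted components retain the full variance of the three redundant original features.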
A feature selection algorithm (FSA) looks for an optimal set of features, and consequently a paradigm that describes the FSA is heuristic search. Since each state of the search space is a subset of features, an FSA can be characterised in terms of the following four properties (Blum & Langley, 1997):

• The initial state. This can be the empty set of features, the whole set or any random state.
• The search strategy. Although an exhaustive search leads to an optimal set of features, the associated computational and time costs are high when the number of features is high. Consequently, different search strategies are used so as to identify a good set of features within a reasonable time.
• The evaluation function used to determine the quality of each set of features. The goodness of a feature subset depends on the measure employed. According to the literature, the following measures have been used: information measures, distance measures, dependence measures, consistency measures, and accuracy measures.
• The stop criterion. An end point needs to be established; for example, the process should finish if the evaluation function has not improved after a new feature has been added/removed.

In terms of search method complexity, there are three main sub-groups (Salappa et al., 2007):

• Exponential strategies, involving an exhaustive search of all feasible solutions. Exhaustive search guarantees identification of an optimal feature subset but has a high computational cost. Examples are the branch and bound algorithms.
• Sequential strategies, based on a local search for solutions defined by the current solution state. Sequential search does not guarantee an optimal result, since the optimal solution could be in a region of the search space that is not searched. However, compared with exponential searching, sequential strategies have a considerably reduced computational cost. The best known strategies are sequential forward selection and sequential backward selection (SFS and SBS, respectively). SFS starts with an empty set of features and adds features one by one, while SBS begins with a full set and removes features one by one. Features are added or removed on the basis of improvements in the evaluation function. These approaches do not consider interactions between features; i.e., a feature may not reduce the error by itself, but an improvement may be achieved when it is combined with other features. Floating search (Pudil et al., 1994) partially solves this problem, in that the number of features included and/or removed at each stage is not fixed. Another approach (Sánchez-Maroño et al., 2006) uses sensitivity indices (the importance of each feature is given in terms of the variance) to guide a backward elimination process, with several features discarded in one step.
• Random algorithms, which employ randomness to avoid locally optimal solutions and enable temporary transitions to other states with poorer solutions. Examples are simulated annealing and genetic algorithms.

The most popular FSA classification, which refers to the evaluation function, considers the following three (Blum & Langley, 1997) or the last two (Kohavi & John, 1997) groups:

• Embedded methods. The induction algorithm is simultaneously an FSA. Examples of this method are decision trees, such as classification and regression trees (CART), and artificial neural networks (ANN).
• Filter methods. Selection is carried out as a pre-processing step with no induction algorithm (Figure 1). The general characteristics of the training data are used to select features (for example, distances between classes or statistical dependencies). This model is faster than the wrapper approach (described below) and results in a better generalisation because it acts independently of the induction algorithm. However, it tends to select subsets with a high number of features (even all the features) and so a threshold is required to choose a subset.
• Wrapper methods. Wrapper models use the induction algorithm to evaluate each subset of features, i.e., the induction algorithm is part of the evaluation function in the wrapper model, which means this model is more precise than the filter model. It also takes account of techniques, such as cross-validation, that avoid over-fitting. However, wrapper models are very time consuming, which
Figure 1. The filter model: feature selection yields a reduced set of features, which the induction algorithm then uses to measure accuracy.
restricts their application with some datasets. Moreover, although they may obtain good results with the inherent induction algorithm, they may perform poorly with an alternative algorithm.

Hybrid methods, which combine filter and wrapper methods, have recently been attracting a great deal of attention in the FS literature (Liu & Motoda, 1998; Guyon et al., 2006). Although the following sections of this chapter are mainly devoted to filter and wrapper methods, a brief review of the most recent hybrid methods is also included.

Filter Methods

A number of representative filter algorithms are described in the literature, such as the chi-squared statistic, information gain, or correlation-based feature selection (CFS). For the sake of completeness, we will refer to two classical algorithms (FOCUS and RELIEF) and will describe very recently developed filter methods (FCBF and INTERACT). An exhaustive discussion of filter methods is provided in Guyon et al. (2006), including methods such as Random Forests (RF), an ensemble of tree classifiers.

FOCUS

… However, using both algorithms in domains with a large number of features may be computationally unfeasible. Consequently, search heuristics are used in different versions of the algorithm, resulting in good but not necessarily optimal solutions.

RELIEF

The RELIEF algorithm (Kira & Rendell, 1992) estimates the quality of attributes according to how well their values distinguish between instances that are near to each other. For this purpose, given a randomly selected instance, xs = {x1s, x2s, …, xns}, RELIEF searches for its two nearest neighbours: one from the same class, called the nearest hit H, and the other from a different class, called the nearest miss M. It then updates the quality estimate for all the features, depending on the values for xs, M, and H. RELIEF can deal with discrete and continuous features but is limited to two-class problems. An extension, ReliefF, not only deals with multiclass problems but is also more robust and capable of dealing with incomplete and noisy data. ReliefF was subsequently adapted for continuous-class (regression) problems, resulting in the RReliefF algorithm (Robnik-Sikonja & Kononenko, 2003).
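The RELIEF update just described can be sketched in a few lines (a minimal two-class version of ours with Euclidean neighbours and a squared-difference update; all names are our own, not the original formulation):

```python
import numpy as np

def relief(X, y, n_iter=100, rng=None):
    """Estimate feature quality: reward features that separate an instance
    from its nearest miss M and penalise those that separate it from its
    nearest hit H (two-class problems only, as in the original RELIEF)."""
    rng = rng or np.random.default_rng(0)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iter):
        s = rng.integers(n_samples)                 # randomly selected instance xs
        d = np.linalg.norm(X - X[s], axis=1)
        d[s] = np.inf                               # exclude the instance itself
        hit = np.argmin(np.where(y == y[s], d, np.inf))   # nearest hit H
        miss = np.argmin(np.where(y != y[s], d, np.inf))  # nearest miss M
        w += (X[s] - X[miss]) ** 2 - (X[s] - X[hit]) ** 2
    return w / n_iter

# toy problem: feature 0 determines the class, feature 1 is pure noise
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = np.column_stack([y + 0.05 * rng.normal(size=100),
                     rng.normal(size=100)])
w = relief(X, y, n_iter=50)
```

On this toy problem the informative feature accumulates a much larger quality estimate than the noise feature, which is exactly the behaviour the algorithm exploits.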
FCBF and INTERACT

… This method was designed for high-dimensionality data and has been shown to be effective in removing both irrelevant and redundant features. However, it fails to take into consideration the interaction between features. The INTERACT algorithm (Zhao & Liu, 2007) uses the same goodness measure, SU, but also includes the consistency contribution (c-contribution). It can thus handle feature interaction, and efficiently selects relevant features.

Wrapper Methods

The idea of the wrapper approach is to select a feature subset using a learning algorithm as part of the evaluation function (Figure 2). Instead of using subset sufficiency, entropy or another explicitly defined evaluation function, a kind of black-box function is used to guide the search. The evaluation function for each candidate feature subset returns an estimate of the quality of the model that is induced by the learning algorithm. This can be rather time consuming since, for each candidate feature subset evaluated during the search, the target learning algorithm is usually applied several times (e.g., when 10-fold cross-validation is used to estimate model quality). Here we briefly describe several feature subset selection algorithms, developed in machine learning, that are based on the wrapper approach. The literature is vast in this area and so we will just focus on the most representative wrapper models.

An interesting study of the wrapper approach was conducted by Kohavi & John (1997). Besides introducing the notion of strong and weak feature relevance, these authors showed the results achieved by different induction algorithms (ID3, C4.5, and naive Bayes) with several search methods (best first, hill-climbing, etc.). Aha & Bankert (1995) used a wrapper approach in instance-based learning and proposed a new search strategy that performs beam search using a kind of backward elimination; that is, instead of starting with an empty feature subset, the search randomly selects a fixed number of feature subsets and starts with the best among them. Caruana & Freitag (1994) developed a wrapper feature subset selection method for decision tree induction, proposing bidirectional hill-climbing for the feature space as more effective than either forward or backward selection. Genetic algorithms have been broadly adopted to perform the search for the best subset of features in a wrapper way (Liu & Motoda, 1998; Huang et al., 2007). The feature selection
Figure 2. The wrapper model: each candidate set of features is evaluated through the hypothesis induced by the induction algorithm.
methods using support vector machines (SVMs) have obtained satisfactory results (Weston et al., 2001). SVMs are also combined with other techniques to implement feature selection (different approaches are described in Guyon et al., 2006). Kim et al. (2003) use artificial neural networks (ANNs) for customer prediction and ELSA (Evolutionary Local Selection Algorithm) to search for promising subsets of features.

Hybrid Methods

Whereas the computational cost associated with the wrapper model makes it unfeasible when the number of features is high, when the filter model is used its performance is less than satisfactory. The hybrid model is a good combination of the two approaches that overcomes these problems. Hybrid methods use a filter to generate a ranked list of features. On the basis of the order thus defined, nested subsets of features are generated and evaluated by a learning machine, i.e., following a wrapper approach (Guyon et al., 2006). The main features of the hybrid model are depicted in Figure 3. One of the first hybrid approaches proposed was that of Yuan et al. (1999). Since then, the hybrid model has attracted the attention of the research community and, by now, numerous hybrid models have been developed to solve a variety of problems, such as intrusion detection, text categorisation, etc.

As a combination of filter and wrapper models, there exist a great number of hybrid methods, so it is not possible to include all of them and therefore we will refer to some interesting ones. Some hybrid methods involving SVMs are presented in Guyon et al. (2006), chapters 20 and 22. Shazzad & Park (2005) investigate a fast hybrid method, a fusion of correlation-based feature selection, a support vector machine and a genetic algorithm, to determine an optimal feature set. A feature selection model based both on information theory and statistical tests is presented by Sebban & Nock (2002). Zhu et al. (2007) incorporate a filter ranking method in a genetic algorithm to improve classification performance and accelerate the search process.

FUTURE TRENDS

Feature selection is a huge topic that is impossible to discuss fully in a short chapter. To pinpoint new topics in this area we refer the reader to the suggestions given by Guyon et al. (2006), summarised as follows:

• Unsupervised variable selection. Although this chapter has focused on supervised feature selection, several authors have attempted to implement feature selection for clustering applications (see, for example, Dy & Brodley, 2004). For supervised learning tasks, one may want to pre-filter a set of the most significant variables with respect to a criterion which does not make use of y, to minimise the problem of over-fitting.
Figure 3. The hybrid model: a filter stage followed by a wrapper built around the induction algorithm.
• Selection of examples. Mislabelled examples may induce a choice of wrong variables, so it may be preferable to jointly select both variables and examples.
• System reverse engineering. This chapter focuses on the problem of selecting features useful to build a good predictor. Unravelling the causal dependencies between variables and reverse engineering the system that produced the data is a far more challenging task that is beyond the scope of this chapter (but see, for example, Pearl, 2000).

CONCLUSION

Feature selection for classification and regression is a major research topic in machine learning. It covers many different fields, such as, for example, text categorisation, intrusion detection, and micro-array data. This study reviews key algorithms used for feature selection, including filter, wrapper and hybrid approaches. The review is not exhaustive and is merely designed to give an idea of the state of the art in the field. Most feature selection algorithms lead to significant reductions in the dimensionality of the data without sacrificing the performance of the resulting models. Choosing between approaches depends on the problem in hand. Adopting a filtering approach is computationally acceptable, but the more complex wrapper approach tends to produce greater accuracy in the final result. The filtering approach is very flexible, since any target learning algorithm can be used. It is also faster than the wrapper approach. The latter, on the other hand, is more dependent on the learning algorithm, but the selection process is better. The hybrid approach offers promise in terms of improving results, both in classification accuracy and in the identification of relevant attributes for the analysis.

REFERENCES

Aha, D.W. & Bankert, R.L. (1995). A comparative evaluation of sequential feature selection algorithms. Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, 1-7. Springer-Verlag.

Almuallim, H. & Dietterich, T.G. (1991). Learning with many irrelevant features. Proceedings of the 9th National Conference on Artificial Intelligence, 547-552. AAAI Press.

Almuallim, H. & Dietterich, T.G. (1992). Efficient algorithms for identifying relevant features. Proceedings of the 9th Canadian Conference on Artificial Intelligence, 38-45, Vancouver.

Alpaydin, E. (2004). Introduction to Machine Learning. MIT Press.

Blum, A.L. & Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, (97) 1-2, 245-271.

Caruana, R. & Freitag, D. (1994). Greedy attribute selection. Proceedings of the Eleventh International Conference on Machine Learning, 28-36. Morgan Kaufmann Publishers, Inc.

Dy, J.G. & Brodley, C.E. (2004). Feature selection for unsupervised learning. Journal of Machine Learning Research, (5), 845-889.

Guyon, I. & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, (3), 1157-1182.

Guyon, I., Gunn, S., Nikravesh, M. & Zadeh, L.A. (2006). Feature Extraction. Foundations and Applications. Springer.

Huang, J., Cai, Y. & Xu, X. (2007). A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recognition Letters, (28) 13, 1825-1844.

Kim, Y., Street, W.N. & Menczer, F. (2003). Feature selection in data mining. Data Mining: Opportunities and Challenges, 80-105. IGI Publishing.

Kira, K. & Rendell, L. (1992). The feature selection problem: traditional methods and a new algorithm. Proc. AAAI-92, San Jose, CA.

Kohavi, R. & John, G. (1997). Wrappers for feature subset selection. Artificial Intelligence, (97) 1-2, 273-324.

Liu, H. & Motoda, H. (1998). Feature Extraction, Construction and Selection. A Data Mining Perspective. Kluwer Academic Publishers.
Pearl, J. (2000). Causality. Cambridge University Press.

Pudil, P., Novovicova, J. & Kittler, J. (1994). Floating search methods in feature selection. Pattern Recognition Letters, (15) 11, 1119-1125.

Robnik-Sikonja, M. & Kononenko, I. (2003). Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, (53), 23-69. Kluwer Academic Publishers.

Salappa, A., Doumpos, M. & Zopounidis, C. (2007). Feature selection algorithms in classification problems: an experimental evaluation. Optimization Methods and Software, (22) 1, 199-212.

Sánchez-Maroño, N., Caamaño-Fernández, M., Castillo, E. & Alonso-Betanzos, A. (2006). Functional networks and analysis of variance for feature selection. Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning, 1031-1038.

Sebban, M. & Nock, R. (2002). A hybrid filter/wrapper approach of feature selection using information theory. Pattern Recognition, (35) 4, 835-846.

Shazzad, K.M. & Park, J.S. (2005). Optimization of intrusion detection through fast hybrid feature selection. International Conference on Parallel and Distributed Computing, Applications and Technologies, 264-267.

Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T. & Vapnik, V. (2001). Feature selection for SVMs. Advances in Neural Information Processing Systems, (13). MIT Press.

Yu, L. & Liu, H. (2003). Feature selection for high-dimensional data: a fast correlation-based filter solution. Proceedings of the Twentieth International Conference on Machine Learning, 856-863.

Yuan, H., Tseng, S.S., Gangshan, S. & Fuyan, Z. (1999). Two-phase feature selection method using both filter and wrapper. Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, (2), 132-136.

Zhao, Z. & Liu, H. (2007). Searching for interacting features. Proceedings of the International Joint Conference on Artificial Intelligence, 1157-1161.

Zhu, Z., Ong, Y. & Dash, M. (2007). Wrapper-filter feature selection algorithm using a memetic framework. IEEE Transactions on Systems, Man and Cybernetics, Part B, (37) 1, 70-76.

KEY TERMS

Dimensionality Reduction: The process of reducing the number of features under consideration. The process can be classified in terms of feature selection and feature extraction.

Feature Extraction: A dimensionality reduction method that finds a reduced set of features that are a combination of the original ones.

Feature Selection: A dimensionality reduction method that consists of selecting a subset of relevant features from a complete set while ignoring the remaining features.

Filter Method: A feature selection method that relies on the general characteristics of the training data to select and discard features. Different measures can be employed: distance between classes, entropy, etc.

Hybrid Method: A feature selection method that combines the advantages of wrapper and filter methods to deal with high-dimensionality data.

Sequential Backward (Forward) Selection (SBS/SFS): A search method that starts with all the features (an empty set of features) and removes (adds) a single feature at each step with a view to improving, or minimally degrading, the cost function.

Wrapper Method: A feature selection method that uses a learning machine as a black box to score subsets of features according to their predictive value.
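As a closing illustration of these terms, the following sketch (entirely our own: the mean-difference filter and nearest-centroid inducer are stand-ins for any filter measure and induction algorithm) chains a filter ranking with a wrapper evaluation of the nested subsets, i.e., the hybrid scheme described in this chapter:

```python
import numpy as np

def cv_accuracy(X, y, subset, k=5):
    """Wrapper evaluation: k-fold cross-validated accuracy of a
    nearest-centroid classifier induced on the candidate subset."""
    idx = np.arange(len(y))
    accs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        classes = np.unique(y[train])
        cent = np.array([X[np.ix_(train[y[train] == c], subset)].mean(axis=0)
                         for c in classes])
        d = ((X[np.ix_(fold, subset)][:, None, :] - cent) ** 2).sum(axis=2)
        accs.append(np.mean(classes[np.argmin(d, axis=1)] == y[fold]))
    return float(np.mean(accs))

def hybrid_select(X, y):
    """Hybrid FS: a cheap filter ranks the features, then the wrapper
    scores only the nested subsets defined by that ranking."""
    score = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))  # filter step
    ranking = np.argsort(score)[::-1]
    best, best_acc = None, -1.0
    for r in range(1, X.shape[1] + 1):            # wrapper step on nested subsets
        subset = [int(j) for j in ranking[:r]]
        acc = cv_accuracy(X, y, subset)
        if acc > best_acc:
            best, best_acc = subset, acc
    return best, best_acc

# toy data: only feature 0 carries class information
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = np.column_stack([y + 0.1 * rng.normal(size=60),
                     rng.normal(size=60), rng.normal(size=60)])
subset, acc = hybrid_select(X, y)
```

Note the design point made in the chapter: the wrapper only evaluates the few nested subsets produced by the filter ranking, rather than searching the full subset lattice.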
"The answer to the theoretical question: Can a machine be built capable of doing what the brain does? is yes, provided you specify in a finite and unambiguous way what the brain does." — Warren S. McCulloch

INTRODUCTION

The class of adaptive systems known as Artificial Neural Networks (ANN) was motivated by the amazing parallel processing capabilities of biological brains (especially the human brain). The main driving force was to re-create these abilities by constructing artificial models of the biological neuron. The power of biological neural structures stems from the enormous number of highly interconnected simple units. The simplicity comes from the fact that, once the complex electro-chemical processes are abstracted, the resulting computation turns out to be conceptually very simple.

These artificial neurons nowadays have little in common with their biological counterpart in the ANN paradigm. Rather, they are primarily used as computational devices, clearly intended for problem solving: optimization, function approximation, classification, time-series prediction and others. In practice few elements are connected and their connectivity is low. This chapter is focused on supervised feed-forward networks. The field has become so vast that a complete and clear-cut description of all the approaches is an enormous undertaking; we refer the reader to (Fiesler & Beale, 1997) for a comprehensive exposition.

Each element is a simple processor with internal and adjustable parameters. The interest in ANN is primarily related to the finding of satisfactory solutions for problems cast as function approximation tasks for which there is scarce or null knowledge about the process itself, but (limited) access to examples of response. They have been widely and most fruitfully used in a variety of applications (see Fiesler & Beale, 1997, for a comprehensive review), especially after the boosting works of Hopfield (1982), Rumelhart, Hinton & Williams (1986) and Fukushima (1980).

The most general form for an ANN is a labelled directed graph, where each of the nodes (called units or neurons) has a certain computing ability and is connected to and from other nodes in the network via labelled edges. The edge label is a real number expressing the strength with which the two involved units are connected. These labels are called weights. The architecture of a network refers to the number of units, their arrangement and connectivity.

In its basic form, the computation of a unit i is expressed as a function Fi of its input (the transfer function), parameterized with its weight vector or local information. The whole system is thus a collection of interconnected elements, and the transfer function performed by a single one (i.e., the neuron model) is the most important fixed characteristic of the system. There are two basic types of neuron models in the literature used in practice. Both express the overall computation of the unit as the composition of two functions, as is classically done since the early model proposal of McCulloch & Pitts (1943):
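This two-function composition, Fi(x) = g(h(x, wi)), can be sketched in a few lines (our own NumPy illustration, with the two classical choices of h described below: a scalar product with sigmoid activation, and a distance to a centre with Gaussian activation; all names are ours):

```python
import numpy as np

def unit(x, w, h, g):
    """Generic neuron model: the composition of two functions, g(h(x, w))."""
    return g(h(x, w))

# choice 1 (projective, MLP-style): scalar product + sigmoid activation
dot = lambda x, w: np.dot(x, w)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# choice 2 (localized, RBF-style): distance to a centre + Gaussian activation
dist = lambda x, w: np.linalg.norm(x - w)        # smoothing term sigma = 1
gauss = lambda z: np.exp(-z ** 2 / 2)

w = np.array([1.0, -1.0])
p_out = unit(np.zeros(2), w, dot, sigmoid)   # projective response on the boundary
r_out = unit(w, w, dist, gauss)              # localized: maximal at the centre
```

The localized unit responds significantly only near its centre w, while the projective unit responds along a whole half-space, which is the geometric contrast shown in Figure 1.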
Feed-Forward Artificial Neural Network Basics
Figure 1. A classification problem. Left: Separation by spherical RBF units (R-neurons). Right: Separation by
straight lines (P-neurons) in the MLP.
When neurons of this type are arranged in a feed-forward architecture, the obtained neural network is called a MultiLayer Perceptron (MLP) (Rumelhart, Hinton & Williams, 1986). Usually, a smooth, non-linear and monotonic function is used as activation. Among them, the sigmoids are a preferred choice.

The choice h(x, wi) = ||x − wi|| / σ (or another distance measure), with σ > 0, σ ∈ R, a smoothing term, plus an activation g with a monotonically decreasing response from the origin, leads to the wide family of localized Radial Basis Function networks (RBF) (Poggio & Girosi, 1989). Localized means that the units give a significant response only in a neighbourhood of their centre wi. A Gaussian g(z) = exp(−z²/2) is a preferred choice for the activation function.

The previous choices can be extended to take into account extra correlations between input variables. The inner product (containing no cross-product terms) can be generalized to a real quadratic form (a homogeneous polynomial of second degree with real coefficients) or even further to higher degrees, leading to the so-called higher-order units (or sigma-pi units). A higher-order unit of degree k includes all possible cross-products of at most k input variables, each with its own weight. Conversely, basic Euclidean distances can be generalized to completely weighted distance measures, where all the (quadratic) cross-products are included. These full expressions are not commonly used because of the high numbers of free parameters they involve.

These two basic neuron models have traditionally … out make them apparently quite opposite methods (see Fig. 1). Mathematically, under certain conditions, they can be shown to be related (Dorffner, 1995). These conditions (basically, that both input and weight vectors are normalized to unit norm) are difficult to fulfil in practice.

A layer is defined as a collection of independent units (not connected with one another) sharing the same input, and of the same functional form (same Fi but different wi). Multilayer feed-forward networks take the form of directed acyclic graphs obtained by concatenation of a number of layers. All the layers but the last (called the output layer) are labelled as hidden. This kind of network (shown in Fig. 2) computes a parameterized function Fw(x) of its input vector x by evaluating the layers in order, giving as final outcome the output of the last layer. The vector w represents the collection of all the weights (free parameters) in the network. For simplicity, we are not considering connections between non-adjacent layers (skip-layer connections) and assume otherwise total connectivity. The set of input variables is not counted as a layer.

Output neurons take the form of a scalar product (a linear combination), eventually followed by an activation function g. For example, assuming a single output neuron, a one-hidden-layer neural network with h hidden units computes a function F: Rn → R of the form:

F(x) = g( Σ_{i=1..h} ci Fi(x, wi) + μ )
Figure 2. A two-hidden-layer example of ANN, mapping a three-dimensional input space x=(x1,x2,x3) to a two-
dimensional output space (y1,y2)=Fw(x). The network has four and three units in the first and second hidden F
layers, respectively, and two output neurons. The vector w represents the collection of all the weights in the
network.
where θ ∈ R is an offset term (called the bias term), c_i ∈ R, and g can be set as desired, including the choice g(z)=z. Such a feed-forward network has dim(w)=(n+1)h+h+1 parameters to be adjusted.

Definition (P-neuron). A neuron model F_i of the form:

F_i(x) = g(w_i · x + θ_i), with w_i ∈ R^n, θ_i ∈ R   (3)
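As a concrete illustration, here is a minimal sketch of the one-hidden-layer network described above, with P-neurons in the hidden layer; all names are hypothetical, and the logistic activation is assumed for g:

```python
import math
import random

def logistic(z, beta=1.0, theta=0.0):
    # g_log(z) = 1 / (1 + exp(-beta * (z - theta)))
    return 1.0 / (1.0 + math.exp(-beta * (z - theta)))

def one_hidden_layer_net(x, W, theta_hidden, c, theta_out, g=logistic):
    # Hidden layer: h P-neurons, each computing g(w_i . x + theta_i)
    hidden = [g(sum(wi * xi for wi, xi in zip(w, x)) + t)
              for w, t in zip(W, theta_hidden)]
    # Output: linear combination of hidden activations plus a bias term
    return g(sum(ci * hi for ci, hi in zip(c, hidden)) + theta_out)

n, h = 3, 4  # input dimension and number of hidden units
W = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(h)]
theta_hidden = [random.uniform(-1, 1) for _ in range(h)]
c = [random.uniform(-1, 1) for _ in range(h)]
theta_out = 0.1

# Free parameters: (n+1)h for the hidden layer, h+1 for the output neuron
n_params = (n + 1) * h + h + 1
print(n_params)  # 21 for n=3, h=4
```

Counting the parameters confirms the dim(w)=(n+1)h+h+1 formula: each hidden unit has n weights and one bias, and the output neuron has h weights and one bias.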
Figure 3. The logistic function l(z)=g_log_1.5(z) and its first derivative l'(z)>0. The derivative is maximum at the origin, corresponding to a medium activation l(0)=0.5. This point acts as an initial neutral value around a quasi-linear slope.
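The behaviour described in the caption can be checked numerically; a small sketch (with β=1.5 and θ=0, as in the figure):

```python
import math

def logistic(z, beta=1.5, theta=0.0):
    return 1.0 / (1.0 + math.exp(-beta * (z - theta)))

def logistic_deriv(z, beta=1.5, theta=0.0):
    # l'(z) = beta * l(z) * (1 - l(z)), which is > 0 everywhere
    l = logistic(z, beta, theta)
    return beta * l * (1.0 - l)

print(logistic(0.0))        # 0.5: medium activation at the origin
zs = [z / 10.0 for z in range(-50, 51)]
best = max(zs, key=logistic_deriv)
print(best)                 # 0.0: the derivative peaks at the origin
```

The strictly positive derivative, maximal at z=0, is exactly the property exploited by derivative-based learning algorithms.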
g_log_β(z) = 1 / (1 + exp(-β(z - θ))) ∈ (0,1)   (6)

and the hyperbolic tangent:

tanh(z) = (1 - exp(-2z)) / (1 + exp(-2z)) ∈ (-1,1)

which is the bipolar version of g_log_1(z) = 1 / (1 + exp(-z)). These functions are chosen because of their simple analytic behaviour, especially in what concerns differentiability, of great importance for learning algorithms relying on derivative information (Fletcher, 1980). In particular,
The interest in sigmoid functions also lies in the behaviour of their derivatives. Consider, for example, (6) with β=1.5 and θ=0, plotted in Fig. 3. The derivative of a sigmoid is always positive. For θ=0, all the functions are centred at z=0. At this point, the function has a medium activation, and its derivative is maximum, allowing for maximum weight updates.

Types of Artificial Neural Networks

A fundamental distinction used to categorize a neural network relies on the kind of architecture, basically divided into feed-forward (for which the graph contains no cycles) and recurrent (the remaining situations). A very common feed-forward architecture contains no intra-layer connections and all possible inter-layer connections between adjacent layers.

Definition (Feed-forward neural network: structure). A bipartitioned graph is a graph G whose nodes V can be partitioned in two disjoint and proper sets V1 and V2, V1 ∪ V2 = V, in such a way that no pair of nodes in V1 is joined by an edge, and the same property holds for V2. We write then G_{n1,n2}, with n1=|V1|, n2=|V2|. A bipartitioned graph G_{n1,n2} is complete if every node in V1 is connected to every node in V2. These concepts can be generalized to an arbitrary number of partitions, as follows: A k-partitioned graph G_{n1,...,nk} is a graph whose nodes V can be partitioned in k disjoint and proper sets V1,...,Vk, such that

∪_{i=1}^{k} V_i = V,

in Definitions 2 and 3, which are collectively grouped in the network parameters w. The first output is defined as y^(0) = x. For the last (output) layer, h_{c+1} = m and the F^{c+1}_l, 1 ≤ l ≤ h_{c+1}, are P-neurons or linear units (obtained by removing the activation function in a P-neuron). The final outcome for Fw(x) is the value of y^(c+1).

Definition (MLPNN). A MultiLayer Perceptron Neural Network is a FFNN (n,c,m) for which c ≥ 1 and all the F_l are P-neurons, 1 ≤ l ≤ c.

Definition (RBFNN). A Radial Basis Function Neural Network is a FFNN (n,c,m) for which c=1 and all the F^1_l are R-neurons.

LEARNING IN ARTIFICIAL NEURAL NETWORKS

A system can be said to learn if its performance on a given task improves with respect to some measure as a result of experience (Rosenblatt, 1962). In ANNs the experience is the result of exposure to a training set of data, accompanied with weight modifications. The main problem tackled in supervised learning is regression, the approximation of an n-dimensional function f: X ⊆ R^n -> R^m by finite superposition (composition and addition) of known parameterized base functions, like those in (3) or (4). Their combination gives rise to expressions of the form Fw(x). The interest is in finding a parameter vector w* of size s such that Fw*(x) optimizes a cost functional L(f, Fw) called the loss:
A common form for ℓ is an error function, as the squared-error ℓ(a,b) = (a-b)². This results from the assumption that the noise follows a homoscedastic Gaussian distribution with zero mean. When using this error, the expression (11) can be viewed as the (squared) Euclidean norm in R^p of the p-dimensional error vector e = (e1,...,ep), known as the sum-of-squares error, with e_i = y_i - Fw(x_i), as:

L̃(D, Fw) = Σ_{(x_i,y_i) ∈ D} (y_i - Fw(x_i))² = eᵀe = ||e||²   (12)

The usually reported quantity ||e||²/p is called mean square error (MSE), and is a measure of the empirical error (as opposed to the unknown true error). We shall denote the error function E(w) = L̃(D, Fw). In a training process, the network builds an internal representation of the target function by finding ways to combine the set of base functions {F_i(x)}_i. The validity of a solution is mainly determined by an acceptably low and balanced L̃(D, Fw) and L̃(D_out, Fw), for any D_out ⊆ X\D (where D_out is not used in the learning process), to ensure that f has been correctly estimated from the data. Network models too inflexible or simple or, on the contrary, too flexible or complex will generalize inadequately. This is reflected in the bias-variance tradeoff: the expected loss for finite samples can be decomposed in two opposing terms called error bias and error variance (Geman, Bienenstock & Doursat, 1992). The expectation for the sum-of-squares error function, averaged over the complete ensemble of data sets D, is written as (Bishop, 1995):

E(w) = E_D{(Fw(x) - <y|x>)²}
     = (E_D{Fw(x)} - <y|x>)² + E_D{(Fw(x) - E_D{Fw(x)})²}   (13)

where E_D is the expectation operator taken over every data set of the same size as D and <y|x> denotes the conditional average of the target y=f(x) (which expresses the optimal network mapping), given by:

<y|x> = ∫ y p(y|x) dy

The first term in the right hand side of (13) is the (squared) bias and the second is the variance. The bias measures the extent to which the average (over all D) of Fw(x) differs from the desired target function <y|x>. The variance measures the sensitivity of Fw(x) to the particular choice of D. Too inflexible or simple models will have a large bias, while too flexible or complex ones will have a large variance. These are complementary quantities that have to be minimized simultaneously; both can be shown to decrease with increasing availability of larger data sets D.

The expressions in (13) are functions of an input vector x. The average values for bias and variance can be obtained by weighting with the corresponding density p(x):

∫ E_D{(Fw(x) - <y|x>)²} p(x) dx = ∫ (E_D{Fw(x)} - <y|x>)² p(x) dx + ∫ E_D{(Fw(x) - E_D{Fw(x)})²} p(x) dx   (14)

Key conditions for acceptable performance on novel data are given by a training set D as large and representative as possible of the underlying distribution, and a set D_out of previously unseen data which should not contain examples exceedingly different from those in D. An important consideration is the use of a net with minimal complexity, given by the number of free parameters (the number of components in w). This requirement can be realized in various ways. In regularization theory, the solution is obtained from a variational principle including the loss and prior smoothness information, defining a smoothing functional such that lower values correspond to smoother functions. A solution of the approximation problem is then given by minimization of the functional (Girosi, Jones & Poggio, 1993):
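The bias-variance decomposition discussed above can be illustrated empirically by averaging a fitted model over many training sets D. Below is a toy sketch in which a deliberately inflexible constant model stands in for Fw; all names and the data-generating setup are hypothetical:

```python
import random

random.seed(0)

def target(x):            # f(x), the function to be learned
    return x * x

def make_dataset(n=20):   # noisy sample: y = f(x) + Gaussian noise
    xs = [random.uniform(-1, 1) for _ in range(n)]
    return [(x, target(x) + random.gauss(0.0, 0.1)) for x in xs]

def fit_constant(data):   # an inflexible model: F_w(x) = mean of the targets
    return sum(y for _, y in data) / len(data)

x0 = 0.8                  # evaluate bias and variance at one input point
preds = [fit_constant(make_dataset()) for _ in range(2000)]  # over many D
mean_pred = sum(preds) / len(preds)

bias_sq = (mean_pred - target(x0)) ** 2
variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
print(bias_sq, variance)  # large bias, small variance: the model is too simple
```

As the text predicts, the inflexible model shows a large (squared) bias and a small variance; a very flexible model would show the opposite behaviour.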
INTRODUCTION

Traditionally, Evolutionary Computation (EC) techniques, and more specifically Genetic Algorithms (GAs) (Goldberg & Wang, 1989), have proved to be efficient when solving various problems; however, as a possible shortcoming, GAs tend to provide a unique solution for the problem on which they are applied. Some non-global solutions discarded during the search for the best one could be acceptable under certain circumstances. The majority of real-world problems involve a search space with one or more global solutions and multiple local solutions; this means that they are multimodal problems (Harik, 1995) and therefore, if multiple solutions are to be obtained using GAs, it is necessary to modify their classic functioning scheme to adapt them correctly to the multimodality of such problems.

MOTIVATION

This chapter tries to establish the basis for the understanding of multimodality: firstly, a characterisation of multimodal problems is attempted; secondly, a global view of several of the approaches proposed for adapting the classic functioning of GAs to the search for multiple solutions is offered; lastly, the contributions of the authors are presented.

BACKGROUND: CHARACTERIZATION OF MULTIMODAL PROBLEMS

Multimodal problems can be briefly defined as those problems that have multiple global optima or multiple local optima.
For this type of problem, it is interesting to obtain the greatest number of solutions for several reasons: on one hand, when there is not a total knowledge of the
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Finding Multiple Solutions with GA in Multimodal Problems
problem, the solution obtained might not be the best one, as it cannot be stated that no better solution could be found in the part of the search space that has not been explored yet. On the other hand, even when it is certain that the best solution has been achieved, there might be other, equally fit or slightly worse solutions that might be preferred due to different factors (easier application, simpler interpretation, etc.) and therefore be considered globally better.

One of the most characteristic multimodal functions used in lab problems is the Rastrigin function (see Fig. 1), which offers an excellent graphical view of what multimodality means.

Providing multiple optimal (and valid) solutions, and not only the unique global solution, is crucial in many environments. Usually, the best solution is very complex to implement in practice, so it can present multiple problems: too high a computational cost, complex interpretation, etc. In these situations it turns out useful to have a range of valid solutions from which to choose one that, while not being the best solution to the raised problem, offers an acceptable level of adjustment and is simpler to implement or to understand than the ideal global one.

EVOLUTIONARY TECHNIQUES AND MULTIMODAL PROBLEMS

As has been mentioned, the application of EC techniques to the resolution of multimodal problems runs into the difficulty that this type of technique tends to provide solely the best of the found solutions and to discard possible local optima that might have been found throughout the search. Quite a few modifications have been introduced into the traditional functioning of the GA in order to achieve good results with multimodal problems.

A crucial aspect when obtaining multiple solutions consists of keeping the diversity of the genetic population, distributing the genetic individuals as much as possible throughout the search space.

CLASSICAL APPROACHES

Niching methods allow GAs to maintain a genetic population of diverse individuals, so it is possible to locate multiple optimal solutions within a single population.

In order to minimise the impact of homogenisation, or at least to ensure that it only affects later stages of the search phase, several alternatives have been designed, most of them based on heuristics. One of the first alternatives for promoting diversity was the application of scaling methods to the population in order to emphasize the differences among individuals. Another direct route for avoiding diversity loss involves focusing on the elimination of duplicate partial high-fitness solutions (Bersano, 1997) (Langdon, 1996).

Other approaches try to solve this problem by means of the dynamic variation of crossover and mutation rates (Ursem, 2002). When diversity decreases, more mutations are performed in order to increase exploration of the search space; when diversity increases, mutations decrease and crossovers increase with the aim of improving exploitation in the search for optimal solutions. There are also proposals of new genetic operators or variations of existing ones. For example, some of the crossover algorithms that improve diversity and should be highlighted are BLX (Blend Crossover) (Eshelman & Schaffer, 1993), SBX (Simulated Binary Crossover) (Deb & Agrawal, 1995), PCX (Parent Centric Crossover) (Deb, Anand & Joshi, 2002), CIXL2 (Confidence Interval Based Crossover using L2 Norm) (Ortiz, Hervás & García, 2005) and UNDX (Unimodal Normally Distributed Crossover) (Ono & Kobayashi, 1999).

Regarding replacement algorithms, schemes that keep population diversity have also been sought. An example of this type of scheme is crowding (DeJong, 1975) (Mengshoel & Goldberg, 1999). Here, a newly created individual is compared to a randomly chosen subset of the population, and the most similar individual is selected for replacement. Crowding techniques are inspired by Nature, where similar members of natural populations compete for limited resources. Likewise, dissimilar individuals tend to occupy different niches and are unlikely to compete for the same resource, so different solutions are provided.

Fitness sharing was first implemented by Goldberg & Richardson for use on multimodal functions (Goldberg & Richardson, 1987). The basic idea involves determining, from the fitness of each solution, the maximum number of individuals that can remain around it, rewarding the individuals that exploit unique areas of the domain. Dynamic fitness sharing (Miller & Shaw, 1995), with two components, was proposed in order to correct the dispersion of the final distribution of the individuals into niches: the distance function, which measures the overlapping of individuals, and the comparison function, which returns 1 if the individuals are identical and values closer to 0 the more different they are.

The clearing method (Petrowski, 1996) is quite different from the previous ones, as the resources are not shared, but assigned to the best individuals, who will then be kept at every niche.

The main inconvenience of the techniques previously described lies in the fact that they add new parameters that should be configured according to the process of execution of the GA. This process may be disturbed by the interactions among those parameters (Ballester & Carter, 2003).

OWN PROPOSALS

Once the existing problems are detected, they should be resolved or, at least, minimized. With this goal, the Artificial Neural Network and Adaptive Systems (RNASA) group has developed two proposals that use EC techniques for this type of problem. Both proposals try to find the final solution while keeping partial solutions within the final population.

The main ideas of the two proposals, together with the problems used for the tests, are explained in the following points.

Hybrid Two-Population Genetic Algorithm

Introduction

To force a homogeneous search throughout the search space, the approach proposed here is based on the addition of a new population (the genetic pool) to a traditional GA (the secondary population). The genetic pool divides the search space into sub-regions. Every one of the individuals of the genetic pool has its own fenced range for gene variation, so every one of these individuals represents a specific sub-region within the global search space. On the other hand, the set of individual ranges in which any gene may take its value extends over the whole of the possible values that a gene may have. Therefore, the genetic pool samples the whole of the search space.

It should be borne in mind that a traditional GA performs its search considering only one sub-region (the whole of the search space). Here the search space is divided into different sub-regions or intervals according to the number of genetic individuals in the genetic pool.

Since the individuals in the genetic pool have restrictions on their viable gene values, none of these individuals alone can provide a valid solution. Therefore, another population (the secondary population) is used in addition to the genetic pool. Here, a classical GA develops its individuals in an interactive fashion with the individuals of the genetic pool.

Unlike in the genetic pool, the genes of the individuals of the secondary population may adopt values throughout the
[Figure 2: individuals of the genetic pool (Ind_1 ... Ind_P), each with genes G restricted to a sub-range of width d/n of the valid gene-value range [0, d], and an individual of the secondary population (Ind_S), whose genes may take any value in [0, d].]
whole of the search space, so it will provide the solutions (since its individuals are allowed to vary along the whole search-space range), whereas the genetic pool acts as a support, keeping the search space homogeneously explored.

Next, both populations, which are graphically represented in Fig. 2, will be described in detail.

The Genetic Pool

As previously mentioned, every one of the individuals in the genetic pool represents a sub-region of the global search space. Therefore, they have the same structure or gene sequence as when using a traditional GA. The difference lies in the range of values that these genes might have.

When offering a solution, a traditional GA may use any valid value, whereas in the proposed GA the range of possible values is restricted. The total value range is divided into the same number of parts as there are individuals in the genetic pool, so that a sub-range of values is allotted to each individual. The values that a given gene may have will remain within its range for the whole of the execution of the proposed GA.

In addition, every individual in the genetic pool keeps track of which of its genes correspond to the best solution found so far (meaning whether they belong to the best individual of the secondary population). This Boolean value is used to avoid the modification of those genes that, in some given phase of execution, are part of the best solution to the problem.

Furthermore, every one of the genes in an individual has an associated I value, which indicates the relative increment that would be applied to the gene during a mutation operation based only on increments and solely applied to individuals of the genetic pool. Obviously, this incremental value has to be lower than the maximum range in which gene values may vary. The structure of the individuals in the genetic pool is shown in Fig. 3.

As these individuals do not represent global solutions to the problem to be solved, their fitness value will not be compulsory. This reduces the complexity of the algorithm and, of course, increases the computational efficiency of the final implementation.

The Secondary Population

The individuals of the secondary population are quite different from the previous ones. In this case, the genes of the individuals of the secondary population can take any value throughout the whole space of possible solutions. This allows all individuals of the secondary population to offer global solutions to the problem. This is not possible in the genetic pool, because their genes are restricted to different sub-ranges.

The evolution of the individuals will be carried out following traditional GA rules. The main difference lies in the crossover operator: in this case a modified crossover is used. Because the information is stored in isolated populations, the two parents who produce the new offspring will not belong to the same population; instead, the genetic pool and the secondary population are combined. In this way, information from both populations is merged to produce the fittest offspring.

The Crossover Operator

As pointed out before, the crossover operator recombines the genetic material of the individuals of both populations. This recombination involves a random individual from the secondary population and a representative of the genetic pool.

This representative will represent a potential solution offered by the genetic pool. As a unique individual cannot fulfil this requirement, the representative will
650
Finding Multiple Solutions with GA in Multimodal Problems
be formed by a subset of genes from different individuals of the genetic pool. Gathering information from different partial solutions allows a valid global solution to be produced.

Therefore, the value for every gene of the representative will be randomly chosen from among all the individuals in the genetic pool. After a value is assigned to all the genes, this new individual represents not a partial solution, unlike every one of the pool individuals separately, but a global one.

Now, the crossover operator is applied. This crossover function keeps the diversity of the secondary population, since the offspring will contain values from the genetic pool. Therefore, the genetic algorithm is able to maintain multiple solutions in the same population. The crossover operator does not change the genetic pool, because the latter only acts as an engine to keep the diversity.

This process is summarized in Fig. 4.

The Mutation Operator

The mutation operator increments the value of individual genes in the genetic pool. It introduces new information into the genetic pool, so the representative can use it and finally, by means of the crossover operator, introduce it into the secondary population.

It should be noted that the new value will have an upper limit; when it is reached, the new gene value is reset to the lower value.

As generations advance, the increment amount is reduced, so the increment applied to the individuals in the genetic pool takes lower values. The different increments between iterations are calculated taking into account the lower value for a gene (LIM_INF_IND), the upper value for that gene (LIM_SUP_IND) and the total number of individuals in the genetic pool (IND_POOL), as Fig. 5 summarizes. In this way, the first generations explore the search space briefly (a coarse-grain search), and a more exhaustive route through all the values that a given gene may have (a fine-grain search) is intended as the search process advances.

Genetic Algorithm with Division into Species

Another proposed solution is an adaptation of the niching technique. This adaptation consists of the division of the genetic population into different and independent subspecies. In this case, the criterion that assigns a specific individual to a concrete species is based on genotype similarities (similar genotypes will form isolated species). This classical concept has been provided with some improvements in order to not only decrease the number of iterations needed for obtaining solutions, but also increase the number of solutions kept within the genetic population.

Several iterations of the GA are executed on every species of the genetic population to speed up the convergence towards the solution that exists near every species. The individuals generated during this execution that have a genotype of a different species are discarded.

Crossover operations between the species are then applied similarly to what happens in biology. This means, on one hand, that crossovers between similar individuals are preferred (as was done at the previous step using GAs) and, on the other, that crossovers between different species are enabled, although at a lower rate.
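The representative-based crossover of the two-population proposal can be sketched as follows (a hypothetical data layout is assumed, with each pool individual owning one gene sub-range):

```python
import random

random.seed(1)
N_GENES, POOL_SIZE, D = 4, 5, 10.0

# Genetic pool: individual p may only hold gene values in its own sub-range
pool = [[random.uniform(p * D / POOL_SIZE, (p + 1) * D / POOL_SIZE)
         for _ in range(N_GENES)] for p in range(POOL_SIZE)]

def build_representative(pool):
    # Each gene of the representative is taken from a randomly chosen
    # pool individual, so partial solutions combine into a global one.
    return [random.choice(pool)[g] for g in range(N_GENES)]

def crossover(secondary_ind, pool):
    # Uniform crossover between a secondary-population individual and
    # the representative; the genetic pool itself is never modified.
    rep = build_representative(pool)
    return [random.choice((a, b)) for a, b in zip(secondary_ind, rep)]

secondary_ind = [random.uniform(0.0, D) for _ in range(N_GENES)]
child = crossover(secondary_ind, pool)
print(child)  # offspring mixing secondary-population and pool genes
```

The uniform gene-by-gene mixing here is an illustrative choice; the point is only that the offspring merges information from both populations while the pool stays untouched.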
IF (not Bi)
    Gi = Gi + Ii
    IF (Gi > LIM_SUP_GEN)
        Gi = LIM_INF_GEN
        Ii = Ii - Delta, with Delta = (LIM_SUP_IND - LIM_INF_IND) / IND_POOL
    ENDIF
ENDIF
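A runnable version of the increment-based mutation shown in Fig. 5 (the expression for Delta is reconstructed from the garbled figure and should be treated as an assumption):

```python
def increment_mutation(genes, increments, best_flags,
                       lim_inf_gen, lim_sup_gen,
                       lim_inf_ind, lim_sup_ind, ind_pool):
    # Delta: amount by which an increment shrinks once its gene wraps around
    delta = (lim_sup_ind - lim_inf_ind) / ind_pool
    for i in range(len(genes)):
        if best_flags[i]:           # genes of the best solution are frozen
            continue
        genes[i] += increments[i]
        if genes[i] > lim_sup_gen:  # wrap to the lower bound, shrink the step
            genes[i] = lim_inf_gen
            increments[i] -= delta
    return genes, increments

genes = [0.9, 0.5]
incs = [0.2, 0.2]
flags = [False, True]  # the second gene belongs to the best solution
genes, incs = increment_mutation(genes, incs, flags, 0.0, 1.0, 0.0, 1.0, 10)
print(genes, incs)  # first gene wrapped to 0.0, its increment reduced by 0.1
```

The Boolean flags implement the rule, stated earlier, that genes belonging to the current best solution are protected from mutation.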
The individuals generated after these crossovers can either be incorporated into an already existing species or, if they explore a new area of the search space, create a new species themselves.

Finally, the GA provides as many solutions as species remain active over the search space.

FUTURE TRENDS

Since there is no method that provides the best results in all possible situations, new approaches are to be developed.

New fitness functions would help to locate a greater number of valid solutions within the search space. In the approaches described, these functions remain constant over the method execution. Another option would be to allow dynamic fitness functions that vary along the execution stage. These kinds of functions would try to adapt their output with the knowledge extracted from the search space while the crossover and mutation operators explore new areas.

If different techniques offer acceptable solutions, another interesting approach consists of putting them together. For example, such hybrid models would integrate statistical methods (with a great mathematical background) with other heuristics.

CONCLUSION

This article shows an overview of the different methods related with evolutionary techniques used to address the problem of multimodality. It presented several approaches to provide not only a global solution, but multiple solutions to the same problem. This would help the final user to decide which of them is the most suitable in any particular case.

The final decision will depend on several factors, not only the global error reached by a particular method. Other factors include the economic impact, the difficulty of implementation, the quality of the knowledge provided for analysis, and so on.

REFERENCES

Ballester, P.J., & Carter, J.N. (2003). Real-parameter genetic algorithm for finding multiple optimal solutions in multimodal optimization. Proceedings of Genetic and Evolutionary Computation, pp. 706-717.

Bersano-Beguey, T. (1997). Controlling exploration, diversity and escaping from local optima in GP. Proceedings of Genetic Programming. MIT Press, Cambridge, MA.

Deb, K., & Agrawal, S. (1995). Simulated binary crossover for continuous search space. Complex Systems 9(2), pp. 115-148.

Deb, K., Anand, A., & Joshi, D. (2002). A computationally efficient evolutionary algorithm for real parameter optimization. KanGAL report: 2002003.

DeJong, K.A. (1975). An analysis of the behaviour of a class of genetic adaptive systems. PhD thesis, University of Michigan, Ann Arbor.

Eshelman, L.J., & Schaffer, J.D. (1994). Real coded genetic algorithms and interval schemata. Foundations of Genetic Algorithms (2), pp. 187-202.

Goldberg, D.E., & Richardson, J. (1987). Genetic algorithms with sharing for multimodal function optimization.
Domonkos Tikk
Budapest University of Technology and Economics, Hungary
INTRODUCTION

Current databases are able to store several terabytes of free-text documents. The main purpose of a database from the user's viewpoint is efficient information retrieval. In the case of textual data, information retrieval mostly concerns the selection and the ranking of documents. The selection criteria can contain elements that apply to the content or the grammar of the language. In traditional database management systems (DBMS), text manipulation is restricted to the usual string manipulation facilities, i.e. the exact matching of substrings. Although the new SQL:1999 standard enables the usage of more powerful regular expressions, this traditional approach has some major drawbacks. The traditional string-level operations are very costly for large documents, as they work without task-oriented index structures.

The required full-text management operations belong to text mining, an interdisciplinary field of natural language processing and data mining. As the traditional DBMS engine is inefficient for these operations, database management systems are usually extended with a special full-text search (FTS) engine module. We present here the particular solution of Oracle; there, for making full-text querying more efficient, a special engine was developed that performs the preparation of full-text queries and provides a set of language- and semantics-specific query operators.

BACKGROUND

Traditional DBMS engines are not adequate to meet the users' requirements on the management of free-text data, as they handle the whole text field as an atom (Codd, 1985). A special extension to the DBMS engine is needed for the efficient implementation of text-manipulating operations. There is a significant demand on the market for the usage of free text and text mining operations, since information is often stored as free text. Typical application areas are, e.g., text analysis in medical systems, analysis of customer feedback, and bibliographic databases. In these cases, a simple character-level string matching would retrieve only a fraction of related documents; thus an FTS engine is required that can identify the semantic similarities between terms.

There are several alternatives for implementing an FTS engine. In some DBMS products, such as Oracle, Microsoft SQL Server, Postgres, and MySQL, a built-in FTS engine module is implemented. Some other DBMS vendors extended the DBMS configuration with a DBMS-independent FTS engine. In this segment the main vendors are: SPSS LexiQuest (SPSS, 2007), SAS Text Miner (SAS, 2007), dtSearch (dtSearch, 2007), and Statistica Text Miner (Statsoft, 2007).

The market of FTS engines is very promising, since the amount of textual information stored in databases rises steadily. According to a study of Merrill Lynch (Blumberg & Atre, 2003), 85% of business information consists of text documents (e-mails, business and research reports, memos, presentations, advertisements, news, etc.) and their proportion still increases. In 2006, there were more than 20 billion documents available on the Internet (Chang, 2006). The estimated size of the pool increases to 550 billion documents when the documents of the hidden (or deep) web (which are, e.g., dynamically generated ones) are also considered.

TEXT MINING

The subfield of document management that aims at processing, searching, and analyzing text documents is text mining. The goal of text mining is to discover the non-trivial or hidden characteristics of individual documents or document collections. Text mining is an
Full-Text Search Engines for Databases
[Figure 1: general application schema of text mining, with operations such as clustering and summarization feeding decision support and knowledge discovery.]
application-oriented interdisciplinary field of machine learning which exploits tools and resources from computational linguistics, natural language processing, information retrieval, and data mining.

The general application schema of text mining is depicted in Figure 1 (Fan, Wallace, Rich & Zhang, 2006). To give a brief summary of text mining, four main areas are presented here: information extraction, text categorization/classification, document clustering, and summarization.

Information Extraction

The goal of information extraction (IE) is to collect the text fragments (facts, places, people, etc.) from documents relevant to the given application. The extracted information can be stored in structured databases. IE is typically applied in processes where statistics, analyses, summaries, etc. should be retrieved from texts. IE includes the following subtasks:

• named entity recognition: recognition of specified types of entities in free text, see e.g. Borthwick, 1999; Sibanda & Uzuner, 2006;
• co-reference resolution: identification of text fragments referring to the same entity, see e.g. Ponzetto & Strube, 2006;
• identification of roles and their relations: determination of roles defined in event templates, see e.g. Ruppenhofer et al, 2006.

Text Categorization

Text categorization (TC) techniques aim at sorting documents into a given category system (see Sebastiani, 2002 for a good survey). In TC, usually, a classifier model is built based on the content of a set of sample documents, and this model is then used to classify unseen documents. Typical application examples of TC include, among many others:

• document filtering, such as e.g. spam or newsfeed filtering (Lewis, 1995);
• patent document routing: determination of experts in the given fields (Larkey, 1999);
• assisted categorization: helping domain experts in manual categorization with valuable suggestions (Tikk et al, 2007);
• automatic metadata generation (Liddy et al, 2002).

Document Clustering

Document clustering (DC) methods group elements of a document collection based on their similarity. Here again, documents are usually clustered based on their content. Depending on the nature of the results, one can have partitioning and hierarchical clustering methods. In the former case, there is no explicit relation among the clusters, while in the latter case a hierarchy of clusters is created. DC is applied for, e.g.:

• clustering the results of (internet) search to help users locate information (Zamir et al, 1997);
• improving the speed of vector-space-based information retrieval (Manning et al, 2007);
• providing a navigation tool when browsing a document collection (Käki, 2005).
[Figure: query processing components of the FTS engine: indices, optimization, ranking, executor, output]
- keyword: exact matching;
- AND, OR, NOT: Boolean operators;
- NEAR(keyword1, keyword2): the keywords should occur at near positions in the document;
- BT(keyword): generalization of the keyword;

Oracle Text supports three methods for document partition (categorization & clustering). The manual categorization allows the user to enter keyword-category pairs. The automatic categorization works if a training set of document-category pairs is given. The cluster-
REFERENCES

Codd, E. F. (1985). Is Your DBMS Really Relational? (Codd's 12 rules). Computerworld Magazine.

Curtmola, E., Amer-Yahia, S., Brown, P., & Fernandez, M. (2005). GalaTex: A Conformant Implementation of the XQuery Full-Text Language. Proc. of WWW 2005 (pp. 1024-1025), Chiba, Japan.

dtSearch (2007). Text Retrieval / Full Text Search Engine, http://www.dtsearch.com

Eastman, C., & Jansen, B. (2003). Coverage, Relevance, and Ranking: The Impact of Query Operators on Web Search Engine Results. ACM Transactions on Information Systems, 21(4), 383-411.

Fan, W., Wallace, L., Rich, S., & Zhang, Z. (2006). Tapping the power of text mining. Communications of the ACM, 49(9), 76-82.

Ganapathiraju, M. K. (2002). Relevance of cluster size in MMR based summarizer. Technical Report 11-742, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA.

Käki, M. (2005). Findex: search result categories help users when document ranking fails. In CHI '05: Proc. of the SIGCHI Conference on Human Factors in Computing Systems (pp. 131-140), Portland, OR, USA.

Larkey, L. S. (1999). A patent search and classification system. In Proc. of DL-99, 4th ACM Conference on Digital Libraries (pp. 179-187), Berkeley, CA, USA.

Lewis, D. D. (1995). The TREC-4 filtering track: description and analysis. In Proc. of TREC-4, 4th Text Retrieval Conference (pp. 165-180), Gaithersburg, MD, USA.

Liddy, E. D., Sutton, S., Allen, E., Harwell, S., Corieri, S., Yilmazel, O., Ozgencil, N. E., Diekema, A., McCracken, N., & Silverstein, J. (2002). Automatic metadata generation and evaluation. In Proc. of ACM SIGIR (pp. 401-402), Tampere, Finland.

Luu, T., Klemm, F., Podnar, I., Rajman, M., & Aberer, K. (2006). ALVIS Peers: A Scalable Full-text Peer-to-Peer Retrieval Engine. Proc. of ACM P2PIR'06 (pp. 41-48), Arlington, VA, USA.

Maier, A., & Simmen, D. (2001). DB2 Optimization in Support of Full Text Search. IEEE Data Engineering Bulletin.

Manning, Ch. D., Raghavan, P., & Schütze, H. (2007). Introduction to Information Retrieval. Cambridge University Press.

Microsoft (2007). SQL Server Full Text Search Engine, http://technet.microsoft.com/en-us/library/ms345119.aspx

Oracle Text (2007). Oracle Text Product Description, http://www.oracle.com/technology/products/text/index.html

Ponzetto, S. P., & Strube, M. (2006). Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution. In Proc. of HLT-NAACL, Human Language Technology Conf. of the NAACL (pp. 192-199), New York, USA.

Radev, D., Blair-Goldensohn, S., & Zhang, Z. (2001). Experiments in single and multi-document summarization using MEAD. In Proc. of DUC-01, Document Understanding Conf., Workshop on Text Summarization, New Orleans, USA.

Ruppenhofer, J., Ellsworth, M., Petruck, M. R. L., Johnson, Ch. R., & Scheffczyk, J. (2006). FrameNet II: Extended Theory and Practice. International Computer Science Institute, Berkeley, USA.

SAS (2007). SAS Text Miner, http://www.sas.com/technologies/analytics/datamining/textminer/

Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1-47.

Sibanda, T., & Uzuner, Ö. (2006). Role of local context in automatic deidentification of ungrammatical, fragmented text. In Proc. of HLT-NAACL, Human Language Technology Conf. of the NAACL (pp. 65-73), New York, USA.

Silvestri, F., Orlando, S., & Perego, R. (2004). WINGS: A Parallel Indexer for Web Contents. Lecture Notes in Computer Science, 3036, 263-270.

SPSS (2007). Predictive Text Analysis, http://www.spss.com/predictive_text_analytics/

Statsoft (2007). STATISTICA Text Miner, http://www.statsoft.com

Tikk, D., Biró, Gy., & Törcsvári, A. (2007). A hierarchical online classifier for patent categorization. In Prado, H. A., & Ferneda, E. (editors), Emerging Technologies of Text Mining: Techniques and Applications. Idea Group Inc. (in press).

Zamir, O., Etzioni, O., Madani, O., & Karp, R. M. (1997). Fast and intuitive clustering of web documents. In Proc. of SIGKDD-97, 3rd Int. Conf. on Knowledge Discovery and Data Mining (pp. 287-290), Newport Beach, USA.

Zobel, J., & Moffat, A. (2006). Inverted Files for Text Search Engines. ACM Computing Surveys, 38(2), Article 6.

KEY TERMS

Full-Text Search (FTS) Engine: A module within a database management system that supports efficient search in free texts. The main operations supported by the FTS engine are exact matching, position-based matching, similarity-based matching, grammar-based matching and semantic-based matching.

Fuzzy Matching: A special type of matching where the similarity of two terms is calculated as the cost of the transformation from one into the other. The most widely used cost calculation method is the edit distance method.

Indexer: Builds one or more indices to speed up information retrieval from free text. These indices usually contain the following information: terms (words), occurrences of the terms, and format attributes.

Inverted Index: An index structure where every key value (term) is associated with a list of object identifiers (representing documents). The list contains the objects that include the given key value.

Query Refinement Engine: A component of the FTS engine that generates new refined queries from the initial query in order to improve the efficiency of the retrieval. The refined queries can be generated using the user's response or some typical patterns in the query history.

Ranking Engine: A module within the FTS engine that ranks the documents of the result set based on their relevance to the query.

Sectioner: A component of the FTS engine which breaks the text into larger units called sections. The types of extracted sections are usually determined by the document type.

Stemmer: A language-dependent module that determines the stem form of a given word. The stem form is usually identical to the morphological root. It requires a language dictionary.

Thesaurus: A special repository of terms which contains not only the words themselves but also the similarity, generalization and specialization relationships. It describes the context of a word but does not give an explicit definition of the word.

Word-Breaker: A component of the full-text engine whose function is to break the text into words and phrases.
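The key terms above (positional inverted index, word-breaker, and the exact, Boolean and NEAR matching operations) can be illustrated with a minimal sketch. This is not code from any FTS engine discussed in the article: the whitespace word-breaker is deliberately naive, and all function names are illustrative assumptions.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Positional inverted index: maps each term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):  # naive word-breaker
            index[term].setdefault(doc_id, []).append(pos)
    return index

def match(index, term):
    """Exact matching: the set of documents containing the term."""
    return set(index.get(term, {}))

def boolean_and(index, t1, t2):
    """AND operator: documents containing both terms."""
    return match(index, t1) & match(index, t2)

def near(index, t1, t2, window=3):
    """NEAR(t1, t2): both terms occur within `window` positions of each other."""
    hits = set()
    for doc_id in boolean_and(index, t1, t2):
        pos1, pos2 = index[t1][doc_id], index[t2][doc_id]
        if any(abs(p - q) <= window for p in pos1 for q in pos2):
            hits.add(doc_id)
    return hits

docs = ["full text search engines for databases",
        "text mining discovers hidden characteristics of documents",
        "search engines rank documents by relevance"]
idx = build_inverted_index(docs)
print(match(idx, "text"))                 # documents 0 and 1
print(boolean_and(idx, "search", "engines"))
print(near(idx, "full", "search"))        # "full" and "search" are 2 positions apart in doc 0
```

A real engine would add stemming, a sectioner and a ranking step on top of this index; the sketch only shows why the positional lists are needed for the NEAR operator.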
Amaury Lendasse
Helsinki University of Technology, Finland
Functional Dimension Reduction for Chemometrics
models, such as Partial Least Squares (Härdle, Liang, & Gao, 2000), have been widely used for the prediction task. However, it has been shown that the additivity assumption is not always true and environmental conditions may further introduce more non-linearity to the data (Wülfert, Kok, & Smilde, 1998). We therefore propose that, in order to address a general prediction problem, a non-linear method should be used. LS-SVM is a relatively fast and reliable non-linear model which has been applied to chemometrics as well (Chauchard, Cogdill, Roussel, Roger, & Bellon-Maurel, 2004).

... require that q is smaller than the number of points in the spectra.

Figure 1 presents a graph of the overall prediction method. Gaussian fitting is used for the approximation of X. The obtained vectors are further scaled by a diagonal matrix A before the final LS-SVM modeling. The following sections explain these steps in greater detail.

Gaussian Fitting: Approximating Spectral Function X
where

$\Phi_{i,j} = \int_a^b \varphi_i(x)\,\varphi_j(x)\,dx.$

[Figure 2. Above: NIR absorption spectra. Below: 13 optimized basis functions]
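The role of the basis can be sketched as follows: given fixed Gaussian basis functions, the weights approximating a sampled spectrum are obtained by linear least squares. This is an illustrative sketch only, with assumed names throughout; in the article the centers and widths of the Gaussians are themselves optimized, which is not done here.

```python
import numpy as np

def gaussian_basis(t, centers, widths):
    """Evaluate Gaussian basis functions phi_j(t) = exp(-(t - c_j)^2 / w_j^2)."""
    return np.exp(-((t[:, None] - centers[None, :]) ** 2) / widths[None, :] ** 2)

def fit_weights(t, spectrum, centers, widths):
    """Least-squares weights approximating one sampled spectrum in the basis."""
    Phi = gaussian_basis(t, centers, widths)
    weights, *_ = np.linalg.lstsq(Phi, spectrum, rcond=None)
    return weights

# 100 sample points and q = 13 fixed basis functions (q is smaller than the
# number of points in the spectra, as required above)
t = np.linspace(0.0, 1.0, 100)
centers = np.linspace(0.0, 1.0, 13)
widths = np.full(13, 0.08)
Phi = gaussian_basis(t, centers, widths)

# a synthetic "spectrum" that is exactly representable in the basis
spectrum = 2.0 * Phi[:, 3] + 0.5 * Phi[:, 9]
w = fit_weights(t, spectrum, centers, widths)
recon = Phi @ w
print(float(np.max(np.abs(recon - spectrum))))
```

The 13-dimensional weight vector `w` then plays the role of the compressed representation that is scaled and passed to the predictor.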
related to prediction of fat content in meat samples using NIR absorption spectra (Kärnä et al., 2007; Rossi et al., 2006; Thodberg, 1996). It can be seen that the basis has adapted to the data: there are narrow functions in the center, where there is more variance in the data.

Variable Scaling

Variable scaling can be seen as a generalization of variable selection; in variable selection, variables are either included in the training set (corresponding to multiplication by 1) or excluded from it (corresponding to multiplication by 0), while in variable scaling the entire range [0,1] of scalars is allowed. In this article, we present a method for choosing the scaling using the Delta Test (DT) (Lendasse, Corona, Hao, Reyhani, & Verleysen, 2006).

The scalars are generated by iterative Forward-Backward Selection (FBS) (Benoudjit et al., 2004; Rossi et al., 2006). FBS is usually used for variable selection, but it can be extended to scaling as well: instead of turning scalars from 0 to 1 or vice versa, increases by 1/h (in the case of forward selection) or decreases by 1/h (in the case of backward selection) are allowed. The integer h is a constant grid parameter. Starting from an initial scaling, the FBS algorithm changes each of the scalars by 1/h and accepts the change that results in the best improvement. The process is repeated until no improvement is found. The process is initialized with several sets of random scalars.

DT is a method for estimating the variance of the noise within a data set. Having a set of general input-output pairs $(x_i, y_i)_{i=1}^N \subset \mathbb{R}^m \times \mathbb{R}$ and denoting the nearest neighbor of $x_i$ by $x_{NN(i)}$, the DT variance estimate is

$\delta = \frac{1}{2N} \sum_{i=1}^{N} \left( y_{NN(i)} - y_i \right)^2,$

where $y_{NN(i)}$ is the output of $x_{NN(i)}$. Thus, $\delta$ is equivalent to the residual (i.e. prediction error) of a first-nearest-neighbor model. DT is useful in the evaluation of the dependence of random variables and therefore it can be used for scaling: the set of scalars that gives the smallest $\delta$ is selected.

LS-SVM

LS-SVM is a least squares modification of the Support Vector Machine (SVM) (Suykens, Van Gestel, De Brabanter, De Moor, & Vandewalle, 2002). The quadratic optimization problem of SVM is simplified so that it reduces into a linear set of equations. Moreover, regression SVM usually involves three unknown parameters, while LS-SVM has only two: the regularization parameter $\gamma$ and the width parameter $\sigma$.

Given a set of $N$ training examples $(x_i, y_i)_{i=1}^N \subset \mathbb{R}^m \times \mathbb{R}$, the LS-SVM model is $y = \mathbf{w}^T \Psi(\mathbf{x}) + b$, where $\Psi: \mathbb{R}^m \to \mathbb{R}^n$ is a mapping from the input space onto a higher-dimensional hidden space, $\mathbf{w} \in \mathbb{R}^n$ is a weight vector and $b$ is a bias term. The optimization problem is formulated as

$\min_{\mathbf{w},b} J(\mathbf{w}, b) = \frac{1}{2}\|\mathbf{w}\|^2 + \frac{\gamma}{2}\sum_{i=1}^{N} e_i^2 \quad \text{so that} \quad y_i = \mathbf{w}^T \Psi(\mathbf{x}_i) + b + e_i,$

where $e_i$ is the prediction error and $\gamma \ge 0$ is a regularization parameter. The dual problem is derived using Lagrangian multipliers, which leads to a linear KKT system that is easy to solve (Suykens et al., 2002). Using the dual solution, the original model can be reformulated as

$y(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b,$

where the kernel $K(\mathbf{x}, \mathbf{x}_i) = \Psi(\mathbf{x})^T \Psi(\mathbf{x}_i)$ is a continuous and symmetric mapping from $\mathbb{R}^m \times \mathbb{R}^m$ to $\mathbb{R}$ and the $\alpha_i$ are the Lagrange multipliers. A widely used choice for $K$ is the standard Gaussian kernel $K(\mathbf{x}_1, \mathbf{x}_2) = e^{-\|\mathbf{x}_1 - \mathbf{x}_2\|^2 / (2\sigma^2)}$.

The LS-SVM prediction is the final step in the proposed method, where spectral data is compressed by the Gaussian fitting and the fitting weights are normalized and scaled before the prediction. More elaborate discussion and applications to real-world data are presented in Kärnä et al. (2007).

FUTURE TRENDS

The only unknown parameter in the proposed method is the number of basis functions, which is selected by validation. In the future, other methods for determining a good basis size should be developed in order to speed up the process. Moreover, the methodology should be tested with various data sets, including other than
spectral data. The LS-SVM predictor could also be replaced with another model.

Although the proposed Gaussian fitting combined with the LS-SVM model seems to be fairly robust, the relation between the basis functions and the prediction performance should be studied in detail. It would be desirable to optimize the basis directly for the best possible prediction performance (instead of good data fitting), although this seems difficult due to over-fitting and high computational costs.

CONCLUSION

This article deals with the problem of finding a good set of basis functions for dimension reduction of spectral data. We have proposed a method based on Gaussian basis functions where the locations and the widths of the functions are optimized to fit the data as accurately as possible. The basis indeed tends to follow the nature of the data and provides a good tool for dimension reduction. Other methods, such as the proposed DT scaling, will benefit from the smaller data dimension and help to achieve even better data compression. The LS-SVM model is a robust and fast method to be used in the final prediction.

REFERENCES

Alsberg, B. K., & Kvalheim, O. M. (1993). Compression of nth-order data arrays by B-splines. I: Theory. Journal of Chemometrics 7, 61-73.

Bazaraa, M. S., Sherali, H. D., & Shetty, C. M. (1993). Nonlinear Programming, Theory and Algorithms. John Wiley and Sons.

Bengio, Y., Delalleau, O., & Le Roux, N. (2006). The Curse of Highly Variable Functions for Local Kernel Machines. In Y. Weiss, B. Schölkopf, & J. Platt (editors), Neural Information Processing Systems (NIPS 2005), Advances in Neural Information Processing Systems 18, 107-114.

Benoudjit, N., Cools, E., Meurens, M., & Verleysen, M. (2004). Chemometric calibration of infrared spectrometers: selection and validation of variables by non-linear models. Chemometrics and Intelligent Laboratory Systems 70, 47-53.

Chauchard, F., Cogdill, R., Roussel, S., Roger, J. M., & Bellon-Maurel, V. (2004). Application of LS-SVM to non-linear phenomena in NIR spectroscopy: development of a robust and portable sensor for acidity prediction in grapes. Chemometrics and Intelligent Laboratory Systems 71, 141-150.

Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (2nd ed.). Prentice Hall.

Härdle, W., Liang, H., & Gao, J. T. (2000). Partially Linear Models. Physica-Verlag.

Jones, A. J. (2004). New tools in non-linear modeling and prediction. Computational Management Science 1, 109-149.

Kärnä, T., & Lendasse, A. (2007). Gaussian Fitting Based FDA for Chemometrics. In F. Sandoval, A. Prieto, J. Cabestany, & M. Graña (editors), 9th International Work-Conference on Artificial Neural Networks (IWANN'2007), Lecture Notes in Computer Science 4507, 186-193.

Lendasse, A., Corona, F., Hao, J., Reyhani, N., & Verleysen, M. (2006). Determination of the Mahalanobis matrix using nonparametric noise estimations. In M. Verleysen (editor), 14th European Symposium on Artificial Neural Networks (ESANN 2006), d-side publi., 227-232.

Ramsay, J., & Silverman, B. (1997). Functional Data Analysis. Springer Series in Statistics. Springer.

Rossi, F., Lendasse, A., François, D., Wertz, V., & Verleysen, M. (2006). Mutual information for the selection of relevant variables in spectrometric nonlinear modeling. Chemometrics and Intelligent Laboratory Systems 80(2), 215-226.

Shao, X. G., Leung, A. K., & Chau, F. T. (2003). Wavelet: A New Trend in Chemistry. Accounts of Chemical Research 36, 276-283.

Suykens, J., Van Gestel, T., De Brabanter, J., De Moor, B., & Vandewalle, J. (2002). Least Squares Support Vector Machines. World Scientific Publishing.

Thodberg, H. (1996). A Review of Bayesian Neural Networks with an Application to Near Infrared Spectroscopy. IEEE Transactions on Neural Networks 7, 56-72.
Verleysen, M., & François, D. (2005). The Curse of Dimensionality in Data Mining and Time Series Prediction. In J. Cabestany, A. Prieto, & D. F. Sandoval (editors), 8th International Work-Conference on Artificial Neural Networks (IWANN'2005), Lecture Notes in Computer Science 3512, 758-770.

Wülfert, F., Kok, W. T., & Smilde, A. K. (1998). Influence of temperature on vibrational spectra and consequences for the predictive ability of multivariate models. Analytical Chemistry 70, 1761-1767.

KEY TERMS

Chemometrics: Application of mathematical or statistical methods to chemical data. Closely related to the monitoring of chemical processes and instrument design.

Curse of Dimensionality: A theoretical result in machine learning stating that the lower bound of the error that an adaptive machine can achieve increases with data dimension. Thus performance will degrade as data dimension grows.

Delta Test: A non-parametric noise estimation method. It estimates the amount of noise within a data set, i.e. the amount of information that cannot be explained by any model. Therefore the Delta Test can be used to obtain a lower bound of the learning error which can be achieved without risk of over-fitting.

Functional Data Analysis: A statistical approach where multivariate data are treated as functions instead of discrete vectors.

Least Squares Support Vector Machine: A least squares modification of the Support Vector Machine which leads to solving a linear set of equations. It also bears a close resemblance to Gaussian Processes.

Machine Learning: An area of Artificial Intelligence dealing with adaptive computational methods such as Artificial Neural Networks and Genetic Algorithms.

Over-Fitting: A common problem in Machine Learning where the training data can be explained well but the model is unable to generalize to new inputs. Over-fitting is related to the complexity of the model: any data set can be modelled perfectly with a sufficiently complex model, but the risk of learning random features instead of meaningful causal features increases.

Support Vector Machine: A kernel-based supervised learning method used for classification and regression. The data points are projected into a higher-dimensional space where they are linearly separable. The projection is determined by the kernel function and a set of specifically selected support vectors. The training process involves solving a Quadratic Programming problem.

Variable Selection: A process where unrelated input variables are discarded from the data set. Variable selection is usually based on correlation or noise estimators of the input-output pairs and can lead to significant improvement in performance.
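The Delta Test defined above, $\delta = \frac{1}{2N}\sum_i (y_{NN(i)} - y_i)^2$, is short enough to write out directly. The brute-force sketch below is illustrative only (not the authors' code) and uses plain Euclidean nearest neighbours.

```python
import math

def delta_test(X, y):
    """Delta Test noise-variance estimate:
    delta = 1/(2N) * sum_i (y_NN(i) - y_i)^2,
    where NN(i) is the first nearest neighbour of x_i (brute force, O(N^2))."""
    n = len(X)
    total = 0.0
    for i in range(n):
        # nearest neighbour of x_i among all other points
        nn = min((j for j in range(n) if j != i),
                 key=lambda j: math.dist(X[i], X[j]))
        total += (y[nn] - y[i]) ** 2
    return total / (2 * n)

# on noise-free data from a smooth function, the estimate is close to zero
X = [[i / 10.0] for i in range(20)]
y_clean = [x[0] ** 2 for x in X]
print(delta_test(X, y_clean))  # small value: almost no unexplainable noise
```

In the scaling procedure described in the article, each candidate set of scalars would multiply the input variables before this estimate is computed, and the scaling giving the smallest delta would be kept.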
Functional Networks

Oscar Fontenla-Romero
University of A Coruña, Spain

Bertha Guijarro-Berdiñas
University of A Coruña, Spain

Beatriz Pérez-Sánchez
University of A Coruña, Spain
DESCRIPTION OF FUNCTIONAL NETWORKS

$y(\mathbf{x}) = \sum_{j=0}^{M} w_j \phi_j(\mathbf{x}) = \sum_{j=0}^{M} w_j \exp\left(-\frac{\|\mathbf{x}-\boldsymbol{\mu}_j\|^2}{\sigma_j^2}\right)$

[Figure 1. The general model of a functional network: layers of neural functions f^(1), ..., f^(M) connected through layers of storing units y^(0), ..., y^(M)]
values in order to return a set of output values to the next layer of storing units. In this general model each neural function $f_i^{(m)}$ is defined as the following composition:

$f_i^{(m)}\left(y_1^{(m-1)},\ldots,y_{N_{m-1}}^{(m-1)}\right) = g_i^{(m)}\left(h_{i1}^{(m)}\left(y_1^{(m-1)}\right),\ldots,h_{iN_{m-1}}^{(m)}\left(y_{N_{m-1}}^{(m-1)}\right)\right)$

where the superscript $(m)$ is the number of the layer. The functions $g_i^{(m)}$ are known and fixed before training, for example to be the sum or the product. In contrast, the functions $h_{ij}^{(m)}$ are linear combinations of other known functions $\Phi$ (for example, polynomials, cosines, etc.), i.e.

$h_{ij}^{(m)}\left(y_j^{(m-1)}\right) = \sum_{z=1}^{n(m)} a_{ijz}^{(m)}\,\Phi_{ijz}^{(m)}\left(y_j^{(m-1)}\right),$

where the coefficients $a_{ijz}^{(m)}$ implied in this linear combination are the model parameters to be learned. As can be observed, MLPs, GLNs and RBFs are particular cases of this generalized model.

c. A set of directed links that connect the functional units and the storing units. These connections indicate the direction of the flow of information.

The general FN in Figure 1 does not have arrows that converge in the same storing unit, but if it did, this would indicate that the neurons from which they emanate must produce identical outputs. This is an important feature of FNs that is not available for neural networks. These converging arrows represent constraints which can arise from physical and/or theoretical characteristics of the problem under consideration.

Learning in Functional Networks

Functional networks combine knowledge about the problem to determine the network, and training data to estimate the unknown neural functions. Therefore, in contradistinction to neural networks, FNs include two types of learning:

1. Structural learning. The specification of the initial topology for a FN can be based on the features of the problem we are facing (Castillo, Cobo, Gutiérrez, & Pruneda, 1998). Usually, knowledge about the problem can be used in order to develop a network structure. An important feature of FNs is that they allow managing functional restrictions determined by some known properties of the model to be estimated. These restrictions can be represented by forcing the outputs of some neurons to coincide in a unique storage unit. Later on, the network can be translated into a system of functional equations that can usually be simplified in order to obtain a simpler but equivalent architecture. Finally, in the absence of knowledge about the problem, the general model, shown in Figure 1, can always be used.

2. Parametric learning. This second stage refers to the estimation of the neurons' functions. Often these neural functions are considered to be linear combinations of functional families, and therefore the parametric learning consists of estimating both the arguments of the neural functions and the parameters of the linear combination using the available training data. It is important to remark that this type of learning generalizes the idea of estimating the weights of a neural network.

An Example Of A Functional Network

In this section the use of FNs is illustrated by means of a simple artificial example. Let's suppose a problem of engine diagnosis for which three continuous variables (x = vibrations, y = oil density, z = temperature) are being monitored. The problem is to estimate the probability P of a given diagnosis based on these variables, i.e., P(x, y, z). Moreover, we know that the information provided by the monitored variables is accumulative. Therefore, it is possible, for example, to calculate first the probability P1(x, y) of a diagnosis based on only variables x and y, and later on, when variable z is available, combine the value provided by P1 with the new information z to obtain P(x, y, z). That is, there exist some functions such that:

P(x, y, z) = F[P1(x, y), z] = K[P2(y, z), x] = L[P3(x, z), y] (1)

This situation suggests the structure of the FN shown in Figure 2a, where I is the identity function. The coincident connections in the store output unit, or equivalently eq. 1, establish strong restrictions on the functions P1, P2, P3, F, K, L. The use of methods for functional equations allows dealing with eq. 1 in order to obtain the corresponding functional conditions from which it is possible to derive a new equation for
1. Neural networks are derived only from data about the problem. However, FNs can also use knowledge about the problem to derive their topology, incorporating properties of the function to be modeled.
2. During learning in neural networks, the shape of the neural functions is usually fixed to be a sigmoid-type function, and only the weights can be adapted. In FNs, the neural functions are also learnt.
3. Neural functions that can be employed in neural networks are limited and belong to some known ...

Some Functional Network Models

In this section some typical FN models are presented that allow solving several real problems.

The Uniqueness Model

This is a simple but very powerful model, for which the output z of the corresponding FN architecture can be written as a function of the inputs x and y:

$z = F(x, y) = f_3^{-1}\left(f_1(x) + f_2(y)\right) \quad (2)$
Uniqueness of Representation. For this model to have uniqueness of solution, it is only required to fix the functions $f_1$, $f_2$, $f_3$ at a point (see the explanation in Castillo, Cobo, Gutiérrez, & Pruneda, 1998).

Learning the model. Learning the function F(x, y) in eq. 2 is equivalent to learning the functions from a data set $\{(x_j, y_j, z_j): j = 1,\ldots,n\}$, where z is the desired output for the given inputs. To estimate $f_1$, $f_2$, $f_3$ we can employ non-linear and linear methods:

1. The Non-Linear Method. We approximate each of the functions $f_1$, $f_2$, $f_3^{-1}$ in $z = F(x, y) = f_3^{-1}(f_1(x) + f_2(y))$ by considering them to be a linear combination of known functions from a given family, so that

$z_j = f_3^{-1}\left(f_1(x_j) + f_2(y_j)\right) \;\Leftrightarrow\; f_3(z_j) = f_1(x_j) + f_2(y_j); \quad j = 1,\ldots,n.$

Again, the functions $f_s$ can be approximated as linear combinations of known functions from a given family. Finally, the following sum of squared errors is minimized:

$Q = \sum_{j=1}^{n} e_j^2 = \sum_{j=1}^{n}\left(\sum_{i=1}^{m_1} a_{1i}\Phi_{1i}(x_j) + \sum_{i=1}^{m_2} a_{2i}\Phi_{2i}(y_j) - \sum_{i=1}^{m_3} a_{3i}\Phi_{3i}(z_j)\right)^2$

[Figure 3. a) The generalized associativity model; b) an equivalent simplified functional network]
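The linear method just described reduces to ordinary least squares once the family $\Phi$ is chosen. The sketch below assumes, for simplicity, that $f_3$ is the identity (so $z = f_1(x) + f_2(y)$ with polynomial $f_1$, $f_2$); the full method also learns $f_3$ and fixes the functions at a point for uniqueness. All names here are illustrative.

```python
import numpy as np

def fit_uniqueness_model(x, y, z, degree=2):
    """Linear-method sketch for the uniqueness model with f3 = identity:
    z = f1(x) + f2(y), with polynomial f1 and f2. The coefficients
    minimizing the sum of squared errors Q solve a linear least-squares
    problem."""
    cols = [np.ones_like(x)]                        # shared constant term
    cols += [x ** d for d in range(1, degree + 1)]  # basis for f1
    cols += [y ** d for d in range(1, degree + 1)]  # basis for f2
    A = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

# data generated by an additive model: z = (1 + 2x + x^2) + 3y
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = rng.uniform(-1.0, 1.0, 200)
z = 1.0 + 2.0 * x + x ** 2 + 3.0 * y
coeffs = fit_uniqueness_model(x, y, z)
print(np.round(coeffs, 3))  # approximately [1, 2, 1, 3, 0]
```

Because the model is linear in the coefficients, no iterative training is needed: one linear solve recovers the additive structure, which is the point made about FN parametric learning above.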
as a function of the input x and N(y, z). This property is represented by the links convergent to the output node u, which leads to the functional equation

$F[G(x, y), z] = K[x, N(y, z)] \quad (3)$

Simplification of the model. It can be shown that the general solution of eq. 3 is:

$F(x, y) = k[f(x) + r(y)]; \quad G(x, y) = f^{-1}[p(x) + q(y)]$
$K(x, y) = k[p(x) + n(y)]; \quad N(x, y) = n^{-1}[q(x) + r(y)] \quad (4)$

where f, r, k, n, p, q are arbitrary continuous and strictly monotonic functions. Substituting eq. 4 in eq. 3, the following result is obtained:

$F[G(x, y), z] = K[x, N(y, z)] = u = k[p(x) + q(y) + r(z)] \quad (5)$

Thus, the FN in Figure 3b is equivalent to the FN in Figure 3a.

Uniqueness of Representation. By employing functional equations, it can be demonstrated that for the generalized associativity model uniqueness of solution requires fixing the functions k, p, q, r at a point (see Castillo, Cobo, Gutiérrez, & Pruneda, 1998).

Learning the model. The problem of learning the FN in Figure 3b involves estimating the functions k, p, q, r in eq. 5, which can be rewritten as:

$k^{-1}(u) = p(x) + q(y) + r(z)$

Being $\{(x_{1i}, x_{2i}, x_{3i}, x_{4i}) \mid i = 1,\ldots,n\}$, with $(x_1, x_2, x_3, x_4) \equiv (x, y, z, u)$, the observed training sample of size n, we can define the error

$e_i = p(x_{1i}) + q(x_{2i}) + r(x_{3i}) - k^{-1}(x_{4i}); \quad i = 1,\ldots,n$

[Figure 4. a) The separable model functional network; b) an equivalent architecture with coefficients c_ij]
and the sum of squared errors to minimize is

$Q = \sum_{k=1}^{n} e_k^2.$

$\sum_{i} f_i(x)\, g_i(y) = 0 \quad (6)$

where

$f_i(x) = h_{i-n}(x), \quad g_i(y) = k_{i-n}(y); \quad i = n+1,\ldots,n+m.$

This suggests the FN in Figure 4a.

In this case, the parameters are not constrained by extra conditions, so the minimum can be obtained by solving the following system of linear equations, where the unknowns are the coefficients $c_{ij}$:

$\frac{\partial Q}{\partial c_{pq}} = 2\sum_{k=1}^{n} e_k\, f_p(x_{1k})\, g_q(x_{2k}) = 0; \quad p = 1,\ldots,r; \; q = 1,\ldots,k-r$
plied to a regression and a classification problem. In all cases, the functions of each layer were approximated by considering a linear combination of known functions from a polynomial family.

Classification Problem

The first example shows the performance of a FN solving a classification problem: the Wine data set. This database can be obtained from the UCI Machine Learning Repository. The aim of this problem is to determine the origin of wines using a chemical analysis of 13 continuous attributes. The set contains 178 instances that must be classified into three different classes.

For this case, the Separable Model (Figure 4) with three output units was employed. Moreover, its performance is compared to other standard methods: a Multilayer Perceptron (MLP), a Radial Basis Function Network (RBF) and Support Vector Machines (SVM).

Figure 5 shows the comparative results. The first subfigure contains the mean accuracy obtained using a leave-one-out cross-validation method. As can be observed, the FN obtains a very good performance for
Figure 5. Accuracy and training time obtained by different models for the wine data set
[Two bar charts: train and test accuracy (top) and training time in seconds (bottom) for the models MLP, RBF-1, RBF-2, RBF-3, SVM and RF]
the test set. Regarding the time required for the learning process, the second subfigure shows that the FNs are comparable with the other methods.

Regression Problem

In this case the aim of the network is to predict the failure shear effort in concrete beams based on several geometrical, longitudinal and transversal parameters of the beam (Alonso-Betanzos, Castillo, Fontenla-Romero, & Sánchez-Maroño, 2004). A FN, corresponding to the Associative Model (Figure 3), and also a MLP were trained employing a ten-fold cross-validation, running 30 simulations using different initial parameter values. A set with 12 samples was kept for further validation of the trained systems. The mean normalized Mean Squared Errors over 30 simulations obtained by the FN were 0.1789 and 0.8460 for test and validation, respectively, while the MLP obtained 0.1361 and 2.9265.

FUTURE TRENDS

Functional networks are being successfully employed in many different real applications. In engineering problems they have been applied, for instance, for surface reconstruction (Iglesias, Gálvez, & Echevarría, 2006). Other works have used these networks for recovering missing data (Castillo, Sánchez-Maroño, Alonso-Betanzos, & Castillo, 2003) and for general regression and classification problems (Lacruz, Pérez-Palomares & Pruneda, 2006).

Another recent research line is related to the investigation of measures of fault tolerance (Fontenla-Romero, Castillo, Alonso-Betanzos, & Guijarro-Berdiñas, 2004), in order to develop new learning methods.

CONCLUSION

This article presents a review of functional networks. Functional networks are inspired by neural networks and functional equations. This model offers all the advantages of ANNs, such as noise tolerance and generalisation capacity, while adding new ones. One of them is the possibility of using knowledge about the problem to be modeled to derive the initial network topology, thus resulting in a model that admits a physical or engineering interpretation. Another main advantage is that the initially proposed model can be simplified, using functional equations, and learnt by solving a system of linear equations, which speeds up the learning process and avoids getting stuck in a local minimum. Finally, the shape of the neural functions does not have to be fixed: they can be fitted from data during training, therefore widening the modeling ability of the network.

REFERENCES

Alonso-Betanzos, A., Castillo, E., Fontenla-Romero, O., & Sánchez-Maroño, N. (2004). Shear Strength Prediction using Dimensional Analysis and Functional Networks. Proceedings of the European Symposium on Artificial Neural Networks, 251-256.

Bishop, C.M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.

Castillo, E. (1998). Functional networks. Neural Processing Letters, 7, 151-159.

Castillo, E., Cobo, A., Gutiérrez, J., & Pruneda, R. (1998). Functional Networks with Applications. A Neural-Based Paradigm. Kluwer Academic Publishers.

Castillo, E., & Gutiérrez, J.M. (1998). A comparison of functional networks and neural networks. Proceedings of the IASTED International Conference on Artificial Intelligence and Soft Computing, 439-442.

Castillo, E., Iglesias, A., & Ruiz-Cobo, R. (2004). Functional Equations in Applied Sciences. Elsevier.

Castillo, E., Sánchez-Maroño, N., Alonso-Betanzos, A., & Castillo, C. (2003). Recovering missing data with Functional and Bayesian Networks. Lecture Notes in Computer Science, 2687, part II, 489-496.

Fontenla-Romero, O., Castillo, E., Alonso-Betanzos, A., & Guijarro-Berdiñas, B. (2004). A measure of fault tolerance for functional networks. Neurocomputing, 62, 327-347.

Gupta, M., & Rao, D. (1994). On the principles of fuzzy neural networks. Fuzzy Sets and Systems, 61, 1-18.

Hagan, M.T. & Menhaj, M. (1994). Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, 5(6), 989-993.
Iglesias, A., Gálvez, A. & Echevarría, G. (2006). Surface reconstruction via functional networks. Proceedings of the International Conference on Mathematical and Statistical Modeling. CDROM ISBN: 84-689-8577-5.

Lacruz, B., Pérez-Palomares, A. & Pruneda, R.E. (2006). Functional networks for classification and regression problems. Proceedings of the International Conference on Mathematical and Statistical Modeling. CDROM ISBN: 84-689-8577-5.

Moller, M.F. (1993). A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks, 6, 525-533.

Moody, J. & Darken, C.J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2), 281-294.

Rumelhart, D.E., Hinton, G.E. & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.

Specht, D. (1990). Probabilistic neural networks. Neural Networks, 3, 109-118.

Youssef, H. M. (1993). Multiple Radial Basis Function Networks in Modeling and Control. AIAA/GNC.

KEY TERMS

Functional Equation: An equation whose unknowns are expressed in terms of both independent variables and functions.

Functional Network: A structure consisting of processing units and storing units. These units are organized in layers and linked by connections. Each processing unit contains a multivariate and multiargument function to be learnt during a training process.

Lagrange Multiplier: Given the function f(x1, x2, ..., xn), the Lagrange multiplier λ is used to find the extremum of f subject to a constraint g(x1, x2, ..., xn) by solving

∂f/∂xk + λ ∂g/∂xk = 0, k = 1, ..., n.

Learning Algorithm: A process that, based on some training data representing the problem to be learnt, adapts the free parameters of a given model, such as a neural network, in order to obtain a desired functionality.

Linear Equation: An algebraic equation involving only a constant and first-order (linear) terms.

Uniqueness: Property of being the only possible solution.
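The Lagrange multiplier condition above can be checked numerically. A minimal Python sketch for a toy problem; the function f(x, y) = x + y, the constraint g(x, y) = x² + y² − 1 = 0, and the candidate point are our own illustration, not from the text:

```python
# Verify df/dx_k + lam * dg/dx_k = 0 at a hand-computed candidate extremum.
import math

def grad_f(x, y):       # gradient of f(x, y) = x + y
    return (1.0, 1.0)

def grad_g(x, y):       # gradient of g(x, y) = x^2 + y^2 - 1
    return (2.0 * x, 2.0 * y)

x = y = 1.0 / math.sqrt(2.0)        # candidate extremum on the unit circle
lam = -1.0 / (2.0 * x)              # multiplier from df/dx + lam * dg/dx = 0

residuals = [grad_f(x, y)[k] + lam * grad_g(x, y)[k] for k in range(2)]
print(max(abs(r) for r in residuals) < 1e-12)   # True: the condition holds
print(abs(x * x + y * y - 1.0) < 1e-12)         # True: the constraint holds
```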
Ernesto López-Mellado
CINVESTAV Unidad Guadalajara, Mexico
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Fuzzy Approximation of DES State
it represents in natural language: the activity will stop at about 2.5.

Fuzzy extension principle. The fuzzy extension principle plays a fundamental role because it allows extending functions defined on crisp sets to functions on fuzzy sets. An important application of this principle is a mechanism for operating arithmetically with fuzzy numbers.

Definition. Let X1, ..., Xn be crisp sets and let f be a function such that f : X1 × ... × Xn → Y. If a1, ..., an are fuzzy sets on X1, ..., Xn, respectively, then f(a1, ..., an) is the fuzzy set on Y such that:

f(a1, ..., an) = { μa1(x1) ∧ ... ∧ μan(xn) / f(x1, ..., xn) | (x1, ..., xn) ∈ X1 × ... × Xn }

If b = f(a1, ..., an), then b is the fuzzy set on Y characterized as:

b = { μa(x1)/x1, ..., μa(xn)/xn }.

With the extension principle we can define a simplified fuzzy set addition operation.

Definition. Let a = (a1, a2, a3, a4) and b = (b1, b2, b3, b4) be two trapezoidal fuzzy sets. The fuzzy set addition operation is a ⊕ b = (a1 + b1, a2 + b2, a3 + b3, a4 + b4) (Klir, 1995).

Definition. The intersection and union of fuzzy sets are defined in terms of the min and max operators:

(a ∧ b)(ω) = min(μa(ω), μb(ω)), ω in the support of a ∧ b

and

(a ∨ b)(ω) = max(μa(ω), μb(ω)), ω in the support of a ∨ b.

We use these operators, intersection and union, as a t-norm and an s-norm, respectively.

Definition. The possibility distributions before and after are the fuzzy sets ab = (−∞, a2, a3, a4) and aa = (a1, a2, a3, +∞), respectively; they are defined in (Andreu, 1997) as the functions μ(−∞, a](τ) = sup{ μa(τ′) | τ′ ≤ τ } and μ[a, +∞)(τ) = sup{ μa(τ′) | τ′ ≥ τ }, respectively.

Definition. An ordinary PN structure G is a bipartite digraph represented by the 4-tuple G = (P, T, I, O), where P = {p1, p2, ..., pn} and T = {t1, t2, ..., tm} are finite sets of vertices called, respectively, places and transitions, and I(O) : P × T → {0, 1} is a function representing the arcs going from places to transitions (from transitions to places).

Pictorially, places are represented by circles, transitions are represented by rectangles, and arcs are depicted as arrows. The symbol •tj (tj•) denotes the set of input (output) places of tj.
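The trapezoidal representation and the min/max operators above can be sketched in a few lines of Python; the helper names and the two example sets are illustrative, not from the article:

```python
# Trapezoidal fuzzy set a = (a1, a2, a3, a4) stored as a 4-tuple.

def fuzzy_add(a, b):
    """Simplified fuzzy addition: a (+) b = (a1+b1, a2+b2, a3+b3, a4+b4)."""
    return tuple(x + y for x, y in zip(a, b))

def membership(a, t):
    """Membership degree of instant t in trapezoid a = (a1, a2, a3, a4)."""
    a1, a2, a3, a4 = a
    if t <= a1 or t >= a4:
        return 0.0
    if a2 <= t <= a3:
        return 1.0
    if t < a2:                      # rising edge
        return (t - a1) / (a2 - a1)
    return (a4 - t) / (a4 - a3)     # falling edge

def fuzzy_and(mu_a, mu_b):
    """Pointwise intersection (t-norm): min of membership degrees."""
    return min(mu_a, mu_b)

def fuzzy_or(mu_a, mu_b):
    """Pointwise union (s-norm): max of membership degrees."""
    return max(mu_a, mu_b)

a = (1.0, 2.0, 3.0, 4.0)   # "about 2.5"
b = (0.0, 1.0, 1.0, 2.0)   # "about 1"
print(fuzzy_add(a, b))     # (1.0, 3.0, 4.0, 6.0)
print(fuzzy_and(membership(a, 2.5), membership(b, 2.5)))  # min(1.0, 0.0) = 0.0
```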
C+ = [c+ij], where c+ij = O(pi, tj); the incidence matrix of G is C = C+ − C−.

A marking function M : P → Z+ represents the number of tokens or marks (depicted as dots) residing inside each place. The marking of a PN is usually expressed as an n-entry vector.

Definition. A Petri Net system, or Petri Net (PN), is the pair N = (G, M0), where G is a PN structure and M0 is an initial token distribution.

The reachability set of a PN is the set of all possible markings reachable from M0 firing only enabled transitions; this set is denoted by R(G, M0).

A structural conflict is a PN sub-structure in which two or more transitions share one or more input places; such transitions are simultaneously enabled and the firing of one of them may disable the others, Fig. 3(b).

Definition. A transition tk ∈ T is live, for a marking M0, if ∀Mk ∈ R(G, M0), ∃Mn ∈ R(G, M0) such that tk is enabled at Mn.

A PN is live if all its transitions are live.

Definition. A PN is said to be 1-bounded, or safe, for a marking M0, if ∀pi ∈ P and ∀Mj ∈ R(G, M0) it holds that Mj(pi) ≤ 1.

In this work we deal with live and safe PNs.

Definition. Let Yi be a p-invariant of a Petri net (G, M0) and ⟨Yi⟩ the support of Yi; then the subnet induced by Yi is

PCi = (Pi = ⟨Yi⟩, Ti = {tk ∈ •pj, tl ∈ pj• | pj ∈ ⟨Yi⟩}, Ii, Oi)

named a p-component, where Ii = Pi × Ti ∩ I and Oi = Pi × Ti ∩ O.

The t-component is defined analogously, from a vector vk with vk(i) = 0, i ≠ j, vk(j) = 1, with Ii = Pi × Ti ∩ I and Oi = Pi × Ti ∩ O.

Definition. An invariant Zi is minimal if no invariant Zj satisfies ⟨Zj⟩ ⊂ ⟨Zi⟩, where Zi, Zj are p-invariants or t-invariants and ∀z ∈ Zi : z ≥ 0.

Definition. Let Z = {Z1, ..., Zq} be the set of minimal invariants (Silva, 1982) of a PN; then Z is called the invariants base. The cardinality of Z is represented as |Z|.

FUZZY TIMED PETRI NETS

Basic Operators

We first introduce some useful operators.

Definition. In order to get the fuzzy set between f and g, the lmax function is defined as:
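The ordinary-PN definitions above (0/1 arc functions, incidence matrix, enabling and firing) admit a compact sketch; the two-place cyclic net used here is an assumed example, not the article's:

```python
# G = (P, T, I, O): I and O as place-by-transition 0/1 matrices,
# marking M as a vector, and C = C+ - C- with c+_ij = O(pi, tj),
# c-_ij = I(pi, tj).

I = [[1, 0],   # I[p][t] = 1 iff there is an arc p -> t
     [0, 1]]
O = [[0, 1],   # O[p][t] = 1 iff there is an arc t -> p
     [1, 0]]
n, m = 2, 2    # |P|, |T|

C = [[O[i][j] - I[i][j] for j in range(m)] for i in range(n)]

def enabled(M, t):
    """t is enabled iff every input place of t holds a token (safe PN)."""
    return all(M[p] >= I[p][t] for p in range(n))

def fire(M, t):
    """Fire enabled transition t: M' = M + C[:, t]."""
    assert enabled(M, t)
    return [M[p] + C[p][t] for p in range(n)]

M0 = [1, 0]
M1 = fire(M0, 0)                        # t1 moves the token p1 -> p2
print(M1)                               # [0, 1]
print(enabled(M1, 1), enabled(M1, 0))   # True False
```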
lmax(f, g) = min(f^a, g^b)   (2)

Definition. The latest (earliest) operation selects the latest (earliest) fuzzy set among n fuzzy sets; they are calculated as follows:

latest(f1, ..., fn) = min(max(f1^b, ..., fn^b), min(f1^a, ..., fn^a))   (3)

earliest(f1, ..., fn) = min(min(f1^b, ..., fn^b), max(f1^a, ..., fn^a))   (4)

Definition. The fuzzy_conjugation operator is defined as arg1 (oper∘and) arg2, where arg1, arg2 are arguments that can be matrices of fuzzy sets; "and" is the fuzzy and operation and oper is any operation such as ⊕, ⊖, latest, min, etc. For each row i = 1, ..., m and each column j = 1, ..., n, the products and(fik, gkj), k = 1, ..., r, are computed and combined as oper(and(fik, gkj)). For example, with oper = ⊕, the (i, j) entry of F (⊕∘and) G, for F an m × r and G an r × n matrix of fuzzy sets, is ⊕(k = 1..r) and(fik, gkj).

Formalism Description of the FTPN

Definition. A fuzzy timed Petri net structure is a 3-tuple FTPN = (N, ·, ·), where N = (G, M0) is a PN, the second component {a1, a2, ..., an} is a collection of fuzzy sets, and the third is a function that associates a fuzzy set ai to each place pi ∈ P.

Fuzzy timing of places

The fuzzy set a = (a1, a2, a3, a4), Fig. 2(b), represents the static possibility distribution μa(τ) ∈ [0, 1] of the instant at which a token leaves a place p ∈ P, starting from the instant when p is marked. This set does not change during the FTPN execution.

Fuzzy timing of tokens

The fuzzy set b = (b1, b2, b3, b4), Fig. 2(c), represents

Figure 2. (a) Fuzzy timed Petri net. (b) The fuzzy set associated to places. (c) The fuzzy set associated to a place or mark. (d) Fuzzy timestamp
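A rough sketch of the fuzzy_conjugation matrix operator just defined, with fuzzy sets abbreviated to plain possibility values so that "and" becomes min and the reducers become max/min; this simplification, and the example matrices, are ours:

```python
# Generalized matrix product: multiplication -> and_op, addition -> oper.

def conj_product(F, G, and_op, oper):
    """(F oper.and G)[i][j] = oper_k( and_op(F[i][k], G[k][j]) )."""
    m, r, n = len(F), len(G), len(G[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            terms = [and_op(F[i][k], G[k][j]) for k in range(r)]
            acc = terms[0]
            for t in terms[1:]:
                acc = oper(acc, t)
            row.append(acc)
        out.append(row)
    return out

latest = max      # for plain numbers, "latest" picks the larger date
earliest = min

F = [[0.2, 0.9],
     [0.6, 0.4]]
G = [[0.5],
     [0.7]]
print(conj_product(F, G, min, latest))    # [[0.7], [0.5]]
```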
Fuzzy firing date

The firing date otk(τ) of a transition tk is determined with respect to the set of transitions {tj} simultaneously enabled, Fig. 3(b). This date, expressed as a possibility distribution, is computed as follows.

Now, we reformulate the expressions (5), (6), (7) and (8), allowing a more general and compact representation:

B = (C+T ∘lmax O) ⊕ A   (9)
Figure 3. (a) Conjunction transition. (b) Structural conflict. (c) Attribution and selection place
C̄ = lmax(C+ ∘earliest O, C− ∘latest O)   (12)

where B, E, O and C̄ denote vectors composed of the bps, etk, otk, cps, respectively.

Modeling Example

Now we will illustrate the previous matrix formulation through a simple example.

Example

Consider the system shown in Fig. 4(a); it consists of two cars, car1 and car2, which move along independent and dependent ways executing the set of activities Op = {Right_car1, Right_car2, Charge_car1, Charge_car2, Left_car1,2, Discharge_car1,2}. The operation of the system is automated following the sequence described in the FTPN of Fig. 4(b), in which the activities are associated to places. The ending time possibility api for every activity is given in the model. We consider that there are no sensors detecting the tokens in the system; the behavior is thus analyzed through the estimated state.

B = [bp1, bp2, bp3, bp4, bp5, bp6]T = [ot5 ⊕ ap1, ot1 ⊕ ap2, ot1 ⊕ ap3, ot2 ⊕ ap4, ot3 ⊕ ap5, ot4 ⊕ ap6]T   (13)

E = [et1, et2, et3, et4, et5]T = [bp1, bp2, bp3, latest(bp4, bp5), bp6]T   (14)

O = [ot1, ot2, ot3, ot4, ot5]T = [et1, et2, et3, et4, et5]T   (15)

Figure 4. (a) Two cars system. (b) Fuzzy timed Petri net model
M(τ) = min(M(τ), 1)

Remark. If O(b), O(a) ∈ {0, 1}, the behaviour is that of an ordinary timed Petri net.

Example. Notice that during (0, 5), M(τ) coincides with the fuzzy timestamp; it is shown in Fig. 5(a).

STATE APPROXIMATION OF THE FTPN

...among the larger possibility Mu(τ) that the token is in a place u and the possibility Mv(τ) that the token is in any other place. The function Yi(τ) is then calculated:

Yi(τ) = min(Mpu(τ) − Mpv(τ))   (19)

...of the place pk ∈ P. Now, the discrete marking can be obtained with the following procedure.

Algorithm: Defuzzification. See Algorithm A.
Figure 5. (a) Fuzzy marking evolution. (b) Marking estimation. (c) Discrete state

For τ = 0.95 the new fuzzy marking is

M(0.95) = [0 1 1 0.5 0.25 0]T,

therefore

M1(0.95) = [0 1 0 0 0 0]T, M2(0.95) = [0 0 1 0 0 0]T

M(0.95) = [0 1 0 0 0 0]T + [0 0 1 0 0 0]T = [0 1 1 0 0 0]T

FUTURE TRENDS

Algorithm A. Defuzzification
Input: M(τ), Y
Output: M(τ)
Step 1: M(τ) ← 0
Step 2: ∀Yi | i = 1, ..., |Y|
Step 2.1: ∀pk ∈ ⟨Yi⟩: mq = max(M(pk))
Step 2.2: M(pq) = 1
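Algorithm A can be sketched directly in Python. The fuzzy marking below is the τ = 0.95 vector from the example; the two p-invariant supports are assumptions for illustration, not taken from the article's net:

```python
# Defuzzification: for each p-invariant support, keep only the place with the
# largest possibility and set its discrete marking to 1.

def defuzzify(M_fuzzy, invariants):
    """Input: fuzzy marking M_fuzzy, invariants base (list of place-index sets).
       Output: discrete marking M with one token per p-component."""
    M = [0] * len(M_fuzzy)                          # Step 1: M <- 0
    for support in invariants:                      # Step 2: for each Yi
        q = max(support, key=lambda p: M_fuzzy[p])  # Step 2.1: argmax over <Yi>
        M[q] = 1                                    # Step 2.2
    return M

M_fuzzy = [0, 1, 1, 0.5, 0.25, 0]   # M(0.95) from the example
Y = [{1, 3}, {2, 4}]                # assumed p-component supports (0-indexed)
print(defuzzify(M_fuzzy, Y))        # [0, 1, 1, 0, 0, 0], as in the example
```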
The aim of this research has been the use of the methodology for estimating the state of a discrete event system for monitoring its behavior and diagnosing faults. A FTPN is to be used as a reference model and its outputs (measurable marking) have to be compared with the outputs of the monitored system; the analysis of residuals should provide an early detection of system malfunctioning and a plausible location of the faulty behavior.

CONCLUSION

This article addressed the state estimation problem of DES in which the duration of activities is ill known; fuzzy sets represent the uncertainty of the ending of activities. Several novel notions have been introduced in the FTPN definition, and a new matrix formulation for computing the fuzzy marking of Marked Graphs has been proposed. The extreme situation in which no activity of a system can be detected by sensors has been dealt with, illustrating the degradation of the marking estimation when a cyclic execution is performed. Current research addresses the topics mentioned in the above section.

REFERENCES

Cao, T., Sanderson, A. C. (1996). Intelligent Task Planning Using Fuzzy Petri Nets. Intelligent Control and Intelligent Automation, Vol. 3. World Scientific.

Cardoso, J., Camargo, H. (1999). Fuzziness in Petri Nets. Physica-Verlag.

Chen, S., Ke, J., Chang, J. (1990). Knowledge representation using Fuzzy Petri nets. IEEE Trans. Knowledge Data Eng., Vol. 2, No. 3, 311-319.

Ding, Z., Bunke, H., Schneider, M., Kandel, A. (2005). Fuzzy Timed Petri Net, Definitions, Properties, and Applications. Elsevier, Mathematical and Computer Modelling, 41, 345-360.

Gao, M., Zhou, M., Guang, X., Wu, Z. (2003). Fuzzy Reasoning Petri Nets. IEEE Trans. Syst., Man & Cybern., Part A: Syst. and Humans, Vol. 33, No. 3, 314-324.

Giua, A., Julvez, J., Seatzu, C. (2003). Marking Estimation of Petri Nets based on Partial Observation. Proc. of the American Control Conference, Denver, Colorado, June 4-6, 326-331.

González-Castolo, J. C., López-Mellado, E. (2006). Fuzzy State Estimation of Discrete Event Systems. Proc. of MICAI 2006: Advances in Artificial Intelligence, Vol. 4293, 90-100.

Hennequin, S., Lefebvre, D., El Moudni, A. (2001). Fuzzy Multimodel of Timed Petri Nets. IEEE Trans. on Syst., Man, Cybern., Vol. 31, No. 2, 245-250.

Klir, G. and Yuan, B. (1995). Fuzzy Sets and Fuzzy Logic. Theory and Applications. Prentice Hall, NJ, USA.

Koriem, S.M. (2000). A Fuzzy Petri Net Tool For Modeling and Verification of Knowledge-Based Systems. The Computer Journal, Vol. 43, No. 3, 206-223.

Gniewek, L., Kluska, J. (2004). Hardware Implementation of Fuzzy Petri Net as a Controller. IEEE Transactions on Systems, Man, and Cybernetics, Vol. 34, No. 3, 1315-1324.

Murata, T. (1996). Temporal uncertainty and fuzzy-timing high-level Petri nets. Lect. Notes Comput. Sci., Vol. 1091, 29-58.

Pedrycz, W., Camargo, H. (2003). Fuzzy timed Petri Nets. Elsevier, Fuzzy Sets and Systems, 140, 301-330.

Ramírez-Treviño, A., Rivera-Rangel, A., López-Mellado, E. (2003). Observability of Discrete Event Systems Modeled by Interpreted Petri Nets. IEEE Transactions on Robotics and Automation, Vol. 19, No. 4, 557-565.

Shen, V. R. L. (2003). Reinforcement Learning for High-Level Fuzzy Petri Nets. IEEE Trans. on Syst., Man, & Cybern., Vol. 33, No. 2, 351-362.
Fuzzy Control Systems
original real-world setting, to accomplish the desired system automation. This is the so-called fuzzification, fuzzy operation, defuzzification routine in fuzzy control design. The key step, the fuzzy operation, is executed by a logical rule base consisting of some IF-THEN rules established by using fuzzy logic and human knowledge (Chen & Pham, 1999, 2006; Driankov, Hellendoorn & Reinfrank, 1993; Passino & Yurkovich, 1998; Tanaka, 1996; Tanaka & Wang, 1999; Wang, 1994; Ying, 2000).

Fuzzification

Fuzzy set theory allows partial membership of an element with respect to a set: an element can partially belong to a set and meanwhile partially not belong to the same set. For example, an element, x, belonging to the set, X, is specified by a (normalized) membership function, μX : X → [0, 1]. There are two extreme cases: μX(x) = 0 means x ∉ X and μX(x) = 1 means x ∈ X in the classical sense. But μX(x) = 0.2 means x belongs to X only with grade 0.2 or, equivalently, x does not belong to X with grade 0.8. Moreover, an element can have more than one membership value at the same time, such as μX(x) = 0.2 and μY(x) = 0.6, and they need not sum up to one. The entire setting depends on how large the set X is (or the sets X and Y are) for the associated members, and what kind of shape a membership function should have in order to make sense of the real problem at hand. A set, X, along with a membership function defined on it, μX(·), is called a fuzzy set and is denoted (X, μX). More examples of fuzzy sets can be seen below, as the discussion continues. This process of transforming a crisp value of an element (say x = 0.3) to a fuzzy set (say x = 0.3 ∈ X = [0, 1] with μX(x) = 0.2) is called fuzzification.

Given a set of real numbers, X = [−1, 1], a point x ∈ X assumes a real value, say x = 0.3. This is a crisp number without fuzziness. However, if a membership function μX(·) is introduced to associate with the set X, then (X, μX) becomes a fuzzy set, and the (same) point x = 0.3 has a membership grade quantified by μX(·) (for instance, μX(x) = 0.9). As a result, x has not one but two values associated with it: x = 0.3 and μX(x) = 0.9. In this sense, x is said to have been fuzzified. For convenience, instead of saying that x is in the set X with a membership value μX(x), in common practice it is usually said "x is μX", while one should keep in mind that there is always a well-defined membership function associated with the set X. If a member, x, belongs to two fuzzy sets, one says "x is X1 AND x is X2", and so on. Here, the relation AND needs a logical operation to perform. As a result, this statement eventually yields only one membership value for the element x, denoted by μX1∧X2(x). There are several logical operations to implement the logical AND; they are quite different but all valid within their individual logical systems. A commonly used one is μX1∧X2(x) = min{μX1(x), μX2(x)}.

Fuzzy Logic Rule Base

The majority of fuzzy logic control systems are knowledge-based systems. This means that either their fuzzy models or their fuzzy logic controllers are described by fuzzy logic IF-THEN rules. These rules have to be established based on human experts' knowledge about the system, the controller, the performance specifications, etc., and they must be implemented by performing rigorous logical operations.

For example, a car driver knows that if the car moves straight ahead then he does not need to do anything; if the car turns to the right then he needs to steer the car to the left; if the car turns to the right by too much then he needs to take a stronger action to steer the car to the left much more, and so on. Here, "much" and "more" etc. are fuzzy terms that cannot be described by classical mathematics but can be quantified by membership functions (see Fig. 2, where part (a) is an example of the description "to the left"). The collection of all such "if ... then ..." principles constitutes a fuzzy logic rule base for the problem under investigation. To this end, it is helpful to briefly summarize the experience of the driver in the following simplified rule base: Let X = [−180, 180], x be the position of the car, μleft(·) be the membership function for the moving car turning to the left, μright(·) the membership function for the car turning to the right, and μ0(·) the membership function for the car moving straight ahead. Here, simplified statements are used; for instance, "x is Xleft" means x belongs to X with a membership value μleft(x), etc. Also, similar notation for the control action u of the driver is employed. Then, a simple typical rule base for this car-driving task is

R(1): IF x is Xleft THEN u is Uright
R(2): IF x is Xright THEN u is Uleft
R(3): IF x is X0 THEN u is U0
where X0 means moving straight ahead (not left nor right), as described by the membership function shown in Fig. 2(c), and "u is U0" means u = 0 (no control action) with a certain grade (if this grade is 1, it means absolutely no control action). Of course, this description only illustrates the basic idea, which is by no means a complete and effective design for a real car-driving application.

In general, a rule base of r rules has the form

R(k): IF x1 is Xk1 AND ... AND xm is Xkm THEN u is Uk   (1)

Thus, what should the control action be? To simplify this discussion, suppose that the control action is simply u = x with the same membership functions μX = μU for all cases. Then, a natural and realistic control action for the driver to take is a compromise between the two required actions. Among several possible compromise (or average) formulas for this purpose, the most commonly adopted one, which works well in most cases, is the following weighted average formula:

u = (μright(u)·u + μ0(u)·u) / (μright(u) + μ0(u)) = (0.28·(5°) + 0.5·(5°)) / (0.28 + 0.5) = 5°
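A minimal sketch tying together membership functions, the min-implemented AND, and the weighted average formula above. The triangular membership shapes and their breakpoints are assumptions of ours; only the weights 0.28 and 0.5 echo the example:

```python
# Membership functions, fuzzy AND (min), and weighted-average defuzzification.

def tri(a, b, c):
    """Triangular membership function peaking at b on the interval [a, c]."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def fuzzy_and(*grades):
    """Logical AND over membership grades, implemented as min."""
    return min(grades)

def weighted_average(weights, outputs):
    """Defuzzification: sum_i (w_i / sum_j w_j) * u_i."""
    total = sum(weights)
    return sum(w * u for w, u in zip(weights, outputs)) / total

mu_right = tri(-45.0, 0.0, 45.0)     # assumed shapes over X = [-180, 180]
mu_0 = tri(-10.0, 0.0, 10.0)

x = 5.0
print(round(fuzzy_and(mu_right(x), mu_0(x)), 3))            # min(0.889, 0.5) = 0.5
print(round(weighted_average([0.28, 0.5], [5.0, 5.0]), 6))  # 5.0
```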
Figure 2 (block diagram). Fuzzy Logic Controller (FLC): controller input e → Fuzzification → Fuzzy Rule Base → Defuzzification → controller output u
on the correctness and effectiveness of the rule base, while the latter depends on the designer's knowledge and experience about the physical system or process under control. Just like any of the classical design problems, there is generally no unique solution for a problem; an experienced designer usually comes out with a better design.

A general weighted average formula for defuzzification is the following convex combination of the individual outputs:

output = Σ(i=1..r) λi·ui := Σ(i=1..r) (wi / Σ(j=1..r) wj)·ui   (2)

with notation referring to the rule base (1), where

wi = μUi(ui), λi := wi / Σ(j=1..r) wj ≥ 0, i = 1, ..., r, Σ(i=1..r) λi = 1.

Sometimes, depending on the design or application, the weights are

wi = Π(j=1..m) μXij(xj), i = 1, ..., r.

The overall structure of a fuzzy logic controller is shown in Fig. 2.

MAIN FOCUS OF THE CHAPTER: SOME BASIC FUZZY CONTROL APPROACHES

A Model-Free Approach

This general approach of fuzzy logic control works for trajectory tracking for a conventional dynamical system that does not have a precise mathematical model. The basic setup is shown in Fig. 3, where the plant is a conventional system without a mathematical description and all the signals (the reference set-point sp, output y(t), control u(t), and error e(t) = sp − y(t)) are crisp. The objective is to design a controller to achieve the goal e(t) → 0 as t → ∞, assuming that the system inputs and outputs are measurable by sensors on line.

Figure 3 (block diagram). The basic setup: reference signal r → error signal e → Controller → control signal u → Plant → output signal y, with the output fed back to form the error

If the mathematical formulation of the plant is unknown, how can one develop a controller to control this plant? The fuzzy logic approach turns out to be advantageous in this situation: it only uses the plant inputs and outputs, not the state variables nor any other information. After the design is completed, the entire dashed block in Fig. 2 is used to replace the controller block in Fig. 3.

As an example, suppose that the physical reference set-point is a temperature of, say, 40°F, and that the designer knows the range of the error signal, e(t) = 40 − y(t), is within X = [−25, 45], and assume that the scale of control is required to be in the unit of 1. Then, the membership functions for the error signal to be "negative large" (NL), "positive large" (PL), and "zero" (ZO) may be chosen as shown in Fig.
4. Using these membership functions, the controller is expected to drive the output temperature to be within the allowable range: 40 ± 1. With these membership functions, when the error signal is e(t) = 5, for instance, it is considered to be positive large with membership value one, meaning that the set-point (40) is higher than y(t) by too much.

The output from the fuzzification module is a fuzzy set consisting of the interval X and three membership functions, NL, PL and ZO, in this example. The output from fuzzification will be the input to the next module, the fuzzy logic rule base, which only takes fuzzy set inputs so as to be compatible with the logical IF-THEN rules.

Figure 4. Membership function for the error temperature signal

Figure 5 is helpful for establishing the rule base. If e > 0 at a moment, then the set-point is higher than the output y (since e = 40 − y), which corresponds to two possible situations, marked by a and d respectively. To further distinguish these two situations, one may use the rate of change of the error, ė = −ẏ. Here, since the set-point is a constant, its derivative is zero. Using information from both e and ė, one can completely characterize the changing situation of the output temperature at all times. If, for example, e > 0 and ė > 0,
Figure 5. The output temperature curve y(t) versus time against the reference, with the error e(t) and the situations a, b, c and d marked.
then the temperature is currently at situation d rather than situation a, since ė > 0 means ẏ < 0 which, in turn, signifies that the curve is moving downward.

Based on the above observation of the physical situations of the current temperature against the set-point, a simple rule base can be established as follows:

R1: IF e > 0 AND ė > 0 THEN u(t+) = −C·u(t);
R2: IF e > 0 AND ė < 0 THEN u(t+) = C·u(t);
R3: IF e < 0 AND ė > 0 THEN u(t+) = C·u(t);
R4: IF e < 0 AND ė < 0 THEN u(t+) = −C·u(t);

otherwise (e.g., e = 0 or ė = 0), u(t+) = u(t), till the next step, where C > 0 is a constant control gain and t+ can be just t + 1 in discrete time.

In the above, the first two rules are understood as follows (the other rules can be similarly interpreted):

1. R(1): e > 0 and ė > 0. As analyzed above, the temperature curve is currently at situation d, so the controller has to change its moving direction to the opposite by changing the current control action to the opposite (since the current control action is driving the output curve downward).

2. R(2): e > 0 and ė < 0. The current temperature curve is at situation a, so the controller does not need to do anything (since the current control action is driving the output curve up toward the set-point).

The switching control actions may take different forms, depending on the design. One example is u(t+1) = u(t) + Δu(t), among others (Chen & Pham, 1999, 2006).

Furthermore, to distinguish "positive large" from just "positive" for e > 0, one may use the membership functions shown in Fig. 4. Since the error signal e(t) is fuzzified in the fuzzification module, one can similarly fuzzify the auxiliary signal ė(t) there. Thus, there are two fuzzified inputs, e and ė, for the controller, and they both have corresponding membership functions describing their properties as "positive large" (PL), "negative large" (NL), or "zero" (ZO), as shown in Fig. 5. Thus, one may replace the rule base by a set of more detailed rules as follows:

R1: IF e = PL AND ė > 0 THEN u(t+1) = −μPL(e)·u(t);
R2: IF e = PS AND ė > 0 THEN u(t+1) = (1 − μPS(e))·u(t);
R3: IF e = PL AND ė < 0 THEN u(t+1) = μPL(e)·u(t);
R4: IF e = PS AND ė < 0 THEN u(t+1) = (1 − μPS(e))·u(t);
R5: IF e = NL AND ė > 0 THEN u(t+1) = μNL(e)·u(t);
R6: IF e = NS AND ė > 0 THEN u(t+1) = (1 − μNS(e))·u(t);
R7: IF e = NL AND ė < 0 THEN u(t+1) = −μNL(e)·u(t);
R8: IF e = NS AND ė < 0 THEN u(t+1) = (1 − μNS(e))·u(t);

otherwise, u(t+1) = u(t). Here and below, "ė = PL" means "ė is PL", etc. In this way, the initial rule base is enhanced and extended.

In the defuzzification module, new membership functions are needed for the change of the control action, u(t+1) or Δu(t), if the enhanced rule base described above is used. This is because both the error and the rate of change of the error signals have been fuzzified to be "positive large" or "positive small"; the control actions have to be fuzzified accordingly (to be "large" or "small").

Now, suppose that a complete, enhanced fuzzy logic rule base has been established. Then, in the defuzzification module, the weighted average formula can be used to obtain a single crisp value as the control action output from the controller (see Fig. 2):

u(t+1) = Σ(i=1..N) μi·ui(t+1) / Σ(i=1..N) μi

This is an average value of the multiple (N = 8 in the above rule base) control signals at step t + 1, and is physically meaningful to the given plant.

A Model-Based Approach

If a mathematical model of the system, or a fairly good approximation of it, is available, one may be able to design a fuzzy logic controller with better results such as performance specifications and guaranteed stability.
This constitutes a model-based fuzzy control approach (Chen & Zhang, 1997; Malki, Li & Chen, 1994; Malki, Feigenspan, Misir & Chen, 1997; Sooraksa & Chen, 1998; Ying, Siler & Buckley, 1990).

For instance, a locally linear fuzzy system model is described by a rule base of the following form:

RS(k): IF x1 is Xk1 AND ... AND xm is Xkm THEN ẋ = Ak·x + Bk·u   (3)

where {Ak} and {Bk} are given constant matrices, x = [x1, ..., xm]T is the state vector, and u = [u1, ..., un]T is a controller to be designed, with m ≥ n ≥ 1, and k = 1, ..., r. The fuzzy system model (3) may be rewritten in a more compact form as follows:

ẋ = Σ(k=1..r) λk·(Ak·x + Bk·u) = A(μ(x))·x + B(μ(x))·u   (4)

where μ(x) = {μXij(x)}, i, j = 1, ..., m.

Based on this fuzzy model, (3) or (4), a fuzzy controller u(t) can be designed by using some conventional techniques. For example, if a negative state-feedback controller is preferred, then one may design a controller described by the following rule base:

RC(k): IF x1 is Xk1 AND ... AND xm is Xkm THEN u = −Kk·x   (5)

where {Kk}, k = 1, ..., r, are constant control gain matrices to be determined. Thus, the closed-loop controlled system (4) together with (5) becomes

RSC(k): IF x1 is Xk1 AND ... AND xm is Xkm THEN ẋ = [Ak − Bk·Kk]·x   (6)

For this feedback controlled system, the following is a typical stability condition [1,2,10]:

If there exists a common positive definite and symmetric constant matrix P such that AkT·P + P·Ak = −Q for some Q > 0 for all k = 1, ..., r, then the fuzzy controlled system (6) is asymptotically stable about zero.

This theorem provides a basic (sufficient) condition for the global asymptotic stability of the fuzzy control system, which can also be viewed as a criterion for tracking control of the system trajectory to the zero set-point. Clearly, stable control gain matrices {Kk}, k = 1, ..., r, may be determined according to this criterion in a design.

FUTURE TRENDS

This topic will be discussed elsewhere in the near future.

CONCLUSION

The essence of systems control is to achieve automation. For this purpose, a combination of fuzzy control technology and the advanced computer facilities available in the industry provides a promising approach that can mimic human thinking and linguistic control ability, so as to equip control systems with a certain degree of artificial intelligence. It has now been realized that fuzzy control systems theory offers a simple, realistic and successful addition, or sometimes an alternative, for controlling various complex, imperfectly modeled, and highly uncertain engineering systems, with a great potential in many real-world applications.

REFERENCES

G. Chen & T. T. Pham (1999). Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems. CRC Press.

G. Chen & T. T. Pham (2006). Introduction to Fuzzy Systems. CRC Press.

G. Chen & D. Zhang (1997). Back-driving a truck with suboptimal distance trajectories: A fuzzy logic control approach. IEEE Trans. on Fuzzy Systems, 5: 369-380.

D. Driankov, H. Hellendoorn & M. Reinfrank (1993). An Introduction to Fuzzy Control. Springer-Verlag.
694
Fuzzy Control Systems
H. Malki, D. Feigenspan, D. Misir & G. Chen (1997). Fuzzy PID control of a flexible-joint robot arm with uncertainties from time-varying loads. IEEE Trans. on Contr. Sys. Tech., 5: 371-378.

H. Malki, H. Li & G. Chen (1994). New design and stability analysis of fuzzy proportional-derivative control systems. IEEE Trans. on Fuzzy Systems, 2: 345-354.

K. M. Passino & S. Yurkovich (1998). Fuzzy Control. Addison-Wesley.

P. Sooraksa & G. Chen (1998). Mathematical modeling and fuzzy control of flexible robot arms. Math. Comput. Modelling, 27: 73-93.

K. Tanaka (1996). An Introduction to Fuzzy Logic for Practical Applications. Springer.

K. Tanaka & H. O. Wang (1999). Fuzzy Control Systems Design and Analysis: A Linear Matrix Inequality Approach. IEEE Press.

L. X. Wang (1994). Adaptive Fuzzy Systems and Control: Design and Stability Analysis. Prentice-Hall.

H. Ying (2000). Fuzzy Control and Modeling: Analytical Foundations and Applications. IEEE Press.

H. Ying, W. Siler & J. J. Buckley (1990). Fuzzy control theory: a nonlinear case. Automatica, 26: 513-520.

L. A. Zadeh (1965). Fuzzy sets. Information and Control, 8: 338-353.

L. A. Zadeh (1973). Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. on Systems, Man, and Cybernetics, 3: 28-44.

KEY TERMS

Defuzzification: A process that converts fuzzy terms to conventional expressions quantified by real-valued functions.

Fuzzification: A process that converts conventional expressions to fuzzy terms quantified by fuzzy membership functions.

Fuzzy Control: A control method based on fuzzy set and fuzzy logic theories.

Fuzzy Logic: A logic that takes on continuous values in between 0 and 1.

Fuzzy Membership Function: A function defined on a fuzzy set that assumes continuous values in between 0 and 1.

Fuzzy Set: A set of elements with a real-valued membership function describing their grades.

Fuzzy System: A system formulated and described by fuzzy set-based real-valued functions.
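The stability condition quoted in this article — one symmetric positive definite matrix P with A_k^T P + P A_k = −Q, Q > 0, common to every rule of the closed-loop system (6) — lends itself to a simple numerical check. The sketch below is illustrative only: the two closed-loop matrices and the helper name are hypothetical, not from the article, and NumPy/SciPy are assumed.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def common_lyapunov_P(closed_loop_mats, Q=None):
    """Solve A_1^T P + P A_1 = -Q for P, then test whether the same P
    certifies every rule's closed-loop matrix [A_k - B_k K_k] in (6).
    Returns P if this (sufficient) condition holds, else None."""
    n = closed_loop_mats[0].shape[0]
    Q = np.eye(n) if Q is None else Q
    P = solve_continuous_lyapunov(closed_loop_mats[0].T, -Q)
    if np.any(np.linalg.eigvalsh((P + P.T) / 2) <= 0):
        return None                      # P is not positive definite
    for A in closed_loop_mats:
        M = A.T @ P + P @ A              # must be negative definite
        if np.any(np.linalg.eigvalsh((M + M.T) / 2) >= 0):
            return None
    return P

# Two hypothetical stable closed-loop matrices sharing one Lyapunov matrix
A1 = np.array([[0.0, 1.0], [-2.0, -3.0]])
A2 = np.array([[0.0, 1.0], [-3.0, -4.0]])
P = common_lyapunov_P([A1, A2])
print(P is not None)  # True: here one P certifies both rules
```

Because the condition is only sufficient, a `None` result does not prove instability; a common P may still exist for another choice of Q, or stability may hold without a common quadratic Lyapunov function.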
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Fuzzy Decision Trees
Surrounding the notion of MFs is the issue of their structure (Dombi and Gera, 2005). Here, piecewise linear MFs are used to define the linguistic terms presented, see Figure 1.

In Figure 1 (top), a single piecewise linear MF is shown along with the defining values that define it, namely, α_j,1, α_j,2, α_j,3, α_j,4 and α_j,5. The associated mathematical structure of this specific form of MF is given below:

μ(x) =
  0                                            if x ≤ α_j,1
  0.5 (x − α_j,1) / (α_j,2 − α_j,1)            if α_j,1 < x ≤ α_j,2
  0.5 + 0.5 (x − α_j,2) / (α_j,3 − α_j,2)      if α_j,2 < x < α_j,3
  1                                            if x = α_j,3
  1 − 0.5 (x − α_j,3) / (α_j,4 − α_j,3)        if α_j,3 < x ≤ α_j,4
  0.5 − 0.5 (x − α_j,4) / (α_j,5 − α_j,4)      if α_j,4 < x ≤ α_j,5
  0                                            if α_j,5 < x

As mentioned earlier, MFs of this type are used to define the linguistic terms which make up linguistic variables. An example of a linguistic variable X based on two linguistic terms, X1 and X2, is shown in Figure 1 (bottom), where the overlap of the defining values for each linguistic term is evident. Moreover, using left and right limits of the X domain as −∞ and ∞, respectively, the sets of defining values are (in list form): X1 – [−∞, −∞, α_1,3, α_1,4, α_1,5] and X2 – [α_2,1, α_2,2, α_2,3, ∞, ∞], where α_1,3 = α_2,1, α_1,4 = α_2,2 and α_1,5 = α_2,3.

This section now goes on to outline the technical details of the fuzzy decision tree approach introduced in Yuan and Shaw (1995). With an inductive fuzzy decision tree, the underlying knowledge related to a decision outcome can be represented as a set of fuzzy "if .. then .." decision rules, each of the form:

If (A_1 is T^1_i1) and (A_2 is T^2_i2) and ... (A_k is T^k_ik) then C is C_j,

where A_1, A_2, .., A_k and C are linguistic variables for the multiple antecedents (A_i's) and consequent (C) statements used to describe the considered objects, and T(A_k) = { T^k_1, T^k_2, .., T^k_Sk } and {C_1, C_2, ..., C_L} are their respective linguistic terms, defined by the MFs μ_{T^k_j}(x) etc. The MFs, μ_{T^k_j}(x) and μ_{C_j}(y), represent the grade of membership of an object's antecedent A_j being T^k_j and consequent C being C_j, respectively.

A MF μ(x) from the set describing a fuzzy linguistic variable Y defined on X can be viewed as a possibility distribution of Y on X, that is π(x) = μ(x), for all x ∈ X the values taken by the objects in U (also normalized so that max_{x∈X} π(x) = 1). The possibility measure E_α(Y) of ambiguity is defined by

E_α(Y) = g(π) = Σ_{i=1}^n (π_i* − π_{i+1}*) ln[i],
where π* = {π_1*, π_2*, ..., π_n*} is the permutation of the normalized possibility distribution π = {π(x_1), π(x_2), ..., π(x_n)}, sorted so that π_i* ≥ π_{i+1}* for i = 1, .., n, and π_{n+1}* = 0.

The ambiguity of attribute A (over the objects u_1, .., u_m) is given as:

E_α(A) = (1/m) Σ_{i=1}^m E_α(A(u_i)),

where

E_α(A(u_i)) = g( μ_{T_s}(u_i) / max_{1≤j≤s} (μ_{T_j}(u_i)) ),

with T_1, ..., T_s the linguistic terms of an attribute (antecedent) with m objects.

The fuzzy subsethood S(A, B) measures the degree to which A is a subset of B, and is given by:

S(A, B) = Σ_{u∈U} min(μ_A(u), μ_B(u)) / Σ_{u∈U} μ_A(u).

Given fuzzy evidence E, the possibility of classifying an object to the consequent C_i can be defined as:

π(C_i | E) = S(E, C_i) / max_j S(E, C_j),

where the fuzzy subsethood S(E, C_i) represents the degree of truth for the classification rule (if E then C_i). With a single piece of evidence (a fuzzy number for an attribute), the classification ambiguity based on this fuzzy evidence is defined as G(E) = g(π(C | E)), which is measured using the possibility distribution π(C | E) = (π(C_1 | E), ..., π(C_L | E)).

The classification ambiguity with fuzzy partitioning P = {E_1, ..., E_k} on the fuzzy evidence F, denoted as G(P | F), is the weighted average of classification ambiguity with each subset of partition:

G(P | F) = Σ_{i=1}^k w(E_i | F) G(E_i ∩ F),

where G(E_i ∩ F) is the classification ambiguity with fuzzy evidence E_i ∩ F, and where w(E_i | F) is the weight which represents the relative size of subset E_i ∩ F in F:

w(E_i | F) = Σ_{u∈U} min(μ_{E_i}(u), μ_F(u)) / Σ_{j=1}^k Σ_{u∈U} min(μ_{E_j}(u), μ_F(u))

In summary, attributes are assigned to nodes based on the lowest level of classification ambiguity. A node becomes a leaf node if the level of subsethood is higher than some truth value β assigned to the whole of the fuzzy decision tree. The classification from the leaf node is to the decision group with the largest subsethood value. The truth level threshold β controls the growth of the tree; lower β may lead to a smaller tree (with lower classification accuracy), higher β may lead to a larger tree (with higher classification accuracy).

MAIN THRUST

The main thrust of this article is a detailed example of the construction of a fuzzy decision tree. The description includes the transformation of a small data set into a fuzzy data set where the original values are described by their degrees of membership to certain linguistic terms.

The small data set considered consists of five objects, described by three condition attributes T1, T2 and T3, and classified by a single decision attribute C, see Table 1.

Table 1.

Object  T1   T2  T3   C
u1      112  45  205   7
u2       85  42  192  17
u3      130  58  188  22
u4       93  54  203  29
u5      132  39  189  39

If these values are considered imprecise, fuzzy, there is the option to transform the data values into
fuzzy values. Here, an attribute is transformed into a linguistic variable, each described by two linguistic terms, see Figure 2.

Figure 2. Membership functions defining the linguistic terms, CL and CH, for the decision attribute C

In Figure 2, the decision attribute C is shown to be described by the linguistic terms, CL and CH (possibly denoting the terms low and high). These linguistic terms are themselves defined by the MFs μ_CL(·) and μ_CH(·). The hypothetical MFs shown have the respective defining values of μ_CL(·): [−∞, −∞, 9, 25, 32] and μ_CH(·): [9, 25, 32, ∞, ∞]. To demonstrate their utilisation, for the object u2, with a value C = 17, its fuzzification creates the two values μ_CL(17) = 0.750 and μ_CH(17) = 0.250, the larger of which is associated with the low linguistic term.

A similar series of membership functions can be constructed for the three condition attributes, T1, T2 and T3, see Figure 3.

Figure 3. Membership functions defining the linguistic terms for the condition attributes, T1, T2 and T3

In Figure 3, the linguistic variable version of each condition attribute is described by two linguistic terms
(possibly termed as low and high), themselves defined by MFs. The use of these series of MFs is the ability to fuzzify the example data set, see Table 2.

In Table 2, each object is described by a series of fuzzy values, two fuzzy values for each attribute. Also shown in Table 2, in bold, are the larger of the values in each pair of fuzzy values, with the respective linguistic term this larger value is associated with. Beyond the fuzzification of the data set, attention turns to the construction of the concomitant fuzzy decision tree for this data. Prior to this construction process, a threshold value of β = 0.75 for the minimum required truth level was used throughout.

The construction process starts with the condition attribute that is the root node. For this, it is necessary to calculate the classification ambiguity G(E) of each condition attribute. The evaluation of a G(E) value is shown for the first attribute T1 (i.e. g(π(C | T1))), where it is broken down to the fuzzy labels L and H; for L:

π(C | T1L) = S(T1L, C_i) / max_j S(T1L, C_j),

considering CL and CH with the information in Table 2:

S(T1L, CL) = 1.398/2.433 = 0.574,

whereas S(T1L, CH) = 0.426. Hence π = {0.574, 0.426}, giving the ordered normalized form of π* = {1.000, 0.741}, with π_3* = 0, then

G(T1L) = g(π(C | T1L)) = Σ_{i=1}^2 (π_i* − π_{i+1}*) ln[i] = 0.514,

along with G(T1H) = 0.572, then G(T1) = (0.514 + 0.572)/2 = 0.543. Compared with G(T2) = 0.579 and G(T3) = 0.583, the condition attribute T1, with the least classification ambiguity, forms the root node for the desired fuzzy decision tree.

The subsethood values in this case are, for T1: S(T1L, CL) = 0.574 and S(T1L, CH) = 0.426, and S(T1H, CL) = 0.452 and S(T1H, CH) = 0.548. For T1L and T1H, the larger subsethood value (in bold) defines the possible classification for that path. In both cases these values are less than the threshold truth value 0.75 employed, so neither of these paths can be terminated to a leaf node; instead further augmentation of them is considered.

With three condition attributes included in the example data set, the possible augmentation to T1L is with either T2 or T3. Concentrating on T2, the ambiguity with partition evaluated for T2 (G(T1L and T2 | C)) has to be less than G(T1L) = 0.514, where:

G(T1L and T2 | C) = Σ_{i=1}^k w(T2_i | T1L) G(T1L ∩ T2_i).

Starting with the weight values, in the case of T1L and T2L, it follows:

w(T2L | T1L) = Σ_{u∈U} min(μ_T2L(u), μ_T1L(u)) / Σ_{j=1}^k Σ_{u∈U} min(μ_T2j(u), μ_T1L(u)) = 1.842/2.433 = 0.757.

Similarly w(T2H | T1L) = 0.243, hence:
G(T1L and T2 | C) = 0.757 × G(T1L ∩ T2L) + 0.243 × G(T1L ∩ T2H) = 0.757 × 0.327 + 0.243 × 0.251 = 0.309.

A concomitant value for G(T1L and T3 | C) = 0.487; the lower of these (G(T1L and T2 | C)) is lower than the concomitant G(T1L) = 0.514, so less ambiguity would be found if the T2 attribute was augmented to the path T1 = L. The subsequent subsethood values in this case for each new path are: T2L: S(T1L ∩ T2L, CL) = 0.759 and S(T1L ∩ T2L, CH) = 0.358; T2H: S(T1L ∩ T2H, CL) = 0.363 and S(T1L ∩ T2H, CH) = 1.000. With each suggested classification path, the largest subsethood value is above the truth level threshold, therefore they are both leaf nodes leading from the T1 = L path. The construction process continues in a similar vein for the path T1 = H, with the resultant fuzzy decision tree in this case presented in Figure 4.

The fuzzy decision tree in Figure 4 shows five rules (leaf nodes), R1, R2, ..., R5, have been constructed. There are a maximum of four levels to the tree shown, indicating a maximum of three condition attributes are used in the rules constructed. In each non-root node shown, the subsethood levels to the decision attribute terms C = L and C = H are shown. On the occasions when the larger of the subsethood values is above the defined threshold value of 0.75, they are shown in bold and accompany the node becoming a leaf node.

The interpretative power of FDTs is shown by consideration of the rules constructed. The rule R5 can be written down as:

If T1 = H and T3 = H then C = L with truth level 0.839.

The rules can be considered in a more linguistic form, namely:

If T1 is high and T3 is high then C is low with truth level 0.839.

It is rules like this one that allow the clearest interpretability to the understanding of results in classification problems when using FDTs.

FUTURE TRENDS

Fuzzy decision trees (FDTs) benefit from the inductive learning approach that underpins their construction, to aid in the classification of objects based on their values over different attributes. Their construction in a fuzzy
environment allows for the potential critical effects of imprecision to be mitigated, as well as brings a beneficial level of interpretability to the results found, through the decision rules defined.

As with the more traditional crisp decision tree approaches, there are issues such as the complexity of the results, in this case the tree defined. Future trends will surely include how FDTs can work on re-grading the complexity of the tree constructed, commonly known as pruning. Further, the applicability of the rules constructed should see the use of FDTs extending in the range of applications it can work with.

CONCLUSION

The interest in FDTs, with respect to AI, is due to the inductive learning and linguistic rule construction processes that are inherent with it. The induction undertaken truly lets the analysis create intelligent results from the data available. Many of the applications FDTs have been used within, such as medicine, have benefited greatly from the interpretative power of the readable rules. The accessibility of the results from FDTs should secure it a positive future.

REFERENCES

Armand, S., Watelain, E., Roux, E., Mercier, M., & Lepoutre, F.-X. (2007). Linking clinical measurements and kinematic gait patterns of toe-walking using fuzzy decision trees. Gait & Posture, 25(3), 475-484.

Bhatt, R.B., & Gopal, M. (2005). Improving the learning accuracy of fuzzy decision trees by direct back propagation. IEEE International Conference on Fuzzy Systems, 761-766.

Chang, R.L.P., & Pavlidis, T. (1977). Fuzzy decision tree algorithms. IEEE Transactions on Systems, Man, and Cybernetics, SMC-7(1), 28-35.

Chiang, I.-J., & Hsu, J.Y.-J. (2002). Fuzzy classification trees for data analysis. Fuzzy Sets and Systems, 130, 87-99.

Dombi, J., & Gera, Z. (2005). The approximation of piecewise linear membership functions and Łukasiewicz operators. Fuzzy Sets and Systems, 154, 275-286.

Hunt, E.B., Marin, J., & Stone, P.T. (1966). Experiments in Induction. New York, NY: Academic Press.

Li, M.-T., Zhao, F., & Chow, L.-F. (2006). Assignment of seasonal factor categories to urban coverage count stations using a fuzzy decision tree. Journal of Transportation Engineering, 132(8), 654-662.

Pal, N.R., & Chakraborty, S. (2001). Fuzzy rule extraction from ID3-type decision trees for real data. IEEE Transactions on Systems, Man, and Cybernetics B, 31(5), 745-754.

Quinlan, J.R. (1979). Discovering rules by induction from large collections of examples. In D. Michie (Ed.), Expert Systems in the Micro Electronic Age. Edinburgh, UK: Edinburgh University Press.

Quinlan, J.R. (1987). Probabilistic decision trees. In P. Langley (Ed.), Proc. 4th Int. Workshop on Machine Learning, Los Altos, CA.

Wang, X., Nauck, D.D., Spott, M., & Kruse, R. (2007). Intelligent data analysis with fuzzy decision trees. Soft Computing, 11, 439-457.

Yuan, Y., & Shaw, M.J. (1995). Induction of fuzzy decision trees. Fuzzy Sets and Systems, 69(2), 125-139.

Zadeh, L.A. (1965). Fuzzy sets. Information and Control, 8(3), 338-353.

KEY TERMS

Condition Attribute: An attribute that describes an object. Within a decision tree it is part of a non-leaf node, so performs as an antecedent in the decision rules used for the final classification of an object.

Decision Attribute: An attribute that characterises an object. Within a decision tree it is part of a leaf node, so performs as a consequent in the decision rules, from the paths down the tree to the leaf node.

Decision Tree: A tree-like structure for representing a collection of hierarchical decision rules that lead to a class or value, starting from a root node and ending in a series of leaf nodes.
Induction: A technique that infers generalizations from the information in the data.

Leaf Node: A node not further split, the terminal grouping, in a classification or decision tree.

Linguistic Term: One of a set of linguistic terms, which are subjective categories for a linguistic variable, each described by a membership function.

Linguistic Variable: A variable made up of a number of words (linguistic terms) with associated degrees of membership.

Membership Function: A function that quantifies the grade of membership of a variable to a linguistic term.

Node: A junction point down a path in a decision tree that describes a condition in an if-then decision rule. From a node, the current path may separate into two or more paths.

Root Node: The node at the top of a decision tree, from which all paths originate and lead to a leaf node.

Path: A path down the tree from root node to leaf node, also termed a branch.
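The fuzzification and ambiguity calculations used in the worked example above can be sketched in code. The following Python is illustrative only (the function names are not from the article); it reproduces the reported values μ_CL(17) = 0.750, G(T1L) ≈ 0.514 and G(T1L and T2 | C) ≈ 0.309.

```python
import math

def piecewise_mf(x, defs):
    """Five-defining-value piecewise linear MF; -inf/inf entries give
    left/right shoulder terms, e.g. CL = [-inf, -inf, 9, 25, 32]."""
    a1, a2, a3, a4, a5 = defs
    if x <= a1:
        return 0.0
    if x <= a2:                      # 0 -> 0.5 over [a1, a2]
        return 0.5 * (x - a1) / (a2 - a1)
    if x <= a3:                      # 0.5 -> 1 over [a2, a3]
        return 1.0 if math.isinf(a2) else 0.5 + 0.5 * (x - a2) / (a3 - a2)
    if x <= a4:                      # 1 -> 0.5 over [a3, a4]
        return 1.0 - 0.5 * (x - a3) / (a4 - a3)
    if x <= a5:                      # 0.5 -> 0 over [a4, a5]
        return 0.5 - 0.5 * (x - a4) / (a5 - a4)
    return 0.0

def ambiguity(pi):
    """g(pi): normalise so max = 1, sort descending, append 0,
    then sum (pi*_i - pi*_{i+1}) ln i."""
    m = max(pi)
    p = sorted((v / m for v in pi), reverse=True) + [0.0]
    return sum((p[i] - p[i + 1]) * math.log(i + 1) for i in range(len(pi)))

def partition_ambiguity(weights, branch_ambiguities):
    """G(P | F): weighted average of the branch classification ambiguities."""
    return sum(w * g for w, g in zip(weights, branch_ambiguities))

inf = math.inf
print(round(piecewise_mf(17, [-inf, -inf, 9, 25, 32]), 3))            # 0.75
print(round(piecewise_mf(17, [9, 25, 32, inf, inf]), 3))              # 0.25
print(round(ambiguity([0.574, 0.426]), 3))                            # 0.514
print(round(partition_ambiguity([0.757, 0.243], [0.327, 0.251]), 3))  # 0.309
```

The subsethood values themselves would come from sums of min-memberships over the five objects (Table 2), which the article reports rather than lists in full.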
Alexander V. Bozhenyuk
Taganrog Technological Institute of Southern Federal University, Russia
Fuzzy Graphs and Fuzzy Hypergraphs
k-vertex subgraph with the degree of domination more than β_k^min.

A fuzzy set B̃_X = {<b_1^min/1>, <b_2^min/2>, ..., <b_n^min/n>} is called a domination fuzzy set of fuzzy graph G̃ (Bershtein & Bozhenyuk, 2001a).

Fuzzy graph G̃ (Figure 1) has five fuzzy minimal dominating vertex sets: P1 = {x1,x2,x3} with the degree of domination 1; P2 = {x1,x3,x4} with the degree of domination 0,6; P3 = {x2,x3} with the degree of domination 0,5; P4 = {x1,x3} with the degree of domination 0,2; and P5 = {x2,x4} with the degree of domination 0,3. A domination fuzzy set of fuzzy graph G̃ is defined as B̃_X = {<0/1>, <0,5/2>, <1/3>, <1/4>}.

A value

L = &_{i=1..k} L_i = &_{i=1..k} &_{x,y∈X_i} (1 − μ_G(x, y))

...color). In other words, between vertices of the same color there aren't edges with the membership function more than 0,5;

by 3 colors with the degree of separation 1 (vertices x1, x3 – first color, vertex x2 – second color, vertex x4 – third color). In other words, between vertices of the same color there aren't any edges.

Fuzzy Hypergraph

Let a fuzzy hypergraph H̃ = (X, Ẽ) be given, where X = {x_i}, i ∈ I = {1,2,...,n} is a finite set and Ẽ = {ẽ_k}, ẽ_k = { <μ_ek(x)/x> }, k ∈ K = {1,2,...,m} is a family of fuzzy subsets in X (Mordeson & Nair, 2000; Bershtein & Bozhenyuk, 2005). Thus elements of the set X are the vertices of the hypergraph, and the family Ẽ is the family of hypergraph fuzzy edges. The value μ_ek(x) ∈ [0,1] is an incidence degree of a vertex x to an edge ẽ_k.
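The degree of separation of a colouring, as used in the example above, can be read as the minimum of 1 − μ_G(x, y) taken over pairs of vertices that share a colour. The sketch below is an interpretation of that value; the function name and the dictionary-based edge representation are assumptions for illustration, not from this article.

```python
def separation_degree(mu, coloring):
    """Minimum of 1 - mu(x, y) over same-coloured vertex pairs.
    `mu` maps vertex pairs to edge membership values (absent pair = 0);
    `coloring` maps each vertex to a colour label. Assumed formulation."""
    verts = list(coloring)
    best = 1.0
    for i, x in enumerate(verts):
        for y in verts[i + 1:]:
            if coloring[x] == coloring[y]:
                best = min(best, 1.0 - mu.get((x, y), mu.get((y, x), 0.0)))
    return best

# An edge of membership 0.5 between same-coloured x1 and x3
mu = {("x1", "x3"): 0.5}
print(separation_degree(mu, {"x1": 1, "x2": 2, "x3": 1, "x4": 3}))  # 0.5
```

With no edges between same-coloured vertices the degree of separation is 1, matching the 3-colour example in the text.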
Bershtein, L.S., Bozhenyuk, A.V. & Rozenberg, I.N. (2005). Fuzzy coloring of fuzzy hypergraph. Computational Intelligence, Theory and Applications: International Conference 8th Fuzzy Days in Dortmund, Germany, Sept. 29 – Oct. 01, 2004, Proceedings. Bernd Reusch (ed.), Springer-Verlag, 703-711.

Bozhenyuk, A.V., Rozenberg, I.N. & Starostina, T.A. (2006). Analysis and research of flows and vitality

KEY TERMS

Binary Symmetric Relation: A relation R on a set A is symmetric if for all x, y ∈ A, xRy ⇒ yRx.

Fuzzy Set: A generalization of the definition of the classical set. A fuzzy set is characterized by a membership function, which maps the members of the universe into the unit interval, thus assigning to elements of the universe degrees of belongingness with respect to a set.
Bernardino Arcay
University of A Coruña, Spain
Fuzzy Logic Applied to Biomedical Image Analysis
for the representation of knowledge that can be used in any phase of the analysis (Wu, Agam, Roy & Armato, 2004; Vermandel, Betrouni, Taschner, Vasseur & Rousseau, 2007).

FUZZY CLUSTERING ALGORITHMS APPLIED TO BIOMEDICAL IMAGE ANALYSIS

Medical imaging systems use a series of sensors that detect the features of the tissues and the structure of the organs, which allows us, depending on the used technique, to obtain a great amount of information and images of the area from different angles. These virtues have converted them into one of the most popular support techniques in diagnosis, and have given rise to the current distribution and variety in medical image modalities (X-rays, PET, ...) and to new modalities that are being developed (fMRI).

The complexity of the segmentation of biomedical images is entirely due to its characteristics: the large amount of data that need to be analyzed, the loss of information associated with the transition from a 3D body to a 2D representation, the great variability and complexity of the shapes that must be analyzed...

Among the most frequently applied focuses to segment medical images is the use of pattern recognition techniques, since normally the purpose of analyzing a medical digital image is the detection of a particular element or object: tumors, organs, etc.

Of all these techniques, fuzzy clustering techniques have proven to be among the most powerful ones, because they allow us to use several features of the dataset, each with their own dimensionality, and to partition these data; also, they work automatically and usually have low computational requirements. Therefore, if the problem of segmentation is defined as the partition of the image into regions that have a common feature, fuzzy clustering algorithms carry out this partition with a set of exemplary elements, called centroids, and obtain a matrix of the size of the original image and with a dimensionality equal to the number of clusters into which the image was divided; this indicates the membership of each pixel to each cluster and serves as a basis for the detection of each element.

In the next section we present a series of fuzzy clustering algorithms that can be considered to reflect the evolution in this field and its various viewpoints. Finally, these algorithms will be used in a study that shows the use and possibilities of fuzzy logic in the analysis of biomedical images.

Fuzzy C-Means (FCM)

The FCM algorithm was developed by Bezdek (Bezdek, 1973) and is the first fuzzy clustering algorithm; it initially needs the number of clusters into which the image will be divided and a sample of each cluster. The steps of this algorithm are the following:

1. Calculation of the membership of each element to each cluster:

   u_k(i, j) = 1 / Σ_{l=1}^C ( ||y(i, j) − v_k|| / ||y(i, j) − v_l|| )^{2/(m−1)}    (1)

2. Calculation of the new centroids of the image:

   v_k = Σ_{i,j} u_k(i, j)^m y(i, j) / Σ_{i,j} u_k(i, j)^m,  k = 1, ..., C    (2)

3. If the error stays below a determined threshold, stop. In the contrary case, return to step 1.

The parameters that were varied in the analysis of the algorithm were the provided samples and the value of m.

Fuzzy K-Nearest Neighbour (FKNN)

The Fuzzy K-Nearest Neighbour (Givens Jr., Gray & Keller, 1992) is, as its name indicates, a fuzzy variant of a hard segmentation algorithm. It needs to know the number of classes into which the set that must be classified will be divided.

The element that must be classified is associated to the class of the nearest sample among the K most similar ones. These K most similar samples are known as neighbours; if, for instance, the neighbours are classified from more to less similar, the destination class of the studied element will be the class of the neighbour that is first on the list.

We use the expression in Equation 3 to calculate the membership factors of the pixel to the considered clusters:
u_k(X) = (1 / d(X, V_k)^2)^{1/(q−1)} / Σ_{j=1}^C (1 / d(X, V_j)^2)^{1/(q−1)}    (3)

[...] algorithm for the segmentation of color images through [...] a fuzzy part that classifies the pixels that have [...] to classify the pixels. The fuzzy points are pixels [...]

F(V_k) = Σ_{j=1}^N (u_jk)^q K(X_j, X_k) / Σ_{j=1}^N (u_jk)^q
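The FCM iteration described in steps 1–3 earlier can be sketched as follows. This is an illustrative implementation of Bezdek's updates (1)–(2) over an array of pixel feature vectors, not the authors' code; it starts from one provided sample per cluster, as the algorithm requires, and NumPy is assumed.

```python
import numpy as np

def fcm(pixels, samples, m=2.0, tol=1e-5, max_iter=100):
    """Fuzzy C-Means over an (N, d) array of feature vectors, started
    from one sample per cluster. Returns memberships u (N, C) and
    centroids v (C, d). Illustrative sketch only."""
    x = np.asarray(pixels, dtype=float)
    v = np.asarray(samples, dtype=float)
    for _ in range(max_iter):
        # Step 1: memberships from relative distances to each centroid, Eq. (1)
        d = np.fmax(np.linalg.norm(x[:, None, :] - v[None, :, :], axis=2), 1e-12)
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        # Step 2: new centroids as membership-weighted means, Eq. (2)
        v_new = (u.T ** m @ x) / np.sum(u.T ** m, axis=1, keepdims=True)
        # Step 3: stop once the centroid change falls below the threshold
        if np.linalg.norm(v_new - v) < tol:
            return u, v_new
        v = v_new
    return u, v

# Two well-separated synthetic blobs in place of real pixel data
rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(10, 0.5, (30, 2))])
u, v = fcm(blobs, samples=[[1.0, 1.0], [9.0, 9.0]])
print(u.shape, np.allclose(u.sum(axis=1), 1.0))  # (60, 2) True
```

Each row of `u` sums to 1 by construction, which is what lets the membership matrix double as a per-pixel soft segmentation of the image.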
Images Used in the Study

For the selection of the images that were used in the study (Gonzalez & Woods, 1996), we applied the traditional image processing techniques and used the histogram as a basic tool. See Figure 1.

We observed that the pictures presented a high level of variation, because it was not possible to standardize the different elements that have a determining effect on them: position of the patient, luminosity, etc. We selected the pictures on the basis of a characteristic trait (bad lighting, presence of strange objects, etc.) or on their normality (correct lighting, good contrast, etc.). The images were digitalized to a size of 500x500 pixels and 24 color bits per pixel, using an average scanner.

The histograms of Figure 1 show some of the characteristics that were present in most photographs. The bands with a larger amount of pixels are those of the colors red and green, because of the color of the skin and the fact that green is normally used in sanitary tissue.

The histogram is continuous and presents values in most levels, which leads us to suppose that the value of most points is determined by the combination of the three bands instead of only one band, as was to be expected. This complicates the analysis of the image with algorithms.

Figure 1. Photograph that was used in the study, and histogram for each color band
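Per-band histograms of the kind discussed above are straightforward to compute; the NumPy sketch below is generic and illustrative (the synthetic "skin-like" patch and the function name are not from the study).

```python
import numpy as np

def band_histograms(rgb, bins=256):
    """Per-band intensity histograms for an (H, W, 3) RGB image array,
    the basic tool used here to inspect which bands dominate."""
    return [np.histogram(rgb[..., b], bins=bins, range=(0, 256))[0]
            for b in range(3)]

# Synthetic patch where red and green dominate, as reported for the photos
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0], img[..., 1], img[..., 2] = 200, 150, 60
h = band_histograms(img)
print([int(x.argmax()) for x in h])  # [200, 150, 60]
```

On a real photograph the peak locations and band masses would indicate the red/green dominance the authors describe.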
Figure 2. Best results for the RUMA and global measurements for the FKNN algorithm (a) and the MFCM algorithm (b)
We also opted for applying a second success rate measurement, because although RUMA provides a value for the area of interest, it may not detect certain classification errors that can affect the resulting image. We use a measure that was developed by our research team and measures the clustering algorithms' performance in classifying all the pixels of the image (Eq. 9). During the development of the measure, we supposed that the error would be smaller if the error of each cluster classification were smaller, so we measured the error in the pixel classification of each cluster and weighed it against the number of pixels of that cluster.

error = Σ_{j=1}^n Σ_{i=1, i≠j}^n F_ij / MASC_j    (9)

F_ij is the number of pixels that belong to cluster j and were assigned to cluster i, MASC_j is the total amount of pixels that belong to class j, and n is the amount of clusters into which the image was divided. The value of this measurement lies between 0 and n; in order to simplify its interpretation, it was normalized between 0 and 1. The graphics are simplified by inverting the discrepancy values: the higher the value, the better the result.

Figure 2(a) shows the best results for the FKNN algorithm, varying the number of samples and neighbours from 1 sample per cluster to 8 samples, for both measurements. Figure 2(b) shows the results for the MFCM algorithm, varying the threshold that was required for each area in the histogram and the sigma, for both measurements.

The FCM and FKCM algorithms are not detailed because the parameters that were varied were the value of the provided samples and the stop threshold, with a rather stable result for both measurements. In Figure 3 we can see one of the results obtained for the algorithm FCM and the image labeled Q1.

Figure 3. Image labeled Q1 (left) and one of the results obtained for the FCM algorithm (right)

Figure 4(a) shows the results for the various images of the test set for RUMA applied to all the algorithms; Figure 4(b) shows the results using the global measurement.

The tests reveal great variation in the values provided for the different algorithms by each measurement; this is due to the lack of homogeneous conditions in the acquisition of the images and the ensuing differences in photographic quality.

We can also observe that the results obtained with FKCM are considerably better than the results with FCM, because the first uses a better function to calculate the pixel membership. Nevertheless, for most
pictures the good results with FKCM are surpassed by the FKNN and MFCM algorithms. In the case of FKNN, this is due to its capacity to use several samples for each cluster, which allows a more exact calculation of the memberships and less error probability. MFCM, on the other hand, carries out a previous analysis of the histogram, which enables it in most cases to find good centroids and make good classifications.

Figure 4. Best results for the burned area using the RUMA measurement (a) and the global success rate measurement (b)

Even though the FKNN algorithm obtains better results, in most cases it requires a high number of samples (more than 4), which may disturb the medical expert and complicate the implantation in real clinical environments.
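The per-cluster error measure of Eq. (9) can be sketched minimally as follows; the function name and the confusion-matrix layout are assumptions for illustration, not from the article.

```python
def cluster_error(confusion, cluster_sizes):
    """Eq. (9): off-diagonal counts F_ij (pixels of class j assigned to
    cluster i) weighted by the true cluster size MASC_j, then divided
    by the number of clusters n so the result lies in [0, 1]."""
    n = len(cluster_sizes)
    err = sum(confusion[i][j] / cluster_sizes[j]
              for j in range(n) for i in range(n) if i != j)
    return err / n

# Two clusters: 10 of 50 class-0 pixels misassigned, none the other way
print(cluster_error([[40, 0], [10, 100]], [50, 100]))  # 0.1
```

Weighting by MASC_j means small regions of interest (such as a burned area) are not swamped by the background, which is the stated motivation for the measure.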
KEY TERMS

Fuzzification: The process of decomposing a system input and/or output into one or more fuzzy sets. Many types of curves can be used, but triangular or trapezoidal shaped membership functions are the most common.

Fuzzy Algorithm: An ordered sequence of instructions which may contain fuzzy assignments, conditional statements, repetitive statements, and traditional operations.

Fuzzy Inference Systems: A sequence of fuzzy conditional statements which may contain fuzzy assignment and conditional statements. The execution of such instructions is governed by the compositional rule of inference and the rule of preponderant alternative.

Fuzzy Operator: Operations that enable us to combine fuzzy sets. A fuzzy operator combines two fuzzy sets to give a new fuzzy set. The most frequently used fuzzy operators are the following: equality, containment, complement, intersection and union.

Medical Imaging: A medical specialty that uses x-rays, gamma rays, high-frequency sound waves, and magnetic fields to produce images of organs and other internal structures of the body. In diagnostic radiology the purpose is to detect and diagnose disease, whereas in interventional radiology, imaging procedures are combined with other techniques to treat certain diseases and abnormalities.

Membership Function: Gives the grade, or degree, of membership within the fuzzy set, of any element of the universe of discourse. The membership function maps the elements of the universe onto numerical values in the interval [0, 1].

Segmentation: A process that partitions a digital image into disjoint (non-overlapping) regions, using a set of features or characteristics. The output of the segmentation step is usually a set of classified elements, such as tissue regions or tissue edges.
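The standard realizations of the fuzzy operators listed above (max for union, min for intersection, one-minus for complement) can be sketched in Python; the grey-level membership functions in the example are illustrative, not taken from the chapter:

```python
def fuzzy_union(mu_a, mu_b):
    """Pointwise maximum: the standard fuzzy union of two membership functions."""
    return lambda x: max(mu_a(x), mu_b(x))

def fuzzy_intersection(mu_a, mu_b):
    """Pointwise minimum: the standard fuzzy intersection."""
    return lambda x: min(mu_a(x), mu_b(x))

def fuzzy_complement(mu_a):
    """Standard complement: one minus the membership grade."""
    return lambda x: 1.0 - mu_a(x)

# Example: two illustrative membership functions on the grey-level range [0, 255]
dark = lambda g: max(0.0, 1.0 - g / 128.0)
bright = lambda g: max(0.0, (g - 127.0) / 128.0)
either = fuzzy_union(dark, bright)  # membership in "dark OR bright"
```

Containment and equality can be checked the same way, by comparing the two membership functions pointwise over the universe of discourse.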
Fuzzy Logic Estimator for Variant SNR Environments

Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION

The acquisition system is one of the most sensitive stages in a Direct Sequence Spread Spectrum (DS-SS) receiver (Peterson, Ziemer & Borth, 1995), due to its critical position in order to demodulate the received information. There are several schemes to deal with this problem, such as serial search and parallel algorithms (Proakis, 1995). Serial search algorithms have slow convergence time but their computational load is very low; on the other hand, parallel systems converge very quickly but their computational load is very high. In our system, the acquisition scheme used is the multiresolutive structure presented in (Morán, Socoró, Jové, Pijoan & Tarrés, 2001), which combines quick convergence and low computational load.

The decisional system that evaluates the acquisition stage is a key process in the overall system performance, being a drawback of the structure. This becomes more important when dealing with time-varying channels, where the signal to noise ratio (SNR) is not a constant parameter. Several factors contribute to the performance of the acquisition system (Glisic & Vucetic, 1997): channel distortion and variations, noise and interference, uncertainty about the code phase, and data randomness. The existence of all these variables led us to think about the possibility of using fuzzy logic to solve this complex acquisition estimation (Zadeh, 1973). A fuzzy logic acquisition estimator had already been tested and used in our research group to control a serial search algorithm (Alsina, Morán & Socoró, 2005) with encouraging results, and afterwards in the multiresolutive scheme (Alsina, Mateo & Socoró, 2007); other applications of fuzzy logic to this field can be found in the bibliography, such as (Bas, Pérez & Lagunas, 2001) or (Jang, Ha, Seo, Lee & Lee, 1998). Several previous works have focused on the development of acquisition systems for non frequency selective channels with fast SNR variations (Morán, Socoró, Jové, Pijoan & Tarrés, 2001) (Mateo & Alsina, 2004).

BACKGROUND

In 1964, Dr. Lotfi Zadeh came out with the term fuzzy logic (Zadeh, 1965). The reason was that traditional logic could not answer some questions with a simple yes or no, so fuzzy logic handles the concept of partial truth. Fuzzy logic is one of the possibilities to imitate the working of a human brain, and so to try to turn artificial intelligence into real intelligence. Zadeh devised the technique as a method to solve problems for soft sciences, in particular those that involve human interaction.

Fuzzy logic has proved to be a good option for control in very complex processes, when it is not possible to produce a mathematical model, and it is also recommendable for highly non-linear processes, or when expert knowledge needs to be embedded. But it is not a good idea to apply it if traditional control or estimators give satisfying results, or for problems that can be modelled in a mathematical way.

The most recent works in control and estimation using fuzzy logic applied to direct sequence spread spectrum communication systems can be classified into three types. The first group uses fuzzy logic to improve the detection stage of the DS-CDMA1 receiver, presented by Bas et al. and Jang et al. (Bas, Pérez, & Lagunas, 2001) (Jang, Ha, Seo, Lee, & Lee, 1998). The second group uses fuzzy logic to improve interference rejection, with works presented by Bas et al. and by Chia-Chang et al. (Bas & Neira, 2003) (Chia-Chang, Hsuan-Yu, Yu-Fan, & Jyh-Horng, 2005). Finally, fuzzy logic techniques are also improving estimation and control in the acquisition stage of the DS-CDMA receiver, in works by Alsina et al. (Alsina, Morán, & Socoró, 2005) (Alsina, Mateo, & Socoró, 2007).

ACQUISITION ESTIMATION IN DS-CDMA ENVIRONMENTS

One of the most important problems to be solved in direct sequence spread spectrum systems is to achieve a robust and precise acquisition of the pseudonoise sequence, that is, to obtain an accurate estimation of its exact phase or timing position (Proakis, 1995). In time-varying environments this becomes even more important, because acquisition and tracking performance can heavily degrade communication demodulation reliability. In this work a new multiresolutive acquisition system with a fuzzy logic estimator is proposed (Alsina, Mateo, & Socoró, 2007). The fuzzy logic estimation improves the accuracy of the acquisition stage compared to the results for the stability controller, through the estimation of the probability of being acquired and of the signal to noise ratio in the channel, improving the results obtained for the first fuzzy logic estimator for the multiresolutive structure in (Alsina, Mateo & Socoró, 2007).
trained with a decimated version of the PN sequence (PN-DEC).

Under ideal conditions, in a non-frequency selective channel with white Gaussian noise, just one of the filters should locally converge to an impulse like $\beta\, b[k]\, \delta[n - \tau]$, where $b[k]$ is the information bit, $\tau$ represents the delay between the input signal PN sequence and the reference one, and $\beta$ is the fading coefficient for channel distortion. The algorithm is reset every new data symbol, and a modulus smoothing average algorithm is applied to each of the LMS solutions (wi[n]) to remove the data randomness component b[k] dependency, obtaining nonnegative and averaged impulsional responses (Wavi[n]). The decisional system uses a peak detection algorithm to find which of these filters has detected

Four different parameters have been defined as inputs in the fuzzy estimator; three of them refer to the values of the four modulus averaged acquisition LMS filters (Wavi[n]), especially the LMS filter adapted to the decimated sequence PN-DEC (called Wcon[n]), and one to the tracking filter (wtr[n]) that refines the search:

Ratio1: computed as the quotient of the peak value of the LMS filter Wcon[n] divided by the mean value of this filter excluding the maximum, as follows:

$$\mathrm{Ratio1} = \frac{W_{con}[T]}{\frac{1}{N-1}\sum_{n=1,\, n \neq T}^{N} W_{con}[n]}$$
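The Ratio1 computation described above can be sketched in a few lines; the helper name and the flat test response below are illustrative assumptions, with the denominator averaging the response with the peak sample T left out:

```python
def ratio1(w_con):
    """Peak of the averaged LMS response W_con divided by the mean of
    the remaining samples (the peak position T is excluded from the mean)."""
    t = max(range(len(w_con)), key=lambda n: w_con[n])  # peak position T
    rest = [w_con[n] for n in range(len(w_con)) if n != t]
    return w_con[t] / (sum(rest) / len(rest))

# A sharp peak over a flat noise floor yields a large ratio,
# suggesting a reliable acquisition point:
print(ratio1([0.1] * 9 + [1.0]))  # close to 10
```

Ratio3 and Ratio1track follow the same peak-to-mean pattern, applied across filters and to the tracking filter respectively.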
$$\mathrm{Ratio3} = \frac{W_{con}[T]}{\frac{1}{H-1}\sum_{i=1,\, W_{av_i} \neq W_{con}}^{H}\ \frac{1}{N}\sum_{n=1}^{N} W_{av_i}[n]}$$

Ratio1track: computed as the quotient of the peak value of the LMS tracking filter wtr[n], being the most precise estimation of the correct acquisition point, divided by the mean value of the same filter excluding the maximum.

These parameters have been chosen due to the information they contain about the probability of being acquired, and also about the SNR level in the channel and its variations. Their value variations give good estimations of acquisition quality and a good measure for SNR, with the appropriate definition of IF-THEN rules.

Output Variables
1999). Two output variables will be computed. Acquisition gives a value in the range [0,1], being zero when it is Not Acquired and one if it is Acquired. Three more fuzzy sets have been defined between the extreme values: Probably Not Acquired, Not Determined and Probably Acquired. Acquisition will show a value of reliability for the correct demodulation of the detector. The multiresolutive scheme only gives an estimation of the acquisition point, and the Acquisition value evaluates the probability of being acquired, and so the consistency of the bit demodulation done by the receiver.

The second variable is SNR Estimation, which gives a value (in the range [-30, 0] dB in our experiment) of the estimated SNR in the channel. SNR Estimation will give us information about channel conditions; this will help not only in acquisition and tracking, but also in detection as in (Verdú, 1998) or (Alsina, Morán & Socoró, 2005).

If-Then Rules

A total of sixty rules have been used to define the two outputs as a function of the input values, evolving the set of rules used in (Alsina, Mateo & Socoró, 2007). Figure 2 shows the surface for Acquisition for all input variables, and figure 3 shows the surface for SNR Estimation for all inputs. Rules have been defined to take into account the best performance, in its range, of each input parameter value to design the two outputs of the fuzzy estimator. This means each value range is only considered where its estimations are more reliable for both outputs.
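As a toy illustration of how such IF-THEN rules turn a ratio input into the Acquisition output, the sketch below uses two invented rules over invented triangular sets and a weighted-average defuzzification; it is not the chapter's actual sixty-rule base:

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def acquisition(ratio1):
    """Toy single-input Mamdani-style estimator:
    IF ratio1 is LOW  THEN Not Acquired (output center 0.0)
    IF ratio1 is HIGH THEN Acquired     (output center 1.0),
    defuzzified as a weighted average of the output set centers."""
    low = tri(ratio1, 0.0, 1.0, 5.0)    # illustrative fuzzy sets,
    high = tri(ratio1, 1.0, 5.0, 9.0)   # not the chapter's real ones
    centers = {0.0: low, 1.0: high}     # Not Acquired -> 0, Acquired -> 1
    num = sum(c * m for c, m in centers.items())
    den = sum(centers.values())
    return num / den if den else 0.5    # Not Determined when no rule fires
```

A real estimator would combine all four ratio inputs, more output sets, and a full rule table, but the rule-firing and defuzzification steps have this shape.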
Figure 4. % of correct estimation of acquisition using the new fuzzy estimator against the stability control
of the system. Despite its good performance, as observed in figure 4, the new fuzzy approach improves the results for a wider SNR range. The quality of the fuzzy acquisition estimation is much better for very low SNR compared to the stability control, and its global performance for the whole range of SNR in our tests is improved. The stability control is not a good estimator for critical SNR (considered around -15 dB), and its reliability decreases when SNR decreases. Despite showing similar performance around critical SNR, the fuzzy logic estimation of Acquisition improves its performance for worse SNR ratios, staying over 90% of correct estimation along all the simulations.

Fuzzy SNR Estimation in Time Varying Channels

In figure 5.a the acquisition system has been simulated in an AWGN channel, forcing severe and very fast SNR changes in order to evaluate the convergence speed of the SNR estimator. The SNR Estimation mean value, being a very variable value, is obtained through an exponential smoothing average filter, and compared to the SNR in the AWGN channel. The SNR in the channel is estimated quite precisely down to very low SNR (near -20 dB) by the fuzzy block, as the input parameters are not stable enough to make a good prediction for lower values; this is similar to what happens for Acquisition estimation. To observe the recovery of the fuzzy estimator in case of fast SNR changes in the channel, a detail of SNR Estimation is shown in figure 5.b. This information shows the channel state to the receiver, and allows further work to improve the reliability of the demodulation by means of different approaches (Verdú, 1998).

Figure 5. a) SNR estimation in a varying SNR channel; b) Detail of SNR Estimation when adapting to an instantaneous SNR variation

FUTURE TRENDS

Future work will be focused on improving the estimation of the SNR in the fuzzy system. Another goal to be reached is to increase the stability against channel changes using previously detected symbols, obtaining a system with feedback. The fuzzy estimator outputs will be used to design a controller for the acquisition and tracking structure. Its aim will be to improve the stability of the estimation of the correct acquisition point (τ) through an effective and robust control of its variations for sudden channel changes, so memory will be added to the fuzzy logic estimator. This way the estimator is
converted into a controller, and the whole performance of the receiver is improved.

Further research will also take into account multipath channel conditions and possible variations, including rake-based receiver detection, in order to reach a good acquisition and tracking performance in ionospheric channels. Furthermore, the reliability of the results encourages us to use the acquisition estimation to minimize the computational load of the acquisition system for proper channel conditions, through decreasing the number of iterations to converge in the LMS adaptive filters. A more efficient fuzzy logic control can be designed in order to achieve a better trade-off between computational load (referred to the LMS filters adaptation) and acquisition point estimation accuracy (τ).

CONCLUSION

The new proposed acquisition system estimator has been exposed, and some results have been compared against a stability control strategy within the multiresolutive acquisition system in a variant SNR environment. The main advantages of a multiresolutive fuzzy estimator are its reliability when evaluating the probability of acquisition, its stability, and its quick convergence when there are fast channel SNR changes. The computational load of a fuzzy estimator is higher than the same cost in a stability control: the mean number of FLOPS in a DSP needed to do all the process is greater compared to the conventional stability control. This has to be taken into account because the multiresolutive structure should make its computational cost minimum to work on-line with the received data. Further work will be done to compare the computational load added to the structure against the global improvements of the multiresolutive receiver, to decide whether this cost increase is affordable for the acquisition system or not.

REFERENCES

Alsina, R.M., Morán, J.A., & Socoró, J.C. (2003). Multiresolution Adaptive Structure for Acquisition and Detection in DS-SS Digital Receiver in a Multiuser Environment. IEEE International Symposium on Signal Processing and its Applications.

Alsina, R.M., Morán, J.A., & Socoró, J.C. (2005). Sequential PN Acquisition Based on a Fuzzy Logic Controller. 8th International Workshop on Artificial Neural Networks, Lecture Notes in Computer Science, (3512), 1238-1245.

Alsina, R.M., Mateo, C., & Socoró, J.C. (2007). Multiresolutive Adaptive PN Acquisition Scheme with a Fuzzy Logic Estimator in Non Selective Fast SNR Variation Environments. 9th International Workshop on Artificial Neural Networks, Lecture Notes in Computer Science, (4507), 367-374.

Bas, J., Pérez, A., & Lagunas, M.A. (2001). Fuzzy Recursive Symbol-by-Symbol Detector for Single User CDMA Receivers. International Conference on Acoustics, Speech and Signal Processing.

Bas, J., & Neira, A.P. (2003). A fuzzy logic system for interference rejection in code division multiple access. The 12th IEEE International Conference on Fuzzy Systems, (2), 996-1001.

Chia-Chang, H., Hsuan-Yu, L., Yu-Fan, C., & Jyh-Horng, W. (2005). Adaptive interference suppression using fuzzy-logic-based space-time filtering techniques in multipath DS-CDMA. The 6th IEEE International Workshop on Signal Processing Advances in Wireless Communications, 22-26.

Glisic, S.G., & Vucetic, B. (1997). Spread Spectrum CDMA Systems for Wireless Communications. Artech House Publishers.

Jang, J., Ha, K., Seo, B., Lee, S., & Lee, C.W. (1998). A Fuzzy Adaptive Multiuser Detector in CDMA Communication Systems. International Conference on Communications.

Leekwijck, W.V., & Kerre, E.E. (1999). Defuzzification: Criteria and Classification. Fuzzy Sets and Systems, (108), 159-178.

Mateo, C., & Alsina, R.M. (2004). Diseño de un Sistema de Control Adaptativo a las Condiciones del Canal para un Sistema de Adquisición de un Receptor DS-SS. XIX Congreso de la Unión Científica Nacional de Radio.

Morán, J.A., Socoró, J.C., Jové, X., Pijoan, J.L., & Tarrés, F. (2001). Multiresolution Adaptive Structure for Acquisition in DS-SS Receiver. International Conference on Acoustics, Speech and Signal Processing.
Peterson, R.L., Ziemer, R.E., & Borth, D.E. (1995). Spread Spectrum Communications Handbook. Prentice Hall.

Proakis, J.G. (1995). Digital Communications. McGraw-Hill.

Verdú, S. (1998). Multiuser Detection. Cambridge University Press.

Zadeh, L.A. (1965). Fuzzy Sets. Information and Control, (8), 338-353.

Zadeh, L.A. (1973). Outline of a New Approach to the Analysis of Complex Systems and Decision Processes. IEEE Transactions on Systems, Man and Cybernetics, (3), 28-44.

Zadeh, L.A. (1988). Fuzzy Logic. Computer, 83-92.

KEY TERMS

Defuzzyfication: After computing the fuzzy rules and evaluating the fuzzy variables, this is the process the system follows to obtain a new membership function for each output variable.

Degree of Truth: It denotes the extent to which a proposition is true. It is important not to confuse it with the concept of probability.

Fuzzy Logic: Fuzzy logic was derived from fuzzy set theory, working with reasoning that is approximate rather than precise, in contrast to classical predicate logic.

Fuzzy Sets: Fuzzy sets are sets whose members have a degree of membership. They were introduced as an extension of the classical sets, whose elements' membership was assessed by binary values.

Fuzzyfication: The process of defining the degree of membership of a crisp value for each fuzzy set.

IF-THEN Rules: The typical rules used by expert fuzzy systems. The IF part is the antecedent, also named the premise, and the THEN part is the conclusion.

Linguistic Variables: They take on linguistic values, which are words, with associated degrees of membership in each set.

Linguistic Term: A subjective category for a linguistic variable. Each linguistic term is associated with a fuzzy set.

Membership Function: The function that gives the subjective measures for the linguistic terms.

ENDNOTES

1. DS-CDMA stands for Direct Sequence Code Division Multiple Access.
2. The received signal x[n] is sampled at M samples per chip in order to give the necessary time resolution for the tracking stage.
3. where PG is the length of the pseudonoise sequences, also called PN sequences, and 'ceil(x)' (expressed as N = ⌈x⌉) is the smallest integer not less than x.
Fuzzy Rule Interpolation
Figure 1. KH method for two SISO rules: A1 → B1 and A2 → B2; conclusion y of the observation x
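The two-rule construction in Figure 1 can be sketched per α-cut; the distance-weighted endpoint formula below is a simplified crisp-endpoint illustration of the KH idea (each `*_cut` is an (inf, sup) interval at one α level), not the method's full definition:

```python
def kh_alpha_cut(x_cut, a1_cut, b1_cut, a2_cut, b2_cut):
    """KH interpolation sketch at one alpha level for two SISO rules
    A1 -> B1 and A2 -> B2: each conclusion endpoint is the
    inverse-distance-weighted mean of the corresponding consequent
    endpoints, with distances measured to the antecedent endpoints."""
    def interp(xa, a1, b1, a2, b2):
        d1, d2 = abs(xa - a1), abs(a2 - xa)
        if d1 == 0:
            return b1  # observation endpoint coincides with A1
        if d2 == 0:
            return b2  # observation endpoint coincides with A2
        w1, w2 = 1.0 / d1, 1.0 / d2
        return (w1 * b1 + w2 * b2) / (w1 + w2)
    lo = interp(x_cut[0], a1_cut[0], b1_cut[0], a2_cut[0], b2_cut[0])
    hi = interp(x_cut[1], a1_cut[1], b1_cut[1], a2_cut[1], b2_cut[1])
    return lo, hi
```

Because the lower and upper endpoints are interpolated independently, `lo` can exceed `hi` for some rule bases; this is exactly the abnormal-conclusion problem that MACI and related modifications discussed below were designed to remove.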
to hold the universal approximation property in (Tikk, Joó, Kóczy, Várlaki, Moser & Gedeon, 2002) and (Tikk, 2003). This method takes into account all the rules of the rule base in the calculation of the conclusion. The method adapts a modification of the Shepard operator based interpolation (Shepard, 1968): the rules are taken into account in proportion to the inverse of the distance between their antecedents and the observation. The universal approximation property holds if the distance function is raised to a power of at least the number of antecedent dimensions.

Another modification of the KH method is the modified alpha-cut based interpolation method (referred to as MACI, fully extended in (Tikk & Baranyi, 2000)), which completely alleviates the abnormality problem. MACI's main idea is the following: it transforms the fuzzy sets of the input and output universes to a space where abnormality is excluded, computes the conclusion there, and finally transforms it back to the original space. MACI uses a vector representation of fuzzy sets. The original method was introduced in (Yam & Kóczy, 1997) and was applicable to CNF sets only. This restriction was later relaxed in (Tikk, Baranyi, Gedeon & Muresan, 2001), at the expense of a higher computational demand than the original method. MACI is one of the most applied FRI methods (Wong, Tikk, Gedeon & Kóczy, 2005), since it preserves the advantageous computational and approximate nature of the KH method, while it excludes its chance of abnormal conclusion.

Another FRI method was proposed by Kóczy et al. in (Kóczy, Hirota & Gedeon, 1997). It takes into consideration only the two closest rules surrounding the observation, and its main idea is the conservation of the relative fuzziness (referred to as the CRF method). This notion means that the left (and right) fuzziness of the approximated conclusion, in proportion to the flanking fuzziness of the neighbouring consequent, should be the same as the left (and right) fuzziness of the observation in proportion to the flanking fuzziness of the neighbouring antecedent. The original method is restricted to CNF sets only.

An improved fuzzy interpolation technique for multidimensional input spaces (referred to as IMUL) was originally proposed in (Wong, Gedeon & Tikk, 2000), and described in more detail in (Wong, Tikk, Gedeon & Kóczy, 2005). IMUL applies a combination of the CRF and MACI methods, and mixes the advantages of both. The core of the conclusion is determined by the MACI method, while its flanks by CRF (the method is restricted to trapezoidal membership functions). The main advantages of this method are its applicability to multi-dimensional problems and its relative simplicity.

Conceptually different approaches were proposed in (Baranyi, Kóczy & Gedeon, 2004), based on the relation, semantic and inter-relational features of the fuzzy sets. This family of methods applies a two-step General Methodology (referred to as GM). The notation also reflects the feature that methods based on GM can handle arbitrarily shaped fuzzy sets. The basic concept is to divide the task of FRI into two main steps. The first step is to determine the reference point of the conclusion based on the ratio of the distances between the reference points of the observation and the antecedents. Then, accomplishing the first step, a new, interpolated rule is generated from the existing rules for the reference point of the observation and the reference point of the conclusion. In the second step of the method, a single rule reasoning method (revision function) is applied to determine the final fuzzy conclusion based on the similarity of the fuzzy observation and the antecedent of the new interpolated rule. For both main steps of GM numerous solutions exist, therefore GM stands for an FRI concept, or a family of FRI methods.

A rather different, application oriented aspect of FRI emerges in the concept of the Fuzzy Interpolation based on Vague Environment method (referred to as FIVE), originally introduced in (Kovács, 1996), (Kovács & Kóczy, 1997a), (Kovács & Kóczy, 1997b) and extended with the ability of handling fuzzy observations in (Kovács, 2006). It was developed to fit the speed requirements of direct fuzzy control, where the conclusions of the fuzzy controller are applied directly as control actions in a real-time system. The main idea of the FIVE method is based on the fact that most control applications serve crisp observations and require crisp conclusions from the controller. Adopting the idea of the vague environment (Klawonn, 1994), FIVE can handle the antecedent and consequent fuzzy partitions of the fuzzy rule base by scaling functions (Klawonn, 1994) and therefore turn the fuzzy interpolation into crisp interpolation. In FIVE any crisp interpolation, extrapolation, or regression method can be adapted very simply for FRI. Because of its simple multidimensional applicability, in FIVE, originally the Shepard operator based interpolation (Shepard, 1968) was adapted.
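The Shepard operator based interpolation adapted both in the stabilised KH method and in FIVE can be sketched over crisp reference points; the rule reference points and the power p in this example are illustrative assumptions, not values from the cited methods:

```python
def shepard_conclusion(observation, rules, p=2):
    """Shepard-style rule interpolation sketch: each rule is a pair
    (antecedent_reference, consequent_reference) of crisp points, and
    the conclusion is the inverse-distance-weighted mean of the
    consequents. The real FRI methods interpolate whole fuzzy sets;
    this shows only the weighting scheme."""
    weights = []
    for a, b in rules:
        d = abs(observation - a)
        if d == 0:
            return b  # observation hits a rule antecedent exactly
        weights.append((1.0 / d ** p, b))
    total = sum(w for w, _ in weights)
    return sum(w * b for w, b in weights) / total
```

Raising the distance to the power p (at least the number of antecedent dimensions, as noted above for the stabilised KH method) makes nearby rules dominate the weighted sum.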
Environment of the Fuzzy Rule base as a Practical Alternative of the Classical CRI. Proceedings of the 7th International Fuzzy Systems Association World Congress, Prague, Czech Republic, 144-149.

Sz. Kovács, and L.T. Kóczy (1997b). The use of the concept of vague environment in approximate fuzzy reasoning. Fuzzy Set Theory and Applications, Tatra Mountains Mathematical Publications, Mathematical Institute Slovak Academy of Sciences, Bratislava, Slovak Republic, (12), 169-181.

Sz. Kovács (2005). Interpolative Fuzzy Reasoning in Behaviour-based Control. Advances in Soft Computing, Computational Intelligence, Theory and Applications, Bernd Reusch (Ed.), Springer, Germany, ISBN 3-540-22807-1, (2), 159-170.

Sz. Kovács (2006). Extending the Fuzzy Rule Interpolation FIVE by Fuzzy Observation. Advances in Soft Computing, Computational Intelligence, Theory and Applications, Bernd Reusch (Ed.), Springer, Germany, ISBN 3-540-34780-1, 485-497.

P. M. Larsen (1980). Industrial application of fuzzy logic control. Int. J. of Man Machine Studies, (12) 4, 3-10.

E. H. Mamdani and S. Assilian (1975). An experiment in linguistic synthesis with a fuzzy logic controller. Int. J. of Man Machine Studies, (7), 1-13.

I. Perfilieva (2004). Fuzzy function as an approximate solution to a system of fuzzy relation equations. Fuzzy Sets and Systems, (147), 363-383.

D. Shepard (1968). A two dimensional interpolation function for irregularly spaced data. Proc. 23rd ACM Internat. Conf., 517-524.

M. Sugeno (1985). An introductory survey of fuzzy control. Information Science, (36), 59-83.

T. Takagi and M. Sugeno (1985). Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. on SMC, (15), 116-132.

D. Tikk and P. Baranyi (2000). Comprehensive analysis of a new fuzzy rule interpolation method. IEEE Transactions on Fuzzy Systems, (8) 3, 281-296.

D. Tikk, P. Baranyi, T. D. Gedeon, and L. Muresan (2001). Generalization of a rule interpolation method resulting always in acceptable conclusion. Tatra Mountains Mathematical Publications, (21), 73-91.

D. Tikk, I. Joó, L. T. Kóczy, P. Várlaki, B. Moser, and T. D. Gedeon (2002). Stability of interpolative fuzzy KH-controllers. Fuzzy Sets and Systems, (125) 1, 105-119.

D. Tikk (2003). Notes on the approximation rate of fuzzy KH interpolator. Fuzzy Sets and Systems, (138) 2, 441-453.

Y. Yam, and L. T. Kóczy (1997). Representing membership functions as points in high dimensional spaces for fuzzy interpolation and extrapolation. Dept. Mech. Automat. Eng., Chinese Univ. Hong Kong, Technical Report CUHK-MAE-97-03.

G. Vass, L. Kalmár and L. T. Kóczy (1992). Extension of the fuzzy rule interpolation method. Proceedings of the International Conference Fuzzy Sets Theory Applications (FSTA'92), Liptovsky Mikulas, Czechoslovakia, 1-6.

K. W. Wong, T. D. Gedeon, and D. Tikk (2000). An improved multidimensional α-cut based fuzzy interpolation technique. Proceedings of the International Conference Artificial Intelligence in Science and Technology (AISAT'2000), Hobart, Australia, 29-32.

K. W. Wong, D. Tikk, T. D. Gedeon, and L. T. Kóczy (2005). Fuzzy Rule Interpolation for Multidimensional Input Spaces With Applications. IEEE Transactions on Fuzzy Systems, ISSN 1063-6706, (13) 6, 809-819.

L. A. Zadeh (1973). Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. on SMC, (3), 28-44.

KEY TERMS

α-Cut of a Fuzzy Set: A crisp set which holds the elements of a fuzzy set (on the same universe of discourse) whose membership grade is greater than, or equal to, α. (In case of a strong α-cut it must be greater than α.)

ε-Covering Fuzzy Partition: The fuzzy partition (a set of linguistic terms (fuzzy sets)) ε-covers the universe of discourse, if for all the elements in the
universe of discourse a linguistic term exists which has a membership value greater than or equal to ε.

Complete (or Dense) Fuzzy Rule Base: A fuzzy rule base is complete, or dense, if all the input universes are ε-covered by rule antecedents, where ε > 0. In case of a complete fuzzy rule base, for all possible multidimensional observations a rule antecedent must exist which has a nonzero activation degree. Note that completeness of the fuzzy rule base is not equivalent to covering fuzzy partitions on each antecedent universe (required but not sufficient in the multidimensional case). Usually the number of rules of a complete rule base is O(M^I), where M is the average number of linguistic terms in the fuzzy partitions and I is the number of input universes.

Convex and Normal Fuzzy (CNF) Set: A fuzzy set defined on a universe of discourse that holds total ordering, which has a height (maximal membership value) equal to one (i.e. a normal fuzzy set), and whose membership grade for any element between two arbitrary elements is greater than, or equal to, the smaller membership grade of the two arbitrary boundary elements (i.e. a convex fuzzy set).

Fuzzy Compositional Rule of Inference (CRI): The most common fuzzy inference method. The fuzzy conclusion is calculated as the fuzzy composition (Klir & Folger, 1988) of the fuzzy observation and the fuzzy rule base relation (see Fuzzy Dot Representation of Fuzzy Rules). In case of the Zadeh-Mamdani-Larsen max-min compositional rule of inference (Zadeh, 1973) (Mamdani, 1975) (Larsen, 1980), the applied fuzzy composition is the max-min composition of fuzzy relations ("max" stands for the applied s-norm and "min" for the applied t-norm fuzzy operations).

Fuzzy Dot Representation of Fuzzy Rules: The most common understanding of the If-Then fuzzy rules. The fuzzy rules are represented as a fuzzy relation of the rule antecedent and the rule consequent linguistic terms. In case of the Zadeh-Mamdani-Larsen compositional rule of inference (Zadeh, 1973) (Mamdani, 1975) (Larsen, 1980), the fuzzy rule relations are calculated as the fuzzy cylindric closures (t-norm of the cylindric extensions) (Klir & Folger, 1988) of the antecedent and the rule consequent linguistic terms.

Fuzzy Rule Interpolation: A way for fuzzy inference by interpolation of the existing fuzzy rules based on various distance and similarity measures of fuzzy sets. A suitable method for handling sparse fuzzy rule bases, since FRI methods can provide reasonable (interpolated/extrapolated) conclusions even if none of the existing rules fires under the current observation.

Sparse Fuzzy Rule Base: A fuzzy rule base is sparse if an observation may exist which hits no rule antecedent. (The rule base is not complete.)

Vague Environment (VE): The idea of a VE is based on the similarity (or, in this case, the indistinguishability) of the considered elements. In a VE the fuzzy membership function μA(x) indicates the level of similarity of x to a specific element a that is a representative or prototypical element of the fuzzy set A or, equivalently, the degree to which x ∈ X is indistinguishable from a ∈ X (Klawonn, 1994). Therefore the α-cuts of the fuzzy set A are the sets which contain the elements that are (1 − α)-indistinguishable from a. Two values in a VE are ε-distinguishable if their distance is greater than ε. The distances in a VE are weighted distances; the weighting factor or function is called the scaling function (factor) (Klawonn, 1994). If the VE of a fuzzy partition (the scaling function, or at least the approximate scaling function (Kovács, 1996), (Kovács & Kóczy, 1997b)) exists, the member sets of the fuzzy partition can be characterized by points in that VE.
Guanrong Chen
City University of Hong Kong, China
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.

MAIN FOCUS OF THE CHAPTER

Structure Identification

In structure identification of a fuzzy model, the first step is to select some appropriate input variables from the collection of possible system inputs; the second …

Tree Partitioning

Figure 1 (b) visualizes a tree partition. The tree partitioning results from a series of guillotine cuts. Each region is generated by a guillotine cut, which is made entirely across the subspace to be partitioned. At the (k − 1)st iteration step, the input space is partitioned into k regions. Then a guillotine cut is applied to one of these regions to further partition the entire space into k + 1 regions. There are several strategies for determining which dimension to cut, where to cut at each step, and when to stop. This flexible tree partitioning algorithm alleviates the problem of the curse of dimensionality. However, more membership functions are needed for each input variable, and they usually do not have clear linguistic meanings; moreover, the resulting fuzzy model is consequently less descriptive.

Scatter Partitioning

Figure 1 (c) illustrates a scatter partition. This method extracts fuzzy rules directly from numerical data (Abe & Lan, 1995). Suppose that a one-dimensional output, y, and an m-dimensional input vector, x, are available. First, the output space is divided into n intervals, [y0, y1], (y1, y2], …, (yn−1, yn], where the ith interval is called output interval i. Then, activation hyperboxes are determined, which define the input region corresponding to output interval i, by calculating the minimum and maximum values of the input data for each output interval. If the activation hyperbox for output interval i overlaps with the activation hyperbox for output interval j, then the overlapped region is defined as an inhibition hyperbox. If the input data for output intervals i and/or j exist in the inhibition hyperbox, then within this inhibition hyperbox one or two additional activation hyperboxes will be defined. Moreover, if two activation hyperboxes are defined and they overlap, then an additional inhibition hyperbox is further defined. This procedure is repeated until overlapping is resolved.

Parameters Identification

After the system structure has been determined, parameters identification is in order. In this process, the optimal parameters of a fuzzy model that can best describe the input-output behavior of the underlying system are searched by optimization techniques.

Sometimes, structure and parameters are identified under the same framework through fuzzy modeling. There are virtually many different approaches to modeling a system using the fuzzy set and fuzzy system theories (Chen & Pham, 1999, 2006), but the classical least-squares optimization and the general Genetic Algorithm (GA) optimization techniques are the most popular. They are quite generic, effective, and competitive with other successful non-fuzzy types of optimization-based modeling methods such as neural networks and statistical Monte Carlo.

An Approach Using Least-Squares Optimization

A fuzzy system can be described by the following generic form:

    f(x) = Σ_{k=1}^{m} A_k g_k(x) = A^T g(x)    (1)
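Since form (1) is linear in the parameter vector A once the basis functions g_k(x) are fixed, the classical least-squares approach mentioned above applies directly. A minimal sketch, assuming normalised Gaussian basis functions and a synthetic linear system; all names, centres, and numbers are illustrative, not from the chapter:

```python
# Least-squares identification of the consequent parameters in
# f(x) = sum_k A_k g_k(x) = A^T g(x) (form (1)).
import math

def gaussian_basis(centres, width):
    # Normalised Gaussian membership functions play the role of g_k(x).
    def g(x):
        raw = [math.exp(-((x - c) / width) ** 2) for c in centres]
        s = sum(raw)
        return [r / s for r in raw]
    return g

def fit_least_squares(xs, ys, g):
    """Minimise sum_i (y_i - A^T g(x_i))^2 via the normal equations."""
    G = [g(x) for x in xs]              # design matrix, rows g(x_i)^T
    n, m = len(G), len(G[0])
    gtg = [[sum(G[i][a] * G[i][b] for i in range(n)) for b in range(m)]
           for a in range(m)]
    gty = [sum(G[i][a] * ys[i] for i in range(n)) for a in range(m)]
    # Solve (G^T G) A = G^T y by Gauss-Jordan elimination; G^T G is
    # symmetric positive definite for independent basis functions.
    for col in range(m):
        piv = gtg[col][col]
        for row in range(m):
            if row != col and gtg[row][col] != 0.0:
                factor = gtg[row][col] / piv
                gtg[row] = [rv - factor * cv for rv, cv in zip(gtg[row], gtg[col])]
                gty[row] -= factor * gty[col]
        gtg[col] = [cv / piv for cv in gtg[col]]
        gty[col] /= piv
    return gty                           # the parameter vector A

g = gaussian_basis(centres=[0.0, 0.5, 1.0], width=0.4)
xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 for x in xs]         # underlying system to be modeled
A = fit_least_squares(xs, ys, g)
f = lambda x: sum(a * gk for a, gk in zip(A, g(x)))
```

With only three basis functions the fit of a linear target is approximate, which mirrors the chapter's point that the choice and number of membership functions matter.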
are used as an example to illustrate the computational algorithm.

One approach to initializing the fuzzy basis functions is to choose n initial basis functions, g_k(x), in the …

Step 2. For each j, 2 ≤ j ≤ m, compute

    c_kj^(i) = (w_k^T g_i) / (w_k^T w_k)
An Approach Using Genetic Algorithms

The parameter identification procedure is generally very tedious for a large-scale complex system, for which the GA approach has some attractive features such as its great flexibility and robust optimization ability (Man, Tang, Kwong & Halang, 1997). GA is attributed to Holland (1975), and was applied to fuzzy modeling and fuzzy control in the 1980s.

Procedure GA
Begin
    t = 0
    Initialize P(t)
    Evaluate P(t)
    While not finished do
    Begin
        t = t + 1
        Reproduce P(t) from P(t − 1)
        Crossover individuals in P(t)
        …
according to the number of crossover points. Uniform crossover often works well with small populations of chromosomes and for simpler problems (Soucek & Group, 1992).

Figure 2. A chromosome structure for fuzzy modeling

Mutation

GA Parameters
undefined; but in this case one may introduce a penalty factor, 0 < p < 1, and compute p·f(J) instead of f(J). If an individual with a very high fitness value appears at an early stage, this fitness function may cause early convergence of the solution, thereby stopping the algorithm before optimality is reached. To avoid this situation, the individuals may be sorted according to their raw fitness values, and the new fitness values are determined recursively by

    f_1 = 1, f_2 = a·f_1 = a, …, f_m = a^(m−1)

for a fitness scaling factor a ∈ (0, 1).

GA-Based Fuzzy Modeling with Fine Tuning

GA generally does not guarantee convergence to a global optimum. In order to improve this, the gradient descent method can be used to fine-tune the parameters
identified by GA. Since GA usually can find a near-global optimum, fine tuning of the membership function parameters in both the IF and THEN parts, e.g., by a gradient descent method, can generally lead to a global optimization (Chang, Joo, Park & Chen, 2002; Goldberg, 1989).

FUTURE TRENDS

This will be further discussed elsewhere in the future.

CONCLUSION

Fuzzy systems identification is an important and yet challenging subject for research, which calls for more efforts from the control theory and intelligent systems communities to reach another high level of efficiency and success.

REFERENCES

S. Abe & M. S. Lan (1995). Fuzzy rules extraction directly from numerical data for function approximation. IEEE Trans. on Systems, Man and Cybernetics, 25: 119-129.

W. Chang, Y. H. Joo, J. B. Park & G. Chen (2002). Design of robust fuzzy-model-based controller with sliding mode control for SISO nonlinear systems. Fuzzy Sets and Systems, 125: 1-22.

G. Chen & T. T. Pham (1999). Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems. CRC Press.

G. Chen & T. T. Pham (2006). Introduction to Fuzzy Systems. CRC Press.

J. Liska & S. S. Melsheimer (1994). Complete design of fuzzy logic systems using genetic algorithms. Proc. of IEEE Conf. on Fuzzy Systems, 1377-1382.

K. F. Man, K. S. Tang, S. Kwong & W. A. Halang (1997). Genetic Algorithms for Control and Signal Processing. Springer.

B. Soucek & T. I. Group (1992). Dynamic Genetic and Chaotic Programming. Wiley.

W. Spears & V. Anand (1990). The use of crossover in genetic programming. NRL Technical Report, AI Center, Naval Research Labs, Washington D.C.

T. Takagi & M. Sugeno (1985). Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. on Systems, Man and Cybernetics, 15: 116-132.

L. X. Wang & J. M. Mendel (1992). Generating fuzzy rules by learning from examples. IEEE Trans. on Systems, Man and Cybernetics, 22: 1414-1427.

L. A. Zadeh (1973). Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. on Systems, Man and Cybernetics, 3: 28-44.

A. M. S. Zalzala & P. J. Fleming (1997). Genetic Algorithms in Engineering Systems. IEE Press.

KEY TERMS

Fuzzy Rule: A logical rule established based on fuzzy logic.

Fuzzy System: A system formulated and described by fuzzy set-based real-valued functions.

Genetic Algorithm: An optimization scheme based on biological genetic evolutionary principles.
J. Andrés Serantes
University of A Coruña, Spain
Gene Regulation Network Use
GENETIC REGULATORY NETWORK MODEL

The cells of a biological system are mainly determined by the DNA strand, the genes, and the proteins contained in the cytoplasm. The DNA is the structure that holds the gene-encoded information that is needed for the development of the system. The genes are activated or transcribed thanks to the protein-shaped information that exists in the cytoplasm, and consist of two main parts: the sequence, which identifies the protein that will be generated if the gene is transcribed, and the promoter, which identifies the proteins that are needed for gene transcription.

Another remarkable aspect of biological genes is the difference between constitutive genes and regulating genes. The latter are transcribed only when the proteins identified in the promoter part are present. The constitutive genes are always transcribed, unless inhibited by the presence of the proteins identified in the promoter part, which then act as gene repressors.

The present work has tried to partially model this structure with the aim of fitting some of its abilities into a computational model; in this way, the system would have a structure that is similar to the above, as detailed in the next section.

Proposed Model

Various model variants were developed on the basis of biological concepts. The proposed artificial cellular system is based on the interaction of artificial cells by means of messages that are called proteins. These cells can divide themselves, die, or generate proteins that will act as messages for themselves as well as for neighbouring cells.

The system is supposed to express a global behaviour towards the information processing. Such behaviour would emerge from the information encoded in a set of variables of the cell that, in analogy with the biological cells, will be named genes.

The central element of our model is the artificial cell. Every cell has binary string-encoded information for the regulation of its functioning. Following the biological analogy, this string will be called DNA. The cell also has a structure for the storage and management of the proteins generated by the cell itself and those received from neighbouring cells; following the biological model, this structure is called cytoplasm.

The DNA of the artificial cell consists of functional units that are called genes. Each gene encodes a protein or message (produced by the gene). The structure of a gene has four parts (see Figure 1):

Sequence: the binary string that corresponds to the protein that the gene encodes.
Promoters: the gene area that indicates the proteins that are needed for the gene's transcription.
Constituent: this bit identifies whether the gene is constituent or regulating.
Activation percentage (binary value): the minimal concentration percentage of promoter proteins inside the cell that causes the transcription of the gene.

The transcription of the encoded protein occurs when the promoters of the non-constituent genes appear at a certain rate in the cellular cytoplasm. On the other hand, the constituent genes are expressed until such expression is inhibited by the present rate of the promoter proteins.

The other fundamental element for keeping and managing the proteins that are received or produced by the artificial cell is the cytoplasm. The stored proteins have a certain lifetime before they are erased. The cytoplasm checks which and how many proteins are needed for the cell to activate the DNA genes, and as such responds to all the cellular requirements for the concentration of a given type of protein. The cytoplasm …

Figure 1. Structure of a system gene (example gene fields: TRUE · 1001 · 0010 · 1000)
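The four-part gene structure and the two activation rules described above can be sketched as follows. The field names and the concentration-threshold logic are our reading of the model, not code from the original work:

```python
# Sketch of the artificial-cell gene: sequence, promoters, constituent bit,
# and activation percentage, plus the transcription rule for regulating
# (non-constituent) and constituent genes.  All names are illustrative.
from dataclasses import dataclass

@dataclass
class Gene:
    sequence: str          # binary string of the protein the gene encodes
    promoters: list        # proteins required (or inhibiting, if constituent)
    constituent: bool      # True = constitutive gene, False = regulating gene
    activation_pct: float  # minimal promoter concentration for transcription

def transcribes(gene, cytoplasm, total):
    """cytoplasm maps protein -> stored count; total is the number of
    proteins currently stored in the cytoplasm."""
    if total == 0:
        concentration = 0.0
    else:
        concentration = sum(cytoplasm.get(p, 0) for p in gene.promoters) / total
    if gene.constituent:
        # Constituent genes are expressed until inhibited by their promoters.
        return concentration < gene.activation_pct
    # Regulating genes are transcribed once the promoters reach the threshold.
    return concentration >= gene.activation_pct

regulating = Gene("1001", promoters=["0010"], constituent=False,
                  activation_pct=0.5)
cytoplasm = {"0010": 3, "1000": 1}     # stored protein counts
```

With this reading, the same threshold acts as a trigger for regulating genes and as an inhibitor for constituent ones, matching the two cases in the text.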
Figure 2. Logical operators match (a non-constituent gene structure matching the AND truth table over promoter proteins A, B with output C)

This analogous functioning seems to indicate that the system could execute more complex tasks, as ANNs do (Hassoun, 1995).

FUTURE TRENDS

The final objective of this group is to develop an artificial model which is based on the biological model.
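A gene of this kind, transcribing its protein only when both promoter proteins are present, behaves as the logical AND suggested by Figure 2. A tiny sketch with invented protein names:

```python
# A regulating gene with promoters {A, B} transcribes protein C only when
# both A and B are present in the cytoplasm: a logical AND gate.
# Protein names are illustrative.
def gene_and(proteins_present):
    return "C" if {"A", "B"} <= proteins_present else None

# Reproduce the truth table: for each combination of A/B presence,
# record whether C is produced.
table = [(a, b, gene_and({p for p, on in (("A", a), ("B", b)) if on}))
         for a in (0, 1) for b in (0, 1)]
```

Only the (1, 1) row yields the output protein, which is the AND behaviour the figure illustrates.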
Dellaert, F. & Beer, R.D. (1996). A Developmental Model for the Evolution of Complete Autonomous Agents. In From Animals to Animats: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Massachusetts, September 9-13, pp. 394-401, MIT Press.

Eggenberger, P. (1996). Cell Interactions as a Control Tool of Developmental Processes for Evolutionary Robotics. In From Animals to Animats: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Massachusetts, September 9-13, pp. 440-448, MIT Press.

Fernández-Blanco, E., Dorado, J., Rabuñal, J.R., Gestal, M. & Pedreira, N. (2007). A New Evolutionary Computation Technique for 2D Morphogenesis and Information Processing. WSEAS Transactions on Information Science & Applications, vol. 4(3), pp. 600-607, WSEAS Press.

Gruau, F. (1994). Neural Network Synthesis Using Cellular Encoding and the Genetic Algorithm. Doctoral dissertation, Ecole Normale Superieure de Lyon, France.

Hassoun, M.H. (1995). Fundamentals of Artificial Neural Networks. MIT Press, Cambridge, MA, USA.

Hornby, G.S. & Pollack, J.B. (2002). Creating High-Level Components with a Generative Representation for Body-Brain Evolution. Artificial Life, vol. 8, issue 3.

Kauffman, S.A. (1969). Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology, 22, pp. 437-467.

Kitano, H. (1990). Designing neural networks using genetic algorithm with dynamic graph generation system. Complex Systems, vol. 4, pp. 461-476.

Kumar, S. & Bentley, P.J. (editors) (2003). On Growth, Form and Computers. Academic Press, London, UK.

Lindenmayer, A. (1968). Mathematical models for cellular interaction in development: Parts I and II. Journal of Theoretical Biology, vol. 18, pp. 280-299 and pp. 300-315.

Mjolsness, E., Sharp, D.H., & Reinitz, J. (1995). A Connectionist Model of Development. Journal of Theoretical Biology, 176: 291-300.

Stanley, K. & Miikkulainen, R. (2003). A Taxonomy for Artificial Embryogeny. Artificial Life, vol. 9, pp. 93-130. MIT Press.

Turing, A. (1952). The chemical basis of morphogenesis. Philosophical Transactions of the Royal Society B, vol. 237, pp. 37-72.

KEY TERMS

Artificial Cell: Each of the elements that processes the orders codified in the DNA.

Artificial Embryogeny: This term covers all the processing models that take inspiration from ideas of biological development.

Cytoplasm: The part of an artificial cell which is responsible for managing the protein-shaped messages.

DNA: The set of rules which is responsible for the cell's behaviour.

Gene: Each of the rules which codifies one action of the cell.

Gene Regulatory Network: Term that names the connections between the different genes of a DNA. A connection identifies the genes that are necessary for the transcription of other ones.

Protein: This term identifies every kind of message that an artificial cell receives.
At first a population of candidate solutions is generated. A fitness function or objective function is then defined, and each candidate solution in the population is evaluated to determine its performance or fitness. Based on the relative fitness values, two candidate solutions are selected probabilistically as parents. Recombination is then applied probabilistically to the two parents to form two offspring, and each of the offspring solutions contains some characteristics from its parent solutions. After this, mutation is applied sparingly to components of each offspring solution. The newly generated offspring are then used to replace the low-fitness members in the population. This process is repeated until a new population is formed. Through the above iterative cycles of operations, GAs are able to develop better solutions through progressive generations.

In order to prepare for the investigation of the effects of genetic operations in the sequel of the current research, we apply the GA technique to the optimization modeling of manufacturing systems in the next section.

A GA-BASED BATCH SELECTION SYSTEM

Batch selection is one of the most critical tasks in the development of a master production plan for flexible manufacturing systems (FMSs). In the manufacturing process, each product requires processing by different sets of tools on different machines, with different operations performed in a certain sequence. Each machine has its own limited space capacity for mounting tools and a limited amount of available processing time. Under various kinds of resource constraints, choosing an optimal batch of products to be manufactured in a continuous operational process with the purpose of maximizing machine utilization or profits makes the batch selection decision a very hard problem. While this problem is usually manageable for manufacturing a small number of products, it quickly becomes intractable if the number of products grows even slightly large. The time required to solve the problem exhaustively would grow in a non-deterministic polynomial manner with the number of products to be manufactured.

Batch selection affects all the subsequent decisions in job shop scheduling for satisfying the master production plan, and holds the key to the efficient utilization of resources in generating production plans for fulfilling production orders. In our formulation, we use the following denotational symbols:

M: the cardinality of the set of machines available
T: the cardinality of the set of tools available
P: the cardinality of the set of products to be manufactured
MachineUtilization: the function of total machine utilization
processing_time(product, tool, machine): the time needed to manufacture product product using tool tool on machine machine
available_time(machine): the total available processing time on machine machine
capacity(machine): the total number of slots available on machine machine
machine, tool, product: indices for machines, tools, and products to be manufactured, correspondingly
slot(tool): the number of slots required by tool tool
quantity(product): the quantity of product product to be manufactured in a shift
Q(product): the quantity of product product ordered by customers as specified in the production table

Fitness (or Objective) Function

The objective is to identify a batch of products to be manufactured so that the total machine utilization rate will be maximized. See Exhibit A.

The above objective function is to be maximized subject to the following resource constraints:

1. Machine capacity constraint (see Exhibit B)

The above function f(·) is used to determine whether tool tool needs to be mounted on machine machine for the processing of the current batch of products.

2. Machine time constraint (see Exhibit C)
3. Non-negativity and integer constraints

Encoder/Decoder

The Encoder/Decoder is a representation scheme used to determine how the problem is structured in the GA
Genetic Algorithm Applications to Optimization Modeling
Exhibit A.

    Maximize MachineUtilization(quantity_1, quantity_2, …, quantity_P) =

        [ Σ_{machine=1}^{M} Σ_{tool=1}^{T} Σ_{product=1}^{P} processing_time(product, tool, machine) · quantity_product ]
        / [ Σ_{machine=1}^{M} available_time(machine) ]
Exhibit B.

    Σ_{tool=1}^{T} slot(tool) · f( Σ_{product=1}^{P} processing_time(product, tool, 1) · quantity_product ) ≤ capacity(1)
        ⋮
    Σ_{tool=1}^{T} slot(tool) · f( Σ_{product=1}^{P} processing_time(product, tool, M) · quantity_product ) ≤ capacity(M)

    where f(y) = 1 if y > 0, and f(y) = 0 if y = 0.
Exhibit C.

    Σ_{product=1}^{P} Σ_{tool=1}^{T} processing_time(product, tool, 1) · quantity_product ≤ available_time(1)
        ⋮
    Σ_{product=1}^{P} Σ_{tool=1}^{T} processing_time(product, tool, M) · quantity_product ≤ available_time(M)

    quantity_product ≥ 0 and integer, for product = 1, 2, …, P.
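Exhibits A-C can be evaluated directly for a candidate batch. The two-machine, two-tool, two-product instance below is invented for illustration; the index conventions follow the denotational symbols above:

```python
# Evaluate the Exhibit A objective and the Exhibit B/C constraints for a
# candidate batch.  processing_time[m][t][p] follows the paper's
# processing_time(product, tool, machine) with machine-major nesting here.
def machine_utilization(quantity, processing_time, available_time):
    # Exhibit A: total processing time used over total time available.
    used = sum(processing_time[m][t][p] * quantity[p]
               for m in range(len(processing_time))
               for t in range(len(processing_time[m]))
               for p in range(len(quantity)))
    return used / sum(available_time)

def feasible(quantity, processing_time, available_time, slot, capacity):
    f = lambda y: 1 if y > 0 else 0          # tool-mounting indicator
    for m, machine in enumerate(processing_time):
        # Exhibit B: slots taken by mounted tools must fit the machine.
        slots = sum(slot[t] * f(sum(machine[t][p] * quantity[p]
                                    for p in range(len(quantity))))
                    for t in range(len(machine)))
        if slots > capacity[m]:
            return False
        # Exhibit C: total processing time must fit the available time.
        time = sum(machine[t][p] * quantity[p]
                   for t in range(len(machine)) for p in range(len(quantity)))
        if time > available_time[m]:
            return False
    # Non-negativity and integer constraints.
    return all(q >= 0 and isinstance(q, int) for q in quantity)

# M = 2 machines, T = 2 tools, P = 2 products (illustrative numbers).
processing_time = [[[1.0, 0.0], [0.5, 2.0]],   # machine 0: tool x product
                   [[0.0, 1.5], [0.0, 0.0]]]   # machine 1
available_time = [40.0, 30.0]
slot, capacity = [2, 3], [5, 5]
q = [10, 8]
util = machine_utilization(q, processing_time, available_time)
```

A GA would maximize `machine_utilization` over integer vectors `q`, discarding (or penalizing) candidates for which `feasible` is false.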
system. The way in which candidate solutions are encoded is one of the central factors in the success of GAs (Mitchell, 1996). Generally, the solution encoding can be defined over an alphabet which might consist of binary digits, continuous numbers, integers, or symbols. However, choosing the best encoding scheme is almost tantamount to solving the problem itself (Mitchell, 1996). In this research, our GA system is mainly based on Holland's canonical model (Holland, 1992), which is one of the most commonly used encoding schemes in practice: binary encoding.

A candidate solution for the batch selection task is a vector of quantities to be manufactured for P products. Let the entire solution space be denoted as solution (see Exhibit D).

The encoding function encodes the quantity to be produced for each product as an l-bit binary string, and then forms a concatenation of the strings for P products.
Exhibit D.

    solution = ∏_{product=1}^{P} [0, 1, …, Q_product]

Each product quantity is encoded into its l-bit string as

    ⌈ quantity_product · (2^l − 1) / max_{product=1,2,…,P} {Q_product} ⌉

In the above formula, 2^l − 1 is the value of the l-bit string 11…1, and ⌈·⌉ is the ceiling function. For example, assume there are only two products to be selected in a production batch, with 200 units as the largest possible quantity to be manufactured for each product. A candidate solution consisting of quantities 100 and 51 for products 1 and 2, respectively, will be represented by the 16-bit string 0110010000110011, with the first 8 bits representing product 1 and the second 8 bits representing product 2.

After a new solution string is generated, it is then decoded back to the quantity format for the computation of the objective function and for the check of solution feasibility. Let each l-bit segment of a solution string be denoted as string, with string[i] as the value of the ith bit in the l-bit segment. The decoding function converts each l-bit string back to a product quantity accordingly.

Our system follows generation-evaluation-selection-crossover-mutation cycles in searching for appropriate solution strings for the batch selection task. It starts by generating an initial population, Pop, of pop_size candidate solution strings at random. In each iteration of the operational cycle, each candidate solution string, s_i, in the current population is evaluated by the fitness function.

Candidate solution strings in the current population are selected probabilistically on the basis of their fitness values as seeds for generating the next generation. The purpose of selection is to generate offspring of high fitness value on the basis of the fitter members in the current population. Actually, selection is the mechanism that helps our GA system to exploit a promising region in the solution space. There are several fitness-based schemes for the selection process: roulette-wheel selection, rank-based selection, tournament selection, and elitist selection (Goldberg, 1989) (Michalewicz, 1994). The first three methods randomly select candidate solution strings for reproduction on the basis of either the fitness value or the rank of individual strings. The best members of the current population might be lost if they are not selected to reproduce, or if they are altered by crossover (i.e., recombination) or mutation. The elitist selection strategy is for the purpose of retaining some of the fittest individuals from the current population.

Elitist selection retains a limited number of elite solution strings, i.e., strings with the best fitness
values, for passing to the next generation without any modification. A fraction called the generation gap is used to specify the proportion of the population to be replaced by offspring strings after each iteration. Our GA system retains copies of the first (1 − generation_gap) · pop_size elitist members of Pop for the formation of the next population, Pop_new.

For generating the rest of the members for Pop_new, the GA module will probabilistically select

    (generation_gap · pop_size) / 2

pairs of solution strings from Pop for generating offspring strings. The probability of selecting a solution string, s_i, from Pop is given by

    Pr(s_i) = Fitness(s_i) / Σ_{j=1}^{pop_size} Fitness(s_j).

Let the cumulative probability of individual solution strings in the population be called C_i, with

    C_i = Σ_{j=1}^{i} Pr(s_j),

for i = 1, 2, …, pop_size. The solution string s_i will be selected for reproduction if C_(i−1) < rand(0,1) ≤ C_i.

In addition to exploiting a promising solution region via the selection process, we also need to explore other promising regions for possible better solutions. Exploitation without exploration will cause degeneration of a population of solution strings, and might cause the local-optima problem for the system. Actually, the capability of maintaining a balanced exploitation vs. exploration is a major strength of the GA approach over traditional optimization techniques. The exploration function is achieved by the crossover and mutation operators. These two operators generate offspring solutions which belong to new schemata, and thus allow our system to explore other promising regions in a solution space. This process also allows our system to improve its performance stochastically.

Crossover recombines good solution strings in the current population and gradually proliferates the population with schemata of high fitness values. Crossover is commonly regarded as the most distinguishing operator of GAs, and it usually interacts in a highly intractable manner with the fitness function, encoding, and other details of a GA (Mitchell, 1996). Though various crossover operators have been proposed, there are no general conclusions on when to use which type of crossover (Michalewicz, 1994) (Mitchell, 1996).

In this paper, we adopt the standard one-point crossover for our GA system. For each pair of solution strings selected for reproduction, the value of crossover_rate determines the probability of their recombination. A position in both candidate solution strings is randomly selected as the crossover point. The parts of the two parent strings after the crossover position are exchanged to form two offspring. Let k be the crossover point randomly generated from a uniform distribution ranging from 1 to lP, where lP is the length of a solution string. Let s_i = (x_1, x_2, …, x_(k−1), x_k, …, x_lP) and s_j = (y_1, y_2, …, y_(k−1), y_k, …, y_lP) represent a pair of candidate solution strings selected for reproduction. Based on these two strings, the crossover operator generates two offspring s′_i = (x′_1, x′_2, …, x′_lP) and s′_j = (y′_1, y′_2, …, y′_lP), where

    x′_i = x_i, if i < k;  y_i, otherwise
    y′_i = y_i, if i < k;  x_i, otherwise.

In other words, s′_i = (x_1, x_2, …, x_(k−1), y_k, …, y_lP) and s′_j = (y_1, y_2, …, y_(k−1), x_k, …, x_lP). These two offspring are then added to Pop_new. This offspring-generating process is repeated until there are generation_gap · pop_size offspring generated for Pop_new.

With selection and crossover alone, our system might occasionally develop a uniform population which consists of the same solution strings. This will blind our system to other possible solutions. Mutation, which is the other operator applied in the reproduction process, is used to help our system avoid the formation of a uniform population by introducing diversity into the population. It is generally believed that mutation alone does not advance the search for a solution, and it is usually considered as playing a secondary role in the operation of GAs (Goldberg, 1989). Usually, mutation is applied to alter the bit value of a string in a population only occasionally. Let mutation_rate be the probability of mutation for each bit in a candidate solution string. For each offspring string, s′ = (x′_1, x′_2, …, x′_lP), generated by the crossover operator for the new population Pop_new, the mutation operator will invert each bit probabilistically:

    x″_i = 1 − x′_i, if rand(0,1) < mutation_rate;  x′_i, otherwise.

The probability of mutation for a candidate solution string is 1 − (1 − mutation_rate)^lP.

The above processes constitute an operational cycle of our system. These operations are repeated until the termination criterion is reached, and the result is passed to the Decoder for decoding. The decoded result is then presented to the decision maker for further consideration in the final decision. If the current solution is not satisfactory to the decision maker, it can be modified by the decision maker and then entered into the GA system to initiate another run of the search process for satisfactory solutions.

FUTURE TRENDS AND CONCLUSION

In this paper we designed a GA-based system for the batch selection problem of flexible manufacturing systems. In our design we adopted a binary encoding scheme, the elitist selection strategy, a single-point crossover strategy, and a uniform random mutation for the batch selection problem.

The performance of GAs is usually influenced by various parameters and the complicated interactions among them, and there are several issues worth further investigation. With the availability of a larger pool of diverse schemata in a larger population, our GA system will have a broader view of the landscape (Holland, 1992) of the solution space, and is thus more likely to contain representative solutions from a large number of hyperplanes. This advantage gives GAs more chances of discovering better solutions in the solution space. However, Davis (1991) argues that the most effective population size is dependent upon the nature of the problem, the representation formalism, and the GA operators. Still, Schaffer et al. (1991) asserted that the best setting for the population size is independent of the problem. In the sequel of this paper, we will conduct a sequence of experiments to systematically analyze the influence of the population size on GA performance, using the batch-selection model proposed in this paper, so that we can be more conclusive on the issue of the effective population size.

REFERENCES

Arabas, J., & Kozdrowski, S. (2001). Applying an Evolutionary Algorithm to Telecommunication Network Design. IEEE Transactions on Evolutionary Computation, 5(4), 309-322.

Ballester, P.J., & Carter, J.N. (2004). An Effective Real-Parameter Genetic Algorithm with Parent Centric Normal Crossover for Multimodal Optimisation. Proceedings of the 2004 GECCO. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer-Verlag. 901-913.

Davis, L. (Editor) (1991). Handbook of Genetic Algorithms. New York, NY: Van Nostrand Reinhold.

Del Rosso, C. (2006). Reducing Internal Fragmentation in Segregated Free Lists Using Genetic Algorithms. Proceedings of the 2006 ACM Workshop on Interdisciplinary Software Engineering Research.

Deng, P-S. (2007). A Study of the Performance Effect of Genetic Operators. Encyclopedia of Artificial Intelligence, Dopico, J.R.R., de la Calle, J.D. & Sierra, A.P. (Editors), Harrisburg, PA: IDEA.

Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley.

Gordon, M. (1988). Probabilistic and Genetic Algorithms for Document Retrieval. Communications of the ACM, 31(10), 1208-1218.

Grefenstette, J.J. (1993). Introduction to the Special Track on Genetic Algorithms. IEEE Expert. October, 5-8.

Holland, J. (1992). Adaptation in Natural and Artificial Systems. Cambridge, MA: MIT Press.
Mitchell, M. (1996). An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press.

KEY TERMS

Fitness Function: The objective function of the GA for evaluating a population of solutions.
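The operators this article combines (l-bit binary encoding of product quantities, roulette-wheel selection through the cumulative probabilities C_i, one-point crossover, and bit-flip mutation) can be sketched together in one short program. All concrete values are illustrative, and the encoding follows the plain binary reading of the two-product worked example:

```python
import random

random.seed(3)
L = 8                                    # bits per product quantity

def encode(quantities):
    # Concatenate one l-bit binary string per product, as in the
    # two-product example (quantities 100 and 51).
    return "".join(format(q, f"0{L}b") for q in quantities)

def decode(string):
    return [int(string[i:i + L], 2) for i in range(0, len(string), L)]

def roulette_select(pop, fitness):
    # Pick s_i such that C_(i-1) < rand(0,1) <= C_i, where C_i is the
    # cumulative sum of Pr(s_i) = Fitness(s_i) / sum_j Fitness(s_j).
    total = sum(fitness(s) for s in pop)
    r, c = random.random(), 0.0
    for s in pop:
        c += fitness(s) / total
        if r <= c:
            return s
    return pop[-1]

def one_point_crossover(si, sj):
    k = random.randrange(1, len(si))     # crossover point, 1 <= k < lP
    return si[:k] + sj[k:], sj[:k] + si[k:]

def mutate(s, mutation_rate=0.01):
    # Invert each bit with probability mutation_rate.
    return "".join(str(1 - int(b)) if random.random() < mutation_rate else b
                   for b in s)

s = encode([100, 51])                    # -> "0110010000110011"
```

Embedding these operators in the generation-evaluation-selection-crossover-mutation cycle, with an elitist fraction (1 − generation_gap) carried over unchanged, reproduces the system's operational loop.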
[Figure: a wireless sensor network topology showing the sink, cluster heads, and sensor nodes]
Genetic Algorithms for Wireless Sensor Networks
GENETIC ALGORITHMS FOR WIRELESS SENSOR NETWORKS

A WSN designer who takes into account all the design issues deals with more than one non-linear objective function or design criterion that should be optimized simultaneously. Therefore, the focus of the problem is how to find many near-optimal non-dominated solutions in a practically acceptable computational time (Jourdan & de Weck, 2004) (Weise, 2006) (Ferentinos & Tsiligiridis, 2007). There are several interesting approaches to tackling such problems, but one of the most powerful heuristics, which is also appropriate for multi-objective optimization problems, is based on genetic algorithms (GAs) (Ferentinos & Tsiligiridis, 2007).

Genetic algorithms have been used in many fields of science to derive solutions for all types of problems (Goldberg, 1989) (Weise, 2006). They are particularly useful in applications involving design and optimization, where there are large numbers of variables and where procedural algorithms are either non-existent or extremely complicated (Khana, Liu & Chen, 2006) (Khana, Liu & Chen, 2007). In nature, a species adapts to an environment because the individuals that are fittest with respect to that environment have the best chance to reproduce, possibly creating even fitter children. This is the basic idea of genetic evolution. Genetic algorithms start with an initial population of random solution candidates, called individuals or chromosomes. In the case of sensor networks, the individuals are small programs that can be executed on sensor nodes (Wazed, Bari, Jaekel & Bandyopadhyay, 2007).

Each individual may be represented as a simple string or array of genes, which contain a part of the solution. The values of genes are called alleles. As in nature, the population is refined step by step in a cycle of computing the fitness of its individuals, selecting the best individuals and creating a new generation derived from these. A fitness function is provided to assign a fitness value to each individual, based on how close the individual is to the optimal solution. Two randomly selected individuals, the parents, can exchange genetic information in a process called crossover to produce two new chromosomes known as children. A process called mutation may also be applied after crossover; it helps to restore genetic values lost when the population converges too fast. After the crossover and mutation processes the individuals of the next generation are selected. Some of the poorest individuals of the generation can be replaced by the best individuals from the previous generation. This is called elitism, and it ensures that the new generation is at least as fit as the previous generation. The algorithm stops if a predetermined stopping criterion is met (Hussain, Matin & Islam, 2007).

Fitness Function and Specific Parameters for WSNs

The fitness function executed in a sensor node is a weighted function that measures the quality or performance of a solution, in this case a specific sensor network design. This function is maximized by the GA system in the process of evolutionary optimization. A fitness function must include and correctly represent all, or at least the most important, factors that affect the performance of the system. The major issue in developing a fitness function is deciding which factors are the most important ones (Ferentinos & Tsiligiridis, 2007) (Gnanapandithan & Natarajan, 2006).

A genetic algorithm must be designed for WSN topologies by optimizing energy-related parameters that affect the battery consumption of the sensors and thus the lifetime of the network. At the same time, the algorithm has to meet some connectivity constraints and optimize some physical parameters of the WSN implemented by the specific application. The multiple objectives of the optimization problem are blended into a single objective function, whose parameters are combined to formulate a fitness function that gives a quality measure to each WSN topology. Three sets of parameters dominate the design and the performance of a WSN: the application-specific parameters, the connectivity parameters and the energy-related parameters. Some possible parameters are discussed in (Ferentinos & Tsiligiridis, 2007):

- Operation energy: the energy that a sensor consumes during some specific time of operation. It depends on whether the sensor operates as a cluster head or as a regular sensor.
- Communication energy: the energy consumption due to communication between sensors. It depends on the distances between transmitter and receiver.
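The generational cycle described above (fitness evaluation, selection, crossover, mutation and elitism) can be sketched as follows; the weighted energy fitness below is a hypothetical stand-in for a real WSN objective, and all weights and sizes are illustrative:

```python
import random

random.seed(42)

# Hypothetical weighted WSN fitness: each gene marks a node as cluster
# head (1) or regular sensor (0). The energy terms and weights below are
# illustrative, not taken from the cited designs.
def fitness(chromosome):
    heads = sum(chromosome)
    if heads == 0:
        return 0.0                        # no cluster head: connectivity fails
    operation_energy = 2.0 * heads + 1.0 * (len(chromosome) - heads)
    communication_energy = 1.5 * (len(chromosome) / heads)
    return 100.0 / (operation_energy + communication_energy)  # less energy, more fit

def crossover(a, b):
    point = random.randrange(1, len(a))   # single-point crossover
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(c, rate=0.05):
    return [1 - g if random.random() < rate else g for g in c]

def evolve(pop, generations=50, elite=2):
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_gen = pop[:elite]                    # elitism: keep the best as-is
        while len(next_gen) < len(pop):
            p1, p2 = random.sample(pop[:10], 2)   # select among the fittest
            c1, c2 = crossover(p1, p2)
            next_gen += [mutate(c1), mutate(c2)]
        pop = next_gen[:len(pop)]
    return max(pop, key=fitness)

population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
best = evolve(population)
```

Because the elite individuals are copied unchanged into every new generation, the best fitness found can never decrease from one generation to the next, which is exactly the guarantee elitism provides.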
Alberte Castro
University of Santiago de Compostela, Spain
[Figure 1: a typical GA cycle (evaluation, selection, genetic operators)]
Genetic Fuzzy Systems Applied to Ports and Coasts Engineering
genetics. Generally, a simple GA contains three basic operations: selection, genetic operations and replacement. A typical GA cycle is shown in Fig. 1.

In this paper it is shown how a genetic algorithm can be used to optimize a fuzzy system applied to wave reflection analysis at submerged breakwaters. It is a novel approach to estimating the reflection coefficient, since a GA determines the membership functions for each variable involved in the fuzzy system.

ANALYSIS OF WAVE REFLECTION AT SUBMERGED BREAKWATERS WITH A GENETIC FUZZY SYSTEM
distinguish three different groups of genetic FS design processes according to the KB components included in the learning process. These are the following:

- Genetic definition of the Fuzzy System Data Base (Bolata and Now, 1995; Fathi-Torbaghan and Hildebrand, 1994; Herrera and Verdegay, 1995b; Karr, 1991b).
- Genetic derivation of the Fuzzy System Rule Base (Bonarini, 1993; Karr, 1991a; Thrift, 1991).
- Genetic learning of the Fuzzy System Knowledge Base (Cooper and Vidal, 1993; Herrera, Lozano and Verdegay, 1995a; Leitch and Probert, 1994; Lee and Takagi, 1993; Ng and Lee, 1994).

In this paper, we create a Fuzzy System which predicts the reflection coefficient at different models of submerged breakwaters. To do this task, a part of this Fuzzy System, the Data Base, is defined and tuned by a Genetic Algorithm.

SUBMERGED BREAKWATER DOMAIN

Submerged breakwaters are effective shore protection structures against wave action with a reduced visual impact (see fig. 2). To predict the reflection coefficient several parameters have to be taken into account; they are:

Rc: water level above crest.
Hs: significant wave height.
d: water depth.
Tp: peak period, or
Lp: peak wavelength.

These are parameters that connect the submerged breakwater model and the wave. The parameters that identify the submerged breakwater model (see fig. 3) are: the height (h), the crest width (B), n (the cotangent of the breakwater slope α) and the slope nature (smooth or rough). To predict the reflection coefficient these parameters were used, but in many cases dimensionless parameters were used instead of the separate parameters. Many tests were done with different numbers of input variables and different numbers of fuzzy sets for each membership function. Depending on the variables and the number of membership functions, a set of rules was established for each case.

PHYSICAL TEST

A large number of tests have been carried out (Taveira-Pinto, 2001) with different water depths and wave conditions for each model (figure 3 shows the general layout of the tested models). Eight impermeable physical models have been tested with different geometries (crest width, slope), different slope natures (smooth, rough), values of tan α (from 0.20 to 1.00) and n (from 1 to 5) in the old unidirectional wave tank of the Hydraulics Laboratory of the Faculty of Engineering of the University of Porto.
Figure 4. Piece of a chromosome. Xij contains the position of one point (i) of one membership function (j)
The addition of the errors is the fitness function value for the individual. The smaller the total error, the better the individual.

Results

The training set was made up of 24 physical tests and the mean square error in that step was 0.84. The resultant membership functions can be seen in fig. 5. The test set was made up of 11 physical tests and the mean square error in that step was 0.89.
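The encoding and fitness just described, a chromosome holding the points of the membership functions (as in Figure 4) scored by the sum of prediction errors, can be sketched as follows; the fuzzy sets, rule outputs and test records below are illustrative stand-ins, not the article's actual Data Base:

```python
# Hypothetical sketch: a chromosome holds the x-positions of the points
# of triangular membership functions, and its fitness is the sum of
# squared prediction errors over the physical tests. The smaller the
# total error, the better the individual.

def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def predict(chromosome, x):
    """Two fuzzy sets ('low', 'high'), each encoded by 3 chromosome values.

    The rule outputs 0.2 and 0.8 stand in for a real rule base.
    """
    low = triangular(x, *chromosome[0:3])
    high = triangular(x, *chromosome[3:6])
    total = low + high
    if total == 0.0:
        return 0.5                              # no rule fires: midpoint fallback
    return (0.2 * low + 0.8 * high) / total     # weighted-average defuzzification

def fitness(chromosome, tests):
    """Sum of squared errors over (input, reflection coefficient) pairs."""
    return sum((predict(chromosome, x) - kr) ** 2 for x, kr in tests)

# Hypothetical stand-ins for the training records.
training = [(0.1, 0.25), (0.5, 0.5), (0.9, 0.75)]
candidate = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0]      # breakpoints of the two fuzzy sets
error = fitness(candidate, training)            # a GA would minimize this value
```

A GA tuning the Data Base simply evolves the breakpoint vector `candidate`, selecting individuals whose total error over the training tests is smallest.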
- It is a hard task to choose the rule set, and furthermore the system's accuracy depends heavily on this set.
- The more inputs the problem has, the more difficult it becomes to define the rule set.

REFERENCES

Abul-Azm A. G., 1993. Wave Diffraction Through Submerged Breakwaters. Journal of Waterway, Port, Coastal and Ocean Engineering, Vol. 119, No. 6, pp. 587-605.

Baglio S. & Foti E., 2003. Non-invasive measurements to analyze sandy bed evolution under sea waves action. IEEE Transactions on Instrumentation and Measurement, Vol. 52, Issue 3, pp. 762-770.

Blickle, T. (1997). Tournament selection. In T. Bäck, D.G. Fogel, & Z. Michalewicz (Eds.), Handbook of Evolutionary Computation. New York: Taylor & Francis Group.

Bolata F. & Now A., 1995. From fuzzy linguistic specifications to fuzzy controllers using evolution strategies. In Proc. Fourth IEEE International Conference on Fuzzy Systems (FUZZ-IEEE95), Yokohama, pp. 77-82.

Bonarini A., 1993. Learning incomplete fuzzy rule sets for an autonomous robot. In Proc. First European Congress on Fuzzy and Intelligent Technologies (EUFIT93), Aachen, pp. 69-75.

Cooper M. G. & Vidal J. J., 1993. Genetic design of fuzzy logic controllers. In Proc. Second International Conference on Fuzzy Theory and Technology (FTT93), Durham.

Cordón, O., Herrera, F., Hoffmann, F. & Magdalena, L. (2001). Genetic fuzzy systems. World Scientific.

Dingerson L. M., 2005. Predicting future shoreline condition based on land use trends, logistic regression and fuzzy logic. Thesis. The Faculty of the School of Marine Science.

Ergin A., Williams A.T. & Micallef A., 2006. Coastal Scenery: Appreciation and Evaluation. Journal of Coastal Research, Volume 22, Issue 4, pp. 958-964.

Fathi-Torbaghan M. & Hildebrand L., 1994. Evolutionary strategies for the optimization of fuzzy rules. In Proc. Fifth International Conference on Information Processing and Management of Uncertainty in Knowledge Based Systems (IPMU94), Paris, pp. 671-674.

Gezer E., 2004. Coastal Scenic Evaluation: A pilot study for Çıralı. Thesis. The Graduate School of Natural and Applied Sciences of Middle East Technical University.

Herrera F., Lozano M. & Verdegay J. L., 1995a. A learning process for fuzzy control rules using genetic algorithms. Technical Report DECSAI-95108, University of Granada, Department of Computer Science and Artificial Intelligence.

Herrera F., Lozano M. & Verdegay J. L., 1995b. Tuning fuzzy logic controllers by genetic algorithms. International Journal of Approximate Reasoning 12: 293-315.

Holland J. H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.

Karr C., 1991a. Applying genetics. AI Expert, pp. 38-43.

Karr C., 1991b. Genetic algorithms for fuzzy controllers. AI Expert, pp. 26-33.

Kobayashi N. & Wurjanto A., 1989. Wave Transmission Over Submerged Breakwaters. Journal of Waterway, Port, Coastal and Ocean Engineering, Vol. 115, No. 5, pp. 662-680.

Lee M. & Takagi H., 1993. Embedding a priori knowledge into an integrated fuzzy system design method based on genetic algorithms. In Proc. Fifth International Fuzzy Systems Association World Congress (IFSA93), Seoul, pp. 1293-1296.

Leitch D. & Probert P., 1994. Context dependent coding in genetic algorithms for the design of fuzzy systems. In Proc. IEEE/Nagoya University WWW on Fuzzy Logic and Neural Networks/Genetic Algorithms, Nagoya.

Losada I.J., Silva R. & Losada M.A., 1996. 3-D non-breaking regular wave interaction with submerged breakwaters. Coastal Engineering, Volume 28, Number 1, pp. 229-248.

Mamdani, E.H., 1977. Application of fuzzy logic to approximate reasoning using linguistic synthesis. IEEE Transactions on Computers C-26 (12), 1182-1191.

Oliveira S.S., Souza F.J. & Mandorino F., 2006. Using fuzzy logic model for the selection and priorization of port areas to hydrographic re-surveying. Evolutions in Hydrography, Antwerp (Belgium), Proceedings of the 15th International Congress of the International Federation of Hydrographic Societies. Special Publication of the Hydrographic Society, 55: pp. 63-67.

Ross T. J., 2004. Fuzzy Logic with Engineering Applications. John Wiley and Sons.

Taveira-Pinto F., 2001. Analysis of the oscillations and velocity fields in the vicinity of submerged breakwaters under wave action. Ph.D. Thesis, Univ. of Porto (in Portuguese).

Taveira-Pinto F., 2005. Regular water wave measurements near submerged breakwaters. Measurement Science and Technology.

Thrift P., 1991. Fuzzy logic synthesis with genetic algorithms. In Proc. Fourth International Conference on Genetic Algorithms (ICGA91), pp. 509-513.

Van Oosten R. P., Peix J., Van der Meer M.R.A. & Van Gent H.J., 2006. Wave transmission at low-crested structures using neural networks. ICCE 2006, Abstract number 1071.

Yagci O., Mercan D. E. & Kabdasli M. S., 2003. Modelling of Anticipated Damage Ratio on Breakwaters Using Fuzzy Logic. EGS-AGU-EUG Joint Assembly, Nice, France, 6-11 April 2003.

Yagci O., Mercan D. E., Cigizoglu H. K. & Kabdasli M. S., 2005. Artificial intelligence methods in breakwater damage ratio estimation. Ocean Engineering, vol. 32, no. 17-18, pp. 2088-2106.

Zadeh L. A., 1965. Fuzzy sets. Information and Control 8: 338-353.

Zadeh L. A., 1973. Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man and Cybernetics 3: 28-44.
Juan Ríos
Inteligencia Artificial, Facultad de Informática, UPM, Spain
Alfonso Rodríguez-Patón
Inteligencia Artificial, Facultad de Informática, UPM, Spain
Grammar-Guided Genetic Programming
[Figure 1: crossover of two derivation trees for a grammar with non-terminals such as S, E, N and F. A crossover node is selected in each parent and the marked subtrees are swapped; if the crossed nodes carry different non-terminal symbols, the swap yields an invalid production.]
...to the grammar-generated language, as the crossed nodes share the same symbol. This operator's main flaw is that there are other possible choices of node in the second parent that are not explored and that could end in the target solution (Manrique, Marquez, Ríos & Rodríguez-Patón, 2005, pp. 252-261).

[Figure 2: the system of coordinates defined in SCPC, e.g. (), (2), (2,1), (2,1,3)]
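The restriction just described, crossing only nodes that share the same non-terminal symbol so that offspring stay in the grammar-generated language, can be sketched as follows; the Node class and the toy grammar are illustrative, not the article's implementation:

```python
import random

random.seed(7)

class Node:
    """A derivation-tree node labeled with a grammar symbol."""
    def __init__(self, symbol, children=None):
        self.symbol = symbol
        self.children = children or []

    def nodes(self):
        yield self
        for child in self.children:
            yield from child.nodes()

def crossover(parent1, parent2):
    # Pick an internal (non-terminal) node in the first parent.
    cross1 = random.choice([n for n in parent1.nodes() if n.children])
    # Candidate nodes in parent 2 must carry the same non-terminal symbol,
    # which guarantees the swapped subtrees derive from the same rules.
    candidates = [n for n in parent2.nodes() if n.symbol == cross1.symbol]
    if not candidates:
        return parent1, parent2              # no valid swap: keep the parents
    cross2 = random.choice(candidates)
    cross1.children, cross2.children = cross2.children, cross1.children
    return parent1, parent2

# Tiny derivation trees for a grammar E -> E + E | N, N -> digits.
p1 = Node("E", [Node("E", [Node("N", [Node("3")])]),
                Node("+"),
                Node("E", [Node("N", [Node("7")])])])
p2 = Node("E", [Node("N", [Node("4")])])
o1, o2 = crossover(p1, p2)
```

The proposed operator relaxes exactly this restriction: it considers every node of the second parent whose subtree can generate valid offspring when grafted at the chosen point, not only those labeled with the same symbol.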
THE PROPOSED CROSSOVER OPERATOR FOR GGGP SYSTEMS
[Figure 3: average fitness evolution for the five crossover operators: proposed crossover, SCPC, WX, Fair and KX]
the 50 remaining lesions not chosen during the training phase to output the number of correctly classified patterns in what we have called the testing phase.

The CFG employed was formed by 19 non-terminal symbols, 54 terminals and 51 production rules, some of them included to obtain an ambiguous grammar. The population size employed was 1000, and the upper bound for the size of the derivation trees was set to 20. The fitness function consisted of calculating the number of well-classified patterns. Therefore, the greater the fitness, the fitter the individual, with a maximum of 315 in the training phase and 50 in the test.

Figure 3 shows the average evolution process for each of the five crossover operators in the training phase after 100 executions. It is clear from Figure 3 that KX yields the worst results, because it maintains an over-diverse population and allows invalid individuals to be generated. This prevents it from focusing on one possible solution. The effect of Fair is just the opposite, leading very quickly to one of the optimal solutions (this is why it has a relatively high convergence speed initially), and slowing down if convergence is towards a local optimum (which happens in most cases). WX and SCPC produce good results, bettered only by the proposed crossover. Its high convergence speed evidences the benefits of taking into account all possible nodes of the second parent that can generate valid offspring.

Table 1 shows examples of the fuzzy rules output in one of the executions for the best two crossover operators, WX and the proposed operator, once the training phase was complete. Table 2 shows the average number (rounded to the nearest integer) of correctly classified patterns after 100 executions, achieved by the best individual in the training and test phases, and the percentage of times that the system converged prematurely.

KX again yields the worst results, correctly classifying just 57.46% (181/315) of patterns in the training phase and 54% (27/50) in the testing phase. SCPC and Fair crossovers also return insufficient results: around 59% in the training phase and 54%-56% in the testing phase, although, as shown in Figure 3, SCPC has a higher convergence speed. Finally, note the similarity
Table 1. Some knowledge base fuzzy rules output by two GGGP systems
between WX and the proposed operator. However, the proposed operator has a higher speed of convergence and is less likely to get trapped in local optima, as it converged prematurely only twice in 100 executions.

The continuation of the work described in this article can be divided into two main lines of investigation in GGGP. The first involves finding an algorithm that can estimate the maximum sizes of the trees generated throughout the evolution process to assure that the optimal solution will be reached. This would overcome the proposed crossover operator's weakness of not being able to reach a solution because the permitted maximum tree size is too restrictive, whereas a solution could be found if individuals were just a little larger.

The second interesting line of research derived from this work is the use of ambiguous grammars. It has been empirically observed that using the proposed operator combined with ambiguous grammars in GGGP systems benefits convergence speed. However, too much ambiguity is damaging. The idea is to find an ambiguity measure that can answer the question of how much ambiguity is needed to get the best results in terms of efficiency.

CONCLUSION

This article summarizes the latest and most important advances in GGGP, paying special attention to the crossover operator, which (alongside the initialization method, the codification of individuals and, to a lesser extent, the mutation operator) is chiefly responsible for the convergence speed and the success of the evolution process.

GGGP systems are able to find solutions to any problem that can be syntactically expressed by a CFG. The proposed crossover operator provides GGGP systems with a satisfactory balance between exploration and exploitation capabilities. This results in a high convergence speed, while eluding local optima, as the reported results demonstrate. To achieve such good results, the proposed crossover operator includes a computationally cheap mechanism to control bloat, always generates syntactically valid offspring, and can choose any node from the second parent to generate the offspring, rather than just those nodes with the same non-terminal symbol as the one chosen in the first parent.

REFERENCES

Barrios, D., Carrascal, A., Manrique, D. & Ríos, J. (2003). Optimization with real-coded genetic algorithms based on mathematical morphology. International Journal of Computer Mathematics, (80) 3, 275-293.

Couchet, J., Manrique, D., Ríos, J. & Rodríguez-Patón, A. (2006). Crossover and mutation operators for grammar-guided genetic programming. Soft Computing, DOI 10.1007/s00500-006-0144-9.

Crawford-Marks, R. & Spector, L. (2002). Size control via size fair genetic operators in the PushGP genetic programming system. In Proceedings of the Genetic and Evolutionary Computation Conference, New York, 733-739.

D'haeseleer, P. (1994). Context preserving crossover in genetic programming. In IEEE Proceedings of the 1994 World Congress on Computational Intelligence, Orlando, (1) 379-407.

Dounias, G., Tsakonas, A., Jantzen, J., Axer, H., Bjerregaard, B., & von Keyserlingk, D. (2002). Genetic Programming for the Generation of Crisp and Fuzzy Rule Bases in Classification and Diagnosis of Medical Data. Proceedings of the 1st International NAISO Congress on Neuro Fuzzy Technologies, Havana, Cuba, 494-500.

Grosman, B. & Lewin, D.R. (2004). Adaptive Genetic Programming for Steady-State Process Modeling. Computers and Chemical Engineering, 28, 2779-2790.

Hussain, T.S. (2003). Attribute grammar encoding of the structure and behaviour of artificial neural networks. PhD Thesis, Queen's University, Kingston, Ontario, Canada.

Keedwell, E., & Narayanan, A. (2005). Intelligent bioinformatics. Wiley & Sons.

Koza, J.R. (1992). Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge.
Langdon, W.B. (1999). Size fair and homologous tree genetic programming crossovers. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO99, Washington DC, 1092-1097.

Manrique, D. (2001). Diseño de redes de neuronas y nuevas técnicas de optimización mediante algoritmos genéticos [Artificial neural networks design and new optimization techniques using genetic algorithms]. PhD Thesis, Facultad de Informática, Universidad Politécnica de Madrid.

Manrique, D., Márquez, F., Ríos, J. & Rodríguez-Patón, A. (2005). Grammar-based crossover operator in genetic programming. Lecture Notes in Artificial Intelligence, 3562, 252-261.

Rodrigues, E. & Pozo, A. (2002). Grammar-Guided Genetic Programming and Automatically Defined Functions. In Proceedings of the 16th Brazilian Symposium on Artificial Intelligence, Recife, Brazil, 324-333.

Terrio, M.D., & Heywood, M.I. (2002). Directing crossover for reduction of bloat in GP. In IEEE Proceedings of the Canadian Conference on Electrical and Computer Engineering, (2) 1111-1115.

Whigham, P.A. (1995). Grammatically-based genetic programming. In Proceedings of the Workshop on Genetic Programming: From Theory to Real-World Applications, California, 33-41.

Whigham, P.A. (1996). Grammatical bias for evolutionary learning. PhD Thesis, School of Computer Science, Australian Defence Force Academy (ADFA), University College, University of New South Wales.

Yao, X., & Xu, Y. (2006). Recent advances in evolutionary computation. Journal of Computer Science & Technology, (21) 1, 1-18.

KEY TERMS

Ambiguous Grammar: Any grammar in which different derivation trees can generate the same sentence.

Closure Problem: Phenomenon that involves always generating syntactically valid individuals.

Code Bloat: Phenomenon, to be avoided in a genetic programming system's convergence process, involving the uncontrolled growth, in terms of size and complexity, of individuals in the population.

Convergence: Process by means of which an algorithm (in this case an evolutionary system) gradually approaches a solution. A genetic programming system is said to have converged when most of the individuals in the population are equal or when the system cannot evolve any further.

Fitness: Measure associated with individuals in an evolutionary algorithm population to determine how good the solution they represent is for the problem.

Genetic Programming: A variant of genetic algorithms that uses simulated evolution to discover functional programs to solve a task.

Grammar-Guided Genetic Programming: An extension of genetic programming in which the individuals are derivation trees of a context-free grammar, so that the search is restricted to the syntactically valid sentences of the language defined by that grammar.

Intron: Segment of code within an individual (subtree) that does not modify the fitness but does affect the convergence process.
Granular Computing
Georg Peters
Munich University of Applied Sciences, Germany
4. Tolerance for imprecision. Recall the example, Melbourne as a shopping paradise, given above. To define Melbourne as a shopping paradise its exact number of shops is not needed. It is sufficient to know that there are many shops. This tolerance for imprecision often makes a statement more robust and efficient in comparison to exact numerical values.

So obviously humans often prefer not to deal with precise information but favor the vague information that is immanent in natural language. Humans would rarely formulate a sentence like:

"With a probability of 97.34% I will see Ken, who has a height of 1.97m, at 12:05pm."

Instead most humans would prefer to say:

"Around noon I will almost certainly meet tall Ken."

While the first formulation is computer-compatible, since it contains numbers (singletons), the second formulation seems to be too imprecise to be used as input for computers. A central objective of the concept of granular computing is to bridge this gap and compute with words (Zadeh, 1996). This leads to the ideas of information granularity and granular computing, which were introduced by Zadeh (1979, 1986).

The concept of information granularity has its roots in fuzzy set theory (Zadeh, 1965, 1997). Zadeh (1986) advanced and generalized this idea so that granular computing subsumes "any kind of uncertainty and imprecision like set theory and interval analysis, fuzzy sets, rough sets, shadowed sets, probabilistic sets and probability [...], high level granular constructs" (Bargiela, Pedrycz, 2002, p. 5). The term granular computing was first suggested by Lin (1998a, 1998b).

FUNDAMENTALS OF GRANULAR COMPUTING

Singular and Granular Values

To more formally describe the difference between natural language and precise information let us recall the example sentences given in Section 2. The information given in the two sentences can be mapped as depicted in Figure 1. While the first sentence contains exact figures (singletons), the second sentence describes the same context using linguistic variables (granular values). A comparison of the singular and granular values is given in Table 1.

For example, the variable height can be mapped to the singleton 1.97m or the granule tall. The granule tall covers not only the singleton 1.97m but also neighborhood values. See Figure 2 for an interval granulation of the singleton of the variable height; a fuzzy membership function (linguistic variable) would be another possibility for a granule of tall (see Figure 3).

The main difference in the representation of the variable height is entailed by a different formulation of the constraints. While the formulation as a singleton is of bivalent nature (height=1.97m), a fuzzy formulation would contain memberships. This leads to the concept of generalized constraints.

[Figure 2: interval granulation of the height singleton 1.97m into the granules small and tall]
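The two granulations of height described above can be sketched as follows; the interval bounds and membership breakpoints are illustrative choices, not values from the text:

```python
# A crisp interval granule and a fuzzy membership function for the
# linguistic value "tall", both covering the singleton height = 1.97m
# together with its neighborhood values.

def tall_interval(height_m):
    """Interval granulation: tall is simply the interval [1.80m, 2.20m]."""
    return 1.80 <= height_m <= 2.20

def tall_membership(height_m):
    """Fuzzy granulation: membership rises linearly from 1.70m to 1.90m."""
    if height_m <= 1.70:
        return 0.0
    if height_m >= 1.90:
        return 1.0
    return (height_m - 1.70) / 0.20

ken = 1.97    # the singleton from the example sentence
```

Both granules cover the singleton 1.97m together with nearby values, which is what makes the granular statement more robust than the exact figure.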
In contrast to precisiation, granulation leads to an imprecisiation of the information. Obviously the translation "Ken is 1.97m" → "Ken is c meters tall" is a v-imprecisiation, and "Ken is c meters tall" → "Ken is tall" a m-imprecisiation.

So, for example, rough sets can be interpreted as a cascading cg-imprecisiation. In rough set theory (Pawlak, 1982) a set is described by a lower and an upper approximation (LA and UA respectively). The lower approximation is a subset of the upper approximation. While the objects in the lower approximation surely belong to the corresponding set, the objects in the upper approximation only might belong to the set. Therefore rough set theory provides an example of a cascading granulation: X → LA → UA (see Figure 4).

In this section we regard the term granular computing in its literal meaning, how to compute with granules, and focus on principal deduction rules (Zadeh, 2005, 2006b):

- Conjunction
- Projection
- Propagation

For more details on deduction rules the reader is referred to Zadeh (2005, 2006b). The basic rule is:

Y* = f*(X*)
Gr(Y) isr Gr(X)

with Y*, X* and f*() granules. It can be considered as the primary deduction rule, since many other deduction rules can be derived from it (Zadeh, 2006b).

Example

Let us consider an example (Zadeh, 2005, 2006a, 2006b). The following linguistic statement is given:

"Most Swedes are tall", i.e. (Height(Swedes) are tall) is most.

First, the constituent "Swedes are tall" is precisiated as the proportion ∫ X(h)·tall(h) dh, with X(h) the height density function and tall(h) the membership function for the linguistic variable tall. Second, we have to apply the linguistic variable most to the expression "Swedes are tall" and obtain:

Most (Swedes are tall) → most( ∫ X(h)·tall(h) dh )

As a result we get a precise formulation of the given linguistic statement.
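The deduction in the example can be evaluated numerically as follows; the height density X(h), the membership function tall(h) and the quantifier most are all illustrative choices standing in for Zadeh's:

```python
import math

def height_density(h, mean=1.81, sd=0.07):
    """X(h): an assumed normal density of heights (in meters)."""
    return math.exp(-((h - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def tall(h):
    """Membership function of the linguistic value 'tall'."""
    if h <= 1.70:
        return 0.0
    if h >= 1.90:
        return 1.0
    return (h - 1.70) / 0.20

def most(p):
    """Fuzzy quantifier 'most' applied to a proportion p."""
    if p <= 0.5:
        return 0.0
    if p >= 0.9:
        return 1.0
    return (p - 0.5) / 0.4

# Riemann approximation of the integral of X(h)*tall(h) dh.
step = 0.001
proportion = 0.0
for i in range(1000):
    h = 1.4 + i * step          # integrate heights from 1.4m to 2.4m
    proportion += height_density(h) * tall(h) * step

truth = most(proportion)        # degree to which "most Swedes are tall" holds
```

The integral gives the (fuzzy) proportion of tall Swedes, and applying `most` to that proportion yields a numeric truth degree for the whole linguistic statement.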
Zimmermann, H. J. (2001). Fuzzy set theory and its applications. Boston: Kluwer Academic Publishers.

KEY TERMS

Fuzzy Set Theory: Fuzzy set theory was introduced by Zadeh in 1965. The central idea of fuzzy set theory is that an object can belong to more than one set simultaneously. The closeness of the object to a set is indicated by membership degrees.

Generalized Theory of Uncertainty (GTU): GTU is a framework that shall subsume any kind of uncertainty (Zadeh, 2006a). The core idea is to formulate generalized constraints (like possibilistic, probabilistic, veristic etc.). The objective of GTU is not to replace existing theories like probability or fuzzy sets but to provide an umbrella that allows any kind of uncertainty to be formulated in a unique way.

Granular Computing: The idea of granular computing goes back to Zadeh (1979). The basic idea of granular computing is that an object is described by a bunch of values in possible dimensions like indistinguishability, similarity and proximity. If a granule is labeled by a linguistic expression it is called a linguistic variable. Zadeh (2006a) defines granular computing as "a mode of computation in which the objects of computation are generalized constraints".

Hybridization: Combination of methods like probabilistic, fuzzy, rough concepts, or neural nets, e.g. fuzzy-rough, rough-fuzzy, probabilistic-rough, or fuzzy-neural approaches.

Linguistic Variable: A linguistic variable is a linguistic expression (one or more words) labeling an information granule. For example, a membership function is labeled by expressions like "hot temperature" or "rich customer".

Membership Function: A membership function shows the membership degrees of a variable to a certain set. For example, a temperature t=30°C belongs to the set "hot temperature" with a membership degree μHT(30)=0.8. Membership functions are not objective but context- and subject-dependent.

Rough Set Theory: Rough set theory was introduced by Pawlak in 1982. The central idea of rough sets is that some objects are distinguishable while others are indiscernible from each other.

Soft Computing: In contrast to hard computing, soft computing is a collection of methods (fuzzy sets, rough sets, neural nets etc.) for dealing with ambiguous situations like imprecision and uncertainty, e.g. human expressions like "high profit at reasonable risk". The objective of applying soft computing is to obtain robust solutions at reasonable costs.
Consuelo Gonzalo
Technical University of Madrid, Spain
Estíbaliz Martínez
Technical University of Madrid, Spain
Águeda Arquero
Technical University of Madrid, Spain
INTRODUCTION

Currently, many research areas produce large multivariable datasets that are difficult to visualize in order to extract useful information. Kohonen self-organizing maps have been used successfully in the visualization and analysis of multidimensional data. In this work, a projection technique that compresses multidimensional datasets into a two-dimensional space using growing self-organizing maps is described. With this embedding scheme, traditional Kohonen visualization methods have been implemented using growing cell structures networks. The new graphical map displays have been compared with Kohonen graphs using two groups of simulated data and one group of real multidimensional data selected from a satellite scene.

BACKGROUND

The first stage of data mining usually consists of building simplified global overviews of data sets, generally in graphical form (Tukey, 1977). At present, the huge amount of information and its multidimensional nature complicate the use of direct graphic representation techniques. Self-Organizing Maps (Kohonen, 1982) fit well in exploratory data analysis since their principal purpose is the visualization and analysis of nonlinear relations between multidimensional data (Rossi, 2006). In this sense, a great variety of Kohonen's SOM visualization techniques (Kohonen, 2001) (Ultsch & Siemon, 1990) (Kraaijveld, Mao & Jain, 1995) (Merkl & Rauber, 1997) (Rubio & Giménez, 2003) (Vesanto, 1999), as well as some automatic map analyses (Franzmeier, Witkowski & Rückert, 2005), have been proposed.

In Kohonen's SOM the network structure has to be specified in advance and remains static during the training process. The choice of an inappropriate network structure can degrade the performance of the network. Some growing self-organizing maps have been implemented to avoid this disadvantage. In (Fritzke, 1994), Fritzke proposed the Growing Cell Structures (GCS) model, with a fixed dimensionality associated to the output map. In (Fritzke, 1995), the Growing Neural Gas (GNG) is presented, a new SOM model that learns topology relations. Even though GNG networks achieve a better degree of topology preservation than GCS networks, due to the multidimensional nature of the output map they cannot be used to generate graphical map displays in the plane. However, using the GCS model it is possible to create networks with a fixed dimensionality lower than or equal to 3 that can be projected onto a plane (Fritzke, 1994). The GCS model, without removal of cells, has been used to compress biomedical multidimensional data sets to be displayed as two-dimensional colour images (Walker, Cross & Harrison, 1999).

GROWING CELL STRUCTURES VISUALIZATION

This work studies GCS networks to obtain an embedding method to project the bi-dimensional output
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Growing Self-Organizing Maps for Data Analysis
map, with the aim of generating several graphic map displays for exploratory data analysis during and after the self-organization process.

Growing Cell Structures

The visualization methods presented in this work are based on the self-organizing map architecture and learning process of Fritzke's Growing Cell Structures (GCS) network (Fritzke, 1994). The GCS network architecture consists of connected units forming k-dimensional hypertetrahedron structures linked between them. The interconnection scheme defines the neighbourhood relationships. During the learning process, new units are added and superfluous ones are removed, but these modifications are performed in such a way that the original architecture structure is maintained.

The training algorithm is an iterative process that performs a non-linear projection of the input data over the output map, trying to preserve the topology of the original data distribution. The self-organization process of GCS networks is similar to that of Kohonen's model. For each input signal the best matching unit (bmu) is determined, and the synaptic vectors of the bmu and its direct neighbours are modified. In GCS networks each neuron has an associated resource, which can represent the number of input signals received by the neuron, or the summed quantization error caused by the neuron. In every adaptation step the resource of the bmu is conveniently modified. After a fixed number of adaptation steps, a new neuron is inserted between the unit with the highest resource, q, and its direct neighbour with the most different reference vector, f. The new unit's synaptic vector is interpolated from the synaptic vectors of q and f, and the resource values of q and f are redistributed too. In addition, neighbouring connections are modified in order to preserve the output architecture structure. Once all the training vectors have been processed a fixed number of times (an epoch), the neurons whose reference vectors fall into regions with a very low probability density are removed. To guarantee the architecture structure some neighbouring connections are modified too. The relative normalized probability density estimation value proposed in (Delgado, 2004) has been used in this work to determine the units to be removed. This value provides a better interpretation of some training parameters, improving the removal of cells and the topology preservation of the network. Several separated meshes can appear in the output map when superfluous units are removed.

When the growing self-organization process finishes, the synaptic vectors of the output units, along with the neighbouring connections, can be used to analyze different input space properties visually.

Network Visualization: Constructing the Topographic Map

The ability to project high-dimensional input data onto a low-dimensional grid is an important property of Kohonen feature maps. By drawing the output map over a plane it is possible to visualize complex data and discover properties or relations of the input vector space not expected in advance. The output layer of Kohonen feature maps can be printed on a plane easily, painting a rectangular grid where each cell represents an output neuron and neighbouring cells correspond to neighbouring output units.

GCS networks have less regular output unit connections than Kohonen ones. When the k=2 architecture factor is used, the GCS output layer is organized in groups of interconnected triangles. In spite of the bi-dimensional nature of these meshes, it is not obvious how to embed this structure into the plane in order to visualize it. In (Fritzke, 1994), Fritzke proposed a physical model to construct the bi-dimensional embedding during the self-organization process of the GCS network. Each output neuron is modelled by a disc, with diameter d, made of elastic material. Two discs with distance d between centres touch each other, and two discs with distance smaller than d repel each other. Each neighbourhood connection is modelled as an elastic string: two discs that are connected but not touching are pulled towards each other. Finally, all discs are positively charged and repel each other. Using this model, the bi-dimensional topographic coordinates of each output neuron can be obtained and, thus, the bi-dimensional output meshes can be printed on a plane.

In order to obtain the output units' bi-dimensional coordinates of the topographic map (for k=2), a slightly modified version of this physical model has been used in this contribution. At the beginning of the training process, the initial three output neurons are placed in the plane in a triangle form. Each time a new neuron is inserted, its position in the plane is located exactly halfway between the positions of the two neighbouring neurons between which it has been inserted. After this, attraction
and repulsion forces are calculated for every output neuron and its position is moved accordingly. The attraction force of a unit is calculated as the sum of the individual attraction forces that all its neighbouring connections exert over it. The attraction force between two neighbouring neurons i and j, with coordinates pi and pj in the plane and Euclidean distance e, is calculated as (e-d)/2 if e ≥ d, and 0 otherwise. The repelling force of a unit is calculated as the sum of the individual repulsion forces that all non-neighbouring output neurons exert over it. The repelling force between two non-neighbouring neurons i and j is calculated as d/5 if 2d < e ≤ 3d, d/2 if d < e ≤ 2d, d if 0 < e ≤ d, and 0 otherwise. There exist three basic differences between the embedding model used in this work and Fritzke's one. First, the repelling force is only calculated with non-neighbouring units. Second, the attracting force between two neurons i and j is multiplied by the distance normalization ((pj-pi)/e) and by the attraction factor 0.1 (instead of 1). Last, the repelling force between two neurons i and j is multiplied by the distance normalization ((pi-pj)/e) and by the repulsion factor 0.05 (instead of 0.2).

The result of applying this projection method is shown in Fig. 1. When removal of cells is performed, different meshes are shown unconnectedly. Without any other additional information, this projection method makes cluster detection possible.

Visualization Methods

Using the projection method exposed, traditional Kohonen visualization methods can be implemented using GCS networks with k=2. Each output neuron is painted as a circle in a colour determined by a given parameter. When greyscale is used, dark and clear tones are normally associated with high and low values, respectively. The grey scales are relative to the maximum and minimum values taken by the parameter. The nature of the data used to calculate the parameter determines three general types of methods for performing visual analysis of self-organizing maps: distances between synaptic vectors, training pattern projections over the neurons, and individual information about synaptic vectors.

All the experiments have been performed using two groups of simulated data and one group of real multidimensional data (Fig. 2) selected from a scene registered by the ETM+ sensor (Landsat 7). The input signals are defined by the six ETM+ spectral bands with
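As a rough sketch (not the authors' implementation), one relaxation step of the modified disc model above can be written as follows; the node and edge representations are assumptions, while the force rules are those just described: attraction (e-d)/2 for e ≥ d scaled by 0.1, and the piecewise repulsion d, d/2, d/5 scaled by 0.05.

```python
import math

def relax_step(pos, edges, d=1.0, att=0.1, rep=0.05):
    """One relaxation step of the disc model used to embed a k=2 GCS
    output mesh in the plane. pos: {node: (x, y)}; edges: set of
    frozenset({i, j}) neighbourhood connections."""
    force = {v: [0.0, 0.0] for v in pos}
    nodes = list(pos)
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            i, j = nodes[a], nodes[b]
            dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
            e = math.hypot(dx, dy) or 1e-12
            ux, uy = dx / e, dy / e          # distance normalization (pj-pi)/e
            if frozenset((i, j)) in edges:
                # attraction between connected discs: (e-d)/2 if e >= d
                f = att * (e - d) / 2 if e >= d else 0.0
                force[i][0] += f * ux; force[i][1] += f * uy
                force[j][0] -= f * ux; force[j][1] -= f * uy
            else:
                # piecewise repulsion between non-neighbouring discs
                if e <= d:
                    f = d
                elif e <= 2 * d:
                    f = d / 2
                elif e <= 3 * d:
                    f = d / 5
                else:
                    f = 0.0
                f *= rep
                force[i][0] -= f * ux; force[i][1] -= f * uy
                force[j][0] += f * ux; force[j][1] += f * uy
    return {v: (pos[v][0] + fx, pos[v][1] + fy) for v, (fx, fy) in force.items()}
```

A newly inserted unit would be given the midpoint of its two parent neighbours as its initial position, and `relax_step` would then be iterated until the layout settles.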
Figure 1. Output mesh projection during different self-organization process stages of a GCS network trained
with bi-dimensional vectors distributed on eleven separate regions.
Figure 2. (a) Eleven separate regions in the bi-dimensional plane. (b) Two three-dimensional chain links. (c) Projection of the multidimensional satellite image data (axes: TM5, TM7, TM4).
the same spatial resolution: TM1 to TM5, and TM7. The input data set has a total of 1800 pixels, 1500 carefully chosen from the original scene and 300 randomly selected. The input vectors are associated to six land cover categories.

Displaying Distances

The adaptation process of GCS networks places the synaptic vectors in regions with high probability density, removing units positioned in regions with a very low probability density. A graphical representation of the distances between the synaptic vectors is therefore a useful tool to detect clusters in the input space. The distance map, unified distance map (U-map), and distance addition map have been implemented to represent distance information with GCS networks.

In the distance map, the mean distance between the synaptic vector of each neuron and the synaptic vectors of all its direct neighbours is calculated. The U-map represents the same information as the distance map but, in addition, it includes the distance between all the neighbouring neurons (painted in a circle form between each pair of neighbour units). Finally, when the distance addition map is generated, the sum of the distances between the synaptic vector of a neuron and the synaptic vectors of the rest of the units is calculated. In the distance map and U-map, dark zones represent clusters and clear zones the boundaries between them. In the distance addition map, neurons with near synaptic vectors appear with a similar colour, and boundaries can be detected by analyzing the regions where a considerable colour variation exists. Using GCS networks, separated meshes usually represent different input clusters. Fig. 3 shows an example of these three graphs, compared with the traditional Kohonen maps, when the eleven-separate-regions distribution data set is used. The GCS network clearly represents eleven clusters in all three graphs. The distance map and U-map of the Kohonen network show the eleven clusters too, but in its distance addition map it is not possible to distinguish them.

Displaying Projections

This technique takes into account the input distribution patterns to generate the different values assigned to each neuron. For GCS networks, data histograms and quantization error maps have been implemented.

To generate the histogram, the number of training patterns associated to each neuron is obtained. When the quantization error graph has to be produced, the sum of the distances between the synaptic vector of a neuron and the input vectors that lie in its Voronoi region is calculated instead. In both graphs, dark and clear zones correspond to high and low probability density areas, respectively, so they can be used in cluster analysis. Fig. 4 shows an example of these two methods compared with those obtained using Kohonen's model when the chain-link distribution data set is used. Using Kohonen's model it is difficult to distinguish the number of clusters present in the input space. On the other hand, the GCS model has generated three output meshes, two of them representing one ring each.
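The per-neuron parameters described above can be sketched in code. The following is an illustrative reconstruction (not the authors' implementation; the dictionary-based data structures are assumptions): the distance map as the mean neighbour distance, the distance addition map as the summed distance to all other units, and the projection graphs as hit counts and summed quantization error per best matching unit.

```python
import math

def distance_map(w, neigh):
    """Mean distance from each neuron's synaptic vector to those of its
    direct neighbours. w: {neuron: vector}; neigh: {neuron: [neighbours]}."""
    return {i: sum(math.dist(w[i], w[j]) for j in neigh[i]) / len(neigh[i])
            for i in w}

def distance_addition_map(w):
    """Sum of distances from each neuron's synaptic vector to the
    synaptic vectors of all the other units."""
    return {i: sum(math.dist(w[i], w[j]) for j in w if j != i) for i in w}

def projections(w, patterns):
    """Data histogram (training patterns per best matching unit) and
    quantization error (summed distance from each unit to the input
    vectors lying in its Voronoi region)."""
    hits = {i: 0 for i in w}
    qerr = {i: 0.0 for i in w}
    for x in patterns:
        bmu = min(w, key=lambda i: math.dist(w[i], x))
        hits[bmu] += 1
        qerr[bmu] += math.dist(w[bmu], x)
    return hits, qerr
```

Rendering would then map each per-neuron value onto a grey tone relative to the minimum and maximum of that parameter, as described above.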
Figure 3. From left to right: distance map, U-map (unified distance map), and distance addition map when an
eleven separate regions distribution data set is used. (a) Kohonen feature map with 10x10 grid of neurons. (b)
GCS network with 100 output neurons. The right column shows the input data and the network projection using
the two component values of the synaptic vectors.
Figure 4. From left to right: Unified distance map, data histograms and quantization error maps when chain-link
distribution data set is used. (a) Kohonen feature map with 10x10 grid of neurons. (b) GCS network with 100
output neurons. The right column shows the input data and the network projection using the three component
values of the synaptic vectors.
Figure 5. GCS network trained with multidimensional data of satellite image, 54 output neurons. Graphs from
(a) to (f) show the component planes for the six elements of the synaptic vectors. (g) Direct visualization map
using distance addition map additional information.
plane graphs exhibit possible dependences involving the TM1, TM2 and TM3 input vector components and, likewise, the TM5 and TM7 components.

Results

Several Kohonen and GCS networks have been trained in order to evaluate and compare the resulting visualization graphs. For the sake of space only a few of these maps have been included here. Fig. 3 and Fig. 4 compare Kohonen and GCS visualizations using the distance map, U-map, distance addition map, data histograms and quantization error map. It can be observed that the GCS model offers much better graphical results in cluster analysis than the Kohonen networks. The removal of units and connections inside low probability distribution areas means that the GCS network presents, within a particular cluster, the same quality of information that the Kohonen network offers in relation to the entire map. As has already been mentioned, the grey scale used in all the maps is relative to the maximum and minimum values taken by the studied parameter. In all the cases, the range of values taken by the calculated factor using GCS is smaller than when using Kohonen maps.

The exposed visualization methods applied to the visual analysis of multidimensional satellite data have given very satisfactory results (Fig. 5). All trained GCS networks have been able to generate six sub-maps in the output layer (in some cases up to eight) that identify the six land cover classes present in the data sample. The direct visualization map and the component plane graphs have proven to be a useful tool for the extraction of knowledge from the multisensorial data.

FUTURE TRENDS

The proposed knowledge visualization method based on GCS networks has proved to be a useful tool for multidimensional data analysis. In order to evaluate the quality of the trained networks, we consider it necessary to develop some measurement techniques (qualitative and quantitative, in numerical and graphical format) to analyze the topology preservation obtained. In this way we will be able to validate the information visualized by the methods presented in this paper.

It would also be interesting to validate these visualization methods with new data sets of a very high dimensional nature. We need to study the viability of cluster analysis with this projection technique when this class of data samples is used.

CONCLUSION

The exposed embedding method allows multidimensional data to be displayed as two-dimensional grey images. The visual-spatial abilities of human observers can explore these graphical maps to extract interrelations and characteristics in the dataset.

In the GCS model the network's size does not have to be specified in advance. During the training process, the size of the network grows and decreases, adapting its architecture to the particular characteristics of the training dataset.

Although in GCS networks it is necessary to determine a greater number of training factors than in the Kohonen model, using the modified learning model the tuning of the training factor values is simplified. In fact, several experiments have been made on datasets of diverse nature using the same values for all the training factors, giving excellent results in all the cases.

Especially notable is the cluster detection during the self-organization process without any other additional information.

REFERENCES

Delgado S., Gonzalo C., Martínez E., & Arquero A. (2004). Improvement of Self-Organizing Maps with Growing Capability for Goodness Evaluation of Multispectral Training Patterns. IEEE International Proceedings of the Geoscience and Remote Sensing Symposium, 1, 564-567.

Franzmeier M., Witkowski U., & Rückert U. (2005). Explorative data analysis based on self-organizing maps and automatic map analysis. Lecture Notes in Computer Science, 3512, 725-733.

Fritzke, B. (1994). Growing Cell Structures: A self-organizing network for unsupervised and supervised learning. Neural Networks, 7(9), 1441-1460.

Fritzke, B. (1995). A growing neural gas network learns topologies. Advances in Neural Information Processing Systems, 7, 625-632.
GTM User Modeling for aiGA Weight Tuning in TTS Synthesis

Francesc Alías
Universitat Ramon Llull, Spain
C^t(t_i, u_i) = \sum_{j=1}^{p} w_j^t C_j^t(t_i, u_i)    (2)

C^c(u_{i-1}, u_i) = \sum_{j=1}^{q} w_j^c C_j^c(u_{i-1}, u_i)    (3)

where t_1^n represents the target unit sequence {t_1, t_2, ..., t_n} and u_1^n represents the candidate unit sequence {u_1, u_2, ..., u_n}.

C_j^t(t_i, u_i) = 1 - e^{-\Delta P_j(t_i)^2 / (2\,\Delta P_j^2)}    (4)

C_j^c(u_{i-1}, u_i) = 1 - e^{-|P_j^R(u_{i-1}) - P_j^L(u_i)| / \Delta P_j}    (5)

Appropriate design of the cost function by means of weight training is crucial to obtain high quality synthetic speech (Black, 2002). Nevertheless, this concern has produced approaches with no unique response. Several techniques have been suggested for weight tuning, which may be split into three families: i) manual tuning, ii) computationally-driven purely objective methods, and iii) perceptually optimized techniques (Alías, Llorà, Formiga, Sastry & Goldberg, 2006). The present review focuses on the techniques based on human feedback to the training process, following previous work (Alías, Llorà, Formiga, Sastry & Goldberg, 2006), which is outlined in the next section.

The Approach: Interactive Evolutionary Weight Tuning

Computationally-driven purely objective methods are mainly focused on an acoustic measure (obtained from cepstral distances) between the resynthesized and the natural signals. Hunt and Black adopted two approaches in (Hunt & Black, 1996). The first approach was based on adjusting the weights through an exhaustive search of a prediscretized weight space (weight space search, WSS). The second approach proposed by the authors used a multilinear regression technique (MLR) across the entire database to compute the desired weights. Later, Meron and Hirose (Meron & Hirose, 1999) presented a methodology that improved the efficiency of the WSS and refined the MLR method. A previous work (Alías & Llorà, 2003) introduced evolutionary computation to perform this tuning. More precisely, Genetic Algorithms (GA) were applied to obtain the most appropriate weights. The main added value of making use of GA to find the optimal weight configuration is its independence from linear search models (as in MLR) and, in addition, it avoids the exhaustive search (as in WSS).

However, all these methods share a dependency on the acoustic measure to determine the actual quality of the synthesized speech, which for the most part is relative to human hearing. To obtain better speech quality, it was suggested that the user should take part in the process. In (Alías, Llorà, Iriondo, Sevillano, Formiga & Socoró, 2004), preference tests were conducted by synthesizing the training text according to two different weight sets and comparing the obtained subjective speech quality. Subsequently, Active Interactive Genetic Algorithms (aiGA) were presented in (Llorà, Sastry, Goldberg, Gupta & Lakshmi, 2005) as an interactive evolutionary computation method where user feedback evolves the solutions through a survival-of-the-fittest mechanism. The solutions' inherent fitness is based on the partial order provided by the evaluator; active iGAs base their efficiency on evolving different solutions by means of a surrogate fitness, which generalizes the user preferences. This surrogate fitness and the evolutionary process are based on the following key elements: i) partial ordering, ii) induced complete order, and iii) a surrogate function via Support Vector Machines (ε-SVM). Preference decisions made by the user are modelled as a directed graph which is used to generate a partial ordering of solutions (e.g. x1 > x2; x2 > x3: x1 > x2 > x3) (see figure 1). Table 1 shows the approach of global ranking based on a dominance measure: given a vertex v, the number of dominated vertexes δ(v) and dominating vertexes φ(v) is computed. Using these measures, the estimated fitness may be computed as f(v) = δ(v) - φ(v). The estimated ranking r(v) is obtained by sorting based on f(v) (Llorà, Sastry, Goldberg, Gupta & Lakshmi, 2005). The procedure of aiGA is detailed in Algorithm 1.

However, once the global weights were obtained with aiGA, there was no single dominant weight solution (Alías, Llorà, Formiga, Sastry & Goldberg, 2006), i.e. each test performed by different users gave similar but different solutions. This fact implied that a second group of users had to validate the obtained weights.
Figure 1. (Left) Partial evaluations allow building a directed graph among all solutions. (Right) Obtained graph
must be cleared to avoid cycles and draws.
Table 1. Estimation of the global ranking based on the dominance measure. Dur_T, Ene_T and Pit_T stand for the values of the target weights (duration, energy and pitch). In the same way, Pit_C, Ene_C and Mfc_C stand for the values of the concatenation weights (pitch, energy and Mel Frequency Cepstrum).
Thus, the clustering of solutions from different tests was suited to the weight tuning problem, with the goal of extracting consistent results from the user tests.

GENERATIVE TOPOGRAPHIC MAPPING BEING A PLUS

GTM in a Nutshell
Algorithm 1. Description of active iGA

procedure aiGA()
1. Create an empty directed graph G.
2. Create 2^h random initial solutions (the set S).
3. Create the hierarchical tournament set T using the available solutions in S.
4. Present the tournaments in T to the user and update the partial ordering in G.
5. Estimate r(v) for each v in S.
6. Train the surrogate ε-SVM synthetic fitness based on S and r(v).
7. Optimize the surrogate ε-SVM synthetic fitness with cGA.
8. Create the set S' with 2^(h-1) different solutions, where S ∩ S' = ∅, sampling out of the probabilistic model evolved by cGA.
9. Create the hierarchical tournament set T' with 2^(h-1) tournaments using 2^(h-1) solutions in S and 2^(h-1) solutions in S'.
10. S ← S ∪ S'.
11. T ← T ∪ T'.
12. Go to 4 while not converged.
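The rank estimation of step 5 (the dominance measure of Table 1, with f(v) = δ(v) - φ(v)) can be sketched as follows; this is an illustrative reconstruction, and the edge-list representation of the user preference graph is an assumption:

```python
def estimated_ranking(edges, vertices):
    """edges: (winner, loser) preference decisions forming a directed
    graph. delta(v) counts the vertexes v dominates (reachable from v),
    phi(v) counts the vertexes dominating v; f(v) = delta(v) - phi(v)."""
    succ = {v: set() for v in vertices}
    for winner, loser in edges:
        succ[winner].add(loser)

    def reach(v):
        # transitive closure: every vertex dominated by v
        seen, stack = set(), [v]
        while stack:
            for n in succ[stack.pop()]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen

    dom = {v: reach(v) for v in vertices}
    f = {v: len(dom[v]) - sum(v in dom[u] for u in vertices)
         for v in vertices}
    # sorting by estimated fitness yields the estimated ranking r(v)
    return sorted(vertices, key=lambda v: f[v], reverse=True), f
```

For the example in the text (x1 > x2; x2 > x3), this yields the ranking x1, x2, x3, which the ε-SVM surrogate is then trained to reproduce.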
methods perform this grouping (Figueiredo & Jain, 2002): Expectation Maximization (EM), k-means, Gaussian Mixture Models (GMM), Self-Organizing Maps (SOM) and Generative Topographic Mapping, among others.

Techniques may be grouped, according to (Figueiredo & Jain, 2002), into two types of formulation: i) model-based methods (e.g. GMM, EM, GTM) and ii) heuristic methods (e.g. k-means or hierarchical agglomerative methods). The number of sources generating the data is the differentiating property. Indeed, model-based methods suppose that each observation has been produced by one (arbitrarily chosen and unidentified) source out of a set of alternative random sources. Therefore, inferring these tuned sources and mapping a source to each observation leads to a clustering of the set of observations. In contrast, heuristic methods assume only one source for the observed data, considering similar heterogeneity for all of them.

Self-Organizing Maps (or Kohonen maps) (Kohonen, 1990) are a clustering technique based on neural networks. The ease of visualizing multidimensional data is the most appreciated added value of SOM. In addition, Generative Topographic Mapping is a nonlinear latent variable model introduced in (Bishop, Svensen & Williams, 1998). GTM intends to give an alternative answer to SOM by overcoming its restrictions, which are listed in (Kohonen, 2006): i) the absence of a cost function, ii) the lack of a theoretical basis for choosing learning rate parameter schedules and neighbourhood parameters to ensure topographic ordering, iii) the absence of any general proofs of convergence, and iv) the fact that the model does not define a probability density.

GTM is based on a constrained mixture of Gaussians whose parameters can be tuned through the EM algorithm. The handicap of heuristic-based models is that there is no a-priori distribution of the centroids for each cluster. In GTM, the set of latent points is modelled as a grid. Each point in the grid is a circular Gaussian distribution with its equivalent correspondence, through weighted non-linear basis functions, onto the multidimensional space. Thus, the grid is shaped to wrap the data due to the explicit order among the Gaussian distributions.
Modelling User Preferences by GTM

GTM is able to extract solutions from the different aiGA evolved graphs due to the consistency of its theoretical basis. The key objective is to recognize important clusters inside the evolved data space and, therefore, to determine the fitness entropy of each cluster in terms of fitness variance in order to choose the global weight configuration set.

GTM can map the best aiGA weights from the multidimensional weight space into a two-dimensional space. Taking into account the cluster with the highest averaged fitness and lowest standard deviation allows selecting the best weight configuration from the different users' aiGA models. To adjust this method, the geometry of the Gaussian distributions and the size of the latent space have to be set up manually. EM fits the GTM centroids and the basis function weights. Then, the average fitness, as well as its standard deviation, is extracted from each cluster. The averaged fitness and standard deviation are computed from the set of weight combinations whose Bayesian a posteriori probability is highest for the cluster.

It is worth noting that the fitness itself is not involved in the EM optimization, because it is relative to each user and is not known for unevaluated weight combinations for one specific user (unless ε-SVM predicted).

Experiments and Results

In (Formiga & Alías, 2007), common knowledge was extracted from user-evolved weights from previous tests conducted on a Catalan speech corpus with 9863 recorded units (1207 diphones and triphones) (obtained from 1520 sentences and words) (Alías, Llorà, Formiga, Sastry & Goldberg, 2006). In that test, five phonetically balanced sentences were extracted from the corpus to perform the global weight tuning process via a web interface named SinEvo (Alías, Llorà, Iriondo, Sevillano, Formiga & Socoró, 2004). The evolved weights were normalized through Max-Min normalization so that all weights range between 0 and 1. That test was conducted by three users, obtaining fifteen different weight configurations.
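The cluster-selection rule described above can be sketched as follows. This is a hypothetical reconstruction: the posterior responsibilities per weight configuration and the per-configuration fitness values are assumed to be given (the former would come from the fitted GTM, the latter from the aiGA surrogate).

```python
import statistics

def best_cluster_weights(resp, fitness):
    """resp: {config: [posterior probability per latent cluster]};
    fitness: {config: fitness value}. Assign each weight configuration
    to its most probable cluster, then choose the cluster with the
    highest mean fitness, breaking ties by lower standard deviation."""
    clusters = {}
    for cfg, post in resp.items():
        k = max(range(len(post)), key=post.__getitem__)
        clusters.setdefault(k, []).append(cfg)

    def score(k):
        fs = [fitness[c] for c in clusters[k]]
        std = statistics.pstdev(fs) if len(fs) > 1 else 0.0
        return (sum(fs) / len(fs), -std)   # high mean first, low std second

    best = max(clusters, key=score)
    return best, clusters[best]
```

The members of the winning cluster are the weight configurations proposed as the consensus across users.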
Figure 2. Performance of GTM: Different pareto fronts were analyzed for each configuration
In (Formiga & Alías, 2007), different configurations of GTM were analyzed for mapping the normalized weights (hexagonal or rectangular grid and different grid sizes: 3x3, 4x4, 5x5). The purpose of this analysis was to find the optimal GTM configuration, i.e. the one which minimizes the averaged standard deviation (std) per cluster and the number of significant clusters per population (with averaged fitness over 75%), while maximizing the averaged mean fitness per cluster. As may be noticed in figure 2, the 4x4 grid configuration with hexagonal latent grid was selected, as it yielded the best Pareto front (although the rest of the 4x4 grids achieved similar performance).

After GTM was set up, each evolved weight set was extracted and mapped to the other users' GTMs within the same sentence, obtaining its corresponding fitness from the other users' preferences. Equation 6 allowed setting a global fitness (gF) from the overall averaged fitness (F_Av^GTM) for each evolved weight configuration:

gF_w = \frac{1}{N} \sum_{i=1}^{U} \sum_{j=1}^{W} F_{Av}^{GTM}(w_j, u_i)    (6)

where U stands for the number of users, W for the number of weight configurations, and N stands for the total number of weights (U + W).

In addition, to avoid a perceptual manual validation stage, ten different users - not involved whatsoever in the tuning process - performed a comparison against the aiGA best weights, allowing a comparison between GTM clustering and real human preference at the validation stage.

Analyzing the results in figure 3, the GTM most voted weight configurations agree with the manual user preferences for three sentences ("De la Seva Selva", "Del Seu Territori" and "Grans Extensions"). However, the rest of the sentences show quite different behaviour. The best weight combination selected by the users was the second-best GTM weight configuration in "I els han venut", while the best GTM weight combination was never voted. Cosine correlation is computed among the problematic weight configurations, as the important matter is the weight distribution rather than the values themselves. In this case, the two best GTM weights have a 0.7841 correlation, so the GTM results may be considered satisfactory, as the weights follow equivalent patterns. On the other hand, the correlation
Figure 3. The results of the comparison between normalized user-voted preferences and GTM mapping are presented for the five sentences. Two different solutions for the same user were considered if they adopted similar fitness in the aiGA model.
between the two best GTM weight configurations is 0.8323 in Fusta de Birmània and, as in the previous case, the correlation again gives satisfactory results.

FUTURE TRENDS

Future work will focus on conducting new experiments, e.g. by clustering similar units instead of tackling global weight tuning on preselected sentences, or by including more users in the training process. In addition, the expansion of the capabilities of GTM to map user preferences opens the possibility of focusing on non-linear cost functions so as to overcome the linearity restrictions of the present function.

CONCLUSIONS

This article continues the work of including user preferences for tuning the weights of the cost function in unit-selection TTS systems. In previous works we presented a method to find optimal cost-function weight tuning based on the perceptual criteria of individual users. As a next step, this paper applies a heuristic method for choosing the best solution among all users, overcoming the need to conduct a second listening test to select the best weight configuration among individual optimal solutions. This proof-of-principle study shows that GTM is capable of mapping the common knowledge among different users, thanks to working on the perceptually optimized weight space obtained through aiGA, and of obtaining a final solution that can be used for a final adjustment of the TTS system.

REFERENCES

Alías, F., Llorà, X. (2003): Evolutionary weight tuning based on diphone pairs for unit selection speech synthesis. In: Proceedings of the 8th European Conference on Speech Communication and Technology (EuroSpeech).

Alías, F., Llorà, X., Formiga, L., Sastry, K., Goldberg, D.E. (2006): Efficient interactive weight tuning for TTS synthesis: reducing user fatigue by improving user consistency. In: Proceedings of ICASSP. Volume I., Toulouse, France, 865-868.

Alías, F., Llorà, X., Iriondo, I., Sevillano, X., Formiga, L., Socoró, J. C. (2004): Perception-Guided and Phonetic Clustering Weight Tuning Based on Diphone Pairs for Unit Selection TTS. In: Proceedings of the 8th International Conference on Spoken Language Processing (ICSLP), Jeju Island, Korea.

Bishop, C.M., Svensén, M., Williams, C.K.I. (1998): GTM: The generative topographic mapping. Neural Computation, 10(1), 215-234.

Black, A.W., Campbell, N. (1995): Optimising selection of units from speech databases for concatenative synthesis. In: Proceedings of EuroSpeech. Volume 1., Madrid, 581-584.

Black, A.W. (2002): Perfect synthesis for all of the people all of the time. In: IEEE Workshop on Speech Synthesis (Keynote), Santa Monica, USA.

Figueiredo, M.A.F., Jain, A.K. (2002): Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 381-396.

Formiga, L., Alías, F. (2007): Extracting User Preferences by GTM for aiGA Weight Tuning in Unit Selection Text-to-Speech Synthesis. International Workshop on Artificial Neural Networks (IWANN07), pp. 654-661, ISBN 978-3-540-73006-4, June 2007, Donostia (Spain).

Formiga, L., Alías, F. (2006): Heuristics for implementing the A* algorithm for unit selection TTS synthesis systems. IV Jornadas en Tecnología del Habla (4JTH06), pp. 219-224, ISBN 84-96214-82-6, November, Zaragoza (Spain).

Holland, J. (1975): Adaptation in Natural and Artificial Systems. Ann Arbor, Univ. of Michigan Press.

Hunt, A., Black, A.W. (1996): Unit selection in a concatenative speech synthesis system using a large speech database. In: Proceedings of ICASSP. Volume 1., Atlanta, USA, 373-376.

Kohonen, T. (2006): Self-Organizing Maps. Springer.

Kohonen, T. (1990): The self-organizing map. Proceedings of the IEEE, 78(9), 1464-1480.

Llorà, X., Sastry, K., Goldberg, D.E., Gupta, A., Lakshmi, L. (2005): Combating user fatigue in IGAs: Partial ordering, support vector machines, and synthetic fitness. Proceedings of the Genetic and Evolutionary Computation Conference 2005 (GECCO-2005), 1363-1371. (Also IlliGAL Report No. 2005009.)

Meron, Y., Hirose, K. (1999): Efficient weight training for selection based synthesis. In: Proceedings of EuroSpeech. Volume 5., Budapest, Hungary, 2319-2322.

KEY TERMS

Correlation: A statistical measurement of the interdependence or association between two or more qualitative variables. A typical calculation would be performed by multiplying a signal by either another signal (cross-correlation) or by a delayed version of itself (autocorrelation).

Digital Signal Processing (DSP): DSP, or Digital Signal Processing, as the term suggests, is the processing of signals by digital means. The processing of a digital signal is done by performing numerical calculations.

Diphone: A sound consisting of two phonemes: one that leads into the sound and one that finishes the sound, e.g., "hello": silence-h h-eh eh-l l-oe oe-silence.

Evolutionary Algorithms: Collective term for all variants of (probabilistic) optimization and approximation algorithms that are inspired by Darwinian evolution. Optimal states are approximated by successive improvements based on the variation-selection paradigm.

Generative Topographic Mapping (GTM): A technique for density modelling and data visualisation inspired by SOM (see SOM definition).

Mel Frequency Cepstral Coefficients (MFCC): The MFCC are the coefficients of the Mel cepstrum. The Mel cepstrum is the cepstrum computed on the Mel bands (scaled to the human ear) instead of the Fourier spectrum.

Natural Language Processing (NLP): Computer understanding, analysis, manipulation, and/or generation of natural language.

Pitch: Intonation measure given a time in the signal.

Prosody: A collection of phonological features, including pitch, duration, and stress, which define the rhythm of spoken language.

Self-Organizing Maps: Self-organizing maps (SOMs) are a data visualization technique which reduces the dimensions of data through the use of self-organizing neural networks.

Surrogate Fitness: Synthetic fitness measure that tries to evaluate an evolutionary solution in the same terms as a perceptual user would.

Text Normalization: The process of converting abbreviations and non-word written symbols into words that a speaker would say when reading that symbol out loud.

Unit Selection Synthesis: A synthesis technique where appropriate units are retrieved from large databases of natural speech so as to generate synthetic speech.

Unsupervised Learning: Learning techniques that group instances without a pre-specified dependent attribute. Clustering algorithms are usually unsupervised methods for grouping data sets.
Handling Fuzzy Similarity for Data Classification
Avichai Meged
Bar-Ilan University, Israel
INTRODUCTION

Representing, and consequently processing, fuzzy data in standard and binary databases is problematic. The problem is further amplified in binary databases, where continuous data are represented by means of discrete 1 and 0 bits. As regards classification, the problem becomes even more acute. In these cases, we may want to group objects based on some fuzzy attributes but, unfortunately, an appropriate fuzzy similarity measure is not always easy to find. The current paper proposes a novel model and measure for representing fuzzy data, which lends itself to both classification and data mining.

Classification algorithms and data mining attempt to set up hypotheses regarding the assignment of different objects to groups and classes on the basis of the similarity/distance between them (Estivill-Castro & Yang, 2004) (Lim, Loh & Shih, 2000) (Zhang & Srihari, 2004). Classification algorithms and data mining are widely used in numerous fields including: social sciences, where observations and questionnaires are used in learning mechanisms of social behavior; marketing, for segmentation and customer profiling; finance, for fraud detection; computer science, for image processing and expert systems applications; medicine, for diagnostics; and many other fields.

Classification algorithms and data mining methodologies are based on a procedure that calculates a similarity matrix, based on a similarity index between objects, and on a grouping technique. Research has shown that a similarity measure based upon binary data representation yields better results than regular similarity indexes (Erlich, Gelbard & Spiegler, 2002) (Gelbard, Goldman & Spiegler, 2007). However, binary representation is currently limited to nominal discrete attributes, suitable for attributes such as gender, marital status, etc. (Zhang & Srihari, 2003). This makes the binary approach to data representation unattractive for widespread data types.

The current research describes a novel approach to binary representation, referred to as Fuzzy Binary Representation. This new approach is suitable for all data types: nominal, ordinal, and continuous. We propose that there is meaning not only to the actual explicit attribute value, but also to its implicit similarity to other possible attribute values. These similarities can either be determined by a problem domain expert or automatically, by analyzing fuzzy functions that represent the problem domain. The added fuzzy similarity yields improved classification and data mining results. More generally, Fuzzy Binary Representation and related similarity measures exemplify that refined and carefully designed handling of data, including eliciting of domain expertise regarding similarity, may add both value and knowledge to existing databases.

BACKGROUND

Binary Representation

Binary representation creates a storage scheme wherein data appear in binary form rather than the common numeric and alphanumeric formats. The database is viewed as a two-dimensional matrix that relates entities according to their attribute values. Having the rows represent entities and the columns represent possible values, entries in the matrix are either 1 or 0, indicating that a given entity (e.g., record, object) has or lacks a given value, respectively (Spiegler & Maayan, 1985). In this way, we can have a binary representation for discrete and continuous attributes.
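The storage scheme just described, together with the Dice index used later in this article, can be sketched as follows. This is an illustrative sketch built on the Table 1 values (Marital Status and Height), not the authors' code; the function names are hypothetical.

```python
# One-hot ("standard binary") representation: each attribute is a
# vector of bits, exactly one of which is 1, marking the actual value.
MARITAL = ["S", "M", "D", "W"]
HEIGHTS = [1.55, 1.56, 1.60, 1.84]

def encode(marital: str, height: float) -> list[int]:
    """Encode one entity as a row of the binary database matrix."""
    return ([1 if m == marital else 0 for m in MARITAL] +
            [1 if h == height else 0 for h in HEIGHTS])

def dice(x: list[int], y: list[int]) -> float:
    """Dice index for binary vectors: 2*|X AND Y| / (|X| + |Y|)."""
    common = sum(a & b for a, b in zip(x, y))
    return 2.0 * common / (sum(x) + sum(y))

e1 = encode("S", 1.60)
e2 = encode("D", 1.60)   # shares only the Height value with e1
e3 = encode("M", 1.84)   # shares nothing with e1
print(dice(e1, e2))      # 0.5: one shared 1-bit out of two per entity
print(dice(e1, e3))      # 0.0: mutually exclusive bits never overlap
```

The 0.0 result for any pair of entities with no common values is exactly the drawback the article addresses: one-hot bits carry no notion of how close two different values are.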
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Handling Fuzzy Similarity for Data Classification
Table 1 illustrates the binary representation of a database consisting of five entities with the following two attributes: Marital Status (nominal) and Height (continuous).

Marital Status, with four values: S (single), M (married), D (divorced), W (widowed).

Height, with four values: 1.55, 1.56, 1.60 and 1.84.

However, practically, binary representation is currently limited to nominal discrete attributes only. In the current study, we extend the binary model to include continuous data and fuzzy representation.

Similarity Measures

Similarity/distance measures are essential and at the heart of all classification algorithms. The most commonly used method for calculating similarity is the Squared Euclidean measure. This measure calculates the distance between two samples as the square root of the sums of all squared distances between their properties (Jain & Dubes, 1988) (Jain, Murty & Flynn, 1999).

However, these likelihood-similarity measures are applicable only to ordinal attributes and cannot be used to classify nominal, discrete, or categorical attributes, since there is no meaning in placing such attribute values in a common Euclidean space. A similarity measure which is applicable to nominal attributes, and which is used in our research, is the Dice index (Dice, 1945).

Additional binary similarity measures have been developed and presented (Illingworth, Glaser & Pyle, 1983) (Zhang & Srihari, 2003). Similarity measures between the different attribute values, as proposed in the Zadeh (1971) model, are essential in the classification process.

In the current study we use similarities between entities and between an entity's attribute values to get better classification. Following former research (Gelbard & Spiegler, 2000) (Erlich, Gelbard & Spiegler, 2002), the current study also uses the Dice measure.

Fuzzy Logic

The theory of Fuzzy Logic was first introduced by Lotfi Zadeh (Zadeh, 1965). In classical logic, the only possible truth-values are true and false. In Fuzzy Logic, however, more truth-values are possible beyond the simple true and false. Fuzzy logic, then, derived from fuzzy set theory, is designed for situations where information is inexact and traditional digital on/off decisions are not possible.

Fuzzy sets are an extension of classical set theory and are used in fuzzy logic. In classical set theory, membership of elements in relation to a set is assessed according to a clear condition; an element either belongs or does not belong to the set. By contrast, fuzzy set theory permits the gradual assessment of the membership of elements in relation to a set; this is described with the aid of a membership function. An element mapped to the value 0 means that the member is not included in the given set, 1 describes a fully included member, and all values between 0 and 1 characterize the fuzzy members. For example, the continuous variable Height may have three membership functions, standing for the Short, Medium and Tall categories. An object may belong to several categories in different membership degrees; e.g., a height of 180 cm may belong to both the Medium and Tall categories, in different
membership degrees, expressed in the range [0,1]. The membership degrees are returned by the membership functions. We can say that a man whose height is 180 cm is "slightly medium" and a man whose height is 200 cm is of "perfectly tall" height.

Different membership functions might represent different membership degrees. Having several possibilities for membership functions is part of the theoretical and practical drawbacks of Zadeh's model. There is no right way to determine the right membership functions (Mitaim & Kosko, 2001). Thus, a membership function may be considered arbitrary and subjective.

In the current work, we make use of membership functions to develop the enhanced similarity calculation for use in the classification of fuzzy data.

FUZZY SIMILARITY REPRESENTATION

Standard Binary Representation exhibits data integrity in that it is precise, and preserves data accuracy without either loss of information or rounding of any value. The mutual-exclusiveness assumption causes the isolation of each value. This is adequate for handling discrete data values. However, in dealing with a continuous attribute, e.g. Height, we want to assume that height 1.55 is closer to 1.56 than to 1.60. However, when converting such values into a mutually exclusive binary representation (Table 1), we lose these basic numerical relations. The similarity measure between any pair with different attribute values is always 0, no matter how similar the attribute values are. This drawback makes the standard binary representation unattractive for representing and handling continuous data types.

Similarity between attribute values is also needed for nominal and ordinal data. For example, the color red (nominal value) is more similar to the color purple than it is to the color yellow. In ranking (questionnaires), a "1" satisfactory rank (ordinal variable) might be closer to the "2" rank than to the "5" rank.

The absence of these similarity intuitions is of paramount importance in classification and indeed may cause some inaccuracies in classification results.

The following sections present a model that adds relative similarity values to the data representation. This serves to empower the binary representation to better handle both continuous and fuzzy data, and improves classification results for all attribute types.

Model for Fuzzy Similarity Representation

In standard binary representation, each attribute (which may have several values, e.g., color: red, blue, green, etc.) is a vector of bits where only one bit is set to 1 and all others are set to 0. The 1 bit stands for the actual value of the attribute. In the Fuzzy Binary Representation, the zero bits are replaced by relative similarity values.

The Fuzzy Binary Representation is viewed as a two-dimensional matrix that relates entities according to their attribute values. Having the rows represent entities and the columns represent possible values, entries in the matrix are fuzzy numbers in the range [0,1], indicating the similarity degree of a specific attribute value to the actual one, where 1 means full similarity to the actual value (this is the actual value), 0 means no similarity at all, and all other values mean partial similarity.

The following example illustrates the way of creating the Fuzzy Binary Representation. Let's assume we have a database of five entities and two attributes represented in a binary representation, as illustrated in Table 1. The fuzzy similarities between all attribute values are calculated (the next section describes the calculation process) and represented in a two-dimensional Fuzzy
Table 2. Fuzzy similarity matrices of the Marital Status and Height attributes
Similarity Matrix, wherein rows and columns stand for the different attributes' values, and the matrix cells contain the fuzzy similarity between the value pairs. The Fuzzy Similarity Matrix is symmetrical. Table 2 illustrates fuzzy similarity matrices for the Marital Status and Height attributes.

The Marital Status similarity matrix shows that the similarity between Single and Widowed is high (0.8), while there is no similarity between Single and Married (0). The Height similarity matrix shows that the similarity between 1.56 and 1.60 is 0.8 (high similarity), while the similarity between 1.55 and 1.84 is 0 (not similar at all). These similarity matrices can be calculated automatically, as is explained in the next section.

Now, the zero values in the binary representation (Table 1) are replaced by the appropriate similarity values (Table 2). For example, in Table 1, we will replace the zero bit standing for Height 1.55 of the first entity with the fuzzy similarity between 1.55 and 1.60 (the actual attribute value), as indicated in the Height fuzzy similarity matrix (0.7). Table 3 illustrates the fuzzy representation obtained after such replacements.

It should be noted that the similarities indicated in the fuzzy similarity table relate to the similarity between the actual value of the attribute (e.g. 1.60 in entity 1) and the other attribute values (e.g. 1.55, 1.56 and 1.84).

Next, the fuzzy similarities, presented in decimal form, are converted into a binary format: the Fuzzy Binary Representation. The conversion should support similarity indexes like Dice.

To meet this requirement, each similarity value is represented by N binary bits, where N is determined by the required precision. For one-tenth precision, 10 binary bits are needed; for one-hundredth precision, 100 binary bits are needed. For ten-bit precision, the fuzzy similarity 0 will be represented by ten 0s, the fuzzy similarity 0.1 will be represented by nine 0s followed by one 1, the fuzzy similarity 0.2 will be represented by eight 0s followed by two 1s, and so on, up to the fuzzy similarity 1, which will be represented by ten 1s. Table 4 illustrates the conversion from the fuzzy representation (Table 3) to the fuzzy binary representation.

The Fuzzy Binary Representation illustrated in Table 4 is suitable for all data types (discrete and continuous) and, with the new knowledge (fuzzy similarity values) it contains, a better classification is expected.

The following section describes the process of similarity calculation necessary for this type of Fuzzy Binary Representation.
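The N-bit conversion described above can be sketched as a "thermometer" code: a similarity s in [0,1] becomes round(s*N) one-bits, so bit overlap between two encoded values reflects their similarity degree. The helper name below is hypothetical, and the example row uses the Height similarities quoted in this section (0.7, 0.8, 1, 0 relative to the actual value 1.60) rather than the article's full Table 3.

```python
def to_bits(similarity: float, n: int = 10) -> list[int]:
    """Encode a fuzzy similarity in [0, 1] as n bits, 1s on the right."""
    ones = round(similarity * n)
    return [0] * (n - ones) + [1] * ones

print(to_bits(0.0))  # ten 0s
print(to_bits(0.2))  # eight 0s followed by two 1s
print(to_bits(1.0))  # ten 1s

# One fuzzy row (similarities of 1.55, 1.56, 1.60, 1.84 to the actual
# height 1.60) expands into a single long binary vector, suitable for
# Dice-style indexes over the whole entity.
row = [0.7, 0.8, 1.0, 0.0]
binary_row = [b for s in row for b in to_bits(s)]
```

Because similar values now share many 1-bits, the Dice index between two such rows rises with the closeness of the underlying values instead of collapsing to 0.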
Fuzzy Similarity Calculation

Similarity calculation between the different attribute values is not a precise science, i.e., there is no one way to calculate it, just as there is no one way to develop membership functions in the Fuzzy Logic world.

We suggest determining similarities according to the attribute type. A domain expert should evaluate similarity for nominal attributes like Marital Status. For example, Single, Divorced and Widowed are considered "one person", while Married is considered as "two people". Therefore, Single may be more similar to Divorced and Widowed than it is to Married. On the other hand, a Divorced person is one that once was married, so maybe it is more similar to Married than to Single. In short, similarity is a relative, rather than an absolute, measure; as there is hardly any known automatic way to calculate similarities for such attributes, a domain expert is needed.

Similarity for ordinal data, like satisfactory rank, can be calculated in the same way as for nominal or continuous attributes, depending on the nature of the attribute's values. Similarity for continuous data, like Height, can be calculated automatically. Unlike nominal attributes, in continuous data there is an intuitive meaning to the distance between different values. For example, as regards the Height attribute, the difference between 1.55 and 1.56 is smaller than the distance between 1.55 and 1.70; therefore, the similarity is expected to be higher accordingly. For continuous data, an automatic method can be constructed, as shown, to calculate the similarities.

Depending on the problem domain, a continuous attribute can be divided into one or more fuzzy sets (categories), e.g., the Height attribute can be divided into three sets: Short, Medium and Tall. A membership function for each set can be developed.

The calculated similarities depend on the specified membership functions; therefore, they are referred to here as fuzzy similarities. The following algorithm can be used for similarity calculations of continuous data:

For each pair of attribute values (v1 and v2)
    For each membership function F
        Similarities(v1, v2) = 1 - distance between F(v1) and F(v2)
    Similarity(v1, v2) = maximum of the calculated Similarities

Now that we have discussed both a model for Fuzzy Binary Representation and a way to calculate similarities, we will show how the new knowledge (fuzzy similarities) added to the standard binary representation improves the similarity measures between different entities, as discussed in the next section.

COMPARING STANDARD AND FUZZY SIMILARITIES

In this section, we compare standard and fuzzy similarities. The similarities were calculated according to the Dice index for the example represented in Table 4. Table 5 combines similarities of the different entities related to (a) Marital Status (nominal), (b) Height (continuous), and (c) both the Marital Status and Height attributes.

Several points and findings arise from the representations shown above (Table 5). These are briefly highlighted below:

1. In our small example, a nominal attribute (Marital Status) represented in the standard binary representation cannot be used for classification. In contrast, the Fuzzy Binary Representation, with a large diversity of similarity results, will enable better classification. Grouping entities with a similarity equal to or greater than 0.7 yields a class of entities 2, 3, 4 and 5, which represent Single, Divorced and Widowed, belonging to the set "one person".
2. For a continuous attribute (Height) represented in the standard binary representation, classification is not possible. In contrast, the Fuzzy Binary Representation, with diversity in similarity results, will once again enable better classification. Entities 1 and 5 have absolute similarity (1), since for the Height attribute they are identical. Entities 2 and 4 (similarity = 0.94) are very similar, since they represent the almost identical heights of 1.55 and 1.56, respectively. Classification based on these two entities is possible due to the diversity of similarities.
3. The same phenomena presented for a single attribute (Marital Status or Height) also exist for both attributes (Marital Status + Height) taken together. Similarity greater than 0.8 is
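The fuzzy-similarity algorithm given in the Fuzzy Similarity Calculation section above can be made runnable as follows. This is a hedged sketch: the triangular membership functions for Short/Medium/Tall and their breakpoints are illustrative assumptions, since the article leaves the exact shapes to the modeller.

```python
def triangular(a: float, b: float, c: float):
    """Membership function rising from a to a peak at b, falling to c."""
    def f(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return f

# Hypothetical Short / Medium / Tall sets for the Height attribute.
MEMBERSHIP = [
    triangular(1.00, 1.40, 1.60),  # Short
    triangular(1.40, 1.65, 1.90),  # Medium
    triangular(1.65, 1.95, 2.30),  # Tall
]

def fuzzy_similarity(v1: float, v2: float) -> float:
    # Per the article's algorithm: for each membership function, the
    # similarity is 1 minus the distance between membership degrees;
    # the final similarity is the maximum over all functions.
    return max(1.0 - abs(f(v1) - f(v2)) for f in MEMBERSHIP)

# Close heights agree far more than distant ones.
assert fuzzy_similarity(1.55, 1.56) > fuzzy_similarity(1.55, 1.84)
assert fuzzy_similarity(1.60, 1.60) == 1.0
```

Changing the membership functions changes the resulting similarities, which is exactly why the article calls them fuzzy similarities.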
REFERENCES

Erlich, Z., Gelbard, R. & Spiegler, I. (2002). Data Mining by Means of Binary Representation: A Model for Similarity and Clustering. Information Systems Frontiers, 4(2), 187-197.

Estivill-Castro, V. & Yang, J. (2004). Fast and Robust General Purpose Clustering Algorithms. Data Mining and Knowledge Discovery, 8(2), 127-150.

Gelbard, R. & Spiegler, I. (2000). Hempel's raven paradox: a positive approach to cluster analysis. Computers and Operations Research, 27(4), 305-320.

Gelbard, R., Goldman, O. & Spiegler, I. (2007). Investigating Diversity of Clustering Methods: An Empirical Comparison. Data & Knowledge Engineering, 63(1), 155-166.

Illingworth, V., Glaser, E.L. & Pyle, I.C. (1983). Hamming distance. In, Dictionary of Computing, Oxford University Press, 162-163.

Jain, A.K. & Dubes, R.C. (1988). Algorithms for Clustering Data. Prentice Hall.

Jain, A.K., Murty, M.N. & Flynn, P.J. (1999). Data Clustering: A Review. ACM Computing Surveys, 31(3), 264-323.

Lim, T.S., Loh, W.Y. & Shih, Y.S. (2000). A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms. Machine Learning, 40(3), 203-228.

Mitaim, S. & Kosko, B. (2001). The Shape of Fuzzy Sets in Adaptive Function Approximation. IEEE Transactions on Fuzzy Systems, 9(4), 637-656.

Spiegler, I. & Maayan, R. (1985). Storage and retrieval considerations of binary data bases. Information Processing and Management, 21(3), 233-254.

Zadeh, L.A. (1965). Fuzzy Sets. Information and Control, 8(3), 338-353.

Zadeh, L.A. (1971). Similarity Relations and Fuzzy Orderings. Information Sciences, 3, 177-200.

Zhang, B. & Srihari, S.N. (2003). Properties of Binary Vector Dissimilarity Measures. In, Proc. JCIS Int'l Conf. Computer Vision, Pattern Recognition, and Image Processing, 26-30.

Zhang, B. & Srihari, S.N. (2004). Fast k-Nearest Neighbor Classification Using Cluster-based Trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(4), 525-528.

KEY TERMS

Classification: The partitioning of a data set into subsets, so that the data in each subset (ideally) share some common traits, often proximity according to some defined similarity/distance measure.

Data Mining: The process of automatically searching large volumes of data for patterns, using tools such as classification, association rule mining, clustering, etc.

Database Binary Representation: A representation where a database is viewed as a two-dimensional matrix that relates entities (rows) to attribute values (columns). Entries in the matrix are either 1 or 0, indicating that a given entity has or lacks a given value.

Fuzzy Logic: An extension of Boolean logic dealing with the concept of partial truth. Fuzzy logic replaces Boolean truth values (0 or 1, black or white, yes or no) with degrees of truth.

Fuzzy Set: An extension of classical set theory. Fuzzy set theory, used in Fuzzy Logic, permits the gradual assessment of the membership of elements in relation to a set.

Membership Function: The mathematical function that defines the degree of an element's membership in a fuzzy set. Membership functions return a value in the range [0,1], indicating the membership degree.

Similarity: A numerical estimate of the difference or distance between two entities. The similarity values are in the range [0,1], indicating the similarity degree.
Harmony Search for Multiple Dam Scheduling
INTRODUCTION

A dam is the wall that holds the water in, and the operation of multiple dams is a complicated decision-making process posed as an optimization problem (Oliveira & Loucks, 1997). Traditionally, researchers have used mathematical optimization techniques with linear programming (LP) or dynamic programming (DP) formulations to find the schedule.

However, most of the mathematical models are valid only for simplified dam systems. Accordingly, during the past decade, some meta-heuristic techniques, such as the genetic algorithm (GA) and simulated annealing (SA), have gathered great attention among dam researchers (Chen, 2003) (Esat & Hall, 1994) (Wardlaw & Sharif, 1999) (Kim, Heo & Jeong, 2006) (Teegavarapu & Simonovic, 2002).

Lately, another metaheuristic algorithm, harmony search (HS), has been developed (Geem, Kim & Loganathan, 2001) (Geem, 2006a) and applied to various artificial intelligence problems, such as music composition (Geem & Choi, 2007) and the Sudoku puzzle (Geem, 2007).

The HS algorithm has also been applied to various engineering problems, such as structural design (Lee & Geem, 2004), water network design (Geem, 2006b), soil stability analysis (Li, Chi & Chu, 2006), satellite heat pipe design (Geem & Hwangbo, 2006), offshore structure design (Ryu, Duggal, Heyl & Geem, 2007), grillage system design (Erdal & Saka, 2006), and hydrologic parameter estimation (Kim, Geem & Kim, 2001). The HS algorithm can be a competent alternative to existing metaheuristics such as GA because the former overcomes drawbacks of the latter (such as those explained by building block theory) (Geem, 2006a).

To test the ability of the HS algorithm on the multiple dam operation problem, this article introduces an HS model, applies it to a benchmark system, and then compares the results with those of a previously developed GA model.

BACKGROUND

Before this study, various researchers tackled the dam scheduling problem using phenomenon-inspired techniques.

Esat and Hall (1994) introduced a GA model for dam operation. They compared GA with the discrete differential dynamic programming (DDDP) technique. GA could overcome the drawback of DDDP, which requires an exponentially increasing computing burden. Oliveira and Loucks (1997) proposed practical dam operating policies using an enhanced GA (real-coded chromosome, elitism, and arithmetic crossover). Wardlaw and Sharif (1999) tried other enhanced GA schemes and concluded that the best GA model for dam operation can be composed of real-value coding, tournament selection, uniform crossover, and modified uniform mutation. Chen (2003) developed a real-coded GA model for long-term dam operation, and Kim et al. (2006) applied an enhanced multi-objective GA, named NSGA-II, to a real-world multiple dam system. Teegavarapu and Simonovic (2002) used another metaheuristic algorithm, simulated annealing (SA), to solve the dam operation problem.

Although several metaheuristic algorithms have already been applied to the dam scheduling problem, the recently developed HS algorithm had not been applied to it before. Thus, this article deals with the HS algorithm's pioneering application to the problem.

HARMONY SEARCH MODEL AND APPLICATION

This article presents two major parts. The first part explains the structure of the HS model, and the second part applies the HS model to a benchmark problem.
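The basic HS improvisation loop used by the model described below (harmony memory consideration, pitch adjustment, and random selection, with the HMCR, PAR, and HMS parameters reported later in this article) can be sketched as follows. This is a hedged illustration, not the authors' code: the discrete release values, bounds, and helper names are assumptions.

```python
import random

HMS, HMCR, PAR = 30, 0.95, 0.05        # parameter values used in this article
RELEASE_CHOICES = list(range(8))       # hypothetical candidate releases
N_VARS = 4 * 12                        # 4 dams x 12 two-hour periods

def improvise(hm: list[list[int]]) -> list[int]:
    """Generate one new scheduling vector from the harmony memory."""
    new = []
    for k in range(N_VARS):
        if random.random() < HMCR:
            value = random.choice(hm)[k]            # memory consideration
            if random.random() < PAR:               # pitch adjustment:
                value = min(max(value + random.choice((-1, 1)), 0), 7)
        else:
            value = random.choice(RELEASE_CHOICES)  # random selection
        new.append(value)
    return new

def step(hm: list[list[int]], fitness) -> list[list[int]]:
    """Replace the worst harmony if the new vector beats it (maximizing)."""
    new = improvise(hm)
    worst = min(range(len(hm)), key=lambda i: fitness(hm[i]))
    if fitness(new) > fitness(hm[worst]):
        hm[worst] = new
    return hm
```

Repeating `step` until MaxImp function evaluations have been spent reproduces the termination rule described in the model section.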
Harmony Search for Multiple Dam Scheduling
Dam Scheduling Model Using HS

The HS model has the following formulation for the multiple dam scheduling problem.

Maximize the benefits obtained by hydropower generation and irrigation,

subject to the following constraints:

1. Range of Water Release: the amount of water released from each dam should lie between minimum and maximum amounts.
2. Range of Dam Storage: the amount of storage in each dam should lie between minimum and maximum amounts.
3. Water Continuity: the amount of dam storage in the next stage should equal the storage in the current stage, plus the inflow, minus the water release.

The HS algorithm starts by filling the harmony memory (HM) with random scheduling vectors; the HM for dam scheduling is a matrix whose rows are such vectors. Each element of a new vector is then generated by random selection, memory consideration, or pitch adjustment; with pitch adjustment, the value is moved to a neighboring amount higher or lower than the original amount Ri(k) obtained from the HM. The probabilities of these mechanisms sum to one (p1 + p2 + p3 + p4 = 1).

If the newly generated vector, RNEW, is better than the worst harmony in the HM in terms of the objective function, the new harmony is included in the HM and the existing worst harmony is excluded from it. If the HS model reaches MaxImp (the maximum number of function evaluations), computation is terminated; otherwise, another new harmony (vector) is generated by the above-mentioned mechanisms.

Applying HS to a Benchmark Problem

The HS model was applied to a popular multiple dam system, shown in Figure 1 (Wardlaw & Sharif, 1999). The problem has 12 two-hour operating periods, and only dam 4 has an irrigation benefit, because the outflows of the other dams are not directed to farms. The water release and dam storage ranges for the problem are given below.
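The improvisation loop described above (memory consideration, pitch adjustment, random selection, and worst-harmony replacement) can be sketched in Python. The HMS, HMCR, PAR values match those reported later in this article, but the objective function, release bounds, and pitch step are made-up stand-ins, not the actual dam-benefit model:

```python
import random

random.seed(0)

# HS parameters (HMS, HMCR, PAR as reported in the article; MAX_IMP reduced
# here from the article's 35,000 to keep the sketch fast).
HMS, HMCR, PAR, MAX_IMP = 30, 0.95, 0.05, 5000
N, LOW, HIGH = 12, 0.0, 3.0          # 12 periods, assumed release bounds

def benefit(vec):                    # toy stand-in for the hydropower +
    return -sum((x - 1.5) ** 2 for x in vec)   # irrigation benefit function

# Step 1: fill the harmony memory with random scheduling vectors.
hm = [[random.uniform(LOW, HIGH) for _ in range(N)] for _ in range(HMS)]

for _ in range(MAX_IMP):
    # Step 2: improvise a new vector, element by element.
    new = []
    for i in range(N):
        if random.random() < HMCR:                 # memory consideration
            x = random.choice(hm)[i]
            if random.random() < PAR:              # pitch adjustment
                x = min(HIGH, max(LOW, x + random.uniform(-0.5, 0.5)))
        else:                                      # random selection
            x = random.uniform(LOW, HIGH)
        new.append(x)
    # Step 3: replace the worst harmony if the new one is better.
    worst = min(range(HMS), key=lambda k: benefit(hm[k]))
    if benefit(new) > benefit(hm[worst]):
        hm[worst] = new

best = max(hm, key=benefit)
```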
0.0 <= S1, S2, S3 <= 10,  0.0 <= S4 <= 15   (4)

The initial and final storage conditions are as follows:

S1(0), S2(0), S3(0), S4(0) = 5   (5)

S1(12), S2(12), S3(12) = 5,  S4(12) = 7   (6)

Figure 2. Release trajectories in all dams (vertical axis: Release, 0 to 6; horizontal axis: Time t, 0 to 11)
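Constraints (4)-(6), together with the water-continuity rule, can be checked for any candidate schedule. The sketch below does this for dam 1 (the constant inflow of 2 units matches the problem data stated below; the 12-period schedule itself is a made-up example, not one of the reported optima):

```python
# Feasibility check for dam 1: water continuity plus the storage bounds and
# boundary conditions of Eqs. (4)-(6).
I1 = 2.0                       # inflow to dam 1 at every period
S_MIN, S_MAX = 0.0, 10.0       # storage range for dams 1-3, Eq. (4)
S0, S_FINAL = 5.0, 5.0         # initial/required final storage, Eqs. (5)-(6)

def feasible(schedule):
    s = S0
    for r in schedule:
        s = s + I1 - r         # water continuity: next-stage storage
        if not (S_MIN <= s <= S_MAX):
            return False
    return abs(s - S_FINAL) < 1e-9

releases = [2.0] * 12          # hypothetical schedule: release the inflow
print(feasible(releases))      # True: storage stays at 5 units throughout
```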
There are only two inflows: 2 units to dam 1 and 3 units to dam 2.

I1 = 2, I2 = 3   (7)

Wardlaw and Sharif (1999) tackled this dam scheduling problem using an enhanced GA model (Population Size = 100; Crossover Rate = 0.70; Mutation Rate = 0.02; Number of Generations = 500; Number of Function Evaluations = 35,000; Binary, Gray, & Real-Value Representations; Tournament Selection; One-Point, Two-Point, & Uniform Crossovers; and Uniform and Modified Uniform Mutations). The GA model found a best near-optimal solution of 400.5, which is 99.8% of the global optimum (401.3).

The HS model was applied to the same problem with the following algorithm parameters: HMS = 30; HMCR = 0.95; PAR = 0.05; and MaxImp = 35,000. The HS model found five different global optimal solutions (HS1 ~ HS5), each with an identical objective value of 401.3. Table 1 shows one example out of the five optimal water release schedules. Figure 2 shows the corresponding release trajectories in all dams.

When the HS model was further tested with different algorithm parameter values, it found a better solution than that of the GA model (400.5) in seven out of eight cases.

FUTURE TRENDS

Building on the success of this study, future HS models should consider more complex dam scheduling problems with various real-world situations. Also, algorithm parameter guidelines obtained from extensive experiments will be helpful to engineers in practice, because meta-heuristic algorithms, including HS and GA, require many trials to obtain the best algorithm parameters.

CONCLUSION

The music-inspired HS algorithm was successfully applied to the optimal scheduling problem of a multiple dam system, outperforming the GA results. While the GA model obtained only near-optimal solutions, the HS model found five different global optima under the same number of function evaluations.

Moreover, the HS model did not require a sensitivity analysis of its algorithm parameters, while the GA model tested many parameter values and different operation schemes. This can reduce the time and trouble of choosing parameter values in HS.

REFERENCES

Chen, L. (2003). Real Coded Genetic Algorithm Optimization of Long Term Reservoir Operation. Journal of the American Water Resources Association, 39(5), 1157-1165.

Erdal, F. & Saka, M. P. (2006). Optimum Design of Grillage Systems Using Harmony Search Algorithm. Proceedings of the 5th International Conference on Engineering Computational Technology (ECT 2006), CD-ROM.

Esat, V. & Hall, M. J. (1994). Water Resources System Optimization Using Genetic Algorithms. Proceedings of the First International Conference on Hydroinformatics, 225-231.

Geem, Z. W. (2006a). Improved Harmony Search from Ensemble of Music Players. Lecture Notes in Artificial Intelligence, 4251, 86-93.

Geem, Z. W. (2006b). Optimal Cost Design of Water Distribution Networks using Harmony Search. Engineering Optimization, 38(3), 259-280.

Geem, Z. W. (2007). Harmony Search Algorithm for Solving Sudoku. Lecture Notes in Artificial Intelligence, In Press.

Geem, Z. W. & Choi, J. Y. (2007). Music Composition Using Harmony Search Algorithm. Lecture Notes in Computer Science, 4448, 593-600.

Geem, Z. W. & Hwangbo, H. (2006). Application of Harmony Search to Multi-Objective Optimization for Satellite Heat Pipe Design. Proceedings of US-Korea Conference on Science, Technology, & Entrepreneurship (UKC 2006), CD-ROM.

Geem, Z. W., Kim, J. H., & Loganathan, G. V. (2001). A New Heuristic Optimization Algorithm: Harmony Search. Simulation, 76(2), 60-68.
Kim, J. H., Geem, Z. W., & Kim, E. S. (2001). Parameter Estimation of the Nonlinear Muskingum Model Using Harmony Search. Journal of the American Water Resources Association, 37(5), 1131-1138.

Kim, T., Heo, J.-H., & Jeong, C.-S. (2006). Multireservoir System Optimization in the Han River Basin Using Multi-Objective Genetic Algorithm. Hydrological Processes, 20, 2057-2075.

Lee, K. S. & Geem, Z. W. (2004). A New Structural Optimization Method Based on the Harmony Search Algorithm. Computers & Structures, 82(9-10), 781-798.

Li, L., Chi, S.-C., & Chu, X.-S. (2006). Location of Non-Circular Slip Surface Using the Modified Harmony Search Method Based on Correcting Strategy. Rock and Soil Mechanics, 27(10), 1714-1718.

Oliveira, R. & Loucks, D. P. (1997). Operating Rules for Multireservoir Systems. Water Resources Research, 33(4), 839-852.

Ryu, S., Duggal, A. S., Heyl, C. N., & Geem, Z. W. (2007). Mooring Cost Optimization Via Harmony Search. Proceedings of the 26th International Conference on Offshore Mechanics and Arctic Engineering, ASME, CD-ROM.

Teegavarapu, R. S. V. & Simonovic, S. P. (2002). Optimal Operation of Reservoir Systems Using Simulated Annealing. Water Resources Management, 16, 401-428.

Wardlaw, R. & Sharif, M. (1999). Evaluation of Genetic Algorithms for Optimal Reservoir System Operation. Journal of Water Resources Planning and Management, ASCE, 125(1), 25-33.

KEY TERMS

Evolutionary Computation: Solution approach guided by biological evolution, which begins with potential solution models, then iteratively applies algorithms to find the fittest models from the set to serve as inputs to the next iteration, ultimately leading to a model that best represents the data.

Genetic Algorithm: Technique to search for exact or approximate solutions of optimization or search problems by using evolution-inspired phenomena such as selection, crossover, and mutation. The genetic algorithm is classified as a global search algorithm.

Harmony Search: Technique to search for exact or approximate solutions of optimization or search problems by using a music-inspired phenomenon (improvisation). Harmony search has three major operations: random selection, memory consideration, and pitch adjustment. Harmony search is classified as a global search algorithm.

Metaheuristics: Techniques to find solutions by combining black-box procedures (heuristics). Here, "meta" means beyond, and "heuristic" means to find.

Multiple Dam Scheduling: Process of developing individual dam schedules in a multiple dam system. Each schedule contains the water release amount at each time period while satisfying release limit, storage limit, and continuity conditions.

Optimization: Process of seeking to optimize (minimize or maximize) an objective function while satisfying all problem constraints by choosing the values of continuous or discrete variables.

Soft Computing: Collection of computational techniques in computer science, especially in artificial intelligence, such as fuzzy logic, neural networks, chaos theory, and evolutionary algorithms.
Marco Pacheco
PUC-Rio, Brazil
Karla Figueiredo
UERJ, Brazil
Flavio Souza
UERJ, Brazil
INTRODUCTION

Neuro-fuzzy systems [Jang,1997][Abraham,2005] are hybrid systems that combine the learning capacity of neural nets [Haykin,1999] with the linguistic interpretability of fuzzy inference systems [Ross,2004]. These systems have been evaluated quite intensively in machine learning tasks. This is mainly due to a number of factors: the applicability of learning algorithms developed for neural nets; the possibility of promoting implicit and explicit knowledge integration; and the possibility of extracting knowledge in the form of fuzzy rules. Most of the well-known neuro-fuzzy systems, however, present limitations regarding the number of inputs allowed or a limited (or nonexistent) ability to create their own structure and rules [Nauck,1997][Nauck,1998][Vuorimaa,1994][Zhang,1995].

This paper describes a new class of neuro-fuzzy models, called Hierarchical Neuro-Fuzzy BSP Systems (HNFB). These models employ Binary Space Partitioning (BSP) of the input space [Chrysanthou,1992] and have been developed to bypass the traditional drawbacks of neuro-fuzzy systems. This paper introduces the HNFB models based on a supervised learning algorithm. These models were evaluated on many benchmark applications related to classification and time-series forecasting. A second paper, entitled Hierarchical Neuro-Fuzzy Systems Part II, focuses on hierarchical neuro-fuzzy models based on reinforcement learning algorithms.

BACKGROUND

Hybrid intelligent systems conceived by combining techniques such as fuzzy logic and neural networks have been applied in areas where traditional approaches were unable to provide satisfactory solutions. Many researchers have attempted to integrate these two techniques by generating hybrid models that associate their advantages and minimize their limitations and deficiencies. With this objective, hybrid neuro-fuzzy systems [Jang,1997][Abraham,2005] have been created.

Traditional neuro-fuzzy models, such as ANFIS [Jang,1997], NEFCLASS [Nauck,1997] and FSOM [Vuorimaa,1994], have a limited capacity for creating their own structure and rules [Souza,2002a]. Additionally, most of these models employ grid partitioning of the input space, which, due to the rule explosion problem, is more adequate for applications with a smaller number of inputs. When a greater number of input variables is necessary, the system's performance deteriorates.

Thus, hierarchical neuro-fuzzy systems have been devised to overcome these basic limitations. Different models of this class of neuro-fuzzy systems have been developed, based on supervised techniques.

HIERARCHICAL NEURO-FUZZY SYSTEMS

This section presents the new class of neuro-fuzzy systems that are based on hierarchical partitioning.
Hierarchical Neuro-Fuzzy Systems Part I
Two sub-sets of hierarchical neuro-fuzzy systems (HNF) have been developed, according to the learning process used: supervised learning models (HNFB [Souza,2002b][Vellasco,2004], HNFB-1 [Gonçalves,2006], and HNFB-Mamdani [Bezerra,2005]); and reinforcement learning models (RL-HNFB [Figueiredo,2005a], RL-HNFP [Figueiredo,2005b]).

The consequent di of each rule may be:

A singleton: the case where di = constant.

A linear combination of the inputs:

di = Σ (k = 1 to n) wk·xk + w0
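The two consequent types can be written out directly; a small sketch (the weight and input values below are made up for illustration):

```python
# Takagi-Sugeno-style rule consequents: a singleton (constant) and a
# linear combination of the n inputs.
def singleton_consequent(c):
    return lambda xs: c                          # d_i = constant

def linear_consequent(w, w0):
    # d_i = w1*x1 + ... + wn*xn + w0
    return lambda xs: sum(wk * xk for wk, xk in zip(w, xs)) + w0

d1 = singleton_consequent(0.5)
d2 = linear_consequent([0.2, -0.1], w0=1.0)

print(d1([3.0, 4.0]))            # 0.5
print(round(d2([3.0, 4.0]), 6))  # 0.2*3 - 0.1*4 + 1 = 1.2
```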
Figure 1. (a) Interior of a Neuro-Fuzzy BSP cell. (b) Example of an HNFB system. (c) Input space partitioning of the HNFB system
of the model and, consequently, linguistic rules. The parameters that define the profiles of the membership functions of the antecedents and consequents are regarded as fuzzy weights of the neuro-fuzzy system. In order to prevent the structure from growing indefinitely, a non-dimensional parameter, named the decomposition rate, was created. More details of this algorithm may be found in [Souza,2002b][Gonçalves,2006].

The results obtained in classification and time series forecasting problems are presented in the Case Studies section.

HIERARCHICAL NEURO-FUZZY BSP FOR CLASSIFICATION

The original HNFB provides very good results for function approximation and time series forecasting. However, it is not ideal for pattern classification applications, since it has only one output and makes use of the Takagi-Sugeno inference method [Takagi,1985], which reduces the rule base interpretability.

Therefore, a new hierarchical neuro-fuzzy BSP model dedicated to pattern classification and rule extraction, called the Inverted HNFB or HNFB-1, has been developed, which is able to extract classification rules such as: If x is A and y is B then the input pattern belongs to class Z. This new hierarchical neuro-fuzzy model is denominated inverted because it applies the learning process of the original HNFB to generate the model's structure. After this first learning phase, the structure is inverted and the architecture of the HNFB-1 model is obtained. The basic cell of this new inverted structure is described below.

Basic Inverted-HNFB Cell

Similarly to the original HNFB model, a basic Inverted-HNFB cell is a neuro-fuzzy mini-system that performs fuzzy binary partitioning of a particular space according to the same pair of complementary membership functions, ρ and μ. However, after a defuzzification process, the Inverted-HNFB cell generates two crisp outputs instead of one. Fig. 2(a) illustrates the interior of the Inverted-HNFB cell.

Because the membership functions are complementary, the outputs of an HNFB-1 cell are given by y1 = b·ρ(x) and y2 = b·μ(x), where b corresponds to one of the two possible cases below:

b is the input of the first cell, so b = 1.
b is the output of a cell of a previous level, so b = yj, where yj represents one of the two outputs of a generic cell j.

Inverted-HNFB Architecture

Fig. 2(b) presents an example of the original HNFB architecture obtained during the training phase on a database containing three distinct classes, while Fig. 2(c) shows how the HNFB-1 model is obtained after the inversion process.
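The complementary cell outputs can be sketched numerically. In the sketch below, a sigmoid and its complement stand in for the two membership profiles, and the two-level tree and its split points are made up for illustration:

```python
import math

# A BSP cell splits one input dimension with a pair of complementary
# membership functions; each cell's crisp outputs are the input weight b
# scaled by the two memberships.
def rho(x, split, slope=4.0):        # membership in the "low" partition
    return 1.0 / (1.0 + math.exp(slope * (x - split)))

def mu(x, split, slope=4.0):         # complementary "high" membership
    return 1.0 - rho(x, split, slope)

def cell_outputs(b, x, split):
    """Inverted-HNFB cell: input weight b, crisp outputs y1 and y2."""
    return b * rho(x, split), b * mu(x, split)

# First cell (b = 1) partitions x1; its outputs feed two child cells on x2.
x1, x2 = 0.3, 0.8
y1, y2 = cell_outputs(1.0, x1, split=0.5)
y11, y12 = cell_outputs(y1, x2, split=0.5)   # children of the low branch
y21, y22 = cell_outputs(y2, x2, split=0.5)   # children of the high branch

# Complementarity means the leaf activations still sum to the root input b=1.
total = y11 + y12 + y21 + y22
print(round(total, 6))    # 1.0
```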
Figure 2. (a) HNFB-1 basic cell. (b) Original HNFB architecture. (c) Inversion of the architecture shown in Fig. 2(b). (d) Connection of the inverted architecture to T-conorm cells
The output y is then calculated by Eq. (8):

yj = ( Σ (i = 1 to n) αij · Ci ) / ( Σ (i = 1 to n) αij )   (8)

where:

yj: output of the HNFB-Mamdani model for input pattern j;
αij: output value of the i-th T-conorm neuron (Ti) for input pattern j;
Ci: value in the universe of discourse of the output variable where the Mi fuzzy set presents its maximum value;
·: product operator;
n: total number of fuzzy sets associated with the output variable.

The Takagi-Sugeno-based HNFB models provide a superior performance to HNFB-Mamdani. This is due to the Takagi-Sugeno inference method used by those models, which is usually more accurate than the Mamdani inference method [Bezerra,2005]. The disadvantage of the Takagi-Sugeno method is its reduced interpretability.

Electric Load Forecasting

This experiment made use of data related to the monthly electric load of 6 utilities of the Brazilian electrical energy sector. The results obtained with the HNFB models were compared with the Backpropagation algorithm and statistical techniques such as Holt-Winters and Box & Jenkins.
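Equation (8) above amounts to a weighted average: the crisp output is the mean of the points Ci where the output fuzzy sets peak, weighted by the T-conorm activations. A minimal numeric sketch (the activation and peak values are made up):

```python
# Eq. (8): weighted-average defuzzification over n = 3 output fuzzy sets.
alphas = [0.1, 0.7, 0.2]   # alpha_i: T-conorm neuron activations (i = 1..n)
C = [-1.0, 0.0, 1.0]       # Ci: peak positions of the n output fuzzy sets

y = sum(a * c for a, c in zip(alphas, C)) / sum(alphas)
print(round(y, 6))         # 0.1
```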
As can be seen from the results presented, HNFB models provide very good performance in different applications. To improve the performance of the HNFB-Mamdani model, which provided the worst results among the supervised HNFB models, the model is being extended to allow the use of different types of output fuzzy sets (such as Gaussian, trapezoidal, etc.) and by adding an algorithm to optimize the total number of output fuzzy sets.

REFERENCES

Bezerra, R.A., Vellasco, M.M.B.R., & Tanscheit, R. (2005). Hierarchical Neuro-Fuzzy BSP Mamdani System. 11th World Congress of the International Fuzzy Systems Association, 3, 1321-1326.

Bishop, C.M. (1995). Neural Networks for Pattern Recognition. Clarendon Press.

Chrysanthou, Y. & Slater, M. (1992). Computing dynamic changes to BSP trees. EUROGRAPHICS '92, 11(3), 321-332.
Figueiredo, K.T., Vellasco, M.M.B.R., & Pacheco, M.A.C. (2005a). Hierarchical Neuro-Fuzzy Models based on Reinforcement Learning for Intelligent Agents. Computational Intelligence and Bioinspired Systems, LNCS 3512, 424-431.

Figueiredo, K., Santos, M., Vellasco, M.M.B.R., & Pacheco, M.A.C. (2005b). Modified Reinforcement Learning-Hierarchical Neuro-Fuzzy Politree Model for Control of Autonomous Agents. International Journal of Simulation Systems, Science & Technology, 6(10-11), 4-13.

Gonçalves, L.B., Vellasco, M.M.B.R., Pacheco, M.A.C., & Souza, F.J. (2006). Inverted Hierarchical Neuro-Fuzzy BSP System: A Novel Neuro-Fuzzy Model for Pattern Classification and Rule Extraction in Databases. IEEE Transactions on Systems, Man & Cybernetics, Part C, 36(2), 236-248.

Haykin, S. (1999). Neural Networks - A Comprehensive Foundation. Macmillan College Publishing.

Jang, J.-S.R., Sun, C.-T., & Mizutani, E. (1997). Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice-Hall.

Nauck, D. & Kruse, R. (1997). A neuro-fuzzy method to learn fuzzy classification rules from data. Fuzzy Sets and Systems, 88, 277-288.

Nauck, D. & Kruse, R. (1998). A Neuro-Fuzzy Approach to Obtain Interpretable Fuzzy Systems for Function Approximation. IEEE International Conference on Fuzzy Systems, 1106-1111.

Ross, T.J. (2004). Fuzzy Logic with Engineering Applications. John Wiley & Sons.

Souza, F.J., Vellasco, M.M.B.R., & Pacheco, M.A.C. (2002a). Hierarchical neuro-fuzzy quadtree models. Fuzzy Sets and Systems, 130(2), 189-205.

Souza, F.J., Vellasco, M.M.B.R., & Pacheco, M.A.C. (2002b). Load Forecasting with The Hierarchical Neuro-Fuzzy Binary Space Partitioning Model. International Journal of Computers Systems and Signals, 3(2), 118-132.

Takagi, T. & Sugeno, M. (1985). Fuzzy identification of systems and its application to modelling and control. IEEE Trans. on Systems, Man and Cybernetics, 15(1), 116-132.

Tito, E., Zaverucha, G., Vellasco, M.M.B.R., & Pacheco, M. (1999). Applying Bayesian Neural Networks to Electrical Load Forecasting. 6th Int. Conf. on Neural Information Processing.

Vellasco, M.M.B.R., Pacheco, M.A.C., Ribeiro-Neto, L.S., & Souza, F.J. (2004). Electric Load Forecasting: Evaluating the Novel Hierarchical Neuro-Fuzzy BSP Model. International Journal of Electrical Power & Energy Systems, 26(2), 131-142.

Vuorimaa, P. (1994). Fuzzy self-organizing map. Fuzzy Sets and Systems, 66(2), 223-231.

Zhang, J. & Morris, A.J. (1995). Fuzzy neural networks for nonlinear systems modelling. IEE Proc.-Control Theory Appl., 142(6), 551-561.

KEY TERMS

Artificial Neural Networks: Composed of several units called neurons, connected through synaptic weights, which are iteratively adapted to achieve the desired response. Each neuron performs a weighted sum of its inputs, which is then passed through a nonlinear function that yields the output signal. ANNs have the ability to perform a non-linear mapping between their inputs and outputs, which is learned by a training algorithm.

Bayesian Neural Networks: Multi-layer neural networks that use training algorithms based on statistical Bayesian inference. BNNs offer a number of important advantages over the standard Backpropagation learning algorithm, including: confidence intervals can be assigned to the predictions generated by a network; they allow the values of regularization coefficients to be selected using only training data; similarly, they allow different models to be compared using only the training data, dealing with the issue of model complexity without the need to use cross validation.

Binary Space Partitioning: The space is successively divided in two regions, in a recursive way. This partitioning can be represented by a binary tree that illustrates the successive n-dimensional space sub-divisions into two convex subspaces. This process results in two new subspaces that can be later partitioned by the same method.
Fuzzy Logic: Can be used to translate, in mathematical terms, the imprecise information expressed by a set of linguistic IF-THEN rules. Fuzzy logic studies the formal principles of approximate reasoning and is based on Fuzzy Set Theory. It deals with the intrinsic imprecision associated with the description of the properties of a phenomenon, and not with the imprecision associated with the measurement of the phenomenon itself. While classical logic is of a bivalent nature (true or false), fuzzy logic admits multivalence.

Machine Learning: Concerned with the design and development of algorithms and techniques that allow computers to learn. The major focus of machine learning research is to automatically extract useful information from historical data, by computational and statistical methods.

Pattern Recognition: A sub-topic of machine learning, which aims to classify input patterns into a specific class of pre-defined groups. The classification is usually based on the availability of a set of patterns that have already been classified. Therefore, the resulting learning strategy is based on supervised learning.

Supervised Learning: A machine learning technique for creating a function from training data, which consist of pairs of input patterns and the desired outputs. The learning process therefore depends on the existence of a teacher that provides, for each input pattern, the real output value. The output of the function can be a continuous value (called regression) or a class label of the input object (called classification).
Marco Pacheco
PUC-Rio, Brazil
Karla Figueiredo
UERJ, Brazil
Flavio Souza
UERJ, Brazil
Hierarchical Neuro-Fuzzy Systems Part II
Two sub-sets of hierarchical neuro-fuzzy systems have been developed, according to the learning process used: supervised learning models, and reinforcement learning models (RL-HNFB [Figueiredo,2005a], RL-HNFP [Figueiredo,2005b]). The focus of this article is on the second sub-set of models. These models are described in the following sections.

The Q-value Q(s, a) represents the expected return obtained by the execution of action a in state s, in accordance with a policy π. For further details about RL theory, see [Sutton,1998].

The linguistic interpretation of the mapping implemented by the RL-NFP cell depicted in Figure 1(a) is given by the following set of rules:
The global reinforcement is calculated by Equation (3) below:

If (x > 0 and v < 0) or (x < 0 and v > 0):
    Rglobal = k1·e^(-|distance_to_objective|) + k2·e^(-|velocity|)   (3)
Else:
    Rglobal = 0

The evaluation function increases as the car gets closer to the centre of the environment with velocity zero. The k1 and k2 coefficients are constants greater than 1, used for adapting the reinforcement values to the model's structure. The values used for the time unit and mass were Δt = 0.02 and m = 2.0.

The stopping criterion is achieved when the difference between the velocity and the position value in relation to the objective (x = 0 and v = 0) is smaller than 5% of the universe of discourse of the position and velocity inputs.

Table 1 shows the average of the results obtained in 5 experiments for each configuration. The position and velocity limits columns refer to the limits imposed on the (position and velocity) state variables during learning and testing. The actions used in these experiments are: F1 = {-150, -75, -50, -30, -20, -10, -5, 0, 5, 10, 20, 30, 50, 75, 150}. The size of the structure column shows the average number of cells at the end of each experiment, and the last column shows the average number of steps during the learning phase. The number of cycles was
Table 1. Results of the RL-HNFB and RL-HNFP models applied to the cart-centering problem

Model     | Position Limits | Velocity Limits | Size of the Structure | Average steps (learning phase)
RL-HNFB1  | |10|            | |10|            | 195 cells             | 424
RL-HNFB2  | |3|             | |3|             | 340 cells             | 166
RL-HNFP1  | |10|            | |10|            | 140 cells             | 221
RL-HNFP2  | |3|             | |3|             | 251 cells             | 145

Table 2. Testing results of the proposed models applied to the cart-centering problem (average number of steps)

Configuration | Initial position |3| | |2| | |1|
RL-HNFB1      | 387               | 198 | 141
RL-HNFB2      | 122               | 80  | 110
RL-HNFP1      | 190               | 166 | 99
RL-HNFP2      | 96                | 124 | 68
fixed at 1000. At each cycle, the car's starting points were x = -3 or x = 3.

As can be observed from Table 1, the RL-HNFP structure is smaller because each cell receives both input variables, while in the RL-HNFB model a different input variable is applied at each level of the BSP tree.

Table 2 presents the results obtained for one of the 5 experiments carried out at each configuration shown in Table 1, when the car starts out at points (-2, -1, 1, 2), which were not used in the learning phase.

In the first configuration of each model, the results show that the broader the position and velocity limits (in this case equal to |10|), the more difficult it is to learn. In these cases a small oscillation occurs around the central point. What actually happens is that the final velocity is very small but, after some time, it tends to move the car out of the convergence area, resulting in a peak of velocity in the opposite direction to correct the car's position. In the second configuration, fewer oscillations occur because the position and velocity limits were lowered to |3|.

Khepera Robot

... one behind. Its actions are executed via 2 independent motors with power varying between -20 and 20, one on the right side and the other on the left side.

The RL-HNFP model was trained in one environment with a big central square obstacle, called environment I (Figure 2). It was tested in two environments: the same environment I (with a big central square obstacle) but with different initial positions; and another environment with four additional obstacles. This new multi-obstacle environment (environment II) comprises big central, small top-left, bottom-right, and left obstacles (see Figure 3).

In all experiments using both environments, the robot's objective was to reach position (5, 5). In Figures 2 and 3, the white circles indicate the initial positions of the robot, while the gray circles show the final positions.

Figure 2. Tests in environment I
REFERENCES

Gonçalves, L.B., Vellasco, M.M.B.R., Pacheco, M.A.C., & Souza, F.J. (2006). Inverted Hierarchical Neuro-Fuzzy BSP System: A Novel Neuro-Fuzzy Model for Pattern Classification and Rule Extraction in Databases. IEEE Transactions on Systems, Man & Cybernetics, Part C: Applications and Reviews, 36(2), 236-248.

Figueiredo, K., Vellasco, M.M.B.R., & Pacheco, M.A.C. (2005a). Hierarchical Neuro-Fuzzy Models based on Reinforcement Learning for Intelligent Agents. Lecture Notes in Computer Science - Computational Intelligence and Bioinspired Systems, 3512, Springer, 424-431.

Figueiredo, K., Santos, M., Vellasco, M.M.B.R., & Pacheco, M.A.C. (2005b). Modified Reinforcement Learning-Hierarchical Neuro-Fuzzy Politree Model for Control of Autonomous Agents. Special issue on Soft Computing for Modeling and Simulation of the International Journal of Simulation Systems, Science & Technology, 6(10-11), 4-13.

Jang, J.-S.R., Sun, C.-T., & Mizutani, E. (1997). Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice-Hall.

Jouffe, L. (1998). Fuzzy Inference System Learning by Reinforcement Methods. IEEE Trans. on SMC, Part C, 28(3), 338-355.

Koza, J.R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.

Kruse, R. & Nauck, D. (1995). NEFCLASS - A neuro-fuzzy approach for the classification of data. Proc. of ACM Symposium on Applied Computing, Nashville, 461-465.

Satoh, H. (2006). A State Space Compression Method Based on Multivariate Analysis for Reinforcement Learning in High-Dimensional Continuous State Spaces. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E89-A(8), 2181-2191.

Sutton, R.S. & Barto, A.G. (1998). Reinforcement Learning: An Introduction. MIT Press.

Souza, F.J., Vellasco, M.M.B.R., & Pacheco, M.A.C. (2002). Load Forecasting with The Hierarchical Neuro-Fuzzy Binary Space Partitioning Model. International Journal of Computers Systems and Signals, 3(2), 118-132.

Vellasco, M.M.B.R., Pacheco, M.A.C., Ribeiro Neto, L.S., & Souza, F.J. (2004). Electric Load Forecasting: Evaluating the Novel Hierarchical Neuro-Fuzzy BSP Model. International Journal of Electrical Power & Energy Systems, 26(2), 131-142.

Vuorimaa, P. (1994). Fuzzy self-organizing map. Fuzzy Sets and Systems, 66(2), 223-231.

KEY TERMS

Binary Space Partitioning: In this type of partitioning, the space is successively divided in two regions, in a recursive way. This partitioning can be represented by a binary tree that illustrates the successive n-dimensional space sub-divisions into two convex subspaces. The construction of this partitioning tree (BSP tree) is a process in which a subspace is divided by a hyper-plane parallel to the coordinate axes. This process results in two new subspaces that can be later partitioned by the same method.

Fuzzy Inference Systems: Fuzzy inference is the process of mapping from a given input to an output using fuzzy logic. The mapping then provides a basis from which decisions can be made, or patterns discerned. Fuzzy inference systems have been successfully applied in fields such as automatic control, data classification, and decision analysis.

Machine Learning: Concerned with the design and development of algorithms and techniques that allow computers to learn. The major focus of machine learning research is to automatically extract useful information from historical data, by computational and statistical methods.

Politree Partitioning: Politree partitioning was inspired by the quadtree structure, which has been widely used in the area of image manipulation and compression. In politree partitioning, the subdivision of the n-dimensional space is accomplished by m = 2^n subdivisions. Politree partitioning can be represented by a tree structure where each node is subdivided into m leaves.

Quadtree Partitioning: In this type of partitioning, the space is successively divided in four regions, in a recursive way. This partitioning can be represented
by a quaternary tree that illustrates the successive n-dimensional space sub-divisions into four convex subspaces. The construction of this partitioning tree (quadtree) is a process in which a subspace is divided by two hyper-planes parallel to the coordinate axes. This process results in four new subspaces that can be later partitioned by the same method. The limitation of quadtree partitioning (fixed or adaptive) is that it works only in two-dimensional spaces.

Reinforcement Learning: A sub-area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Differently from supervised learning, in this case there is no target value for each input pattern, only a reward based on how good or bad the action taken by the agent in the environment was.

Sarsa: A variation of the Q-learning (reinforcement learning) algorithm based on model-free, on-policy action-value estimation. Sarsa admits that the actions are chosen randomly with a predefined probability.

WoLF: (Win or Learn Fast) A method by [Bowling,2002] for changing the learning rate to encourage convergence in a multi-agent reinforcement learning scenario.
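The Sarsa update described above can be sketched on a small task. The 1-D corridor, its parameter values, and the epsilon-greedy probability below are made up for illustration:

```python
import random

# Sarsa: Q(s,a) moves toward r + gamma*Q(s',a'), where a' is the action
# actually chosen next (epsilon-greedy, i.e., random with a predefined
# probability), making the update on-policy.
random.seed(1)
N_STATES, ACTIONS = 5, [-1, +1]          # states 0..4; move left or right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.3
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def choose(s):
    if random.random() < EPS:                        # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])     # exploit

for _ in range(500):                                 # episodes
    s = 0
    a = choose(s)
    while s != N_STATES - 1:                         # state 4 is the goal
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        a2 = choose(s2)
        # on-policy temporal-difference (Sarsa) update
        target = r if s2 == N_STATES - 1 else r + GAMMA * Q[(s2, a2)]
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s, a = s2, a2

print(Q[(3, +1)] > Q[(3, -1)])   # stepping toward the goal is worth more
```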
Michael Littman
Rutgers University, USA
INTRODUCTION

Reinforcement learning (RL) deals with the problem of an agent that has to learn how to behave to maximize its utility by its interactions with an environment (Sutton & Barto, 1998; Kaelbling, Littman & Moore, 1996). Reinforcement learning problems are usually formalized as Markov Decision Processes (MDPs), which consist of a finite set of states and a finite number of possible actions that the agent can perform. At any given point in time, the agent is in a certain state and picks an action. It can then observe the new state this action leads to, and receives a reward signal. The goal of the agent is to maximize its long-term reward.

In this standard formalization, no particular structure or relationship between states is assumed. However, learning in environments with extremely large state spaces is infeasible without some form of generalization. Exploiting the underlying structure of a problem can effect generalization and has long been recognized as an important aspect of representing sequential decision tasks (Boutilier et al., 1999).

Hierarchical Reinforcement Learning is the subfield of RL that deals with the discovery and/or exploitation of this underlying structure. Two main ideas come into play in hierarchical RL. The first one is to break a task into a hierarchy of smaller subtasks, each of which can be learned faster and more easily than the whole problem. Subtasks can also be performed multiple times in the course of achieving the larger task, reusing accumulated knowledge and skills. The second idea is to use state abstraction within subtasks: not every task needs to be concerned with every aspect of the state space, so some states can actually be abstracted away and treated as the same for the purpose of the given subtask.

BACKGROUND

In this section, we will introduce the MDP formalism, where most of the research in standard RL has been done. We will then mention the two main approaches used for learning MDPs: model-based and model-free RL. Finally, we will introduce two formalisms that extend MDPs and are widely used in the Hierarchical RL field: semi-Markov Decision Processes (SMDPs) and Factored MDPs.

Markov Decision Processes (MDPs)

A Markov Decision Process consists of:

• a set of states S
• a set of actions A
• a transition probability function Pr(s′ | s, a), representing the probability of the environment transitioning to state s′ when the agent performs action a from state s. It is sometimes notated T(s, a, s′).
• a reward function E[r | s, a], representing the expected immediate reward obtained by taking action a from state s.
• a discount factor γ ∈ (0, 1], that downweights future rewards and whose precise role will be clearer in the following equations.

A deterministic policy π: S → A is a function that determines, for each state, what action to take. For any given policy π, we can define a value function Vπ, representing the expected infinite-horizon discounted return to be obtained from following such a policy starting at state s:

Vπ(s) = E[ Σ_{t≥0} γ^t r_t | s_0 = s ]
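To make these definitions concrete, here is a minimal policy-evaluation sketch in Python. The two-state MDP (its states, transitions and rewards) is invented purely for this illustration; the update applied is the recursive form of the value function introduced above.

```python
# Iterative policy evaluation for a toy 2-state MDP (illustrative only:
# the states, transitions and rewards below are invented for the example).

GAMMA = 0.9
S = ["s0", "s1"]
# T[(s, a)] -> {s': probability}, R[(s, a)] -> expected immediate reward
T = {("s0", "stay"): {"s0": 1.0},
     ("s0", "go"):   {"s1": 1.0},
     ("s1", "stay"): {"s1": 1.0},
     ("s1", "go"):   {"s0": 1.0}}
R = {("s0", "stay"): 0.0, ("s0", "go"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.0}

def evaluate(policy, iters=1000):
    """Apply V(s) <- R(s, pi(s)) + gamma * sum_s' T(s, pi(s), s') V(s')."""
    V = {s: 0.0 for s in S}
    for _ in range(iters):
        V = {s: R[(s, policy[s])]
                + GAMMA * sum(p * V[s2]
                              for s2, p in T[(s, policy[s])].items())
             for s in S}
    return V

V = evaluate({"s0": "go", "s1": "stay"})
# go from s0 (reward 1), then stay in s1 forever (reward 2 per step):
# V(s1) = 2 / (1 - 0.9) = 20, and V(s0) = 1 + 0.9 * 20 = 19
```

The closed-form values in the final comment follow from summing the geometric series of discounted rewards, which is exactly what the iteration converges to.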
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Hierarchical Reinforcement Learning
Bellman (1957) provides a recursive way of determining the value function when the reward and transition probabilities of an MDP are known, called the Bellman equation:

Vπ(s) = R(s, π(s)) + γ Σ_{s′∈S} T(s, π(s), s′) Vπ(s′),

commonly rewritten as an action-value function or Q-function:

Qπ(s, a) = R(s, a) + γ Σ_{s′∈S} T(s, a, s′) Vπ(s′).

An optimal policy π*(s) is a policy that returns the action a that maximizes the value function:

π*(s) = argmax_a Q*(s, a)

States can be represented as a set of state variables or factors, representing different features of the environment: s = <f1, f2, f3, …, fn>.

Learning in Markov Decision Processes (MDPs)

The reinforcement-learning problem consists of determining or approximating an optimal policy through repeated interactions with the environment (i.e., based on a sample of experiences of the form <state, action, next state, reward>). There are three main approaches to learning such an optimal or near-optimal policy:

• Policy-search methods: learn a policy directly via evaluation in the environment.
• Model-free (or direct) methods: learn the policy by directly approximating the Q function with updates from direct experience.
• Model-based (or indirect) methods: first learn the transition probability and reward functions, and use those to compute the Q function by means of, for example, the Bellman equations.

Model-free algorithms are sometimes referred to as the Q-learning family of algorithms. See Sutton (1988) or Watkins (1989) for the best-known early examples. It is known that model-free methods make inefficient use of experience, but they do not require expensive computation to obtain the Q function and the corresponding optimal policy. Model-based methods make more efficient use of experience, and thus require less data, but they involve an extra planning step to compute the value function, which can be computationally expensive. Some well-known algorithms can be found in the literature (Sutton, 1990; Moore & Atkeson, 1993; Kearns & Singh, 1998; Brafman & Tennenholtz, 2002).

Algorithms for reinforcement learning in MDP environments suffer from what is known as the curse of dimensionality: an exponential explosion in the total number of states as a function of the number of state variables. To cope with this problem, hierarchical methods try to break down the intractable state space into smaller pieces, which can be learned independently and reused as needed. To achieve this goal, changes need to be introduced to the standard MDP formalism. In the introduction we mentioned the two main ideas behind hierarchical RL: task decomposition and state abstraction. Task decomposition implies that the agent will not only be performing single-step actions, but also full subtasks which can be extended in time. Semi-Markov Decision Processes (SMDPs) will let us represent these extended actions. State abstraction means that, in certain contexts, certain aspects of the state space will be ignored, and states will be grouped together. Factored-state representations are one way of dealing with this. The following section introduces these two common formalisms used in the HRL literature.

Beyond MDPs: SMDPs and Factored-State Representations

We'll consider the limitations of the standard MDP formalism by means of an illustrative example. Imagine an agent whose task is to exit a multi-storey office building. The starting position of the agent is a certain office on a certain floor, and the goal is to reach the front door at ground level. To complete the task, the agent has to first exit the room, find its way through the hallways to the elevator, take the elevator to the ground floor, and finally find its way from the elevator to the exit. We would like to be able to reason in terms of subtasks (e.g., "exit room", "go to elevator", "go to floor X", etc.), each of them of different durations and levels of abstraction, each encompassing a series of lower-level or primitive actions. Each of these subtasks is also concerned with only certain aspects of the full state space: while the agent is inside the room, and the current task is to exit it, the floor the elevator is on, or whether the front door of the building is open or closed, is irrelevant. However, these features will become crucial later as the agent's subtask changes.

Under the MDP formalization, time is represented as a discrete step of unitary and constant duration. This formulation does not allow the representation of temporally extended actions of varying durations, amenable to represent the kind of higher-level actions identified in the example. The formalism of semi-Markov Decision Processes (SMDPs) enables this representation (Puterman, 1994). In SMDPs, the transition function is altered to represent the probability that action a from state s will lead to next state s′ after t timesteps:

Pr(s′, t | s, a)

The corresponding value function is now:

Vπ(s) = R(s, π(s)) + Σ_{s′∈S} Σ_t γ^t Pr(s′, t | s, π(s)) Vπ(s′)

SMDPs also enable the representation of continuous time. For dynamic programming algorithms for solving SMDPs, see Puterman (1994) and Mahadevan et al. (1997).

Factored-state MDPs deal with the fact that certain aspects of the state space are irrelevant for certain actions. In factored-state MDPs, state variables are decomposed into independently specified components, and transition probabilities are defined as a product of factor probabilities. A common way of representing independence relations between state variables is through Dynamic Bayes Networks (DBNs). As an example, imagine that the state is represented by four state variables: s = <f1, f2, f3, f4>, and we know that for action a the value of variable f1 in the next state only depends on the prior values of f1 and f4, f2 depends on f2 and f3, and the others only depend on their own prior value. This transition probability in a Factored MDP would be represented as:

Pr(s′ | s, a) = Pr(f1′ | f1, f4, a) Pr(f2′ | f2, f3, a) Pr(f3′ | f3, a) Pr(f4′ | f4, a)

For learning algorithms in factored-state MDPs, see Kearns & Koller (1999) and Guestrin et al. (2002).

HIERARCHICAL REINFORCEMENT-LEARNING METHODS

Different approaches and goals can be identified within the hierarchical reinforcement-learning subfield. Some algorithms are concerned with learning a hierarchical view of either the environment or the task at hand, while others are just concerned with exploiting this knowledge when provided as input. Some techniques try to learn or exploit temporally extended actions, abstracting together a set of actions that lead to the completion of a subtask or subgoal. Other methods try to abstract together different states, treating them as if they were equal from the point of view of the learning problem.

We will briefly review a set of algorithms that use some combination of these approaches. We will also identify which of these methods are based on the model-free learning paradigm as opposed to those that try to construct a model of the environment.

Options: Learning Temporally Extended Actions in the SMDP Framework

Options make use of the SMDP framework to allow the agent to group together a series of actions (an option's policy) that lead to a certain state or set of states identified as subgoals. For each option, a set of valid start states is also identified, where the agent can decide whether to perform a single-step primitive action, or to make use of the option. We can think of options as pre-stored policies for performing abstract subtasks.

A learning algorithm for options is described by Sutton, Precup & Singh (1999) and belongs to the model-free Q-learning family. In its current formulation, the options framework allows for two-level hierarchies of tasks, although they could potentially be generalized to multiple levels. End states (i.e., subgoals) are given as input to the algorithm. There is work devoted to discovering these subgoals and constructing useful options from them (Şimşek et al., 2005; Jong & Stone, 2005).
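The product-of-factors transition model described in this section can be written down directly. In the Python sketch below, the per-factor conditional probability tables are invented for the illustration (a made-up noisy "AND" of the parents), and the action is held fixed for brevity:

```python
# Factored transition model matching the four-variable example in the text:
# f1' depends on (f1, f4), f2' on (f2, f3), f3' on f3, f4' on f4.
parents = {"f1": ("f1", "f4"), "f2": ("f2", "f3"),
           "f3": ("f3",),      "f4": ("f4",)}

def p_next_is_one(factor, s):
    """Pr(factor' = 1 | its parents in s) -- an invented noisy-AND table."""
    vals = [s[p] for p in parents[factor]]
    return 0.9 if all(vals) else 0.1

def transition_prob(s, s_next):
    """Pr(s' | s, a) computed as a product of per-factor probabilities."""
    prob = 1.0
    for f in parents:
        p1 = p_next_is_one(f, s)
        prob *= p1 if s_next[f] == 1 else (1.0 - p1)
    return prob

s = {"f1": 1, "f2": 1, "f3": 0, "f4": 1}
p = transition_prob(s, {"f1": 1, "f2": 1, "f3": 0, "f4": 1})
# f1'=1: parents (1,1) -> 0.9;  f2'=1: parents (1,0) -> 0.1;
# f3'=0: parent 0 -> 0.9;       f4'=1: parent 1 -> 0.9
# product = 0.9 * 0.1 * 0.9 * 0.9 = 0.0729
```

The point of the factored form is that each table is exponentially smaller than the full joint transition matrix, which is what the DBN representation exploits.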
While options have been shown to improve the learning time of model-free algorithms, it is not clear that there is an advantage in terms of learning time over model-based methods. As with any model-free method, though, they do not suffer the computational cost involved in the planning step. It is still an open question whether options can be generalized to multiple-level hierarchies, and most of the work is empirical, with no theoretical bounds.

MaxQ: Combining a Hierarchical Task Decomposition with State Abstraction

MaxQ is also a model-free algorithm in the Q-learning family. It receives as input a multi-level hierarchical task decomposition, which decomposes the full underlying MDP into an additive combination of smaller MDPs. Within each task, abstraction is used so that state variables that are irrelevant for the task are ignored (Dietterich, 2000).

The main drawback of MaxQ is that the hierarchy and abstraction have to be provided as input, and in its model-free form it misses opportunities for faster learning.

HEXQ

… a hierarchy of smaller interlinked MDPs. HEXQ is model-free and based on Q-learning (Hengst, 2002).

HEXQ shows a promising method for discovering abstractions and hierarchies, but still suffers from a lack of any theoretical bounds or proofs. All the work using HEXQ has been empirical, and its general power still remains an open question.

HAM-PHAM: Restricting the Class of Possible Policies

Hierarchies of Abstract Machines (HAMs) also make use of the SMDP formalism. The main idea is to restrict the class of possible policies by means of small nondeterministic finite-state machines, which constrain the sequences of actions that are allowed. Elements in HAMs can be thought of as small programs, which at certain points can decide to make calls to other lower-level programs (Parr & Russell, 1997; Parr, 1998). See also Programmable HAMs (PHAMs), an extension by Andre & Russell (2000).

HAM provides an interesting approach to making learning and planning easier, but has also only been shown to work better in certain empirical examples.
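Several of the methods in this article (options, MaxQ, HEXQ) are described as model-free members of the Q-learning family. As a reference point, here is a minimal tabular Q-learning sketch; the four-state chain environment and all parameters are invented for the illustration. The behavior policy is uniformly random, which is enough because Q-learning is off-policy:

```python
import random

# Tabular Q-learning on a toy chain: states 0..3; "right" moves toward
# state 3, which pays reward 1 and ends the episode.
ALPHA, GAMMA = 0.1, 0.9
ACTIONS = ("left", "right")

def step(s, a):
    s2 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0)

random.seed(0)
Q = {(s, a): 0.0 for s in range(3) for a in ACTIONS}
s = 0
for _ in range(20000):
    a = random.choice(ACTIONS)            # random exploration
    s2, r = step(s, a)
    # Update from the single experience <s, a, next state, reward>;
    # the terminal state contributes no bootstrapped value.
    target = r if s2 == 3 else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    s = 0 if s2 == 3 else s2              # restart the episode at the goal

greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(3)}
# greedy is "right" in every state; Q(2, "right") converges to 1.0
```

Because the toy environment is deterministic, the learned values converge to the exact discounted returns (1.0, 0.9, 0.81 for "right" from states 2, 1, 0).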
… achieved mixed results, with algorithms and frameworks focusing on just one or two combinations of the different aspects of the problem. A single approach that can deal with structure discovery and its use, with both temporal and state abstraction, and that can provably learn and plan in polynomial time is still the main item in the research agenda of the field.

REFERENCES

Andre, D. & Russell, S. J. (2000). Programmable reinforcement learning agents. Advances in Neural Information Processing Systems (NIPS).

Barto, A. G. & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Special Issue on Reinforcement Learning, Discrete Event Systems Journal, (13), 41-77.

Bellman, R. (1957). Dynamic Programming. Princeton University Press.

Boutilier, C., Dean, T. & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, (11), 1-94.

Brafman, R. & Tennenholtz, M. (2002). R-MAX: A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research.

Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, (13), 227-303.

Diuk, C., Strehl, A. & Littman, M. L. (2006). A hierarchical approach to efficient reinforcement learning in deterministic domains. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS).

Guestrin, C., Patrascu, R. & Schuurmans, D. (2002). Algorithm-directed exploration for model-based reinforcement learning in factored MDPs. Proceedings of the International Conference on Machine Learning, 235-242.

Hengst, B. (2002). Discovering hierarchy in reinforcement learning with HEXQ. Proceedings of the 19th International Conference on Machine Learning.

Jong, N. & Stone, P. (2005). State abstraction discovery from irrelevant state variables. Proceedings of the 19th International Joint Conference on Artificial Intelligence.

Kaelbling, L. P., Littman, M. L. & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, (4), 237-285.

Kearns, M. & Singh, S. (1998). Near-optimal reinforcement learning in polynomial time. Proceedings of the 15th International Conference on Machine Learning.

Kearns, M. J. & Koller, D. (1999). Efficient reinforcement learning in factored MDPs. Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI), 740-747.

Mahadevan, S., Marchalleck, N., Das, T. & Gosavi, A. (1997). Self-improving factory simulation using continuous-time average-reward reinforcement learning. Proceedings of the 14th International Conference on Machine Learning.

Moore, A. & Atkeson, C. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning.

Parr, R. & Russell, S. (1997). Reinforcement learning with hierarchies of machines. Proceedings of Advances in Neural Information Processing Systems 10.

Parr, R. (1998). Hierarchical control and learning for Markov decision processes. PhD thesis, University of California at Berkeley.

Puterman, M. L. (1994). Markov Decision Problems. Wiley, New York.

Şimşek, Ö., Wolfe, A. P. & Barto, A. (2005). Identifying useful subgoals in reinforcement learning by local graph partitioning. Proceedings of the 22nd International Conference on Machine Learning.

Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning.

Sutton, R. S. (1990). Integrated architectures for learning, planning and reacting based on approximating dynamic programming. Proceedings of the 7th International Conference on Machine Learning.

Sutton, R. S. & Barto, A. G. (1998). Reinforcement Learning: An Introduction. The MIT Press.
Sutton, R., Precup, D. & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence.

Watkins, C. (1989). Learning from Delayed Rewards. PhD thesis.

KEY TERMS

Factored-State Markov Decision Process: An extension to the MDP formalism used in Hierarchical RL where the transition probability is defined in terms of factors, allowing the representation to ignore certain state variables under certain contexts.

Hierarchical Reinforcement Learning: A subfield of reinforcement learning concerned with the discovery and use of task decomposition, hierarchical control, temporal and state abstraction (Barto & Mahadevan, 2003).

Hierarchical Task Decomposition: A decomposition of a task into a hierarchy of smaller subtasks.

Markov Decision Process: The most common formalism for environments used in reinforcement learning, where the problem is described in terms of a finite set of states, a finite set of actions, transition probabilities between states, a reward signal and a discount factor.

Reinforcement Learning: The problem faced by an agent that learns behavior that maximizes a utility measure from its interaction with the environment.

Semi-Markov Decision Process: An extension to the MDP formalism that deals with temporally extended actions and/or continuous time.

State-Space Generalization: The technique of grouping together states in the underlying MDP and treating them as equivalent for certain purposes.
Ahcene Farah
Ajman University, UAE
Hamid Bessalah
Centre de Développement des Technologies Avancées (CDTA), Algeria
Ahmed Bouridene
Queen's University Belfast, UK
Nassim Chikhi
Centre de Développement des Technologies Avancées (CDTA), Algeria
High Level Design Approach for FPGA Implementation of ANNs
bridge between the synthesis tool and the ANN design requirements. This method offers high flexibility in the design while achieving speed/area performance constraints. The three implementation figures of the ANN-based back propagation algorithm are considered. These are the off-chip implementation, the on-chip global implementation and the dynamic reconfiguration choices of the ANN.

To achieve our goal, a design-for-reuse strategy has been applied. To validate our approach, three case studies are considered using the Virtex-II and Virtex-4 FPGA devices. A comparative study is done and new conclusions are given.

BACKGROUND

In this section, a theoretical presentation of the multilayer perceptron (MLP) based back propagation algorithm is given. Then, the works most related to the topics of high-level design methodology and ANN frameworks are discussed.

Theoretical Background of the Back Propagation Algorithm

Back propagation is one of the well-known algorithms used to train the MLP network in a supervised mode. The MLP is executed in three phases: the feed-forward phase, the error calculation phase and the synaptic weight updating phase (Freeman, J. A. and Skapura, D. M., 1991).

In the feed-forward phase, a pattern x_i is applied to the input layer and the resulting signal is forward propagated through the network until the final outputs have been calculated; for each i (index of neuron) and j (index of layer):

M_i^[j] = Σ_{i=1..n0} W_i^[j] x_i    (1)

The error calculation step computes the error terms of the output and hidden layers:

D_i^L = f′(u_i^L) (d_i − y_i),  1 ≤ i ≤ N_L    (3)

D_j^{l−1} = f′(u_j^{l−1}) Σ_{i=1..N_l} w_ij D_i^l,  1 ≤ i ≤ N_l, 1 ≤ l ≤ L    (4)

where d_i is the desired output and f′ the derivative of the function f.

The weight update step computes the weight updates according to:

w_ij^l(t + 1) = w_ij^l(t) + Δw_ij^l(t)    (5)

Δw_ij^l(t) = η D_i^l y_j^{l−1}    (6)

where η is the learning factor, Δw the variation of the weights and l the index of the layer.

Background on ANN Frameworks

The works most related to ANN frameworks are presented by F. Schurmann et al. (2002), M. Diepenhorst et al. (1999), and J. Zhu et al. (1999).

On the other hand, with the increasing complexity of FPGA circuits, core-based synthesis methodology has been proposed as a new trend for efficient hardware implementation on FPGAs. In these tools, a library of pre-designed IP (Intellectual Property) cores is proposed. Examples can be found in the Xilinx Core Generator and Opencores references.

In the core-based design methodology, efficient reuse is derived from parameterized design with VHDL and its many flexible constructs and characteristics (i.e. abstraction, encapsulation, inheritance and reuse through attributes, packages, procedures and functions). Besides this, the reuse concept is well suited to highly regular and repetitive structures such as neural networks. However, despite all these advantages, little attention has been paid to applying design for reuse to ANNs.
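The back-propagation equations (3)-(6) can be exercised in a short, self-contained Python sketch (software only, with no hardware concerns). The layer sizes, learning factor and training data are chosen for illustration, and a bias is folded in as an extra constant input to each layer, a common trick that is not part of the equations above:

```python
import math
import random

random.seed(1)
ETA = 0.5                                    # learning factor (eq. 6)
f = lambda u: 1.0 / (1.0 + math.exp(-u))     # sigmoid activation
df = lambda y: y * (1.0 - y)                 # f'(u) expressed via y = f(u)

n_in, n_hid, n_out = 2, 3, 1                 # an illustrative (2, 3, 1) MLP
W1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
W2 = [[random.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]

def forward(x):
    # Feed-forward phase: weighted sums (eq. 1) followed by the activation.
    y1 = [f(sum(w * v for w, v in zip(row, x + [1.0]))) for row in W1]
    y2 = [f(sum(w * v for w, v in zip(row, y1 + [1.0]))) for row in W2]
    return y1, y2

def train_step(x, d):
    y1, y2 = forward(x)
    # Eq. (3): output deltas  D_i = f'(u_i) * (d_i - y_i)
    d2 = [df(y) * (t - y) for y, t in zip(y2, d)]
    # Eq. (4): hidden deltas  D_j = f'(u_j) * sum_i w_ij * D_i
    d1 = [df(y1[j]) * sum(W2[i][j] * d2[i] for i in range(n_out))
          for j in range(n_hid)]
    # Eqs. (5)-(6): w(t+1) = w(t) + eta * D_i * y_j
    for i in range(n_out):
        for j, v in enumerate(y1 + [1.0]):
            W2[i][j] += ETA * d2[i] * v
    for j in range(n_hid):
        for k, v in enumerate(x + [1.0]):
            W1[j][k] += ETA * d1[j] * v

data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]   # the XOR benchmark

def mse():
    return sum((d[0] - forward(x)[1][0]) ** 2 for x, d in data) / len(data)

before = mse()
for _ in range(10000):
    for x, d in data:
        train_step(x, d)
after = mse()        # the squared error decreases as training proceeds
```

The XOR patterns are used here because the article later uses the (2, 2, 1) XOR network as its on-chip learning benchmark.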
• The design must be reconfigurable to meet the requirements of many different applications.
• The design must use standard interfaces.
• The code must be synthesizable at the RTL level.
• The design must be verified.
• The design must have robust scripts and must be well documented.

PRESENTATION OF THE PROPOSED DESIGN APPROACH

The proposed design approach is shown in Fig. 1 as a flow process. In this figure, the methodology used is based on a top-down design approach in which the designer/user is guided step by step in the design process of the ANN.

Figure 1. The proposed design flow: graphical user interface, ANN dimension selection, implementation choice (on-chip implementation, dynamic reconfiguration, off-chip training), reusable IP cores, ANN parameter definition and FPGA prototyping

First, the user is asked to select the dimension of the network. The next step involves selection of the ANN implementation choices; these are the off-chip implementation, the global on-chip implementation and implementation using run-time reconfiguration (RTR). Thus a core is generated for each type of implementation.

At this level, the user/designer can fix the parameters of the network, i.e. the number of neurons in each layer, synaptic values, multiplier type, data representation and precision. At the low level, all the IP cores that construct the neuron are generated automatically from the library of the synthesis tool, which in our case is Mentor Graphics (Mentor Graphics user guide reference), and which also integrates the Xilinx IP Core Generator. In addition, for each IP core, a graphical interface is generated to fix its parameters. Thus, the user/designer can change the network performance and architecture by changing the IP cores that are stocked in the library. Then VHDL code at the register transfer level (RTL) is generated for synthesis. Before that, functional simulation is required. The result is a netlist file ready for place and route, followed by final FPGA prototyping on a board. Documentation is available at each level of the
833
High Level Design Approach for FPGA Implementation of ANNs
design process and the code is well commented. Thus, the design-for-reuse requirements are applied throughout the design process. In what follows, a presentation of each implementation type is given.

The Feed Forward Off-Chip Implementation

Fig. 2 shows a top view of the feed forward core, which is composed of a data path module and a control module. At the top level these two modules are represented by black boxes and only the neural network input and output signals are shown to the user/designer.

By clicking inside the boxes, we can get access to the network architecture, which is composed of three layers represented by black boxes as shown in Fig. 3 (left side). By clicking inside each box, we can get access to the layer architecture, which is composed of black boxes representing the neurons as shown in Fig. 3 (right side); and by clicking inside each neuron's box we can get access to the neuron hardware architecture as shown in Fig. 4.

Each neuron implements the accumulated weighted sum of equation (1) and the activation function of equation (2). As shown in Fig. 4, the hardware model of the neuron is mainly based on:

• A memory circuit where the final values of the synaptic weights are stocked
• A multiply circuit (MULT) which computes the product of the stored synaptic weights with the input data
• An accumulator circuit (ACUM) which computes the sum of the above products
• A circuit that approximates the activation function (for example a linear or sigmoid function)
• A multiplexer circuit (MUX) in the case of serial transfer between inputs in the same neuron

The neural network architecture has the following properties:

• Computation between layers is done serially
• For the same layer, neurons are computed in parallel
• For the same neuron, only one multiplier and one accumulator (MULT + ACUM = MAC) are used to compute the product sum
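A behavioural sketch of this MAC-based neuron can clarify the datapath (all names and values below are illustrative, not taken from the article's VHDL): one multiplier and one accumulator process the inputs serially, with the MUX selecting one input and the memory supplying one weight per cycle, before the activation-function circuit is applied.

```python
def neuron_mac(weights, inputs, activation):
    acc = 0.0                             # ACUM register, cleared by reset
    for w, x in zip(weights, inputs):     # MUX selects one input per cycle
        acc += w * x                      # one MULT + ACUM (MAC) per cycle
    return activation(acc)

linear = lambda u: u                      # one possible activation circuit

# Within a layer, every neuron runs this loop in parallel in hardware;
# the layers themselves are computed serially, one after the other.
def layer(weight_rows, inputs, activation=linear):
    return [neuron_mac(row, inputs, activation) for row in weight_rows]

out = layer([[0.5, -1.0, 2.0],
             [1.0,  1.0, 1.0]], [1.0, 2.0, 0.5])
# neuron 1: 0.5*1.0 - 1.0*2.0 + 2.0*0.5 = -0.5
# neuron 2: 1.0*1.0 + 1.0*2.0 + 1.0*0.5 =  3.5
```

The single shared MAC is the reason computation within a neuron takes one cycle per input, while the neurons of a layer finish together.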
Figure 2. The feed forward core module using the Mentor Graphics design tool
Figure 3. The network architecture (left side) and the layer architecture (right side), composed of neuron black boxes (Neurone 1 … Neurone_n)

Figure 4. The neuron hardware model: weight memory (W_ji^[l]), MULT, ACUM, activation function circuit and input MUX over the inputs X1, X2, …, Xn
• Each multiplier is connected to a memory. The depth of each memory is equal to the number of neurons constituting the layer
• The whole network is controlled by a control unit module

Each circuit that constructs the neuron is an IP (Intellectual Property) core that can be generated from the Xilinx Core Generator.

The feed forward control module is composed of three phases: control of the neuron, control of the
layer and control of the network. Considering the fact that neurons work in parallel, control of the layer is similar to the control of the neuron plus the multiplexer control. Control of the neuron is divided into four phases: start, initialization, synaptic multiplication/accumulation and storage of the weighted sum.

The first state diagram designed for the feed forward control module was based on the Moore machine, in which the system varies only when its state changes. The drawback of this machine is that it is not generic. For example, (load=0, reset=0) allows the accumulator to add a value present at the input register. This accumulation must be done as many times as the number of neurons in the previous layer. Thus, if we change the number of neurons from one layer to another, we have to change the whole state flow of the control module. To overcome this problem, the Moore machine is replaced by a Mealy machine in which we add a counter program with a generic value M and a transition variable Max such that:

if output_counter = M then Max = 1, else Max = 0

where the value of M is set equal to the number of neurons.

By using this strategy, we obtain an architecture that has two important key features: genericity and flexibility. Genericity is related to the data word size, precision, and memory depth, which are kept as generic parameters at the top level of the VHDL description. The flexibility feature is related to the size of the network (the number of neurons in each layer); thus it is possible to add neurons by a simple copy/paste of the neuron boxes or cores, and it is also possible to remove them by a simple cut operation on the boxes. It is also possible to use other IP cores from the library (for example, replacing a parallel MULT with a pipelined MULT) to change the performance of the network without changing the VHDL code. Thus, the design-for-reuse concept is applied.

The Direct On-Chip Implementation Strategy

In this section, we propose the equivalent architecture for implementation of the three successive phases of the back propagation algorithm. Fig. 5 depicts the proposed architecture, which is composed of a feed forward module, an Error-calculation module and an Update module. The set of three modules is controlled by a global control unit. The feed forward module computes equations (1) and (2). The Error module computes equations (3) and (4), and the Update module computes equations (5) and (6). Each module exhibits a high degree of structural regularity, modularity and repeatability, which makes the whole ANN a good candidate for the application of the design-for-reuse concept. As in the off-chip implementation case, the control unit was first designed using a Moore machine that integrates control of the three modules: feed forward, error and update. In order to achieve reuse, we have replaced the Moore machine by a Mealy machine. Thus, the size of the network can be modified by simple copy/paste or remove operations on the boxes.

The Run Time Reconfiguration Strategy

Our strategy for run-time reconfiguration follows these steps: first, the feed forward and global control modules are configured. The results are stored in the Bus macro module of the Virtex FPGA device. In the next step, the feed forward module is reset from the FPGA, and the Update and Error modules are configured. The generated results are stored in the Bus macro modules, and the same procedure is applied to the next training example of the ANN. A more detailed description is given in (N. Izeboudjen et al., 2007).

Performance Evaluation

In this section, we discuss the performance of the three implementation figures of the back propagation algorithm. The parameters to be considered are the number of configurable logic blocks (CLB), the time response (TR) and the number of million connections per second (MCPS). A comparison of these parameters is done between the Virtex-II and Virtex-4 families. Functional simulation is achieved using the ModelSim simulator (ModelSim user guide reference). The RTL synthesis is achieved using the Mentor Graphics synthesis tool (Mentor Graphics synthesis tool user guide reference) and for final implementation, the ISE Foundation place and route (8.2) tool is used (ISE Foundation user guide reference).
Figure 5. The proposed on-chip architecture: the Feed_forward, Error_calculation and Update modules, with their weight (W..), input (x..), output (O..) and delta (D..) signals and load/read/write/sel/reset controls, under the global on-chip control unit
Our first application is an ANN classifier that is used to classify coronary heart disease. The network was trained off-chip using the MATLAB 6.5 tool. After training, the dimension of the network as well as the synaptic weights were fixed. The network has a dimension of (1, 8, 1) and the synaptic weights have a data width of 24 bits. For this application we selected the XC2V1000 and XC4VLX15 devices, of Virtex-II and Virtex-4 respectively. Synthesis results show that the XC2V1000 circuit consumes 99% in terms of CLBs, with a time response TR = 44.46 ns and MCPS = 360. Concerning the XC4VLX15, it consumes 82% in terms of CLBs, with TR = 26.76 ns and MCPS = 597. Thus, the XC4VLX15 achieves better performance in terms of area (a gain of 19% of CLBs), the speed ratio is 1.6 and the MCPS ratio is 1.6.

Our second application is the classical (2, 2, 1) XOR ANN, which is used as a benchmark for non-linearly separable problems. The general on-chip learning implementation has been applied to the network. It is to be mentioned that area constraints could not be met for the first family, the XC2V1000, nor for the XC4VLX15, and we tried several families until we fixed the XC4VLX80 for Virtex-4 and the XC2V8000 for Virtex-II. Synthesis results show that the XC2V8000 circuit consumes 22% in terms of CLBs, with a time response TR = 59.5 ns and MCPS = 202. Concerning the XC4VLX80, it consumes 30% in terms of CLBs, with TR = 47.93 ns and MCPS = 250. From these results we can conclude that with the Virtex-II family we gain 8% of CLBs in terms of area; this is due to the fact that the Virtex-II integrates more multipliers than the Virtex-4, in which the MAC component is integrated into the DSP48 (the XC4VLX80 has 80 DSP MACs and the XC2V8000 has 168 block multipliers). But the Virtex-4 circuit is faster than the Virtex-II and can achieve more MCPS (a ratio of ~1.24). The on-chip implementation requires a lot of multipliers, and this is why we recommend using it if the timing constraints are not critical.

In the third application, three arbitrary networks are used to show the performance of the RTR over the global implementation. These are a (3,3,3) network,
Through this paper, we have presented a successful design approach for the FPGA implementation of ANNs. We have applied a design for reuse strategy and parametric design to achieve our goal. The proposed methodology offers high flexibility because the size of the ANN can be changed by simply copying or removing the neuron cores. In addition, the format, data widths and precision are considered as generic parameters. Thus, different applications can be targeted in a reduced design time. As for the three applications, the first conclusion is that the new Virtex-4 FPGA devices achieve faster networks compared to the Virtex-II, but regarding the area, i.e. the number of CLBs, the Virtex-II is better. Thus, in our opinion, the Virtex-II is well suited as a platform to experiment with ANN implementations. This can help to give new directions for future work.

REFERENCES

Izeboudjen, N., Farah, A., Bessalah, H., Bouridene, A. & Chikhi, N. (2007). Towards a platform for FPGA implementation of the MLP based back propagation algorithm. IWANN, LNCS, 497-505.

Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, 4-22.

Mentor Graphics synthesis tool user guide. www.mentor.com

ModelSim user guide. www.model.com

OpenCores. www.opencores.org

PlanAhead user guide. www.xilinx.com

Schumann, F., Hofmann, S., Schemmel, J. & Meier, K. (2002). Evolvable hardware. Proceedings of the NASA/DoD Conference on Evolvable Hardware, 266-273.
Xilinx application notes XAPP290 (2004). Two flows for partial reconfiguration: module based or difference based, 1-28. www.xilinx.com

KEY TERMS

ASIC: Acronym for Application Specific Integrated Circuit.

CLB: Acronym for Configurable Logic Block.

FPGA: Acronym for Field Programmable Gate Array.

High Level Synthesis: A top-down design methodology that transforms an abstract level, such as the VHDL language, into a physical implementation level.

On-Chip Training: A term that designates the implementation of the three phases of the back propagation algorithm into one or several chips.

Off-Chip Training: Training of the network is done using software tools like MATLAB, and only the feed forward (generalisation) phase is considered.

RTL: Acronym for Register Transfer Level.

Run Time Reconfiguration: A solution that permits the use of the smallest FPGA by reconfiguring it several times during the processing. Run time reconfiguration can be partial or global.

VHDL: Acronym for Very high speed integrated circuits Hardware Description Language.
Monica Mordonini
Università degli Studi di Parma, Italy
Luca Mussi
Università degli Studi di Perugia, Italy
Giovanni Adorni
Università degli Studi di Genova, Italy
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
HOPS
ous movements allow the tracking of any interesting moving object. Three-dimensional reconstruction based on stereo vision is also possible.

HOPS: HYBRID OMNIDIRECTIONAL PIN-HOLE SENSOR

This article is focused on the latest prototype of the HOPS (Hybrid Omnidirectional-Pinhole Sensor) sensor (Adorni 2001; Adorni 2002; Adorni 2003; Cagnoni 2007). HOPS is a dual camera vision system that achieves a high-resolution 360-degrees field of view as well as 3D reconstruction capabilities. The effectiveness of this hybrid sensor derives from the joint use of a traditional camera and a central catadioptric camera which both satisfy the single-viewpoint constraint. Having two different viewpoints from which the world is observed, the sensor can therefore act as a stereo pair, finding effective applications in surveillance and robot navigation.

To create a handy, versatile system that could meet the requirements of the whole vision process in a wide variety of applications, HOPS has been designed to be considered as a single integrated object: one of the most direct advantages offered by this is that, once it is assembled and calibrated, it can be placed and moved anywhere (for example in the middle of a room ceiling or on a mobile robot) without any need for further calibrations.

Figure 1 shows the latest two HOPS prototypes. In the one that has been used for the experiments reported here, the traditional camera, which in this version cannot rotate, has been placed on top and can be pointed downwards with an appropriate fixed tilt angle to obtain a high-resolution view of a restricted region close to the sensor. In the middle, one can see the catadioptric camera consisting of a traditional camera pointing upwards to a hyperbolic mirror hanging over it and held by a plexiglas cylinder. As can be observed, the mirror can be moved up and down to permit optimal positioning (Swaminathan 2001; Strelow 2001) during calibration.

Moreover, to avoid undesired light reflections on the internal surface of the plexiglas cylinder, a black needle has been placed on the mirror apex as suggested in (Ishiguro 2001). Finally, in the lower part, some circuits generate video synchronization signals and allow for external connections.

The newer version of HOPS (see Figure 1, right) overcomes some limitations of the present one. It uses two digital high-resolution Firewire cameras,

Figure 1. The two latest versions of the HOPS sensor: the one used for experiments (left) and the newest version (right), which is currently being assembled and tested.
in conjunction with mega-pixel lenses characterized by a very low TV-distortion, to achieve better image quality. Furthermore, in this new version the traditional camera is hung on a stepper motor, controlled via a USB interface, and is therefore able to rotate. This time the traditional camera has been placed below the catadioptric part: this makes it possible to have no wires within the field of view of the omnidirectional image, besides allowing, in surveillance applications, to see also the blind area of the omnidirectional view due to the reflection of the camera on the mirror.

Sensor Calibration

part of the sensor as described by (Benosman 2001). The last, but probably most important, phase of the calibration is aimed at detecting geometric relationships between the traditional image and the omnidirectional one: once again, a set of known points was used to estimate the parameters of the mapping.

Notice that the relationships that were computed between the two views are constant in time because of the sensor structure. In this way, once the calibration procedure is over, no external fixed references are needed any longer, and one can place the sensor anywhere and perform stereo vision tasks without needing any further calibration.

Figure 2. Mirror position calibration: the sensor inside the calibration box (left) and the acquired omnidirectional image (right) with the correct grid positions superimposed in white.
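The real omnidirectional-to-traditional mapping is more complex than an affine transform, but the "known points, then least-squares parameter estimation" step mentioned above can be illustrated with a minimal sketch. All coordinates below are made up for the demo (generated from an exact affine map), and the affine model itself is only a stand-in for the mapping actually calibrated:

```python
import numpy as np

# Hypothetical corresponding points: (u, v) in the omnidirectional image
# and (x, y) in the traditional one, generated from an exact affine map
# so that the fit recovers it perfectly.
omni = np.array([[10, 12], [40, 15], [22, 48], [60, 55], [35, 30]], float)
trad = np.array([[25, 30], [85, 36], [49, 102], [125, 116], [75, 66]], float)

# Fit trad ~= [u, v, 1] @ A by linear least squares (A is 3x2).
X = np.hstack([omni, np.ones((len(omni), 1))])
A, *_ = np.linalg.lstsq(X, trad, rcond=None)

def map_point(uv):
    """Map an omnidirectional-image point into the traditional image."""
    u, v = uv
    return np.array([u, v, 1.0]) @ A

rms = float(np.sqrt(((trad - X @ A) ** 2).mean()))
print("RMS residual:", rms)
```

In the actual calibration a nonlinear model involving the mirror geometry replaces the affine map, but the fitting principle, minimizing the reprojection residual over known point pairs, is the same.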
ones super-imposed over it as they should appear if the mirror had been correctly placed (see Figure 2). This is a very cheap method which, however, yields very good results.

Joint Camera Calibration

To obtain a fully effective stereo system it is essential to make a joint camera calibration to extract information about the relative positioning of the camera pair. Usually, the internal reference frame of one of the two cameras is chosen as the global reference for the pair: since two different kinds of cameras are available, the simplest choice is to set the omnidirectional camera's frame as the global reference for the whole sensor. Using once again the above-mentioned grids of points, image pairs (omnidirectional and traditional) of grids lying on different (parallel) planes with known relative positions are acquired. Once the 3D coordinates of the points' positions, referred to the sensor reference frame, have been estimated through the omnidirectional image, solving for geometric constraints between the points' projections in the traditional image permits estimating the relative position of the traditional camera.

To take the standard camera rotation into consideration, its position has to be described by a more complex transformation than a simple fixed rototranslation: the geometric and kinematic coupling between the two cameras has to be understood and modeled with more parameters. Obviously, this requires that images be taken with the traditional camera in many different positions.

After this joint camera calibration, HOPS can be used to perform metric measurements on the images, obtaining three-dimensional data referred to its own global reference frame: this means that no further calibrations are needed to take sensor displacements into account.

Perspective Reprojections & Inverse Perspective Mapping

One of the opportunities offered by a perspective image is the possibility to apply an Inverse Perspective Mapping (IPM) transformation (Little 1991) to obtain a different image in which the information content is homogeneously distributed among all pixels. Since central catadioptric cameras are characterized by a single viewpoint, the images acquired by them are perspective images suitable to be used for IPM. Choosing a virtual image plane as the new domain for the IPM, a perspective reprojection similar to traditional images can be obtained from part of those omnidirectional images.

Figure 3 shows a pair of images acquired by HOPS and a perspective reconstruction of the omnidirectional view obtained by applying an IPM to the corresponding area seen by the traditional camera. As can be noticed, the difference in resolution between the two perspective views is considerable. Choosing a horizontal plane as reference for the IPM, it is possible to obtain something very similar to an orthographic view of that area, usually referred to as bird's eye view. If the floor is used as reference to perform IPM on both images, it is possible to extract useful information about objects/obstacles surrounding the system (Bertozzi 1998).

3D Reconstruction Tests

To verify the correctness of the calibration process, an estimation of the positions of points in a three-dimensional space can be performed along with other tests. After capturing one image from each of the two views, the points in the test pattern are automatically detected, and for each one the light rays from which it was generated are computed based on the projection model obtained during calibration. Since the estimated homologous rays are usually skew lines, the shortest segment joining the two rays can be found and its middle point used as an estimate of the point's 3D position.

In Table 1, results obtained using a 4x3 point test-pattern with 60 mm between point centers are reported. Even if the origin of the sensor reference system is physically inaccessible and no high-precision instruments were available, this pattern was placed as accurately as possible 390 mm perpendicularly ahead of the sensor itself (along the y direction in the chosen reference frame) and centered along the x direction: the z coordinates of the points in the top row were measured to be equal to 55 mm. This set-up is reflected by the estimated values for the first experiment reported in Table 1. More relevantly, the mean distance between points was estimated to be 59.45 mm with a standard deviation of 1.14: those values are fully compatible with the resolution available for measuring distances on the test pattern and with the mirror resolution (also limited by image resolution).

In a second experiment, a test-pattern with six points spaced by 110 mm, located about 1 m ahead, 0.25 m
Figure 3. Omnidirectional image (above, left) and traditional image (above, right) acquired by HOPS. Below, a perspective reconstruction of part of the omnidirectional one is shown.
to the right and a bit below the sensor, has been used. In the lower part of Table 1 the estimated positions are shown: the estimated mean distance was 109.09 mm with a standard deviation of 8.89. In another test with the same pattern located 1.3 m ahead, 0.6 m to the left and 0.5 m below the sensor (see Figure 4), the estimated mean distance was about 102 mm with a standard deviation of 9.98.

It should be noticed that, at those distances, taking into account image resolution as well as the mirror profile, the sensor resolution is of the same order of magnitude as the errors obtained. Furthermore, the method used to find the centers of circles suffers from luminance and contrast variations: substituting circles with adjacent alternating black and white squares and using a corner detector capable of sub-pixel accuracy would probably yield better results.

FUTURE TRENDS

A field which nowadays draws great interest is autonomous vehicle navigation. Even if at the moment there are still many problems to be solved before seeing autonomous public vehicles, industrial applications are already possible. Since results in omnidirectional visual servoing and ego-motion estimation are also applicable to hybrid dual camera systems, and many more opportunities are offered by the presence of a second high-resolution view, the use of such devices in this field
Table 1. 3D estimation results: the tables show the estimated positions obtained. The diagrams below them show the estimated distances between points on the test-pattern. All values are in mm.
Experiment 1
Experiment 2
Figure 4. Omnidirectional image (left) and traditional image (right) acquired for a 3D stereo estimation test
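The midpoint-of-the-shortest-segment triangulation used in these 3D estimation tests can be sketched as follows. This is an illustrative reconstruction, not the authors' code, and the example rays are hypothetical (chosen to meet near the 390 mm / 55 mm set-up described above):

```python
import numpy as np

def triangulate_midpoint(p1, d1, p2, d2):
    """Estimate a 3D point from two (usually skew) rays p + t*d.

    Solves for the parameters of the closest points on each ray,
    then returns the midpoint of the shortest joining segment.
    """
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d1, d2 = np.asarray(d1, float), np.asarray(d2, float)
    r = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    e, f = d1 @ r, d2 @ r
    denom = a * c - b * b          # zero only for parallel rays
    t1 = (b * f - c * e) / denom
    t2 = (a * f - b * e) / denom
    q1 = p1 + t1 * d1              # closest point on ray 1
    q2 = p2 + t2 * d2              # closest point on ray 2
    return (q1 + q2) / 2.0

# Example: two rays from hypothetical camera centers that (nearly)
# intersect around (0, 390, 55) mm in the sensor reference frame.
point = triangulate_midpoint(np.zeros(3), np.array([0.0, 390.0, 55.0]),
                             np.array([100.0, 0.0, 0.0]),
                             np.array([-100.0, 390.0, 55.0]))
print(point)
```

For exactly intersecting rays the midpoint coincides with the intersection; for the skew rays produced by real calibration noise it is the least-worst single-point estimate, which is why the experiments above report it.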
is desirable. Even if most applications of these systems are related with surveillance, they could be applied even more directly to robot-aided human activities, since robots/vehicles involved in these situations are less critical and their controllability is easier.

CONCLUSIONS

The Hybrid Omnidirectional Pin-hole Sensor (HOPS) dual camera system has been described. Since its joint camera calibration leads to a fully calibrated hybrid stereo pair from which 3D information can be extracted, HOPS suits several kinds of applications. For example, it can be used for surveillance and robot self-localization or obstacle detection, offering the possibility to integrate stereo sensing with peripheral/foveal active vision strategies: once objects or regions of interest are localized on the wide-range sensor, the traditional camera can be used to enhance the resolution with which these areas can be analyzed.

Tracking of multiple objects/people relying on high-resolution images for recognition and access control, or estimating velocity, dimensions and trajectories, are some examples of surveillance tasks for which HOPS is suitable. Accurate obstacle detection, landmark localization, robust ego-motion estimation or three-dimensional environment reconstruction are other examples of possible applications related to (autonomous/holonomous) robot navigation in semi-structured or completely unstructured environments. Some preliminary experiments have been performed to solve both surveillance and robot navigation tasks, with encouraging results.

REFERENCES

Adorni, G., Bolognini, L., Cagnoni, S. & Mordonini, M. (2001). A non-traditional omnidirectional vision system with stereo capabilities for autonomous robots. In F. Esposito (Ed.), Lecture Notes in Computer Science, Vol. 2175. Springer-Verlag, 344-355.

Adorni, G., Cagnoni, S., Carletti, M., Mordonini, M. & Sgorbissa, A. (2002). Designing omnidirectional vision sensors. AI*IA Notizie 15(1), 27-30.

Adorni, G., Cagnoni, S., Mordonini, M. & Sgorbissa, A. (2003). Omnidirectional stereo systems for robot navigation. Proceedings of the IEEE Workshop on Omnidirectional Vision. Madison, Wisconsin, 21 June 2003. IEEE Computer Society Press, 78-89.

Benosman, R. & Kang, S. (2001). Panoramic vision: Sensors, theory and applications. Springer-Verlag New York, Inc.

Bertozzi, M., Broggi, A. & Fascioli, A. (1998). Stereo inverse perspective mapping: Theory and applications. Image and Vision Computing Journal, Elsevier, Vol. 16, 585-590.

Cagnoni, S., Mordonini, M., Mussi, L. & Adorni, G. (2007). Hybrid stereo sensor with omnidirectional vision capabilities: Overview and calibration procedures. Proceedings of the 14th International Conference on Image Analysis and Processing. Modena, 11-13 September 2007. IEEE Computer Society Press, 99-104.

Cui, Y., Samarasekera, S., Huang, Q. & Greiffenhagen, M. (1998). Indoor monitoring via the collaboration between a peripheral sensor and a foveal sensor. VS: Proceedings of the 1998 IEEE Workshop on Visual Surveillance. Bombay, 2 January 1998. IEEE Computer Society Press, Vol. 00, 2-9.

Ishiguro, H. (2001). Development of low-cost compact omnidirectional vision sensors. In R. Benosman & S. Kang (Eds.), Panoramic vision: Sensors, theory and applications. Springer-Verlag New York, Inc., 23-28.

Kraus, K. (1993). Photogrammetry: Fundamentals and standard processes (4th ed., Vol. 1). Dümmler.

Little, J., Bohrer, S., Mallot, H. & Bülthoff, H. (1991). Inverse perspective mapping simplifies optical flow computation and obstacle detection. Biological Cybernetics, Springer-Verlag, Vol. 64, 177-185.

Nayar, S. & Boult, T. (1997). Omnidirectional vision systems: 1998 PI report. Proceedings of the 1997 DARPA Image Understanding Workshop. New Orleans, 11-14 May 1997. Storming Media, 93-99.

Scotti, G., Marcenaro, L., Coelho, C., Selvaggi, F. & Regazzoni, C. (2005). Dual camera intelligent sensor for high definition 360 degrees surveillance. IEE Proceedings on Vision, Image and Signal Processing. IEE Press, Vol. 152, 250-257.

Swaminathan, R., Grossberg, M. D. & Nayar, S. K. (2001). Caustics of catadioptric cameras. Proceedings of the 8th International Conference on Computer Vision.
Vancouver, 9-12 July 2001. IEEE Computer Society Press, Vol. 2, 2-9.

Trullier, O., Wiener, S., Berthoz, A. & Meyer, J. (1997). Biologically-based artificial navigation systems: Review and prospects. Progress in Neurobiology, Elsevier, Vol. 51, 483-544.

Yao, Y., Abidi, B. & Abidi, M. (2006). Fusion of omnidirectional and PTZ cameras for accurate cooperative tracking. In AVSS: Proceedings of the IEEE International Conference on Video and Signal Based Surveillance. Sydney, 22-24 November 2006. IEEE Computer Society Press, 46-51.

Zhang, Z. (2000). A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence. IEEE Computer Society Press, Vol. 22, 1330-1334.

KEY TERMS

Camera Calibration: A procedure used to obtain geometrical information about image formation in a specific camera, essential to relate metric distances on the image to distances in the real world. Anyway, some a priori information is needed to reconstruct the third dimension from only one image.

Holonomous Robot: A robot with an unconstrained freedom of movement and no preferential direction. This means that, from a standing position, it can move as easily in any direction.

Inverse Perspective Mapping (IPM): A procedure which allows the perspective effect to be removed from an image by homogeneously redistributing the information content of the image plane into a new two-dimensional domain.

Lens Distortion: Optical errors in camera lenses, usually due to mechanical misalignment of their parts, can cause straight lines in the observed scene to appear curved in the captured image. The deviation between the theoretical image and the actual one is mostly to be attributed to lens distortion.

Pin-Hole Camera: A camera that uses a tiny hole (the pin-hole) to convey all rays from the observed scene to the image plane. The smaller the pin-hole, the sharper the picture. Pin-hole cameras achieve a potentially infinite depth of field. Because of its geometric simplicity, the pin-hole model is used to describe most traditional cameras.

Single Viewpoint Constraint: When all incoming principal light rays of a lens intersect at a single point, an image with a non-distorted metric content is obtained. In this case all the information contained in the image is seen from this single viewpoint.

Visual Servoing: An approach to robot control based on visual perception: a vision system extracts information from the surrounding environment to localize the robot and consequently servo its position.
Monica Mordonini
Università degli Studi di Parma, Italy
Luca Mussi
Università degli Studi di Perugia, Italy
Giovanni Adorni
Università degli Studi di Genova, Italy
Hybrid Dual Camera Vision Systems
only few recent, in (Kobilarov 2006; Mei 2006). More surveillance application-oriented work has involved multi-camera systems, joining omnidirectional and traditional cameras, while other work dealt with geometric aspects of hybrid stereo/multi-view relations, as in (Sturm 2002; Chen 2003).

The natural choice to develop a cheap vision system with both omni-sight and high-detail resolution is to couple an omnidirectional camera with a moving traditional camera. In the sequel, we will focus on this kind of systems, which are usually called hybrid dual camera systems.

Omnidirectional Vision

There are two ways to obtain omnidirectional images. With a special kind of lens mounted on a standard camera, called a fisheye lens, it is possible to obtain a field of view up to about 180 degrees in both directions. The widest fisheye lens ever produced featured a 220-degrees field of view. Unfortunately, it is very difficult to design a fisheye lens that satisfies the single viewpoint constraint. Although images acquired by fisheye lenses may prove to be good enough for some visualization applications, the distortion compensation issue has not been solved yet, and the high unit-cost is a major drawback for their wide-spread application.

Combining a rectilinear lens with a mirror is the other way to obtain omnidirectional views. In the so-called catadioptric lenses, a convex mirror is placed in front of a rectilinear lens, achieving a field of view possibly even larger than with a fisheye lens. Using particularly shaped mirrors precisely placed with respect to the camera, it is also possible to satisfy the single viewpoint constraint and thus to obtain an image which is perspectively correct. Moreover, catadioptric lenses are usually cheaper than fisheye ones. In Figure 1 a comparison between these two kinds of lenses can be seen.

OVERVIEW OF HYBRID DUAL CAMERA SYSTEMS

The first work concerning hybrid vision sensors is probably the one mentioned in (Nayar 1997), referred to as Omnidirectional Pan/Tilt/Zoom System, where the PTZ unit was guided by inputs obtained from the omnidirectional view. The next year, (Cui 1998) presented a distributed system for indoor monitoring: a peripheral camera was calibrated to estimate the distance between a target and the projection of the camera on the floor. In this way, they were able to precisely direct the foveal sensor, of known position, to the target and track it. A hybrid system for obstacle detection in robot navigation was described in (Adorni 2001) a few years later. In this work, a catadioptric camera was calibrated along with a perspective one as a single sensor: its calibration procedure permitted computing an Inverse Perspective Mapping (IPM) (Little 1991) based on a reference plane, the floor, for both images and hence, thanks to the cameras' disparity, to detect obstacles by computing the difference between the two images. While this was possible only within the common field of view of the two cameras, awareness or even tasks such as ego-motion estimation were potentially pos-

Figure 1. Comparison between image formation in fisheye lenses (left) and catadioptric lenses (right)
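The difference-based obstacle detection of (Adorni 2001) described above, where both views are remapped onto the floor plane and then subtracted, can be sketched under strong simplifications: the tiny synthetic arrays below stand in for already IPM-remapped views, and the fixed threshold is hypothetical.

```python
import numpy as np

def obstacle_mask(ipm_a, ipm_b, threshold=30):
    """Flag obstacle candidates from two IPM-remapped views.

    Pixels on the ground plane look the same from both viewpoints
    after IPM, so large differences indicate elevated objects.
    """
    diff = np.abs(ipm_a.astype(int) - ipm_b.astype(int))
    return diff > threshold

# Toy example: identical ground everywhere except a small patch that
# only appears (displaced) in the second remapped view.
view_a = np.full((8, 8), 100, dtype=np.uint8)
view_b = view_a.copy()
view_b[2:4, 3:5] = 200   # "obstacle" pixels disagree between the views

mask = obstacle_mask(view_a, view_b)
print(mask.sum(), "candidate obstacle pixels")
```

In a real system the two remappings, photometric normalization and blob filtering do most of the work; the sketch only shows why ground-plane disparity makes elevated objects pop out of the difference image.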
Figure 2. A pair of images acquired with the hybrid system described in (Cagnoni 2007). The omnidirectional
image (left) and the perspective image (right). The different resolution of the two images is clearly visible.
sible thanks to the omni-view. This system was further improved and mainly tested in RoboCup1 applications (Adorni 2002; Adorni 2003; Cagnoni 2007). In Figure 2 it is possible to see a pair of images acquired with such a system.

Some recent work has concentrated on using dual camera systems for surveillance applications. In (Scotti 2005), when some alarm is detected on the omnidirectional sensor, the PTZ camera is triggered and the two views start to track the target autonomously. Acquired video sequences and other metadata, like object classification information, are then used to update a distributed database to be queried later by users. Similarly, in (Yao 2006), after the PTZ camera is triggered by the omnidirectional one, the target is tracked independently on the two views, but then a modified Kalman filter is used to perform data fusion: this approach achieves an improved tracking accuracy and permits resolving occasional occlusions, leading to a robust surveillance system.

FUTURE TRENDS

important role is already played by hybrid dual camera systems. The monitoring system installed between Eagle Pass, Texas, and Piedras Negras, Mexico, by engineers of the Computer Vision and Robotics Laboratory at the University of California, San Diego, affiliated with the California Institute for Telecommunications and Information Technology, is an example of a very complex surveillance system in which hybrid dual camera systems are involved (Hagen 2006). Because of the competitive cost, the compactness and the opportunities offered by these systems, they are likely to be used more and more in the future in intelligent surveillance systems.

Another field subject to great interest is autonomous vehicle navigation. Even if at the moment there are still many problems to be solved before seeing autonomous public vehicles, industrial applications are already possible. Since omnidirectional visual servoing and ego-motion estimation can actually be implemented also using hybrid dual camera systems, and many more opportunities are offered by the presence of a second high-resolution view, their future involvement in this field is desirable.
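The data-fusion idea behind the tracker of (Yao 2006), combining the two cameras' independent estimates of the same target, can be illustrated by its simplest special case: a single variance-weighted (Kalman-style) correction. The numbers below are hypothetical, and the paper's modified filter is considerably richer than this one-step sketch:

```python
def fuse(est_a, var_a, est_b, var_b):
    """Fuse two independent estimates by inverse-variance weighting.

    This is the static special case of a Kalman update: the more
    certain measurement dominates the fused estimate, and the fused
    variance is smaller than either input's.
    """
    k = var_a / (var_a + var_b)        # gain toward measurement b
    fused = est_a + k * (est_b - est_a)
    fused_var = var_a * var_b / (var_a + var_b)
    return fused, fused_var

# Hypothetical x-position of a target: the omnidirectional view is
# coarse, the PTZ view sharper, both observing the same target.
omni = (4.2, 2.0)   # (estimate, variance)
ptz = (3.8, 0.5)
pos, var = fuse(omni[0], omni[1], ptz[0], ptz[1])
print(pos, var)
```

The fused variance (0.4 here) is below that of either camera alone, which is what makes the two-view tracker more accurate than either view by itself.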
this kind of sensors with their different and complementary features: while the traditional camera can be used to acquire detailed information about a limited region of interest (foveal vision), the omnidirectional sensor provides wide-range, but less detailed, information about the surroundings (peripheral vision).

Tracking of multiple objects/people relying on high-resolution images for recognition and access control, or estimating object/people velocity, dimensions and trajectory, are some examples of possible automatic surveillance tasks for which hybrid dual camera systems are suitable. Furthermore, their use in (autonomous) robot navigation allows for accurate obstacle detection, ego-motion estimation and three-dimensional environment reconstruction. With one of these sensors on board, a mobile robot can be provided with all the necessary information needed to navigate safely in a dynamic environment.

REFERENCES

Adorni, G., Bolognini, L., Cagnoni, S. & Mordonini, M. (2001). A non-traditional omnidirectional vision system with stereo capabilities for autonomous robots. In F. Esposito (Ed.), Lecture Notes in Computer Science, Vol. 2175. Springer-Verlag, 344-355.

Adorni, G., Cagnoni, S., Carletti, M., Mordonini, M. & Sgorbissa, A. (2002). Designing omnidirectional vision sensors. AI*IA Notizie XV(1), 27-30.

Adorni, G., Cagnoni, S., Mordonini, M. & Sgorbissa, A. (2003). Omnidirectional stereo systems for robot navigation. In Proceedings of the IEEE Workshop on Omnidirectional Vision. Madison, Wisconsin, 21 June 2003. IEEE Computer Society Press, 79-89.

Benosman, R. & Kang, S. (2001). Panoramic vision: Sensors, theory and applications. Springer-Verlag.

Cagnoni, S., Mordonini, M., Mussi, L. & Adorni, G. (2007). Hybrid stereo sensor with omnidirectional vision capabilities: Overview and calibration procedures. In Proceedings of the 14th International Conference on Image Analysis and Processing. Modena, 11-13 September 2007. IEEE Computer Society Press, 99-104.

Chen, X., Yang, J. & Waibel, A. (2003). Calibration of a hybrid camera network. In Proceedings of the 9th IEEE International Conference on Computer Vision. Nice, 13-16 October 2003. IEEE Computer Society Press, 150-155.

Cui, Y., Samarasekera, S., Huang, Q. & Greiffenhagen, M. (1998). Indoor monitoring via the collaboration between a peripheral sensor and a foveal sensor. In VS: Proceedings of the 1998 IEEE Workshop on Visual Surveillance. Bombay, 2 January 1998. IEEE Computer Society Press, Vol. 00, 2-9.

Hagen, D. & Ramsey, D. (2006). UCSD engineers deploy novel video surveillance system on Texas bridge over Rio Grande. Retrieved June 6, 2007, from the California Institute for Telecommunications and Information Technology Web site: http://www.calit2.net/newsroom/release.php?id=873

Kobilarov, M., Hyams, J., Batavia, P. & Sukhatme, G. S. (2006). People tracking and following with mobile robot using an omnidirectional camera and a laser. In Proceedings of the IEEE International Conference on Robotics and Automation. Orlando, 15-19 May 2006. IEEE Computer Society Press, 557-562.

Little, J., Bohrer, S., Mallot, H. & Bülthoff, H. (1991). Inverse perspective mapping simplifies optical flow computation and obstacle detection. Biological Cybernetics, Springer-Verlag, Vol. 64, 177-185.

Mei, C. & Rives, P. (2006). Calibration between a central catadioptric camera and a laser range finder for robotic applications. In Proceedings of the IEEE International Conference on Robotics and Automation. Orlando, 15-19 May 2006. IEEE Computer Society Press, 532-537.

Nayar, S. & Boult, T. (1997). Omnidirectional vision systems: 1998 PI report. In Proceedings of the 1997 DARPA Image Understanding Workshop. New Orleans, 11-14 May 1997. Storming Media, 93-99.

Scotti, G., Marcenaro, L., Coelho, C., Selvaggi, F. & Regazzoni, C. (2005). Dual camera intelligent sensor for high definition 360 degrees surveillance. In IEE Proceedings on Vision, Image and Signal Processing, IEE Press, Vol. 152, 250-257.

Sturm, P. (2002). Mixing catadioptric and perspective cameras. In Proceedings of the Workshop on Omnidirectional Vision. Copenhagen, 12-14 June 2002. IEEE Computer Society Press, 37-44.
Hybrid Dual Camera Vision Systems
Yao, Y., Abidi, B. & Abidi, M. (2006). Fusion of omnidirectional and PTZ cameras for accurate cooperative tracking. In AVSS: Proceedings of the IEEE International Conference on Video and Signal Based Surveillance. Sydney, 22-24 November 2006. IEEE Computer Society Press, 46-51.

KEY TERMS

Camera Calibration: A procedure used to obtain geometrical information about image formation in a specific camera. After calibration, it is possible to relate metric distances on the image to distances in the real world. In any case, a single image is not enough to reconstruct the third dimension, and some a priori information is needed to accomplish this.

Catadioptric Camera: A camera that uses in conjunction catoptric, reflective, lenses (mirrors) and dioptric, refractive, lenses. Usually the purpose of these cameras is to achieve a wider field of view than the one obtained by classical lenses. Even if the field of view of a lens could be improved with any convex surface mirror, those of greater interest are conic, spherical, parabolic and hyperbolic-shaped ones.

Central Catadioptric Camera: A camera that combines lenses and mirrors to capture a wide field of view through a central projection (i.e. a single viewpoint). Most common examples use paraboloidal or hyperboloidal mirrors. In the former case a telecentric lens is needed to focalize parallel rays reflected by the mirror and there are no constraints on mirror-to-camera relative positioning: the internal focus of the parabola acts as the unique viewpoint. In the latter case it is possible to use a normal lens, but mirror-to-camera positioning is critical for achieving a single viewpoint: it is essential that the principal point of the lens coincides with the external focus of the hyperboloid so that the internal one is the unique viewpoint for the observed scene.

Omnidirectional Camera: A camera able to see in all directions. There are essentially two different methods to obtain a very wide field of view: the older one involves the use of a special type of lens, usually referred to as a fisheye lens, while the other uses rectilinear lenses and mirrors in conjunction. Lenses obtained in the latter case are usually called catadioptric lenses and the camera-lens ensemble is referred to as a catadioptric camera.

PTZ Camera: A camera able to pan left and right, tilt up and down, and zoom. It is usually possible to freely control its orientation and zooming status at a distance through a computer or a dedicated control system.

Stereo Vision: A visual perception process that exploits two different views to achieve depth perception. The difference between the two images, usually referred to as binocular disparity, is interpreted by the brain (or by an artificial intelligent system) as depth.

Single Viewpoint Constraint: To obtain an image with a non-distorted metric content, it is essential that all incoming principal light rays of a lens intersect at a single point. In this case a fixed viewpoint is obtained and all the information contained in an image is seen from this point.

ENDNOTE

1. Visit http://www.robocup.org for more information.
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Hybrid Meta-Heuristics Based System for Dynamic Scheduling
Secondly, most of the approximation methods proposed for the Job-Shop Scheduling Problem (JSSP) are oriented methods, i.e. developed specifically for the problem in consideration. Some examples of this class of methods are the priority rules and the Shifting Bottleneck (Pinedo, 2005).

Finally, traditional scheduling methods are essentially centralized in the sense that all the computations are carried out in a central computing and logic unit. All the information concerning every job and every resource has to go through this unit. This centralized approach is especially susceptible to problems of tractability, because the number of interacting entities that must be managed together is large and leads to a combinatorial explosion, particularly since a detailed schedule is generated over a long time horizon, and planning and execution are carried out in discrete buckets of time. Centralized scheduling is therefore large, complex, and difficult to maintain and reconfigure. On the other hand, many industrial and service processes are inherently distributed. Consequently, traditional methods are often too inflexible, costly, and slow to satisfy the needs of real-world scheduling systems.

By exploiting problem-specific characteristics, classical optimisation methods are not sufficient for the efficient resolution of those problems, or are developed for specific situations (Leung, 2004) (Brucker, 2004) (Logie, Sabaz & Gruver, 2004) (Blazewicz, Ecker & Trystrams, 2005) (Pinedo, 2005).

Meta-Heuristics

As a major departure from classical techniques, a Meta-Heuristic (MH) method implies a higher-level strategy controlling lower-level heuristic methods. Meta-heuristics exploit not only the problem characteristics but also ideas based on artificial intelligence rationale, such as different types of memory structures and learning mechanisms, as well as the analogy with other optimization methods found in nature.

The interest of the Meta-Heuristic approaches is that they converge, in general, to satisfactory solutions in an effective and efficient way (computing time and implementation effort). The family of MH includes, but is not limited to, Tabu Search, Simulated Annealing, Soft Computing, Evolutionary Algorithms, Adaptive Memory procedures, Scatter Search, Ant Colony Optimization, Swarm Intelligence, and their hybrids. For literature on this subject, see for example (Glover & Gary, 2003) and (Gonzalez, 2007).

In recent decades, there has been a significant level of research interest in Meta-Heuristic approaches for solving large real-world scheduling problems, which are often complex, constrained and dynamic. Scheduling algorithms that achieve good or near-optimal solutions and can efficiently adapt them to perturbations are, in most cases, preferable to those that achieve optimal ones but cannot implement such an adaptation. This is the case with most algorithms for solving the so-called static scheduling problem for different settings of both single- and multi-machine systems. This reality motivated us to concentrate on tools which could deal with such dynamic, disturbed scheduling problems, even though, due to the complexity of these problems, optimal solutions may not be possible to find.

Several attempts have been made to modify algorithms, to tune them for optimization in a changing environment. It was observed in all these manufacturing studies that the dynamic environment requires an algorithm to maintain sufficient diversity for continuous adaptation to the changes of the landscape. Although the interest in optimization algorithms for dynamic optimization problems is growing and a number of authors have proposed an even greater number of new approaches, the field lacks a general understanding as to suitable benchmark problems, fair comparisons and measurement of algorithm quality (Branke, 1999) (Cowling & Johanson, 2002) (Madureira, 2003) (Madureira, Ramos & Silva, 2004) (Aytug, Lawley, McKay, Mohan & Uzsoy, 2005).

In spite of all the previous trials, the scheduling problem is still known to be NP-complete. This fact incites researchers to explore new directions.

Hybrid Intelligent Systems

Hybridization of intelligent systems is a promising research field of computational intelligence focusing on combinations of multiple approaches to develop the next generation of intelligent systems. An important stimulus to the investigations in the Hybrid Intelligent Systems area is the awareness that combined approaches will be necessary if the remaining tough problems in artificial intelligence are to be solved. Meta-Heuristics, Bio-Inspired Techniques, Neural Computing, Machine Learning, Fuzzy Logic Systems, Evolutionary Algorithms, Agent-based Methods, among others, have been established and have shown their strengths and drawbacks. Recently, hybrid intelligent systems have become popular due to their capabilities in handling several real-world complexities involving imprecision, uncertainty and vagueness (Boeres, Lima, Vinod & Rebello, 2003) (Madureira, Ramos & Silva, 2004) (Bartz-Beielstein, Blesa, Blum, Naujoks, Roli, Rudolph & Sampels, 2007).

HYBRID META-HEURISTICS BASED SCHEDULING SYSTEM

The purpose of this article is to describe a framework based on a combination of Meta-Heuristics, Tabu Search (TS) and Genetic Algorithms (GA), and constructive optimization methods for solving a class of real-world scheduling problems, where the products (jobs) to be processed have due dates, release times and different assembly levels. This means that parts to be assembled may be manufactured in parallel, i.e. simultaneously.

The problem focused on in this work, which we call the Extended Job-Shop Scheduling Problem (EJSSP), has major extensions and differences in relation to the classic Job-Shop Scheduling Problem. In this work, we define a job as a manufacturing order for a final item, which could be Simple or Complex. It may be Simple, like a part, requiring a set of operations to be processed. Complex Final Items, requiring processing of several operations on a number of parts followed by assembly operations at several stages, are also dealt with. Moreover, in practice, the scheduling environment tends to be dynamic, i.e. new jobs arrive at unpredictable intervals, machines break down, jobs can be cancelled, and due dates and processing times can change frequently (Madureira, 2003) (Madureira, Ramos & Silva, 2004).

This work starts by focusing on the solution of dynamic deterministic EJSSP problems. For solving these we developed a framework leading to a dynamic scheduling system having, as a fundamental scheduling tool, a hybrid scheduling system with two main pieces of intelligence (Figure 1).

One such piece is a combined TS- and GA-based method and a mechanism for inter-machine activity coordination. The objective of this mechanism is to coordinate the operation of machines, taking into account the technological constraints of jobs, i.e. job operations precedence relationships, towards obtaining good schedules. The other piece is a dynamic adaptation module that includes mechanisms for neighbourhood/population regeneration under dynamic environments, increasing or decreasing it according to new job arrivals or cancellations.
Figure 1. The scheduling module: jobs and random events feed pre-processing and MH parameterization, the scheduling method, the dynamic adaptation module, and the coordination mechanism.
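The relationship described in the Meta-Heuristics section, a higher-level strategy guiding a lower-level local search, can be illustrated with a minimal tabu search skeleton. The toy single-machine objective (total completion time), the adjacent-swap neighbourhood and the tabu list length are illustrative assumptions, not the implementation used in the system described here:

```python
def tabu_search(initial, neighbours, cost, tabu_size=7, iterations=100):
    """Minimal tabu search: a short-term memory (tabu list) steers a
    lower-level local search away from recently visited solutions."""
    current = best = initial
    tabu = []  # recently visited solutions
    for _ in range(iterations):
        candidates = [n for n in neighbours(current) if n not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)  # best admissible neighbour
        tabu.append(current)
        if len(tabu) > tabu_size:
            tabu.pop(0)  # forget the oldest entry
        if cost(current) < cost(best):
            best = current
    return best

# Toy instance: sequence four jobs on one machine to minimise total
# completion time (a stand-in for the much richer EJSSP objective).
times = {'a': 4, 'b': 2, 'c': 5, 'd': 1}

def cost(seq):
    t = total = 0
    for job in seq:
        t += times[job]
        total += t
    return total

def neighbours(seq):
    """All sequences obtained by swapping two adjacent jobs."""
    out = []
    for i in range(len(seq) - 1):
        s = list(seq)
        s[i], s[i + 1] = s[i + 1], s[i]
        out.append(tuple(s))
    return out

best = tabu_search(tuple('abcd'), neighbours, cost)
```

On this instance the search reaches the shortest-processing-time order ('d', 'b', 'a', 'c'), with total completion time 23; the tabu list is the memory structure that keeps the local search from cycling back to sequences it has just left.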
work has not yet started and time is available, then an obvious and simple approach to rescheduling would be to restart the scheduling from scratch with a new modified solution which takes the perturbation into account, for example a new job arrival. When there is not enough time to reschedule from scratch, or job processing has already started, a strategy must be used which adapts the current schedule taking into consideration the kind of perturbation that occurred.

The occurrence of a partial event requires redefinition of job attributes and a re-evaluation of the schedule objective function. A change in a job due date requires the re-calculation of the starting and completion due times of all respective operations. However, changes in the operation processing times only require re-calculation of the starting and completion due times of the succeeding operations. A new job arrival requires definition of the corresponding operation starting and completion times and a regenerating mechanism to integrate all operations in the respective single machine problems. In the presence of a job cancellation, the application of a regenerating mechanism eliminates the job operations from the SMSP where they appear. After the insertion or deletion of positions, neighbourhood regeneration is done by updating the size of the neighbourhood and ensuring a structure identical to the existing one. Then the scheduling module can apply the search process for better solutions with the new modified solution.

Job Arrival Integration Mechanism

When a new job arrives to be processed, an integration mechanism is needed. This analyses the job precedence graph, which represents the ordered allocation of machines to each job operation, and integrates each operation into the respective single machine problem. Two alternative procedures could be used for each operation: either randomly select one position to insert the new operation into the current solution/chromosome, or use some intelligent mechanism to insert this operation in the schedules, based on job priority, for example.

Job Elimination Mechanism

Regeneration Mechanisms

After integration/elimination of operations is carried out, by inserting/deleting positions/genes in the current solution/chromosome, population regeneration is done by updating its size. The population size for the SMSP is proportional to the number of operations.

After the dynamic adaptation process, the scheduling method can be applied to search for better solutions starting from the modified solution.

In this way we proposed a hybrid system in which some self-organization aspects could be considered in accordance with the problem being solved: the method and/or parameters can change at run-time, the MH used can change according to problem characteristics, etc.

FUTURE TRENDS

Considering the complexity inherent to manufacturing systems, dynamic scheduling is considered an excellent candidate for the application of agent-based technology. A natural evolution of the approach proposed above is a Multi-agent Scheduling System that assumes the existence of several Machine Agents (which are decision-making entities) distributed inside the Manufacturing System that interact and cooperate with other agents in order to obtain optimal or near-optimal global performance.

The main idea is that, from the local, autonomous and often conflicting behaviours of the agents, a global solution emerges from a community of machine agents solving their schedules locally and cooperating with other machine agents (Madureira, Gomes & Santos, 2006). Agents must be able to learn and manage their internal behaviours and their relationships with other agents, by cooperative negotiation in accordance with business policies defined by the user manager. Some self-organization aspects could be considered in accordance with the problem being solved: the method and/or parameters can change at run-time, the agents can use different MHs according to problem characteristics, etc.
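The adaptation steps described above, inserting an arriving operation into the current solution/chromosome (randomly or by an intelligent, priority-based rule) and deleting the genes of a cancelled job, can be sketched as follows. The encoding of operations as (job, operation) pairs and the due-date priority rule are illustrative assumptions, not the authors' code:

```python
import random

def integrate_random(chromosome, operation, seed=0):
    """Insert the arriving operation at a random position (seeded here
    so the example is reproducible)."""
    pos = random.Random(seed).randrange(len(chromosome) + 1)
    return chromosome[:pos] + [operation] + chromosome[pos:]

def integrate_by_priority(chromosome, operation, priority):
    """Insert the arriving operation so the sequence stays ordered by a
    job-priority value (higher priority scheduled earlier)."""
    pos = 0
    while pos < len(chromosome) and priority(chromosome[pos]) >= priority(operation):
        pos += 1
    return chromosome[:pos] + [operation] + chromosome[pos:]

def eliminate_job(chromosome, job):
    """Job cancellation: delete every gene (operation) of the job,
    shrinking the solution accordingly."""
    return [op for op in chromosome if op[0] != job]

# Operations encoded as (job, operation_index); priority here is
# due-date urgency: the earlier the due date, the higher the priority.
due = {'J1': 10, 'J2': 4, 'J3': 7}
prio = lambda op: -due[op[0]]

sol = [('J2', 1), ('J3', 1), ('J1', 1)]
sol = integrate_by_priority(sol, ('J3', 2), prio)  # new operation arrives
sol = eliminate_job(sol, 'J2')                     # job J2 is cancelled
```

After these two perturbations the solution holds only the operations of J3 and J1, ordered by urgency; a neighbourhood or population regeneration step would then resize the search structures before the meta-heuristic resumes.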
Dynamic Scheduling Systems: Are frequently subject to several kinds of random occurrences and perturbations, such as new job arrivals, machine breakdowns, employee sickness, job cancellations, and due date and processing time changes, causing prepared schedules to become easily outdated and unsuitable.

Evolutionary Computation: A subfield of artificial intelligence that involves techniques implementing mechanisms inspired by biological evolution such as reproduction, mutation, recombination, natural selection and survival of the fittest.

Genetic Algorithms: A particular class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover.

Hybrid Intelligent Systems: Denotes a software system which employs a combination of Artificial Intelligence models, methods and techniques, such as Evolutionary Computation, Meta-Heuristics, Multi-Agent Systems, Expert Systems and others.

Meta-Heuristics: Form a class of powerful and practical solution techniques for tackling complex, large-scale combinatorial problems, efficiently producing high-quality solutions.

Multi-Agent Systems: A system composed of several agents, collectively capable of solving complex problems in a distributed fashion without the need for each agent to know about the whole problem being solved.

Scheduling: Can be seen as a decision-making process for operation start times and the resources to be used. A variety of characteristics and constraints related to jobs and machine environments (Single Machine, Parallel Machines, Flow-Shop and Job-Shop) can affect scheduling decisions.

Tabu Search: An approximation method, belonging to the class of local search techniques, that enhances the performance of a local search method by using memory structures (Tabu List).
Ramon Zatarain
Instituto Tecnológico de Culiacán, Mexico

Lucia Barron
Instituto Tecnológico de Culiacán, Mexico
A Hybrid System for Automatic Infant Cry Recognition I
speech recognition algorithms (Lippmann, 1990). In the connectionist approach, pattern classification is done with a multi-layer neural network. A weight is assigned to every link between neurons in contiguous layers. In the input layer each neuron receives one of the features present in the input pattern vectors. Each neuron in the output layer corresponds to one speech unit class (word or sub-word). The neural network associates input patterns to output classes by modeling the relationship between the two pattern sets. The pattern is estimated or learned by the network with a representative sample of input and output patterns (Morgan, and Scofield, 1991) (Pedrycz, 1991). In order to stabilize the perceptron's behavior, many researchers have been trying to incorporate fuzzy set theory into neural networks. The theory of fuzzy sets, developed by Zadeh in 1965 (Zadeh, 1965), has since been used to generalize existing techniques and to develop new algorithms in pattern recognition. Pal (Pal, 1992a) suggested that, to enable systems to handle real-life situations, fuzzy sets should be incorporated into neural networks, and that the increase in the amount of computation required by their incorporation is offset by the potential for parallel computation with high flexibility that fuzzy neural networks have. Pal proposes how to do data fuzzification, the general system architecture of a fuzzy neural network, and the use of 3n-dimensional vectors to represent the fuzzy membership values of the input features to the primary linguistic properties low, medium, and high (Pal, 1992a) and (Pal, and Mandal, 1992b). On the other hand, the idea of using a relational neural network as a pattern classifier was developed by Pedrycz and presented in (Pedrycz, 1991). As a result of the combination of Pal's and Pedrycz's proposed methodologies, in 1994 C. A. Reyes (1994) developed the hybrid model known as the fuzzy relational neural network (FRNN).

THE AUTOMATIC INFANT CRY RECOGNITION PROCESS

The automatic infant cry classification process is, in general, a pattern recognition problem, similar to Automatic Speech Recognition (ASR) (Huang, Acero, Hon, 2001). The goal is to take the wave from the infant's cry as the input pattern, and at the end obtain the kind of cry or pathology detected in the baby (Cano, Escobedo and Coello, 1999) (Ekkel, 2002). Generally, the process of Automatic Infant Cry Recognition is done in two steps. The first step is known as signal processing, or feature extraction, whereas the second is known as
pattern classification. In the acoustical analysis phase, the cry signal is first normalized and cleaned, and then it is analyzed to extract its most important features as a function of time. The set of obtained features is represented by a vector, which represents a pattern. The set of all vectors is then used to train the classifier. Later on, a set of unknown feature vectors is compared with the acquired knowledge to measure the classification output efficiency. Figure 1 shows the different stages of the described recognition process.

Cry Pattern Classification

The vectors representing patterns, obtained in the extraction stage, are later used in the classification process. There are four basic schools for the solution of the pattern classification problem: a) pattern comparison (dynamic programming), b) statistical models (Hidden Markov Models, HMM), c) knowledge-based systems (expert systems), and d) connectionist models (neural networks). In recent years, a strong new trend of more robust hybrid classifiers has been emerging. Some of the better known hybrid models result from the combination of neural and fuzzy approaches (Jang, 1993) (Lin Chin-Teng, and George Lee, 1996). For the work shown here, we have implemented a hybrid model of this type, called the Fuzzy Relational Neural Network, whose parameters are found through the application of genetic algorithms. We selected this kind of model because of its adaptation, learning and knowledge representation capabilities. Besides, one of its main functions is to perform pattern recognition.

In an Automatic Infant Cry Classification System, the goal is to identify the model of an unknown pattern obtained after the original sound wave is acoustically analyzed and its dimensionality reduced. So, in this phase we determine the class or category to which each cry pattern belongs. The collection of samples, each of which is represented by a vector of n features, is divided in two subsets: the training set and the test set. First, the training set is used to teach the classifier to distinguish between the different crying types. Then the test set is used to determine how well the classifier assigns the corresponding class to a pattern by means of the classification scheme generated during training.

THE FUZZY NEURAL NETWORK MODEL

The system proposed in this work is based upon fuzzy set operations in both the neural network's structure and the learning process. Following Pal's idea of a general recognizer (Pal, S.K., 1992a), the model is divided in two main parts, one for learning and another for processing, as shown in Figure 2.

Figure 2. The FRNN model: a learning phase (Linguistic Feature Extractor, Desired Output Estimator and Neural Network Trainer, producing the LPV, the DV and the relational matrix R) and a processing phase (Fuzzy Classifier and Decision Making, producing the output).

Fuzzy Learning

The fuzzy learning section is composed of three modules, namely the Linguistic Feature Extractor (LFE), the Desired Output Estimator (DOE), and the Neural Network Trainer (NNT). The Linguistic Feature Extractor takes training samples in the form of n-dimensional vectors containing n features, and converts them to Nn-dimensional vectors, where N is the number of linguistic properties. In this case the linguistic properties are low, medium, and high. The resulting 3n-dimensional vector is called the Linguistic Properties Vector (LPV). In this way an input pattern Fi = [Fi1, Fi2, ..., Fin] containing n features may be represented as (Pal, and Mandal, 1992b)

Fi = [μlow(Fi1)(Fi), μmed(Fi1)(Fi), μhigh(Fi1)(Fi), ..., μhigh(Fin)(Fi)]

The DOE takes each vector from the training samples and calculates its membership to class k, in an l-class problem domain. The vector containing the class membership values is called the Desired Vector (DV). Both LPV and DV vectors are used by the Neural Network Trainer (NNT), which takes them for training the network.

The neural network has only one input and one output layer. The input layer is formed by a set of Nn neurons, with each of them corresponding to one of the linguistic properties assigned to the n input features. In the output layer there are l neurons, with each node corresponding to one of the l classes; in this implementation, each class represents one type of crying. There is a link from every node in the input layer to every node in the output layer. All the connections are described by means of fuzzy relations R: X × Y → [0, 1] between the input and output nodes. The error is represented by the distance between the actual output and the target or desired output. During each learning step, once the error has been computed, the trainer adjusts the relationship values, or weights, of the corresponding connections, either until a minimum error is obtained or until a given number of iterations is completed. The output of the NNT, after the learning process, is a fuzzy relational matrix (R in Figure 1) containing the knowledge needed to further map unknown input vectors to their corresponding class during the classification process.

the learning phase, described in the previous section. The output of this module is an LPV vector, which, along with the fuzzy relational matrix R, is used by the Fuzzy Classifier, which obtains the actual outputs from the neural network. The classifier applies the max-min composition to calculate the output. The output of this module is a vector containing the membership values of the input vector for each of the classes. Finally, the Decision Making module selects the highest value from the classifier and assigns the corresponding class to the testing vector.

Membership Functions
specifying the level of modification of the learning parameters with regard to their values in the previous learning step k. A way of determining the increments Δa_m is with regard to the m-th coordinates of a, m = 1, 2, ..., L. The computation of the overall performance index, and the derivatives to calculate the increments for each coordinate of a, are explained in detail in (Reyes, C. A., 1994). Once the training has been terminated, the output of the trainer is the updated relational matrix, which will contain the knowledge needed to map unknown patterns to their corresponding classes.

FUTURE TRENDS

One unexplored possibility for improving the FRNN performance is the use of other fuzzy relational products instead of the max-min composition. Moreover, membership functions have parameters which can be optimized by genetic algorithms or any other optimizing technique. Adequate parameters may improve the learning and recognition efficiency of the FRNN.

CONCLUSIONS

We have presented the development and implementation of an AICR system as well as a powerful hybrid classifier, the FRNN, which is a model formed by the combination of fuzzy relations and artificial neural networks. The synergistic symbiosis obtained through the fusion of both methodologies will be demonstrated. In the related paper on applications of this model, we will show some practical results, as well as an improved model by means of genetic algorithms.

ACKNOWLEDGMENTS

This work is part of a project that is being financed by CONACYT-Mexico (46753).

REFERENCES

Bosma, J. F., Truby, H. M., and Antolop, W. (1965), Cry Motions of the Newborn Infant. Acta Paediatrica Scandinavica (Suppl.), 163, 61-92.

Cano, Sergio D., Escobedo, Daniel I., and Coello, Eddy (1999), El Uso de los Mapas Auto-Organizados de Kohonen en la Clasificación de Unidades de Llanto Infantil, Grupo de Procesamiento de Voz, 1er Taller AIRENE, Universidad Católica del Norte, Chile, pp 24-29.

Cano, Sergio D., Escobedo, Daniel I., Ekkel, Taco (2004), A Radial Basis Function Network Oriented for Infant Cry Classification, Proc. of 9th Iberoamerican Congress on Pattern Recognition, Puebla, Mexico.

Cano-Ortiz, S.D., Escobedo-Becerro, D.I. (1999), Clasificación de Unidades de Llanto Infantil Mediante el Mapa Auto-Organizado de Kohonen, I Taller AIRENE sobre Reconocimiento de Patrones con Redes Neuronales, Universidad Católica del Norte, Chile, pp. 24-29.

Ekkel, T. (2002), Neural Network-Based Classification of Cries from Infants Suffering from Hypoxia-Related CNS Damage, Master's Thesis, University of Twente, The Netherlands.

Huang, X., Acero, A., Hon, H. (2001), Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice-Hall, Inc., USA.

Jang, J.-S. R. (1993), ANFIS: Adaptive Network-based Fuzzy Inference System, IEEE Transactions on Systems, Man, and Cybernetics, 23 (03):665-685.

Lin Chin-Teng, and George Lee, C.S. (1996), Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems, Prentice Hall, Upper Saddle River, NJ.

Lippmann, R.P. (1990), Review of Neural Networks for Speech Recognition, in Readings in Speech Recognition, Morgan Kaufmann Publishers Inc., San Mateo, Calif., pp 374-392.

Morgan, D.P., and Scofield, C.L. (1991), Neural Networks and Speech Processing, Kluwer Academic Publishers, Boston.

Orozco, J., Reyes, C.A. (2003), Mel-frequency Cepstrum Coefficients Extraction from Infant Cry for Classification of Normal and Pathological Cry with Feed-Forward Neural Networks, Proc. of ESANN, Bruges, Belgium.
Pal, S.K. and Mandal, D.P. (1992b), Linguistic Recog- KEy TERmS
nition Systems Based on Approximated Reasoning, in
Information Science, vol. 61, No 2, pp 135-161. Artificial Neural Networks: A network of many
simple processors that imitates a biological neural
Park, D., Cae, Z., and Kandel, A. (1992), Investigations on the Applicability of Fuzzy Inference, Fuzzy Sets and Systems, vol. 49, pp. 151-169.

Pedrycz, W. (1991), Neuro Computations in Relational Systems, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 13, no. 3, pp. 289-296.

Petroni, M., Malowany, A. S., Johnston, C., and Stevens, B. J. (1995), Identification of pain from infant cry vocalizations using artificial neural networks (ANNs), The International Society for Optical Engineering, vol. 2492, part two of two, paper 2492-79, pp. 729-738.

Reyes, C. A. (1994), On the design of a fuzzy relational neural network for automatic speech recognition, Doctoral Dissertation, The Florida State University, Tallahassee, FL, USA.

Suaste, I., Reyes, O.F., Diaz, A., Reyes, C.A. (2004), Implementation of a Linguistic Fuzzy Relational Neural Network for Detecting Pathologies by Infant Cry Recognition, Proc. of IBERAMIA, Puebla, Mexico, pp. 953-962.

Wasz-Höckert, O., Lind, J., Vuorenkoski, V., Partanen, T., & Valanne, E. (1970), El Llanto en el Lactante y su Significación Diagnóstica, Científico-Médica, Barcelona.

Wasz-Höckert, O., Lind, J., Vuorenkoski, V., Partanen, T., & Valanne, E. (1968), The infant cry: a spectrographic and auditory analysis, Clin. Dev. Med. 29, pp. 1-42.

Zadeh, L.A. (1965), Fuzzy Sets, Inform. Contr., vol. 8, pp. 338-353.

…network. The units are connected by unidirectional communication channels, which carry numeric data. Neural networks can be trained to find nonlinear relationships in data, and are used in applications such as robotics, speech recognition, signal processing or medical diagnosis.

Automatic Infant Cry Recognition (AICR): A process where the crying signal is automatically analyzed to extract acoustical features, looking to determine the infant's physical state, the cause of crying, or even detect pathologies in very early stages of life.

Backpropagation Algorithm: Learning algorithm of ANNs, based on minimising the error obtained from the comparison between the outputs that the network gives after the application of a set of network inputs and the outputs it should give (the desired outputs).

Fuzzy Relational Neural Network (FRNN): A hybrid classification model combining the advantages of fuzzy relations with artificial neural networks.

Fuzzy Sets: A generalization of ordinary sets obtained by allowing a degree of membership for their elements. This theory was proposed by Lotfi Zadeh in 1965. Fuzzy sets are the base of fuzzy logic.

Hybrid Intelligent System: A software system which employs, in parallel, a combination of methods and techniques from Soft Computing.

Learning Stage: A process to teach classifiers to distinguish between different pattern types.
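The "Fuzzy Sets" entry above can be made concrete with a minimal sketch: a triangular membership function that assigns every element a degree of membership in [0, 1]. The function name and the thresholds used here are illustrative assumptions, not taken from the chapter.

```python
def triangular_membership(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set.

    Membership rises linearly from a to a peak of 1.0 at b, then falls
    back to 0.0 at c; outside [a, c] the degree is 0.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# A hypothetical fuzzy set "around 0.5" on a normalized feature axis.
print(triangular_membership(0.5, 0.0, 0.5, 1.0))   # 1.0 (peak: full membership)
print(triangular_membership(0.25, 0.0, 0.5, 1.0))  # 0.5 (partial membership)
```

Unlike an ordinary (crisp) set, an element here can belong to the set "partially", which is exactly the generalization the definition describes.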
Sandra E. Barajas
Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico
Esteban Tlelo-Cuautle
Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
A Hybrid System for Automatic Infant Cry Recognition II
Figure: a binary chromosome (e.g. 1 0 1 0 0 0 0 0 0 1) divided into fields encoding the linguistic properties, the membership function, the classification method, and the learning rate.
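The encoding shown in the figure can be sketched as a field-by-field decoder. The field widths used here (3, 2, 2 and 3 bits, summing to the 10 bits of the example string) are illustrative assumptions; the excerpt does not fix the actual widths of the FRNN encoding.

```python
# Illustrative decoder for the binary chromosome of the figure.
# The field widths are assumed for this sketch only.
FIELDS = [
    ("linguistic_properties", 3),
    ("membership_function", 2),
    ("classification_method", 2),
    ("learning_rate", 3),
]

def decode(chromosome):
    """Split a bit string into named integer-valued parameter fields."""
    values, pos = {}, 0
    for name, width in FIELDS:
        values[name] = int(chromosome[pos:pos + width], 2)
        pos += width
    return values

print(decode("1010000001"))
```

Each decoded integer would then index a concrete choice (e.g. which membership function to use), so that the GA searches over classifier configurations rather than raw weights.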
The analysis of the raw cry waveform provides the information needed for its recognition. At the same time, it discards unwanted information such as background noise and channel distortion (Levinson, S.E., and Roe, D.B., 1990). Acoustic feature extraction is a transformation of measured data into pattern data. Some of the most important techniques used for analyzing cry wave signals are: Discrete Fourier Transform (DFT), cepstral processing, and Linear Prediction Analysis (LPA) (Ainsworth, W.A., 1988) (Schafer and Rabiner, 1990). The application of these techniques during signal processing obtains the values of a set of acoustic features. The features may be spectral coefficients, linear prediction coefficients (LPC), Mel frequency cepstral coefficients (MFCC), among others (Ainsworth, W.A., 1988). The set of values for n features may be represented as a feature vector.

Preliminary Results

Three different classification experiments were made: the first one consists in classifying deaf and normal infant cry, the second one was made to classify infant cry into the categories called asphyxia and normal, and the third one to classify hunger and pain crying. In each task the training samples and the test samples are randomly selected. The results of the model in the classification of deaf and normal cry are given in Table I. In Table II we show the results obtained in the second classification task. Finally, Table III shows the results in the classification of hunger and pain cry. In every classification task the GA was run about 15 times, and the reported results show the average of the best classification in each experiment.
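As a deliberately naive illustration of the DFT-based feature extraction mentioned above, the sketch below computes the magnitude spectrum of one windowed signal frame; real AICR front-ends would use optimized FFTs and full MFCC pipelines, and the 8-sample test frame is purely illustrative.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive O(n^2) DFT: magnitude spectrum of one windowed signal frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

# A pure sinusoid with 2 cycles in an 8-sample frame concentrates its
# energy in the two conjugate bins 2 and 6 of the magnitude spectrum.
frame = [math.sin(2 * math.pi * 2 * t / 8) for t in range(8)]
mags = dft_magnitudes(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin)  # 2 or 6 (the conjugate bins share the energy)
```

A feature vector for one frame could then be these magnitudes (or coefficients derived from them), which is what "transformation of measured data into pattern data" amounts to in practice.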
Performance Comparison with Other Models

Reyes and Orozco (Orozco, Reyes, 2003) classified cry samples from deaf and normal babies, obtaining recognition results around 97.43%. Reyes et al. (Suaste, Reyes, Diaz, Reyes, 2004) showed an implementation of a linguistic fuzzy relational neural network to classify normal and pathological infant cry, with percentages of correct classification of 97.3% and 98%. Petroni, Malowany, Johnston and Stevens (1995) classified cries from normal babies to identify pain with artificial neural networks and report correct-classification results that go from 61% with cascade-correlation networks up to 86.2% with feed-forward neural networks. In (Lederman, 2002) Dror Lederman presents some classification results for infants with respiratory distress syndrome (RDS, related to asphyxia) versus healthy infants. For the classification he used a Hidden Markov Model architecture with 8 states and 5 Gaussians per state. The results reported are of 63% total mean correct classification.

FUTURE TRENDS

AICR systems may expand their utility by training them to recognize a larger number of pathologies. The first requirement to achieve this goal is to collect a suitable set of labeled samples for any target pathology. The GA presented here optimizes some parameters of the FRNN, but the model has more. So, other parameters can be added to the chromosomal representation in order to improve the model, like the initial values of the relational matrix and of the bias vectors, the number of training epochs, and the values of the exponential fuzzy generator and the denominational fuzzy generator used by the DOE.

CONCLUSIONS

The proposed genetic algorithm computes a selection of the number of linguistic properties, the membership function used to calculate the linguistic features, the method to calculate the output of the classifier in the fuzzy processing section, and the learning rate of the FRNN. The solution obtained by the proposed genetic algorithm is a set of characteristics that the FRNN can use to make the classification of infant cry. The use of linguistic properties allows us to deal with the impreciseness of infant cry and provides the classifier with very useful information. By applying the linguistic information, and given the nature of the model, it is not necessary to train through a high number of learning epochs; a high number of iterations in the genetic algorithm is not necessary either. The results of classifying deaf and normal infant cry are very similar to those of other models, but when we classify hunger and pain cry the results are much better than those of other models.

ACKNOWLEDGMENTS

This work is part of a project that is being financed by CONACYT-Mexico (46753).

REFERENCES

Bäck, Thomas (1993), Optimal mutation rates in genetic search, Proceedings of the Fifth International Conference on Genetic Algorithms, San Mateo, California: Morgan Kaufmann, pp. 2-8.

Boersma, P., & Weenink, D. (2002), Praat v 4.0.8. A system for doing phonetics by computer, Institute of Phonetic Sciences of the University of Amsterdam, February.

Bosma, J. F., Truby, H. M., & Antolop, W. (1965), Cry Motions of the Newborn Infant. Acta Paediatrica Scandinavica (Suppl.), 163, 61-92.

Cano, Sergio D., Escobedo, Daniel I., and Coello, Eddy (1999), El Uso de los Mapas Auto-Organizados de Kohonen en la Clasificación de Unidades de Llanto Infantil, Grupo de Procesamiento de Voz, 1er Taller AIRENE, Universidad Católica del Norte, Chile, pp. 24-29.

Casillas, J., Cordón, O., Jesús, M.J. del, Herrera, F. (2000), Genetic Feature Selection in a Fuzzy Rule-Based Classification System Learning Process for High Dimensional Problems, Technical Report DECSAI-000122, Universidad de Granada, Spain.

Coello, Carlos A. (1995), Introducción a los algoritmos genéticos, Soluciones Avanzadas. Tecnologías de Información y Estrategias de Negocios, Año 3, No. 17, pp. 5-11.
Cordón, Oscar, Herrera, Francisco, Hoffmann, Frank and Magdalena, Luis (2001), Genetic Fuzzy Systems: Evolutionary Tuning and Learning of Fuzzy Knowledge Bases, Singapore, World Scientific.

De Jong, K. (1975), An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Ph.D. Dissertation. Dept. of Computer and Communication Sciences, Univ. of Michigan, Ann Arbor.

Ekkel, T. (2002), Neural Network-Based Classification of Cries from Infants Suffering from Hypoxia-Related CNS Damage, Master Thesis. University of Twente, The Netherlands.

Goldberg, David E. (1989), Genetic Algorithms in Search, Optimization and Machine Learning. Massachusetts: Addison-Wesley.

Günter, Rudolph (1994), Convergence analysis of canonical genetic algorithms, IEEE Transactions on Neural Networks, vol. 5, pp. 96-101.

Holland, J. (1975), Adaptation in Natural and Artificial Systems, University of Michigan Press.

Huang, X., Acero, A., Hon, H. (2001), Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice-Hall, Inc., USA.

Lederman, D. (2002), Automatic Classification of Infants' Cry. Master Thesis. University of Negev, Israel.

Petroni, M., Malowany, A. S., Johnston, C., and Stevens, B. J. (1995), Identification of pain from infant cry vocalizations using artificial neural networks (ANNs), The International Society for Optical Engineering, vol. 2492, part two of two, paper 2492-79, pp. 729-738.

Reyes, C. A. (1994), On the design of a fuzzy relational neural network for automatic speech recognition, Doctoral Dissertation, The Florida State University, Tallahassee, FL, USA.

Suaste, I., Reyes, O.F., Diaz, A., Reyes, C.A. (2004), Implementation of a Linguistic Fuzzy Relational Neural Network for Detecting Pathologies by Infant Cry Recognition, Proc. of IBERAMIA, Puebla, Mexico, pp. 953-962.

Zbigniew Michalewicz (1992), Genetic algorithms + data structures = evolution programs, Springer-Verlag, 2nd ed.

KEY TERMS

Binary Chromosome: An encoding scheme representing one potential solution to a problem, during a searching process, by means of a string of bits.

Evolutionary Computation: A subfield of computational intelligence that involves combinatorial optimization problems. It uses iterative progress, such as growth or development in a population, which is then selected in a guided random search to achieve the desired end. Such processes are often inspired by biological mechanisms of evolution.

Fitness Function: A function defined over the genetic representation that measures the quality of the represented solution. The fitness function is always problem dependent.

Genetic Algorithms: A family of computational models inspired by evolution. These algorithms encode a potential solution to a specific problem on a simple chromosome-like data structure and apply recombination operators to these structures so as to preserve critical information. Genetic algorithms are often viewed as function optimizers, although the range of problems to which genetic algorithms have been applied is quite broad.

Hybrid Intelligent System: A software system which employs, in parallel, a combination of methods and techniques mainly from subfields of Soft Computing.

Signal Processing: The analysis, interpretation and manipulation of signals. Processing of such signals includes storage and reconstruction, separation of information from noise, compression, and feature extraction.

Soft Computing: A partnership of techniques which in combination are tolerant of imprecision, uncertainty, partial truth, and approximation, and whose role model is the human mind. Its principal constituents are Fuzzy Logic (FL), Neural Computing (NC), Evolutionary Computation (EC), Machine Learning (ML) and Probabilistic Reasoning (PR).
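The entries above (binary chromosomes, a fitness function, recombination and mutation) fit together as in the following minimal generational GA sketch. The one-max fitness function, tournament selection, and all parameter values are illustrative choices, not taken from the article.

```python
import random

def one_max(bits):
    """Illustrative fitness function: number of 1-bits in the chromosome."""
    return sum(bits)

def evolve(pop_size=20, length=16, generations=40, p_mut=0.05, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # Tournament selection: the fitter of two random individuals wins.
            a, b = rng.choice(pop), rng.choice(pop)
            return a if one_max(a) >= one_max(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, length)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (rng.random() < p_mut) for b in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=one_max)

best = evolve()
print(one_max(best))
```

Replacing `one_max` with a problem-specific fitness function (for instance, the classification accuracy of an FRNN configured by the decoded chromosome) turns this generic skeleton into the kind of parameter-selection GA the article describes.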
Alberto Jaspe
University of A Coruña, Spain
IA Algorithm Acceleration Using GPUs
GPU architectures are quite variable due to their fast evolution and the incorporation of new features. Therefore, it is not easy to port an algorithm developed for CPUs to run on a GPU.

Overview of the Graphics Pipeline

Nowadays GPUs make their computations following a common structure called the Graphics Pipeline. The Graphics Pipeline (Akenine-Möller, 2002) is composed of a set of stages that are executed sequentially inside the GPU, allowing the computation of graphics algorithms. Recent hardware is made up of four main elements. First, the vertex processors, which receive vertex arrays from the CPU and make the necessary transformations from their positions in space to their final positions on the screen. Second, the primitive assembly, which builds graphics primitives (for instance, triangles) using information about connectivity between different vertices. Third, the rasterizer, where those graphical primitives are discretized and turned into fragments. A fragment represents a potential pixel and contains the necessary information (color, depth, etc.) to generate the final color of a pixel. Finally, the fragment processors, where fragments become pixels whose final color is written in a target buffer, which can be the screen buffer or a texture.

At present, GPUs have multiple vertex and fragment processors that compute operations in parallel. Both are programmable using little pieces of code called vertex and fragment programs, respectively. In the last years, different high-level programming languages have been released, like Cg/HLSL (Mark, 2003; HLSL Shaders) or GLSL (OpenGL Shading Language Information Site), that make the programming of those processors easier.

The GPU Programming Model

There is a big difference between programming CPUs and GPUs, due mainly to their different programming models. GPUs are based on the stream programming model (Owens, 2005a; Luebke, 2006; Owens, 2007), where all data are represented by a stream, which can be defined as a sorted set of data of the same type. A kernel operates on full streams, taking input data from one or more streams to produce one or more output streams. The main characteristic of a kernel is that it operates on the whole stream, instead of individual elements. The typical use of a kernel is the evaluation of a function over each element of an input stream, called a map operation. Other kernel operations are expansions, reductions, filters, etc. (Buck, 2004; Horn, 2005; Owens, 2007). The outputs generated by a kernel are always based on its input streams, which means that inside the kernel the calculations made on an element never depend on the other elements. In the stream programming model, applications are built by connecting multiple kernels. An application can be represented as a dependency graph where each graph node is a kernel and each edge represents a data stream between kernels (Owens, 2005b; Lefohn, 2005).

The behavior of the graphics pipeline is similar to the stream programming model. Data flows through each stage, where the output feeds the next one. Stream elements (vertex or fragment arrays) are processed independently by kernels (vertex or fragment programs), and their output can be received again by other kernels.

The stream programming model allows efficient computation, because kernels operate on independent elements from a set of input streams and can be processed using hardware like the GPU, which processes vertex or fragment streams in parallel. This allows parallel computing without the complexity of traditional parallel programming models.

Computational Resources on GPU

In order to implement any kind of algorithm on the GPU, there are different computational resources (Harris, 2005; Owens, 2007). On one side, current GPUs have two different parallel programmable processors: vertex and fragment processors. Vertex processors compute vertex streams (points with associated properties like position, color, normal, etc.). A vertex processor applies a vertex program to transform each input vertex to its position on the screen. Fragment processors compute fragment streams. They apply a fragment program to each fragment to calculate the final color of the pixel. In addition to using the attributes of each fragment, those processors can access other data streams, like textures, when they are generating each pixel. Textures can be seen as an interface to read-only memory.

Another available resource in the GPU is the rasterizer. It generates fragments using triangles built from vertex and connectivity information. The rasterizer
allows generating an output set of data from a smaller input one, because it interpolates the properties of each vertex that belongs to a triangle (like color, texture coordinates, etc.) for each generated fragment.

One of the essential features of GPUs is render-to-texture. This allows storing the pixels generated by the fragment processors in a texture, instead of a screen buffer. This is at the moment the only mechanism to obtain output data directly from GPU computing. Render-to-texture cannot be thought of as an interface to read-write memory, because the fragment processor can read data from a texture multiple times, but it can write there only once, at the end of each fragment processing.

ARTIFICIAL INTELLIGENCE ALGORITHMS ON GPU

Using the stream programming model, as well as the resources provided by graphics hardware, Artificial Intelligence algorithms can be parallelized and therefore computing-accelerated. The parallel and computation-intensive nature of this kind of algorithms makes them good candidates for being implemented on the GPU.

Consider the evolution process of genetic algorithms, where a fitness value needs to be computed for each individual. The population could be considered as a data stream and the fitness function as a kernel that processes this stream. On the GPU, for instance, the data stream must be represented as a texture, whereas the kernel must be implemented in a fragment program. Each individual's fitness would be obtained in an output stream, also represented by a texture and obtained by the use of the render-to-texture feature.

Recently, some works have been realized, mainly in parallelizing ANN and EC algorithms, described in the following sections.

Artificial Neural Networks

Bohn (1998) used GPGPU to reduce training time in Kohonen's feature maps. In this case, the bigger the map, the higher was the time reduction using the GPU. On 128x128 sized maps, time was similar using CPU and GPU, but on 512x512 sized maps the GPU was almost 3.5 times faster than the CPU, increasing to 5.3x rates on 1024x1024 maps. This was one of the first implementations of GPGPU, made on a non-programmable graphics system, a SiliconGraphics Infinite Reality workstation.

Later, with programmable hardware, Oh (2004) used the GPU for accelerating the process of obtaining the output of a multilayer perceptron ANN. The developed system was applied to pattern recognition, obtaining 20x lower computing time than the CPU implementation.

Considering another kind of ANNs, Zhongwen (2005) used GPGPU to reduce computing time in training Self-Organizing Maps (SOMs). The bigger the SOM, the higher was the reduction. Whereas using 128x128 neuron maps computing time was similar between CPU and GPU, 512x512 neuron maps involved a training process 4x faster using the GPU implementation.

Bernhard (2005) used the GPU to simulate the Spiking Neurons model. This ANN model both requires highly intensive calculations and has a parallel nature, so it fits very well on GPGPU computation. The authors made different implementations depending on the neural network application. In the first case, an image segmentation algorithm was implemented using a locally-excitatory globally-inhibitory Spiking Neural Network (SNN). In this experiment, the authors obtained up to 10x faster results. In the second case, SNNs were used for image segmentation using an algorithm based on histogram clustering, where the ANN minimized the objective function. Here the speed was also improved up to 10 times.

Seoane (2007) showed multilayer perceptron ANN training time acceleration using a GA. GPGPU techniques for ANN computing allowed accelerating it up to 11 times.

The company Evolved Machines (Evolved Machines Website) uses the powerful performance of GPUs to simulate neural computation, obtaining results up to 100x faster than CPU computation.

Evolutionary Computation

In EC-related works, Yu (2005) describes how parallel genetic algorithms can be mapped to low-cost graphics hardware. In their approach, chromosomes and fitness values are stored in textures. Fitness calculation and genetic operators were implemented using fragment programs on the GPU. Different population sizes applied to the Colville minimization problem were used for testing, resulting in better time reductions for bigger populations. In the case of a 128x128 sized population,
GPU genetic operators computing was 11.8 times faster than on the CPU, whereas in a 512x512 sized population that rate incremented to 20.1. In fitness function computing, the rates were 7.9 and 17.1, respectively.

In another work, Wong (2006) implemented Hybrid Genetic Algorithms on the GPU, incorporating the Cauchy mutation operator. All algorithm steps were implemented in graphics hardware, except random number generation. In this approach, a pseudo-deterministic method was proposed for the selection process, allowing significant running-time reductions. The GPU implementation was 3x faster than the CPU one.

Fok (2007) showed how to implement evolutionary algorithms on the GPU. Since the crossover operators of GAs require more complex calculations than mutation ones, the authors studied a GPU implementation of Evolutionary Programming, using only mutation operators. Tests were performed with the Cauchy distribution on 5 different optimization problems, obtaining between 1.25 and 5 times faster results.

FUTURE TRENDS

Nowadays GPUs are very powerful and they are evolving quite fast. On one side, there are more and more programmable elements in GPUs; on the other, programming languages are becoming full-featured. There are more and more implementations of different kinds of general-purpose algorithms that take advantage of these features.

In the Artificial Intelligence field the number of developments is rather low, in spite of the great number of current algorithms and their high computing requirements. It seems very interesting to use GPUs to extend existent implementations. For instance, some examples of speeding up ANN simulations have been shown; however, there are no works on accelerating training times. Likewise, the same ideas can be applied to implement other kinds of ANN architectures or IA techniques, like in the genetic programming field, where there is no development yet either.

… allow using them for general-purpose, computation-intensive algorithms. Based on this idea, existent implementations of IA models like ANN or EC on GPUs have been described, with a considerable computing time reduction.

General-purpose computing on GPUs, and its use to accelerate IA algorithms, provides great advantages, being an essential contribution in applications where computing time is a decisive factor.

REFERENCES

Akenine-Möller, T. & Haines, E. (2002). Real-Time Rendering. Second Edition. A.K. Peters.

Bernhard, F. & Keriven, R. (2006). Spiking Neurons on GPUs. International Conference on Computational Science ICCS 2006. 236-243.

Bohn, C.-A. (1998). Kohonen Feature Mapping through Graphics Hardware. In 3rd International Conference on Computational Intelligence and Neurosciences.

Buck, I. & Purcell, T. (2004). A toolkit for computation on GPUs. In GPU Gems. R. Fernando, editor. Addison-Wesley, 621-636.

Evolved Machines Website. (n.d.). Retrieved June 4, 2007 from http://www.evolvedmachines.com/

Fok, K.L., Wong, T.T. & Wong, M.L. (2007). Evolutionary Computing on Consumer-Level Graphics Hardware. IEEE Intelligent Systems. 22(2), 69-78.

General-Purpose Computation Using Graphics Hardware Website. (n.d.). Retrieved June 4, 2007 from http://www.gpu.org

Harris, M. (2005). Mapping computational concepts to GPUs. In GPU Gems 2, M. Pharr, editor. Addison-Wesley, 493-508.

HLSL Shaders. (n.d.). Retrieved June 4, 2007 from http://msdn.microsoft.com/archive/default.asp?url=/archive/en-us/directx9_c_Dec_2005/HLSL_Shaders.asp
Lefohn, A., Kniss, J., Owens, J. (2005). Implementing efficient parallel data structures on GPUs. In GPU Gems 2, M. Pharr, editor. Addison-Wesley, 521-545.

Luebke, D. (2006). General-Purpose Computation on Graphics Hardware. In Supercomputing 2006 Tutorial on GPGPU.

Mark, W.R., Glanville, R.S., Akeley, K. & Kilgard, M.J. (2003). Cg: a system for programming graphics hardware in a C-like language. ACM Trans. Graph. ACM Press, 22(3), 896-907.

Oh, K.-S., Jung, K. (2004). GPU implementation of neural networks. Pattern Recognition. 37(6), 1311-1314.

OpenGL Shading Language Information Site. (n.d.). Retrieved June 4, 2007, from http://developer.3dlabs.com/openGL2/index.htm

Owens, J. (2005a). Streaming architectures and technology trends. In GPU Gems 2, M. Pharr, editor. Addison-Wesley, 457-470.

Owens, J. (2005b). The GPGPU Programming Model. In General Purpose Computation on Graphics Hardware. IEEE Visualization 2005 Tutorial.

Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A.E. & Purcell, T.J. (2007). A Survey of General-Purpose Computation on Graphics Hardware. Computer Graphics Forum. 26(1), 80-113.

Seoane, A., Rabuñal, J. R. & Pazos, A. (2007). Aceleración del Entrenamiento de Redes de Neuronas Artificiales mediante Algoritmos Genéticos utilizando la GPU. Actas del V Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados - MAEB 2007. 69-76.

Wong, M.L. & Wong, T.T. (2006). Parallel Hybrid Genetic Algorithms on Consumer-Level Graphics Hardware. IEEE Congress on Evolutionary Computation. 2973-2980.

Yu, Q., Chen, C. & Pan, Z. (2005). Parallel Genetic Algorithms on Programmable Graphics Hardware. Lecture Notes in Computer Science. 1051-1059.

Zhongwen, L., Hongzhi, L., Zhengping, Y. & Xincai, W. (2005). Self-Organizing Maps Computing on Graphic Process Unit. ESANN 2005 proceedings - European Symposium on Artificial Neural Networks. 557-562.

KEY TERMS

Fragment: Potential pixel containing all the necessary information (color, depth, etc.) to generate the final fragment color.

Fragment Processor: Graphics system element that receives as input a set of fragments and processes it to obtain pixels, writing them in a target buffer. Present GPUs have multiple fragment processors working in parallel, and they can be programmed using fragment programs.

Graphics Pipeline: Three-dimensional-graphics-oriented architecture, composed of several stages that run sequentially.

Graphics Processing Unit (GPU): Electronic device designed for graphics rendering in computers. Its architecture is specialized in graphics calculations.

General-Purpose Computation on GPUs (GPGPU): Trend in computing dedicated to implementing general-purpose algorithms using graphics devices, called GPUs. At the moment, the high programmability and performance of GPUs allow developers to run classical algorithms in these devices to speed up non-graphics applications, especially those algorithms with a parallel nature.

Pixel: Abbreviation of Picture Element, used to refer to graphic image points.

Rasterizer: Graphics Pipeline element which, from graphic primitives, provides appropriate fragments to a target buffer.

Render-to-Texture: GPU feature that allows storing the fragment processor output in a texture instead of a screen buffer.

Stream Programming Model: This parallel programming model is based on defining, on one side, sets of input and output data, called streams, and on the other side, intensive computing operations, called kernels.
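The stream-and-kernel vocabulary above, together with the GA-on-GPU mapping described in this article (population stored in a texture, fitness computed by a fragment program, results gathered via render-to-texture), can be emulated on the CPU in a few lines. The grid size, the one-max fitness kernel, and all names here are illustrative assumptions, not the actual GPU implementations surveyed.

```python
def run_kernel(kernel, stream):
    """Map a kernel independently over each element of an input stream;
    the result for one element never depends on any other element."""
    return [kernel(x) for x in stream]

# "Population texture": a 2D grid whose texels hold binary chromosomes.
population_texture = [
    [[1, 0, 1, 1], [0, 0, 0, 1]],
    [[1, 1, 1, 1], [0, 1, 0, 0]],
]

# Fitness "fragment program": a toy one-max fitness, one value per texel.
fitness_kernel = lambda chromosome: sum(chromosome)

# Evaluating row by row plays the role of render-to-texture: the output
# "fitness texture" holds one value per input texel.
fitness_texture = [run_kernel(fitness_kernel, row) for row in population_texture]
print(fitness_texture)  # [[3, 1], [4, 1]]
```

Because each texel is processed independently, a real GPU can run all of these evaluations in parallel across its fragment processors, which is the source of the speed-ups reported in this article.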
BACKGROUND

The Naïve Bayes classifier, also called simple Bayesian classifier, is essentially a simple BN. Since no structure learning is required, it is very easy to construct and implement a Naïve Bayes classifier. Despite its simplicity, the Naïve Bayes classifier is competitive with other more advanced and sophisticated classifiers such as

IMPROVING THE NAÏVE BAYES CLASSIFIER

This section introduces the two groups of approaches that have been used to improve the Naïve Bayes classifier. In the first group, the strong independence assumption is relaxed by restricted structure learning. The second
Improving the Naïve Bayes Classifier
group helps to select some major (and approximately independent) attributes from the original attributes, or transform them into some new attributes, which can then be used by the Naïve Bayes classifier.

Relaxing the Independence Assumption

Relaxing the independence assumption means that the dependence will be considered in constructing the network. To consider the dependencies between attributes, Kononenko (Kononenko, 1991) proposed the Semi-Naïve Bayes classifier (SNB), which joined the attributes based on the theorem of Chebyshev. Medical diagnostic data were used to compare the performance of the SNB and the NB. It was found that the results in two domains are identical, but in the other two domains SNB slightly improves the performance. Nevertheless, this method may cause overfitting problems. Another limitation of the SNB is that the number of parameters grows exponentially with the increase of the number of attributes that need to be joined. In addition, the exhaustive searching technique of joining attributes may affect the computational time. Pazzani (Pazzani, 1995) used Forward Sequential Selection and Joining (FSSJ) and Backward Sequential Elimination and Joining (BSEJ) to search dependencies and join the attributes. They tested the two methods on UCI data and found that BSEJ provided the most improvement.

Friedman et al. (Friedman, Geiger & Goldszmidt, 1997) found that Kononenko's and Pazzani's methods can be represented as an augmented Naïve Bayes network, which includes some subgraphs. They restricted the network to be Tree Augmented Naïve Bayes (TAN), which spans over all attributes and can be learned by tree-structure learning algorithms. The results based on problems from the UCI repository showed that the TAN classifier outperforms the Naïve Bayes classifier. It is also competitive with C4.5 while maintaining computational simplicity. However, the use of the TAN classifier is limited to problems with discrete attributes. For problems with continuous attributes, these attributes must be prediscretized. To address this problem, Friedman et al. (Friedman, Goldszmidt & Lee, 1998) extended TAN to deal with continuous attributes via parametric and semiparametric conditional probabilities. Keogh & Pazzani (Keogh & Pazzani, 1999) proposed a variant of the TAN classifier, i.e. SP-TAN, which could result in better performance than TAN. The performance of SP-TAN is also competitive with the Lazy Bayes Rule (LBR), in which lazy learning techniques are used in the Naïve Bayes classifier (Zheng & Webb, 2000; Wang & Webb, 2002).

Although LBR and SP-TAN have outstanding performance on the testing data, the main disadvantage of the two methods is that they have high computational complexity. Aggregating One-Dependence Estimators (AODE), developed by Webb et al. (Webb, Boughton & Wang, 2005), can avoid model selection, which may reduce computational complexity and lead to lower variance. These advantages have been demonstrated by some empirical experiment results. It is also empirically found that the average prediction accuracy of AODE is comparable to that of LBR and SP-TAN, but with lower variance. Therefore, AODE might be more suitable for small datasets due to its lower variance.

Using Pre-Processing Procedures

In general, the pre-processing procedures for the Naïve Bayes classifier include feature selection and transforming the original attributes. The Selective Bayes classifier (SBC) (Langley & Sage, 1994) deals with correlated features by selecting only some attributes into the final classifier. They used a greedy method to search the space and forward selection to select the attributes. In their study, six UCI datasets are used to compare the performance of the Naïve Bayes classifier, SBC and C4.5. It is found that selecting the attributes can improve the performance of the Naïve Bayes classifier when there are redundant attributes. In addition, SBC is found to be competitive with C4.5 on the datasets on which C4.5 outperforms the Naïve Bayes classifier. The study by Ratanamahatana & Gunopulos (Ratanamahatana & Gunopulos, 2002) applied C4.5 to select the attributes for the Naïve Bayes classifier. Interestingly, experimental results showed that the new attributes obtained by C4.5 can make the Naïve Bayes classifier outperform C4.5 with respect to a number of datasets.

Transforming the attributes is another useful pre-processing procedure for the Naïve Bayes classifier. Gupta (Gupta, 2004) found that Principal Component Analysis (PCA) was helpful to improve the classification accuracy and reduce the computational complexity. Prasad (Prasad, 2004) applied Independent Component Analysis (ICA) to all the training data and found
Improving the Naïve Bayes Classifier
tational complexity to a certain extent. It would be useful to model correlations among appropriate attributes that can be captured by a simple restricted structure but with good performance.

REFERENCES

Bressan, M., & Vitria, J. (2002). Improving Naïve Bayes Using Class-conditional ICA. Advances in Artificial Intelligence - IBERAMIA 2002. 1-10.

Cheng, J., & Greiner, R. (1999). Comparing Bayesian Network Classifiers. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. 101-107.

Fan, L., & Poh, K.L. (2007). A Comparative Study of PCA, ICA and Class-conditional ICA for Naïve Bayes Classifier. Computational and Ambient Intelligence. Lecture Notes in Computer Science. (4507), 16-22.

Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian Network Classifiers. Machine Learning. (29), 131-163.

Friedman, N., Goldszmidt, M., & Lee, T.J. (1998). Bayesian Network Classification with Continuous Attributes: Getting the Best of Both Discretization and Parametric Fitting. Proceedings of the Fifteenth International Conference on Machine Learning. 179-187.

Gupta, G.K. (2004). Principal Component Analysis and Bayesian Classifier Based on Character Recognition. Bayesian Inference and Maximum Entropy Methods in Science and Engineering. AIP Conference Proceedings. (707), 465-479.

Keogh, E., & Pazzani, M.J. (1999). Learning Augmented Bayesian Classifiers: A Comparison of Distribution-based and Classification-based Approaches. Proceedings of the International Workshop on Artificial Intelligence and Statistics. 225-230.

Kononenko, I. (1991). Semi-Naïve Bayesian Classifier. Proceedings of the Sixth European Working Session on Learning. 206-219.

Langley, P., Iba, W., & Thompson, K. (1992). An Analysis of Bayesian Classifiers. Proceedings of the Tenth National Conference on Artificial Intelligence. AAAI Press, San Jose, CA. 223-228.

Langley, P., & Sage, S. (1994). Induction of Selective Bayesian Classifiers. Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence. 399-406.

Pazzani, M.J. (1995). Searching for Dependencies in Bayesian Classifiers. Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics. 424-429.

Prasad, M.N., Sowmya, A., & Koch, I. (2004). Feature Subset Selection using ICA for Classifying Emphysema in HRCT Images. Proceedings of the 17th International Conference on Pattern Recognition. 515-518.

Ratanamahatana, C.A., & Gunopulos, D. (2002). Feature Selection for the Naive Bayesian Classifier Using Decision Trees. Applied Artificial Intelligence. (17), 475-487.

Vitria, J., Bressan, M., & Radeva, P. (2007). Bayesian Classification of Cork Stoppers Using Class-conditional Independent Component Analysis. IEEE Transactions on Systems, Man and Cybernetics. (37), 32-38.

Wang, Z., & Webb, G.I. (2002). Comparison of Lazy Bayesian Rule and Tree-Augmented Bayesian Learning. Proceedings of the IEEE International Conference on Data Mining. 775-778.

Webb, G., Boughton, J.R., & Wang, Z. (2005). Not So Naïve Bayes: Aggregating One-Dependence Estimators. Machine Learning. (58), 5-24.

Zheng, Z., & Webb, G. (2000). Lazy Learning of Bayesian Rules. Machine Learning. (41), 53-87.

KEY TERMS

Decision Trees: A decision tree is a classifier in the form of a tree structure, where each node is either a leaf node or a decision node. A decision tree can be used to classify an instance by starting at the root of the tree and moving through it until a leaf node, which provides the classification of the instance. A well known and frequently used decision tree algorithm over the years is C4.5.

Forward Selection and Backward Elimination: A forward selection method would start with the empty set and successively add attributes, while a backward elimination process would begin with the full set and remove unwanted ones.

Greedy Search: At each point in the search, the algorithm considers all local changes to the current set of attributes, makes its best selection, and never reconsiders this choice.

Independent Component Analysis (ICA): Independent component analysis is a newly developed technique for finding hidden factors or components to give a new representation of multivariate data. ICA can be thought of as a generalization of PCA: PCA tries to find uncorrelated variables to represent the original multivariate data, whereas ICA attempts to obtain statistically independent variables to represent it.

Naïve Bayes Classifier: The Naïve Bayes classifier, also called the simple Bayesian classifier, is essentially a simple Bayesian Network (BN). There are two underlying assumptions in the Naïve Bayes classifier: first, all attributes are independent of each other given the classification variable; second, all attributes are directly dependent on the classification variable. The Naïve Bayes classifier computes the posterior of the classification variable given a set of attributes by using the Bayes rule under the conditional independence assumption.

Principal Component Analysis (PCA): PCA is a popular tool for multivariate data analysis, feature extraction and data compression. Given a set of multivariate measurements, the purpose of PCA is to find a set of variables with less redundancy, where redundancy is measured by correlations between data elements.

UCI Repository: A repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
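The posterior computation described under "Naïve Bayes Classifier" can be sketched in a few lines. The toy dataset and function names below are ours, purely for illustration; Laplace smoothing is added (an assumption, assuming binary attribute values) to avoid zero probabilities:

```python
from collections import Counter, defaultdict

def train_naive_bayes(instances, labels):
    """Estimate P(c) and P(a_j = v | c) by frequency counts."""
    prior = Counter(labels)
    cond = defaultdict(Counter)  # (class, attribute index) -> value counts
    for attrs, c in zip(instances, labels):
        for j, v in enumerate(attrs):
            cond[(c, j)][v] += 1
    return prior, cond, len(labels)

def posterior(attrs, prior, cond, n):
    """P(c | attrs) proportional to P(c) * prod_j P(attrs[j] | c), normalized."""
    scores = {}
    for c, nc in prior.items():
        p = nc / n
        for j, v in enumerate(attrs):
            # Laplace smoothing; "+ 2" assumes each attribute is binary
            p *= (cond[(c, j)][v] + 1) / (nc + 2)
        scores[c] = p
    z = sum(scores.values())
    return {c: p / z for c, p in scores.items()}

# Hypothetical toy data: two binary attributes, two classes
X = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 0)]
y = ['pos', 'pos', 'neg', 'neg', 'pos']
prior, cond, n = train_naive_bayes(X, y)
print(posterior((1, 0), prior, cond, n))
```

The conditional independence assumption is what lets the joint likelihood factorize into the per-attribute product above.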
In this chapter we discuss how fuzzy logic extends the envelope of the main data mining tasks: clustering, classification, regression and association rules. We begin by presenting a formulation of data mining using fuzzy logic attributes. Then, for each task, we provide a survey of the main algorithms and a detailed description (i.e. pseudo-code) of the most popular algorithms.

INTRODUCTION

There are two main types of uncertainty in supervised learning: statistical and cognitive. Statistical uncertainty deals with the random behavior of nature, and all existing data mining techniques can handle the uncertainty that arises (or is assumed to arise) in the natural world from statistical variations or randomness. Cognitive uncertainty, on the other hand, deals with human cognition.

Fuzzy set theory, first introduced by Zadeh in 1965, deals with cognitive uncertainty and seeks to overcome many of the problems found in classical set theory. For example, a major problem faced by researchers of control theory is that a small change in input results in a major change in output. This throws the whole control system into an unstable state. In addition, there was also the problem that the representation of subjective knowledge was artificial and inaccurate. Fuzzy set theory is an attempt to confront these difficulties, and in this chapter we show how it can be used in data mining tasks.

BACKGROUND

Data mining is a term coined to describe the process of sifting through large and complex databases for identifying valid, novel, useful, and understandable patterns and relationships. Data mining involves the inferring of algorithms that explore the data, develop the model and discover previously unknown patterns. The model is used for understanding phenomena from the data, analysis and prediction. The accessibility and abundance of data today makes knowledge discovery and data mining a matter of considerable importance and necessity.

We begin by presenting some of the basic concepts of fuzzy logic. The main focus, however, is on those concepts used in the induction process when dealing with data mining. Since fuzzy set theory and fuzzy logic are much broader than the narrow perspective presented here, the interested reader is encouraged to read Zimmermann (2005).

In classical set theory, a certain element either belongs or does not belong to a set. Fuzzy set theory, on the other hand, permits the gradual assessment of the membership of elements in relation to a set.

Let U be a universe of discourse, representing a collection of objects denoted generically by u. A fuzzy set A in a universe of discourse U is characterized by a membership function $\mu_A$ which takes values in the interval [0, 1], where $\mu_A(u) = 0$ means that u is definitely not a member of A and $\mu_A(u) = 1$ means that u is definitely a member of A.

The above definition can be illustrated on the vague set of "Young". In this case the set U is the set of people. To each person in U, we define the degree of membership to the fuzzy set Young. The membership function answers the question "to what degree is person u young?". The easiest way to do this is with a membership function based on the person's age. For example, Figure 1 presents the following membership function:

$$\mu_{Young}(u) = \begin{cases} 0 & \text{age}(u) > 32 \\ 1 & \text{age}(u) < 16 \\ \dfrac{32 - \text{age}(u)}{16} & \text{otherwise} \end{cases} \qquad (1)$$
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Incorporating Fuzzy Logic in Data Mining Tasks
Given this definition, John, who is 18 years old, has a degree of youth of 0.875. Philip, 20 years old, has a degree of youth of 0.75. Unlike probability theory, degrees of membership do not have to add up to 1 across all objects, and therefore either many or few objects in the set may have high membership. However, an object's membership in a set (such as young) and the set's complement (not young) must still sum to 1.

The main difference between classical set theory and fuzzy set theory is that the latter admits partial set membership. A classical or crisp set, then, is a fuzzy set that restricts its membership values to {0,1}, the
endpoints of the unit interval. Membership functions can be used to represent a crisp set. For example, Figure 2 presents a crisp membership function defined as:

$$\mu_{CrispYoung}(u) = \begin{cases} 0 & \text{age}(u) > 22 \\ 1 & \text{age}(u) \le 22 \end{cases} \qquad (2)$$

Typically, before one can incorporate fuzzy concepts into a data mining application, an expert is required to provide the fuzzy sets for the quantitative attributes, along with their corresponding membership functions (Mitra and Pal, 2005). Alternatively, the appropriate fuzzy sets are determined using fuzzy clustering.
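The membership functions of Eqs. (1) and (2) can be sketched directly; the function names are ours:

```python
def mu_young(age):
    """Fuzzy membership of Eq. (1): 1 below 16, 0 above 32, linear in between."""
    if age > 32:
        return 0.0
    if age < 16:
        return 1.0
    return (32 - age) / 16

def mu_crisp_young(age):
    """Crisp membership of Eq. (2): values restricted to {0, 1}."""
    return 1.0 if age <= 22 else 0.0

# John (18) and Philip (20) from the text
print(mu_young(18), mu_young(20))   # 0.875 0.75
print(mu_crisp_young(18), mu_crisp_young(25))
```

The crisp set is the limiting case in which the linear transition region of Eq. (1) collapses to a single cut-off age.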
framework for building a fuzzy tree, including several inference procedures based on conflict resolution in rule-based systems and efficient approximate reasoning methods, was presented in (Janikow, 1998).

In this section we will focus on the algorithm proposed in Yuan and Shaw (1995). This algorithm can handle classification problems with both fuzzy attributes and fuzzy classes represented in linguistic fuzzy terms. It can also handle other situations in a uniform way, where numerical values can be fuzzified to fuzzy terms and crisp categories can be treated as a special case of fuzzy terms with zero fuzziness. The algorithm uses classification ambiguity as fuzzy entropy. The classification ambiguity directly measures the quality of classification rules at the decision node. It can be calculated under fuzzy partitioning and multiple fuzzy classes.

When a certain attribute is numerical, it needs to be fuzzified into linguistic terms before it can be used in the algorithm (Hong et al., 1999). The fuzzification process can be performed manually by experts or can be derived automatically using some sort of clustering algorithm. Clustering groups the data instances into subsets in such a manner that similar instances are grouped together, while different instances belong to different groups. The instances are thereby organized into an efficient representation that characterizes the population being sampled.

One can use a simple algorithm to generate a set of membership functions on numerical data. Assume attribute a_i has numerical value x from the domain X. We can cluster X into k linguistic terms v_{i,j}, j = 1,...,k. The size of k is manually predefined. Figure 3 illustrates the creation of four groups defined on the age attribute: "young", "early adulthood", "middle-aged" and "old age". Note that the first set ("young") and the last set ("old age") have a trapezoidal form, which can be uniquely described by four corners. For example, the "young" set could be represented as (0,0,16,32). In between, all other sets ("early adulthood" and "middle-aged") have a triangular form, which can be uniquely described by three corners. For example, the set "early adulthood" is represented as (16,32,48).

The induction algorithm of the fuzzy decision tree measures the classification ambiguity associated with each attribute and splits the data using the attribute with the smallest classification ambiguity. The classification ambiguity of attribute a_i with linguistic terms v_{i,j}, j = 1,...,k on fuzzy evidence S, denoted as G(a_i | S), is the weighted average of classification ambiguity, calculated as:

$$G(a_i \mid S) = \sum_{j=1}^{k} w(v_{i,j} \mid S) \cdot G(v_{i,j} \mid S) \qquad (3)$$
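The corner encodings described above can be sketched as generic trapezoidal and triangular membership functions; the helper names are ours:

```python
def trapezoid(a, b, c, d):
    """Trapezoidal set (a,b,c,d): rises on [a,b], is 1 on [b,c], falls on [c,d].
    Degenerate edges (a == b or c == d) give a flat boundary."""
    def mu(x):
        left = 1.0 if x >= b else (x - a) / (b - a) if b > a else 0.0
        right = 1.0 if x <= c else (d - x) / (d - c) if d > c else 0.0
        return max(0.0, min(left, right))
    return mu

def triangle(a, b, c):
    """Triangular set (a,b,c) as a trapezoid whose plateau has zero width."""
    return trapezoid(a, b, b, c)

young = trapezoid(0, 0, 16, 32)          # "young" = (0,0,16,32)
early_adulthood = triangle(16, 32, 48)   # "early adulthood" = (16,32,48)
print(young(8), young(24), early_adulthood(24))  # 1.0 0.5 0.5
```

Note that at age 24 the "young" and "early adulthood" memberships overlap, which is exactly the partial, gradual boundary that fuzzy partitioning is meant to capture.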
where w(v_{i,j} | S) is the weight, which represents the relative size of v_{i,j} and is defined as:

$$w(v_{i,j} \mid S) = \frac{M(v_{i,j} \mid S)}{\sum_{k} M(v_{i,k} \mid S)} \qquad (4)$$

The classification ambiguity of v_{i,j} is defined as $G(v_{i,j} \mid S) = g\big(p(C \mid v_{i,j})\big)$, which is measured based on the possibility distribution vector $p(C \mid v_{i,j}) = \big(p(c_1 \mid v_{i,j}), \ldots, p(c_k \mid v_{i,j})\big)$.

Given v_{i,j}, the possibility of classifying an object to class c_l can be defined as:

$$p(c_l \mid v_{i,j}) = \frac{S(v_{i,j}, c_l)}{\max_{k} S(v_{i,j}, c_k)} \qquad (5)$$

where S(A,B) is the fuzzy subsethood that measures the degree to which A is a subset of B. The subsethood can be used to measure the truth level of classification rules. For example, given a classification rule such as "IF Age is Young AND Mood is Happy THEN Comedy", we have to calculate S(Young ∩ Happy, Comedy) in order to measure the truth level of the classification rule.

The function g(p) is the possibilistic measure of ambiguity or nonspecificity and is defined as:

$$g(p) = \sum_{i=1}^{|p|} \left( p_i^{*} - p_{i+1}^{*} \right) \ln(i) \qquad (6)$$

where $p^{*} = (p_1^{*}, \ldots, p_{|p|}^{*})$ is the permutation of the possibility distribution p, sorted such that $p_i^{*} \ge p_{i+1}^{*}$, with $p_{|p|+1}^{*} = 0$.

All the above calculations are carried out at a predefined significance level α. An instance will take into consideration a certain branch v_{i,j} only if its corresponding membership is greater than α. This parameter is used to filter out insignificant branches.

After partitioning the data using the attribute with the smallest classification ambiguity, the algorithm looks for nonempty branches. For each nonempty branch, the algorithm calculates the truth level of classifying all instances within the branch into each class. The truth level is calculated using the fuzzy subsethood measure S(A,B). If the truth level of one of the classes is above a predefined threshold β, then no additional partitioning is needed and the node becomes a leaf in which all instances will be labeled with the class of the highest truth level. Otherwise the procedure continues in a recursive manner. Note that small values of β will lead to smaller trees, with the risk of underfitting. A higher β may lead to a larger tree with higher classification accuracy. However, at a certain point, higher values may lead to overfitting.

In a regular decision tree, only one path (rule) can be applied for every instance. In a fuzzy decision tree, several paths (rules) can be applied for one instance. In order to classify an unlabeled instance, the following steps should be performed:

Step 1: Calculate the membership of the instance for the condition part of each path (rule). This membership will be associated with the label (class) of the path.
Step 2: For each class, calculate the maximum membership obtained from all applied rules.
Step 3: An instance may be classified into several classes with different degrees, based on the memberships calculated in Step 2.

Fuzzy Clustering

The goal of clustering is descriptive, that of classification is predictive. Since the goal of clustering is to discover a new set of categories, the new groups are of interest in themselves, and their assessment is intrinsic. In classification tasks, however, an important part of the assessment is extrinsic, since the groups must reflect some reference set of classes.

Clustering groups data instances into subsets in such a manner that similar instances are grouped together, while different instances belong to different groups. The instances are thereby organized into an efficient representation that characterizes the population being
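Returning to the ambiguity measures: the nonspecificity g(p) of Eq. (6) and the weighted average of Eq. (3) can be sketched as follows (function names are ours):

```python
import math

def nonspecificity(p):
    """g(p) of Eq. (6): sort possibilities descending, pad with 0,
    then sum (p*_i - p*_{i+1}) * ln(i); the i = 1 term vanishes since ln(1) = 0."""
    ps = sorted(p, reverse=True) + [0.0]
    return sum((ps[i - 1] - ps[i]) * math.log(i) for i in range(1, len(p) + 1))

def weighted_ambiguity(weights, ambiguities):
    """Eq. (3): G(a_i | S) as the weighted average of per-branch ambiguities,
    with weights normalized as in Eq. (4)."""
    total = sum(weights)
    return sum(w / total * g for w, g in zip(weights, ambiguities))

# A fully ambiguous distribution has maximal nonspecificity ln(n);
# a crisp one (a single fully possible class) has nonspecificity 0.
print(nonspecificity([1.0, 1.0]))  # ln 2, about 0.693
print(nonspecificity([1.0, 0.0]))  # 0.0
```

The induction step of the fuzzy tree simply evaluates `weighted_ambiguity` for every candidate attribute and splits on the smallest value.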
1. Incorporation of domain knowledge for improving the fuzzy modeling.
2. Developing methods for presenting the fuzzy data model to the end-users.
3. Efficient integration of fuzzy logic in data mining tools.
4. A hybridization of fuzzy sets with data mining techniques.

CONCLUSIONS

This chapter discussed how fuzzy logic can be used to solve several different data mining tasks, namely classification, clustering, and discovery of association rules. The discussion focused mainly on one representative algorithm for each of these tasks.

There are at least two motivations for using fuzzy logic in data mining, broadly speaking. First, as mentioned earlier, fuzzy logic can produce more abstract and flexible patterns, since many quantitative features are involved in data mining tasks. Second, the crisp usage of metrics is better replaced by fuzzy sets that can reflect, in a more natural manner, the degree of belongingness/membership to a class or a cluster.

REFERENCES

Maher, P.E., & Clair, D.C. (1993). Uncertain Reasoning in an ID3 Machine Learning Framework. Proceedings of the 2nd IEEE International Conference on Fuzzy Systems. 7-12.

Mitra, S., & Hayashi, Y. (2000). Neuro-fuzzy Rule Generation: Survey in Soft Computing Framework. IEEE Transactions on Neural Networks, 11(3), 748-768.

Mitra, S., & Pal, S.K. (2005). Fuzzy Sets in Pattern Recognition and Machine Intelligence. Fuzzy Sets and Systems, 156(1), 381-386.

Nasraoui, O., & Krishnapuram, R. (1997). A Genetic Algorithm for Robust Clustering Based on a Fuzzy Least Median of Squares Criterion. Proceedings of NAFIPS, Syracuse NY, 217-221.

Nauck, D. (1997). Neuro-Fuzzy Systems: Review and Prospects. Proceedings of the Fifth European Congress on Intelligent Techniques and Soft Computing (EUFIT'97), Aachen, 1044-1053.

Olaru, C., & Wehenkel, L. (2003). A Complete Fuzzy Decision Tree Technique. Fuzzy Sets and Systems, 138(2), 221-254.

Peng, Y. (2004). Intelligent Condition Monitoring Using Fuzzy Inductive Learning. Journal of Intelligent Manufacturing, 15(3), 373-380.
KEY TERMS

Clustering: The process of grouping data instances into subsets in such a manner that similar instances are grouped together into the same cluster, while different instances belong to different clusters.

Data Mining: The core of the KDD process, involving the inferring of algorithms that explore the data, develop the model, and discover previously unknown patterns.

Fuzzy Logic: A type of logic that recognizes more than simple true and false values. With fuzzy logic, propositions can be represented with degrees of truthfulness and falsehood; thus it can deal with imprecise or ambiguous data. Boolean logic is considered to be a special case of fuzzy logic.

Instance: A single object of the world from which a model will be learned, or on which a model will be used.

Knowledge Discovery in Databases (KDD): A nontrivial exploratory process of identifying valid, novel, useful, and understandable patterns from large and complex data repositories.
Independent Subspaces
Lei Xu
Chinese University of Hong Kong, Hong Kong, & Peking University, Beijing, China
INTRODUCTION

Several unsupervised learning topics have been extensively studied, with wide applications, for decades in the literatures of statistics, signal processing, and machine learning. The topics are mutually related and certain connections have been discussed in part, but they are still in need of a systematic overview. This article provides a unified perspective via a general framework of independent subspaces, with different topics featured by differences in choosing and combining three ingredients. Moreover, an overview is made via three streams of studies. One consists of those on the widely studied principal component analysis (PCA) and factor analysis (FA), featured by second order independence. The second consists of studies on higher order independence, featuring independent component analysis (ICA), binary FA, and nonGaussian FA. The third is called mixture based learning, which combines individual jobs to fulfill a complicated task. The extensive literature makes it impossible to provide a complete review. Instead, we aim at sketching a roadmap for each stream, with attention to those topics missing in the existing surveys and textbooks, and limited to the author's knowledge.

A GENERAL FRAMEWORK OF INDEPENDENT SUBSPACES

A number of unsupervised learning topics are featured by their handling of a fundamental task. As shown in Fig.1(b), every sample $x$ is projected into $\hat{x}$ on a manifold, and the error $e = x - \hat{x}$ of using $\hat{x}$ to represent $x$ is minimized collectively on a set of samples. One widely studied situation is that the manifold is a subspace represented by linear coordinates, e.g., spanned by three linearly independent basis vectors $a_1, a_2, a_3$ as shown in Fig.1(a). So, $\hat{x}$ can be represented by its projection $y^{(j)}$ on each basis vector, i.e.,

$$\hat{x} = \sum_{j=1}^{3} y^{(j)} a_j, \quad \text{or} \quad x = \hat{x} + e = Ay + e, \qquad y = [y^{(1)}, y^{(2)}, y^{(3)}]^T. \qquad (1)$$

Typically, the error $e = x - \hat{x}$ is measured by the square norm, which is minimized when $e$ is orthogonal to $\hat{x}$. Collectively, the minimization of the average error $\|e\|^2$ on a set of samples, or of its expectation $E\|e\|^2$, is featured by those natures given at the bottom of Fig.1(a).

Generally, the task consists of three ingredients, as shown in Fig.2. First, how the error $e = x - \hat{x}$ is measured. Different measures define different projections. The square norm $d = \|e\|^2$ applies to a homogeneous medium between $x$ and $\hat{x}$. Other measures are needed for inhomogeneous mediums. In Fig.1(c), a non-orthogonal but still linear projection is considered via $d = \|e\|_B^2 = e^T \Sigma_e^{-1} e$ with $\Sigma_e^{-1} = B^T B$, as if $e$ is first mapped to a homogeneous medium by a linear mapping $Be$ and then measured by the square norm. Shown at the bottom of Fig.1(c) are the natures of this $\min \|e\|_B^2$. Being considerably different from those of $\min \|e\|^2$, more assumptions have to be imposed externally.

The second ingredient is a coordinate system, via either linear vectors in Fig.1(a)&(c) or a set of curves on a nonlinear manifold in Fig.1(b). Moreover, there is a third ingredient that imposes certain structure to further constrict how $y$ is distributed within the coordinates, e.g., by the nature d).

The differences in choosing and combining the three ingredients lead to different approaches. We use the name "independent subspaces" to denote those structures with the components of $y$ being mutually independent, and get a general framework for accommodating several unsupervised learning topics. Subsequently, we summarize them via three streams of studies by considering

- $d = \|e\|_B^2 = e^T \Sigma_e^{-1} e$ and two special cases,
- three types of independence structure, and whether there is temporal structure among samples,
- varying from one linear coordinate system to multiple linear coordinate systems at different locations, as shown in Fig.2.
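The orthogonal projection of Eq. (1) can be sketched in a few lines; the orthonormal toy basis below is our own choice for illustration:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(x, basis):
    """Project x onto span(basis); the basis vectors are assumed orthonormal.
    Returns (x_hat, y, e) with x = x_hat + e and e orthogonal to x_hat."""
    y = [dot(x, a) for a in basis]                      # coordinates y^(j)
    x_hat = [sum(yj * a[i] for yj, a in zip(y, basis))  # x_hat = sum_j y^(j) a_j
             for i in range(len(x))]
    e = [xi - xh for xi, xh in zip(x, x_hat)]
    return x_hat, y, e

# Two orthonormal basis vectors spanning a plane inside R^3 (toy example)
A = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
x_hat, y, e = project((3.0, 4.0, 5.0), A)
print(x_hat, y, e)    # [3.0, 4.0, 0.0] [3.0, 4.0] [0.0, 0.0, 5.0]
print(dot(e, x_hat))  # 0.0: the error is orthogonal to the projection
```

This is the homogeneous-medium case $d = \|e\|^2$; the measure $\|e\|_B^2$ would first transform $e$ by $B$ before taking the square norm.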
Figure 1.
Figure 2.
STUDIES FEATURED BY SECOND ORDER INDEPENDENCE

We start by considering samples that are independently and identically distributed (i.i.d.), with linear coordinates and an independent structure of a Gaussian $p(y_t^{(j)} \mid \theta^{(j)})$, with the projection measure varying as illustrated within the first column of the table in Fig.2. We encounter factor analysis (FA) in the general case $d = \|e\|_B^2 = e^T B^T B e$. At the special case $B = \sigma_e^{-1} I$, the linear coordinates span a principal subspace of the data. Further imposing $A^T A = I$ and requiring the columns of $A$ to be given by the first $m$ principal components (PCs), i.e., the eigenvectors that correspond to the largest eigenvalues of $\Sigma = (B^T B)^{-1}$, it becomes equivalent to PCA. Moreover, at the degenerated case $e = 0$, $y = xW$ de-correlates the components of $y$, e.g., performing a pre-whitening as encountered in signal processing.

We summarize studies on Roadmap A. The first stream originated 100 years ago. The first adaptive learning rule is the Oja rule, which finds the 1st-PC (i.e., the eigenvector that corresponds to the largest eigenvalue of $\Sigma$) without explicitly estimating $\Sigma$. Extended to find multi-PCs, one way is featured by either an asymmetrical or a sequential implementation of the 1st-PC rule, but suffers error accumulation. Details are referred to Refs.5,6,67,76,96 in (Xu, 2007a). The other way is finding multi-PCs symmetrically, e.g., the Oja subspace rule. Further studies are summarized into the following branches:

MCA, Dual Subspace, and TLS Fitting

In (Xu, Krzyzak & Oja, 1991), a dual pattern recognition is suggested by considering both the principal subspace and its complementary subspace, as well as both the multiple PCs and their complementary counterparts, the components that correspond to the smallest eigenvalues of $\Sigma$ (i.e., the row vectors of $U$ in Fig.2). Moreover, the first adaptive rule is proposed by eqn.(11a) in (Xu, Krzyzak & Oja, 1991) to get the component that corresponds to the smallest eigenvalue of $\Sigma$, under the name minor component analysis (MCA), firstly coined by Xu, Oja & Suen (1992); it is also used for implementing a total least square (TLS) curve fitting. Subsequently, this topic was brought to the signal processing literature by Gao, Ahmad & Swamy (1992), motivated by a visit of Gao to Xu's office where Xu introduced him to the result of Xu, Oja & Suen (1992). Thereafter, adaptive MCA learning for TLS filtering became a popular topic of signal processing, see (Feng, Bao & Jiao, 1998) and Refs.24,30,58,60 in (Xu, 2007a).

It was also suggested in (Xu, Krzyzak & Oja, 1992) that an implementation of PCA or MCA is made by switching the updating sign in the above eqn.(11a). Efforts were subsequently made to examine the existing PCA rules on whether they remain stable after such a sign switching. These jobs usually need tedious mathematical analyses of ODE stability, e.g., Chen & Amari (2001). An alternative way is turning an optimization of a PCA cost into a stable optimization of an induced cost for MCA; e.g., the LMSER cost is turned into one for the subspace spanned by multiple MCs (Xu, 1994, see Ref.111, Xu 2007a). A general method is further given by eqns.(24-26) in (Xu, 2003) and then discussed in (Xu, 2007a).

LMSER Learning and Subspace Tracking

A new adaptive PCA rule is derived from the gradient of $E_2(W)$ for least mean square error reconstruction (LMSER) (Xu, 1991), with the first proof proposed on the global convergence of the Oja subspace rule, a task that was previously regarded as difficult. It was shown mathematically and experimentally that LMSER improves the Oja rule, confirmed by further comparative studies, e.g., see (Karhunen, Pajunen & Oja, 1998) and (Refs.14,15,48,54,71,72, Xu 2007a). Two years after (Xu, 1991), this $E_2(W)$ was used for signal subspace tracking via a recursive least square technique (Yang, 1993), then followed by others in the signal processing literature (Refs.33&55, Xu 2007a). Also, PCA and subspace analysis can be performed by other theories or costs (Xu, 1994a&b). The algebraic and geometric properties were further analyzed for one of them, namely relative uncertainty theory (RUT), by Fiori (2000&04, see Refs.25,29, Xu 2007a). Moreover, the NIC criterion for subspace tracking is actually a special case of this RUT, which can be observed by comparing eqn.(20) in (Miao & Hua, 1998) with the equation of e at the end of Sec.III.B in (Xu, 1994a).

Principal Subspace vs. Multi-PCs

The Oja subspace rule does not truly find the multi-PCs due to a rotation indeterminacy. Interestingly, it is demonstrated experimentally that adding a sigmoid function makes LMSER approximate the multi-PCs
Figure 3.
Figure 4.
well (Xu, 1991). Working at Harvard in the late summer of 1991, Xu became aware of Brockett (1991) and thus extended the Brockett flow of $n \times n$ orthogonal matrices to that of $n \times n_1$ orthogonal matrices with $n > n_1$, from which two learning rules for truly the multi-PCs are obtained through modifying the LMSER rule and the Oja subspace rule. The two rules were included as eqns.(13)&(14) in Xu (1993), which was submitted in 1991; they are independent of and also different from Oja (1992). Recently, Tanaka (2005) unified these rules into one expression controlled by one parameter, and a comparative study was made to show that eqn.(14) in

ponent analysis (ICA), tackled in the following four branches:

- Seeking extremes of the higher order cumulants of $y$.
- Using nonlinear Hebbian learning for removing higher order dependences among the components of $y$, from which ICA studies actually originate.
- Optimizing a cost that is based on

$$p(y) = \prod_{j=1}^{m} p(y^{(j)})$$
comparative study was made to show that eqn(14) in j =1
897
Independent Subspaces
becomes nonGaussian ones, FA is extended to a binary FA (BFA) if $y$ is binary, and to a nonGaussian FA (NFA) if $y$ is real but nonGaussian. Similar to FA performing PCA at $\Sigma_e = \sigma_e^2 I$, both BFA and NFA come to perform a P-ICA at $\Sigma_e = \sigma_e^2 I$.

Observing the first box in this column, for $e_t = x_t - Ay_t \neq 0$ we need to seek an appropriate nonlinear map $y = f(x)$. It usually has no analytical solution, but needs expensive computation to approximate. As discussed in (Xu, 2003), nonlinear LMSER uses a sigmoid nonlinearity $y_t^{(j)} = s(z_t^{(j)})$, $z = xW$, to avoid computing costs, and approximately implements a BFA for a Bernoulli $p(y^{(j)})$ with a probability $p_j = \frac{1}{N}\sum_{t=1}^{N} s(z_t^{(j)})$, an NFA for $p(y^{(j)})$ with a pseudo uniform distribution on $(-\infty, +\infty)$, as well as a nonnegative ICA (Plumbley & Oja, 2004) when $p(y^{(j)})$ is on $[0, +\infty)$. However, further quantitative analysis is needed for this approximation.

Without approximation, the EM algorithm has been developed for maximum likelihood learning since 1997, still suffering expensive computing costs. Favorably, further improvements have also been achieved by the BYY harmony learning. Details are referred to the rightmost column of Roadmap B.

Next, we move to multiple subspaces at different locations, as shown in Fig.2. Studies are summarized on Roadmap C, categorized according to one key point, i.e., a scheme $p_{\ell,t}$ that allocates a sample $x_t$ to different subspaces. This $p_{\ell,t}$ is based on two issues. One is a local measure of how the $\ell$-th subspace is suitable for representing $x_t$. The other is a mechanism that summarizes the local measures of the subspaces to yield $p_{\ell,t}$. One typical mechanism is the one that emerges in the EM algorithm for maximum likelihood or Bayesian learning, where $x_t$ is fractionally allocated among subspaces proportionally to their local measures. Another typical mechanism is that $x_t$ is nonlinearly allocated to one or more winners via a competition based on the local measures, e.g., as in classic competitive learning and rival penalized competitive learning (RPCL).

Also, a scheme $p_{\ell,t}$ may come from blending both types of mechanisms, as in the one from the BYY harmony learning. Details are referred to (Xu, 2007c) and its two http-sites.
FUTURE TRENDS
TEmPORAL AND LOCALIZED Another important task is how to determine the number
EXTENSIONS k of subspaces and the dimension m of each subspace.
It is called model selection, usually implemented in
We further consider temporal samples shown at the two phases. First, a set of candidates are considered
bottom of the rightmost column on both Roadmap A by enumerating k and m , with unknown parameters
and Roadmap B, via embedding a temporal structure estimated by the maximum likelihood learning. Second,
in p(y (j) (j) the best among the candidates is selected by one of
t | t . A typical one is using
criteria, such as AIC, CAIC, SIC/BIC/MDL, Cross
Validation, etc. However, this two-phase implemen-
= { (j) }
(j) (j) (j) (j) q(j)
= j ,
t t t t = , tation is computationally very extensive. Moreover,
the performance will degenerate considerably when
e.g., a linear regression the sample size is finite while k and m are not too
small.
q( j )
(j)
t = =1 ( j ) (j)
t ,
One trend is letting model selection to be made
automatically during learning, i.e., on a candidate
with k and m initially being large enough, learn-
to turn a model (e.g., one in the table of Fig.2) into
ing not only determines unknown parameters but also
temporal extensions. Information is carried over time
(j) automatically shrinks k and m to appropriate ones.
in two ways. One is computing t by the regres-
(j) Two such efforts are RPCL and the BYY harmony
sion, with learning on t made through the gradient
learning. Details are referred to (Xu,2007c) and its
with respect to j j by a chain rule. The second is
two http-sites.
computing p(y t | t
(j) (j) (j) (j)
t t and getting the Also, there are open issues on x = Ay + e, e 0,
gradient with respect to j j . Details are referred to Xu with components of y mutually independent in higher
(2000&01a&03). order statistics. Some are listed below:
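As a side illustration, the two-phase model selection procedure described above (enumerate candidate complexities, fit each by maximum likelihood, then pick the best by a criterion such as BIC) can be sketched in a few lines. Polynomial regression stands in here for enumerating k and m_ℓ; the function name and scoring details are illustrative, not from the article.

```python
import numpy as np

def bic_select(x, y, max_degree=8):
    """Two-phase model selection, illustrated on polynomial regression.

    Phase 1: fit each candidate degree by least squares (the maximum
    likelihood fit under a Gaussian noise model).
    Phase 2: pick the candidate minimizing BIC = N*log(RSS/N) + d*log(N),
    where d counts the free parameters.
    """
    n = len(x)
    best = None
    for deg in range(1, max_degree + 1):
        coef = np.polyfit(x, y, deg)
        rss = float(np.sum((np.polyval(coef, x) - y) ** 2))
        bic = n * np.log(rss / n + 1e-12) + (deg + 1) * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, deg)
    return best[1]
```

The same pattern applies when the candidates are (k, m_ℓ) configurations and the fit is done by EM; only the likelihood and the parameter count change.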
Independent Subspaces

Figure 5.
and from a systematic perspective. A general framework of independent subspaces is presented, from which a number of learning topics are summarized via different features of choosing and combining the three basic ingredients.

ACKNOWLEDGMENT

The work is supported by the Chang Jiang Scholars Program of the Chinese Ministry of Education, for a Chang Jiang Chair Professorship at Peking University.

REFERENCES

Brockett, R.W. (1991). Dynamical systems that sort lists, diagonalize matrices, and solve linear programming problems. Linear Algebra and Its Applications, 146, 79-91.

Chen, T., & Amari, S. (2001). Unified stabilization approach to principal and minor components extraction algorithms. Neural Networks, 14(10), 1377-1387.

Feng, D.Z., Bao, Z., & Jiao, L.C. (1998). Total least mean squares algorithm. IEEE Transactions on Signal Processing, 46, 2122-2130.

Gao, K., Ahmad, M.O., & Swamy, M.N. (1992). Learning algorithm for total least-squares adaptive signal processing. Electronics Letters, 28(4), 430-432.

Hyvarinen, A., Karhunen, J., & Oja, E. (2001). Independent Component Analysis. John Wiley, NY.

Karhunen, J., Pajunen, P., & Oja, E. (1998). The nonlinear PCA criterion in blind source separation: relations with other approaches. Neurocomputing, 22, 5-20.

Miao, Y.F., & Hua, Y.B. (1998). Fast subspace tracking and neural network learning by a novel information criterion. IEEE Transactions on Signal Processing, 46, 1967-1979.

Oja, E. (1992). Principal components, minor components, and linear neural networks. Neural Networks, 5, 927-935.

Oja, E., Ogawa, H., & Wangviwattana, J. (1991). Learning in nonlinear constrained Hebbian networks. Proc. ICANN91, 385-390.

Plumbley, M.D., & Oja, E. (2004). A nonnegative PCA algorithm for independent component analysis. IEEE Transactions on Neural Networks, 15(1), 66-76.

Tanaka, T. (2005). Generalized weighted rules for principal components tracking. IEEE Transactions on Signal Processing, 53(4), 1243-1253.

Xu, L. (2007a). A unified perspective on advances of independent subspaces: basic, temporal, and local structures. Proc. 6th Intl. Conf. on Machine Learning and Cybernetics, Hong Kong, 19-22 Aug. 2007, 767-776.

Xu, L. (2007b). One-bit-matching ICA theorem, convex-concave programming, and distribution approximation for combinatorics. Neural Computation, 19, 546-569.

Xu, L. (2007c). A unified perspective and new results on RHT computing, mixture based learning, and multi-learner based problem solving. Pattern Recognition, 40, 2129-2153. Also see http://www.scholarpedia.org/article/Rival_Penalized_Competitive_Learning and http://www.scholarpedia.org/article/Bayesian_Ying_Yang_Learning.

Xu, L. (2003). Independent component analysis and extensions with noise and time: A Bayesian Ying-Yang learning perspective. Neural Information Processing Letters and Reviews, 1(1), 1-52.

Xu, L. (2001a). BYY harmony learning, independent state space and generalized APT financial analyses. IEEE Transactions on Neural Networks, 12, 822-849.

Xu, L. (2001b). An overview on unsupervised learning from data mining perspective. Advances in Self-Organizing Maps, Allison et al., Eds., Springer, 181-210.

Xu, L. (2000). Temporal BYY learning for state space approach, hidden Markov model and blind source separation. IEEE Transactions on Signal Processing, 48, 2132-2144.

Xu, L., Cheung, C.C., & Amari, S. (1998). Learned parametric mixture based ICA algorithm. Neurocomputing, 22, 69-80.

Xu, L. (1994a). Beyond PCA learning: from linear to nonlinear and from global representation to local representation. Proc. ICONIP94, Vol. 2, 943-949.
Xu, L. (1994b). Theories for unsupervised learning: PCA and its nonlinear extensions. Proc. IEEE ICNN94, Vol. II, 1252-1257.

Xu, L. (1993). Least mean square error reconstruction principle for self-organizing neural-nets. Neural Networks, 6, 627-648.

Xu, L., Oja, E., & Suen, C.Y. (1992). Modified Hebbian learning for curve and surface fitting. Neural Networks, 5, 393-407.

Xu, L., & Yuille, A.L. (1992 & 95). Robust PCA learning rules based on statistical physics approach. Proc. IJCNN92-Baltimore, Vol. I, 812-817. An extended version in IEEE Transactions on Neural Networks, 6, 131-143.

Xu, L., Krzyzak, A., & Oja, E. (1991). A neural net for dual subspace pattern recognition methods. International Journal of Neural Systems, 2(3), 169-184.

Yang, B. (1993). Subspace tracking based on the projection approach and the recursive least squares method. Proc. IEEE ICASSP93, Vol. IV, 145-148.

KEY TERMS

BYY Harmony Learning: A statistical learning theory for a two-pathway featured intelligent system via two complementary Bayesian representations of the joint distribution on the external observation and its inner representation, with both parameter learning and model selection determined by the principle that the two Bayesian representations become the best harmony. See http://www.scholarpedia.org/article/Bayesian_Ying_Yang_Learning.

Factor Analysis: A set of samples {x_t}_{t=1}^N is described by x = Ay + e, where the columns of A are basis vectors and the corresponding coordinates (the components of y) are mutually independent.

Least Mean Square Error Reconstruction (LMSER): For an orthogonal projection of x_t onto a subspace spanned by the column vectors of a matrix W, maximizing (1/N) Σ_{t=1}^N ||W^T x_t||² subject to W^T W = I is equivalent to using the projection x̂_t = W W^T x_t as the reconstruction of x_t, which is reached when W spans the same subspace spanned by the PCs.

Minor Component (MC): Being orthogonally complementary to the PC, the MC is the solution of min_{w: w^T w = 1} J(w) = (1/N) Σ_{t=1}^N (w^T x_t)² = w^T S w, while the m MCs are the columns of W that minimize J(W) = (1/N) Σ_{t=1}^N ||W^T x_t||² = Tr[W^T S W].

Principal Component (PC): For a set of samples {x_t}_{t=1}^N with a zero mean, the PC is a unit vector w originated at zero with a direction along which the average of the orthogonal projections of all samples is maximized, i.e., max_{w: w^T w = 1} J(w) = (1/N) Σ_{t=1}^N (w^T x_t)² = w^T S w; the solution is the eigenvector of the sample covariance matrix S = (1/N) Σ_{t=1}^N x_t x_t^T corresponding to the largest eigenvalue. Generally, the m PCs are the m orthonormal vectors, taken as the columns of W, that maximize J(W) = (1/N) Σ_{t=1}^N ||W^T x_t||² = Tr[W^T S W].

Rival Penalized Competitive Learning (RPCL): A development of competitive learning with the help of an appropriate balance between participating and leaving mechanisms, such that an appropriate number of agents or learners will be allocated to learn multiple structures underlying observations. See http://www.scholarpedia.org/article/Rival_Penalized_Competitive_Learning.

Total Least Square (TLS) Fitting: Given samples {z_t}_{t=1}^N, z_t = [y_t, x_t^T]^T, instead of finding a vector w to minimize the error (1/N) Σ_{t=1}^N (y_t − w^T x_t)², the TLS
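The PC and MC definitions above can be checked numerically in a few lines of NumPy (synthetic data; a sketch, not the article's code): both components are eigenvectors of the sample covariance matrix, at the largest and smallest eigenvalues respectively.

```python
import numpy as np

# The PC maximizes J(w) = w^T S w over unit vectors, the MC minimizes it;
# both are eigenvectors of the sample covariance matrix S.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 2)) * np.array([3.0, 0.5])  # anisotropic cloud
x -= x.mean(axis=0)                                       # zero-mean samples
S = (x.T @ x) / len(x)                                    # S = (1/N) sum x_t x_t^T
evals, evecs = np.linalg.eigh(S)                          # eigenvalues ascending
mc, pc = evecs[:, 0], evecs[:, -1]                        # minor / principal
```

Here the data is stretched along the first axis, so the PC aligns with that axis and the MC with the orthogonal one.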
Information Theoretic Learning

Jose C. Principe
University of Florida, USA

Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
for supervised and unsupervised, off-line and on-line learning. Renyi's definition of entropy for a random variable X is

H_α(X) = (1/(1−α)) log ∫ p^α(x) dx    (1)

This generalizes Shannon's linear additivity postulate to exponential additivity, resulting in a parametric family. Dropping the logarithm for optimization simplifies algorithms. Of specific interest is the quadratic entropy (α = 2), because its sample estimator requires only one approximation (the density estimator itself), and an analytical expression for the integral can be obtained for kernel density estimates. Consequently, a sample estimator for quadratic entropy can be derived for Gaussian kernels of standard deviation σ on an i.i.d. sample set {x_1, ..., x_N} as the sum of pairwise sample (particle) interactions (Principe et al., 2000):

H_2(X) = −log( (1/N²) Σ_{i=1}^N Σ_{j=1}^N G_{σ√2}(x_i − x_j) )    (2)

The pairwise interaction of samples through the kernel intriguingly provides a connection to the entropy of particles in physics. Particles interacting through information forces (as in the N-body problem in physics) can employ computational techniques developed for simulating such large-scale systems. The use of entropy in training multilayer structures can be studied in the backpropagation-of-information-forces framework (Erdogmus et al., 2002). The quadratic entropy estimator was employed in measuring divergences between probability densities and in blind source separation (Hild et al., 2006), blind deconvolution (Lazaro et al., 2005), and clustering (Jenssen et al., 2006). Quadratic expressions with mutual-information-like properties were introduced based on the Euclidean and Cauchy-Schwartz distances (ED/CSD). These are advantageous for their computational simplicity and statistical stability in optimization (Principe et al., 2000).

Following the conception of the information potential and force principles, the pairwise-interaction estimator is generalized to use arbitrary kernels and any order of entropy. The stochastic information gradient (SIG) is developed (Erdogmus et al., 2003) to train adaptive systems with a complexity comparable to the LMS (least-mean-square) algorithm - essential for training complex systems with large data sets. Supervised and unsupervised learning are unified under information-based criteria. Minimizing error entropy in supervised regression, maximizing output entropy for unsupervised learning (factor analysis), minimizing mutual information between the outputs of a system to achieve independent components, or maximizing mutual information between the outputs and the desired responses to achieve optimal subspace projections in classification are all possible. Systematic comparisons of ITL with conventional MSE in system identification verified the advantage of the technique for nonlinear system identification and blind equalization of communication channels. Relationships with instrumental-variables techniques were discovered and led to the error-whitening criterion for unbiased linear system identification in noisy input-output data conditions (Rao et al., 2005).

SOME IDEAS IN AND APPLICATIONS OF ITL

Kernel Machines and Spectral Clustering: KDE has been motivated by the smoothness properties inherent to reproducing kernel Hilbert spaces (RKHS). Therefore, a practical connection between KDE-based ITL, kernel machines, and spectral machine learning techniques was imminent. This connection was realized and exploited in recent work that demonstrates an information theoretic framework for pairwise-similarity (spectral) clustering, especially normalized cut techniques (Shi & Malik, 2000). Normalized cut clustering is shown to determine an optimal solution that maximizes the CSD between clusters (Jenssen, 2004). This connection immediately allows one to approach kernel machines from a density estimation perspective, thus providing a robust method to select the kernel size, a problem still investigated by some researchers in the kernel and spectral techniques literature. In our experience, kernel size selection based on suitable criteria aimed at obtaining the best fit to the training data - using Silverman's regularized squared error fit (Silverman, 1986) or leave-one-out cross-validation maximum likelihood (Duin, 1976), for instance - has proved to be a convenient, robust, and accurate technique that avoids many of the computational complexity and load issues. Local data-spread-based modifications resulting in variable-width KDE are also observed to be more robust to noise and outliers.

An illustration of ITL clustering by maximizing the CSD between the two estimated clusters is provided in Figure 1. The samples are labeled to maximize

D_CS(p, q) = −log( <p, q>_f / (||p||_f ||q||_f) )    (3)

where p and q are KDEs for two candidate clusters, f is the overall data KDE, and the weighted inner product to measure the angular distance between clusters is

<p, q>_f = ∫ p(x) q(x) f^{−1}(x) dx    (4)

When estimated using a weighted KDE variant, this criterion becomes equivalently

D_CS(p, q) ≈ −log( Σ_{x_i∈p, y_j∈q} K_{1/f}(x_i, y_j) / sqrt( Σ_{x_i∈p, x_j∈p} K_{1/f}(x_i, x_j) · Σ_{y_i∈q, y_j∈q} K_{1/f}(y_i, y_j) ) )    (5)

where K_{1/f} is an equivalent kernel generated from the original kernel K (Gaussian here).

Figure 1. Maximum CSD clustering of two synthetic benchmarks: training and novel test data (left), KDE using Gaussian kernels with Silverman kernel size (center), and spectral projections of the data on the two dominant eigenfunctions of the kernel (right). The eigenfunctions are approximated using the Nystrom formula.

One difficulty with kernel machines is their nonparametric nature: the requirement to solve for the eigendecomposition of a large positive-definite matrix of size N×N for N training samples. The solution is a weighted sum of kernels evaluated over each training sample, so the test procedure for each novel sample involves evaluating the sum of N kernels: y_test = Σ_{k=1}^N w_k K(x_test − x_k). The Fast Gauss Transform (FGT) (Greengard, 1991), which uses polynomial expansions for a Gaussian (or other) kernel, has been employed to overcome this difficulty. The FGT carefully selects a few center points around which truncated Hermite polynomial expansions approximate the kernel machine. The FGT still requires a heavy computational load in off-line training (minimum O(N²), typically O(N³)). The selection of expansion centers is typically done via clustering (e.g., Ozertem & Erdogmus, 2006).

Correntropy as a Generalized Similarity Metric: The main feature of ITL is that it preserves the universe of concepts we have in neural computing, but allows the adaptive system to extract more information from the data. For instance, the general Hebbian principle is
reduced to a second-order metric in the traditional artificial neural network literature (the input-output product), thus becoming a synonym for second-order statistics. The learning rule that maximizes output entropy (instead of output variance), using the SIG with Gaussian kernels, is Δw(n) = η (x(n) − x(n−1)) (y(n) − y(n−1)) (Erdogmus et al., 2002), which still obeys the Hebbian principle, yet extracts more information from the data (leading to the error-whitening criterion for input-noise robust learning).

ITL quantifies global properties of the data, but is it possible to apply it to functions, specifically those in an RKHS? A concrete example concerns similarity between random variables, which is typically expressed as second-order correlation. Correntropy generalizes similarity to include higher-order moment information. The name indicates the strong relation to correlation, but also stresses the difference: the average over the lags (for random processes) or over dimensions (for multidimensional random variables) is the information potential, i.e., the argument of second-order Renyi's entropy. For random variables X and Y with joint density p(x, y), correntropy is defined as

V(X, Y) = ∫∫ δ(x − y) p(x, y) dx dy    (6)

and measures how dense the two random variables are along the line x = y in the joint space. Notice that it is similar to correlation, which asks the same question in a second-moment framework. However, correntropy is local to the line x = y, while correlation is quadratically dependent upon the distances of samples in the joint space. Using a KDE with Gaussian kernels,

V(X, Y) = (1/N) Σ_{i=1}^N G_σ(x_i − y_i)    (7)

Correntropy is a positive-definite function, and thus defines an RKHS. Unlike for correlation, this RKHS is nonlinearly related to the input, because all moments of the random variable are included in the transformation. It is possible to analytically solve for least-squares regression and principal components in this space, yielding nonlinear fits in the input space. The correntropy-induced metric (CIM) behaves as the L2 norm for small distances, progressively approaches the L1 norm, and converges to L0 at infinity. Thus robustness to outliers is automatically achieved, and equivalence to Huber's robust estimation can be proven (Santamaria, 2006). Unlike conventional kernel methods, correntropy solutions remain in the same dimensionality as the input vector. This might indicate built-in regularization properties, yet to be explored.

Nonparametric Learning in the RKHS: It is possible to obtain robust solutions to a variety of problems in learning using the nonparametric and local nature of KDE and its relationship with RKHS theory. Recently, we explored the possibility of designing nonparametric solutions to the problem of identifying nonlinear dimensionality reduction schemes that maintain maximal discriminative information in a pattern recognition problem (quite appropriately measured by the mutual information between the data and the class labels, as agreed upon by many researchers). Using the RKHS formalism and based on the KDE, results were obtained that consistently outperformed the alternative, rather heuristic kernel approaches such as kernel PCA and kernel LDA (Scholkopf & Smola, 2001). The conceptual oversight in the latter two is that both the PCA and LDA procedures are most appropriate for Gaussian-distributed data (although acceptable for other symmetric unimodal distributions, they are commonly but possibly inappropriately used for arbitrary data distributions).

Clearly, the distribution of the data in the kernel-induced feature space could not be Gaussian for all typically exploited kernel selections (such as the Gaussian kernel): since these are usually translation invariant, the data is, in principle, mapped to an infinite-dimensional hypersphere on which the data could not have been Gaussian distributed (nor symmetrically distributed in general for the ideal kernel for a given problem, since these are positive-definite functions). Consequently, the hasty use of kernel extensions of second-order techniques is not necessarily optimal in a meaningful statistical sense. Nevertheless, these techniques have found successful applications in various problems; however, their suboptimality is clear from comparisons with more carefully designed solutions. To illustrate how drastic the performance difference can be, we present a comparison of a mutual-information-based nonlinear nonparametric projection approach (Ozertem et al., 2006) and kernel LDA in a simplified two-class handwritten digit classification case study and a sonar mine detection case study. The ROC curves of both algorithms on the test set after being trained with the same data are shown in Figure 2. The kernel is assumed to be a circular Gaussian with its size set by Silverman's rule of thumb. For the sonar data, we also include a KDE-based approximate Bayes classifier and linear LDA for reference. In this example, KLDA performs close to mutual information projections, as observed occasionally.

Figure 2. Maximum mutual information projection versus kernel LDA test ROC results on handwritten digit recognition, shown in terms of type-1 and type-2 errors (left); ROC results (Pdetect vs. Pfalse) compared for various techniques on sonar data (right). Both data sets are from the UCI Machine Learning Repository (2007).

FUTURE TRENDS

Nonparametric Snakes, Principal Curves and Surfaces: More recently, we have been investigating the application of KDE and RKHS to nonparametric clustering, principal curves, and surfaces. Interesting mean-shift-like fixed-point algorithms have been obtained; specifically interesting are the concepts of nonparametric snakes (Ozertem & Erdogmus, 2007) and local principal manifolds (Erdogmus & Ozertem, 2007) that we developed recently. The nonparametric snake approach overcomes the principal difficulties experienced by snakes (active contours) in image segmentation, such as low capture range, data curvature inhomogeneity, and noisy or missing edge information. Similarly, the local conditions for determining whether a point is on a principal manifold provide guidelines for designing fixed-point and other iterative learning algorithms for identifying such important structures.

Specifically, in nonparametric snakes we treat the edge map of an image as samples and the edginess values as weights to construct a weighted KDE, from which a fixed-point iterative algorithm can be devised to detect the boundaries of an object against the background. The designed algorithm can easily be made robust to outlier edges, converges very fast, and can penetrate into concavities while not being trapped inside the object at missing-edge localities. The guitar image in Figure 3 emphasizes these advantages, as the image exhibits both missing edges and concavities; background complexity is trivially low, as that was not the main concern in this experiment - the variable-width KDE easily avoids textured obstacles. The algorithm could be utilized to detect the ridge boundary of a structure in a data set of any dimension in other applications.

Figure 3. Nonparametric snake after convergence from an initial state located at the boundary of the guitar image rectangle (left). The global principal curve of a mixture of ten Gaussians obtained according to the local subspace maximum definition for principal manifolds (right).

In defining principal manifolds, we avoided the traditional least-squares-error reconstruction-type criteria, such as Hastie's self-consistent principal curves (Hastie & Stuetzle, 1989), and proposed a local subspace maximum definition for principal manifolds inspired by differential geometry. This definition lends itself to a uniquely defined principal manifold hierarchy, such that one can use inflation and deflation to obtain a d-dimensional principal manifold from a (d+1)-dimensional principal manifold. The rigorous and local definition lends itself to easy algorithm design and multiscale principal structure analysis for probability densities. We believe that in the near future the community will be able to prove maximal information-preserving properties of principal manifolds obtained using this definition, in a manner similar to mean-shift clustering solving for minimum-information-distortion clustering (Rao et al., 2006) and maximum likelihood modelling achieving minimum Kullback-Leibler divergence asymptotically (Carreira-Perpinan & Williams, 2003; Erdogmus & Principe, 2006).

CONCLUSION

The use of information theoretic learning criteria in neural networks and other adaptive system solutions has so far clearly demonstrated a number of advantages that arise from the increased information content of these measures relative to second-order statistics (Erdogmus & Principe, 2006). Furthermore, the use of kernel density estimation with smooth kernels allows one to obtain continuous and differentiable criteria suitable for iterative descent/ascent-based learning, and the nonparametric nature of KDE and its variants (such as variable-size kernels) allows one to achieve simultaneously robustness, global optimization through kernel annealing, and data-modeling flexibility in designing neural networks and learning algorithms for a variety of benchmark problems. Due to lack of space, detailed mathematical treatments cannot be provided in this article; the reader is referred to the literature for details.

REFERENCES

Bregman, L.M. (1967). The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming. USSR Computational Mathematics and Mathematical Physics, (7), 200-217.

Carreira-Perpinan, M.A., & Williams, C.K.I. (2003). On the Number of Modes of a Gaussian Mixture. Proceedings of Scale-Space Methods in Computer Vision, 625-640.

Csiszár, I., & Körner, J. (1981). Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press.

Duin, R.P.W. (1976). On the Choice of Smoothing Parameter for Parzen Estimators of Probability Density Functions. IEEE Transactions on Computers, (25), 1175-1179.
Erdogmus, D., & Ozertem, U. (2007). Self-Consistent Locally Defined Principal Surfaces. Proceedings of ICASSP 2007, to appear.

Erdogmus, D., & Principe, J.C. (2006). From Linear Adaptive Filtering to Nonlinear Information Processing. IEEE Signal Processing Magazine, (23) 6, 14-33.

Erdogmus, D., Principe, J.C., & Hild II, K.E. (2002). Do Hebbian Synapses Estimate Entropy? Proceedings of NNSP02, 199-208.

Erdogmus, D., Principe, J.C., & Hild II, K.E. (2003). On-Line Entropy Manipulation: Stochastic Information Gradient. IEEE Signal Processing Letters, (10) 8, 242-245.

Erdogmus, D., Principe, J.C., Vielva, L., & Luengo, D. (2002). Potential Energy and Particle Interaction Approach for Learning in Adaptive Systems. Proceedings of ICANN02, 456-461.

Fano, R.M. (1961). Transmission of Information: A Statistical Theory of Communications. MIT Press.

Greengard, L., & Strain, J. (1991). The Fast Gauss Transform. SIAM Journal of Scientific and Statistical Computation, (12) 1, 79-94.

Gyorfi, L., & van der Meulen, E.C. (1990). On Nonparametric Estimation of Entropy Functionals. Nonparametric Functional Estimation and Related Topics (G. Roussas, ed.), Kluwer Academic Publisher, 81-95.

Hastie, T., & Stuetzle, W. (1989). Principal Curves. Journal of the American Statistical Association, (84) 406, 502-516.

Hild II, K.E., Erdogmus, D., & Principe, J.C. (2006). An Analysis of Entropy Estimators for Blind Source Separation. Signal Processing, (86) 1, 182-194.

Huber, P.J. (1981). Robust Statistics. Wiley.

Jenssen, R., Erdogmus, D., Principe, J.C., & Eltoft, T. (2004). The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space. Advances in NIPS04, 625-632.

Jenssen, R., Erdogmus, D., Principe, J.C., & Eltoft, T. (2006). Some Equivalences Between Kernel Methods and Information Theoretic Methods. Journal of VLSI Signal Processing Systems, (45) 1-2, 49-65.

Lazaro, M., Santamaria, I., Erdogmus, D., Hild II, K.E., Pantaleon, C., & Principe, J.C. (2005). Stochastic Blind Equalization Based on PDF Fitting Using Parzen Estimator. IEEE Transactions on Signal Processing, (53) 2, 696-704.

Ozertem, U., & Erdogmus, D. (2006). Maximum Entropy Approximation for Kernel Machines. Proceedings of MLSP 2005.

Ozertem, U., Erdogmus, D., & Jenssen, R. (2006). Spectral Feature Projections that Maximize Shannon Mutual Information with Class Labels. Pattern Recognition, (39) 7, 1241-1252.

Ozertem, U., & Erdogmus, D. (2007). A Nonparametric Approach for Active Contours. Proceedings of IJCNN 2007, to appear.

Parzen, E. (1967). On Estimation of a Probability Density Function and Mode. Time Series Analysis Papers, Holden-Day, Inc.

Principe, J.C., Fisher, J.W., & Xu, D. (2000). Information Theoretic Learning. Unsupervised Adaptive Filtering (S. Haykin, ed.), Wiley, 265-319.

Rao, Y.N., Erdogmus, D., & Principe, J.C. (2005). Error Whitening Criterion for Adaptive Filtering: Theory and Algorithms. IEEE Transactions on Signal Processing, (53) 3, 1057-1069.

Rao, S., de Madeiros Martins, A., Liu, W., & Principe, J.C. (2006). Information Theoretic Mean Shift Algorithm. Proceedings of MLSP 2006.

Renyi, A. (1970). Probability Theory. North-Holland Publishing Company.

Sayed, A.H. (2005). Fundamentals of Adaptive Filtering. Wiley & IEEE Press.

Scholkopf, B., & Smola, A.J. (2001). Learning with Kernels. MIT Press.

Shannon, C.E., & Weaver, W. (1964). The Mathematical Theory of Communication. University of Illinois Press.

Shi, J., & Malik, J. (2000). Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, (22) 8, 888-905.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall.

Santamaria, I., Pokharel, P.P., & Principe, J.C. (2006). Generalized Correlation Function: Definition, Properties, and Application to Blind Equalization. IEEE Transactions on Signal Processing, (54) 6, 2187-2197.

UCI Machine Learning Repository (2007). http://mlearn.ics.uci.edu/MLRepository.html, last accessed June 2007.

KEY TERMS

Cauchy-Schwartz Distance: An angular density distance measure in the Euclidean space of probability density functions that approximates information theoretic divergences for nearby densities.

Correntropy: A statistical measure that estimates the similarity between two or more random variables by integrating the joint probability density function along the main diagonal of the vector space (the line along ones). It relates to Renyi's entropy when averaged over sample-index lags.

Information Theoretic Learning: A technique that employs information theoretic optimality criteria such as entropy, divergence, and mutual information for learning and adaptation.

Information Potentials and Forces: Physically intuitive pairwise particle interaction rules that emerge from information theoretic learning criteria and govern the learning process, including backpropagation in multilayer system adaptation.

Kernel Density Estimate: A nonparametric technique for probability density function estimation.

Mutual Information Projections: Maximally discriminative nonlinear nonparametric projections for feature dimensionality reduction based on reproducing kernel Hilbert space theory.

Renyi Entropy: A generalized definition of entropy that stems from modifying the additivity postulate and results in a class of information theoretic measures that contain Shannon's definitions as special cases.

Stochastic Information Gradient: Stochastic gradient of a nonparametric entropy estimate based on kernel density estimation.
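To make the key terms concrete, here is a minimal NumPy sketch of sample estimators for three of the quantities defined in this article. The kernel size sigma is a free smoothing choice, and the functions are illustrative, not the author's code; the Cauchy-Schwartz divergence is given in its unweighted form (the article's clustering criterion additionally weights by the overall data KDE).

```python
import numpy as np

def gauss(u, s):
    """1-D Gaussian kernel G_s(u)."""
    return np.exp(-u**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

def quadratic_entropy(x, sigma=1.0):
    """Renyi quadratic entropy estimate (eq. 2): minus the log of the mean
    pairwise kernel (the information potential). The width sigma*sqrt(2)
    arises from convolving two KDE kernels."""
    d = x[:, None] - x[None, :]
    return -np.log(gauss(d, sigma * np.sqrt(2)).mean())

def correntropy(x, y, sigma=1.0):
    """Correntropy estimate (eq. 7): the mean kernel along the line x = y."""
    return gauss(x - y, sigma).mean()

def cs_divergence(p, q, sigma=1.0):
    """Plug-in Cauchy-Schwartz divergence between the KDEs of two sample
    sets (unweighted variant); zero when the two KDEs coincide."""
    def pot(a, b):
        return gauss(a[:, None] - b[None, :], sigma * np.sqrt(2)).mean()
    return -np.log(pot(p, q) / np.sqrt(pot(p, p) * pot(q, q)))
```

Spread-out samples yield a higher quadratic entropy than tightly clustered ones, correntropy peaks when the paired samples coincide, and the divergence grows as two sample sets separate.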
I. Rojas
University of Granada, Spain

F. Rojas
University of Granada, Spain

A. Guillen
University of Granada, Spain

L. J. Herrera
University of Granada, Spain

F. J. Rojas
University of Granada, Spain

M. Cepero
University of Granada, Spain

INTRODUCTION

This chapter focuses on the analysis and classification of arrhythmias. An arrhythmia is any cardiac pace that is not the typical sinusoidal one, due to alterations in the formation and/or transportation of the impulses. In pathological conditions, the depolarization process can be initiated outside the sinoatrial (SA) node, and several kinds of extra-systolic or ectopic beats can appear.

Besides, electrical impulses can be blocked, accelerated, or deviated along alternative trajectories, and can change their origin from one heart beat to the next, thus originating several types of blockings and anomalous connections. In both situations, changes in the signal morphology or in the duration of its waves and intervals can be produced on the ECG, as well as the lack of one of the waves.

This work focuses on the development of intelligent classifiers in the area of biomedicine, addressing the problem of diagnosing cardiac diseases based on the electrocardiogram (ECG), or more precisely the differentiation of the types of atrial fibrillation. First of all we will study the ECG and how to treat it for work with this specific pathology. To achieve this, we will study different ways of eliminating, in the best possible way, any activity that is not generated by the atria. We will study and imitate the ECG treatment methodologies and the characteristics extracted from the electrocardiograms used by the researchers who obtained the best results in the PhysioNet Challenge, in which ECG recordings were classified according to the type of atrial fibrillation (AF) they showed. We will extract a large set of characteristics, partly those used by these researchers and partly additional characteristics that we consider important for the aforementioned distinction. A new method based on evolutionary algorithms will be used to select the most relevant characteristics and to obtain a classifier capable of distinguishing the different types of this pathology.

BACKGROUND

The electrocardiogram (ECG) is a diagnostic tool that measures and records the electrical activity of the heart
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Intelligent Classifier for Atrial Fibrillation (ECG)
in exquisite detail (Lanza 2007). Interpretation of these details allows diagnosis of a wide range of heart conditions. The QRS complex is the most striking waveform within the electrocardiogram (Figure 1). Since it reflects the electrical activity within the heart during the ventricular contraction, the time of its occurrence as well as its shape provide much information about the current state of the heart. Due to its characteristic shape, it serves as the basis for the automated determination of the heart rate, as an entry point for classification schemes of the cardiac cycle, and it is also often used in ECG data compression algorithms.

A normal QRS complex is 0.06 to 0.10 sec (60 to 100 ms) in duration. In order to have a signal clean of atrial activity in the ECG, we will analyse and compare the performance of two different approaches:

Figure 2. The segments detected by the algorithm on the two channels of a recording. The end of the T wave is shown in green, and the beginning of the Q wave in red. Each tract between the end of the T wave (green) and the beginning of the Q wave (red) is therefore a segment of atrial activity. The QRST complex is automatically detected with good precision.
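As an illustration of how the QRS complex can anchor the automatic analysis, a minimal R-peak detector can be sketched by thresholding the squared first difference of the signal, which emphasises the steep QRS slopes. This is a generic sketch, not the algorithm used by the authors: the threshold ratio, sampling rate, refractory period, and the synthetic signal are all arbitrary choices.

```python
def detect_r_peaks(ecg, fs, threshold_ratio=0.6, refractory_s=0.2):
    """Minimal R-peak detector: square the first difference to emphasise the
    steep QRS slopes, then keep local energy samples above a fixed fraction of
    the global maximum, enforcing a refractory period between beats."""
    energy = [(ecg[i + 1] - ecg[i]) ** 2 for i in range(len(ecg) - 1)]
    threshold = threshold_ratio * max(energy)
    refractory = int(refractory_s * fs)
    peaks, last = [], -refractory
    for i, e in enumerate(energy):
        if e >= threshold and i - last >= refractory:
            peaks.append(i)
            last = i
    return peaks

# Synthetic example: flat signal with two sharp "QRS-like" deflections.
fs = 250  # assumed sampling rate in Hz
ecg = [0.0] * 500
for beat in (100, 350):
    ecg[beat] = 1.0  # crude spike standing in for an R wave
peaks = detect_r_peaks(ecg, fs)
rr = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]  # RR distances in seconds
```

Once the R peaks are located, the RR distances and the T-to-Q segments between consecutive beats (the atrial-activity segments of Figure 2) follow directly.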
more than 30 years. Once the QRS complex is identified, we will have a starting point to implement different techniques for the QRST removal. Figure 2 shows how the QRST is automatically detected. This is the first step in the analysis of the ECG.

The study and analysis of feature extraction techniques from ECG signals is a very common task in any implementation of automatic classification systems from signals of any kind. During the execution of this sub-task, it is very important to analyse the different research results existing in the literature.

It is important to analyse the use of the frequency domain to obtain the Dominant Atrial Frequency (DAF), an index of the atrial activity which measures the dominant frequency in the frequency spectrum obtained from the atrial activity signal. In this spectrum, for each ECG record, the maximum energy peak is calculated, and this frequency will be the one that dominates the spectrum (Cantini et al. 2004).

It is also important to use the RR distance and different filters in the 4-10 Hz range, using a first-order Butterworth filter. It is important to note the MUSIC (Multiple Signal Classification) method of order 12 to calculate the pseudo-periodogram of the signal. In order to obtain more robust estimations, the signal can be filtered by variable-length, non-overlapping windows, and on each of them an analysis of the frequency spectrum can be performed. It is also important to note the Welch method, the Choi-Williams transform, and some heuristic methods used by cardiology experts (Atrial Fibrillation, 2007).

GENETIC PROGRAMMING

Genetic programming (GP) can be understood as an extension of the genetic algorithm (GA) (Zhao, 2007). GP began as an attempt to discover how computers could learn to solve problems in different fields, like automatic design, function approximation, classification, robotic control, and signal processing, without being explicitly programmed to do so (Koza, 2003). GP has also been used extensively and satisfactorily in biomedical applications (Lopes, 2007). The primary differences between GAs and GP can be summarised as: a) GP typically codes solutions as tree-structured, variable-length chromosomes, while GAs generally make use of chromosomes of fixed length and structure; b) GP typically incorporates a domain-specific syntax that governs acceptable (or meaningful) arrangements of information on the chromosome, whereas for GAs the chromosomes are typically syntax free.

The field of program induction, using a tree-structured approach, was first clearly defined by Koza (Koza, 2003). The following steps summarise the search procedure used with GP.

1. Create an initial population of programs, randomly generated as compositions of the function and terminal sets.
2. WHILE termination criterion not reached DO
   (a) Execute each program to obtain a performance (fitness) measure representing how well each program performs the specified task.
   (b) Use a fitness proportionate selection method to select programs for reproduction to the next generation.
   (c) Use probabilistic operators (crossover and mutation) to combine and modify components of the selected programs.
3. The fittest program represents a solution to the problem.

A NEW INTELLIGENT CLASSIFIER BASED ON GENETIC PROGRAMMING FOR ECG

In the different articles we have studied, the authors did not use any algorithmic method to try to classify the electrocardiograms (Cantini et al. 2004, Lemay et al. 2004). The authors applied simple methods to try to establish the possible classification based on the classification capacity of one single characteristic or of pairs of characteristics (through a graphic representation) (Hay et al. 2004). Nevertheless, the fact that one single characteristic might not be sufficient individually to classify a group of patterns into the different categories does not mean that, combined with another or others, it cannot reach high classification percentages. Due to the great quantity of characteristics obtained from the ECG, a method to classify the patterns was needed, alongside a way of selecting the optimal subgroup of characteristics for classifying, since the great quantity of existing characteristics would introduce noise as soon as the search for the optimal classifier of the patterns of characteristics begins. In total, 55 different
characteristics were used, from the papers (Cantini et al. 2004, Lemay et al. 2004, Hayn et al. 2004, Mora et al. 2004). There are other papers in the bibliography that used soft-computing methods to analyze the ECG (Wiggins et al. 2008, Lee et al. 2007, Yu et al. 2007).

In this paper, a new intelligent algorithm based on genetic programming (a paradigm of the soft-computing area), which simultaneously selects the best features, is proposed for the problem of classifying spontaneous termination of atrial fibrillation. In this algorithm, genetic programming is used to search for a good classifier at the same time as it searches for an optimal subgroup of characteristics. The algorithm consists of a population of classifiers, each of which is associated with a fitness value that indicates how well it classifies. Each classifier is made up of:

1. A binary vector of characteristics, which indicates with 1s the characteristics it uses.
2. A multitree with as many trees as the data collection of the problem has classes. Every tree i distinguishes between class i (giving a positive output) and the rest of the classes (negative output). Furthermore, it is connected to values pj (frequency of failures) and wj (frequency of successes). The trees are made up of function nodes [+, -, *, /, trigonometric functions (sine, cosine, etc.), statistic functions (minimum, maximum, average)] and terminal nodes {constant numbers and features}. Their translation to a mathematical formula is immediate.

The algorithm consists of a loop in which, in each repetition, a new population is formed from the previous one through the genetic operators. The classifiers that score the highest on fitness will have more possibilities to participate, so the population will tend to improve its quality over the successive generations. The proposed algorithm is composed of the following building blocks:

1. Fitness function. The fitness function combines the double objective of achieving a good classification and a small subgroup of characteristics:

   Fitness = f (1 + A e^(-B/n))     (1)

   In this equation, f is the sum of the cases of success in the classification of the trees, B is the cardinality of the feature subset used, n is the total number of features, and A is a parameter which determines the relative importance that we want to assign to correct classification and to the size of the feature subset, calculated as:

   A = C (1 - gen / TotalGen)     (2)

   where C is a constant, TotalGen is the number of generations for which the proposed genetic algorithm is evolved, and gen is the current generation number.

Figure 3. An example of a crossover operation in the proposed multitree classifier. (a) and (b) are the initial classifiers P1 and P2. Figures (c) and (d) present the results of the crossover operator.
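The generational loop with this two-objective fitness can be sketched as follows, with f the number of correct classifications, B the number of selected features (the 1s in the binary vector), n the total number of features, and A the importance parameter of Eq. (2). The exponential penalty form is an assumed reading of the partially garbled Eq. (1), and the per-classifier success counts are hypothetical.

```python
import math
import random

def importance(C, gen, total_gen):
    """Eq. (2): A = C * (1 - gen/TotalGen), decaying to 0 in the last generation."""
    return C * (1 - gen / total_gen)

def fitness(correct, mask, n, A):
    """Eq. (1)-style fitness: reward correct classifications (correct = f) and,
    early in the run, small feature subsets (B = number of 1s in the mask).
    The exact exponential form is an assumption."""
    B = sum(mask)
    return correct * (1 + A * math.exp(-B / n))

n = 55  # total number of ECG features used in the chapter
random.seed(0)
# Each classifier carries a binary feature mask (its multitree is omitted here).
population = [[random.randint(0, 1) for _ in range(n)] for _ in range(4)]
correct = [20, 25, 25, 18]  # hypothetical per-classifier success counts
A = importance(C=0.5, gen=10, total_gen=100)
scores = [fitness(c, m, n, A) for c, m in zip(correct, population)]
best = max(range(len(scores)), key=scores.__getitem__)
```

With this form, two classifiers with equal success counts are ranked by subset size early in the run, while the decaying A makes pure classification accuracy dominate in later generations.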
of the long-term recording, at least an hour following the segment) and registrations of type T (Group T: AF that terminates immediately, within one second, after the end of the record).

Event B: To differentiate between type S registrations (Group S: AF that terminates one minute after the end of the record) and those of type T.

Event C: To differentiate between type N registrations of AF and a second group in which type S and type T registrations are included.

Event D: Separation of the three types of registrations in a simultaneous way.

These groups N, T and S are distributed across a learning set (consisting of 10 labelled records from each group) and two test sets. Test set A contains 30 records, of which about one half are from group N and the remainder from group T. Test set B contains 20 records, 10 from each of groups S and T. Table 1 shows the simulation results (in % of classification) for different methods and for the evolutive algorithm proposed for ECG classification.

The proposed algorithm selects the required features while designing the multitree classifier. Different genetic operators have been designed for the multitree classifier and, for a better performance of the classifier, the initialization process generates solutions using smaller feature subsets which have been previously selected with a greedy search algorithm (G-Flips) that maximizes the evaluation function. The effectiveness of the proposed scheme is demonstrated in a real problem: the classification of spontaneous termination of atrial fibrillation. At this point, it is important to note that the use of different characteristics gives different classification results, as can be observed in the work of the authors participating in this challenge. The selection of the different features extracted from an electrocardiogram has a strong influence on the problem to be solved and on the behaviour of the classifier. Therefore, it is important to develop a general tool, able to face different cardiac illnesses, which can select the most appropriate features in order to obtain an automatic classifier. As can be observed, the proposed methodology obtains very good results compared with the winner of the challenge from PhysioNet and Computers in Cardiology 2004, even though this methodology has been developed in a general way to solve different classification problems.

FUTURE TRENDS
Evelio J. González
University of La Laguna, Spain
Intelligent MAS in System Engineering and Robotics
It is clear that the design of a MAS is more complex than that of a single agent. Apart from the code for the treatment of the task-problem, a developer needs to implement those aspects related to communication, negotiation among the agents and their organization in the system. Nevertheless, it has been shown that MAS offer more than they cost (Cockburn, 1996) (Gonzalez, 2006) (Gonzalez, 2006b) (Gyurjyan, 2003) (Seilonen, 2005).

MAS, AI AND SYSTEM ENGINEERING

An important topic in System Engineering is the process control problem. We can define it as that of manipulating the input variables of a dynamic system in an attempt to influence the output variables in a desired fashion, for example, to achieve certain values or certain rates (Jacquot, 1981). In this context, as in other Engineering disciplines, we can find many relevant formalisms and standards, whose descriptions are out of the scope of this chapter. An interested reader can find an introductory presentation of these aspects in (Jacquot, 1981).

Despite their advantages, there are few approaches to the application of MAS technology to process automation (many fewer than the applications to other fields such as the manufacturing industry). Some reasons for this lack of application can be found in (Seilonen, 2005):

- Process automation requires run-time specifications that are difficult to reach with current agent technology.
- The parameters in the automation process design are usually interconnected in a strict way, so it is highly difficult to decompose the task into agent behaviors.
- There is a lack of parallelism to be modeled through agents.

In spite of these difficulties, some significant approaches to the application of MAS to process control can be distinguished:

- An interesting approach is that in which communication techniques among agents are used as a mechanism of integration among independently designed systems. An example of this approach is the ARCHON (Architecture for Cooperative Heterogeneous ON-line systems) architecture (Cockburn, 1996), which has been used in at least three engineering domains: Electricity Transportation, Electricity Distribution and Particle Accelerator Control. In ARCHON, each application program (known as an Intelligent System) is provided with a layer (called the Archon Layer) that allows it to transfer data/messages to other Intelligent Systems.
- A second approach consists of those systems that implement a closed-loop-based control. In this sense, we can cite the work of (Velasco et al., 1996) for the control of a thermal power plant.
- A different proposal consists of complementing a pre-existing process automation system with agent technology. In other words, it is a complementation, not a replacement. The agent system is an additional layer that supervises the automation system and reconfigures it when necessary. Seilonen et al. also propose a specification of a BDI-model-based agent platform for process automation (Seilonen, 2005).
- V. Gyurjyan et al. (2003) propose a controller system architecture with the ability to combine heterogeneous processes and/or control systems in a homogeneous environment. This architecture (based on the FIPA standard) develops the agents as a level of abstraction and uses a description of the control system in a language called COOL (Control Oriented Ontology Language).
- Tetiker et al. (2006) propose a decentralized multi-layered agent structure for the control of distributed reactor networks, where local control agents individually decide on their own objectives, allowing the framework to achieve multiple local objectives concurrently at different parts of the network. On top of that layer, a global observer continuously monitors the system.
- Horling, Lesser et al. (2006) describe a soft real-time control architecture designed to address temporal and ordering constraints, shared resources and the lack of a complete and consistent world view. Based on challenges encountered in a real-time distributed sensor allocation environment, the system is able to generate schedules respecting temporal, structural and resource constraints, to merge new goals with existing ones, and to detect and handle unexpected results from activities.
- Another proposal of a real-time control architecture
is CIRCA (A Cooperative Intelligent Real-Time Control Architecture) by Musliner, Durfee and Shin (1993), which uses separate AI and real-time subsystems to address the problems for which each is designed.

In this context, we proposed a MAS (called MASCONTROL) for the identification and control of processes, whose design follows the FIPA specifications (FIPA, 2007) regarding architecture, communication and protocols. This MAS implements a self-tuning regulator (STR) scheme, so it is not a new general control algorithm but a new approach to its development. Its main contribution consists of showing the potential of a controller that, through the use of MAS and ontologies expressed in OWL (Ontology Web Language), can control systems in an autonomous way, using actions whose description, for example, is on the web, and can read there (without knowing it a priori) the logic of how to do the control. In this context, our experience is that agents do not offer any advantage if they are not intelligent, and ontologies represent an intelligent way to manage knowledge, since they provide the common format in which the agents can express that knowledge. Two important advantages of their use are extensibility and communication with other agents sharing the same language. These advantages are shown in the particular case of open systems, that is, when different MAS from different developers interact (Gonzalez, 2006).

As an STR, our MAS tries to carry out the processes of identification and control of a plant. We consider that this model can be properly managed by a MAS for two main reasons:

- An STR scheme contains modules that are conceptually different, such as the direct interaction with the plant to control, the identification of the system, and the determination of the best values for the controller parameters.
- It is possible to carry out the calculations in a parallel way. For instance, several transfer functions could be explored simultaneously. Thus, several agents can be launched on different computers, taking advantage of the possibility of parallelism provided by the MAS.

Another innovative aspect of this work is the use of artificial neural networks (ANN) for the identification and determination of the parameters. ANN and STR present clear analogies: the training of a neural network consists of finding the best values of the weights of the network, while in an STR it is necessary to optimize some parameters for a model (identification) or for a controller. Because of this similarity of methods, we have considered the application of ANN training methods to control problems. In this case, ANN are applied for two purposes: the parameter optimization of a model of the unknown system and the optimization of the parameters of a controller. This way, the resulting system can be seen as a hybrid intelligent system for a real-time application. An interested reader can find a deeper description of the system in (Gonzalez, 2006b).

It is important to remark that this framework can be used with any identification and control algorithm. In this context, we have checked the MAS controlling several different plants, obtaining a proper behavior. In contrast, due to the transmission rate and optimization time, the designed MAS should be used for the control of not excessively fast processes, in accordance with the first restriction stated above. However, we expect to have shown an example of how the other two restrictions (strong interdependency of the parameters and lack of parallelism) can be overcome.

As can be seen, the mentioned restrictions often become serious obstacles in the application of MAS to Engineering Systems. In this framework, the use of Fuzzy rules is a very usual solution to define single-agent behaviours (Hoffmann, 2003). Unfortunately, the definition of the rules is cumbersome in most cases. As a possible solution to the difficult task of generating the adequate rules, several automatic algorithms have been proposed, among them new rule extraction approaches based on connectionist models. Neuro-Fuzzy systems, in particular, have been proven to be a way to obtain the rules, taking advantage of the learning properties of Neural Networks and of the form in which Fuzzy rules express knowledge (Mitra and Hayashi, 2000).

In this context, several applications have been developed. In Robotics, we can cite the work of (Lee and Qian, 1998), who describe a two-component system for picking up moving objects from a vibratory feeder, or the work of (Kiguchi, 2004), proposing a hierarchical neuro-fuzzy controller for a robotic exoskeleton to assist the motion of physically weak persons such as elderly, disabled, and injured persons. As a particular case, a system for the detection and identification of road markings will be presented
in this chapter. This system has been incorporated into a vehicle, as can be seen in Figure 1.

This system is based on infrared technology and a classification tool based on a Neuro-Fuzzy system. A particular feature to take into account in this kind of task is that the detection and classification have to be done in real time. Hence, the time consumed by the hardware system and the processing algorithms is critical in order to take the right decision within the time frame of its relevance. Looking for an inexpensive and fast system, infrared technology is a good alternative in this kind of application. In this direction, and taking into account the time limitations, a combination of a device based on infrared technology and different techniques to extract convenient Fuzzy rules is used (Marichal, 2006). It is important to remark that the extraction and the interpretation of the rules have generated great interest in recent years (Guillaume, 2007).

The final purpose is to achieve a MAS where each agent does its work as fast as possible, overcoming the temporal limitations of MAS pointed out by (Seilonen, 2005). In this context, we would like to remark on some approaches of MAS applied to decision fusion for distributed sensor systems, in particular that by Yu and Sycara (2006). In order to achieve the mentioned MAS, it is necessary to obtain the rules for each agent. Furthermore, an in-depth analysis of the rules has to be done, minimizing their number and setting the mapping between these rules and the different scenarios.

The approach used in the case shown is based on designing rules for each situation found by the vehicle. In fact, each different scenario should be expressed
Table 1. Reference values (1, 3, 5, 7), output ranges, and rule numbers for each road marking.
by its own rules. This feature gives more flexibility to the process of designing the desired MAS. Because of that, the separation of rules according to the kind of road marking could help in this purpose. Table 1 shows the result of this process for the infrared system shown in Figure 1. Note that the reference values are the values associated with each road marking, the range refers to the interval in which the output values of the resulting Fuzzy system can lie for a particular sign and, finally, the rules are indicated by an order number.

It is important to remark that it is necessary to interpret the obtained rules. In this way, it is possible to associate these rules with different situations and to generate new rules more appropriate for a particular case under consideration. Hence, the agents related to the detection and classification of the signs could be expressed by this set of Fuzzy rules. Moreover, agents which are in charge of taking decisions based on the information provided by the detection and classification of a particular road marking could incorporate these rules as part of them. Problems in the task decomposition process, pointed out by (Seilonen, 2005), could be simplified in this way. On the other hand, although the design of behaviors is very important, it should be said that the issues related to co-operation among agents are also essential. In this context, the work of (Howard et al., 2007) could be cited.

FUTURE TRENDS

As technology provides faster and more efficient computers, the application of AI techniques to MAS is expected to become increasingly popular. That improvement in computer capacity, together with some emerging techniques (meta-level accounting, schedule caching, variable time granularities, etc.) (Horling, Lesser et al., 2006), will imply that other AI methods, currently impossible to apply in the field of System Engineering, will be introduced in an efficient way in the near future.

In our opinion, another important feature to be explored is the improvement of MAS communication. It is also convenient to look for more efficient MAS protocols and standards, in addition to those aspects related to new hardware features. These improvements would allow, for example, the development of operative real-time tele-operated applications.

CONCLUSION

The application of MAS to Engineering Systems and Robotics is an attractive platform for the convergence of various AI technologies. This chapter shows in a summarized manner how different AI techniques (ANN, Fuzzy rules, Neuro-Fuzzy systems) have been
successfully included into MAS technology in the field of System Engineering and Robotics. These techniques can also overcome some of the traditionally described drawbacks of MAS application, in particular the highly difficult decomposition of the task into agent behaviors and the lack of parallelism to be modeled through agents.

However, present-day MAS technology does not completely fulfill the severe real-time requirements that are implicit in automation processes. Thus, until the technology provides faster and more efficient computers, our opinion is that the application of AI techniques in MAS needs to be optimized for real-time systems, for example, by extracting convenient Fuzzy rules and minimizing their number.

REFERENCES

Brustoloni, J.C. (1991). Autonomous Agents: Characterization and Requirements. Carnegie Mellon Technical Report CMU-CS-91-204, Pittsburgh: Carnegie Mellon University.

Cockburn, D. & Jennings, N. R. (1996). ARCHON: A Distributed Artificial Intelligence System for Industrial Applications. In G.M.P. O'Hare and N.R. Jennings, editors, Foundations of Distributed Artificial Intelligence. John Wiley & Sons.

Franklin, S. & Graesser, A. (1996). Is it an Agent, or just a Program?: A Taxonomy for Autonomous Agents. Intelligent Agents III. Agent Theories, Architectures and Languages (ATAL'96).

FIPA web site. http://www.fipa.org. Last access: 15th August 2007.

González, E.J., Hamilton, A., Moreno, L., Marichal, R., & Muñoz, V. (2006). Software experience when using ontologies in a multi-agent system for automated planning and scheduling. Software - Practice and Experience, 36 (7), 667-688.

González, E.J., Hamilton, A., Moreno, L., Marichal, R., Marichal, G.N., & Toledo, J. (2006b). A MAS Implementation for System Identification and Process Control. Asian Journal of Control, 8 (4), 417-423.

Destercke, S., Guillaume, S. and Charnomordic, B. (2007). Building an interpretable fuzzy rule base from data using orthogonal least squares - application to a depollution problem. Fuzzy Sets and Systems, 158, 2078-2094.

Gyurjyan, V., Abbott, D., Heyes, G., Jastrzembski, E., Timmer, C. & Wolin, E. (2003). FIPA agent based network distributed control system. 2003 Computing in High Energy and Nuclear Physics (CHEP03).

Hewitt, C. (1977). Viewing Control Structures as Patterns of Passing Messages. Artificial Intelligence, (8) 3, 323-364.

Hoffmann, F. (2003). An Overview on Soft Computing in Behavior Based Robotics. Lecture Notes in Computer Science, 544-551.

Horling, B., Lesser, V., Vincent, R. & Wagner, T. (2006). The Soft Real-Time Agent Control Architecture. Autonomous Agents and Multi-Agent Systems, 12(1), 35-91.

Howard, A., Parker, L. E., and Sukhatme, G. (2006). Experiments with a Large Heterogeneous Mobile Robot Team: Exploration, Mapping, Deployment, and Detection. International Journal of Robotics Research, 25, 5-6, 431-447.

Jacquot, R.G. (1981). Modern Digital Control Systems. Marcel Dekker. Electrical engineering and electronics; 11.

Kiguchi, K., Tanaka, T. & Fukuda, T. (2004). Neuro-fuzzy control of a robotic exoskeleton with EMG signals. IEEE Transactions on Fuzzy Systems, 12, 4, 481-490.

Lee, K. M. & Qian, Y. F. (1998). Intelligent vision-based part-feeding on dynamic pursuit of moving objects. Journal of Manufacturing Science and Engineering - Transactions of the ASME, 120(3), 640-647.

Maes, P. (1995). Artificial Life Meets Entertainment: Lifelike Autonomous Agents. Communications of the ACM, 38 (11), 108-114.

Marichal, G.N., González, E.J., Acosta, L., Toledo, J., Sigut, M. & Felipe, J. (2006). An Infrared and Neuro-Fuzzy-Based Approach for Identification and Classification of Road Markings. Advances in Natural Computation. Lecture Notes in Computer Science, 4222, 918-927.

Mitra, S. & Hayashi, Y. (2000). Neuro-fuzzy rule generation: survey in soft computing framework. IEEE Transactions on Neural Networks, (11) 3, 748-768.
Velasco, J., González, J.C., Magdalena, L. & Iglesias, C. (1996). Multiagent-based control systems: a hybrid approach to distributed process control. Control Engineering Practice, 4, 839-846.

Yu, B. & Sycara, K. (2006). Learning the Quality of Sensor Data in Distributed Decision Fusion. International Conference on Information Fusion (Fusion 06), Florence, Italy.

KEY TERMS

Real-Time System: A system with operational deadlines from event to system response.

Self-Tuning Regulator: A type of adaptive control system composed of two loops: an inner loop (process and ordinary linear feedback regulator) and an outer loop (recursive parameter estimator and design calculation, which adjusts the regulator parameters).
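The two STR loops of this definition can be sketched for a simple first-order plant y_k = a·y_{k-1} + b·u_{k-1}: the outer loop estimates [a, b] by recursive least squares, and the inner loop applies a certainty-equivalence control law built from the current estimates. This is a generic illustration of the STR scheme, not the MASCONTROL system described in the chapter (which distributes these modules over agents and uses ANN-based optimization); all plant values and gains are arbitrary choices.

```python
def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least-squares step for y ~ theta . phi (outer-loop
    estimator), written out explicitly for the 2-parameter case."""
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    k = [Pphi[0] / denom, Pphi[1] / denom]          # estimator gain
    err = y - (theta[0] * phi[0] + theta[1] * phi[1])  # prediction error
    theta = [theta[0] + k[0] * err, theta[1] + k[1] * err]
    P = [[(P[0][0] - k[0] * Pphi[0]) / lam, (P[0][1] - k[0] * Pphi[1]) / lam],
         [(P[1][0] - k[1] * Pphi[0]) / lam, (P[1][1] - k[1] * Pphi[1]) / lam]]
    return theta, P

# True (unknown) first-order plant: y_k = a*y_{k-1} + b*u_{k-1}
a_true, b_true = 0.8, 0.5
theta = [0.0, 0.0]                  # estimates of [a, b]
P = [[100.0, 0.0], [0.0, 100.0]]    # large initial covariance
y_prev, setpoint = 0.0, 1.0
for k in range(200):
    # Inner loop: certainty-equivalence dead-beat law u = (r - a_hat*y) / b_hat
    b_hat = theta[1] if abs(theta[1]) > 1e-3 else 1.0
    u = (setpoint - theta[0] * y_prev) / b_hat
    u = max(-5.0, min(5.0, u))      # actuator saturation
    y = a_true * y_prev + b_true * u            # plant response
    theta, P = rls_update(theta, P, [y_prev, u], y)  # outer loop
    y_prev = y
```

After a few iterations the estimates approach the true plant parameters and the output settles at the setpoint; in the MAS view, the estimator, the design calculation and the plant interaction would each be carried out by separate agents, possibly in parallel.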
Fahrettin Yaman
Abant İzzet Baysal University, Turkey
INTRODUCTION

The query answering system realizes the selection of the data, preparation, pattern discovering, and pattern development processes in an agent-based structure within the multi agent system, and it is designed to ensure communication between agents and an effective operation of agents within the multi agent system. The system is suggested in a way to process and evaluate fuzzy incomplete information by the use of the fuzzy SQL query method. The modelled system gains the intelligent feature thanks to the fuzzy approach and makes predictions about the future with the learning processing approach.

The operation mechanism of the system is a process in which the agents within the multi agent system filter and evaluate both the knowledge in databases and the knowledge received externally by the agents, considering certain criteria. The system uses two types of knowledge. The first one is the data existing in agent databases within the system and the latter is the data agents received from the outer world and not included in the evaluation criteria. Upon receiving data from the outer world, the agent primarily evaluates it in the knowledgebase, then evaluates it to be used in the rule base and finally employs a certain evaluation process to rule bases in order to store the knowledge in the task base. Meanwhile, the agent also completes the learning process.

This paper presents an intelligent query answering mechanism, a process in which the agents within the multi-agent system filter and evaluate both the knowledge in databases and the knowledge received externally by the agents. The following sections include some necessary literature review and the query answering approach. Then follow the future trends and the conclusion.

BACKGROUND

The query answering system in agents utilizes fuzzy SQL queries from the agents, then creates and optimizes a query plan that involves the multiple data sources of the whole multi agent system. Accordingly, it controls the execution of the task to generate the data set. The query operation constitutes the basic function of query answering. By query operation, the most important function of the system is fulfilled. This study also discusses peer to peer network structure and SQL structure, as well as query operation.

Query operation was applied in various fields. For example, selecting the related knowledge in a web environment was evaluated in terms of the relational concept in databases. The relational database system particularly assists the system in making evaluations for making decisions about the future and in making the right decisions with the fuzzy logic approach (Raschia & Mouaddib, 2002; Tatarinov et al., 2003; Galindo et al., 2001; Bosc et al., 1995; Chaudhry et al., 1999; Saygın et al., 1999; Turgay et al., 2006).

Query operation was mostly used in choosing the related information in the web environment (Jim & Suciu, 2001; He et al., 2004). The data mining approach was used in the dynamic site discovery process by the data preparation and type recognition approaches in complex matching schema with correlation values in query interfaces and query schemas (Nambiar & Kambhampati, 2006; Necib & Freytag, 2005). Query processing within the peer to peer network structure with SQL structure was discussed generally (Cybenko et al., 2004; Bernstein et al., 1981). Query processing and databases were reviewed with the relational database (Genet & Hinze, 2004; Halashek-Wiener et al., 2006). The fuzzy set was proposed by Zadeh (1965) and the division of the features into various linguistic values was widely
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Intelligent Query Answering Mechanism in Multi Agent Systems
used in pattern recognition and in the fuzzy inference system. Kubat et al. (2004) reviewed the frequency of the fuzzy logic approach in operations research methods as well as artificial intelligence ones in discrete manufacturing. The data processing process within multi-agent systems can be grouped as static and dynamic. While the evaluation process of existing data by the system can be referred to as a static structure, the evaluation process of new data or possible data within the system can be referred to as a dynamic structure. The studies on the static structure can be expressed as database management's query process (McClean, Scotney, Rutjes & Hartkamp, 2003) and the studies on the dynamic structure can be expressed as the whole of the agent system (Purvia, Cranefield, Bush & Carter, 2000; Hoschek, 2002; Doherty, Lukaszewicz & Szalas, 2004; Turgay, 2006). In particular, a well-defined query answering process within multi agent systems provides communication among agents, the sharing of knowledge and the effective performance of the data processing process and learning activities. The system is able to process incomplete or fuzzy knowledge intelligently with the fuzzy SQL query approach.

The distributed query answering mechanism was proposed as a cooperative agent-based solution for information management with fuzzy SQL query. A multi-agent approach to information management includes some features such as:

Concurrency
Distributed computation
Modularity
Cooperation
Figure 1. Model driven framework for query answering mechanism in a multi-agent system. (The figure shows Agents 1 to n connected through an interface unit: data queries are evaluated with fuzzy SQL to yield the query variables, and results found from the query yield the obtained rules; the unit exchanges QUERY and ANSWER messages with the agents.)
query and evaluation. The task base structure of the system is updated by the mechanism in line with the obtained fuzzy rules, and then it is ensured that the system makes an appropriate and right decision and acts intelligently.

Operation Mechanism of Agent Based Fuzzy Query Answering System

The agent does the following:

Step 1: receives the task knowledge from the related agent
Step 2: does the fuzzification of knowledge
Step 3: determines fuzzy grade values according to knowledge features
Step 4: determines the knowledge in compliance with the criteria through fuzzy SQL commands
Step 5: sends the obtained task or rule to the related agent
Step 6: performs the answering operation

The agent based query answering system involves three main stages: knowledge processing, query processing and agent learning (see Figure 2). The operation types of these stages are given in detail below.

Knowledge Processing

This is the stage where the knowledge is received by the agent from the external environment and necessary preparations are made before query. The criteria and
Figure 2. Stages of the agent based query answering process. (The figure shows knowledge being received by the agent; knowledge processing, where the incoming knowledge is fuzzified into fuzzy data; query processing, comprising query parsing, fuzzy query decomposition and fuzzy query optimization; and finally agent learning.)
keywords to be used in evaluating the received data are defined in this stage. This stage can also be called pre-query. The keywords, concepts, attribute and relationship knowledge to be analysed by the agent are determined in this stage before query.

In this system, the behaviour structure of the intelligent query answering system is formed. During the system modelling, the perception model, considered to be the incoming signal, data and knowledge from the external environment for a more understandable structure in the learning module, plays an important role. Input coming from the external environment is called the input modelling; <Ai,x, Ai, > is defined as the perception set, where Ai,x refers to perception x coming to Agent i from the external environment. Table 1 includes the nomenclature of the agent based query answering system. The multi-agent system consists of more than one agent. The agent set is A={A1, A2, ..., Ai}. The knowledge set is K={K1, K2, ..., Ky}. The knowledgebase is <Definition of Knowledge, Attribute, Dependency Situation, Agent> (in Table 1 and Figure 3).

The rule set is R={R1, R2, ..., Rx}. The rule base is <Definition of Rule, Attribute, Dependency Situation, Agent>. The task set is T={T1, T2, ..., Tj}. The task base is <Definition of Task, Attribute, Dependency Situation, Agent>.

When data arrives from the external environment, it is perceived as input: <Ai,x, Ai, >. When x is perceived by Agent i, it is referred to as Ai,x. This input can also be used in the knowledgebase, rule base and task base. The following goals, determined as a result of the process and the evaluation of the information coming to the knowledge-base, should have been achieved in the mechanism of intelligent query answering:

Goal definition
Data selection
Data preparation

Query Processing

The agent performs two types of query in the process of defining keywords, concepts or attributes during knowledge processing. The first is the external query, which is realized among the agents, while the second is the internal query, where the agent scans the knowledge within itself. During these query processes, the fuzzy SQL approach is applied.

Feature-Attribute At and relation Re are elements formed among the components within the system. These elements are the databases of the knowledgebase, rule base and task base. While Attribute refers to agent specifications, Resource includes not only raw data externally received but also the knowledgebase, rule base and task base which each agent possesses.
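The sets and base tuples above can be written down directly as data structures. The sketch below is ours, not from the article: the class and field names simply follow the <Definition, Attribute, Dependency Situation, Agent> pattern of the text, and the sample entry is an invented placeholder.

```python
from dataclasses import dataclass

# One record shape shared by the knowledgebase, rule base and task base:
# <Definition, Attribute, Dependency Situation, Agent>.
@dataclass(frozen=True)
class BaseEntry:
    definition: str      # definition of the knowledge / rule / task
    attribute: str       # agent specification the entry refers to
    dependency: str      # dependency situation
    agent: str           # owning agent, e.g. "A1"

def percept(agent: str, x: str):
    """Perception of input x arriving at agent Ai, as the pair <Ai,x, Ai>."""
    return (f"{agent},{x}", agent)

knowledge_base = [
    BaseEntry("machine speed reading", "speed", "depends on sensor feed", "A1"),
]
p = percept("A1", "x1")
```

Because the knowledgebase, rule base and task base share the same tuple shape, one record type can serve all three, with the `definition` field carrying the knowledge-, rule- or task-specific content.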
A = {At, Re(Ki,y, Ri,x, Ti,t)}

Let P(At) denote the set of all possibility distributions that may be defined over the domain of an attribute At. A fuzzy relation R with schema A1, A2, ..., An, where Ai is an attribute, is defined as R = P(At1) × P(At2) × ... × P(Atn) × D, where D is a system-supplied attribute for the membership degree with a domain [0,1] and × denotes the cross product.

Each data value v of the attribute is associated with a possibility distribution defined over the domain of the attribute and has a membership function denoted by μv(x). If the data value is crisp, its possibility distribution is defined by

μv(x) = 1 if x = v, 0 otherwise    (1)

Like standard SQL, queries in fuzzy SQL are specified in a select statement of the following form:

SELECT Attributes
FROM Relations
WHERE Selection Conditions.

The semantics of a fuzzy SQL query is defined based on satisfaction degrees of query conditions. Consider a predicate X θ Y in a WHERE clause. The satisfaction degree, denoted by d(X θ Y), is evaluated for values of X and Y. Let the value of X be v1 and that of Y be v2. Then,

d(X θ Y) = max over X,Y of min(μv1(X), μv2(Y), μθ(X,Y))    (2)

where X and Y are crisp values in the common domain over which μv1 and μv2 are defined (Yang et al., 2001). Function μθ is a function that compares the degrees in terms of satisfaction among the variables. When the satisfaction degree is evaluated for X and Y, the former takes the value of v1, while the latter takes the value of v2.

As shown in Figure 2, bids are taken as a set, the frequencies of the received bids are fixed and then the bids are decomposed into groups. The decomposed bids are included into the databases of the multi-agent system. The information in the databases is fuzzified and the interrelation between them is determined in terms of weight and importance level.

Agent Learning Process

This is a process where the system learns the knowledge obtained as a result of a query as a rule or task. The system fulfils not only the task but also the learning process (in Figure 3). The learning process is acquired and the data from the external transition is processed by the agent system of the defined aim during the activities. The learning algorithm shows the variability of the system status (in Table 2).

In the learning process, with the help of the query processing, candidate rules are determined by taking the fuzzy dimension attributes and the attribute measures into consideration. Therefore, it would be true to say that a hierarchical order from knowledge-base to rule-base and from rule-base to task-base is available in the system.

Algorithm Learning Analysis
Input: A relational view that contains a set of records and the questions for influence analysis.
Output: An efficient association rule.
Step 1: Specifies the fuzzy dimension attribute and the measure attribute.
Step 2: Identifies the fuzzy dimension item sets and calculates the support coefficient.
Step 3: Identifies the measure item sets and calculates the support coefficient.
Step 4: Constructs sets of candidate rules, and computes the confidence and aggregate value.
Step 5: Obtains a rule at the granularity level with greatest confidence, and forms a rule at the aggregation level with the largest abstract value of the measure attribute.
Step 6: Computes the assertions at different levels; exits if comparable (i.e., there is no inconsistency found in semantics at different levels).
Step 7: Generates rules from the refined measure item sets and forms the framework of the rule.
Step 8: Constructs the final rule as a task for the related agent.
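Equations (1) and (2) can be exercised with a few lines of code. The "approximately equal" comparator and the small integer domain below are illustrative assumptions of ours, not part of the article; the two functions follow the equations directly.

```python
# Equation (1): possibility distribution of a crisp value v.
def crisp(v):
    return lambda x: 1.0 if x == v else 0.0

# Equation (2): d(X theta Y) = max over (x, y) in the common domain of
# min(mu_v1(x), mu_v2(y), mu_theta(x, y)).
def satisfaction_degree(mu_v1, mu_v2, mu_theta, domain):
    return max(min(mu_v1(x), mu_v2(y), mu_theta(x, y))
               for x in domain for y in domain)

# Illustrative fuzzy comparator: "approximately equal" with width 2.
approx_eq = lambda x, y: max(0.0, 1.0 - abs(x - y) / 2.0)

d = satisfaction_degree(crisp(4), crisp(5), approx_eq, range(10))
# only x=4, y=5 gives a nonzero min, so d = approx_eq(4, 5) = 0.5
```

For crisp operands the maximization collapses to a single point, so the satisfaction degree reduces to the fuzzy comparator evaluated at the two stored values; with genuinely fuzzy operands the max-min runs over the whole common domain.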
Figure 3. Agent learning process. (The figure shows a percept being evaluated through FSQL querying against one of the bases: if the knowledge base is used, the percept <Ai,x, Ki,y, Ai, > is queried as <Ki,y, Ai, >; if the rule base is used, <Ai,r, Ri,x, Ai, > is queried as <Ri,x, Ai, >; if the task base is used, <Ai,x, Ti,t, Ai, > is queried as <Ti,t, Ai, >; the task is then realized.)
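The support/confidence core of the Learning Analysis algorithm (Steps 2-5) can be sketched compactly. The record layout, item names and minimum-support threshold below are our illustrative assumptions; the article does not fix a data format.

```python
# Support of an item set: fraction of records that contain all its items.
def support(records, items):
    return sum(items <= r for r in records) / len(records)

def best_rule(records, dimension_items, measure_items, min_support=0.3):
    """Steps 2-5 of the Learning Analysis sketch: enumerate item sets,
    keep candidates above the support threshold, pick the most confident."""
    candidates = []
    for d in dimension_items:                    # Step 2: dimension item sets
        for m in measure_items:                  # Step 3: measure item sets
            s = support(records, {d, m})
            if s >= min_support:                 # Step 4: candidate rules
                confidence = s / support(records, {d})
                candidates.append((confidence, d, m))
    return max(candidates)                       # Step 5: greatest confidence

records = [{"high_speed", "late"}, {"high_speed", "late"},
           {"high_speed", "on_time"}, {"low_speed", "on_time"}]
rule = best_rule(records, ["high_speed", "low_speed"], ["late", "on_time"])
# -> the rule high_speed => late, with confidence 2/3
```

In the full algorithm the winning rule would then be checked for cross-level consistency (Step 6) and stored as a task for the related agent (Steps 7-8).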
FUTURE TRENDS

Future tasks of the system will be realized when the system performs query answering more quickly thanks to the distributed, autonomous, intelligent and communicative agent structure of the suggested agent based fuzzy query answering system. In the fuzzy approach, the system will primarily examine and group the relational database in the databases of the agents with fuzzy logic and then will shape the rule base of the system by applying the fuzzy logic method to these data. After the related rule is chosen, the rule base of the system will be designed and the decision mechanism of the system will operate. Therefore, relational database structure and system behaviour are important in determining the first peculiarity of the system and in terms of data clearing.

For future research, it is noted that the design of fuzzy databases involves not just modelling the data but also modelling operations on the data. Relational databases support only limited data types, while fuzzy and possibility databases allow a much larger number of comparatively complex data types (e.g., possibility distributions). This suggests that it might be fruitful to employ object-oriented database technology to allow explicit modelling of complex data types.

The incorporation of fuzziness into distributed events can be performed as a future study. Finally, due to frequent changes in the positions and status of objects in an active mobile database environment, the issue of temporality should be considered by adapting the research results of the temporal database systems area into active mobile databases. Similarities have been shown to be useful for condition and action status as well. The partitioning of the rule set into multi agent system events has also been discussed as an example of inter-rule fuzziness. Similarity based event detection has been introduced to active multi agent databases, which is an important contribution from the perspective of performance.

REFERENCES

Bernstein, P.A., Goodman, N., Wong, E., Reeve, C.L. & Rothnie, J.B. (1981). Query Processing in a System for Distributed Databases (SDD-1). ACM Transactions on Database Systems, 6(4), 602-625.

Bosc, P. & Pivert, O. (1995). SQLf: A relational database language for fuzzy querying. IEEE Transactions on Fuzzy Systems, 3, 11-17.

Chaudhry, N., Moyne, J. & Rundensteiner, E.A. (1999). An extended database design methodology for uncertain data management. Information Sciences, 121, 83-112.

Cybenko, G., Berk, V., Crespi, V., Gray, R. & Jiang, G. (2004). An overview of Process Query Systems. Proceedings of the SPIE Defense and Security Symposium, 12-16 April, Orlando, Florida, USA.

Doherty, P., Szalas, A. & Lukaszewicz, W. (2004). Approximate Databases and Query Techniques for Agents with Heterogeneous Ontologies and Perceptive Capabilities. Proceedings of the 9th International Conference on Principles of Knowledge Representation and Reasoning.
Halashek-Wiener, C., Parsia, B. & Sinn, E. (2006). Towards Continuous Query Answering on the Semantic Web. UMIACS Technical Report. http://www.mindswap.org/papers/2006/ContQueryTR2006.pdf

He, B., Chang, K.C. & Han, J. (2004). Discovering Complex Matching across Web Query Interfaces: A Correlation Mining Approach. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD04), August 22-25, Seattle, Washington, USA.

Hoschek, W. (2002). Query Processing in Containers Hosting Virtual Peer-to-Peer Nodes. Intl. Conf. on Information Systems and Databases (ISDB 2002), Tokyo, Japan, September.

Jim, T. & Suciu, D. (2001). Dynamically Distributed Query Evaluation. Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, Santa Barbara, California, United States, pp. 28-39. ISBN 1-58113-361-8.

Kubat, C., Taşkın, H., Topal, B. & Turgay, S. (2004). Comparison of OR and AI methods in discrete manufacturing using fuzzy logic. Journal of Intelligent Manufacturing, 15, 517-526.

McClean, S., Scotney, B., Rutjes, H. & Hartkamp, J. (2003). Metadata with a MISSION: Using Metadata to Query Distributed Statistical Meta-Information Systems. 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice - Metadata Research & Applications (DC-2003), 28 September-2 October, Seattle, Washington, USA.

Nambiar, U. & Kambhampati, S. (2006). Answering Imprecise Queries over Autonomous Web Databases. Proceedings of the 22nd International Conference on Data Engineering (ICDE 06), 03-07 April 2006.

Necib, C.B. & Freytag, J.C. (2005). Query Processing Using Ontologies. Proceedings of the 17th Conference on Advanced Information Systems Engineering (CAiSE05), Porto, Portugal, June.

Purvia, M., Cranefield, S., Bush, G. & Carter, D. (2000). The NZDIS Project: an Agent-Based Distributed Information Systems Architecture. Proceedings of the Hawaii International Conference on System Sciences, January 4-7, Maui, Hawaii.

Raschia, G. & Mouaddib, N. (2002). SAINTETIQ: a fuzzy set-based approach to database summarization. Fuzzy Sets and Systems, 129, 137-162.

Saygın, Y., Ulusoy, Ö. & Yazıcı, A. (1999). Dealing with fuzziness in active mobile database systems. Information Sciences, 120, 23-44.

Tatarinov, I., Ives, Z., Madhavan, J., Halevy, A., Suciu, D., Dalvi, N., Dong, X.L., Kadiyska, Y., Miklau, G. & Mork, P. (2003). The Piazza Peer Data Management Project. Special topic section on peer to peer data management, ACM SIGMOD Record, 32(3), 47-52. ISSN 0163-5808.

Turgay, S. (2006, May 29-31). Analytic Model of an Intelligent Agent Interface. Proceedings of the 5th International Symposium on Intelligent Manufacturing Systems, Turkey, pp. 1222-1229.

Turgay, S., Kubat, C. & Öztemel, E. (2006, May 29-31). Intelligent Fuzzy Decision Mechanism in Multi-Agent Systems for Flexible Manufacturing Environment. Proceedings of the 5th International Symposium on Intelligent Manufacturing Systems, Turkey, pp. 1159-1169.

Yang, Q., Zhang, W., Liu, C., Wu, J., Yu, C., Nakajima, H. & Rishe, N.D. (2001). Efficient Processing of Nested Fuzzy SQL Queries in a Fuzzy Database. IEEE Transactions on Knowledge and Data Engineering, 13(6).

Zadeh, L.A. (1965). Fuzzy sets. Information and Control, 8(3), 338-353.

KEY TERMS

Agent: A system that fulfils the independent functions, perceives the outer world and establishes the linking among the agents through its software.

Flexible Query: Incorporates some elements of the natural language so as to make possible a simple and powerful expression of subjective information needs.

Fuzzy SQL (Structured Query Language): An extension of the SQL language that allows us to write flexible conditions in our queries. FSQL allows us to use linguistic labels defined on any attribute.
Fuzzy SQL Query: Fuzzy SQL allows the system to make flexible queries about crisp or fuzzy attributes in fuzzy relational data or knowledge.

Intelligent Agent: A sophisticated intelligent computer program which is situated, independent, reactive, proactive and flexible, recovers from failure and interacts with other agents.

Multi-Agent System: A flexible incorporated network of software agents that interact to solve problems that are beyond the individual capacities or knowledge of each problem solver.

Query: Carries out the scanning of the data with required specifications.

Query Answering: Answers a user query with the help of a single or multi-database in the multi agent system.

System: A set of components considered to act as a single goal-oriented entity.
Intelligent Radar Detectors
INTELLIGENT RADAR DETECTORS BASED ON ARTIFICIAL NEURAL NETWORKS

This section starts with a discussion of the models selected for the target, clutter and noise signals. For these models, the optimum and suboptimum detectors are presented. These detectors will be taken as a reference for the experiments. After that, the intelligent detector proposed in this work is presented. This detector is based on intelligent systems like ANNs, and a further analysis of its structure and parameters is made. Finally, several results are obtained for the detectors under study in order to analyze their performances.

Signal Models: Target, Clutter and Noise

The Radar is assumed to collect N pulses in a scan, so input vectors (z) are composed of N complex samples, which are presented to the detector. Under hypothesis H0 (target absent), z is composed of N samples of clutter and noise. Under hypothesis H1 (target present), a known target characterized by a fixed amplitude (A) and phase (θ) for each of the N pulses is summed up to the clutter and noise samples. Also, a Doppler frequency in the target model of 0.5 PRF is assumed, where PRF is the Pulse Repetition Frequency of the Radar system.

The noise is modelled as a coherent white Gaussian complex process of unity power, i.e., a power of 1/2 for the quadrature and phase components, respectively. The clutter is modelled as a coherent correlated sequence with Gaussian AutoCorrelation Function (ACF), whose complex samples have a modulus with a Weibull PDF:

p(|w|) = a b^(-a) |w|^(a-1) e^(-(|w|/b)^a)    (1)

where |w| is the modulus of the coherent Weibull sequence and a and b are the skewness (shape) and scale parameters of a Weibull distribution, respectively.

The N×N autocorrelation matrix of the clutter is given by

(Mc)h,k = Pc ρc^((h-k)^2) e^(j 2π (fc/PRF) (h-k))    (2)

where the indexes h and k vary from 1 to N, Pc is the clutter power, ρc is the one-lag correlation coefficient and fc is the Doppler frequency of the clutter.

The relationship between the Weibull distribution parameters and Pc is

Pc = (2 b^2 / a) Γ(2/a)    (3)

where Γ(·) is the Gamma function.

The model used to generate coherent correlated Weibull sequences consists of two blocks in cascade: a correlator filter and a NonLinear MemoryLess Transformation (NLMLT) (Farina, 1987a). To obtain the desired sequence, a coherent white Gaussian sequence is correlated with the filter designed according to (2) and (3). The NLMLT block, according to (1), gives the desired Weibull distribution to the sequence. So, in that way, it is possible to obtain a coherent sequence with the desired correlation and PDF.

Taking into consideration that the complex noise samples are of unity variance (power), the following power relationships are considered for the study:

Signal to Noise Ratio: SNR = 10 log10(A^2)
Clutter to Noise Ratio: CNR = 10 log10(Pc)

Neyman-Pearson Detectors: Optimum and Suboptimum Detectors

The problem of optimum Radar detection of targets in clutter is explored in (Farina, 1987a) when both are time correlated and have arbitrary PDFs. The optimum detector scheme is built around two non-linear estimators of the disturbances in both hypotheses, which minimize the MSE. The study of Gaussian correlated targets detection in Gaussian correlated clutter plus noise is carried out, but for the cases where the hypotheses are non-Gaussian distributed, only suboptimum solutions are studied.

The proposed detectors basically consist of two channels. The upper channel is matched to the condition that the sequence to be detected is the sum of the target plus clutter in presence of noise (hypothesis H1), while the lower one is matched to the detection of clutter in presence of noise (hypothesis H0).

For the detection problem considered in this paper, the suboptimum detection scheme (TSKAP) shown in figure 1 is taken. Considering that the CNR is very
high (CNR>>1), the inverse of the NLMLT is assumed to transform the Weibull clutter into Gaussian, so the Linear Prediction Filter (LPF) is an N-1 order linear one. Then, the NLMLT transforms the filter output into a Weibull sequence. Besides being suboptimum, this scheme presents two important drawbacks:

1. The prediction filters have N-1 memory cells that must contain the suitable information to predict correct values for the N samples of each input pattern. So N+(N-1) pulses are necessary to decide if the target is present or not.
2. The target sequence must be subtracted from the input of the H1 channel.

There is no sense in subtracting the target component before deciding if this component is present or not. So, in practical cases, it makes this scheme non-realizable.

Intelligent Radar Detectors

In order to overcome the drawbacks of the scheme proposed in the previous section, a detector based on an MLP with log-sigmoid activation function in its hidden and output neurons, with a hard limit threshold after its output, is proposed. Also, as MLPs have been proved to approximate the NP detector when minimizing the MSE (Jarabo, 2005), it can be expected that the MLP-based detector outperforms the suboptimum scheme proposed in (Farina, 1987a).

MLPs have been trained to minimize the MSE using two algorithms: the back-propagation (BP) with varying learning rate and momentum (Haykin, 1999) and the Levenberg-Marquardt (LM) with varying adaptive parameter (Bishop, 1995). While BP is based on the steepest descent method, the LM is based on the Newton method, which is designed specifically for minimizing the MSE. For MLPs which have up to a few hundred weights (W), the LM algorithm is more efficient than the BP one with variable learning rate or the conjugate gradient algorithms, being able to converge in many cases when the other two algorithms fail (Hagan, 1994). The LM algorithm uses the information (an estimation of the W×W Hessian matrix) of the error surface in each iteration to find the minimum. This makes the algorithm faster than the previous ones.

Cross-validation is used with both training algorithms, where training and validation sets are synthetically generated. Moreover, a new set (test set) of patterns is generated to test the trained MLP for estimating the Pfa and Pd using Montecarlo simulation. All the patterns of the three sets are generated under the same conditions (SNR, CNR and a parameters of the Radar problem) in order to study the capabilities of the MLP plus hard limit thresholding working as a detector.

MLPs are initialized using the Nguyen-Widrow method (Nguyen, 1999) and, in all cases, the training process is repeated ten times to guarantee that the performance of all the MLPs is similar on average. Once all the MLPs are trained, the best MLP in terms of the estimated MSE with the validation set is selected, in order to avoid the problem of getting stuck in local minima at the end of the training.

The architecture of the MLP considered for the experiments is I/H/O, where I is the number of MLP
inputs, H is the number of hidden neurons in its hidden layer and O is the number of MLP outputs. As the MLPs work with real arithmetic, if the input vector (z) is composed of N complex samples, the MLP will have 2N inputs (N in-phase and N in-quadrature components). The number of MLP independent elements (weights) to solve the problem is W=(I+1)H+(H+1)O, including the bias of each neuron.

Results

The performance of the detectors exposed in the previous sections is shown in terms of the Receiver Operating Characteristics (ROC) curves. They give the estimated Pd for a desired Pfa, whose values are obtained by varying the output threshold of the detector. The experiments presented are made for an integration of two pulses (N=2). So, in order to test correctly the TSKAP detector, observation vectors (also called patterns in the text) of length 3 (N+(N-1)) complex samples are generated, due to the memory requirements of the TSKAP detector (N-1 pulses).

The a priori probabilities of the H0 and H1 hypotheses are supposed to be the same. Three sets of patterns are generated for each experiment: train, validation and test sets. The first and the second ones have 5·10^3 patterns each. The third one has 2.5·10^6 patterns, so the error in the estimation of the Pfa and the Pd is lower than 10% of the estimated values in the worst case (Pfa=10^-4). The patterns of all the sets are synthetically generated under the same conditions. These conditions involve typical values (Farina, 1987a; DiFranco, 1980; Farina, 1987b) for the SNR (20 dB), the CNR (30 dB) and the a (a=1.2) parameter of the Weibull-distributed clutter.

The MLP architecture used to generate the MLP-based detector is 6/H/1. The number of MLP outputs (O=1) is established by the problem (binary detection). The number of hidden neurons (H) is studied in this work. And the number of MLP inputs (I=6) is established according to the following criterion: a total of 6 inputs (2(N+(N-1))) are selected when the MLP-based detector is to be compared with the TSKAP detector in the same conditions, i.e., when both detectors have the same available information (3 pulses for an integration of N=2 pulses). Because of the TSKAP detector memory requirements, this case is considered.

Figure 2 shows the results of a study when 3 pulses are used to take the final decision by the MLP-based detector according to the criterion exposed above. The study shows the influence of the training algorithm and of the MLP size, i.e., the number of independent elements (W weights) that the ANN has to solve the problem. For the case of study, two important aspects have to be noted. The first one is related to the training algorithm. As can be observed, the performance achieved with a low size MLP (6/05/1) is very similar for both training algorithms (LM and BP). But when the MLP size is greater, for instance 6/10/1, the performance achieved with the LM algorithm is better than the performance achieved with the BP one. This is because the LM algorithm is more efficient than the BP one at finding the minimum of the error surface. Moreover, the MLP training with LM is faster than the training with BP, because the number of training epochs can be reduced by an order of magnitude. The second aspect is related to the MLP size. As can be observed, no performance improvement is achieved when 20 or more hidden neurons are used, comparing both algorithms, as occurred with 10 hidden neurons. Moreover, from 20 (W=121 weights) to 30 (W=181 weights) hidden neurons, the performance tends to a maximum value (independently of the training algorithm used), i.e., almost no performance improvement is achieved with more weights. So, an MLP-based detector with 20 hidden neurons achieves an appropriate performance with low complexity.

A comparison between the performances achieved with the TSKAP detector and the MLP-based detector of size 6/20/1 trained with BP and LM algorithms is shown in figure 3. Two differences can be observed. The first one is that the MLP-based detector performance is practically independent of the training algorithm, comparing their results with the ones obtained for the TSKAP detector. And the second one is that the 6/20/1 MLP-based detector is always better than the TSKAP detector when they are compared in the same conditions of availability of information, i.e., with the availability of 3 (N+(N-1)) pulses to decide. Under these conditions and comparing figures 2 and 3, it can be observed that a 6/05/1 MLP-based detector is enough to overcome the TSKAP one.

The appreciated differences between the TSKAP and MLP-based detectors appear because the first one is a suboptimum detector and the second one approximates the optimum one, but it will always be worse than the optimum detector. This cannot be demonstrated because an analytical expression for the optimum detector is not available.
Intelligent Radar Detectors
Figure 2. MLP-based detector performances for different structure sizes (6/H/1) and different training algorithms: (a) BP and (b) LM
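The test-set size quoted above can be sanity-checked with a short calculation. A minimal sketch, assuming independent Monte Carlo trials and the usual binomial error formula (the function name is ours; the authors do not state their exact sizing criterion):

```python
import math

def relative_error(p, n_patterns):
    """Relative standard error of a Monte Carlo estimate of a probability p
    from n_patterns independent trials: sqrt((1 - p) / (n_patterns * p))."""
    return math.sqrt((1.0 - p) / (n_patterns * p))

# Worst case quoted in the text: Pfa = 1e-4 estimated over 2.5e6 test patterns
print(f"{relative_error(1e-4, 2.5e6):.1%}")  # about 6.3%, below the 10% bound
```

With 2.5·10⁶ patterns the relative error at Pfa = 10⁻⁴ comes out near 6.3%, consistent with the "lower than 10%" claim above.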
cannot be obtained when detecting targets in the presence of Weibull-distributed clutter.

Figure 3. TSKAP and MLP-based detector performances for MLP size 6/20/1 trained with BP and LM algorithms

FUTURE TRENDS

Two different future trends can be mentioned. The first is related to ANNs and the second to research in Radar detectors. In the first trend, it is possible to emphasize research in areas like ensembles of ANNs, committee machines based on ANNs, and other ways to combine the intelligence of different ANNs such as MLPs, Radial Basis Function networks and others. Moreover, new trends try to find different ways to train ANNs. In the second trend, several researchers are trying to find different ways to create radar detectors in order to improve their performances. Several solutions have been proposed, but they depend on the Radar environment considered. Detectors based on signal processing tools seem to be the most appropriate, but the intelligent detector presented here is a new way of working that can bring good solutions to these problems. This is possible because of the ability of ANNs to adapt to almost any kind of Radar conditions and problems.

CONCLUSION

After the developed study, several conclusions can be drawn. The LM training algorithm achieves better MLP-based detectors than the BP one. No performance improvement is obtained when training MLPs with the LM or BP algorithms with sizes greater than 6/20/1. But the great advantage of the LM algorithm over the BP one is its faster training for low-size MLPs (a
few hundred weights), i.e., the MLPs considered in this study. Finally, the MLP-based detector works better than the TSKAP one when working with the same available information (N+(N-1)=3), because of the memory requirements of the TSKAP one. In those cases, low-complexity MLP-based detectors can be obtained, because a 6/05/1 MLP has enough intelligence to obtain better performance than the TSKAP one.

REFERENCES

Andina, D., & Sanz-Gonzalez, J.L. (1996). Comparison of a Neural Network Detector Vs Neyman-Pearson Optimal Detector. Proc. of ICASSP-96. 3573-3576.

Bishop, C.M. (1995). Neural Networks for Pattern Recognition. Oxford University Press Inc.

De la Mata-Moya, D., Jarabo-Amores, P., Rosa-Zurera, M., López-Ferreras, F., & Vicen-Bueno, R. (2005). Approximating the Neyman-Pearson Detector for Swerling I Targets with Low Complexity Neural Networks. Lecture Notes in Computer Science. (3697), 917-922.

Cheikh, K., & Faozi, S. (2004). Application of Neural Networks to Radar Signal Detection in K-distributed Clutter. First Int. Symp. on Control, Communications and Signal Processing Workshop Proc. 633-637.

DiFranco, J.V., & Rubin, W.L. (1980). Radar Detection. Artech House.

Farina, A., Russo, A., Scannapieco, F., & Barbarossa, S. (1987a). Theory of Radar Detection in Coherent Weibull Clutter. In: Farina, A. (ed.): Optimised Radar Processors. IEE Radar, Sonar, Navigation and Avionics, Series 1. Peter Peregrinus Ltd. 100-116.

Farina, A., Russo, A., & Scannapieco, F. (1987b). Radar Detection in Coherent Weibull Clutter. IEEE Trans. on Acoustics, Speech and Signal Processing. ASSP-35 (6), 893-895.

Gandhi, P.P., & Ramamurti, V. (1997). Neural Networks for Signal Detection in Non-Gaussian Noise. IEEE Trans. on Signal Processing. (45) 11, 2846-2851.

Hagan, M.T., & Menhaj, M.B. (1994). Training Feedforward Networks with the Marquardt Algorithm. IEEE Trans. on Neural Networks. (5) 6, 989-993.

Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (Second Edition). Prentice-Hall.

Jarabo-Amores, P., Rosa-Zurera, M., Gil-Pita, R., & López-Ferreras, F. (2005). Sufficient Condition for an Adaptive System to Approximate the Neyman-Pearson Detector. Proc. IEEE Workshop on Statistical Signal Processing. 295-300.

Nguyen, D., & Widrow, B. (1990). Improving the Learning Speed of 2-layer Neural Networks by Choosing Initial Values of the Adaptive Weights. Proc. of the Int. Joint Conf. on Neural Networks. 21-26.

Ruck, D.W., Rogers, S.K., Kabrisky, M., Oxley, M.E., & Suter, B.W. (1990). The Multilayer Perceptron as an Approximation to a Bayes Optimal Discriminant Function. IEEE Trans. on Neural Networks. (1) 11, 296-298.

Van Trees, H.L. (1997). Detection, Estimation and Modulation Theory. Part I. John Wiley and Sons.

Vicen-Bueno, R., Jarabo-Amores, M.P., Rosa-Zurera, M., Gil-Pita, R., & Mata-Moya, D. (2007). Performance Analysis of MLP-Based Radar Detectors in Weibull-Distributed Clutter with Respect to Target Doppler Frequency. Lecture Notes in Computer Science. (4669), 690-698.

Vicen-Bueno, R., Rosa-Zurera, M., Jarabo-Amores, P., & Gil-Pita, R. (2006). NN-Based Detector for Known Targets in Coherent Weibull Clutter. Lecture Notes in Computer Science. (4224), 522-529.

KEY TERMS

Artificial Neural Networks (ANNs): A network of many simple processors (units or neurons) that imitates a biological neural network. The units are connected by unidirectional communication channels, which carry numeric data. Neural networks can be trained to find nonlinear relationships in data, and are used in applications such as robotics, speech recognition, signal processing or medical diagnosis.

Backpropagation Algorithm: Learning algorithm of ANNs, based on minimising the error obtained from the comparison between the ANN outputs after the application of a set of network inputs and the desired
outputs. The update of the weights is done according to the gradient of the error function evaluated at the point of the input space given by the current input to the ANN.

Knowledge Extraction: Explicitation of the internal knowledge of a system or set of data in a way that is easily interpretable by the user.

Intelligence: A property of mind that encompasses many related abilities, such as the capacities to reason, plan, solve problems, think abstractly, comprehend ideas and language, and learn.

Levenberg-Marquardt Algorithm: Similar to the Backpropagation algorithm, but with the difference that the error is estimated according to the Hessian matrix. This matrix gives information about several directions in which to search for the minimum of the error function, instead of the single local gradient direction used by the backpropagation algorithm.

Probability Density Function: The statistical function that shows how the density of possible observations in a population is distributed.

Radar: The acronym of Radio Detection and Ranging. In few words, a Radar emits an electromagnetic wave that is reflected by the target and other objects present in its observation space. Finally, the Radar receives these reflected waves (echoes) and analyzes them in order to decide whether a target is present or not.
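The contrast drawn in these definitions between the backpropagation (gradient) update and the Levenberg-Marquardt update can be sketched on a toy least-squares problem. This is a generic illustration of the two update rules, not the training code used in the study; the Gauss-Newton product J^T J stands in for the Hessian:

```python
import numpy as np

def bp_step(J, e, lr=0.1):
    """Backpropagation-style update: step along the negative gradient
    of the squared error, where the gradient is J^T e."""
    return -lr * (J.T @ e)

def lm_step(J, e, mu=1e-3):
    """Levenberg-Marquardt update: solve (J^T J + mu*I) d = -J^T e.
    J^T J approximates the Hessian; mu blends toward plain gradient descent."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + mu * np.eye(n), -J.T @ e)

# Toy linear problem: find w so that J @ w matches y (residual e = J @ w - y)
J = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w = np.zeros(1)
w = w + lm_step(J, J @ w - y)  # a single LM step lands almost exactly on w = 2
```

On this problem one LM step nearly solves it, while a plain gradient step only moves part of the way, mirroring the faster training reported above for small networks.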
Intelligent Software Agents Analysis in E-Commerce I

Somasheker Akkaladevi
Virginia State University, USA

Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
issues requiring true human intelligence and intervention. By implementing the personalized, social, continuously running, and semi-autonomous ISA technologies in business information systems, online business can become more user-friendly, semi-intelligent, and human-like (Pivk 2003).

LITERATURE REVIEW

A number of scholars have defined the term intelligent software agent. Bradshaw (1997) proposed that one person's intelligent agent is another person's smart object. Jennings and Wooldridge (1995) defined an agent as a computer system situated in some environment that is capable of autonomous action in this environment in order to meet its design objective. Shoham (1997) further described an ISA as a software entity which functions continuously and autonomously in a particular environment, often inhabited by other agents and processes. In general, an ISA is a software agent that uses Artificial Intelligence (AI) in the pursuit of the goals of its clients (Croft 2002). It can perform tasks independently on behalf of a user in a network and help users with information overload. It differs from conventional programs in being proactive, adaptive, and personalized (Guttman et al. 1998b). Also, it can actively initiate actions for its users according to the configurations set by the users; it can read and understand users' preferences and habits to better cater to users' needs; and it can provide the users with relevant information according to the patterns it learns from the users.

ISA is a cutting-edge technology in the computational sciences and holds considerable potential to develop new avenues in information and communication technology (Shih et al. 2003). It is used to perform multi-task operations in decentralized information systems, such as the Internet, to conduct complicated and wide-scale search and retrieval activities, and to assist in shopping decision-making and product information search (Cowan et al. 2002). ISAs' ability to perform continuously and autonomously stems from the human desire for an agent capable of carrying out certain activities in a flexible and intelligent manner, responsive to changes in the environment, without constant human supervision. Over a long period of time, an agent is capable of learning from its previous experience and would be able to inhabit an environment with other agents, communicating and cooperating with them to achieve tasks for humans.

Intelligent Agent Taxonomy and Typology

Franklin and Graesser (1996) proposed a general taxonomy of agents (see Figure 1).

Figure 1. Franklin and Graesser's agent taxonomy (Source: Franklin & Graesser, 1996)

This taxonomy is based on the fact that ISA technologies are implemented in a variety of areas, including biotechnology, economic simulation and data-mining, as well as in hostile applications (malicious code), machine learning and cryptography algorithms. In addition, Nwana (1996b) proposed the agent typology (see Figure 2) in which four types of agents are categorized: collaborative agents, collaborative learning agents, interface agents and smart agents. These four agent types have different balances of learning, autonomy, and cooperation and therefore tend to address different sides of this typology in terms of functionality.

According to Nwana (1996b), collaborative agents emphasize autonomy and cooperation more than learning. They collaborate with other agents in multi-agent environments and may have to negotiate with other agents in order to reach mutually acceptable agreements for users. Unlike collaborative agents, interface agents emphasize autonomy and learning. They support and provide proactive assistance. They can observe users' actions in the interface and suggest better ways of completing a task for the user. Also, interface agents' cooperation with other agents is typically limited to asking for advice (Ndumu et al. 1997).
The benefits of interface agents include reducing users' efforts in repetitive work and adapting to their users' preferences and habits. Smart agents are agents that are intelligent, adaptive, and computational (Carley 1998). They are advanced intelligent agents combining the best capabilities and properties of all the presented categories.

This proposed typology highlights the key contexts in which the agent is used in the AI literature. Yet Nwana (1996b) argued that agents ideally should do all three equally well, but this is the aspiration rather than the reality. Furthermore, according to Nwana (1996b) and Jennings and Wooldridge (1998), five more agent types can be derived based on the typology, from a panoramic perspective (see Figure 3).

In this proposed typology, mobile agents are autonomous and cooperative software processes capable of roaming wide area networks, interacting with foreign hosts, and performing tasks on behalf of their owners (Houmb 2002). Information agents can help us manage the explosive growth of information we are experiencing. They perform the role of managing, manipulating, or collating information from many distributed sources (Nwana 1996b). Reactive agents choose actions by using the current world state as an index into a table of actions, where the indexing function's purpose is to map known situations to appropriate actions. These types of agents are sufficient for limited environments where every possible situation can be mapped to an action or set of actions (Chelberg 2003). Hybrid agents adopt the strengths of both the reactive and deliberative paradigms. They aim to have the quick response time of reactive agents for well-known situations, yet also have the ability to generate new plans for unforeseen situations (Chelberg 2003). Heterogeneous agent systems refer to an integrated set-up of at least two or more agents which belong to two or more different agent classes (Nwana 1996b).

CONCLUSION AND FUTURE WORK

This paper explores how ISAs can automate and add value to e-commerce transactions and negotiations. By
Figure 3. A panoramic overview of the different agent types (Source: Jennings & Wooldridge, 1998)
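The table lookup that drives a reactive agent, described above, fits in a few lines. The states and actions below are invented for illustration:

```python
# A reactive agent keeps no model, plan, or history: the current world state
# indexes directly into a table of actions. States and actions are made up.
ACTION_TABLE = {
    "offer_above_limit": "reject_offer",
    "offer_within_limit": "accept_offer",
    "no_offer_yet": "request_quote",
}

def reactive_agent(world_state, default_action="wait"):
    """Map a known situation to its action; fall back for unmapped states."""
    return ACTION_TABLE.get(world_state, default_action)

print(reactive_agent("offer_within_limit"))  # accept_offer
```

The fallback action is exactly the weakness noted above: the scheme only suffices where every situation the agent can meet has been enumerated in advance.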
leveraging ISA-based e-commerce systems, companies can make decisions more efficiently, because they have more accurate information and can identify consumers' tastes and habits. Opportunities and limitations for ISA development are also discussed. Future generations of ISAs will be able to evaluate basic characteristics of online transactions in terms of price and product description as well as other properties, such as warranty, method of payment, and after-sales service. Also, they would better manage ambiguous content, personalized preferences, complex goals, changing environments, and disconnected parties (Guttman et al. 1998a). Additionally, regarding the future trend of ISA technology deployment, Nwana (1996a) writes that "Agents are here to stay, not least because of their diversity, their wide range of applicability and the broad spectrum of companies investing in them. As we move further and further into the information age, any information-based organization which does not invest in agent technology may be committing commercial hara-kiri."

REFERENCES

Cowan, R., and Harison, E. "Intellectual Property Rights in Intelligent-Agent Technologies: Facilitators, Impediments and Conflicts," online: http://www.itas.fzk.de/e-society/preprints/ecommerce/CowanHarison.pdf, 2002.

Croft, D.W. "Intelligent Software Agents: Definitions and Applications," online: http://www.alumni.caltech.edu/~croft/research/agent/definition/, 2002.

Franklin, S., and Graesser, A. "Is it an Agent, or just a Program?: A Taxonomy for Autonomous Agents," Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages, Springer-Verlag, 1996.

Guttman, R., Moukas, A., and Maes, P. "Agent-mediated Electronic Commerce: A Survey," Knowledge Engineering Review (13:3), June 1998a.

Guttman, R., Moukas, A., and Maes, P. "Agents as Mediators in Electronic Commerce," International Journal of Electronic Markets (8:1), February 1998b, pp 22-27.

He, M., Jennings, N.R., and Leung, H.-F. "On Agent-Mediated Electronic Commerce," IEEE Transactions on Knowledge and Data Engineering (15:4), July/August 2003.

Houmb, S.H. "Software Agent: An Overview," online: http://www.idi.ntnu.no/emner/dif8914/ppt-2002/sw-agent_dif8914_2002.ppt, 2002.

Nwana, H.S. "Software Agents: An Overview," online: http://agents.umbc.edu/introduction/ao/, 1996b.

Pivk, A. "Intelligent Agents in E-Commerce," online: http://ai.ijs.si/Sandi/IntelligentAgentRepository.html, 2003.

Shih, T.K., Chiu, C.-F., and Hsu, H.-h. "An Agent-Based Multi-Issue Negotiation System in E-commerce," Journal of Electronic Commerce in Organizations (1:1), Jan-March 2003, pp 1-16.
KEY TERMS
Somasheker Akkaladevi
Virginia State University, USA
ISA OPPORTUNITIES AND LIMITATIONS IN E-COMMERCE

Cowan et al. (2002) argued that the human cognitive ability to search for information and to evaluate its usefulness is extremely limited in comparison to that of computers. In detail, it is cumbersome and time-consuming for a person to search for information from limited resources and to evaluate the information's usefulness. They further indicated that while people are able to perform several queries in parallel and are good at drawing parallels and analogies between pieces of information, advanced systems that embody ISA architecture are far more effective in terms of calculation power and parallel processing abilities, particularly in the quantities of material they can process (Cowan et al. 2002). According to Bradshaw (1997), information complexity will continue to increase dramatically in the coming decades. He further contended that the dynamic and distributed nature of both data and applications requires that software not merely respond to requests for information but intelligently anticipate, adapt, and actively seek ways to support users.

E-commerce applications based on agent-oriented e-commerce systems have great potential. Agents can be designed using the latest web-based technologies, such as Java, XML, and HTTP, and can dynamically discover and compose E-services and mediate interactions to handle routine tasks, monitor activities, set up contracts, execute business processes, and find the best services (Shih et al., 2003). The main advantages of using these technologies are their simplicity of usage, ubiquitous nature, heterogeneity and platform independence (Begin and Boisvert, 2002). XML will likely become the standard language for agent-oriented E-commerce interactions to encode exchanged messages, documents, invoices, orders, service descriptions, and other information. HTTP, the dominant WWW protocol, can be used to provide many services, such as robust and scalable web servers, firewall access, and levels of security for these E-commerce applications.

Agents can be made to work individually, as well as in a collaborative manner to perform more complex tasks (Franklin and Graesser, 1996). For example, to purchase a product on the Internet, a group of agents can exchange messages in a conversation to find the best deal, can bid in an auction for the product, can arrange financing, can select a shipper, and can also track the order. Multi-agent systems (groups of agents collaborating to achieve some purpose) are critical for large-scale e-commerce applications, especially B2B interactions such as service provisioning, supply chain, negotiation, and fulfillment. The grouping of agents can be static or dynamic depending on the specific need (Guttman et al., 1998b). Perfect coordination should be established for the interactions between the agents to achieve a higher-level task, such as requesting, offering and accepting a contract for some services (Guttman et al., 1998a).

There are several agent toolkits publicly available which can be used to satisfy customer requirements; ideally, they need to adhere to standards which define multi-party agent interoperability. For example, fuzzy-logic-based intelligent negotiation agents can be used to interact autonomously and, consequently, save human labor in negotiations. The aim of modeling a negotiation agent is to reach mutual agreement efficiently and intelligently. The negotiation agent should be able to negotiate with other such agents over various sets of issues, and on behalf of the real-world parties they represent, i.e. they should be able to handle multi-issue negotiations at any given time.

The boom in e-commerce has now created the need for ISAs that can handle complicated online transactions and negotiations for both sellers and buyers. In
Intelligent Software Agents Analysis in E-Commerce II
general, buyers want to find sellers that have the desired products and services. They want to find product information and gain expert advice before and after the purchase from sellers, which, in turn, want to find buyers and provide expert advice about their product or service as well as customer service and support. Therefore, there is an opportunity for both buyers and sellers to automate the handling of this potential transaction by adopting ISA technology. The use of ISAs will be essential to handling many tasks of creating, maintaining, and delivering information on the Web. By implementing ISA technology in e-commerce, agents can shop around for their users; they can communicate with other agents about product specifications, such as price, features, quantity, and service package, and make a comparison according to the user's objectives and requirements and return with recommendations of purchases which can meet those specifications; they can also act for sellers by providing product or service sales advice, and help troubleshoot customer problems by automatically offering solutions or suggestions; and they can automatically pay bills and keep track of the payment.

Looking at ISA development from an international standpoint, the nature of the Internet in developed countries, such as the USA, Canada, Western Europe, Japan, and Australia, and the consequent evolution of e-commerce as the new model provide exciting opportunities and challenges for ISA-based developments. Opportunities include wider market reach in a timely manner, higher earnings, a broader spectrum of target and potential customers, and collaboration among vendors. This ISA-powered e-commerce arena would be different from traditional commerce, because the traditional form of competition can give way to collaborative efforts across industries for adding value to business processes. This means that agents of different vendors can establish a cooperative relationship to communicate with each other via the XML language in order to set up and complete transactions online.

Technically, for instance, if an information agent found that a vendor is in need of more airplane tickets, it would notify a collaborative agent to search for relevant information regarding the tickets in terms of availability, price, and quantity from other sources over the Internet. In this case, the collaborative agent would work with mobile agents and negotiate with other agents working for different vendors to obtain ticket information for its user. It would be able to provide the user with the result of the search and, if needed, purchase the tickets for the user if certain requirements can be met. In the meantime, interface agents can monitor the user's reactions and decision behavior, and would provide the user with informational assistance in terms of advice, recommendations, and suggestions for any related and similar transactions.

On the other hand, however, this kind of intelligent electronic communication and transaction is relatively inapplicable in traditional commerce, where different competitive vendors are not willing to share information with each other (Maes et al., 1999). The level of willingness in ISA-based e-commerce is, however, somewhat limited due to sociological and ethical factors, which will be discussed later in this paper. In addition, designing and implementing ISA technology is costly, which prevents companies from adopting this emerging tool. Companies need to invest a lot of money to get the ISA engine started. Notwithstanding the exciting theoretical benefits discussed above, many companies are still not sure how much ISA technology can benefit them in terms of revenue, ROI, and business influence in a market where other players are yet to adopt this technology and cooperate with each other. In particular, medium and small size companies are reluctant to embark on this arena mainly due to the factor of cost.

Additionally, the lack of consistent architectures in terms of standards and laws also obstructs the further development of ISA technology (He et al., 2003). In detail, the IT industry has not yet finalized ISA standards, as there are a number of proprietary standards set by various companies. This causes a confusion problem for ISAs trying to communicate freely with each other. Also, related to standards, relevant laws have not surfaced to regulate how ISAs can legally cooperate with each other and represent their human users in the cyber world.

Additionally, ISA development and deployment is not yet a global phenomenon (Jennings et al. 1998). Despite the fact that ISA technology is a hot topic in developed countries, developing countries are not fully aware of the benefits of ISA and therefore have not deployed ISA-based systems on the Web, because their e-commerce development levels and skills are not as sophisticated or advanced as those of the developed countries. This international limitation between developed and developing countries unfortunately hinders
agents from freely communicating with each other over the globally connected Internet.

SOCIOLOGICAL AND ETHICAL CHALLENGES

In the preceding sections of this paper, the technical issues involved in agent development have been addressed. However, in addition to these issues, there is also a range of social and cyber-ethical problems, such as trust and delegation, privacy, responsibility, and legal issues, which will become increasingly important in the field of agent technology (Bradshaw 1997; Jennings et al. 1998; Nwana 1996b).

Trust and delegation: Users who want to depend on ISA technology to obtain desired information must trust the agents which autonomously act on their behalf to do the job. It will take time for users to get used to their agents and gain confidence in the agents that work for them. Users also have to strike a balance between agents continually seeking guidance and never seeking guidance. Users might need to set proper limitations for their agents; otherwise, agents might exceed their authority.

Privacy: In the explosive information society, security is becoming more and more important. Therefore, users must make sure that their agents always maintain their privacy in the course of transactions. Electronic agent security policies may be needed to counter this potential threat.

Responsibility: Users need to seriously consider how much responsibility the agents need to carry regarding transaction pitfalls. To some extent, agents are given the responsibility of getting the desired product or service for their users. If the users are not satisfied with the transaction result, they may need to redesign or reprogram the agent rather than directly blame the fault on electronic agents.

Legal issues: In addition to responsibility, users should also think about any potential legal issues triggered by their agents, which might, for instance, offer inappropriate advice to other agents resulting in liabilities to other people. This would be very challenging for ISA technology development, and the scenario would be complicated, since the current law does not specify which party (the company who wrote the agent, the company who customized and used the agent, or both) should be responsible for the legal issues.

Cyber-ethical issues: Eichmann (1994) and Etzioni & Weld (1994) proposed the following etiquette for ISAs which gather information on the Web:
- Agents must identify themselves;
- They must moderate the pace and frequency of their requests to some server;
- They must limit their searches to appropriate servers;
- They must share information with others;
- They must respect the authority placed on them by server operators;
- Their services must be accurate and up-to-date;
- Safety: they should not destructively alter the world;
- Tidiness: they should leave the world as they found it;
- Thrift: they should limit their consumption of scarce resources;
- Vigilance: they should not allow client actions with unanticipated results.

CONCLUSION AND FUTURE WORK

ISA technology has to confront the increasing complexity of modern information environments. Research and development of ISAs on the Internet is crucial for the development of the next generation of open information environments. Sociological and cyber-ethical issues need to be considered for the next generation of agents in e-commerce systems, which will explore new types of transactions in the form of dynamic relationships among previously unknown parties (Guttman et al. 1998b). According to Nwana (1996a), the ultimate success of ISAs will be their acceptance and mass usage by users, once issues such as privacy, trust, legal concerns, and responsibility are addressed and considered when users design and implement ISA technologies in e-commerce and emerging commerce, such as mobile commerce (M-commerce) and ubiquitous commerce (U-commerce). It is expected that future research will further explore how ISAs are leveraged in these two newly emerged avenues.
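The agent-to-agent bargaining this paper keeps returning to can be illustrated with a minimal alternating-offers sketch. This toy single-issue protocol is our own illustration, not a published negotiation algorithm:

```python
def negotiate(buyer_max, seller_min, buyer_bid, seller_ask,
              step=10.0, max_rounds=50):
    """Alternating-offers haggling over a price: each round the buyer raises
    its bid and the seller lowers its ask, within private limits, until the
    offers cross or no agreement is possible. Returns a price or None."""
    for _ in range(max_rounds):
        if buyer_bid >= seller_ask:      # offers crossed: close the deal
            return (buyer_bid + seller_ask) / 2
        buyer_bid = min(buyer_bid + step, buyer_max)
        seller_ask = max(seller_ask - step, seller_min)
        if buyer_bid == buyer_max and seller_ask == seller_min \
                and buyer_bid < seller_ask:
            return None                  # both at their limits, no overlap
    return None

print(negotiate(buyer_max=120, seller_min=80, buyer_bid=60, seller_ask=150))
# 105.0
```

Real negotiation agents would handle multiple issues (price, warranty, delivery) and strategic concession rates, but the message-by-message convergence toward mutual agreement has the same shape.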
Intelligent Software Agents with Applications in Focus

Milan Stanković
University of Belgrade, Serbia

Uroš Krčadinac
University of Belgrade, Serbia
Flexibility means that the agent is able to perceive its environment and respond to changes in a timely fashion; it should be able to exhibit opportunistic, goal-directed behaviour and take the initiative whenever appropriate. In addition, an agent should be able to interact with other agents and humans, and thus to be social.

For some researchers - particularly those interested in AI - the term 'agent' has a stronger and more specific meaning than that sketched out above. These researchers generally mean an agent to be a computer system that, in addition to having the properties identified above, is either conceptualized or implemented using concepts that are more usually applied to humans. For example, it is quite common in AI to characterize an agent using mentalistic notions, such as knowledge, belief, intention, and obligation (Wooldridge & Jennings, 1995).

INTELLIGENT SOFTWARE AGENTS

Agents and Environments

An agent collects its percepts through its sensors, and acts upon the environment through its actuators. Thus, the agent is proactive. Its actions at any moment depend on the whole sequence of these inputs up to that moment. A decision tree for every possible percept sequence of an agent would completely define the agent's behavior. This would define the function that maps any sequence of percepts to a concrete action: the agent function. The program that implements the agent function is called the agent program. So, the agent function is a formal description of the agent's behavior, and the agent program is a concrete implementation of that formalism (Krcadinac, Stankovic, Kovanovic & Jovanovic, 2007).

To implement all this, we need to have a computing device with appropriate sensors and actuators on which the agent program will run. This is called the agent architecture. So, an agent is essentially made of two components: the agent architecture and the agent program.

Also, as Russell and Norvig (1995) specify, one of the most sought-after characteristics of an agent is its rationality. An agent is rational if it always performs the action that will lead to the most successful outcome. The rationality of an agent depends on (a) the performance measure that defines what is a good action and what is a bad action, (b) the agent's knowledge about the environment, (c) the agent's available actions, and (d) the agent's percept history.

The Types of Agents

There are several basic types of agents with respect to their structure (Russell & Norvig, 1995):
happy?" along with the standard "what this action will have as a result?"

4. Utility-based agents use a utility function that maps each state to a number that represents the degree of happiness. They are able to perform rationally even in the situations when there are conflicting goals, as well as when there are several goals that can be achieved, but none with certainty.

5. Learning agents do not have a priori knowledge of the environment, but learn about it. This is beneficial because these agents can operate in unknown environments, and to a certain degree it facilitates the job of developers because they do not need to specify their whole knowledge base.

Multi-Agent Systems

Multi-Agent Systems (MAS) are systems composed of multiple autonomous components (agents). They historically belong to Distributed Artificial Intelligence (Bond & Gasser, 1998). MAS can be defined as a loosely coupled network of problem solvers that work together to solve problems that are beyond the individual capabilities or knowledge of a single problem solver (Durfee and Lesser, 1989). In a MAS, each agent has incomplete information or capabilities for solving the problem and thus has a limited viewpoint. There is no global system control, the data is decentralized and the computation is asynchronous.

In addition to MAS, there is also the concept of a multi-agent environment, which can be seen as an environment that includes more than one agent. Thus, it can be cooperative, or competitive, or a combined one, and it creates a setting where agents need to interact (socialize) with each other, either to achieve their individual objectives, or to manage the dependencies that follow from being situated in a common environment. These interactions range from simple semantic interoperation (exchanging comprehensible communications), through client-server interactions (the ability to request that a particular action is performed), to rich social interactions (the ability to cooperate, coordinate, and negotiate about a course of action).

Because of the issues due to the heterogeneous nature of agents involved in communication (e.g., finding one another), there is also a need for middle-agents, which cover cooperation among agents and connect service providers with service requesters in the agent world. These agents are useful in various roles, such as matchmakers or yellow page agents that collect and process service offers (advertisements), blackboard agents that collect requests, and brokers that process both (Sycara, Decker, & Williamson, 1997). There are several alternatives to middle agents, such as Electronic Institutions, a framework for agents' negotiation, which seeks to incorporate organizational concepts into multi-agent systems (Rocha and Oliveira, 2001).

Communication among agents is achieved by exchanging messages represented in a mutually understandable language (syntax) and containing mutually understandable semantics. In order to find a common ground for communication, an agent communication language (ACL) should be used to provide mechanisms for agents to negotiate, query, and inform each other. The most important such languages today are KQML (Knowledge Query and Manipulation Language) (ARPA Knowledge Sharing Initiative, 1993) and FIPA ACL (FIPA, 1997).

AGENT APPLICABILITY

There are great possibilities for applying multi-agent systems to solving different kinds of practical problems.

- The auction negotiation model, as a form of communication, enables a group of agents to find good solutions by achieving agreement and making mutual compromises in case of conflicting goals. Such an approach is applicable to trading systems, where agents act on behalf of buyers and sellers. Financial markets, as well as scheduling, travel arrangement, and fault diagnosing, also represent applicable fields for agents.

- Another very important field is information gathering, where agents are used to search through diverse and vastly different information sources (e.g., the World Wide Web) and acquire relevant information for their users. One of the most common domains is Web browsing and search, where agents are used to adapt the content (e.g., search results) to the user's preferences and offer relevant help in browsing.

- Process control software systems require various kinds of automatic (autonomous) control and reaction for their processes (e.g., production processes). Reactive and responsive, agents perfectly fit the needs of such a task. Example domains in this field include production process control, climate monitoring, spacecraft control, and monitoring of nuclear power plants.

- Artificial life studies the evolution of agents, or populations of computer-simulated life forms, in artificial environments. The goal is to study phenomena found in real-life evolution in a controlled manner, hopefully to eliminate some of the inherent limitations and cruelty of evolutionary studies using live animals.

- Finally, intelligent tutoring systems often include pedagogical agents, which represent software entities constructed to present the learning content in a user-friendly fashion and monitor the user's progress through the learning process. These agents are responsible for guiding the user and suggesting additional learning topics related to the user's needs (Devedzic, 2006).

Some of the more specific examples of intelligent agent applications include the Talaria System, military training, and Mobility Agents. The Talaria System (The Autonomous Lookup And Report Internet Agent System) is a multi-agent system developed for academic purposes at the University of Belgrade, Serbia. It was built as a solution to the common problem of gathering information from diverse Web sites that do not provide RSS feeds for news tracking. The system was implemented using the JADE modeling framework in Java (Stankovic, Krcadinac, Kovanovic & Jovanovic, 2007). The Talaria System uses the advantages of the human-agent communication model to improve the usability of web sites and to relieve users from annoying and repetitive work. The system provides each user with a personal agent, which periodically monitors the Web sites that the user expressed interest in. The agent informs its user about relevant changes, filtered by assumed user preferences and default relevance factors. Human-agent communication is implemented via email, so that a user can converse with her/his agent in natural language, whereas the agent heuristically interprets concrete instructions from the mail text (e.g., "monitor this site" or "kill yourself").

Simulation and modelling are extensively used in a wide range of military applications, from development, testing and acquisition of new systems and technologies, to operation, analysis and provision of training, and mission rehearsal for combat situations. The Human Variability in Computer Generated Forces (HV-CGF) project, undertaken on behalf of the UK's Ministry of Defence, developed a framework for simulating behavioral changes of individuals and groups of military personnel when subjected to moderating influences such as caffeine and fatigue. The project was built with the JACK Intelligent Agents toolkit, a commercial Java-based environment for developing and running multiagent applications. Each team member is a rational agent able to execute actions such as doctrinal and non-doctrinal behaviour tactics, which are encoded as JACK agent graphical plans (Belecheanu et al., 2005).

Mobility Agents is an agent-based architecture that helps a person with cognitive disabilities to travel using public transportation. Agents are used to represent transportation participants (buses and travelers) and to enable notification of bus approaching and arrival. Information is passed to the traveler using a multimedia interface, via a handheld device. Customizable user profiles determine the most appropriate modality of interaction (voice, text, and pictures) based on the user's abilities (Repenning & Sullivan, 2003). This requires a personal agent to take care that abstract goals, such as "go home," are translated into concrete directions. To achieve this, an agent needs to collect information about user-specific locations and must be able to suggest the right bus for the particular user's current location and destination.

FUTURE TRENDS

The future looks bright for this technology, as its development is taking place within a context of broader visions and trends in IT. The whole growing field of IT is about to drive forward the R&D of intelligent agents. We especially emphasize the Semantic Web, ambient intelligence, service-oriented computing, peer-to-peer computing and Grid computing.

The Semantic Web is the vision of the future Web based on the idea that the data on the Web can be defined and linked in such a way that it can be used by machines for automatic processing and integration (Berners-Lee, Hendler, & Lassila, 2001). The key to achieving this is by augmenting Web pages with descriptions of their content in such a way that it is possible for
machines to reason automatically about that content. The common opinion is that the Semantic Web itself will be a form of intelligent infrastructure for agents, allowing them to understand the meaning of the data on the Web (Luck et al., 2005).

The concept of ambient intelligence describes a shift away from PCs to a variety of devices which are embedded in our environment and which are accessed via intelligent interfaces. It requires agent-like technologies in order to achieve autonomy, distribution, adaptation, and responsiveness.

Service-oriented computing is where MAS could become very useful. In particular, this might involve web services, where the Quality of Service demands are important. Each web service could be modeled as an agent, with dependencies, and then simulated for observed failure rates.

Peer-to-peer (P2P) computing, presenting networked applications in which every node is in some sense equivalent to all others, tends to become more complex in the future. Auction mechanism design, agent negotiation techniques, increasingly advanced approaches to trust and reputation, and the application of social norms, rules and structures present some of the agent technologies that are about to become relevant in the context of P2P computing.

Grid computing is the high-performance agent-based computing infrastructure for supporting large-scale distributed scientific endeavour. The Grid provides a means of developing eScience applications, yet it also provides a computing infrastructure for supporting more general applications that involve large-scale information handling, knowledge management and service provision. The key benefit of Grid computing is flexibility: the distributed system and network can be reconfigured on demand in different ways as business needs change.

Some considerable challenges still remain in the agent-based world, such as the lack of sophisticated software tools, techniques and methodologies that would support the specification, development, integration and management of agent systems.

CONCLUSION

Today, research and development in the field of intelligent agents is rapidly expanding. At its core is the concept of autonomous agents interacting with one another for their individual and/or collective benefit. A number of significant advances have been made over the past two decades in the design and implementation of individual autonomous agents, and in the way in which they interact with one another. These concepts and technologies are now finding their way into commercial products and real-world software solutions. Future IT visions share the common need for agent technologies and prove that agent technologies will continue to be of vital importance. It is foreseeable that agents will become an integral part of information technologies and artificial intelligence in the near future, and that is why an eye should be kept on them.

REFERENCES

Agha, G., Wegner, P., & Yonezawa, A. (Eds.). (1993). Research directions in concurrent object-oriented programming. Cambridge, MA: The MIT Press.

ARPA Knowledge Sharing Initiative. (1993). Specification of the KQML agent-communication language plus example agent policies and architectures. Retrieved January 30, 2007, from http://www.csee.umbc.edu/kqml/papers/kqmlspec.pdf.

Barber, K. S., & Martin, C. E. (1999). Agent autonomy: Specification, measurement, and dynamic adjustment. Autonomy Control Software Workshop, Seattle, Washington.

Belecheanu, A. R., Luck, M., McBurney, P., Miller, T., Munroe, S., Payne, T., & Pechoucek, M. (2005). Commercial applications of agents: Lessons, experiences and challenges (p. 2). Southampton, UK.

Berners-Lee, T., Hendler, J., & Lassila, O. (2001, May). The Semantic Web. Scientific American, pp. 35-43.

Bond, A. H., & Gasser, L. (Eds.). (1998). Readings in distributed artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers.

Booch, G. (2004). Object-oriented analysis and design (2nd ed.). MA: Addison-Wesley.

Booth, D., Haas, H., McCabe, F., Newcomer, E., Champion, M., Ferris, C., & Orchard, D. (2004, February). Web services architecture. W3C working group note 11. Retrieved January 30, 2007, from http://www.w3.org/TR/ws-arch/.
Devedzic, V. (2006). Semantic web and education. Berlin, Heidelberg, New York: Springer.

Durfee, E. H., & Lesser, V. (1989). Negotiating task decomposition and allocation using partial global planning. In L. Gasser & M. Huhns (Eds.), Distributed artificial intelligence: Volume II (pp. 229-244). London: Pitman Publishing and San Mateo, CA: Morgan Kaufmann.

FIPA (1997). Part 2 of the FIPA 97 specifications: Agent communication language. Retrieved January 30, 2007, from http://www.fipa.org/specs/fipa00003/OC00003A.html.

Krcadinac, U., Stankovic, M., Kovanovic, V., & Jovanovic, J. (2007). Intelligent multi-agent systems. In A. Cartelli & M. Palma (Eds.), Encyclopedia of Information Communication Technology. Idea Group International Publishing (forthcoming).

Luck, M., McBurney, P., Shehory, O., & Willmott, S. (2005). Agent technology: Computing as interaction. Retrieved January 30, 2007, from http://www.agentlink.org/roadmap/al3rm.pdf.

Maes, P. (1994). Agents that reduce work and information overload. Communications of the ACM, 37(7), 31-40.

Repenning, A., & Sullivan, J. (2003). The Pragmatic Web: Agent-based multimodal web interaction with no browser in sight. In G. W. M. Rauterberg, M. Menozzi, & J. Wesson (Eds.), Proceedings of the Ninth International Conference on Human-Computer Interaction (pp. 212-219). Amsterdam, The Netherlands: IOS Press.

Rocha, A. P., & Oliveira, E. (2001). Electronic institutions as a framework for agents' negotiation and mutual commitment. In P. Brazdil & A. Jorge (Eds.), Progress in Artificial Intelligence (Proceedings of 10th EPIA), LNAI 2258 (pp. 232-245). Springer.

Russell, S. J., & Norvig, P. (1995). Artificial intelligence: A modern approach. New Jersey: Prentice-Hall.

Stankovic, M., Krcadinac, U., Kovanovic, V., & Jovanovic, J. (2007). An overview of intelligent software agents. In M. Khosrow-Pour (Ed.), Encyclopedia of Information Science and Technology (2nd ed.). Idea Group International Publishing (forthcoming).

Sycara, K., Decker, K., & Williamson, M. (1997). Middle-agents for the Internet. In M. E. Pollack (Ed.), Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (pp. 578-584). Morgan Kaufmann Publishers.

Wooldridge, M., & Jennings, N. R. (1995). Intelligent agents: Theory and practice. The Knowledge Engineering Review, 10(2), pp. 115-152.

KEY TERMS

Actuators: Software component and part of the agent used as a means of performing actions in the agent environment.

Agent Autonomy: An agent's active use of its capabilities to pursue some goal, without intervention by any other agent in the decision-making process used to determine how that goal should be pursued (Barber & Martin, 1999).

Agent Percepts: All the information that an agent receives through its sensors about the state of the environment or any part of the environment.

Intelligent Software Agent: An encapsulated computer system that is situated in some environment and that is capable of flexible, autonomous action in that environment in order to meet its design objectives (Wooldridge & Jennings, 1995).

Middle-Agents: Agents that facilitate cooperation among other agents and typically connect service providers with service requesters.

Multi-Agent System (MAS): A software system composed of several agents that interact in order to find solutions of complex problems.

Sensors: Software component and part of the agent used as a means of acquiring information about the current state of the agent environment (i.e., agent percepts).
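As an illustration of the agent function / agent program distinction discussed under Agents and Environments, here is a minimal sketch. The class and function names are our own illustrative choices (plain Python, not one of the agent toolkits such as JADE or JACK mentioned above):

```python
from collections import deque
from typing import Callable, List

Percept = str
Action = str

# The "agent function" is the abstract mapping from a whole percept
# sequence to an action; the "agent program" is its concrete,
# incremental implementation running on the agent architecture.

class AgentProgram:
    def __init__(self, agent_function: Callable[[List[Percept]], Action]):
        self.percept_history: List[Percept] = []  # percepts collected via sensors
        self.agent_function = agent_function

    def step(self, percept: Percept) -> Action:
        """One sense-act cycle: record the percept, choose an action."""
        self.percept_history.append(percept)
        return self.agent_function(self.percept_history)

# A trivial agent function: react to the latest percept only
# (a simple reflex agent in Russell & Norvig's taxonomy).
def reflex_function(history: List[Percept]) -> Action:
    return "flee" if history[-1] == "danger" else "wander"

agent = AgentProgram(reflex_function)
print(agent.step("quiet"))    # wander
print(agent.step("danger"))   # flee
```

A rational agent would replace `reflex_function` with one that also consults a performance measure and its knowledge of the environment, per points (a)-(d) above.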
Intelligent Traffic Sign Classifiers
The objective of this work is the study of different classification techniques combined with different preprocessings to implement an intelligent TSC system. The preprocessings considered are shown below and are used to reduce the classifier complexity and to improve its performance. The studied classifiers are the k-Nearest Neighbor (k-NN) and an ANN-based method using Multilayer Perceptrons (MLPs). So, this work tries to find which are the best preprocessings, which are the best classifiers, and which combination of them minimizes the error rate.

INTELLIGENT TRAFFIC SIGN CLASSIFICATION

An intelligent traffic sign classification can be achieved taking into account two important aspects. The first one focuses on the extraction of the relevant information from the input traffic signs, which can be done adaptively or in a fixed way. The second one is related to the classification core. From the point of view of this part, ANNs can play a great role, because they are able to learn from different environments. So, an intelligent combination of both aspects can lead us to success in the classification of traffic signs.

Traffic Sign Classification System Overview

The TSC system and the blocks that compose it are shown in figure 1. Once the Video Camera block takes a video sequence, the Image Extraction block makes the video sequence easy to read and is responsible for obtaining images. The Sign Detection and Extraction Stage extracts all the traffic signs contained in each image and generates the small images called blobs, one per possible sign. Figure 1 also shows an example of the way this block works. The Color Recognition Stage is responsible for discerning among the different predominant colors of the traffic sign: blue, red or others. Once the blob is classified according to its predominant color, the TSC Stage has the responsibility to recognize the exact type of sign, which is the aim of this work. This stage is divided in two parts: the traffic sign preprocessing stage and the TSC core.

Database Description

The database of blobs used to obtain the results presented in this work is composed of blobs with only noise and nine different types of blue traffic signs, which belong to the international traffic code. Figure 2.a (Normal Traffic Signs) shows the different classes of traffic signs considered in this work, which have been collected by the TSC system presented above. So, they present distortions due to the problems described in previous sections, which are shown in figure 2.b (Traffic Signs with problems). The problems caused by vandalism are shown in the example of class S8. The problems related to the blob extraction in the Sign Detection and Extraction Stage (not a correct fit in the square image) are shown in the examples of classes S2, S4 and S9. Examples of signs with problems of rotation, translation or inclination are those of classes S4, S6 and
Figure 2. Noise and nine classes of international traffic signs: (a) normal traffic signs and (b) traffic signs with problems
S9. Finally, the difference in brightness is observed in both parts of figure 2. For example, when the lighting of the blob is high, the vertical row of the example of class S3 is greater than the horizontal row of the example of class S2.

Traffic Sign Preprocessing Stage

Each blob presented at the input of the TSC stage contains information of the three color components: red, green and blue. Each blob is composed of 31x31 pixels. So, the memory required for each blob is 2883 bytes. Due to the high quantity of data, the purpose of this stage is to reduce it and to limit the redundancy of the information, in order to improve the TSC performance and to reduce the TSC core computational cost.

The first preprocessing made in this stage is the transformation of the color blob (3x31x31) to a gray scale blob (31x31) (Paulus, 2003). Consider for the next explanation that M is a general bidimensional matrix that contains either the gray scale blob or the output of one of the next preprocessings:

- Median filter (MF) (Abdel, 2004). It is applied to each pixel of M. A block of nxn elements that surrounds a pixel of M is taken, which is sorted in a linear vector. The median value of this vector is selected as the value of the processed pixel. This preprocessing is usually used to reduce the noise in an image.

- Histogram equalization (HE). It tries to enhance the contrast of M. The pixels are transformed according to a specified image histogram (Paulus, 2003). This equalization is usually used to improve the dynamic range of M.

- Vertical (VH) and horizontal (HH) histograms (Vicen, 2005a; Vicen, 2005b). They are computed with

      vh_i = (1/31) * sum_{j=1..31} I(m_{i,j} > T),   i = 1, 2, ..., 31        (1)

      hh_j = (1/31) * sum_{i=1..31} I(m_{i,j} > T),   j = 1, 2, ..., 31        (2)

  respectively, where m_{i,j} is the element of the i-th row and j-th column of the matrix M, I(.) is 1 when its condition holds and 0 otherwise, and T is the fixed or adaptive threshold of this preprocessing. If T is fixed, it is established at the beginning of the preprocessing, but if T is adaptive, it can be calculated with the Otsu method (Ng, 2004) or with the mean value of the blob, so both methods are M-dependent. vh_i corresponds to the ratio of values of the i-th row that are greater than T, and hh_j corresponds to the ratio of values of the j-th column that are greater than T.
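Two of the preprocessings above, the median filter and the VH/HH histograms of equations (1) and (2), can be sketched directly. The following is a minimal NumPy version (function names are our own; the adaptive threshold shown is the mean-of-blob variant mentioned in the text):

```python
import numpy as np

def median_filter(M, n=3):
    """Median filter: replace each pixel by the median of its n x n
    neighborhood (the window is clipped at the image borders)."""
    out = np.empty_like(M)
    r = n // 2
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            win = M[max(i - r, 0):i + r + 1, max(j - r, 0):j + r + 1]
            out[i, j] = np.median(win)
    return out

def vh_hh(M, T=None):
    """Vertical and horizontal histograms of a gray-scale blob M.

    vh[i] is the fraction of pixels in row i exceeding the threshold T;
    hh[j] is the fraction in column j. If T is None, the adaptive
    mean-of-blob threshold is used (M-dependent, as in the text).
    """
    if T is None:
        T = M.mean()
    above = M > T              # boolean matrix I(m_ij > T)
    vh = above.mean(axis=1)    # eq. (1): average over columns j
    hh = above.mean(axis=0)    # eq. (2): average over rows i
    return vh, hh

# Tiny 3x3 example (the paper's blobs are 31x31):
M = np.array([[0, 10, 10],
              [0,  0, 10],
              [0,  0,  0]], dtype=float)
vh, hh = vh_hh(M, T=5.0)
print(vh)   # row ratios:    [2/3, 1/3, 0]
print(hh)   # column ratios: [0, 1/3, 2/3]
```

For a 31x31 blob, concatenating `vh` and `hh` yields the kind of compact 62-element feature vector used later as the TSC core input.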
criterion is minimized. Moreover, cross-validation is used in order to reduce generalization problems.

RESULTS

The database considered for the experiments is composed of 235 blobs of ten different classes: noise (S0) and nine classes of traffic signs (S1-S9). The database has been divided in three sets: train, validation and test, which are composed of 93, 52 and 78 blobs, respectively, being preprocessed before they are presented to the TSC core. The first one is used as the training set for the k-NN and the MLPs. The second one is used to stop the MLP training algorithm (Bishop, 1995; Haykin, 1999) according to the cross-validation applied during the training. And the last one is used to evaluate the performance of the k-NN and the MLPs. Experimental environments characterized by a large dimensional space and a small data set pose generalization problems. For this reason, the MLP training is repeated 10 times with a different weights initialization each time, and the best MLP in terms of the error rate (Pe) estimated with the validation set is selected.

Once the color blobs are transformed to gray scale, three different combinations of preprocessings (CPPs) are applied, so each CPP output is 62 elements:

performance is obtained with the CPP3 and an MLP of 62/62/10, where its error rate is Pe=2,6% (Pc=97,4%). The CPP2 achieves good performances, but they are always lower than in the case of using the CPP3. The use of CPP1 with MLPs achieves the worst results of the three cases under study.

The study of the TSC core based on an MLP with two hidden layers (h=2) (table 3) shows that the best combination of the CPPs and [H1,H2] for the MLP is CPP3 and [H1=70,H2=20], respectively. In this case, the best performance achieved is Pe=1,3% (Pc=98,7%). As occurs for MLPs with one hidden layer, the best CPP is the third one and the worst one is the first one.

FUTURE TRENDS

New innovations can be achieved in this research area. The new trends try to improve the preprocessing techniques. In this case, advanced signal processing can be applied to TSC. On the other hand, other TSC cores can be used. For instance, classifiers based on Radial Basis Functions or Support Vector Machines (Maldonado, 2007) can be applied. Finally, optimization techniques, like Genetic Algorithms, have an important role in this research area to find which is the best selection of preprocessings from a bank of them.
Table 1. Pe(%) versus k parameter for each TSC based on different CPPs and k-NN

k     1    3    4    5    6    7    8    9    10   11   12
CPP1  29,5 30,8 29,5 29,5 32,0 30,8 30,8 28,2 25,6 25,6 25,6
CPP2  19,2 17,9 14,1 16,7 14,1 15,4 19,2 16,7 17,9 19,2 19,2
CPP3  6,4  9,0  9,0  11,5 12,8 12,8 12,8 12,8 12,8 12,8 10,3

Table 2. Pe(%) versus H1 parameter for each TSC based on different CPPs and MLPs of sizes (62/H1/10)

H1    6    14   22   30   38   46   54   62   70   78   86
CPP1  24,4 17,9 17,9 15,4 18,9 16,7 14,1 17,9 19,2 17,9 15,4
CPP2  21,8 14,1 14,1 14,1 12,8 10,3 11,5 10,3 12,8 9,0  11,5
CPP3  12,8 3,8  5,1  3,8  3,8  5,1  3,8  2,6  5,1  5,1  3,8

Table 3. Pe(%) versus [H1, H2] parameters for each TSC based on different CPPs and MLPs of sizes (62/H1/H2/10)

H1    10   10   15   15   25   25   40   40   60   60   70   70
H2    6    8    5    7    8    10   15   20   18   25   20   30
CPP1  28,2 24,4 23,1 25,6 19,2 19,2 19,2 19,2 17,9 17,9 15,4 15,4
CPP2  25,6 25,6 26,9 23,1 17,9 20,5 16,7 11,5 12,8 11,5 12,8 9,0
CPP3  15,4 10,3 15,4 12,8 7,7  5,1  6,4  5,1  5,1  5,1  1,3  5,1
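The k-NN core whose error rates appear in Table 1 can be sketched generically as follows. This is an illustrative toy implementation (2-D feature vectors stand in for the 62-element CPP outputs; class labels reuse the S0-S9 naming):

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=1):
    """Classify feature vector x by majority vote among the k
    training vectors closest in (squared) Euclidean distance."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    neighbors = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated classes.
train_X = [(0.0, 0.1), (0.2, 0.0), (1.0, 1.1), (0.9, 1.0)]
train_y = ["S0", "S0", "S3", "S3"]
print(knn_predict(train_X, train_y, (0.1, 0.0), k=3))   # S0
print(knn_predict(train_X, train_y, (1.0, 0.9), k=3))   # S3
```

Because k-NN has no training phase beyond storing the 93 training blobs, only k and the choice of CPP need tuning, which is exactly the sweep reported in Table 1.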
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, (2), 303-314.

Escalera, A. de la, et al. (2004). Visual sign information extraction and identification by deformable models for intelligent vehicles. IEEE Trans. on Intelligent Transportation Systems, (5) 2, 57-68.

Escalera, A. de la, et al. (2003). Traffic sign recognition and analysis for intelligent vehicles. Image and Vision Computing, (21), 247-258.

Haykin, S. (1999). Neural networks: A comprehensive foundation (2nd ed.). Prentice-Hall.

Hsu, S. H., & Huang, C. L. (2001). Road sign detection and recognition using matching pursuit method. Image and Vision Computing, (19), 119-129.

Kisienski, A. A., et al. (1975). Low-frequency approach to target identification. Proc. IEEE, (63), 1651-1659.

Maldonado-Bascon, S., Lafuente-Arroyo, S., Gil-Jimenez, P., Gomez-Moreno, H., & López-Ferreras, F. (2007). Road-sign detection and recognition based on support vector machines. IEEE Trans. on Intelligent Transportation Systems, (8) 2, 264-278.

Ng, H. F. (2004). Automatic thresholding for defect detection. IEEE Proc. Third Int. Conf. on Image and Graphics, 532-535.

Paulus, D. W. R., & Hornegger, J. (2003). Applied pattern recognition (4th ed.): Algorithms and implementation in C++. Vieweg.

Pérez, E., & Javidi, B. (2002). Nonlinear distortion-tolerant filters for detection of road signs in background noise. IEEE Trans. on Vehicular Technology, (51) 3, 567-576.

Rosenblatt, F. (1962). Principles of neurodynamics. Spartan Books.

Vicen-Bueno, R., Gil-Pita, R., Rosa-Zurera, M., Utrilla-Manso, M., & López-Ferreras, F. (2005a). Multilayer perceptrons applied to traffic sign recognition tasks. Lecture Notes in Computer Science, (3512), 865-872.

Vicen-Bueno, R., Gil-Pita, R., Jarabo-Amores, M. P., & López-Ferreras, F. (2005b). Complexity reduction in neural networks applied to traffic sign recognition tasks. 13th European Signal Processing Conference, EUSIPCO 2005.

KEY TERMS

Artificial Neural Networks (ANNs): A network of many simple processors (units or neurons) that imitates a biological neural network. The units are connected by unidirectional communication channels, which carry numeric data. Neural networks can be trained to find nonlinear relationships in data, and are used in applications such as robotics, speech recognition, signal processing or medical diagnosis.

Backpropagation Algorithm: Learning algorithm of ANNs, based on minimizing the error obtained from the comparison between the ANN outputs after the application of a set of network inputs and the desired outputs. The update of the weights is done according to the gradient of the error function evaluated at the point of the input space that corresponds to the input to the ANN.

Classification: The act of distributing things into classes or categories of the same type.

Detection: The perception that something has occurred or some state exists.

Information Extraction: The obtaining of the relevant aspects contained in data. It is commonly used to reduce the input space of a classifier.

Pattern: Observation vector that, for its relevance, is considered as an important example of the input space.

Preprocessing: Operation or set of operations applied to a signal in order to improve some aspects of it.
John Wang
Montclair State University, USA
INTRODUCTION

Today's e-commerce environment requires that interactive systems exhibit abilities such as autonomy, adaptive and collaborative behavior, and inferential capability. Such abilities are based on the knowledge about users and the tasks they perform (Raisinghani, Klassen and Schkade, 2001). To adapt to users' input and tasks, an interactive system must be able to establish a set of assumptions about users' profiles and task characteristics, which is often referred to as a user model. However, to develop a user model an interactive system needs to analyze users' input and recognize the tasks and the ultimate goals users are trying to achieve, which may involve a great deal of uncertainty.

Uncertainty refers to a set of values about a piece of assumption that cannot be determined during a dialog session. In fact, the problem of uncertainty in reasoning processes is a complex and difficult one. Information available for user model construction and reasoning is often uncertain, incomplete, and even vague. The propagation of such data through an inference model is also difficult to predict and control. Therefore, the capacity for dealing with uncertainty is crucial to the success of any knowledge management system.

Currently, a vigorous debate is in progress concerning how best to represent and process uncertainties in knowledge-based systems. This debate carries great importance because it is not only related to the construction of knowledge-based systems but also focuses on human thinking, in which most decisions are made under conditions of uncertainty. This chapter presents and discusses uncertainties in the context of user modeling in interactive systems. Some elementary distinctions between different kinds of uncertainties are introduced. The purpose is to provide an analytical overview and perspective concerning how and where uncertainties arise and the major methods that have been proposed to cope with them.

Sources of Uncertainties

User model based interactive systems face the problems of uncertainty in the inference rules, the facts, and the representation languages. There is no widely accepted definition about the presence of uncertainty in user modeling. However, the nature of uncertainty in a user model can be investigated through its origin. Uncertainty can arise from a variety of sources. Several authors have emphasized the need for differentiating among the types and sources of uncertainty. Some of the major sources are as follows:

(1) The imprecise and incomplete information obtained from the user's input. This type of source is related to the reliability of information, which involves the following aspects:

- Uncertain or imprecise information exists in the factual knowledge (Dutta, 2005). The contents of a user model involve uncertain factors. For instance, the system might want to assert "It is not likely that this user is a novice programmer." This kind of assertion might be treated as a piece of knowledge. But it is uncertain, and it seems difficult to find a numerical description for the uncertainty in this statement (i.e., there is no appropriate sample space in which to give this statement statistical meaning, if a statistical method is considered for capturing the uncertainty).

- The default information often brings uncertain factors to inference processes (Reiter, 1980). For example, the stereotype system carries extensive default assumptions about a user. Some assump-
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Interactive Systems and Sources of Uncertainties
tions may be subject to change as interaction (4) Deformation while transferring knowledge. There
progresses. might be no semantic correspondence between one
Uncertainty occurs as a result of ill-defined con- representation language to another. It is possible that
cepts in the observations or due to inaccuracy and there is even no appropriate representation for certain
poor reliability of the measurement (Kahneman expertise, for example, the measurement of users
and Tversky, 1982). For example, a user's typing mental workload. This makes the deformation of
speed could be considered as a measurement for a transformation inevitable. In addition, human factors
user's file editing skill. But for some applications greatly affect the procedure of information translation.
it may be questionable. Several tools that use cognitive models for knowledge
The uncertain exception to general assump- acquisition have been presented (Jacobson and Freil-
tions for performing some actions under some ing, 1988).
circumstances can cause conflicts in reasoning
processes.
CONCLUSION
(2) Inexact language by which the information is
conveyed. The second source of uncertainty is caused Generally, uncertainty affects the performance of an
by the inherent imprecision or inexactness of the adaptive interface in the following aspects and obvi-
representation languages. The imprecision appears in ously, the management of uncertainty must address all
both natural languages and knowledge representation of the following aspects (Chen and Norcio, 2001).
language. It has been proposed to classify three kinds
of inexactness in natural language (Zwick, 1999). The How to determine the degree to which the premise
first is generality, in which a word applies to a multi- of a given rule has been satisfied.
plicity of objects in the field or reference. For example, How to verify the extent to which external con-
the word table can apply to objects differing in size, straints have been met.
shape, materials, and functions. The second kind of How to propagate the amount of uncertain infor-
linguistic exactness is ambiguity, which appears when a mation through triggering of a given rule.
limited number of alternative meanings have the same How to summarize and evaluate the findings
phonetic form (e.g., bank). The third is vagueness, in provided by various rules or domain expertise.
which there are no precise boundaries to the meaning How to detect possible inconsistencies among
of the word (e.g., old, rich). the various sources and,
In knowledge representation languages employed How to rank different alternatives or different
in user modeling systems, if rules are not expressed goals.
in a formal language, their meaning usually cannot be
interpreted exactly. This problem has been partially
addressed by the theory of approximate reasoning. REFERENCES
Generally, a proposition (e.g., fact, event) is uncertain
if it involves a continuous variable. Note that an exact Barr, A. and Feigenbaum, E. A., The Handbook of Ar-
assumption may be uncertain (e.g., the user is able to tificial Intelligence 2. Los Altos, Kaufmann , 1982.
learn this concept), and an assumption that is absolutely
certain may be linguistically inexact (e.g. the user is Bhatnager, R. K. and Kanal, L. N., Handling Uncer-
familiar with this concept). tainty Information: A Review of Numeric and Nonnu-
meric Methods, Uncertainty in Artificial Intelligence,
(3) Aggregation or summarization of information. The Kanal, L. N. and Lemmer, J. F. (ed.), pp2-26, 1986.
third type of uncertainty source arises from aggrega- Bonissone, P. and Tong, R. M., Editorial: Reasoning
tion of information from different knowledge sources with Uncertainty in Expert Systems, International
or expertise (Bonissone and Tong, 2005). Aggregating Journal of Man-Machine Studies, Vol. 30, 69-111
information brings several potential problems that are (2005)
discussed in (Chen and Nocio 1997).
964
Buchanan, B. and Smith, R. G., Fundamentals of Expert Systems, Annual Review of Computer Science, Vol. 3, pp. 23-58, 1988.

Chen, Q. and Norcio, A. F., Modeling a User's Domain Knowledge with Neural Networks, International Journal of Human-Computer Interaction, Vol. 9, No. 1, pp. 25-40, 1997.

Chen, Q. and Norcio, A. F., Knowledge Engineering in Adaptive Interface and User Modeling, Human-Computer Interaction: Issues and Challenges, Chen, Q. (ed.), Idea Group Pub., 2001.

Cohen, P. R. and Grinberg, M. R., A Theory of Heuristic Reasoning about Uncertainty, AI Magazine, Vol. 4(2), pp. 17-23, 1983.

Dempster, A. P., Upper and Lower Probabilities Induced by a Multivalued Mapping, The Annals of Mathematical Statistics, Vol. 38(2), pp. 325-339, 1967.

Doyle, J., A Truth Maintenance System, Artificial Intelligence, Vol. 12, pp. 231-272, 1979.

Dubois, D. and Prade, H., Combination and Propagation of Uncertainty with Belief Functions -- A Reexamination, Proc. of the 9th International Joint Conference on Artificial Intelligence, pp. 111-113, 1985.

Dutta, A., Reasoning with Imprecise Knowledge in Expert Systems, Information Sciences, Vol. 37, pp. 2-24, 2005.

Garvey, T. D., Lowrance, J. D. and Fischler, M. A., An Inference Technique for Integrating Knowledge from Disparate Sources, Proc. of the 7th International Joint Conference on AI, Vancouver, B.C., pp. 319-325, 1981.

Heckerman, D., Probabilistic Interpretations for MYCIN's Certainty Factors, Uncertainty in Artificial Intelligence, Kanal, L. N. and Lemmer, J. F. (eds.), 1986.

Jacobson, C. and Freiling, M. J., ASTEK: A Multi-paradigm Knowledge Acquisition Tool for Complex Structured Knowledge, International Journal of Man-Machine Studies, Vol. 29, pp. 311-327, 1988.

Kahneman, D. and Tversky, A., Variants of Uncertainty, Cognition, Vol. 11, pp. 143-157, 1982.

McDermott, D. and Doyle, J., Non-Monotonic Logic I, Artificial Intelligence, Vol. 13, pp. 41-72, 1980.

Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, CA, 1988.

Pearl, J., Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach, Proc. of the 2nd National Conference on Artificial Intelligence, IEEE Computer Society, pp. 1-12, 1982.

Pednault, E. P. D., Zucker, S. W. and Muresan, L. V., On the Independence Assumption Underlying Subjective Bayesian Updating, Artificial Intelligence, Vol. 16, pp. 213-222, 1981.

Raisinghani, M., Klassen, C. and Schkade, L., Intelligent Software Agents in Electronic Commerce: A Socio-Technical Perspective, Human-Computer Interaction: Issues and Challenges, Chen, Q. (ed.), Idea Group Pub., 2001.

Reiter, R., A Logic for Default Reasoning, Artificial Intelligence, Vol. 13, pp. 81-132, 1980.

Rich, E., User Modeling via Stereotypes, Cognitive Science, Vol. 3, pp. 329-354, 1979.

Shafer, G., A Mathematical Theory of Evidence, Princeton University Press, 1976.

Zadeh, L. A., Review of Books: A Mathematical Theory of Evidence, AI Magazine, Vol. 5(3), pp. 81-83, 1984.

Zadeh, L. A., Knowledge Representation in Fuzzy Logic, IEEE Transactions on Knowledge and Data Engineering, Vol. 1, No. 1, pp. 89-100, 1989.

Zwick, R., Combining Stochastic Uncertainty and Linguistic Inexactness: Theory and Experimental Evaluation of Four Fuzzy Probability Models, International Journal of Man-Machine Studies, Vol. 30, pp. 69-111, 1999.

KEY TERMS

Interactive System: A system that allows dialogs between the computer and the user.

Knowledge Based System: A computer system that is programmed to imitate human problem-solving
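The questions listed in the Conclusion above can be made concrete with the classic MYCIN-style certainty-factor calculus, referenced here through Heckerman (1986). The sketch below is our own illustration, not code from this article, and the rule strengths are invented values; it addresses the first three questions: premise satisfaction, propagation through a rule, and combination of findings.

```python
# Hedged sketch: MYCIN-style certainty factors (CFs in [-1, 1]).

def premise_cf(*condition_cfs):
    """CF of a conjunctive premise: the weakest condition dominates."""
    return min(condition_cfs)

def propagate(rule_cf, premise):
    """CF of a conclusion: rule strength attenuated by its premise."""
    return rule_cf * max(0.0, premise)

def combine(cf1, cf2):
    """Combine two positive findings for the same hypothesis."""
    return cf1 + cf2 * (1.0 - cf1)

p = premise_cf(0.8, 0.6)    # premise satisfied only to degree 0.6
c1 = propagate(0.9, p)      # rule with strength 0.9 yields 0.54
c2 = propagate(0.5, 1.0)    # a second, fully satisfied rule yields 0.5
print(round(combine(c1, c2), 2))   # evidence accumulates: 0.77
```

Note how the combination rule is commutative and never exceeds 1, so findings from many rules can be merged incrementally, which is exactly the "summarize and evaluate the findings provided by various rules" step above.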
Intuitionistic Fuzzy Image Processing

George D. Sergiadis
Aristotle University of Thessaloniki, Greece
be uncertain, since some words may often mean different things to different people. Moreover, extracting the knowledge from a group of experts who do not all agree leads to consequents having a histogram of values associated with them. Additionally, data presented as inputs to an FLS, as well as data used for its tuning, are often noisy, thus bearing an amount of uncertainty. As a result, these uncertainties translate into additional uncertainties about FS membership functions. Finally, Atanassov et al. (Atanassov, Koshelev, Kreinovich, Rachamreddy & Yasemis, 1998) proved that there exists a fundamental justification for applying methods based on higher-order FSs to deal with everyday-life situations. Therefore, it comes as a natural consequence that such an extension should also be carried out in the field of digital image processing.

THE IFIP FRAMEWORK

In the quest for new theories treating imprecision, various higher-order extensions of FSs were proposed by different scholars. Among them, A-IFSs (Atanassov, 1986) provide a simple and flexible, yet solid, mathematical framework for coping with the intrinsic uncertainties characterizing real-world systems. A-IFSs are defined using two characteristic functions, namely the membership and the non-membership, which do not necessarily sum up to unity. These functions assign to elements of the universe corresponding degrees of belongingness and non-belongingness with respect to a set. The membership and non-membership values induce an indeterminacy index, which models the hesitancy in deciding the degree to which an element satisfies a particular property. In fact, it is this additional degree of freedom that provides us with the ability to efficiently model and minimize the effects of uncertainty due to the imperfect and/or imprecise nature of information.

Hesitancy in images originates from various factors, which in their majority are caused by inherent weaknesses of the acquisition and imaging mechanisms. Distortions that occur as a result of the limitations of the acquisition chain, such as quantization noise, the suppression of the dynamic range, or the nonlinear behavior of the mapping system, affect our certainty regarding the brightness or edginess of a pixel and therefore introduce a degree of hesitancy associated with the corresponding pixel. Moreover, dealing with qualitative rather than quantitative properties of images is one of the sound advantages of fuzzy-based techniques. Qualitative properties describe image attributes, such as the contrast and homogeneity of an image region or the edginess of a boundary, in a more natural and human-centric manner. However, as already pointed out, these terms are themselves imprecise and thus they additionally increase the uncertainty of image pixels. It is therefore a necessity, rather than a luxury, to employ A-IFS theory to cope with the uncertainty present in real-world images.

In order to apply the IFIP framework, images should first be expressed in terms of elements of A-IFS theory. Analyzing and synthesizing digital images to and from their corresponding intuitionistic fuzzy components is not a trivial task and can be carried out using either heuristic or analytic approaches.

Heuristic Modelling

As already stated, the factors introducing hesitancy in real-world images can be traced back to the acquisition stage of imaging systems and involve pixel degradation, mainly triggered by the presence of quantization noise generated by the A/D converters, as well as the suppression of the dynamic range caused by the imaging sensor. A main effect of quantization noise in images is that there exist a number of gray levels with zero, or almost zero, frequency of occurrence, while gray levels in their vicinity possess high frequencies. This is due to the fact that a gray level g in a digital image can become either (g+1) or (g-1) without any appreciable change in the visual perception.

An intuitive and heuristic approach to the modelling of the aforementioned sources of uncertainty in the context of A-IFSs was proposed by Vlachos & Sergiadis (2005, 2007 d) for gray-scale images, while an extension to color images was presented in Vlachos & Sergiadis (2006). The underlying idea involves the application of the concept of the fuzzy histogram of an image, which models the notion of the gray level "approximately g". The fuzzy histogram takes into account the frequency of neighboring gray levels to assess the frequency of occurrence of the gray level under consideration. Consequently, a quantitative measure of the quantization noise can be calculated as the normalized absolute difference between the ordinary (crisp) and fuzzy histograms.
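The fuzzy-histogram heuristic can be sketched in a few lines. The code below is our own illustration rather than the authors' implementation; the triangular membership shape and the neighbourhood width of two gray levels are assumed choices, not taken from Vlachos & Sergiadis.

```python
import numpy as np

def crisp_histogram(img, levels=256):
    """Ordinary histogram: frequency of occurrence of each gray level."""
    return np.bincount(img.ravel(), minlength=levels).astype(float)

def fuzzy_histogram(img, levels=256, width=2):
    """Frequency of 'approximately g': each pixel also counts, with a
    triangular membership, toward neighbouring gray levels."""
    crisp = crisp_histogram(img, levels)
    h = np.zeros(levels)
    for d in range(-width, width + 1):
        weight = 1.0 - abs(d) / (width + 1)   # triangular membership
        h += weight * np.roll(crisp, d)
    return h

def quantization_noise(img, levels=256):
    """Normalized absolute difference between the crisp and fuzzy
    histograms, used as a quantitative measure of quantization noise."""
    hc = crisp_histogram(img, levels)
    hf = fuzzy_histogram(img, levels)
    hc, hf = hc / hc.sum(), hf / hf.sum()
    return 0.5 * np.abs(hc - hf).sum()        # half L1 distance, in [0, 1]

# A sparsely quantized image (only even gray levels occur) scores higher
# than a smooth one, matching the zero-frequency-gap effect described above.
rng = np.random.default_rng(0)
smooth = rng.integers(0, 256, (64, 64)).astype(np.uint8)
sparse = (smooth // 2 * 2).astype(np.uint8)   # suppress all odd gray levels
print(quantization_noise(sparse) > quantization_noise(smooth))  # True
```

Because the measure is half the L1 distance between two normalized histograms, it is bounded by 1, which makes it convenient as a per-image hesitancy indicator.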
Finally, to further incorporate the additional distortion factors into the calculation of hesitancy, parameters are employed that model the influence of the dynamic range suppression and the fact that lower gray levels are more prone to noise than higher ones.

Analytic Modelling

The analytic approach offers a more generic treatment of the hesitancy modelling of digital images, since it does not require a priori knowledge of the system characteristics, nor a particular pre-defined image acquisition model. Generally, it consists of sequential operations that primarily aim to optimally transfer the image from the pixel domain (PD) to the intuitionistic fuzzy domain (IFD), where the appropriate actions will be performed, using the fuzzy domain (FD) as an intermediate step. After the modification of the membership and non-membership components of the image in the IFD, an inverse procedure is carried out
for transferring the image back to the PD. A block diagram illustrating the analytic modelling is given in Figure 1. Details on each of the aforementioned stages of IFIP are provided below.

Fuzzification

This constitutes the first stage of the IFIP framework, which assigns degrees of membership to image pixels with respect to an image property, such as brightness, homogeneity, or edginess. These properties are application dependent and also determine the operations to be carried out in the following stages of the IFIP framework. For the task of contrast enhancement, one may consider the brightness of gray levels and construct the corresponding FS "Bright pixel" or "Dark pixel" using different schemes that range from simple intensity normalization to more complex approaches involving knowledge extracted from a group of human experts (Figure 2).

Intuitionistic Fuzzification

Intuitionistic fuzzification is one of the most important stages of the IFIP architecture, since it involves the construction of the A-IFS that represents the image properties in the IFD. The analytic approach allows for an automated modelling of the hesitancy carried by image pixels, by rendering image properties directly from the FS obtained in the fuzzification stage through the use of intuitionistic fuzzy generators (Bustince, Kacprzyk & Mohedano, 2000). In order to construct an A-IFS that efficiently models a particular image property, tunable parametric intuitionistic fuzzy generators are utilized.

The underlying statistics of images are closely related to and soundly affect the process of hesitancy modelling. Different parameter values of the intuitionistic fuzzy generators produce different A-IFSs, and therefore alternative representations of the image in the IFD are possible. Consequently, an optimization criterion should be employed in order to select, from the multitude of possible representations, the parameter set that derives the A-IFS that optimally models the hesitancy of pixels. Such a criterion, which also encapsulates the image statistics, is the intuitionistic fuzzy entropy (Burillo & Bustince, 1996; Szmidt & Kacprzyk, 2001) of the image under consideration. Therefore, the set of parameters that produces the A-IFS with the maximum intuitionistic fuzzy entropy is considered optimal. We refer to this process of selection as the maximum intuitionistic fuzzy entropy principle (Vlachos & Sergiadis, 2007 d). The optimal parameter set is then used to construct membership and non-membership functions corresponding to the intuitionistic fuzzy components of the image in the IFD. This procedure is schematically illustrated in Figure 3.

Modification of Intuitionistic Fuzzy Components

This stage involves the actual processing of the intuitionistic fuzzy components of the image with respect to a particular property. Depending on the desired image task one is about to perform, suitable intuitionistic fuzzy operators are applied to both membership and non-membership functions.

Intuitionistic Defuzzification

After obtaining the modified intuitionistic fuzzy components of the image, these components must be combined to produce the processed image in the FD. This procedure involves the embedding of hesitancy into the membership function. To carry out this task, we utilize suitable parametric intuitionistic fuzzy operators that de-construct an A-IFS into an FS. It should be stressed that the final result soundly depends on the selected parameters of the aforementioned operators. Therefore, optimization criteria, such as the maximization of the index of fuzziness of the image, are employed to select the overall optimal parameters with respect to the considered image operation.

Defuzzification

The final stage of the IFIP framework involves the transfer of the processed fuzzy image into the PD. Depending on the desired image operation, various functions may be applied to carry out this task.

Applications

The IFIP architecture has been successfully applied to many image processing problems. Vlachos & Sergiadis (2007 d) exploited the potential of the framework in order to perform contrast enhancement on low-contrasted images. Different approaches were introduced, namely the intuitionistic fuzzy contrast intensification and the intuitionistic fuzzy histogram hyperbolization (IFHH). An extension of the IFHH technique to color images was proposed in Vlachos & Sergiadis (2007 b). Additionally, the effects of employing different intuitionistic fuzzification and intuitionistic defuzzification schemes on the performance of contrast enhancement algorithms were thoroughly studied and investigated in Vlachos & Sergiadis (2007) and (2006 b), respectively. Application of A-IFS theory to edge detection was also demonstrated in Vlachos & Sergiadis (2007 d), based on intuitionistic fuzzy similarity measures. The problem of image thresholding and segmentation in the context of IFIP was also addressed (Vlachos & Sergiadis, 2006 a) using novel intuitionistic fuzzy information measures. Under the general framework of IFIP, the notions of the intuitionistic fuzzy histograms of a digital image were introduced (Vlachos & Sergiadis, 2007 c) and their application to contrast enhancement was demonstrated (Vlachos & Sergiadis, 2007 a). Finally, the IFIP architecture was successfully applied in mammographic image processing (Vlachos & Sergiadis, 2007 d). Figure 4 illustrates the stages of IFIP in the case of the IFHH approach.

FUTURE TRENDS

Even though higher-order FSs have been widely applied to decision-making and pattern recognition problems, it seems that their application in the field of digital image processing is just beginning to develop. As a newly-introduced approach, the IFIP architecture remains a suggestive and challenging open field for future research. Therefore, it is expected that the IFIP framework will attract the interest of theoreticians and practitioners in the near future.

The proposed IFIP context bases its efficiency on the ability of A-IFSs to capture and render the hesitancy associated with image properties. Consequently, the analysis and synthesis of images in terms of elements of A-IFS theory plays a key role in the performance of the framework itself. Therefore, the stages of intuitionistic fuzzification and defuzzification need to be further studied from an application point of view, to provide meaningful ways of extracting and embedding hesitancy from and to images. Finally, the IFIP architecture should be extended to image processing tasks handled today by FS theory, in order to investigate and evaluate its advantages and particular merits.

CONCLUSION

This article describes an intuitionistic fuzzy architecture for the processing of digital images. The IFIP framework exploits the potential of A-IFSs to efficiently model the uncertainties associated with image pixels, as well as with the definitions of their properties. The proposed methodology provides alternative approaches for analyzing/synthesizing images to/from their intuitionistic fuzzy components. Application of the IFIP framework to diverse imaging domains demonstrates its efficiency compared to traditional image processing techniques. It is expected that the proposed context will provide theoreticians and practitioners with an alternative and challenging way to perceive and deal with real-world image processing problems.

REFERENCES

Atanassov, K.T. (1986). Intuitionistic Fuzzy Sets. Fuzzy Sets and Systems, 20 (1), 87-96.

Atanassov, K.T., Koshelev, M., Kreinovich, V., Rachamreddy, B., & Yasemis, H. (1998). Fundamental Justification of Intuitionistic Fuzzy Logic and of Interval-Valued Fuzzy Methods. Notes on Intuitionistic Fuzzy Sets, 4 (2), 42-46.

Bezdek, J.C., Keller, J., Krisnapuram, R., & Pal, N.R. (1999). Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Springer.

Burillo, P., & Bustince, H. (1996). Entropy on Intuitionistic Fuzzy Sets and on Interval-Valued Fuzzy Sets. Fuzzy Sets and Systems, 78 (3), 305-316.

Bustince, H., Kacprzyk, J., & Mohedano, V. (2000). Intuitionistic Fuzzy Generators: Application to Intuitionistic Fuzzy Complementation. Fuzzy Sets and Systems, 114 (3), 485-504.

Chi, Z., Yan, H., & Pham, T. (1996). Fuzzy Algorithms: With Applications to Image Processing and Pattern Recognition. World Scientific Publishing Company.

Mendel, J.M., & John, R.I.B. (2002). Type-2 Fuzzy Sets Made Simple. IEEE Transactions on Fuzzy Systems, 10 (2), 117-127.

Pal, S.K., & King, R.A. (1980). Image Enhancement Using Fuzzy Set. Electronics Letters, 16 (10), 376-378.

Pal, S.K., & King, R.A. (1981). Image Enhancement Using Smoothing with Fuzzy Sets. IEEE Transactions on Systems, Man, and Cybernetics, 11 (7), 495-501.

Pal, S.K., & King, R.A. (1982). A Note on the Quantitative Measurement of Image Enhancement Through Fuzziness. IEEE Transactions on Pattern Analysis and Machine Intelligence, 4 (2), 204-208.

Prewitt, J.M.S. (1970). Object Enhancement and Extraction. In Lipkin, B.S., & Rosenfeld, A. (Eds.), Picture Processing and Psychopictorics (pp. 75-149). Academic Press, New York.
Szmidt, E., & Kacprzyk, J. (2001). Entropy for Intuitionistic Fuzzy Sets. Fuzzy Sets and Systems, 118 (3), 467-477.

Vlachos, I.K., & Sergiadis, G.D. (2005). Towards Intuitionistic Fuzzy Image Processing. Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA 2005), Vienna, Austria.

Vlachos, I.K., & Sergiadis, G.D. (2006). A Heuristic Approach to Intuitionistic Fuzzification of Color Images. Proceedings of the 7th International FLINS Conference on Applied Artificial Intelligence (FLINS 2006), Genova, Italy.

Vlachos, I.K., & Sergiadis, G.D. (2006 a). Intuitionistic Fuzzy Information - Applications to Pattern Recognition. Pattern Recognition Letters, 28 (2), 197-206.

Vlachos, I.K., & Sergiadis, G.D. (2006 b). On the Intuitionistic Defuzzification of Digital Images for Contrast Enhancement. Proceedings of the 7th International FLINS Conference on Applied Artificial Intelligence (FLINS 2006), Genova, Italy.

Vlachos, I.K., & Sergiadis, G.D. (2007). A Two-Dimensional Entropic Approach to Intuitionistic Fuzzy Contrast Enhancement. Proceedings of the International Workshop on Fuzzy Logic and Applications (WILF 2007), Genova, Italy.

Vlachos, I.K., & Sergiadis, G.D. (2007 a). Hesitancy Histogram Equalization. Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2007), London, United Kingdom.

Vlachos, I.K., & Sergiadis, G.D. (2007 b). Intuitionistic Fuzzy Histogram Hyperbolization for Color Images. Proceedings of the International Workshop on Fuzzy Logic and Applications (WILF 2007), Genova, Italy.

Vlachos, I.K., & Sergiadis, G.D. (2007 c). Intuitionistic Fuzzy Histograms of an Image. Proceedings of the World Congress of the International Fuzzy Systems Association (IFSA 2007), Cancun, Mexico.

Vlachos, I.K., & Sergiadis, G.D. (2007 d). Intuitionistic Fuzzy Image Processing. In Nachtegael, M., Van der Weken, D., Kerre, E.E., & Philips, W. (Eds.), Soft Computing in Image Processing: Recent Advances (pp. 385-416). Series: Studies in Fuzziness and Soft Computing, 210, Springer.

Vlachos, I.K., & Sergiadis, G.D. (2007 e). The Role of Entropy in Intuitionistic Fuzzy Contrast Enhancement. Proceedings of the World Congress of the International Fuzzy Systems Association (IFSA 2007), Cancun, Mexico.

KEY TERMS

Crisp Set: A set defined using a characteristic function that assigns a value of either 0 or 1 to each element of the universe, thereby discriminating between members and non-members of the crisp set under consideration. In the context of fuzzy set theory, we often refer to crisp sets as classical or ordinary sets.

Defuzzification: The inverse process of fuzzification. It refers to the transformation of fuzzy sets into crisp numbers.

Fuzzification: The process of transforming crisp values into grades of membership corresponding to fuzzy sets expressing linguistic terms.

Fuzzy Logic: Fuzzy logic is an extension of traditional Boolean logic. It is derived from fuzzy set theory and deals with concepts of partial truth and reasoning that is approximate rather than precise.

Fuzzy Set: A generalization of the definition of the classical set. A fuzzy set is characterized by a membership function, which maps the members of the universe into the unit interval, thus assigning to elements of the universe degrees of belongingness with respect to a set.

Image Processing: Image processing encompasses any form of information processing for which the input is an image and the output is an image or a corresponding set of features.

Intuitionistic Fuzzy Index: Also referred to as hesitancy margin or indeterminacy index. It represents the degree of indeterminacy regarding the assignment of an element of the universe to a particular set. It is calculated as the difference between unity and the sum of the corresponding membership and non-membership values.

Intuitionistic Fuzzy Set: An extension of the fuzzy set. It is defined using two characteristic functions,
the membership and the non-membership, which do not necessarily sum up to unity. They attribute to each individual of the universe corresponding degrees of belongingness and non-belongingness with respect to the set under consideration.

Membership Function: The membership function of a fuzzy set is a generalization of the characteristic function of crisp sets. In fuzzy logic, it represents the degree of truth as an extension of valuation.

Non-Membership Function: In the context of Atanassov's intuitionistic fuzzy sets, it represents the degree to which an element of the universe does not belong to a set.
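The key terms above fit together in a few lines of code. The sketch below is our own illustration, not code from the article: the Sugeno-type intuitionistic fuzzy generator and the parameter value lam=0.5 are assumptions. It fuzzifies an image by intensity normalization, derives the non-membership with the generator, and recovers the intuitionistic fuzzy index as pi = 1 - mu - nu.

```python
import numpy as np

def fuzzify(img):
    """Fuzzification: map gray levels to membership degrees in [0, 1]
    for a fuzzy set such as "Bright pixel" (intensity normalization)."""
    img = img.astype(float)
    return (img - img.min()) / (img.max() - img.min())

def intuitionistic_fuzzify(mu, lam=0.5):
    """Intuitionistic fuzzification with a Sugeno-type fuzzy generator:
    nu = (1 - mu) / (1 + lam * mu). The leftover pi = 1 - mu - nu is the
    intuitionistic fuzzy index (hesitancy) of each pixel."""
    nu = (1.0 - mu) / (1.0 + lam * mu)
    pi = 1.0 - mu - nu
    return mu, nu, pi

img = np.array([[0, 128], [192, 255]], dtype=np.uint8)
mu, nu, pi = intuitionistic_fuzzify(fuzzify(img))
# Membership, non-membership and hesitancy always sum to one,
# and the crisp extremes (fully dark, fully bright) carry no hesitancy.
print(np.allclose(mu + nu + pi, 1.0))        # True
print(pi[0, 0] == 0.0 and pi[1, 1] == 0.0)   # True
```

Varying lam changes how much hesitancy is attributed to mid-range pixels, which is exactly the tunable-parameter freedom that the maximum intuitionistic fuzzy entropy principle is meant to resolve.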
KM Systems

Santiago Rodríguez
University of A Coruña, Spain

María Seoane
University of A Coruña, Spain

Sonia Suárez
University of A Coruña, Spain
proposal identifies a small set of descriptors that support the formalisation (making explicit) of the knowledge, although (i) its identification does not result from an exhaustive study and (ii) it does not enable a complete formalisation, as it is solely restricted to some generic properties.

The second step suggests that the whole process of KMS development should consider the technological as well as the human vision. The first is focused on how to obtain, store and share the relevant knowledge that exists within an organisation, by creating the Corporate Memory and the computer support system. The second vision involves not only the creation of a collaborative atmosphere within the organisation, in order to achieve the involvement of the workers in the KM program, but also the tendency to share their knowledge and use the knowledge already provided by other members.

Despite the previous fact, the vast majority of the analysed approaches are solely focused on the technological KM viewpoint, which jeopardises the success of a KMS (Andrade, 2003). In fact, among the previously mentioned proposals, only the Tiwana (2000) proposal explicitly considers the human viewpoint by including a specific phase for it.

As a result of both aspects, the current proposals are restricted to a set of generic guides for performing KM, which is quite different from the formal and detailed vision that is being demanded. In other words, the current approaches indicate what to do but not how to do it (a prescriptive viewpoint as opposed to a descriptive/procedural viewpoint). In this scenario the developers of this type of system have to elaborate their own ad hoc approach, achieving results that depend only on the experience and the capabilities of the development team.

DEVELOPMENT OF KNOWLEDGE MANAGEMENT SYSTEMS

This section presents a methodological framework for explicit KM that solves the previously mentioned problems. A study of the object to be managed has been performed for obtaining a knowledge formalisation schema, i.e., for knowing the relevant knowledge items and the characteristics/properties that should be made explicit. Using the results achieved after this study a

Proposed Formalisation Schema

Natural language is the language par excellence for sharing knowledge. Because of this, a good identification of all the elements necessary for conceptualising (understanding) the knowledge of any domain (and therefore those for which the respective formalisation mechanisms must be provided) can be done from the analysis of the different grammatical categories of natural language: nouns, adjectives, verbs, adverbs, locutions and other linguistic expressions. This study, whose detailed description and applications have been described in several works (Andrade, 2006; Andrade, 2008), reveals that all the identified conceptual elements can be assigned to the following knowledge levels according to their function within the domain:

- Static. It regards the structural or operative knowledge of the domain, meaning domain facts that are true and that can be used in some operations, such as concepts, properties, relationships and constraints.
- Dynamic. It is related to the performance of the domain, that is, functionality, action, process or control: inferences, calculations and step sequences. This level can be divided into two sublevels:
  - Strategic. It includes what to do, and when and in what order (i.e., step factorisation).
  - Tactical. It specifies how and when to obtain new operative knowledge (i.e., the description of a given step).

Every one of these levels approaches a different fragment of the organisation's knowledge, although they are all obviously interrelated; in fact, the strategic level controls the tactical one, as for every last-level/elemental step (strategic knowledge) the inferences and calculi must be indicated (tactical knowledge). Also, the level of operative knowledge is controlled by the other two, as it specifies not only how the bifurcation points or execution alternatives are decided (strategic knowledge), but also how inferences and calculi are done (tactical knowledge).

Therefore, a KMS must provide support to all these levels. As can be observed in Table 1, the main formalisation schema has been divided, on one hand, into
methodological framework for KMS creation has been several individual schemas corresponding to each one
defined. Both aspects are following discussed. of the identified knowledge levels and, on the other, into
KM Systems
a common one for the three levels, providing the global vision of the organisation's knowledge. The knowledge formalisation therefore involves a dynamic schema comprising the strategic and tactical individual schemas, a static schema comprising an operative schema, and a common schema describing the aspects that are shared regardless of the level. Every individual schema is in turn constituted by several components.

The catalogue of terms is a component common to all the schemas, providing synonyms and abbreviations for identifying every knowledge asset within the organisation. The strategic schema describes the functional splitting of every KMS operation and also each identified step. As the description varies depending on whether or not the step is terminal (elemental), two different components are needed to cover all the characteristics of this level. The approach (procedural or algorithmic, for instance) should be described in detail for every asset included in the catalogue of terminal steps. All this information is included in the catalogue of tactical steps. Lastly, the static schema is made up of the catalogue of concepts (including the identified concepts and their descriptions), the catalogue of relationships (describing the identified relationships and their meaning) and the catalogue of properties (covering the properties of the previously mentioned concepts and relationships).

The detailed description of this study, together with the descriptors of every component, can be found in (Andrade, 2008).

PROPOSED METHODOLOGICAL FRAMEWORK

The proposed process, whose basic structure is shown in Figure 1, has been elaborated bearing in mind the problems detected in the KM discipline and already mentioned throughout the present work.

Figure 1. Basic structure of the proposed process: (1) setting-up of the KM commission; (2) scope identification; (3) knowledge acquisition; (4) knowledge assimilation; (5) knowledge consolidation; (6) creation of the support system; (7) creation of the collaboration environment.

As can be seen in the figure, this process includes the following phases:

1. Setting-up of the KM commission. The management defines a KM commission for tracking and performing the KM project.

2. Scope identification. The problem to be approached is specified by determining where the present cycle of the KM project must have a bearing. To achieve this, the framework proposes the use of SWOT analysis (Strengths, Weaknesses, Opportunities, Threats), together with the proposal of Zack (1999).

3. Knowledge acquisition, including:

3.1. Identification of knowledge domains. The knowledge needs regarding the approached subject area are determined by means of different meetings involving the development team, the KM commission and the people responsible for every operation to be performed.

3.2. Capture of the relevant knowledge. The obtaining of all the possible knowledge related to the operation approached is based on the identified domains. This is done by means of:

(a) Identifying where the knowledge lies. The KM commission is in charge of identifying and providing the human and non-human knowledge sources that are going to be analysed.
(b) Determining the knowledge that has to be captured. As in the previous point, it is necessary to bear in mind the strategic, tactical and operative knowledge.
(c) Obtaining the knowledge.

Obviously, when the knowledge that is needed does not all exist within the organisation, it should be generated or imported.

4. Knowledge assimilation, comprising:

4.1. Knowledge conceptualisation. Its goal is the comprehension of the captured knowledge. It is recommended to start with the strategic knowledge and subsequently to focus on the tactical knowledge. As the strategic and tactical elements are understood, the elements of the operative level that arise must be assimilated.

4.2. Knowledge representation. The relevant knowledge has to be made explicit and formalised according to the components (Andrade, 2008) summarised in Table 1. This is one of the main distinguishing points of the proposal presented here, as the proposed formalisation schema indicates the specific descriptors needed for a correct and complete formalisation of the knowledge.

5. Knowledge consolidation, including:

5.1. Knowledge verification. In order to detect failures and omissions related to the represented knowledge, the following should be considered:

(a) Generic aspects. It has to be checked that every knowledge element (strategic, tactical and operative) is included in the catalogue of terms, that every term included there has been made explicit according to its type of knowledge, and that all the fields are completed.
(b) Strategic aspects. It should be verified that (i) every decision regarding an execution is made according to the existing operative knowledge, (ii) every last-level step is associated with an existing piece of tactical knowledge and (iii) every non-terminal step is correctly split. All these checks are performed by comparing the split tree with the content of the formalisation schema of the terminal strategic knowledge.
(c) Tactical aspects. It should be verified that the whole of the tactical knowledge is used in some of the last-level steps of the strategic knowledge and that any operative knowledge related to the tactical knowledge is available. In order to achieve this, the operative knowledge items will be represented as nodes within a knowledge map. This type of map enables the graphic visualisation of how new elements are obtained from the existing ones. Once the map has been drawn, it should be checked to ensure that the whole of the operative knowledge has been included.
(d) Operative aspects. It should be confirmed that (i) there are no isolated concepts, (ii) there are no attributes unrelated to a concept or a relationship, (iii) there are no relationships associating non-existing concepts or relationships and (iv) the whole of the operative knowledge is used in some of the tactical knowledge and/or in the decision making of the flow control of the strategic knowledge. To perform the first three verifications, a relationships diagram will be elaborated to show graphically the existing relationships among the different elements of the operative knowledge; the syntax of this type of diagram is analogous to that of the class diagrams used in object-oriented software development methodologies. The verification of the last point will be done by using a knowledge map; the execution structures included in the content of the formalisation schema for the last-level strategic knowledge related to every process (the remaining inferior levels are included in the superior one) will also be used in this verification. With these two graphic representations it can be verified that every operative element is included in at least one of the representations.

5.2. Knowledge validation. In order to validate the knowledge represented and verified, the development team, the KM commission and the involved parties will revise:

(a) The knowledge splitting tree
(b) The knowledge map
(c) The relationship diagram
(d) The functional splitting tree
(e) The content of the formalisation schema

6. Creation of the support system, which is divided into:

6.1. Definition of the incorporation mechanisms. The KM commission and the development team determine the most adequate incorporation type (passive, active or a combination of both) according to criteria such as financial considerations or the stored knowledge.

6.2. Definition of the notification mechanisms. The KM commission and the development team will establish the most suitable method for notifying newly included knowledge. The notification can be passive or active; even the absence of notification could be considered.

6.3. Definition of the mechanisms for knowledge localisation. Several alternatives, such as the need to include intelligent searches or meta-searches, are evaluated.

6.4. Development of the KM support system. It will be necessary to define and implement the corporate memory, the communication mechanisms and the applications for collaboration and team work.

6.5. Population of the corporate memory. Once the KM system has been developed, the knowledge captured, assimilated and consolidated will be included in the corporate memory.

7. Creation of the collaboration environment. The main goal of this phase is to promote and improve the contribution of knowledge and its subsequent use by the organisation. The risks involved in an unsuitable organisational culture or in inadequate tools for promotion and reward should be borne in mind. The following strategies should be followed instead:

- Considering the worth of employees according to their knowledge contribution to the organisation
- Supporting and rewarding the use of the organisation's existing knowledge
- Promoting relaxed dialogue among employees from different domains
- Promoting a good atmosphere among the employees
- Committing all the employees

FUTURE TRENDS

As has been indicated, the KM discipline remains in an immature stage due to an inadequate viewpoint: the absence of a strict study for determining the relevant knowledge and the characteristics that should be supported. This situation has led to an important lack of detail in the existing proposals for KMS development, which currently depend solely on the individual good work of the developers.

The present proposal offers a new viewpoint for developing this type of system. However, much remains to be done. As the authors are aware of the high degree of bureaucracy that might be needed to follow the present proposal strictly, it should be streamlined and characterised for specific domains. Nevertheless, this viewpoint could be considered the key to achieving specific ontologies for KM in every domain.

CONCLUSION

This article has presented a methodological framework for the development of KMS that, differently from the existing proposals, is based on a strict study of the knowledge to be managed. This characteristic provides a higher procedural level of detail than the current proposals, as the elements conceptually needed for understanding and representing the knowledge of any domain have been identified and formalised.

REFERENCES

…and Technology for Development (IASTED) AI and Soft Computing Conference.

Daniel, M., Decker, S., Domanetzki, A., Heimbrodt-Habermann, E., Höhn, F., Hoffmann, A., Röstel, H., Studer, R., & Wegner, R. (1997). ERBUS: Towards a Knowledge Management System for Designers. Proc. of the Knowledge Management Workshop at the 21st Annual German AI Conference.

Holsapple, C., & Joshi, K. (1997). Knowledge Management: A Three-Fold Framework. Kentucky Initiative for Knowledge Management Paper, no. 104.

Junnarkar, B. (1997). Leveraging Collective Intellect by Building Organizational Capabilities. Expert Systems with Applications, 13(1), 29-40.

Liebowitz, J., & Beckman, T. (1998). Knowledge Organizations: What Every Manager Should Know. CRC Press.

Maté, J.L., Paradela, L.F., Pazos, J., Rodríguez-Patón, A., & Silva, A. (2002). MEGICO: An Intelligent Knowledge Management Methodology. Lecture Notes in Artificial Intelligence, 2473, 102-107.

Staab, S., & Schnurr, H.P. (1999). Knowledge and Business Processes: Approaching an Integration. Workshop on Knowledge Management and Organizational Methods, IJCAI'99.
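The operative-aspects checks of phase 5.1 lend themselves to mechanical verification. The sketch below is a hypothetical illustration only: all names are invented, and the representation is far simpler than the formalisation schema described above. It shows how checks (i)-(iii) could be run over a set of concepts, relationships and properties:

```python
# Hypothetical sketch of the operative-aspects checks from phase 5.1:
# (i) no isolated concepts, (ii) no attributes unrelated to a concept or
# relationship, (iii) no relationships over non-existing elements.

def verify_operative(concepts, relationships, attributes):
    """Return a list of problems found in the operative knowledge.

    concepts:      set of concept names
    relationships: dict name -> (source, target) pair of element names
    attributes:    dict attribute name -> owning concept or relationship
    """
    problems = []
    valid_owners = concepts | set(relationships)
    # (iii) relationships must associate existing concepts or relationships
    for name, (src, dst) in relationships.items():
        for end in (src, dst):
            if end not in valid_owners:
                problems.append(f"relationship '{name}' refers to unknown '{end}'")
    # (i) a concept is isolated if no relationship touches it
    touched = {end for ends in relationships.values() for end in ends}
    for concept in concepts:
        if concept not in touched:
            problems.append(f"concept '{concept}' is isolated")
    # (ii) every attribute must belong to a known concept or relationship
    for attr, owner in attributes.items():
        if owner not in valid_owners:
            problems.append(f"attribute '{attr}' has unknown owner '{owner}'")
    return problems
```

Check (iv), the use of every operative element in the tactical knowledge or in the strategic flow control, would additionally require the knowledge map described in the text.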
Rafael García
University of A Coruña, Spain
María Seoane
University of A Coruña, Spain
Sonia Suárez
University of A Coruña, Spain
Copyright © 2009, IGI Global. Distributing in print or electronic forms without written permission of IGI Global is prohibited.
Knowledge Management Tools
The analysis included thirteen tools (Table 1), all of them addressing at least two of the previously mentioned aspects. It should be highlighted that all the tools implement the Corporate Memory as a document warehouse, while the Yellow Pages appear as a telephone directory.

Results Evaluation

[…] the adequate knowledge that the user might need at a given moment. Therefore, and as has been pointed out previously, for the best use of the knowledge it should be somehow structured. The communication supports are also quite important.

The characteristics of a KM support tool should then necessarily be defined, together with a guide for approaching them.
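As a toy illustration of the difference between a plain document warehouse and a structured Corporate Memory, knowledge items can be stored with explicit attributes and retrieved by them. This is only a hypothetical sketch; the class and field names are invented and do not come from any of the analysed tools:

```python
# Hypothetical sketch: a Corporate Memory holding structured knowledge
# items (rather than opaque documents), retrievable by their attributes.
from dataclasses import dataclass, field


@dataclass
class KnowledgeItem:
    title: str
    level: str                      # "strategic", "tactical" or "operative"
    body: str
    keywords: set = field(default_factory=set)


class CorporateMemory:
    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def by_level(self, level):
        # structured retrieval: filter on an explicit attribute
        return [i for i in self._items if i.level == level]

    def by_keyword(self, word):
        return [i for i in self._items if word in i.keywords]
```

A plain document warehouse would only support full-text lookup over `body`; the explicit `level` and `keywords` attributes are what make level-based and keyword-based retrieval possible.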
1.1. Knowledge formalisation. Before being included in the Corporate Memory, the knowledge has to be formalised by determining not only the relevant knowledge but also the attributes that describe it. When performing this formalisation, it should be borne in mind that there are two types of knowledge. On the one hand, the Corporate Memory must include the knowledge needed to describe the operations for performing an organisational task. On the other hand, it is necessary to capture the knowledge that the individuals have acquired through their experience over time. This markedly heuristic knowledge is known as lessons learned: positive as well as negative experiences that can be used for improving the future performance of the organisation (Van Heijst, 1997), and therefore for refining its current knowledge.

a. Organisational knowledge (Andrade et al., 2003b). A KM system should consider different types of knowledge when structuring the relevant knowledge associated with the operations that exist in the organisation:
- Strategic or control knowledge: it indicates not only what to do, but also why, where and when. For that reason, the constituents of the functional decomposition of every operation should be identified.
- Tactical knowledge: it specifies how and under what circumstances the tasks are done. This type of knowledge is associated with the execution process of every last-level strategic step.

b. Lessons learned. These relate to the experience and the knowledge that the individuals have with regard to their tasks. They provide the person who possesses them with the ability to refine both the processes followed at work and the already existing knowledge, in order to be more efficient. It is therefore appropriate to create lessons-learned systems (Weber, 2001) in order to preserve this type of knowledge.

1.2. Incorporation mechanisms. The knowledge can be incorporated in an active or a passive way (Andrade et al., 2003c). Active incorporation is based on the existence of a KM group in charge of looking after the quality of the knowledge that is going to be incorporated. This guarantees the quality of the knowledge included in the Corporate Memory, but it also takes up human resources. In passive incorporation, in contrast, there is no group for quality evaluation: the individual willing to share knowledge and experience is responsible for ensuring that the contribution fulfils the minimum requirements of quality and relevancy. The main advantage of this second alternative is that it does not take up additional resources. Bearing in mind these considerations, active knowledge incorporation is preferred whenever possible, as in that way the quality and relevancy of the knowledge are guaranteed.

1.3. Notification mechanisms. All the members of the organisation should be informed when a new piece of knowledge is incorporated, as this enables the refinement of their own knowledge. The step prior to the notification is the definition of the group of people that will be informed of the appearance of a new knowledge item. There are two alternatives (García et al., 2003): subscription, where every individual in the organisation may take out a subscription to certain preferred specific issues, and spreading, where the notification messages reach the workers without previous request. In spreading, the messages can be sent to all the members of the organisation, but this is not advisable, as the receivers would not be able to discern which of the vast number of messages received might be interesting to them. Another spreading possibility relies on an individual or a group in charge of determining the addressees for every given message; this last option is quite convenient for the members of the organisation, but it takes up a vast amount of resources, which have to hold a lot of information regarding the interests of every one of the members.

1.4. Localisation mechanisms. The tool should be provided with some search mechanism in order to obtain the maximum possible profit from the captured and incorporated knowledge (Tiwana, 2000). It is necessary to reach a balance between efficiency and functionality, as enough search options should be available without increasing the system complexity. For this reason, the following search mechanisms are suggested:
- Hierarchy search: this search catalogues the knowledge into a fixed hierarchy, in such a way that the user can move through a group of links to refine the search performed.
- Attribute search: this is based on the specification of terms in which the user is interested, returning the knowledge elements that contain those terms. This type of search provides more general results than the previous one.

2. Yellow Pages. A KM system should not try to capture and assimilate the whole of the knowledge that exists in the organisation, as this would not be feasible. Therefore, the Yellow Pages are used for listing not only the systems that store knowledge, but also the individuals that hold additional knowledge. They are elaborated after determining the knowledge possessed by every individual in the organisation or by any other non-human agent.

3. Collaboration and communication mechanisms. In organisations, knowledge is shared and distributed regardless of whether the process is automated or not. Technology helps the interchange of knowledge and ideas among the members of the organisation, as it enables bringing the best possible knowledge within reach of the individual who requires it. The collaboration and communication mechanisms detected are the following:

3.1. Asynchronous communication. This does not require both ends of the communication to be connected at the same time.
- E-mail. Electronic messaging enables the interchange of text and/or any other type of document between two or several users.
- Forum. It consists of a Web page where the participants leave questions that do not have to be answered at that very moment. Other participants leave the answers which, together with the questions, can be seen by anyone entering the forum at any moment.
- Suggestion box. It enables sending suggestions or comments on any relevant aspect of the organisation to the adequate person or department.
- Notice board. It is a common space where the members of the organisation can publish announcements of public interest.

3.2. Synchronous communication. This type of interactive technology is based on real-time communication. Some of the most important systems are the following:
- Chat. It implies communication among several people through the computer; all the people connected can follow the communication, express an opinion, contribute ideas, and make or answer questions whenever they decide.
- Electronic board. It provides the members of the organisation with a shared space, where everybody can draw or write, for improving the interchange of ideas.
- Audio conference. Two or more users can communicate by voice in real time.
- Video conference. Two or more users can communicate by image in real time.

FUTURE TRENDS

As has been mentioned before, no current KM tool adequately covers the organisational needs. This problem has been approached in the present work by trying to determine the functionality that any of these tools should incorporate. This is a first step that should be complemented by subsequent works, as it is necessary to go deeper and determine better how to approach and implement the specified aspects.

CONCLUSION

Knowledge, whether managed or not, is transmitted within organisations, although its existence does not imply its adequate use. There is a vast amount of knowledge whose access is extremely difficult; this means that there are items from which no return is being obtained and which are lost within the organisation. KM represents the effort of capturing and obtaining benefits from the collective experience of the organisation by making it accessible to any of its members. However, it can be stated that no current tool is able to perform this task efficiently: although the so-called KM tools exist, they merely store documents and none of them structures the relevant knowledge for its best use.

In order to palliate such problems, the present work proposes an approach based on a market study. It is also based on the KM definition, which indicates how to proceed and defines the characteristics that a tool should have in order to act as a facilitator of an adequate and explicit Knowledge Management.

REFERENCES

Andrade, J., Ares, J., García, R., Rodríguez, S., Silva, A., & Suárez, S. (2003a): Knowledge Management Systems Development: A Roadmap. Lecture Notes in Artificial Intelligence, 2775, 1008-1015.

Andrade, J., Ares, J., García, R., Rodríguez, S., & Suárez, S. (2003b): Lessons Learned for the Knowledge Management Systems Development. In Proceedings of the 2003 IEEE International Conference on Information Reuse and Integration. Las Vegas (USA).

Brooking, A. (1996): Intellectual Capital: Core Asset for the Third Millennium Enterprise. International Thomson Business Press. London (UK).

Davenport, T. H., & Prusak, L. (2000): Working Knowledge: How Organizations Manage What They Know. Harvard Business School Press. Boston (USA).

García, R., Rodríguez, S., Seoane, M., & Suárez, S. (2003): Approach to the Development of Knowledge Management Systems. In Proceedings of the 10th International Congress on Computer Science Research. Morelos (Mexico).

Huang, K. T., Lee, Y. W., & Wang, R. Y. (1999): Quality Information and Knowledge. Prentice-Hall PTR. New Jersey (USA).

Liebowitz, J., & Beckman, T. (1998): Knowledge Organizations: What Every Manager Should Know. CRC Press. Florida (USA).

Nonaka, I., & Takeuchi, H. (1995): The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation. Oxford University Press. New York (USA).

Rodríguez, S. (2002): Un marco metodológico para la Knowledge Management y su aplicación a la Ingeniería de Requisitos Orientada a Perspectivas [A methodological framework for Knowledge Management and its application to Perspective-Oriented Requirements Engineering]. PhD Dissertation. School of Computer Science. University of A Coruña (Spain).

Stein, E. W. (1995): Organizational Memory: Review of Concepts and Recommendations for Management. International Journal of Information Management, 15(2), 17-32.

Tiwana, A. (2000): The Knowledge Management Toolkit. Prentice Hall.

Van Heijst, G., Van der Spek, R., & Kruizinga, E. (1997): Corporate Memories as a Tool for Knowledge Management. Expert Systems with Applications, 13(1), 41-54.

Weber, R., Aha, D. W., & Becerra-Fernandez, I. (2001): Intelligent Lessons Learned Systems. International Journal of Expert Systems Research and Applications, 20(1), 17-34.

Wiig, K., de Hoog, R., & Van der Spek, R. (1997): Supporting Knowledge Management: A Selection of Methods and Techniques. Expert Systems with Applications, 13(1), 15-27.

Wiig, K. (1993): Knowledge Management Foundations: Thinking about Thinking. How People and Organizations Create, Represent and Use Knowledge. Schema Press. Texas (USA).

Wiig, K. (1995): Knowledge Management Methods: Practical Approaches to Managing Knowledge. Schema Press. Texas (USA).

KEY TERMS

Communication & Collaboration Tool: Systems that enable collaboration and communication among the members of an organisation (e.g. chat applications, whiteboards).

Document Management: The computerised management of electronic, as well as paper-based, documents.

Institutional Memory: The physical storage of the knowledge held in an organisation.

Knowledge: Pragmatic level of information that provides the capability of dealing with a problem or making a decision.

Knowledge Management: Discipline that intends to provide, at its most suitable level, the accurate information and knowledge for the right people, whenever they may need it and at their best convenience.

Knowledge Management Tool: Organisational system that connects people with the information and communication technologies, with the purpose of improving the sharing and distribution processes of the organisational knowledge.

Lesson Learned: Specific experience, positive or negative, of a certain domain. It is obtained in a practical context and can be used during future activities in similar contexts.

Yellow Page: It stores information about a human or non-human source that has additional and/or specialized knowledge about a particular subject.
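The subscription and spreading notification mechanisms described in section 1.3 can be sketched as a minimal publish/subscribe model. This is a hypothetical illustration; all names are invented:

```python
# Hypothetical sketch of the two notification alternatives of section 1.3:
# subscription (members register interest in topics) versus spreading
# (messages are pushed to every member, without previous request).
from collections import defaultdict


class Notifier:
    def __init__(self):
        self._subscribers = defaultdict(set)   # topic -> interested members
        self._everyone = set()                 # all registered members

    def register(self, member):
        self._everyone.add(member)

    def subscribe(self, member, topic):
        self.register(member)
        self._subscribers[topic].add(member)

    def notify(self, topic, spreading=False):
        """Return the set of members to notify about a new item on `topic`."""
        if spreading:
            return set(self._everyone)         # push to all members
        return set(self._subscribers[topic])   # only interested members
```

The drawback of spreading that the text points out shows up directly: `notify(topic, spreading=True)` returns every member, regardless of whether the topic interests them.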
Knowledge-Based Systems
Adrian A. Hopgood
De Montfort University, UK
Figure 2. An inference network for the boiler monitoring example: rules combine conditions such as "valve open" AND "flow high", or "pressure high" AND "valve closed", to infer their effects (e.g. steam escaping).
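An inference network of this kind can be written as if-then rules and evaluated by forward chaining. The sketch below is hypothetical: the first rule (valve open and flow high imply steam escaping) is taken from the text's rule1, while the second is an invented example in the spirit of the text's rule2:

```python
# Hypothetical sketch of forward chaining over Figure 2-style rules.
# Each rule is a (conditions, conclusion) pair; a rule fires when all of
# its conditions are in the current set of facts.
RULES = [
    ({"valve open", "flow high"}, "steam escaping"),         # rule1, from text
    ({"steam escaping", "valve stuck"}, "boiler blockage"),  # invented example
]


def forward_chain(facts):
    """Repeatedly fire rules whose conditions hold until nothing new is added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts
```

Starting from the causes "valve open" and "flow high", forward chaining derives the effect "steam escaping", which is the process of deduction described below.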
approach is appropriate when a more tightly focused solution is required, such as the generation of a plan for a particular goal. In the example of a boiler monitoring system, forward chaining would lead to the reporting of any recognised problems. In contrast, backward chaining might be used to diagnose a specific mode of failure by linking a logical sequence of inferences, disregarding unrelated observations.

The rules that make up the inference network in Figure 2 are used to link cause and effect:

if <cause> then <effect>

Using the inference network, an inference can be drawn that if the valve is open and the flow rate is high (the causes) then steam is escaping (the effect). This is the process of deduction. Many problems, such as diagnosis, involve reasoning in the reverse direction, i.e. the user wants to ascertain a cause, given an effect. This is abduction. Given the observation that steam is escaping, abduction can be used to infer that the valve is open and the flow rate is high. However, this is only a valid conclusion if the inference network shows all of the circumstances in which steam may escape. This is the closed-world assumption.

If many examples of cause and effect are available, the rule (or inference network) that links them can be inferred. For instance, if every boiler blockage ever seen was accompanied by steam escaping and a stuck valve, then rule2 above might be inferred from those examples. Inferring a rule from a set of example cases of cause and effect is termed induction. Hopgood (2001) summarizes deduction, abduction, and induction as follows:

deduction: cause + rule → effect
abduction: effect + rule → cause
induction: cause + effect → rule

Logic Programming

Logic programming describes the use of logic to establish the truth, or otherwise, of a proposition. It is, therefore, an underlying principle for rule-based systems. Although various forms of logic programming have been explored, the most commonly used is the Prolog language (Bramer, 2005), which embodies the features of backward chaining, pattern matching, and list manipulation.

The Prolog language can be programmed declaratively, although an appreciation of the procedural behavior of the language is needed in order to program it effectively. Prolog is suited to symbolic problems, particularly logical problems involving relationships between items. It is also suitable for tasks that involve data lookup and retrieval, as pattern matching is fundamental to the functionality of the language.

Symbolic Computation

A knowledge base may contain a mixture of numbers, letters, words, punctuation, and complete sentences. These symbols need to be recognised and processed by the inference engine. Lists are a particularly useful
data structure for symbolic computation, and they are integral to the AI languages Lisp and Prolog. Lists allow words, numbers, and symbols to be combined in a wide variety of ways. A list in the Prolog language might look like this:

[animal, [cat, dog], vegetable, mineral]

where this example includes a nested list, i.e. a list within a list. In order to process lists or similar structures, the technique of pattern matching is used. For example, the above list in Prolog could match the list

[animal, [_, X], vegetable, Y]

where the variables X and Y would be assigned the values dog and mineral respectively. This pattern-matching capability is the basis of an inference engine's ability to process rules, facts and evolving knowledge.

Uncertainty

The examples considered so far have all dealt with unambiguous facts and rules, leading to clear conclusions. In real life, the situation can be complicated by three forms of uncertainty.

Uncertainty in the Rule Itself

For example, rule1 (above) stated that an open valve and a high flow rate lead to an escape of steam. However, if the boiler has entered an unforeseen mode, it may be that these conditions do not lead to an escape of steam. The rule ought really to state that an open valve and a high flow rate will probably lead to an escape of steam.

Uncertainty in the Evidence

There are two possible reasons why the evidence upon which a rule is based may be uncertain. First, the evidence may come from a source that is not totally reliable […]

Use of Vague Language

Rule1, above, is based around the notion of a high flow rate. There is uncertainty over whether "high" means a flow rate of the order of 1 cm³s⁻¹ or 1 m³s⁻¹.

Two popular techniques for handling the first two sources of uncertainty are Bayesian updating and certainty theory (Hopgood, 2001). Bayesian updating has a rigorous derivation based upon probability theory, but its underlying assumptions, e.g., the statistical independence of multiple pieces of evidence, may not be true in practical situations. Certainty theory does not have a rigorous mathematical basis, but has been devised as a practical and pragmatic way of overcoming some of the limitations of Bayesian updating. It was first used in the classic MYCIN system for diagnosing infectious diseases (Buchanan, 1984). Other approaches are reviewed in (Hopgood, 2001), where it is also proposed that a practical non-mathematical approach is to treat rule conclusions as hypotheses that can be confirmed or refuted by the actions of other rules. Possibility theory, or fuzzy logic, allows the third form of uncertainty, i.e. vague language, to be used in a precise manner.

Decision Support and Analysis

Decision support and analysis (DSA) and decision support systems (DSSs) describe a broad category of systems that involve generating alternatives and selecting among them. Web-based DSA, which uses external information sources, is becoming increasingly important. Decision support systems that use artificial intelligence techniques are sometimes referred to as intelligent DSSs.

One clearly identifiable family of intelligent DSS is expert systems, described above. An expert system may contain a mixture of simple rules based on experience and observation, known as heuristic or shallow rules, and more fundamental or deep rules. For example, an expert system for diagnosing car breakdowns may
able. For example, in rule1 there may be an element of contain a heuristic that suggests checking the battery
doubt whether the flow rate is high, as this information if the car will not start. In contrast, the expert system
relies upon a meter of unspecified reliability. Second, might also contain deep rules, such as Kirchoffs laws,
the evidence itself may have been derived by a rule which apply to any electrical circuit and could be used
whose conclusion was probable rather than certain. in association with other rules and observations to
diagnose any electrical circuit. Heuristics can often
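The Prolog-style pattern matching described under Symbolic Computation can be sketched in Python. The `Var` class and `match` function are illustrative names invented for this sketch, not part of Prolog or of any particular inference engine:

```python
class Var:
    """A named logic variable; '_' acts as the anonymous wildcard."""
    def __init__(self, name):
        self.name = name

def match(pattern, term, bindings=None):
    """Return variable bindings if pattern matches term, else None."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, Var):
        if pattern.name == "_":                  # anonymous variable: always matches
            return bindings
        if pattern.name in bindings:             # already bound: must agree
            return bindings if bindings[pattern.name] == term else None
        return {**bindings, pattern.name: term}  # bind the variable
    if isinstance(pattern, list) and isinstance(term, list):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None

# [animal, [_, X], vegetable, Y] against [animal, [cat, dog], vegetable, mineral]
result = match(["animal", [Var("_"), Var("X")], "vegetable", Var("Y")],
               ["animal", ["cat", "dog"], "vegetable", "mineral"])
print(result)  # {'X': 'dog', 'Y': 'mineral'}
```

As in Prolog, a variable that is already bound must match consistently, so `[X, X]` fails against `[a, b]`.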
This development will require progress in simulating behaviors that humans take for granted, specifically perception, recognition, language, common sense, and adaptability. To build an intelligent system that spans the breadth of human capabilities is likely to require a hybrid approach using a combination of artificial intelligence techniques.

REFERENCES

Bergmann, R., Althoff, K.-D., Breen, S., Göker, M., Manago, M., Traphöner, R., & Wess, S. (2003). Developing Industrial Case-Based Reasoning Applications: The INRECA Methodology (2nd edition). Lecture Notes in Artificial Intelligence, Vol. 1612. Springer.

Bramer, M. A. (2005). Logic Programming with Prolog. Springer-Verlag, London.

Brüninghaus, S., & Ashley, K. D. (2003). Combining case-based and model-based reasoning for predicting the outcome of legal cases. Lecture Notes in Artificial Intelligence, 2689, 65-79.

Buchanan, B. G., & Shortliffe, E. H. (1984). Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Addison-Wesley.

Choy, K. W., Hopgood, A. A., Nolle, L., & O'Neill, B. C. (2003). Design and implementation of an inter-process communication model for an embedded distributed processing network. International Conference on Software Engineering Research and Practice (SERP'03), Las Vegas, 239-245.

Choy, K. W., Hopgood, A. A., Nolle, L., & O'Neill, B. C. (2004). Implementation of a tileworld testbed on a distributed blackboard system. 18th European Simulation Multiconference (ESM2004), Magdeburg, Germany, 129-135.

De Koning, K., Bredeweg, B., Breuker, J., & Wielinga, B. (2000). Model-based reasoning about learner behaviour. Artificial Intelligence, 117, 173-229.

Fenton, W. G., McGinnity, T. M., & Maguire, L. P. (2001). Fault diagnosis of electronic systems using intelligent techniques: a review. IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, 31, 269-281.

Hopgood, A. A. (2001). Intelligent Systems for Engineers and Scientists (2nd edition). CRC Press, Boca Raton.

Hopgood, A. A. (2003). Artificial intelligence: hype or reality? IEEE Computer, 6, 24-28.

Hopgood, A. A. (2005). The state of artificial intelligence. Advances in Computers, 65, 1-75.

Li, G., Hopgood, A. A., & Weller, M. J. (2003). Shifting Matrix Management: a model for multi-agent cooperation. Engineering Applications of Artificial Intelligence, 16, 191-201.

Mateis, C., Stumptner, M., & Wotawa, F. (2000). Locating bugs in Java programs: first results of the Java diagnosis experiments project. Lecture Notes in Artificial Intelligence, 1821, 174-183.

Montani, S., Magni, P., Bellazzi, R., Larizza, C., Roudsari, A. V., & Carson, E. R. (2003). Integrating model-based decision support in a multi-modal reasoning system for managing type 1 diabetic patients. Artificial Intelligence in Medicine, 29, 131-151.

Moss, N. G., Hopgood, A. A., & Weller, M. J. (2007). Can agents without concepts think? An investigation using a knowledge-based system. Proc. AI-2007: 27th SGAI International Conference on Artificial Intelligence, Cambridge, UK.

Yang, B. S., Lim, D. S., & Tan, A. C. C. (2005). VIBEX: an expert system for vibration fault diagnosis of rotating machinery using decision tree and decision table. Expert Systems with Applications, 28(4), 735-742.

KEY TERMS

Backward Chaining: Rules are applied through depth-first search of the rule base to establish a goal. If a line of reasoning fails, the inference engine must backtrack and search a new branch of the search tree. This process is repeated until the goal is established or all branches have been explored.
Ghislain Deleplace
Université de Paris VIII, LED, France

Patrice Gaubert
Université de Paris 12, ERUDITE, France

Lucien Gillard
CNRS, LED, France

Madalina Olteanu
Université de Paris I, CES SAMOS, France
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Kohonen Maps and TS Algorithms: Clear Convergence
[Figure: profiles of the six classes (class 1 to class 6) for the exchange rates in Paris, Hamburg and London: pound in francs, pound in marks and mark in francs, together with the P/F - P/M and M/F series.]
sharply contrasted by the levels of the exchange rates. Years 1821-23 and 1826 (class 5) are marked by a low mark/franc exchange rate and high gold-silver prices, the Hamburg one being higher than the Paris one; years 1854-60 (class 4) are marked by a high mark/franc exchange rate and low gold-silver prices, the Hamburg one being below the Paris one.
These remarks, which also apply respectively to the rest of the 1820s (class 3) and to the rest of the 1850s (class 6), are consistent with historical analysis: while the Hamburg mark was always anchored to silver, the French franc was during the 1820s and 1850s anchored to gold (in contrast with the 1830s and 1840s when it was anchored to silver); it is then normal that the mark
depreciated against the franc when silver depreciated against gold, and more in Hamburg than in Paris (as in classes 5 and 3), and that the mark appreciated against the franc when silver appreciated against gold, and more in Hamburg than in Paris (as in classes 4 and 6).

A MODEL FOR THE SPREAD BETWEEN THE HIGHEST AND THE LOWEST GOLD-SILVER PRICE

An Autoregressive Markov Switching Model

The key assumption is that the time series to be modeled follow a different pattern or a different model according to some unobserved, finite-valued process. Usually, the unobserved process is a Markov chain whose states are called regimes, while the observed series follows a linear autoregressive model whose coefficients depend on the current regime.
Let us put this in a mathematical language. Suppose that (y_t), t ∈ Z, is the observed time series and that the unobserved process (x_t), t ∈ Z, is a two-state Markov chain with transition matrix

A = ( p     1−q )
    ( 1−p   q   ),  where p, q ∈ ]0,1[    (1)

Then, assuming that y_t depends on the first l lags of time, we have the following equation of the model:

y_t = a_0^{x_t} + a_1^{x_t} y_{t−1} + ... + a_l^{x_t} y_{t−l} + σ^{x_t} ε_t    (2)

where a_i^{x_t} ∈ {a_i^1, a_i^2} ⊂ R² for every i ∈ {0, 1, ..., l}, σ^{x_t} ∈ {σ^1, σ^2} ⊂ (R+*)² and ε_t is a standard Gaussian noise. The parameters of the model are then

{a_0^1, a_1^1, ..., a_l^1, a_0^2, a_1^2, ..., a_l^2, σ^1, σ^2, p, q}

and they are usually estimated by maximizing the log-likelihood function via an EM (Expectation-Maximization) algorithm³.

[Figure: graphical structure of the switching model: y_t is generated as f_{x_t}(y_{t−1}, y_{t−2}) + σ_{x_t} ε_t, y_{t+1} as f_{x_{t+1}}(y_t, y_{t−1}) + σ_{x_{t+1}} ε_{t+1}, and the unobserved regime moves from x_t to x_{t+1} with probability P(x_{t+1} | x_t).]

Our characteristic of interest will be the a posteriori computed conditional probabilities of belonging to the first or to the second regime. Indeed, as our goal is to derive a periodization of the international bimetallism, the a posteriori computed states of the unobserved Markov chain will provide a natural one.
Although the results obtained with a switching Markov model are usually satisfying in terms of prediction and the periodizations are interesting and easily interpretable, a difficulty remains: how does one choose the number of regimes? In the absence of a complete theoretical answer, the criteria for selecting the right number of regimes are quite subjective from a statistical point of view⁴.

The Results

In this paper we use a two-regime model to represent the spread computed with the gold-silver prices observed at each period on the three places. The transition matrix indicates good properties of stability:

( 0.844298   0.253357 )
( 0.155702   0.746643 )

and no three-regime model was found with an acceptable stability.
The first regime is a multilayer perceptron with one hidden layer, the second one is a simple linear model with one lag. Using the probabilities computed for each regime at each period, it may be interesting to study
the six sub-periods obtained and to observe the switch between the regimes along these periods of time.
Most of the time the regime 1 explains the spread (about 70% of the whole period) but important differences are to be noted between the sub-periods: classes 3 and 4 clearly contrast with, respectively, the highest and the lowest volatility of the spread, as they are ruled by, respectively, regime 2 and regime 1 models.
As will be explained later, further investigations have to be made with a more complex model and using a more adapted indicator of the arbitrages ruling the markets.

…changed. The changes, whose number and configuration are unknown, occur in the marginal distribution and may be in mean, in variance or in both mean and variance. We assume that there exists an integer K* and a sequence of change-points τ* = {τ*_1, ..., τ*_{K*}} with τ*_0 = 0 < τ*_1 < ... < τ*_{K*−1} < τ*_{K*} = T such that (μ_k, Σ_k) ≠ (μ_{k+1}, Σ_{k+1}), where μ_k = E(Y_t) and Σ_k = Cov(Y_t) = E(Y_t − E(Y_t))(Y_t − E(Y_t))^T for τ*_{k−1} + 1 ≤ t ≤ τ*_k.
The numbers of changes as well as their configuration are computed by minimizing a penalized contrast function. Details on the algorithms for computing the change-points configuration τ* can be found in Lavielle and Teyssière (2006)⁶.
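A toy version of this penalized-contrast idea, restricted to changes in the mean only, can be sketched as follows; the quadratic contrast and the fixed penalty `beta` per segment are simplifications chosen for illustration, not the exact criteria of Lavielle and Teyssière:

```python
def change_points(y, beta):
    """Least-squares change-point detection in the mean with penalty beta per
    segment, minimised exactly by dynamic programming (a simplified sketch of
    the penalized-contrast approach described in the text)."""
    n = len(y)
    # prefix sums give O(1) segment costs (residual sum of squares around the mean)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v
    def cost(a, b):                     # RSS of the segment y[a:b]
        m = b - a
        mean = (s1[b] - s1[a]) / m
        return (s2[b] - s2[a]) - m * mean * mean
    best = [0.0] * (n + 1)              # best[b] = minimal penalized contrast of y[:b]
    prev = [0] * (n + 1)
    for b in range(1, n + 1):
        best[b], prev[b] = min((best[a] + cost(a, b) + beta, a) for a in range(b))
    cps, b = [], n                      # backtrack the optimal segmentation
    while b > 0:
        b = prev[b]
        if b > 0:
            cps.append(b)
    return sorted(cps)

y = [0.0] * 30 + [3.0] * 30 + [0.5] * 30
print(change_points(y, beta=2.0))  # [30, 60]
```

Because the penalty grows with the number of segments, over-segmenting a constant stretch is always strictly worse, so the two true change-points are recovered exactly here.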
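As a rough illustration of such a two-regime autoregressive switching process, the following sketch simulates one using the estimated transition probabilities quoted above; the AR coefficients and noise levels are invented for the example, and both regimes are kept linear for simplicity (the paper's first regime is actually a multilayer perceptron):

```python
import random

random.seed(0)

# Transition probabilities from the estimated matrix in the text:
# p = P(stay in regime 1), q = P(stay in regime 2).
p, q = 0.844298, 0.746643
params = {1: (0.1, 0.8, 0.05),    # regime 1: (a0, a1, sigma), assumed values
          2: (0.5, 0.2, 0.20)}    # regime 2: (a0, a1, sigma), assumed values

def simulate(T):
    """Simulate y_t = a0 + a1*y_{t-1} + sigma*eps_t with Markov regime switching."""
    x, y = 1, 0.0
    xs, ys = [], []
    for _ in range(T):
        stay = p if x == 1 else q
        if random.random() > stay:        # leave the current regime
            x = 2 if x == 1 else 1
        a0, a1, sigma = params[x]
        y = a0 + a1 * y + sigma * random.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys

states, series = simulate(5000)
print(round(states.count(1) / len(states), 2))  # near the stationary share (1-q)/(2-p-q), about 0.62
```

The estimated matrix implies that roughly 62% of the periods are spent in regime 1, which is consistent with the "about 70% of the whole period" order of magnitude reported for the actual data.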
[Figure: weekly spread between the highest and the lowest gold-silver price (right scale) and probability of regime 1, "prob1" (left scale), 1821-1860, with the detected break dates 1824-21, 1825-41, 1839-45, 1843-13, 1850-46 and 1854-41 marked.]
shocks affecting the integration process of the LPH system, although the shocks may have been asymmetrical (only one or two of the financial centres being initially hit) or symmetrical (the three of them being simultaneously hit).
The first episode runs from the 21st week of 1824 till the 41st week of 1825. The sharp initial increase in the spread may be explained by two opposite movements in London and Hamburg: on one side, heavy speculation in South-American bonds and Indian cotton fuelled in London the demand for foreign payments in silver, which resulted in a great increase in the price of silver and a corresponding decline in the gold-silver price; on the other side, the price of gold rose in Hamburg while the price of silver remained constant, sparking the huge spread between the highest (Hamburg) and the lowest (London) gold-silver prices. More than one year later, the opposite movements took place: the price of gold plunged in Hamburg, while the price of silver remained at its height in London, under the influence of continuing speculation (which would end up in the famous banking crisis of December 1825); consequently the spread abruptly narrowed, this event being reflected by the breaking of the 41st week of 1825.
The second episode runs from the 45th week of 1839 till the 13th week of 1843. It started with the attempt of Prussia to unify the numerous German-speaking independent states in a common monetary zone, on a silver standard. Since the Bank of Hamburg maintained the price of silver fixed, that pressure on silver led to a drop in the Hamburg price of gold, and consequently in its gold-silver price, at a time when it was more or less stabilized in Paris. The spread between the highest (Paris) and the lowest (Hamburg) gold-silver price suddenly was enlarged, and during more than three years remained at a level significantly higher than during the 14 preceding years. This episode ended with the breaking of the 13th week of 1843, when, this shock having been absorbed, the gold-silver price in Hamburg went back in line with the price in the two other financial centres.
The third episode runs from the 46th week of 1850 till the 41st week of 1854. The shock was then symmetrical: London, Paris and Hamburg were hit by the pouring of gold following the discovery of the Californian mines, and the sudden downward pressure on the world price of that metal. It took four years to absorb this
enormous shock, as reflected by the breaking of the 41st week of 1854.

CONCLUSION

In the three cases, the integration process of the LPH system, shown by the downward trend of the spread over half a century, was jeopardized by a shock: a speculative one in 1824, an institutional one in 1839, a technological one in 1850. But the effects of these shocks were absorbed after some time, thanks to active arbitrage operations between the three financial centres of the system. Generally, that arbitrage did not imply the barter of gold for silver but the coupling of a foreign exchange operation (on bills of exchange) with the transport of one metal only.
As a consequence, it would be appropriate in a further study to locate the breakings of another indicator of integration: the spread between a representative national gold-silver price and an arbitrated international gold-silver price taking into account the foreign exchange rates. At the same time it would be interesting to go further with the Markov switching model, trying more complete specifications.

REFERENCES

…riodization of International Bimetallism (1821-1873), Investigación Operacional (forthcoming).

Cottrell, M., Fort, J.-C., & Pagès, G. (1997). Theoretical aspects of the Kohonen algorithm. WSOM'97, Helsinki, 1997.

Cottrell, M., Gaubert, P., Letrémy, P., & Rousset, P. (1999). Analysing and representing multidimensional quantitative and qualitative data: demographic study of the Rhône valley; the domestic consumption of the Canadian families. In E. Oja & S. Kaski (Eds.), Kohonen Maps. Elsevier Science, Amsterdam.

Hamilton, J. D. (1989). A new approach to the economic analysis of non-stationary time series and the business cycle. Econometrica, 57, 357-384.

Kohonen, T. (1984). Self-Organization and Associative Memory (3rd edition, 1989). Springer, Berlin.

Lavielle, M. (1999). Detection of multiple changes in a sequence of dependent variables. Stochastic Processes and their Applications, 83, 79-102.

Lavielle, M., & Teyssière, G. (2005). Adaptive detection of multiple change-points in asset price volatility. In G. Teyssière & A. Kirman (Eds.), Long-Memory in Economics. Springer, Berlin, pp. 129-156.
KEY TERMS

Change-Point: Instant of time where the basic parameters of a time series change (in mean and/or in variance); the series may be considered as a piecewise stationary process between two change-points.

Gold-Silver Price: Ratio of the market price of gold to the market price of silver in one place. The stability of that ratio through time and the convergence of its levels in the various places constituting the international bimetallism (see that definition) are tests of the integration of that system.

International Arbitrage: Activity of traders in gold and silver and in foreign exchange, which consisted in comparing their prices in different places, and in moving the precious metals and the bills of exchange accordingly, in order to make a profit. Arbitrage and monetary rules were the two factors explaining the working of international bimetallism (see that definition).

International Bimetallism: An international monetary system (see that definition) which worked from 1821 to 1873. It was based on gold and silver acting as monetary standards, either together in the same country (like France) or separately in different countries (gold in England, silver in German and Northern states). The integration of that system was reflected in the stability and the convergence of the observed levels of the relative price of gold to silver (see that definition) in London, Paris, and Hamburg.

International Monetary System: A system linking the currencies of various countries, which ensures the stability of the exchange rates between them. Its working depends on the monetary rules adopted in each country and on international arbitrage (see that definition) between the foreign exchange markets. Historical examples are the gold-standard system (1873-1914) and the Bretton-Woods system (1944-1976). The paper studies some characteristics of another historical example: international bimetallism (see that definition).

Markov Switching Model: An autoregressive model where the process linking a present value to its lags is a hidden Markov chain defined by its transition matrix.

SOM Algorithm: An unsupervised classification technique (Kohonen, 1984) combining adaptive learning and a neighbourhood function to construct a very stable classification, with a simpler interpretation (Kohonen maps) than other techniques.

ENDNOTES

1. Details may be found in Boyer-Xambeu, ..., Olteanu, 2006.
2. See Cottrell, M., Fort (1997) and Cottrell, M., Gaubert (1999).
3. See Rynkiewicz (2004) and Maillet et al. (2004).
4. See Olteanu et al. (2006).
5. The authors are very grateful to Gilles Teyssière for a significant help on this part.
6. See also Lavielle, M. and Teyssière, G. (2005), Teyssière, G. (2003) and Lavielle, M. (1999).
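As an illustration of the SOM algorithm defined above, a minimal one-dimensional version can be sketched as follows; the decay schedules, the unit count and the data are invented for the example and are far simpler than Kohonen's full formulation:

```python
def train_som(data, n_units, epochs=50):
    """Minimal 1-D self-organising map (a sketch, not Kohonen's full method):
    each input pulls its best-matching unit and that unit's immediate
    neighbours towards it, with decaying learning rate and neighbourhood."""
    w = [(i + 1) / (n_units + 1) for i in range(n_units)]   # evenly spaced start
    for t in range(epochs):
        alpha = 0.5 * (1 - t / epochs)            # decaying learning rate
        radius = 1 if t < epochs // 2 else 0      # shrinking neighbourhood
        for x in data:
            bmu = min(range(n_units), key=lambda i: abs(w[i] - x))
            for i in range(max(0, bmu - radius), min(n_units, bmu + radius + 1)):
                w[i] += alpha * (x - w[i])
    return w

# two well-separated 1-D clusters; units should settle near 0.1 and 0.9
data = [0.1, 0.12, 0.08, 0.9, 0.88, 0.92] * 20
weights = train_som(data, n_units=4)
print([round(v, 2) for v in weights])  # units cluster near 0.1 and 0.9
```

The neighbourhood update is what distinguishes the SOM from plain competitive learning: neighbouring units are dragged along with the winner, so adjacent units end up representing similar inputs.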
Learning in Feed-Forward Artificial Neural Networks I
and the Levenberg-Marquardt algorithm, which are local optimization methods (Luenberger, 1984).

First-order methods. The gradient ∇E_w of an s-dimensional function is the vector field of first derivatives of E(w) w.r.t. w,

∇E_w = (∂E(w)/∂w_1, ..., ∂E(w)/∂w_s)    (2)

Here s = dim(w). A linear approximation to E(w) in an infinitesimal neighbourhood of an arbitrary point w_i is given by:

E(w) ≈ E(w_i) + ∇E_w(w_i)·(w − w_i)    (3)

We write ∇E_w(w_i) for the gradient ∇E_w evaluated at w_i. These are the first two terms of the Taylor expansion of E(w) around w_i. In steepest or gradient descent methods, this local gradient alone determines the minimization direction u_i. Since, at any point w_i, the gradient ∇E_w(w_i) points in the direction of fastest increase of E(w), an adjustment of w_i in the negative direction of the local gradient leads to its maximum decrease. In consequence the direction u_i = −∇E_w(w_i) is taken.
In conventional steepest descent, the step size α_i is obtained by a line search in the direction of u_i: how far to go along u_i before a new direction is chosen. To this end, evaluations of E(w) and its derivatives are made to locate some nearby local minimum. Line search is a move in the chosen direction u_i to find the minimum of E(w) along it. For this one-dimensional problem, the simplest approach is to proceed along u_i in small steps, evaluating E(w) at each sampled point, until it starts to increase. One often used method is a divide-and-conquer strategy, also called Brent's method (Fletcher, 1980):

1. Bracket the search by setting three points a < b < c along u_i such that E(a·u_i) > E(b·u_i) < E(c·u_i). Since E is continuous, there is a local minimum in the line joining a to c.
2. Fit a parabola (a quadratic polynomial) to a, b, c.
3. Compute the minimum of the parabola in the line joining a to c. This value is an approximation of the minimum of E in this interval.
4. Set three new points a, b, c out of this approximation and the two points among the old a, b, c having the lowest E. Repeat from 2.

Although it is possible to locate the nearby global minimum, the cost can become prohibitively high. The line search can be replaced by a fixed step size α, which has to be carefully chosen. A sufficiently small α is required such that α∇E_w(w_i) is effectively very small and the expansion (3) can be applied. A too large value might cause overshooting or lead to divergent oscillations and a complete breakdown of the algorithm. On the other hand, very small values translate into a painfully slow minimization. In practice, a trial-and-error process is carried out.
A popular heuristic is a historic average of previous changes to exploit tendencies and add inertia to the descent, accomplished by adding a so-called momentum term μΔw_{i−1}, where Δw_{i−1} is the previous weight update (Rumelhart, Hinton & Williams, 1986). This term helps to avoid or smooth out oscillations in the motion towards a minimum. In practice, μ is set to a constant value in (0.5, 1). Altogether, for steepest descent, the update equation (1) reads:

w_{i+1} = w_i + α_i u_i + μΔw_{i−1}    (4)

where u_i = −∇E_w(w_i) and Δw_{i−1} = w_i − w_{i−1}. This method is very sensitive to the chosen values for α_i and μ, to the point that different values are required for different problems and even for different stages in the learning process (Toolenaere, 1990). The inefficiency of the steepest descent method stems from the fact that both u_i and α_i are somewhat poorly chosen. Unless the first step leads straight to a minimum, the iterative procedure is very likely to wander with many small steps in zig-zag. Therefore, these methods are quite out of use nowadays. A method in which both parameters are properly chosen is the conjugate gradient.

Conjugate Gradient. This minimization technique (explained at length in Shewchuk, 1994) is based on the idea that a new direction u_{i+1} should not spoil previous minimizations in the directions u_i, u_{i−1}, ..., u_1. This is not the case if we simply choose u_i = −g_i, where g_i = ∇E_w(w_i), as was found above for steepest descent. At most points on E(w), the gradient does not point directly towards the minimum. After a line minimization, the new gradient g_{i+1} is orthogonal to the line search direction, that is, g_{i+1}·u_i = 0. Thus, successive search directions will also be orthogonal, and the error function minimization will proceed in zig-zag in an extremely slow advance to a minimum. A better choice is u_{i+1} = −g_{i+1} + β_i u_i, where one option for the scalar β_i is the Polak-Ribière updating:

β_i = (g_{i+1} · (g_{i+1} − g_i)) / (g_i · g_i)    (8)

This is the inner product of the previous change in the gradient with the current gradient, divided by the squared norm of the previous gradient. For the Fletcher-Reeves updating:

β_i = (g_{i+1} · g_{i+1}) / (g_i · g_i)

…where H_w = ∇∇E_w is the Hessian s×s matrix of second derivatives, with components H_w = (h_ij), h_ij = ∂²E(w)/∂w_i ∂w_j.
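The momentum update (4) can be illustrated with a short sketch on an elongated quadratic error surface; the function, step size and momentum value are chosen for the example only:

```python
def grad_descent_momentum(grad, w0, alpha=0.1, mu=0.7, steps=200):
    """Steepest descent with a momentum term, update (4):
    w_{i+1} = w_i + alpha*u_i + mu*dw_{i-1}, with u_i = -grad(w_i)."""
    w = list(w0)
    dw_prev = [0.0] * len(w0)
    for _ in range(steps):
        u = [-g for g in grad(w)]
        dw = [alpha * ui + mu * dpi for ui, dpi in zip(u, dw_prev)]
        w = [wi + di for wi, di in zip(w, dw)]
        dw_prev = dw
    return w

# illustrative elongated quadratic E(w) = w0^2 + 10*w1^2, with gradient:
grad = lambda w: [2 * w[0], 20 * w[1]]
w = grad_descent_momentum(grad, [3.0, 1.0])
print([round(v, 6) for v in w])  # near the minimum [0, 0]
```

With alpha = 0.1 alone, the steep w1 direction (curvature 20) would oscillate divergently; the momentum term damps the oscillation and the iterates converge, illustrating the inertia argument above.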
Again, H_w(w_i) indicates the evaluation of H_w at w_i. The Hessian traces the curvature of the error function in weight space, portraying information about how ∇E_w changes in different directions. Given a direction u_i from w_i, the product H_w(w_i)·u_i is the rate of change of the gradient along u_i from w_i. Differentiating (10) w.r.t. w, a local approximation of the gradient around w_i is:

∇E_w ≈ ∇E_w(w_i) + H_w(w_i)(w − w_i)    (11)

For points close to w_i, (10) and (11) give reasonable approximations to the error function E(w) and to its gradient. Setting (11) to zero, and solving for w, one gets:

w* = w_i − H_w⁻¹(w_i)·∇E_w(w_i)    (12)

Newton's method uses the second partial derivatives of the objective function and hence is a second-order method, finding the minimum of a quadratic function in just one iteration. The vector −H_w⁻¹(w_i)·∇E_w(w_i) is known as the Newton train direction. Since higher-order terms have been neglected, the update formula (12) is used iteratively to find the optimal solution. An exact evaluation of the Hessian is computationally demanding if done at each stage of an iterative algorithm; the Hessian matrix must also be inverted. Further, the Newton train direction may move towards a maximum or a saddle point rather than a minimum (if the Hessian is not positive definite) and the error would not be guaranteed to be reduced. This motivates the development of alternative approximation methods.

…descent direction method can be obtained by setting G = I.
In order to implement Equation (14), an approximate inverse G of the Hessian matrix is needed. Two commonly used algorithms are the Davidon-Fletcher-Powell (DFP) algorithm and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. The DFP algorithm is given by

G^(i+1) = G^(i) + [(w^(i+1) − w^(i)) ⊗ (w^(i+1) − w^(i))] / [(w^(i+1) − w^(i)) · (g^(i+1) − g^(i))]
        − [G^(i)·(g^(i+1) − g^(i))] ⊗ [G^(i)·(g^(i+1) − g^(i))] / [(g^(i+1) − g^(i)) · G^(i) · (g^(i+1) − g^(i))]    (15)

where ⊗ denotes the outer product, which is a matrix: the i,j component of u ⊗ v is u_i v_j. The BFGS algorithm is exactly the same, but with one additional term:

G^(i+1) = G^(i) + [(w^(i+1) − w^(i)) ⊗ (w^(i+1) − w^(i))] / [(w^(i+1) − w^(i)) · (g^(i+1) − g^(i))]
        − [G^(i)·(g^(i+1) − g^(i))] ⊗ [G^(i)·(g^(i+1) − g^(i))] / [(g^(i+1) − g^(i)) · G^(i) · (g^(i+1) − g^(i))]
        + [(g^(i+1) − g^(i)) · G^(i) · (g^(i+1) − g^(i))] v ⊗ v    (16)

where the vector v is given by

v = (w^(i+1) − w^(i)) / [(w^(i+1) − w^(i)) · (g^(i+1) − g^(i))] − G^(i)·(g^(i+1) − g^(i)) / [(g^(i+1) − g^(i)) · G^(i) · (g^(i+1) − g^(i))]
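Update (16) can be sketched numerically; the quadratic test function and the damped step are illustrative choices, not part of the BFGS derivation:

```python
def bfgs_inverse_update(G, dw, dg):
    """One BFGS update (16) of the approximate inverse Hessian G, written for
    plain nested-list matrices; dw = w^{i+1}-w^i and dg = g^{i+1}-g^i."""
    n = len(G)
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    Gdg = [dot(row, dg) for row in G]
    wg = dot(dw, dg)               # (w^{i+1}-w^i).(g^{i+1}-g^i)
    gGg = dot(dg, Gdg)             # (g^{i+1}-g^i).G.(g^{i+1}-g^i)
    v = [dw[i] / wg - Gdg[i] / gGg for i in range(n)]
    return [[G[i][j] + dw[i] * dw[j] / wg - Gdg[i] * Gdg[j] / gGg
             + gGg * v[i] * v[j] for j in range(n)] for i in range(n)]

# quadratic E(w) = 0.5 w.H.w with H = [[2,0],[0,10]]; its gradient is g = H.w
H = [[2.0, 0.0], [0.0, 10.0]]
grad = lambda w: [sum(H[i][j] * w[j] for j in range(2)) for i in range(2)]

G = [[1.0, 0.0], [0.0, 1.0]]       # start from the identity
w = [1.0, 1.0]
for _ in range(8):
    g = grad(w)
    step = [-sum(G[i][j] * g[j] for j in range(2)) for i in range(2)]
    w_new = [w[i] + 0.5 * step[i] for i in range(2)]    # damped quasi-Newton step
    dw = [w_new[i] - w[i] for i in range(2)]
    dg = [grad(w_new)[i] - g[i] for i in range(2)]
    G = bfgs_inverse_update(G, dw, dg)
    w = w_new
print([[round(x, 3) for x in row] for row in G])  # drifts toward H^(-1) = [[0.5,0],[0,0.1]]
```

A useful property to check is the secant condition: after every update, the new G maps the gradient change dg exactly back onto the step dw, which is what makes G behave like an inverse Hessian along recent search directions.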
respect to the weights, and e is the vector of network errors. The Jacobian can be computed via standard backpropagation, a much less complex process than computing the Hessian.
The Levenberg-Marquardt algorithm uses this approximation to the Hessian in the following Newton-like update:

w_{i+1} = w_i − [JᵀJ + μI]⁻¹ Jᵀe

When the scalar μ is zero, this is Newton's method using the approximate Hessian matrix. When μ is large, this becomes gradient descent with a small step size. Newton's method is faster and more accurate near an error minimum, so the aim is to shift towards Newton's method as quickly as possible. Thus, μ is decreased after each successful step (reduction in objective function) and is increased only when a tentative step would increase it. When computation of the Jacobian becomes prohibitively high for big networks on a large number of training examples, the quasi-Newton method is preferred.

THE BACK-PROPAGATION ALGORITHM

This algorithm represented a breakthrough in connectionist learning due to the possibility to compute the gradient of the error function ∇E_w(w_i) recursively and efficiently. It is so called because the components of the gradient concerning weights belonging to output units are computed first, and propagated backwards (toward the inputs) to compute the rest, in the order marked by the layers. Intuitively, the algorithm finds the …

E(w) = 1/2 Σ_{μ=1..p} Σ_k (y_{μ,k} − ξ_k^{μ,c+1})²    (20)

where ξ_k^{μ,c+1} = F_w(x_μ)_k is the k-th component of the network's response to input pattern x_μ (the network has c+1 layers, of which c are hidden). For a given input pattern x_μ, define:

E_μ(w) = 1/2 Σ_k (y_{μ,k} − ξ_k^{μ,c+1})²    (21)

so that E(w) = Σ_{μ=1..p} E_μ(w).
The computation of a single unit i in layer l, 1 ≤ l ≤ c+1, upon presentation of pattern x_μ to the network may be expressed as ξ_i^{μ,l} = g(h_i^{μ,l}), with g a smooth function such as the sigmoidals, and

h_i^{μ,l} = Σ_j w_ij^l ξ_j^{μ,l−1}

The first outputs are then defined ξ_i^{μ,0} = x_{μ,i}. A single weight w_ij^l denotes the connection strength from neuron j in layer l−1 to neuron i in layer l, 1 ≤ l ≤ c+1.
If the gradient-descent rule (1) with constant α is followed, then u_i = −∇E_w(w_i). Together with (2), the increment Δw_ij^l in a single weight w_ij^l of w is:

Δw_ij^l = −α ∂E(w)/∂w_ij^l = −α Σ_{μ=1..p} ∂E_μ(w)/∂w_ij^l = Σ_{μ=1..p} Δ_μ w_ij^l
Assuming g to be the logistic function with slope β:

g'(h_i^{μ,l}) = β g(h_i^{μ,l})(1 − g(h_i^{μ,l})) = β ξ_i^{μ,l}(1 − ξ_i^{μ,l})    (26)

The remaining expression

∂E_μ(w)/∂h_i^{μ,l}

is more delicate and constitutes the core of the algorithm. We develop two separate cases: l = c+1 and l < c+1. The first case corresponds to output neurons, for which an expression of the derivative is immediate from (21):

∂E_μ(w)/∂ξ_i^{μ,c+1} = −(y_{μ,i} − ξ_i^{μ,c+1})    (27)

Incidentally, collecting all the results so far and assuming c = 0 (no hidden layers), the delta rule for non-linear single-layer networks is obtained:

Δw_ij = α Σ_{μ=1..p} (y_{μ,i} − ξ_i^μ) g'(h_i^μ) ξ_j^{μ,0} = α Σ_{μ=1..p} δ_{μ,i} x_{μ,j}    (28)

where the superscript l is dropped (since c = 0), and δ_{μ,i} = (y_{μ,i} − ξ_i^μ) g'(h_i^μ). Another name for the back-propagation algorithm is generalized delta rule. Consequently, for l ≥ 1, the deltas (errors local to a unit) are defined as:

δ_i^{μ,l} = −∂E_μ(w)/∂h_i^{μ,l}    (29)

For the output layer l = c+1, δ_i^{μ,l} = (y_{μ,i} − ξ_i^{μ,c+1}) g'(h_i^{μ,c+1}). In case g is the logistic, we finally have:

δ_i^{μ,c+1} = β(y_{μ,i} − ξ_i^{μ,c+1}) ξ_i^{μ,c+1}(1 − ξ_i^{μ,c+1})    (30)

For the hidden layers, the chain rule gives:

∂E_μ(w)/∂h_i^{μ,l} = Σ_k [∂E_μ(w)/∂h_k^{μ,l+1}] [∂h_k^{μ,l+1}/∂h_i^{μ,l}]    (31)

Now l < c+1 means there is at least one more layer l+1 to the right of layer l, which is posterior in the feed-forward computation, but that precedes l in the opposite direction. Therefore E_μ(w) functionally depends on layer l+1 before layer l, and the derivative can be split in two. The summation over k is due to the fact that E_μ(w) depends on every neuron in layer l+1. Rewriting (31):

∂E_μ(w)/∂h_i^{μ,l} = Σ_k [∂E_μ(w)/∂h_k^{μ,l+1}] g'(h_i^{μ,l}) w_ki^{l+1}    (32)

since h_k^{μ,l+1} = Σ_j w_kj^{l+1} ξ_j^{μ,l} and thus ∂h_k^{μ,l+1}/∂ξ_i^{μ,l} = w_ki^{l+1}.
Putting together (24), (25), (27) and (32) in (23):

Δ_μ w_ij^l = α δ_i^{μ,l} ξ_j^{μ,l−1}    (33)

where

δ_i^{μ,l} = g'(h_i^{μ,l}) Σ_k δ_k^{μ,l+1} w_ki^{l+1}

the deltas for the hidden units. We show the method (Fig. 1) in algorithmic form, for an epoch or presentation of the training set.
Figure 1. The back-propagation algorithm for one epoch:

for all μ in 1..p:
  1. Forward pass. Present x_μ and compute the outputs ξ_i^{μ,l} of all the units.
  2. Backward pass. Compute the deltas δ_i^{μ,l} of all the units (the local gradients), as follows:
     l = c+1:  δ_i^{μ,l} = g'(h_i^{μ,l}) (y_{μ,i} − ξ_i^{μ,l})
     l < c+1:  δ_i^{μ,l} = g'(h_i^{μ,l}) Σ_k δ_k^{μ,l+1} w_ki^{l+1}
  3. Δ_μ w_ij^l = α δ_i^{μ,l} ξ_j^{μ,l−1}
end
Update the weights as Δw_ij^l = Σ_{μ=1..p} Δ_μ w_ij^l
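The algorithm of Fig. 1 can be sketched in Python and verified against a numerical derivative; the 2-2-1 network weights and the two training patterns are invented for the check, and the logistic slope β is taken as 1:

```python
import math

g = lambda h: 1.0 / (1.0 + math.exp(-h))    # logistic with slope 1

def forward(W, x):
    """Forward pass of Fig. 1: xi[l] holds the outputs of layer l (xi[0] = x)."""
    xi = [x]
    for Wl in W:
        xi.append([g(sum(w * o for w, o in zip(row, xi[-1]))) for row in Wl])
    return xi

def error(W, patterns):
    """E(w) = 1/2 sum_mu sum_k (y_{mu,k} - xi_k)^2, as in (20)."""
    return 0.5 * sum((t - o) ** 2
                     for x, y in patterns
                     for o, t in zip(forward(W, x)[-1], y))

def gradients(W, patterns):
    """Backward pass of Fig. 1: returns dE/dw for every weight."""
    grads = [[[0.0] * len(row) for row in Wl] for Wl in W]
    for x, y in patterns:
        xi = forward(W, x)
        # output deltas (30): delta_i = (y_i - xi_i) * xi_i * (1 - xi_i)
        delta = [(t - o) * o * (1 - o) for o, t in zip(xi[-1], y)]
        for l in range(len(W) - 1, -1, -1):
            for i in range(len(W[l])):
                for j in range(len(W[l][i])):
                    grads[l][i][j] -= delta[i] * xi[l][j]   # dE/dw = -delta_i * xi_j
            if l > 0:   # hidden deltas: g'(h_i) * sum_k delta_k * w_{ki}
                delta = [xi[l][i] * (1 - xi[l][i]) *
                         sum(delta[k] * W[l][k][i] for k in range(len(W[l])))
                         for i in range(len(W[l][0]))]
    return grads

# gradient check on a small 2-2-1 network with illustrative weights
W = [[[0.3, -0.2], [0.5, 0.1]], [[-0.4, 0.25]]]
patterns = [([1.0, 0.5], [1.0]), ([0.2, -0.8], [0.0])]
base = error(W, patterns)
grads = gradients(W, patterns)
W[0][1][0] += 1e-6
numeric = (error(W, patterns) - base) / 1e-6
print(abs(numeric - grads[0][1][0]) < 1e-5)  # True
```

The check perturbs a single hidden-layer weight and confirms that the recursively computed delta-based gradient agrees with the finite-difference estimate, which is the point of the backward pass.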
Learning in Feed-Forward Artificial Neural Networks II
6. Bad generalization, which can be due to several causes: the use of poor training data or attempts to extrapolate beyond them, an excessive number of hidden units, too long training processes or a badly chosen regularization. All of them can lead to an overfitting of the training data, in which the ANN adjusts the training set merely as an interpolation task.
7. Not amenable to inspection. It is generally arduous to interpret the knowledge learned, especially in large networks or with a high number of model inputs.

LEARNING IN RBF NETWORKS

A Radial Basis Function (RBF) network is a type of ANN that can be viewed as the solution of a high-dimensional curve-fitting problem. Learning is equivalent to finding a surface providing the best fit to the data. The RBF network is a two-layered feed-forward network using a linear transfer function for the output units and a radially symmetric transfer function for the hidden units. The computation of a hidden unit is expressed as the composition of two functions, as:

$$F_i(x) = g(h(x, w_i)), \quad w_i \in \mathbb{R}^n, \; x \in \mathbb{R}^n \qquad (1)$$

with the choice $h(x, w_i) = \|x - w_i\|/\sigma$ (or other distance measure), with $\sigma > 0$ a smoothing term, plus an activation $g$ which very often is a monotonically decreasing response from the origin. These units are localized, in the sense that they give a significant response only in a neighbourhood of their centre $w_i$. For the activation function a Gaussian $g(z) = \exp(-z^2/2)$ is a preferred choice.
Learning in RBF networks is characterized by the separation of the process in two consecutive stages (Haykin, 1994), (Bishop, 1995):

1. Optimize the free parameters of the hidden layer (including the smoothing term) using only the $\{x\}_i$ in $D$. This is an unsupervised method that depends on the input sample distribution.
2. With these parameters found and frozen, optimize the $\{c_i\}_i$, the hidden-to-output weights, using the full information in $D$. This is a supervised method that depends on the given task.

There are many ways of optimizing the hidden-layer parameters. When the number of hidden neurons equals the number of patterns, each pattern may be taken to be the center of a particular neuron. However, the aim is to form a representation of the probability density function of the data, by placing the centres in only those regions of the input space where significant data are present. One commonly used method is the k-means algorithm (McQueen, 1967), which in turn is an approximate version of the maximum-likelihood (ML) solution for determining the location of the means of a mixture density of component densities (that is, maximizing the likelihood of the parameters with respect to the data). The Expectation-Maximization (EM) algorithm (Duda & Hart, 1973) can be used to find the exact ML solution for the means and covariances of the density. It seems that EM is superior to k-means (Nowlan, 1990). The set of centres can also be selected randomly from the set of data points.
The value of the smoothing term can be obtained from the clustering method itself, or else estimated a posteriori. One popular heuristic is:

$$\sigma = \frac{d}{\sqrt{2M}} \qquad (2)$$

where $d$ is the maximum distance between the chosen centers and $M$ is the number of centers (hidden units). Alternatively, the method of Distance Averaging (Moody & Darken, 1989) can be used, which is the global average over all Euclidean distances between the center of each unit and that of its nearest neighbor.
Once these parameters are chosen and kept constant, assuming the output units are linear, the (square) error function is quadratic, and thus the hidden-to-output weights can be found fast and reliably, either iteratively by simple gradient descent over the quadratic surface of the error function, or directly by solving the minimum-norm solution to the overdetermined least-squares data-fitting problem (Orr, 1995).
The whole set of parameters of an RBF network can also be optimized with a global gradient descent procedure on all the free parameters at once (Bishop, 1995), (Haykin, 1994). This brings back the problems of local minima, slow training, etc., already discussed. However, better solutions can in principle be found, because the unsupervised solution focuses on estimating the input probability density function, but the resulting disposition may not be the one minimizing the square error.

EVOLUTIONARY LEARNING ALGORITHMS

The alternative to derivative-based learning algorithms (DBLA) are Evolutionary Algorithms (EA) (Bäck, 1996). Although the number of successful specific applications of EA is counted by hundreds (see (Bäck, Fogel & Michalewicz, 1997) for a review), only Genetic Algorithms or GA (Goldberg, 1989) and, to a lesser extent, Evolutionary Programming (Fogel, 1992), have been broadly used for ANN optimization, since the earlier works using genetic algorithms (Montana & Davis, 1989). Evolutionary algorithms operate on a population of individuals, applying the principle of survival of the fittest to produce better approximations to a solution. At each generation, a new population is created by selecting individuals according to their level of fitness in the problem domain and recombining them using operators borrowed from natural genetics. The offspring also undergo mutation. This process leads to the evolution of populations of individuals that are better suited to their environment than the individuals they were created from, just as in natural adaptation. There are comprehensive review papers and guides to the extensive literature on this subject: see (Shaffer, Whitley & Eshelman, 1992), (Yao, 1993), (Kuscu & Thornton, 1994) and (Balakrishnan & Honavar, 1995).
One of their main advantages over methods based on derivatives is the global search mechanism. A global method does not guarantee that the global optimum is found; rather, it eliminates the possibility of getting permanently caught in local optima. Another appealing issue is the possibility of performing the traditionally separated steps of determining the best architecture and its weights at the same time, in a search over the joint space of structures and weights. Another advantage is the use of potentially any cost measure to assess the goodness of fit or include structural information. Still another possibility is to embody a DBLA into a GA, using the latter to search among the space of structures and the DBLA to optimize the weights; this hybridization, however, leads to extremely high computational costs. Finally, there is the use of EA solely for the numerical optimization problem. In the neural context, this is arguably the task for which continuous EA are most naturally suited. However, it is difficult to find applications in which GA (or other EA, for that matter) have clearly outperformed DBLA for supervised training of feed-forward neural networks (Whitley, 1995). It has been pointed out that this task is inherently hard for algorithms that rely heavily on the recombination of potential solutions (Radcliffe, 1991). In addition, the training times can become too costly, even worse than those for DBLA.
In general, Evolutionary Algorithms, particularly the continuous ones, are in need of specific research devoted to ascertaining their general validity as alternatives to DBLA in neural network optimization. Theoretical as well as practical work, oriented to tailoring specific EA parameters for this task, together with a specialized operator design, should pave the way to a fruitful assessment of validity.

FUTURE TRENDS

Research in ANN currently concerns the development of learning algorithms for weight adaptation or, more often, the enhancement of existing ones. New architectures (ways of arranging the units in the network) are also introduced from time to time. Classical neuron models, although useful and effective, are restricted to a few generic function classes, of which only a handful of instances are used in practice.
One of the most attractive enhancements is the extension of neuron models to modern data mining situations, such as data heterogeneity. Although a feed-forward neural network can in principle approximate an arbitrary function to any desired degree of accuracy, in practice a pre-processing scheme is often applied to the data samples to ease the task. In many important domains from the real world, objects are described by a mixture of continuous and discrete variables, usually containing missing information and characterized by an underlying vagueness, uncertainty or imprecision. For example, in the well-known UCI repository (Murphy & Aha, 1991) over half of the problems contain explicitly declared nominal attributes, let alone other discrete types or fuzzy information, usually unreported. This heterogeneous information should not be treated in general as real-valued quantities. Conventional ways of encoding non-standard information in ANN include (Prechelt, 1994), (Bishop, 1995), (Fiesler & Beale, 1997):
Ordinal variables. These variables correspond to discrete (finite) sets of values wherein an ordering has been defined (possibly only partial). They are more often than not treated as real-valued, and mapped equidistantly on an arbitrary real interval. A second possibility is to encode them using a thermometer code. To this end, let $k$ be the number of ordered values; $k$ new binary inputs are then created. To represent value $i$, for $1 \le i \le k$, the leftmost $1, \ldots, i$ units will be on, and the remaining $i+1, \ldots, k$ off.
The interest in these variables lies in the fact that they appear frequently in real domains, either as symbolic information or from processes that are discrete in nature. Note that an ordinal variable need not be numerical.
Nominal variables. Nominal variables are unanimously encoded using a 1-out-of-$k$ representation, $k$ being the number of values, which are then encoded as the rows of the $I_{k \times k}$ identity matrix.
Missing values. Missing information is an old issue in statistical analysis (Little & Rubin, 1987). There are several causes for the absence of a value. They are very common in Medicine and Engineering, where many variables come from on-line sensors or device measurements. Missing information is difficult to handle, especially when the lost parts are of significant size. It can either be removed (the entire case) or filled in with the mean, median or nearest neighbour, or encoded by adding another input equal to one only if the value is absent and zero otherwise. Statistical approaches need to make assumptions about, or model, the input distribution itself. The main problem with missing data is that we never know if all the efforts devoted to their estimation will result, in practice, in better-behaved data. This is also the reason why we elaborate on the treatment of missing values as part of the general discussion on data characteristics. The reviewed methods pre-process the data to make it acceptable to models that otherwise would not accept it. In the case of missing values, the data is completed because the available neural methods only admit complete data sets.
Uncertainty. Vagueness, imprecision and other sources of uncertainty are considerations usually put aside in the ANN paradigm. Nonetheless, many variables in learning processes are likely to bear some form of uncertainty. In Engineering, for example, on-line sensors are likely to age with time and continuous use, and this may be reflected in the quality of their measurements. On many occasions, the data at hand are imprecise for a manifold of reasons: technical limitations, a veritable qualitative origin, or even because we may be interested in introducing imprecision with the purpose of augmenting the capacity for abstraction or generalization (Esteva, Godo & García, 1998), possibly because the underlying process is believed to be less precise than the available measures.
In Fuzzy Systems theory there are explicit formalisms for representing and manipulating uncertainty; that is precisely what these systems best model and manage. It is perplexing that, when supplying this kind of input/output data, we require the network to approximate the desired output in a very precise way. Sometimes the known value takes an interval form: "between 5.1 and 5.5", so that any transformation to a real value will result in a loss of information. A more common situation is the absence of numerical knowledge. For example, consider the value "fairly tall" for the variable height. Again, Fuzzy Systems are comfortable with this, but for an ANN it is real trouble. The integration of symbolic and continuous information is also important because numeric methods bring higher concretion, whereas symbolic methods bring higher abstraction. Their combined use is likely to increase the flexibility of hybrid systems. For numeric data, an added flexibility is obtained by considering imprecision in their values, leading to fuzzy numbers (Zimmermann, 1992).

CONCLUSION

As explained at length in other chapters, derivative-based learning algorithms make a number of assumptions about the local error surface and its differentiability. In addition, the existence of local minima is often neglected or overlooked entirely. In fact, the possibility of getting caught in these minima is more often than not circumvented by multiple runs of the algorithm (that is, multiple restarts from different initial points in weight space). This sampling procedure is actually an implementation of a very naïve stochastic process. A global training algorithm for neural networks is the evolutionary algorithm, a stochastic search training algorithm based on the mechanics of natural genetics and biological evolution. It requires information from the objective function, but not from the gradient vector or the Hessian matrix, and thus it is a zero-order method. On the other hand, there is an emerging need to devise neuron models that properly handle different data types,
as is done in support vector machines (Shawe-Taylor & Cristianini, 2004), where kernel design is a current research topic.

REFERENCES

Bäck, T. (1996). Evolutionary Algorithms in Theory and Practice. New York: Oxford University Press.

Bäck, T., Fogel, D.B., & Michalewicz, Z. (Eds.) (1997). Handbook of Evolutionary Computation. IOP Publishing & Oxford University Press.

Balakrishnan, K., & Honavar, V. (1995). Evolutionary design of neural architectures: a preliminary taxonomy and guide to literature. Technical Report CS-TR-95-01, Dept. of Computer Science, Iowa State University.

Bishop, C. (1995). Neural Networks for Pattern Recognition. Clarendon Press.

Duda, R.O., & Hart, P.E. (1973). Pattern Classification and Scene Analysis. John Wiley.

Esteva, F., Godo, L., & García, P. (1998). Similarity-based reasoning. IIIA Research Report 98-31, Instituto de Investigación en Inteligencia Artificial, Barcelona, Spain.

Fiesler, E., & Beale, R. (Eds.) (1997). Handbook of Neural Computation. IOP Publishing & Oxford University Press.

Fogel, L.J. (1992). An analysis of evolutionary programming. In Fogel & Atmar (Eds.), Proceedings of the 1st Annual Conference on Evolutionary Programming. La Jolla, CA: Evolutionary Programming Society.

Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley.

Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. Macmillan.

Hertz, J., Krogh, A., & Palmer, R.G. (1991). Introduction to the Theory of Neural Computation. Redwood City: Addison-Wesley.

Hinton, G.E. (1989). Connectionist learning procedures. Artificial Intelligence, 40, 185-234.

Kuscu, I., & Thornton, C. (1994). Design of artificial neural networks using genetic algorithms: review and prospect. Technical Report, Cognitive and Computing Sciences Dept., University of Sussex, England.

Little, R.J.A., & Rubin, D.B. (1987). Statistical Analysis with Missing Data. John Wiley.

McQueen, J. (1967). Some methods of classification and analysis of multivariate observations. In LeCam & Neyman (Eds.), Proceedings of the 5th Berkeley Symposium on Mathematics, Statistics and Probability. University of California Press.

Montana, D.J., & Davis, L. (1989). Training feed-forward neural networks using genetic algorithms. In Proceedings of the 11th International Joint Conference on Artificial Intelligence. Morgan Kaufmann.

Moody, J., & Darken, C. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1, 281-294.

Murphy, P.M., & Aha, D. (1991). UCI Repository of Machine Learning Databases. UCI Dept. of Information and Computer Science.

Nowlan, S. (1990). Max-likelihood competition in RBF networks. Technical Report CRG-TR-90-2, Connectionist Research Group, University of Toronto.

Orr, M.J.L. (1995). Introduction to Radial Basis Function Networks. Technical Report, Centre for Cognitive Science, University of Edinburgh.

Poggio, T., & Girosi, F. (1989). A Theory of Networks for Approximation and Learning. AI Memo No. 1140, AI Laboratory, MIT.

Prechelt, L. (1994). Proben1: A Set of Neural Network Benchmark Problems and Benchmarking Rules. Technical Report 21/94, Universität Karlsruhe.

Radcliffe, N.J. (1991). Genetic set recombination and its application to neural network topology optimization. Technical Report EPCC-TR-91-21, University of Edinburgh.

Shaffer, J.D., Whitley, D., & Eshelman, L.J. (1992). Combination of genetic algorithms and neural networks: a survey of the state of the art. In Shaffer & Whitley (Eds.), Combination of Genetic Algorithms and Neural Networks, 1-37.

Shawe-Taylor, J., & Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press.
Whitley, D. (1995). Genetic algorithms and neural networks. In Periaux, Galán, & Cuesta (Eds.), Genetic Algorithms in Engineering and Computer Science. John Wiley.

Yao, X. (1993). A review of evolutionary artificial neural networks. International Journal of Intelligent Systems, 8(4), 539-567.

Zimmermann, H.J. (1992). Fuzzy Set Theory and its Applications. Kluwer.

KEY TERMS

Architecture: The number of artificial neurons, their arrangement and connectivity.

Artificial Neural Network: Information processing structure without global or shared memory that takes the form of a directed graph where each of the computing elements (neurons) is a simple processor with internal and adjustable parameters, that operates only when all its incoming information is available.

Evolutionary Algorithm: A computer simulation in which a population of individuals (abstract representations of candidate solutions to an optimization problem) are stochastically selected, recombined, mutated, and then removed or kept, based on their relative fitness to the problem.

Feed-Forward Artificial Neural Network: An Artificial Neural Network whose graph has no cycles.

Learning Algorithm: Method or algorithm by virtue of which an Artificial Neural Network develops a representation of the information present in the learning examples, by modification of the weights.

Neuron Model: The computation of an artificial neuron, expressed as a function of its input, its weight vector and other local information.

Weight: A free parameter of an Artificial Neural Network, that can be modified through the action of a Learning Algorithm to obtain desired responses to certain input stimuli.
Learning Nash Equilibria in Non-Cooperative Games
strategies it is not possible for any player i to play a different strategy σ′i able to gain a better payoff than that gained by playing σi; σi is called a Nash equilibrium strategy for player i.
In 1951 J. F. Nash proved that a strategic (non-cooperative) game G = (N, A, r) has at least one (Nash) equilibrium (Nash, 1951); in his honour, the computational problem of finding such equilibria is known as NASH (Papadimitriou, 1994).

SOFTWARE AGENTS FOR LEARNING NASH EQUILIBRIA

SALENE was conceived as a system for learning at least one Nash Equilibrium of a non-cooperative game given in the form G = (N; (Ai)i∈N; (ri)i∈N). In particular, the system asks the user for:

• the number n of the players, which defines the set of players N = {1, 2, ..., n};
• for each player i∈N, the related finite set of pure strategies Ai and his payoff function ri : A1 × ... × An → ℝ;
• the number k of times the players will play the game.

Figure 1. The class diagram of SALENE [UML class diagram: ManagerAgent, RefereeAgent and PlayerAgent specialize FIPAAgent; each uses a corresponding ManagerBehaviour, RefereeBehaviour or PlayerBehaviour; all of them use a shared GameDefinition]

Then, the system creates n agents, one associated to each player, and a referee. The agents will play the game G k times; after each match, each agent will decide the strategy to play in the next match so as to maximise his expected utility on the basis of his beliefs about the strategies that the other agents are adopting. By analyzing the behaviour of each agent in all the k matches of the game, SALENE presents to the user an estimate of a Nash Equilibrium of the game. The Agent paradigm has represented a natural way of modelling and implementing the proposed solution, as it is characterized by several interacting autonomous entities (players) which try to achieve their goals (consisting in maximising their returns).
The class diagram of SALENE is shown in Figure 1.
The Manager Agent interacts with the user and is responsible for the global behaviour of the system. In particular, after having obtained from the user the input parameters G and k, the Manager Agent creates both n Player Agents and a Referee Agent that coordinates and monitors the behaviours of the players. The Manager Agent sends to all the agents the definition G of the game, then it asks the Referee Agent to orchestrate k matches of the game G. In each match, the Referee Agent asks each Player Agent which pure strategy he has decided to play; then, after having acquired the strategies from all players, the Referee Agent communicates to each Player Agent both the strategies played and the payoffs gained by all players. After playing k matches of the game G, the Referee Agent communicates all the data about the played matches to the Manager Agent, which analyses it and properly presents the obtained results to the user.
A Player Agent is a rational player that, given the game definition G, acts to maximise his expected utility in each single match of G without considering the overall utility that he could obtain in a set of matches. In particular, the behaviour of the Player Agent i can be described by the following main steps:

1. In the first match, the Player Agent i chooses to play a pure strategy randomly generated considering all the pure strategies playable with the same probability: if |Ai| = m the probability of choosing a pure strategy s∈Ai is 1/m;
2. The Player Agent i waits for the Referee Agent to ask him which strategy he wants to play; then he communicates to the Referee Agent the chosen
pure strategy as computed in step 1 if he is playing his first match, or as computed in step 4 otherwise;
3. The Player Agent waits for the Referee Agent to communicate to him both the pure strategies played and the payoffs gained by all players;
4. The Player Agent decides the mixed strategy to play in the next match. In particular, the Player Agent updates his beliefs about the mixed strategies currently adopted by the other players and consequently recalculates the strategy able to maximise his expected utility. Basically, the Player Agent i tries to find the strategy σi ∈ Δ(Ai) such that for any other strategy σ′i ∈ Δ(Ai), ri(σi, σ−i) ≥ ri(σ′i, σ−i), where ri denotes his expected payoff and σ−i represents his beliefs about the mixed strategies currently adopted by all the other players, i.e. σ−i = (σj)j∈N,j≠i, σj ∈ Δ(Aj). In order to evaluate σj for each other player j≠i, the Player Agent i considers the pure strategies played by the player j in all the previous matches and computes the frequency of each pure strategy; this frequency distribution will be the estimate for σj. If there is at least one element in the newly computed set σ−i = (σj)j∈N,j≠i that differs from the set σ−i as computed in the previous match, the Player Agent i solves the inequality ri(σi, σ−i) ≥ ri(σ′i, σ−i), which is equivalent to solving the optimization problem P = {max(ri(σi, σ−i)), σi ∈ Δ(Ai)}. It is worth noting that P is a linear optimization problem: actually, given the set σ−i, ri(σi, σ−i) is a linear objective function in σi (see the game definition reported in the Background Section), and, with |Ai| = m, σi ∈ Δ(Ai) is a vector in ℝ^m whose components sum to 1 and are all non-negative, so the constraint σi ∈ Δ(Ai) is a set of m+1 linear constraints. P is solved by the Player Agent by using an efficient method for solving problems in linear programming, in particular the predictor-corrector method of Mehrotra (1992), whose complexity is polynomial for both the average and the worst case. The obtained solution for σi is a pure strategy because it is one of the vertices of the polytope which defines the feasible region for P. The obtained strategy σi will be played by the Player Agent i in the next match; ri(σi, σ−i) represents the expected payoff to player i in the next match;
5. Back to step 2.

It is worth noting that a Player Agent, for choosing the mixed strategy to play in each match of G, does not need to know the payoff functions of the other players; in fact, for solving the optimization problem P it only needs to consider the strategies which have been played by the other players in all the previous matches.
The Manager Agent receives from the Referee Agent all the data about the k matches of the game G and computes an estimate of a Nash Equilibrium of G, i.e. an N-tuple σ = (σi)i∈N, σi ∈ Δ(Ai). In particular, in order to estimate σi (the Nash equilibrium strategy of the player i), the Manager Agent computes, on the basis of the pure strategies played by the player i in each of the k matches, the frequency of each pure strategy: this frequency distribution will be the estimate for σi. The so-computed set σ = (σi)i∈N, σi ∈ Δ(Ai) will then be properly proposed to the user together with the data exploited for its estimation.
SALENE has been implemented using JADE (Bellifemine, Poggi, & Rimassa, 2001), a software framework allowing for the development of multi-agent systems and applications conforming to FIPA standards (FIPA, 2006), and tested on different games that differ from each other both in the number and in the kind of Nash Equilibria. The experiments have demonstrated that:

• if the game has p ≥ 1 Pure Nash Equilibria and s ≥ 0 Mixed Nash Equilibria, the agents converge to playing one of the p Pure Nash Equilibria; in these cases, as the behaviour of each Player Agent converges with probability one to a Nash Equilibrium of the game, the learning process converges in behaviours to equilibrium (Foster & Young, 2003);
• if the game has only Mixed Nash Equilibria, while the behaviour of the Player Agents does not converge to an equilibrium, the time-average behaviour, i.e. the empirical frequency with which each player chooses his strategy, may converge to one of the Mixed Nash Equilibria of the game; that is, the learning process may converge in time average to equilibrium (Foster & Young, 2003).

In the next Section the main aspects related to the convergence properties of the approach/algorithm
exploited by the SALENE agents for learning Nash Equilibria are discussed within a more general discussion of current and future research efforts.

FUTURE TRENDS

Innovative approaches such as SALENE, based on the concepts of learning and evolution, have shown great potential for modelling and efficiently solving non-cooperative games. However, as the solutions of the games (e.g. Nash Equilibria) are not statically computed but are the result of the evolution of a system composed of interacting agents, there are several open problems, mainly related to the accuracy of the provided solution, that need to be tackled to allow these approaches to be widely exploited in concrete business applications.
The approach exploited in SALENE, which derives from the Fictitious Play approach (Robinson, 1951), efficiently solves the problem of learning a Nash Equilibrium in non-cooperative games which have at least one Pure Nash Equilibrium: in such a case the behaviour of the players exactly converges to one of the Pure Nash Equilibria of the game (convergence in behaviours to equilibrium). On the contrary, if the game has only Mixed Nash Equilibria, the convergence of the learning algorithm is not ensured. Computing ex ante when this case happens is quite costly, as it requires solving the following problem: "Determining whether a strategic game has only Mixed Nash Equilibria", which is equivalent to: "Determining whether a strategic game does not have any Pure Nash Equilibria". This problem is Co-NP complete, as its complement "Determining whether a strategic game has a Pure Nash Equilibrium" is NP complete (Gottlob, Greco, & Scarcello, 2003).
As witnessed by the conducted experiments, when a game has only Mixed Nash Equilibria there are still some cases in which, while the behaviour of the players does not converge to an equilibrium, the time-average behaviour, i.e. the empirical frequency with which each player chooses his strategy, converges to one of the Mixed Nash Equilibria of the game (convergence in time average to equilibrium).
Nevertheless, there are some cases in which there is neither convergence in behaviour nor convergence in time average to equilibrium; an example of such a case is the fashion game of Shapley (1964). An important open problem is then represented by the characterization of the classes of games for which the learning algorithm adopted in SALENE converges; more specifically, the classes of games for which the algorithm: (a) converges in behaviours to equilibrium (which implies the convergence in time average to equilibrium); (b) only converges in time average to equilibrium; (c) converges neither in behaviours nor in time average. Currently, it has been demonstrated that the algorithm converges in behaviours or in time average to equilibrium for the following classes of games:

• zero-sum games (Robinson, 1951);
• games which are solvable by iterated elimination of strictly dominated strategies (Nachbar, 1990);
• potential games (Monderer & Shapley, 1996);
• 2xN games, i.e. games with 2 players, 2 strategies for one player and N strategies for the other player (Berger, 2005).

Future efforts will be geared towards: (i) completing the characterization of the classes of games for which the learning algorithm adopted in SALENE converges, and evaluating the complexity of solving the membership problem for such classes; (ii) evaluating different learning algorithms and their convergence properties; (iii) letting the user ask for the computation of Nash Equilibria with simple additional properties.
More generally, a wide adoption of the emerging agent-based approaches for solving games which model concrete business applications will depend on the accuracy and the convergence properties of the provided solutions; both aspects still need to be fully investigated.

CONCLUSION

The complexity of NASH, the problem consisting in computing Nash Equilibria in non-cooperative games, is still debated; even in the two-player case, the best known algorithm has an exponential worst-case running time. SALENE, the proposed MAS for learning Nash Equilibria in non-cooperative games, can be conceived as a heuristic and efficient method for computing at least one Nash Equilibrium in a non-cooperative game represented in its normal form; actually, the learning algorithm adopted by the Player Agents has a polynomial running time for both the average and the worst case. SALENE can then be fruitfully exploited for
Learning Nash Equilibria in Non-Cooperative Games
efficiently solving non-cooperative games which model interesting concrete problems, ranging from classical economic and finance problems to emerging ones related to the economic aspects of the Internet, such as TCP/IP congestion, selfish routing, and algorithmic mechanism design.

REFERENCES

Bellifemine, F., Poggi, A., & Rimassa, G. (2001). Developing multi-agent systems with a FIPA-compliant agent framework. Software Practice and Experience, 31(2), 103-128.

Benaim, M., & Hirsch, M.W. (1999). Learning processes, mixed equilibria and dynamical systems arising in repeated games. Games and Economic Behavior, 29, 36-72.

Berger, U. (2005). Fictitious Play in 2xN Games. Journal of Economic Theory, 120, 139-154.

Bonifaci, V., Di Iorio, U., & Laura, L. (2005). On the complexity of uniformly mixed Nash equilibria and related regular subgraph problems. Proceedings of the 15th International Symposium on Fundamentals of Computation Theory, 197-208.

Carmel, D., & Markovitch, S. (1996). Learning Models of Intelligent Agents. Proceedings of the 13th National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, Vol. 2, 62-67. Menlo Park, California: AAAI Press.

Chen, X., & Deng, X. (2005). Settling the Complexity of 2-Player Nash-Equilibrium. Electronic Colloquium on Computational Complexity, Report No. 140.

Conitzer, V., & Sandholm, T. (2003). Complexity results about Nash equilibria. Proceedings of the 18th International Joint Conference on Artificial Intelligence, 765-771.

Daskalakis, C., Goldberg, P.W., & Papadimitriou, C.H. (2005). The Complexity of Computing a Nash Equilibrium. Electronic Colloquium on Computational Complexity, Report No. 115.

FIPA (2006). Foundation for Intelligent Physical Agents, http://www.fipa.org.

Foster, D.P., & Young, P.H. (2003). Learning, Hypothesis Testing, and Nash Equilibrium. Games and Economic Behavior, 45(1), 73-96.

Fudenberg, D., & Levine, D. (1998). The Theory of Learning in Games. Cambridge, MA: MIT Press.

Gilboa, I., & Zemel, E. (1989). Nash and correlated equilibria: some complexity considerations. Games and Economic Behavior, 1(1), 80-93.

Gottlob, G., Greco, G., & Scarcello, F. (2003). Pure Nash Equilibria: hard and easy games. Proceedings of the 9th Conference on Theoretical Aspects of Rationality and Knowledge, 215-230.

Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge University Press.

Mehrotra, S. (1992). On the Implementation of a Primal-Dual Interior Point Method. SIAM Journal on Optimization, 2, 575-601.

Monderer, D., & Shapley, L.S. (1996). Fictitious Play Property for Games with Identical Interests. Journal of Economic Theory, 68, 258-265.

Nachbar, J. (1990). Evolutionary Selection Dynamics in Games: Convergence and Limit Properties. International Journal of Game Theory, 19, 59-89.

Nash, J.F. (1951). Non-cooperative games. Annals of Mathematics, 54, 286-295.

Papadimitriou, C.H. (1991). On inefficient proofs of existence and complexity classes. Proceedings of the 4th Czechoslovakian Symposium on Combinatorics.

Papadimitriou, C.H. (1994). On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and System Sciences, 48(3), 498-532.

Papadimitriou, C.H. (2001). Algorithms, Games and the Internet. Proceedings of the 33rd Annual ACM Symposium on the Theory of Computing, 749-753.

Robinson, J. (1951). An Iterative Method of Solving a Game. Annals of Mathematics, 54, 296-301.

Savani, R., & von Stengel, B. (2004). Exponentially many steps for finding a Nash equilibrium in a bimatrix game. Proceedings of the 45th Symposium on Foundations of Computer Science, 258-267.
Shapley, L.S. (1964). Some topics in two-person games. In M. Dresher, L.S. Shapley, & A.W. Tucker (Eds.), Advances in Game Theory, 1-28. Princeton University Press.

Tsebelis, G. (1990). Nested Games: Rational Choice in Comparative Politics. University of California Press.

Von Neumann, J., & Morgenstern, O. (1944). Theory of Games and Economic Behaviour. Princeton University Press.

KEY TERMS

Computational Complexity Theory: A branch of the theory of computation in computer science which studies how the running time and the memory requirements of an algorithm increase as the size of the input to the algorithm increases.

Game Theory: A branch of applied mathematics and economics that studies situations (games) where self-interested interacting players act to maximize their returns.

Heuristic: In computer science, a technique designed to solve a problem which allows for gaining computational performance or conceptual simplicity, potentially at the cost of accuracy and/or precision of the provided solutions to the problem itself.

Nash Equilibrium: A solution concept of a game where no player can benefit by changing his strategy unilaterally, i.e. while the other players keep theirs unchanged; this set of strategies and the corresponding payoffs constitute a Nash Equilibrium of the game.

NP-Hard Problems: Problems that are intrinsically harder than those that can be solved by a nondeterministic Turing machine in polynomial time.

Non-Cooperative Games: Games in which any cooperation among the players must be self-enforcing.

Payoffs: Numeric representations of the utility obtainable by a player in the different outcomes of a game.
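The convergence in time average discussed in this article can be illustrated with a short, self-contained sketch (this is not SALENE itself; the game and parameters are chosen purely for illustration): fictitious play on matching pennies, where each player best-responds to the opponent's empirical strategy frequencies. Individual play keeps cycling, but the empirical frequencies approach the unique Mixed Nash Equilibrium (1/2, 1/2), as guaranteed for zero-sum games by Robinson (1951).

```python
# Fictitious play on matching pennies (illustrative sketch, not SALENE).
# Player 0 wins on a match; player 1 wins on a mismatch.
T = 20000
counts = [[0, 0], [0, 0]]   # counts[p][s]: times player p has played strategy s
play = [0, 1]               # arbitrary initial joint play

for _ in range(T):
    counts[0][play[0]] += 1
    counts[1][play[1]] += 1
    play = [
        # player 0 best-responds by matching the opponent's more frequent strategy
        0 if counts[1][0] >= counts[1][1] else 1,
        # player 1 best-responds by mismatching player 0's more frequent strategy
        1 if counts[0][0] >= counts[0][1] else 0,
    ]

# Empirical frequencies of strategy 0: close to the mixed equilibrium (0.5, 0.5),
# even though the joint behaviour itself never settles down.
freq = (counts[0][0] / T, counts[1][0] / T)
print(round(freq[0], 1), round(freq[1], 1))
```

The behaviour of the players cycles forever, which is exactly the situation called convergence in time average (but not in behaviour) above.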
Learning-Based Planning
Sergio Jiménez Celorrio
Universidad Carlos III de Madrid, Spain
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
At the present time, there is not such a dependence on ML for solving AP problems, but there is renewed interest in applying ML to AP, motivated by three factors: (1) IPC-2000 showed that knowledge-based planners significantly outperform domain-independent planners; the development of ML techniques that automatically define the kind of knowledge that humans put into these planners would bring great advances to the field. (2) Domain-independent planners are still not able to cope with complex real-world problems; on the contrary, these problems are often solved by defining ad hoc planning strategies by hand, and ML promises to be a way of defining these strategies automatically. (3) There is a need for tools that assist in the definition, validation and maintenance of planning-domain models; at the moment, these processes are still done by hand.

LEARNING-BASED PLANNING

This section describes the current ML techniques for improving the performance of planning systems. These techniques are grouped according to the target of learning: search control, domain-specific planners, or domain models.

Learning Search Control

Domain-independent planners require high search effort, so search-control knowledge is frequently used to reduce this effort. Hand-coded control knowledge has proved to be useful in many domains; however, it is difficult for humans to formalize, as it requires specific knowledge of both the planning domain and the planner structure. Since AP's early days, diverse ML techniques have been developed with the aim of automatically learning search-control knowledge. A few examples of these techniques are macro-actions (Fikes, Hart & Nilsson, 1972), control rules (Borrajo & Veloso, 1997), and case-based and analogical planning (Veloso, 1994).

At present, most state-of-the-art planners are based on heuristic search over the state space (12 of the 20 participants in IPC-2006 used this approach). These planners achieve impressive performance in many domains and problems, but their performance strongly depends on the definition of a good domain-independent heuristic function. These heuristics are computed by solving a simplified version of the planning task which ignores the delete lists of actions. The solution to the simplified task is taken as the estimated cost of reaching the task goals. These kinds of heuristics provide good guidance across a wide range of different domains. However, they have some faults: (1) in many domains, these heuristic functions vastly underestimate the distance to the goal, leading to poor guidance; (2) the computation of the heuristic values of the search nodes is too expensive; and (3) these heuristics are non-admissible, so heuristic planners do not find good solutions in terms of plan quality.

Since evaluating a search node in heuristic planning is so time-consuming, De la Rosa, García-Olaya and Borrajo (2007) proposed using Case-Based Reasoning (CBR) to reduce the number of explored nodes. Their approach stores sequences of abstracted state transitions related to each particular object in a problem instance. Then, given a new problem, these sequences are retrieved and re-instantiated to support a forward heuristic search, deciding the ordering of nodes for computing their heuristic values.

In recent years, other approaches have been developed to minimize the negative effects of the heuristic through ML: Botea, Enzenberger, Müller and Schaeffer (2005) learned macro-actions off-line to reduce the number of evaluated nodes by decreasing the depth of the search tree. Coles and Smith (2007) learned macro-actions on-line to escape from plateaus in the search tree without any exploration. Yoon, Fern and Givan (2006) proposed an inductive approach to correct the domain-independent heuristic in a given domain, based on learning a supplement to the heuristic from observations of solved problems in that domain.

All these methods for learning search-control knowledge suffer from the utility problem: learning too much control knowledge can actually be counterproductive, because the difficulty of storing and managing the information, and of determining which information to use when solving a particular problem, can interfere with efficiency.

Learning Domain-Specific Planners

An alternative approach to learning search control consists of learning domain-specific planning programs. These programs receive as input a planning problem of a fixed domain and return a plan that solves the problem.
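The delete-relaxation heuristics described under "Learning Search Control" can be sketched in a few lines. The following is a minimal illustration of the additive variant of the idea, not any particular planner's code; the propositions, actions, and helper name are made up for the example, and real planners use far more efficient implementations.

```python
# Additive delete-relaxation heuristic (illustrative sketch): estimate the cost
# of reaching the goals in a STRIPS-like task whose actions never delete facts.
def h_add(state, goals, actions):
    """cost[p]: cheapest estimated way to reach proposition p, ignoring deletes."""
    cost = {p: 0 for p in state}
    changed = True
    while changed:
        changed = False
        for pre, add, _delete in actions:          # the delete list is ignored
            if all(p in cost for p in pre):
                c = 1 + sum(cost[p] for p in pre)  # unit action cost + preconditions
                for p in add:
                    if cost.get(p, float("inf")) > c:
                        cost[p] = c
                        changed = True
    if any(g not in cost for g in goals):
        return float("inf")                        # unreachable even when relaxed
    return sum(cost[g] for g in goals)

# Toy logistics domain: drive the truck, load the package, unload it at B.
acts = [
    ({"truck_at_A"}, {"truck_at_B"}, {"truck_at_A"}),
    ({"pkg_at_A", "truck_at_A"}, {"pkg_in_truck"}, {"pkg_at_A"}),
    ({"pkg_in_truck", "truck_at_B"}, {"pkg_at_B"}, {"pkg_in_truck"}),
]
print(h_add({"truck_at_A", "pkg_at_A"}, {"pkg_at_B"}, acts))  # → 3
```

Heuristic planners evaluate an estimate like this at every search node, which is why its computation cost, noted as fault (2) above, matters so much in practice.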
The first approaches to learning domain-specific planners were based on supervised inductive learning; they used genetic programming (Spector, 1994) and decision-list learning (Khardon, 1999), but they were not able to reliably produce good results. Recently, Winner and Veloso (2003) presented a different approach based on generalizing an example plan into a domain-specific planning program and merging the resulting source code with the previous ones.

Domain-specific planners can also be represented as policies, i.e., pairs of states and the preferred action to be executed in each state. Relational Reinforcement Learning (RRL) (Dzeroski, Raedt & Blockeel, 1998) has aroused interest as an efficient approach for learning policies for relational domains. RRL includes a set of learning techniques for computing the optimal policy for reaching the given goals by exploring the state space through trial and error. The major benefit of these techniques is that they can be used to solve problems whether the action model is known or not. On the other hand, since RRL does not explicitly include the task goals in the policies, new policies have to be learned every time a new goal has to be achieved, even if the dynamics of the environment have not changed.

In general, domain-specific planners have to deal with the problem of generalization. These techniques build planning programs from a given set of solved problems, so they cannot theoretically guarantee solving subsequent problems.

Learning Domain Models

No matter how efficient a planner is, if it is fed with a defective domain model, it will return defective plans. Designing, encoding and maintaining a domain model is very laborious. For the time being, planners are the only tool available to assist in the development of an AP domain model, but planners are not designed specifically for this purpose. Domain-model learning studies ML mechanisms to automatically acquire the planning action schemas (the action preconditions and post-conditions) from observations of action executions.

Learning domain models in deterministic environments is a well-studied problem; diverse inductive learning techniques have been successfully applied to automatically define the action schemas from observations (Shen & Simon, 1989), (Benson, 1997), (Yang, Wu & Jiang, 2005), (Shahaf & Amir, 2006). In stochastic environments, this problem becomes more complex: actions may result in innumerable different outcomes, so more elaborate approaches are required. Pasula, Zettlemoyer and Kaelbling (2004) presented the first specific algorithm to learn simple stochastic actions without conditional effects. This algorithm is based on three levels of learning: the first consists of deterministic rule-learning techniques to induce the action preconditions; the second relies on a search for the set of action outcomes that best fits the execution examples; and the third consists of estimating the probability distributions over the set of action outcomes. But stochastic planning algorithms do not need to consider all the possible action outcomes. Jiménez and Cussens (2006) proposed learning complex action-effect models (including conditions) for only the relevant action outcomes. Thus, planners generate robust plans by covering only the most likely execution outcome while leaving others to be completed when more information is available.

In deterministic environments, Shahaf and Amir (2006) introduced an algorithm that exactly learns STRIPS action schemas even if the domain is only partially observable. But in stochastic environments, there is still no general efficient approach to learning action models.

FUTURE TRENDS

Since the appearance of the first PDDL version in IPC-1998, the standard planning representation language has evolved to bring together AP algorithms and real-world planning problems. Nowadays, the PDDL 3.0 version for IPC-2006 includes numeric state variables to support quality metrics, durative actions that allow explicit time representation, derived predicates to enrich the descriptions of the system states, and soft goals and trajectory constraints to express user preferences about the different possible plans without discarding valid plans. But most of these new features are not handled by state-of-the-art planning algorithms: existing planners usually fail to solve problems that define quality metrics. The issue of goal and trajectory preferences has only been initially addressed. Time and resources add such extra complexity to the search process that a real-world problem becomes extremely difficult to solve. New challenges for the AP community are those related to developing new planning algorithms and heuristics to deal with these kinds of problems. As
it is very difficult to find an efficient general solution, ML must play an important role in addressing these new challenges, because it can be used to alleviate the complexity of the search process by exploiting regularity in the space of common problems.

Besides, state-of-the-art planning algorithms need a detailed domain description to efficiently solve the AP task, but new applications like controlling underwater autonomous vehicles, Mars rovers, etc. imply planning in environments where the dynamics model may not be easily accessible. There is a current need for planning systems to be able to acquire information about their execution environment. Future planning systems will have to include frameworks that allow the integration of the planning and execution processes together with domain-modeling techniques.

Traditionally, learning-based planners are evaluated only against the same planner without learning, in order to prove their performance improvement. Additionally, these systems are not exhaustively evaluated; typically the evaluation only focuses on a very small number of domains, so these planners are usually quite fragile when encountering new domains. Therefore, the community needs a formal methodology to validate the performance of new learning-based planning systems, including mechanisms to compare different learning-based planners.

Although ML techniques improve planning systems, existing research cannot theoretically demonstrate that they will be useful in new benchmark domains. Moreover, for the time being, it is not possible to formally explain the underlying meaning of the learned knowledge (i.e., does the acquired knowledge subsume a task decomposition? a goal ordering? a solution path?). This point reveals that future research in AP and ML will also focus on theoretical aspects that address these issues.

CONCLUSION

Generic domain-independent planners are still not able to address the complexity of real planning problems. Thus, most planning systems implemented in applications require additional knowledge to solve real planning tasks. However, the extraction and compilation of this specific knowledge by hand is complicated.

This article has described the main recent advances in developing planners successfully assisted by ML techniques. Automatically learned knowledge is useful for AP in diverse ways: it helps planners in guiding search processes, in completing domain theories, or in specifying particular solutions to a particular problem. However, the learning-based planning community should not only focus on developing new learning techniques but also on defining formal mechanisms to validate their performance against other generic planners and against other learning-based planners.

REFERENCES

Benson, S. (1997). Learning Action Models for Reactive Autonomous Agents. PhD thesis, Stanford University.

Blum, A., & Furst, M. (1995). Fast planning through planning graph analysis. In C. S. Mellish (Ed.), Proceedings of the 14th International Joint Conference on Artificial Intelligence, IJCAI-95, Vol. 2, 1636-1642. Montreal, Canada: Morgan Kaufmann.

Bonet, B., & Geffner, H. (2001). Planning as Heuristic Search. Artificial Intelligence, 129(1-2), 5-33.

Borrajo, D., & Veloso, M. (1997). Lazy Incremental Learning of Control Knowledge for Efficiently Obtaining Quality Plans. AI Review Journal, Special Issue on Lazy Learning, 11(1-5), 371-405.

Botea, A., Enzenberger, M., Müller, M., & Schaeffer, J. (2005). Macro-FF: Improving AI Planning with Automatically Learned Macro-Operators. Journal of Artificial Intelligence Research (JAIR), 24, 581-621.

Bylander, T. (1994). The computational complexity of propositional STRIPS planning. Artificial Intelligence, 69(1-2), 165-204.

Coles, A., & Smith, A. (2007). Marvin: A heuristic search planner with online macro-action learning. Journal of Artificial Intelligence Research, 28, 119-156.

De la Rosa, T., García Olaya, A., & Borrajo, D. (2007). Using Utility Cases for Heuristic Planning Improvement. Proceedings of the 7th International Conference on Case-Based Reasoning, Belfast, Northern Ireland. Springer-Verlag.

Dzeroski, S., Raedt, L. D., & Blockeel, H. (1998). Relational reinforcement learning. In International
Workshop on Inductive Logic Programming, pages 11-22.

Fikes, R., Hart, P., & Nilsson, N. (1972). Learning and Executing Generalized Robot Plans. Artificial Intelligence, 3, 251-288.

Fox, M., & Long, D. (2003). PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research, 20, 61-124.

Hoffmann, J., & Nebel, B. (2001). The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14, 253-302.

Jiménez, S., & Cussens, J. (2006). Combining ILP and Parameter Estimation to Plan Robustly in Probabilistic Domains. In Conference on Inductive Logic Programming, ILP2006, Santiago de Compostela, Spain.

Khardon, R. (1999). Learning action strategies for planning domains. Artificial Intelligence, 113, 125-148.

Pasula, H., Zettlemoyer, L., & Kaelbling, L. (2004). Learning probabilistic relational planning rules. Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling, ICAPS'04.

Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 211-229.

Shahaf, D., & Amir, E. (2006). Learning partially observable action schemas. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06).

Shen, W., & Simon, H.A. (1989). Rule creation and rule learning through environmental exploration. In Proceedings of IJCAI-89, pages 675-680.

Spector, L. (1994). Genetic programming and AI planning systems. In Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, Washington, USA. AAAI Press/MIT Press.

Veloso, M. (1994). Planning and Learning by Analogical Reasoning. Springer-Verlag.

Winner, E., & Veloso, M. (2003). Distill: Towards learning domain-specific planners by example. In Proceedings of the Twentieth International Conference on Machine Learning (ICML'03), Washington, DC, USA.

Yang, Q., Wu, K., & Jiang, Y. (2005). Learning action models from plan examples with incomplete knowledge. In Proceedings of the 2005 International Conference on Automated Planning and Scheduling (ICAPS 2005), Monterey, CA, USA, pages 241-250.

Yoon, S., Fern, A., & Givan, R. (2006). Learning heuristic functions from relaxed plans. In International Conference on Automated Planning and Scheduling (ICAPS-2006).

KEY TERMS

Control Rule: IF-THEN rule to guide the planning search-tree exploration.

Derived Predicate: Predicate used to enrich the description of the states that is not affected by any of the domain actions. Instead, the predicate truth values are derived by a set of rules of the form "if formula(x) then predicate(x)".

Domain-Independent Planner: Planning system that addresses problems without specific knowledge of the domain, as opposed to domain-dependent planners, which use domain-specific knowledge.

Macro-Action: Planning action resulting from combining actions that are frequently used together in a given domain. Used as control knowledge to speed up plan generation.

Online Learning: Knowledge acquisition during a problem-solving process with the aim of improving the rest of the process.

Plateau: Portion of a planning search tree where the heuristic value of nodes is constant or does not improve.

Policy: Mapping between world states and the preferred action to be executed in order to achieve a given set of goals.

Search Control Knowledge: Additional knowledge introduced to the planner with the aim of simplifying the search process, mainly by pruning unexplored portions of the search space or by ordering the nodes for exploration.
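The deterministic action-model learning described in the "Learning Domain Models" section can be sketched under strong simplifying assumptions (full observability, deterministic effects, a propositional action without parameters; the facts and function name below are invented for the example): preconditions are the facts present in every observed pre-state, add effects are the facts that appear, and delete effects are the facts that disappear.

```python
# Inducing a STRIPS-style action schema from observed executions of one action
# (illustrative sketch; cited systems handle far more general settings).
def induce_schema(observations):
    """observations: list of (state_before, state_after) frozensets."""
    pre = frozenset.intersection(*(b for b, _ in observations))   # always true before
    add = frozenset.union(*(a - b for b, a in observations))      # facts that appear
    delete = frozenset.union(*(b - a for b, a in observations))   # facts that vanish
    return pre, add, delete

obs = [
    (frozenset({"door_closed", "at_door", "daylight"}),
     frozenset({"door_open", "at_door", "daylight"})),
    (frozenset({"door_closed", "at_door", "dark"}),
     frozenset({"door_open", "at_door", "dark"})),
]
pre, add, delete = induce_schema(obs)
# "daylight"/"dark" are ruled out as preconditions by the second observation.
print(sorted(pre), sorted(add), sorted(delete))
```

With more observations, the induced precondition set shrinks toward the true one; this is the simplest instance of the inductive techniques the section cites.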
Jean-François Giret
CEREQ, France
A Longitudinal Analysis of Labour Market Data with SOM
tory) according to our economic approach. Once the recoding preprocessing is performed, Self-Organizing Maps are a useful clustering tool, first considering their previously mentioned clustering and projection qualities, and also because of their ability to make the efficiency of our new encoding emerge.

CLUSTERING LIFE HISTORY WITH SOM

An Example of a Life History

Career Paths

Labor economists have generally assumed that the beginning of a career results from a matching process (Jovanovich, 1979). Employers and job seekers lack information about each other: employers need to know how productive their potential employee is, and job applicants want to know whether the characteristics of the job correspond to their expectations. Job turnover and temporary employment contracts can therefore be viewed as consequences of this trial-and-error process. However, individuals' first employment situations may also act as a signal of employability to the labor market. For example, a long spell of unemployment during the first years of a person's career may be interpreted by potential employers as a sign of low work efficiency, whereas working at a temp agency may be regarded as a sign of motivation and adaptability. This is consistent with the following path-dependency hypothesis: the influence of past job experience on the subsequent career depends on the cost associated with the change of occupational situation. However, empirical studies have shown that employers mainly recruit on the basis of recent work experience (Allaire, Cahuzac & Tahar, 2000). The effects of less recent employment situations on a person's career therefore decrease over time.

Data

The data used in this study were based on the "Generation 98" survey carried out by Céreq: 22,000 young people who had left initial training in 1998, at all levels and in all training specializations, were interviewed in spring 2001 and 2003 and autumn 2005. This sample was representative of the 750,000 young people leaving the education system for the first time that year in France. The survey provided useful information about the young people's characteristics (their family's socio-economic status, age, highest grade completed, highest grade attended, discipline, any jobs taken during their studies, work placement) and their month-by-month work history from 1998 to 2005. We therefore have a complete and detailed record of the labor market status of the respondents during the 88-month period from July 1998 to November 2005. Employment spells were coded as follows, depending on the nature of the labor contract: 1 = permanent labor contract, 2 = fixed-term contract, 3 = apprenticeship contract, 4 = public temporary labor contract, 5 = interim/temping. Other situations were coded as follows: 6 = unemployment, 7 = inactivity, 8 = military service, 9 = at school.

Preprocessing Phase: Life History Encoding

The encoding of the trajectories involved a two-step preprocessing phase: defining a distance between states that includes the time dynamics, and the resulting quantitative encoding of the trajectories. These two steps address the specific structure of life-history data sets: the variables' items (the states) are qualitative information, while the variables' order records quantitative information (the timing and the duration of events).

The Distance Between Situations

Working with pairs (state, time), called situations, allows the time dynamics to be included in the proximities between occupational states. The proximity between two situations is measured on the basis of their common future, in line with our economic approach. A situation is assumed to be a potential for its own future, depending on its influence on this future. The similarity between two situations is deduced from comparisons between their respective potentials. The potential future PS of a situation S among n monthly periods and p states is defined as the pn-dimensional vector given in (1). Its components are products of the terms α and β. α measures the flow between situation S and any situation S' as the empirical probability of reaching S' starting
from S. It is also the empirical probability of an individual i being in any future situation S' conditional on being in S at present. The coefficient of temporal inertia β weights the influence of S' on PS according to the economic approach; it is a decreasing function of the time delay (t'-t). In the career-paths application, the function chosen is the inverse of the delay, and 0 for the past. Lastly, a normalizing term ensures that the potential futures PS will be profiles. The natural distance between situations is therefore the distance between their potential-future profiles, computed with formula (2). Applying this to the situations, the principal components of inertia (the principal events) are computed as the principal component vectors of the matrix defined in (2).

Trajectories can then be described in the principal-events space: performing the traditional binary encoding (3) of a trajectory Ti is equivalent to performing a linear encoding through the situations (4) and then also through the principal events Ee (5).
profiles. jj' = GS j .GS j ' =
1
[d jj '
d j. d . j ' + d .. ]
2 (2)
1
PS =( s ,t ) = (0,...,0, B (t ' ) SS '==((s1,,tt)') , B (t ' ) SS '=( 2,t ') ,..., B (t ' ) SS '=(9,t ') ,..., B (T ) SS '=(1,T ) ,..., B (T ) SS ' ,..., B (T ) SS '=(9,T ) )
A (t ) T
i = (0,...,0,1,0,...,0,0,...,0,1,0,...,0,..., 0,...,0,1,0,...,0)
t ' t
T
time =1 9 positions time =9
(3)
S ' = ( 2 , t ')
S ,..., B (t ' ) SS '=(9,t ') ,..., B (T ) SS '=(1,T ) ,..., B (T ) SS ' ,..., B (T ) SS '=(9,T ) )
' t T
(1) Ti = A ( S ) (0,...,0,1,0,...,0)
S =( s ,t )
position: s*t
(4)
where
Ti = A S ( B eS Ee ) = G e Ee
C S C S' S e e (5)
SS '==((ss,'t,t)') = i
CS
i ,
1
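The double-centering step in formula (2) can be checked numerically in a simplified, unweighted setting (the article's version works on weighted profiles; the coordinates below are made up): from squared distances alone one recovers the inner products of centered coordinates, whose principal component vectors are then the "principal events".

```python
# Numerical illustration of the double-centering in formula (2), unweighted case:
# from a squared-distance matrix d, recover inner products of centered coordinates.
x = [0.0, 1.0, 3.0]                          # known 1-D coordinates (made up)
n = len(x)
d = [[(a - b) ** 2 for b in x] for a in x]   # squared distances d_{jj'}

row = [sum(r) / n for r in d]                # d_{j.} (row means)
grand = sum(row) / n                         # d_{..} (grand mean)
G = [[-0.5 * (d[j][k] - row[j] - row[k] + grand) for k in range(n)]
     for j in range(n)]

mean = sum(x) / n
# G[j][k] equals the inner product of the centered coordinates:
for j in range(n):
    for k in range(n):
        assert abs(G[j][k] - (x[j] - mean) * (x[k] - mean)) < 1e-9
print(round(G[0][0], 3))  # → 1.778, i.e. (0 - 4/3)**2 = 16/9
```

Diagonalizing G (here trivially rank one) yields the principal components of inertia used to encode the trajectories in formulas (3)-(5).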
Figure 1. Typology of career paths with SOM: each unit on the map gives a chronogram of the evolution in time of the proportional contribution (as a percentage) of each occupational position. Two populations of close units have similar chronograms.
which can be characterized by an immediate access to a permanent contract or an indirect access during the first few years. In the upper-left-hand corner of the map, mainly the five clusters of the first two lines, career paths are characterized by a high level of access to employment with permanent contracts. Young people rapidly gained access to a permanent contract and kept it until the end of the observation period. In the upper-right-hand corner of the map, access to a permanent contract was less direct: during the first year, young people first obtained a temporary contract or spent ten months in compulsory military service before obtaining a permanent contract. In contrast with the upper part of the map, the bottom part describes career paths with longer periods of temporary contracts and/or unemployment. In the lower-right-hand corner, access to a permanent contract is rare during the first few years. However, more than ninety per cent of these young people obtained a final permanent position (in the last column of the map). In the bottom lines, a five-year public policy contract called "Emploi Jeunes" features strongly instead of the classical fixed-term contract. The lower-left-hand corner of the map shows more unstable trajectories which end in a temporary position: seven years after leaving school, people have a temporary contract (in the last two clusters of the first column) or are unemployed (second and third cells of the last line). The chronograms situated in the middle-left-hand part of the map highlight how the longitudinal approach helps to understand the complexity of transition processes: the young people here were directly recruited for a permanent job, but five or six years after graduating, they lost this permanent job. This turn of events can be explained by the strong change in the economic environment which occurred during this period; the years 2003 and 2004 correspond to a dramatic growth of youth unemployment on the French labor market.

What role does each individual characteristic play in the development of these career paths? Several factors may explain the labor market opportunities of school-leavers: human capital factors, parents' social class,
Figure 2. Career path typology by educational levels and other factors responsible for inequalities on the
labor market, such as parents nationality and gender. L
The distribution of these characteristics was included
graphically in each cell on the map. Figure 2, which
gives the distribution in terms of educational level,
clearly shows that educational level strongly affected
the career path. Higher educational graduates feature
much more frequently in the upper-left-hand corner of
the map, whereas school leavers without any qualifica-
tions occur more frequently in the bottom part. Figure
3 shows a similar pattern as far as gender is concerned:
there are much higher percentages of females than males
in the most problematic career paths, which suggests
the occurrence of gender segregation or discriminatory
practices on the French labor market. The differences are less conspicuous as far as the father's nationality is concerned (Figure 4). However, the results obtained here also suggest that children with French parents have better chances than the others of finding safe career paths.

Educational levels (dark = without qualifications; medium = secondary educational graduates; light = higher educational graduates).
KEY TERMS

Career Paths: Sequential monthly positions among several pre-defined working categories.

Distributional Equivalency: Property of a distance that allows two modalities of the same variable having identical profiles to be grouped into a new modality weighted with the sum of the two weights.

Markov Process: Stochastic process in which the new state of a system depends on the previous state or a finite set of previous states.

Optimal Matching: Statistical method issued from biology, able to compare two sequences from a predefined cost of substitution.

Preservation of Topology: After learning, observations associated to the same class, or to close classes according to the definition of the neighborhood given by the network structure, are close according to the distance in the input space.

Self-Organizing Maps by Kohonen: An unsupervised neural network method of vector quantization widely used in classification. Self-Organizing Maps are much appreciated for their topology preservation property and their associated data representation system. These two properties come from a pre-defined organization of the network that is at the same time a support for the topology learning and for its representation.

χ² Distance: Distance having certain specific properties, such as the distributional equivalency.
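The vector quantization and topology preservation described in the Self-Organizing Maps entry can be illustrated with a minimal sketch. The grid size, decay schedule and toy 0/1 "career sequence" data below are illustrative assumptions, not details taken from this study:

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal Self-Organizing Map: each grid node holds a codebook vector;
    the best-matching unit and its grid neighbours are pulled towards every
    observation, so nodes that are close on the grid end up close in the
    input space (the topology preservation property described above)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each node, used for the neighbourhood kernel.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    weights = rng.random((rows * cols, data.shape[1]))
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)              # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 0.5  # shrinking neighbourhood
        for x in data:
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)     # squared grid distance
            h = np.exp(-d2 / (2 * sigma ** 2))                 # neighbourhood weights
            weights += lr * h[:, None] * (x - weights)
    return weights

# Toy data: 100 individuals, 10 monthly positions coded 0/1 (illustrative only).
data = np.random.default_rng(1).integers(0, 2, size=(100, 10)).astype(float)
codebook = train_som(data)
```

On real data, each codebook vector would then be displayed as the chronogram of its cell, as in the typology discussed above.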
John Wang
Montclair State University, USA
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Managing Uncertainties in Interactive Systems
diagnosing systems. This assumption also requires that every possible hypothesis is known a priori. It would be violated if the problem domain were not suitable to a closed-world assumption.

Perhaps the most restrictive limitation of the Bayesian approach is its inability to represent ignorance. The Bayesian view of probability does not allow one to distinguish uncertainty from ignorance. One cannot tell whether a degree of belief was directly calculated from evidence or indirectly inferred from an absence of evidence. In addition, this method requires a large amount of data to determine the estimates for prior and conditional probabilities. Such a requirement becomes manageable only when the problem can be represented as a sparse Bayesian network that is formed by a hierarchy of small clusters of nodes. In this case, the dependencies among variables (nodes in the network) are known, and only the explicitly required conditional probabilities must be obtained (Pearl, 1988).

2. The Dempster-Shafer Theory of Evidence. The Dempster-Shafer theory, proposed by Shafer (Shafer, 1976), was developed within the framework of Dempster's work on upper and lower probabilities induced by a multi-valued mapping (Dempster, 1967). Like Bayesian theory, this theory relies on degrees of belief to represent uncertainty. However, it allows one to assign a degree of belief to subsets of hypotheses. According to the Dempster-Shafer theory, the feature of multi-valued mapping is the fundamental reason for the inability of applying the well-known theorem of probability that determines the probability density of the image of a one-to-one mapping (Cohen, 1983). In this context, the lower probability is associated with the degree of belief and the upper probability with a degree of plausibility. This formalism defines certainty as a function that maps subsets of a proposition space on the [0,1] scale. The sets of partial beliefs are represented by mass distributions of a unit of belief across the space of propositions. These distributions are called the basic probability assignment. The total certainty over the space is 1. A non-zero BPA can be given to the entire proposition space to represent the degree of ignorance. The certainty of any proposition is then represented by the interval characterized by upper and lower probabilities. Dempster's rule of combination normalizes the intersection of the bodies of evidence from the two sources by the amount of non-conflictive evidence between the sources.

This theory is attractive for several reasons. First, it builds on classical probability theory, thus inheriting much of its theoretical foundations. Second, it seems not to over-commit by not forcing precise statements of probabilities: its probabilities do not seem to provide more information than is really available. Third, it reflects the degree of ignorance of the probability estimate. Fourth, the Dempster-Shafer theory provides rules for combining probabilities and thus for propagating measures through the system. This also is one of the most controversial points, since the propagation method is an extension of the multiplication rule for independent events. Because many applications involve dependent events, the rule might be inapplicable by classical statistical criteria. The tendency to assume that events are independent unless proven otherwise has stimulated a large proportion of the criticism of probability approaches. Dempster-Shafer theory suffers the same problem (Bhatnager and Kanal, 1986).

In addition, there are two problems with the Dempster-Shafer approach. The first problem is computational complexity. In the general case, the evaluation of the degree of belief and upper probability requires exponential time in the cardinality of the hypothesis set. This complexity is caused by the need for enumerating all the subsets of a given set. The second problem in this approach results from the normalization process presented in both Dempster's and Shafer's work. Zadeh has argued that this normalization process can lead to incorrect and counter-intuitive results (Zadeh, 1984). By removing the conflicting parts of the evidence and normalizing the remaining parts, important information may be discarded rather than utilized adequately. Dubois and Prade (1985) have also shown that the normalization process in the rule of evidence combination creates a sensitivity problem, where assigning a zero value or a very small value to a basic probability assignment causes very different results.

Based on Dempster-Shafer theory, Garvey et al. (1982) proposed an approach called Evidential Reasoning that adopts the evidential interpretation of the degrees of belief and upper probabilities. This approach defines the likelihood of a proposition as a subinterval of the
unit interval [0,1]. The lower bound of this interval is the degree of support of the proposition and the upper bound is its degree of plausibility. When distinct bodies of evidence must be pooled, this approach uses the same Dempster-Shafer techniques, requiring the same normalization process that was criticized by Zadeh (Zadeh, 1984).

3. Fuzzy Sets and Possibility Theory. The theory of possibility was proposed independently by Zadeh, as a development of fuzzy set theory, in order to handle the vagueness inherent in some linguistic terms (Zadeh, 1978). For a given set of hypotheses, a possibility distribution may be defined in a way that is very similar to that of a probability distribution. However, there is a qualitative difference between the probability and the possibility of an event. The difference is that a high degree of possibility does not imply a high degree of probability, nor does a low degree of probability imply a low degree of possibility. However, an impossible event must also be improbable. More formally, Zadeh defined the concept of a possibility distribution.

The concept of possibility theory has been built upon fuzzy set theory and is well suited for representing the imprecision of vague linguistic predicates. The vague predicate induces a fuzzy set and the corresponding possibility distribution. From a semantic point of view, the values restricted by a possibility distribution are more or less all the eligible values for a linguistic variable. This theory is completely feasible for every element of the universe of discourse.

4. Theory of Endorsement. A different approach to uncertainty representation was proposed by Cohen (Cohen, 1983), which is based on a qualitative theory of endorsement. According to Cohen, the records of the factors relating to one's certainty are called endorsements. Cohen's model of endorsement is based on the explicit recording of the justifications for a statement, normally requiring a complex data structure of information about the source. Therefore, this approach maintains the uncertainty. The justification is classified according to the type of evidence for a proposition, the possible actions required to solve the uncertainty of that evidence, and other related features.

Endorsements can provide a good mechanism for the explanations of reasoning, since they create and maintain the entire history of justifications (i.e., reasons for believing or disbelieving a proposition) and the relevance of any proposition with respect to a given goal. Endorsements are divided into five classes: rules, data, task, conclusion, and resolution. Cohen points out that the main difference between the numerical approaches and the endorsement-based approach, specifically with respect to chains of inferences, is that reasoning in the former approach is entirely automatic and non-reflective, while the latter approach provides more information for reasoning about uncertainty. Consequently, reasoning in the latter approach can be controlled and determined by the quality and availability of evidence.

Endorsements provide the information necessary to many aspects of reasoning about uncertainty. Endorsements are used to schedule sure tasks before unsure ones, to screen tasks before activating them, to determine whether a proposition is certain enough for some purpose, and to suggest new tasks when old ones fail to cope with uncertainty. Endorsements distinguish different kinds of uncertainty, and tailor reasoning to what is known about uncertainty. However, Bonissone and Tong (1995) argue that combinations of endorsements in a premise (i.e., proposition), propagation of endorsements to a conclusion, and ranking of endorsements must be explicitly specified for each particular context. This creates potential combinatorial problems.

5. Assumption-Based Reasoning and Non-Monotonic Logic. In the reasoned-assumptions approach proposed by Doyle (1979), the uncertainty embedded in an implication rule is removed by listing all the exceptions to that rule. When this is not possible, assumptions are used to show the typicality of a value (i.e., default values) and the defeasibility of a rule (i.e., liability to defeat of reason). In classical logic, if a proposition C can be derived from a set of propositions S, and if S is a subset of T, then C can also be derived from T. As a system's premises increase, its possible conclusions at least remain constant and more likely increase. Deductive systems with this property are called monotonic. This kind of logic lacks tools for describing how to revise a formal theory to deal with inconsistencies caused by new information. McDermott and Doyle proposed a non-monotonic
Computer Interaction: Issues and Challenges, (ed.) Chen, Q., Idea Group Pub., 2001.

Cohen, P. R. and Grinberg, M. R., A Theory of Heuristic Reasoning about Uncertainty, AI Magazine, Vol. 4(2), pp. 17-23, 1983.

Dempster, A. P., Upper and Lower Probabilities Induced by a Multivalued Mapping, The Annals of Mathematical Statistics, Vol. 38(2), pp. 325-339, 1967.

Dubois, D. and Prade, H., Combination and Propagation of Uncertainty with Belief Functions -- A Reexamination, Proc. of the 9th International Joint Conference on Artificial Intelligence, pp. 111-113, 1985.

Dutta, A., Reasoning with Imprecise Knowledge in Expert Systems, Information Sciences, Vol. 37, pp. 2-24, 2005.

Doyle, J., A Truth Maintenance System, Artificial Intelligence, Vol. 12, pp. 231-272, 1979.

Garvey, T. D., Lowrance, J. D. and Fischer, M. A., An Inference Technique for Integrating Knowledge from Disparate Sources, Proc. of the 7th International Joint Conference on AI, Vancouver, B.C., pp. 319-325, 1982.

Heckerman, D., Probabilistic Interpretations for MYCIN's Certainty Factors, Uncertainty in Artificial Intelligence, (ed.) Kanal, L. N. and Lemmer, J. F., 1986.

Jacobson, C. and Freiling, M. J., ASTEK: A Multi-paradigm Knowledge Acquisition Tool for Complex Structured Knowledge, International Journal of Man-Machine Studies, Vol. 29, pp. 311-327, 1988.

Kahneman, D. and Tversky, A. (1982). Variants of Uncertainty, Cognition, 11, 143-157.

McDermott, D. and Doyle, J., Non-monotonic Logic, Artificial Intelligence, Vol. 13, pp. 41-72, 1980.

Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, San Mateo, CA, 1988.

Pearl, J., Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach, Proc. of the 2nd National Conference on Artificial Intelligence, IEEE Computer Society, pp. 1-12, 1985.

Pednault, E. P. D., Zucker, S. W. and Muresan, L. V., On the Independence Assumption Underlying Subjective Bayesian Updating, Artificial Intelligence, 16, pp. 213-222, 1981.

Raisinghani, M., Klassen, C. and Schkade, L., Intelligent Software Agents in Electronic Commerce: A Socio-Technical Perspective, Human-Computer Interaction: Issues and Challenges, (ed.) Chen, Q., Idea Group Pub., 2001.

Reiter, R., A Logic for Default Reasoning, Artificial Intelligence, Vol. 13, pp. 81-132, 1980.

Rich, E., User Modeling via Stereotypes, Cognitive Science, Vol. 3, pp. 329-354, 1979.

Shafer, G., A Mathematical Theory of Evidence, Princeton University Press, 1976.

Zadeh, L. A., Review of Books: A Mathematical Theory of Evidence, AI Magazine, 5(3), pp. 81-83, 1984.

Zadeh, L. A., Knowledge Representation in Fuzzy Logic, IEEE Transactions on Knowledge and Data Engineering, Vol. 1, No. 1, pp. 89-100, 1989.

Zwick, R., Combining Stochastic Uncertainty and Linguistic Inexactness: Theory and Experimental Evaluation of Four Fuzzy Probability Models, Int. J. Man-Machine Studies, Vol. 30, pp. 69-111, 1999.

KEY TERMS

Bayesian Theory: Also known as Bayes' rule or Bayes' law. It is a result in probability theory, which relates the conditional and marginal probability distributions of random variables. In some interpretations of probability, Bayes' theory tells how to update or revise beliefs in light of new evidence.

Default Reasoning: A non-monotonic logic proposed by Raymond Reiter to formalize reasoning with default assumptions. Default reasoning can express facts like "by default, something is true"; by contrast, standard logic can only express that something is true or that something is false. This is a problem because reasoning often involves facts that are true in the majority of cases but not always. A classical example is: "birds typically fly". This rule can be expressed in
standard logic either by "all birds fly", which is inconsistent with the fact that penguins do not fly, or by "all birds that are not penguins and not ostriches and ... fly", which requires all exceptions to the rule to be specified. Default logic aims at formalizing inference rules like this one without explicitly mentioning all their exceptions.

Non-Monotonic Logic: A formal logic whose consequence relation is not monotonic. Most studied formal logics have a monotonic consequence relation, meaning that adding a formula to a theory never produces a reduction of its set of consequences. Intuitively, monotonicity indicates that learning a new piece of knowledge cannot reduce the set of what is known. A monotonic logic cannot handle various reasoning tasks such as reasoning by default.

Possibility Theory: A mathematical theory for dealing with certain types of uncertainty; it is an alternative to probability theory. Professor Lotfi Zadeh first introduced possibility theory in 1978 as an extension of his theory of fuzzy sets and fuzzy logic.

Shafer-Dempster's Evidence Theory: A mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information (evidence) to calculate the probability of an event. The theory was developed by Arthur P. Dempster and Glenn Shafer.

Theory of Endorsement: An approach to represent uncertainty proposed by Cohen, which is based on a qualitative theory of endorsement. According to Cohen, the records of the factors relating to one's certainty are called endorsements. Cohen's model of endorsement is based on the explicit recording of the justifications for a statement, normally requiring a complex data structure of information about the source. Therefore, this approach maintains the uncertainty. The justification is classified according to the type of evidence for a proposition, the possible actions required to solve the uncertainty of that evidence, and other related features.

Truth Maintenance System: A knowledge representation method for representing both beliefs and their dependencies. The name truth maintenance is due to the ability of these systems to restore consistency. There are two major truth maintenance systems: single-context and multi-context truth maintenance. In single-context systems, consistency is maintained among all facts in memory (database). Multi-context systems allow consistency to be relevant to a subset of facts in memory (a context) according to the history of logical inference. This is achieved by tagging each fact or deduction with its logical history. Multi-agent truth maintenance systems perform truth maintenance across multiple memories, often located on different machines.
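The basic probability assignments, the belief/plausibility interval, and the normalization step in Dempster's rule of combination discussed in this article can be sketched as follows. The three-hypothesis frame and the two mass assignments are invented purely for illustration:

```python
def combine(m1, m2):
    """Dempster's rule: intersect the focal elements of two bodies of
    evidence and renormalize by the non-conflicting mass (1 - K)."""
    combined = {}
    conflict = 0.0  # K: mass that lands on the empty intersection
    for a, x in m1.items():
        for b, y in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + x * y
            else:
                conflict += x * y
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

def belief(m, prop):
    """Lower probability: total mass on subsets of the proposition."""
    return sum(v for a, v in m.items() if a <= prop)

def plausibility(m, prop):
    """Upper probability: total mass on sets intersecting the proposition."""
    return sum(v for a, v in m.items() if a & prop)

# Illustrative BPAs over the frame {flu, cold, other}; the mass left on the
# whole frame represents ignorance, as described in the article.
frame = frozenset({"flu", "cold", "other"})
m1 = {frozenset({"flu"}): 0.6, frame: 0.4}
m2 = {frozenset({"flu", "cold"}): 0.7, frame: 0.3}
m = combine(m1, m2)
interval = (belief(m, frozenset({"flu"})), plausibility(m, frozenset({"flu"})))
# interval -> (0.6, 1.0): the certainty of "flu" as a [lower, upper] pair
```

With these numbers there is no conflicting evidence, so the normalizing factor 1 − K is 1; a pair of disjoint focal elements would instead feed K, which is exactly the step criticized by Zadeh (1984).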
Soon-Thiam Khu
University of Exeter, UK
Dragan A. Savić
University of Exeter, UK
Many-Objective Evolutionary Optimisation
issues affecting many-objective evolutionary optimisation and discusses the latest advances in the field.

REMEDIAL MEASURES: STATE-OF-THE-ART

The possible remedies that have been proposed to address the issues arising in evolutionary many-objective optimisation can be broadly classified as follows:

aggregation, goals and priorities
conditions of optimality
dimensionality reduction

In the next sub-sections we give an overview of each of these methods and review the approaches that have been proposed so far.

Aggregation, Goals and Priorities

This class of methods tries to overcome the difficulties described in the previous section through the decomposition of the original problem into a series of parameterised single-objective ones, that can then be solved by any classical or evolutionary algorithm. Many aggregation-based methods have been presented so far and they are usually based on modifications of the weighted sum approach, such as the augmented Tchebycheff function, that are able to identify exposed solutions and explore non-convex regions of the trade-off surface. However, the problem of selecting an effective strategy to vary weights or goals so that a representative approximation of the trade-off curve can be achieved is still unresolved.

The ε-constraint approach (Chankong, & Haimes, 1983), which is based on minimisation of one (the most preferred or primary) objective function while considering the other objectives as constraints bound by some allowable levels, was also used in the context of evolutionary computing. The main limitation of this approach is its computational cost and the lack of an effective strategy to vary the bound levels (ε). Recently, Laumanns et al. (Laumanns, Thiele, & Zitzler, 2006) proposed a variant of the original approach where they developed a variation scheme based on the concept of ε-Pareto dominance (efficiency) (White, 1986) that adaptively generates constraint values, thus enabling the exhaustive exploration of the Pareto front, provided the scheme is coupled with an exact single-objective optimiser. It must be pointed out, however, that none of the methods described above has ever been thoroughly tested in the context of many-objective optimisation.

The Multiple Single Objective Pareto Sampling (MSOPS 1 & 2), an interesting hybridisation of the aggregation method with goal specification, was presented in (Hughes, 2003, Hughes, 2005). In the MSOPS, the selective pressure is not provided by Pareto ranking. Instead, a set of user-defined target vectors is used in turn, in conjunction with an aggregation method, to evaluate the performance of each solution at every generation of a MOEA. The greater the number of targets that a solution nears, the better its rank. The authors suggested two aggregation methods: the weighted min-max approach (implemented in MSOPS) and the Vector-Angle-Distance-Scaling (implemented in MSOPS 2). The results indicated with statistical significance that NSGA-II (Deb, Pratap, Agarwal, & Meyarivan, 2002), the Pareto-based MOEA used for comparative purposes, was outperformed on many-objective problems. This was also recently confirmed by Wagner et al. in (Wagner, Beume, & Naujoks, 2007), where they benchmarked traditional MOEAs, aggregation-based methods and indicator-based methods on up to 6-objective problems and suggested a more effective method to generate the target vectors.

Conditions of Optimality

Recently, great attention has been given to the role that conditions of optimality may play in the context of many-objective evolutionary optimisation when used to rank trial solutions during the selection stage of a MOEA in alternative to, or in conjunction with, Pareto efficiency. Farina et al. (Farina, & Amato, 2004) proposed the use of a fuzzy optimality condition, but did not provide a direct means to incorporate it into a MOEA. Köppen et al. (Koppen, Vincente-Garcia, & Nickolay, 2005) also suggested the fuzzification of the Pareto dominance relation, which was exploited within a generational elitist genetic algorithm on a synthetic MOP. The concept of knee (Deb, 2003) has also been exploited in the context of evolutionary many-objective optimisation. Simply stated, a knee is a portion of a Pareto surface where the marginal substitution rates are particularly high, i.e. a small improvement in one objective leads to a high deterioration of the others. A graphical representation is given in Figure 1. The idea
is that, with no prior information about the preference structures of the DM, the knee is likely to be the most interesting area. Branke et al. (Branke, Deb, Dierolf, & Osswald, 2004) developed two methodologies to detect solutions lying on knees, and incorporated them into the second stage (crowding measure) of the ranking procedure of NSGA-II. The first methodology consists in evaluating, for each individual in a population, the angle between itself and some neighbouring solutions, and using this value to favour solutions with higher angles, i.e. closer to the knee. The methodology, however, scales poorly with the number of objectives. The second strategy resorts to the expected marginal utility function to detect solutions close to the knee. This approach extends easily with the number of objectives; however, the sampling necessary to evaluate the expectation of the marginal utility function with a certain confidence may become expensive. Neither of these approaches has been tested on many-objective problems.

The concept of approximate optimal solutions has also been investigated to some extent in the context of evolutionary many-objective optimisation. In particular, ε-efficiency was considered to be potentially effective to ease some of the difficulties associated with many-objective problems. A recent study by Wagner et al. (Wagner, Beume, & Naujoks, 2007) showed the excellent performance of ε-MOEA (Deb, Mohan, & Mishra, 2003) on a 6-objective instance of two synthetic test functions. A good review of the application of approximate conditions of optimality is given in (Burke, & Landa Silva, 2006), where the authors also compared the effect of using two relaxed forms of Pareto dominance as evaluation methods within two MOEAs.

Recently, di Pierro et al. (di Pierro, Khu, & Savić, 2007) proposed a ranking scheme based on Preference Ordering (Das, 1999), a condition of optimality that generalises Pareto efficiency but is more stringent, and tested it using NSGA-II as the optimisation shell on a suite of seven benchmark problems with up to eight objectives. Results indicated that the proposed methodology significantly enhanced the convergence properties of the standard NSGA-II algorithm on all the test problems. The strengths of this approach are its absence of parameters to tune and the fact that it showed very good performances across varying problem features; the drawbacks are its computational runtime and
the fact that its combination with diversity preserving mechanisms that favour extreme solutions may generate too high a selective pressure.

In (Sato, Aguirre, & Tanaka, 2007), Sato et al. introduced an approach to modify the Pareto dominance condition used within the selection stage of Pareto-based MOEAs, by which the area dominated by any given point is contracted or expanded according to a formula derived from the Sine theorem, and the extent of the contraction/expansion is controlled by a constant factor. The results of a series of experiments performed using NSGA-II equipped with the contraction/expansion mechanism on 0/1 multi-objective knapsack problems showed substantially improved convergence and diversity performances compared to the standard NSGA-II algorithm. However, it was also shown that the optimal value for the contraction/expansion factor depends strongly on various problem features, and no indication was given to support a correct choice.

Most modern MOEAs rely on a two-stage ranking during the selection process. At the first stage the ranks are assigned according to some form of Pareto-based dominance relation; if ties exist, these are resolved resorting to mechanisms that favour a good distribution of the examined solutions along the Pareto front. It has now been acknowledged that this second stage of the ranking procedure may in fact be detrimental in the case of many objectives, as was shown in (Purshouse, & Fleming, 2003b) for the case of NSGA-II. Recent efforts have therefore focused on replacing diversity preserving mechanisms at this second stage with more effective ones. Koppen and Yoshida (Koppen, & Yoshida, 2007) proposed four secondary ranking assignments and tested them by replacing the crowding distance assignment within NSGA-II. The results indicated improved convergence in all cases compared to the standard NSGA-II. However, the authors did not report any result on the diversity performance of the algorithms.

Dimensionality Reduction Methods

The aim of this class of methods is usually to transform the objective space into a lower dimension representation, either one-off (prior to the optimisation) or iteratively (as the search progresses).

Deb and Saxena (Deb, & Saxena, 2006) developed a procedure based on principal component analysis (PCA) for reducing the dimension of the problem to solve. The procedure consists in performing a series of optimisations using a state-of-the-art MOEA, each one focusing only on the objectives that PCA found to explain most of the variance, on the basis of the Pareto front obtained with the previous optimisation. Recently, Saxena and Deb (Saxena, & Deb, 2007) extended their work and replaced PCA with two dimensionality reduction techniques, the correntropy PCA and a modified maximum variance unfolding, that could also detect non-linear interactions in the objective space. The results indicated that the former method suffered to some extent from a difficult choice of the best kernel function to use, whereas for the latter, the authors performed a significant number of experiments to suggest bound values of the only free parameter of the procedure. It must be highlighted that these two studies are the only efforts that have challenged new algorithms on highly-dimensional test problems (up to 50 objectives).

In a recent study, Brockhoff and Zitzler (Brockhoff, & Zitzler, 2006b) introduced the minimum objective subset problem (MOSS), which is concerned with the identification of the largest set of objectives that can be removed without altering the dominance structure of the problem (i.e. the set of Pareto optimal solutions obtained considering all the objectives, or only the MOSS, is the same), and developed an exact algorithm and a greedy heuristic to solve it. Subsequently (Brockhoff, & Zitzler, 2006a), they proposed a measure of variation for the dominance structure and extended the MOSS to allow for dimensionality reductions involving predefined thresholds of problem structure changes. However, they did not propose a mechanism to incorporate these algorithms within a MOEA.

Recently, the analysis of the relationships of interdependence between the objectives of an optimisation problem has been successfully exploited to devise effective reduction methods. Following the definitions of conflict, support or harmony, and independence proposed in (Carlsson, & Fuller, 1995), Purshouse and Fleming (Purshouse, & Fleming, 2003a) discussed the effects of these relationships in the context of many-objective evolutionary optimisation. In a later study, Purshouse and Fleming (Purshouse, & Fleming, 2003c) also suggested, in the case of objectives independence, a divide-and-conquer algorithm based on objective space decomposition.
1045
Many-Objective Evolutionary Optimisation
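The criterion at the core of these objective-reduction methods, namely that a subset of objectives must leave the pairwise dominance relations unchanged, can be sketched as follows. This is only an illustrative check, not the exact or greedy MOSS algorithms themselves:

```python
from itertools import permutations

def dominates(a, b, objs):
    """a Pareto-dominates b (minimisation) on the given objective indices."""
    return (all(a[k] <= b[k] for k in objs)
            and any(a[k] < b[k] for k in objs))

def preserves_dominance(points, subset):
    """True if the objective subset induces exactly the same pairwise
    dominance relations as the full objective set."""
    full = range(len(points[0]))
    return all(dominates(a, b, subset) == dominates(a, b, full)
               for a, b in permutations(points, 2))

# third objective duplicates the first, so objectives {f1, f2} suffice
points = [(1, 1, 1), (2, 2, 2), (1, 3, 1)]
print(preserves_dominance(points, (0, 1)))  # → True
print(preserves_dominance(points, (0,)))    # → False
```

On this basis, a greedy reduction would repeatedly drop an objective as long as the remaining subset still passes this check.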
As it appears from the discussion above, there is an increasing effort to develop strategies that are able to overcome the limitations of Pareto-based methods when solving problems with many objectives. Although promising results have been generally reported, most of the approaches presented are of an empirical nature, which makes it difficult to draw conclusions that can be generalised.

With the exception of dimensionality reduction techniques, the majority of the studies presented to date focus on mechanisms to improve the ranking of the solutions in the selection process. However, the analysis of these mechanisms is usually undertaken in isolation from the other components of the algorithms. In our view, this is an important limitation that next-generation algorithms will have to address, in particular by analysing these mechanisms in relation to the variation operators.

Moreover, little attention has been paid to characterising the solutions that a given method (belonging to the first or second category identified in the previous section) favours, in relation to the properties of the problem being solved. Theoretical frameworks are therefore needed in order to analyse existing methods and develop more focused approaches. As pointed out by di Pierro (di Pierro, 2006), who provided a theoretical framework to analyse the effect of the Preference Ordering-based ranking procedure in relation to the interdependence relationships of a problem, such an approach enables predicting the effect of applying a given methodology to a particular problem with limited prior knowledge. This is certainly an advantage, since the goal of developing powerful algorithms is to solve (often for the first time) real-life problems.

CONCLUSIONS

In this chapter we have provided a comprehensive review of the state of the art of evolutionary algorithms for the optimisation of many-objective problems, discussing the limitations and strengths of the approaches described, and we have suggested future trends of research for a field that is gathering increasing momentum.

REFERENCES

Branke, J., Deb, K., Dierolf, H., & Osswald, M. (2004). Finding Knees in Multi-objective Optimization. Kanpur Genetic Algorithm Laboratory, KanGAL Tech. Rep. No. 2004010.

Brockhoff, D., & Zitzler, E. (2006a). Dimensionality Reduction in Multiobjective Optimization with (Partial) Dominance Structure Preservation: Generalized Minimum Objective Subset Problems. TIK-Report No. 247.

Brockhoff, D., & Zitzler, E. (2006b). On Objective Conflicts and Objective Reduction in Multiple Criteria Optimization. TIK-Report No. 243.

Burke, E. K., & Landa Silva, J. D. (2006). The Influence of the Fitness Evaluation Method on the Performance of Multiobjective Search Algorithms. European Journal of Operational Research, 169(3), 875-897.

Carlsson, C., & Fuller, R. (1995). Multiple Criteria Decision Making: The Case for Interdependence. Computers and Operations Research, 22, 251-260.

Chankong, V., & Haimes, Y. Y. (1983). Multiobjective Decision Making: Theory and Methodology. Dover Publications.

Coello Coello, C. A., & Aguirre, A. H. (2002). Design of Combinational Logic Circuits through an Evolutionary Multiobjective Optimization Approach. Artificial Intelligence for Engineering Design, Analysis and Manufacture, 16(1), 39-53.

Coello Coello, C. A., Van Veldhuizen, D. A., & Lamont, G. B. (2002). Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers.

Das, I. (1999). A Preference Ordering Among Various Pareto Optimal Alternatives. Structural Optimization, 18(1), 30-35.

Deb, K. (2001). Multi-Objective Optimization using Evolutionary Algorithms. John Wiley & Sons.

Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182-197.

Deb, K., & Saxena, D. K. (2006). Searching for Pareto-optimal Solutions through Dimensionality Reduction for Certain Large-dimensional Multi-objective Optimization Problems. In IEEE Congress on Evolutionary Computation (CEC 2006), pp. 3353-3360.

Deb, K. (2003). Multi-objective Evolutionary Algorithms: Introducing Bias among Pareto-optimal Solutions. In A. Ghosh & S. Tsutsui (Eds.), Advances in Evolutionary Computing: Theory and Applications (pp. 263-292). London: Springer-Verlag.

Deb, K., Mohan, M., & Mishra, S. (2003). A Fast Multi-objective Evolutionary Algorithm for Finding Well-Spread Pareto-Optimal Solutions. Kanpur Genetic Algorithm Laboratory, KanGAL Technical Report 2003002.

di Pierro, F. (2006). Many-Objective Evolutionary Algorithms and Applications to Water Resources Engineering. Ph.D. Thesis, School of Engineering, Computer Science and Mathematics, University of Exeter, UK.

di Pierro, F., Khu, S.-T., & Savic, D. A. (2007). An Investigation on Preference Order Ranking Scheme for Multi-Objective Evolutionary Optimisation. IEEE Transactions on Evolutionary Computation, 11(1), 17-45.

Farina, M., & Amato, P. (2004). A Fuzzy Definition of Optimality for Many-Criteria Optimization Problems. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 34(3), 315-326.

Farina, M., & Amato, P. (2002). On the Optimal Solution Definition for Many-criteria Optimization Problems. In Proceedings of the NAFIPS-FLINT International Conference 2002 (pp. 233-238). Piscataway, NJ: IEEE Service Center.

Hughes, E. J. (2003). Multiple Single Objective Pareto Sampling. In 2003 Congress on Evolutionary Computation, pp. 2678-2684.

Köppen, M., Vincente-Garcia, R., & Nickolay, B. (2005). Fuzzy-Pareto-Dominance and Its Application in Evolutionary Multi-objective Optimization. In Evolutionary Multi-Criterion Optimization, Third International Conference, EMO 2005, vol. 3410, pp. 399-412.

Köppen, M., & Yoshida, K. (2007). Substitute Distance Assignments in NSGA-II for Handling Many-Objective Optimization Problems. In S. Obayashi, K. Deb, C. Poloni, T. Hiroyasu, & T. Murata (Eds.), Evolutionary Multi-Criterion Optimization, 4th International Conference, EMO 2007. Springer.

Laumanns, M., Thiele, L., & Zitzler, E. (2006). An Efficient, Adaptive Parameter Variation Scheme for Metaheuristics Based on the Epsilon-constraint Method. European Journal of Operational Research, 169, 932-942.

Purshouse, R. C., & Fleming, P. J. (2003a). Conflict, Harmony, and Independence: Relationships in Evolutionary Multi-criterion Optimisation. In Second International Conference on Evolutionary Multi-Criterion Optimization (EMO 2003), pp. 16-30. Berlin: Springer.

Purshouse, R. C., & Fleming, P. J. (2003b). Evolutionary Multi-Objective Optimisation: An Exploratory Analysis. In Proceedings of the 2003 Congress on Evolutionary Computation (CEC 2003), vol. 3, pp. 2066-2073. IEEE Press.

Purshouse, R. C., & Fleming, P. J. (2003c). An Adaptive Divide-and-conquer Methodology for Evolutionary Multi-criterion Optimisation. In Second International Conference on Evolutionary Multi-Criterion Optimization (EMO 2003), pp. 133-147. Berlin: Springer.

Sato, H., Aguirre, H. E., & Tanaka, K. (2007). Controlling Dominance Area of Solutions and Its Impact on the Performance of MOEAs. In Evolutionary Multi-Criterion Optimization, 4th International Conference, EMO 2007, pp. 5-19.

Saxena, D. K., & Deb, K. (2007). Non-linear Dimensionality Reduction Procedures. In S. Obayashi, K. Deb, C. Poloni, T. Hiroyasu, & T. Murata (Eds.), Evolutionary Multi-Criterion Optimization, 4th International Conference, EMO 2007, pp. 772-787. Springer.

Wagner, T., Beume, N., & Naujoks, B. (2007). Pareto-, Aggregation-, and Indicator-based Methods in Many-Objective Optimization. In S. Obayashi, K. Deb, C. Poloni, T. Hiroyasu, & T. Murata (Eds.), Evolutionary Multi-Criterion Optimization, 4th International Conference, EMO 2007, pp. 742-756. Springer.

White, D. J. (1986). Epsilon Efficiency. Journal of Optimization Theory and Applications, 49, 319-337.

KEY TERMS

Evolutionary Algorithms: Solution methods inspired by the natural evolution process that evolve a population of solutions to an optimisation problem through iterative application of randomised processes of recombination and selection, until a termination criterion is met.

Many-Objective Problem: Problem consisting of more than 3-4 objectives to be concurrently maximised/minimised.

Pareto Front: Image of the Pareto Set onto the performance (objective) space.

Pareto Optimal Solution: Solution that is not dominated by any other feasible solution.

Pareto Set: Set of Pareto Optimal solutions.

Ranking Scheme: Scheme that assigns to each solution of a population a score that is a measure of its fitness relative to the other members of the same population.

Selective Pressure: The ratio between the number of expected selections of the best solution and that of a mean-performing one.
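A minimal sketch of the Pareto-dominance relation underlying these definitions (assuming minimisation of all objectives):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b under minimisation:
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

points = [(1, 5), (2, 2), (3, 1), (4, 4)]   # (4, 4) is dominated by (2, 2)
print(pareto_front(points))                 # → [(1, 5), (2, 2), (3, 1)]
```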
Mapping Ontologies by Utilising Their Semantic Structure

Wolfgang A. Halang
Fernuniversitaet in Hagen, Germany

Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
and properties) are quite often defined with different representations by different organisations. For instance, representations may be with or without connector symbols, with upper or lower case, etc., which renders it very complicated and hard to identify terms. Tokenisation means to parse names into tokens based on specific rules or symbols by customisable tokenisers using punctuation, upper case, special symbols, digits, etc. In this way, a class or property name can be tokenised into one or several token strings. For example, the term Social_%26_Science can be tokenised as Social, 26, and Science. Note that the terms can sometimes contain digits, like dates, which are not negligible.

For simplicity, we assume that all terms of ontology concepts (classes, properties) are described without abbreviations. The mapping process between different class and property names is then transformed to a mapping between tokens.

We first check whether the original child nodes are equal, ignoring case. Otherwise, the tokens are used instead to check whether they are equal. If not, the similarity measure based on the edit distance is adopted to calculate similarity. If the calculated similarity value is above a threshold (for example, 0.95), the compared nodes are considered to be similar. The process continues to deal with the next pair of nodes in the same way.

The edit distance formulated by Levenshtein (Levenshtein, 1966) and the string mapping method proposed by Maedche & Staab (Maedche & Staab, 2002) are employed here to calculate token similarity. The edit distance is a well-established method to weigh the difference between two strings. It measures the minimum number of token insertions, deletions, and substitutions required to transform one string into another, using a dynamic programming algorithm. The string matching method is used to calculate token similarity based on Levenshtein's edit distance:

Sim(X, Y) = max(0, (min(|X|, |Y|) - ed(X, Y)) / min(|X|, |Y|)) ∈ [0, 1]    (1)

where X, Y are token strings, |X| is the length of X, min(·) and max(·) denote the minimum/maximum value of two arguments, respectively, and ed(·) is the edit distance.

As the original ontology terms may have been tokenised into many sub-terms, i.e., tokens, it is necessary to separately calculate the similarity between each pair of token strings. Assume that the number of tokens of the first term is m and that of the second term is n, with m ≤ n; the total similarity measure according to Eq. (1) is:

Sim_syn = ω_1 · Sim_orig + Σ_{i=1}^{m} Σ_{j=1}^{n} ω_ij · Sim_ij,   with ω_1 + Σ_{i=1}^{m} Σ_{j=1}^{n} ω_ij = 1    (2)

where Sim_orig is the similarity between the original strings, Sim_ij is the similarity between the i-th and j-th tokens from the two source terms, and ω_1, ω_ij are the weights for Sim_orig and Sim_ij, respectively; the sum of ω_1 and all ω_ij is supposed to be 1. Given a predefined similarity threshold, if the acquired similarity value is greater than or equal to the threshold, then two tokens are considered similar, and vice versa.

Semantic-Level Mapping Based on Ontology Structure

Semantic heterogeneity occurs when there is a disagreement about the meaning, interpretation, or intended use of the same or related data. Semantic relations (Gahleitner, & Woess, 2004) are:

• different naming of the same content, i.e., synonyms,
• different abstraction levels: generic terms vs. more specific ones (name vs. first name and last name), hypernyms or hyponyms, and
• different structures for the same content (separate type vs. part of a type), i.e., meronyms.

In ontology mapping, WordNet is one of the most frequently used sources of background knowledge. Actually, it plays the role of an intermediate to help finding semantic heterogeneity. The WordNet library can be accessed with the Java API JWNL2. It groups English words into sets of synonyms called synsets, providing short, general definitions, and it records the various semantic relations between these synonym sets. The purpose is to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. It is assumed that each sense in a
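The tokenisation step and the edit-distance similarity of Eq. (1) can be sketched as follows; the regular expressions are an assumption of this illustration, not the article's actual tokeniser rules:

```python
import re

def tokenise(name):
    """Split a class/property name on punctuation, digits, and case changes,
    e.g. 'Social_%26_Science' -> ['Social', '26', 'Science']."""
    tokens = []
    for part in re.split(r'[^0-9A-Za-z]+', name):
        tokens += re.findall(r'[0-9]+|[A-Z]?[a-z]+|[A-Z]+(?![a-z])', part)
    return tokens

def edit_distance(x, y):
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        cur = [i]
        for j, cy in enumerate(y, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cx != cy)))
        prev = cur
    return prev[-1]

def sim(x, y):
    """Eq. (1): max(0, (min(|X|,|Y|) - ed(X,Y)) / min(|X|,|Y|))."""
    m = min(len(x), len(y))
    return max(0.0, (m - edit_distance(x, y)) / m) if m else 0.0

print(tokenise('Social_%26_Science'))  # → ['Social', '26', 'Science']
print(sim('kitten', 'sitting'))        # → 0.5
```

A combined Sim_syn as in Eq. (2) would then mix `sim` on the original strings and on each token pair, with weights summing to 1.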
WordNet synset describes a concept. WordNet senses are related among themselves via synonym, hyponym, and hyperonym relations. Terms lexicalising the same concept (sense) are considered to be equivalent through the synonym relationship, while hypernyms, hyponyms, and meronyms are considered similar. With this rule, the original ontology terms and their relative token strings are first checked as to whether they have the same part of speech, i.e., noun, verb, adjective, adverb. The next step is to judge whether they are synonyms, hypernyms, hyponyms, or meronyms. Analogous to Eq. (2), for a pair of nodes a similarity value is calculated according to the weights of the different similarity parts. If it exceeds a given threshold, the pair is considered to be similar.

If the above-mentioned syntactic and semantic WordNet mapping methods still could not find a mapping between two terms, another semantic-level method based on tree-structured graphs is applied.

As rendered with SWOOP3 (a hypermedia-inspired ontology browser and editor based on OWL ontologies, which supports renderers to obtain class/property hierarchy trees as well as definitions of and inferred facts on OWL classes and properties), ontology entities are represented by class/property hierarchy trees. From class hierarchy trees, tree-structured graphs are constructed. Based on the notion of structure graphs (Lian, 2004), a tree-structured graph (ts-graph) is defined as:

Definition 1. Given a tree of sets T, N the union of all sets in T, and E the set of edges in T, then ts-g(T) = (N, E) is called the tree-structured graph of T, if it holds (a, b) ∈ E if and only if a is a parent element of b; a is called parent node, and b child node.

In building a ts-graph, breadth-first traversal is applied to traverse a tree hierarchy. To construct a ts-graph, we begin with the root node's first child node. All its child nodes and their relative parent nodes form edges (parent node, child node) for the ts-graph. The process is repeated until the tree is completely traversed.

After the ts-graphs of the two ontologies to be matched are built, the edge sets of both graphs are employed in the mapping process. The relative positions of a pair's nodes (one from each ontology) within their tree-structured graphs determine the semantic-level mapping between them. In Table 1 we summarise three types of relationships between edges characterising properties, child classes, and parent-and-child classes. By understanding these properties, we can derive that entities having the same properties are similar. This is not a rule that always holds true, but it is a strong indicator of similarity.

Table 1 (rule R3, parent-and-child classes): If the parent class and one of the child classes of a and b are similar, a and b are also similar.

Experimental Results

The implementation of our algorithm was written in Java. It relies on SWOOP for parsing OWL files. All tests were run on a standard PC (with Windows XP). Our test cases include the ontologies Baseball team and Russia provided by the institute AIFB4.

To get an impression of the matchmaker's performance, different measures have to be considered. There are various ways to measure how well retrieved information matches intended information. To evaluate
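The breadth-first ts-graph construction can be sketched as follows; the dictionary encoding of the class hierarchy is an assumption of this illustration:

```python
from collections import deque

def ts_graph(tree, root):
    """Collect the (parent node, child node) edges of a tree hierarchy
    by breadth-first traversal, per Definition 1."""
    nodes, edges = {root}, set()
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in tree.get(parent, []):
            nodes.add(child)
            edges.add((parent, child))
            queue.append(child)
    return nodes, edges

# toy class hierarchy: Thing -> {Person, Place}, Person -> {Student}
tree = {"Thing": ["Person", "Place"], "Person": ["Student"]}
nodes, edges = ts_graph(tree, "Thing")
print(sorted(edges))
# → [('Person', 'Student'), ('Thing', 'Person'), ('Thing', 'Place')]
```

The edge sets of two such ts-graphs can then be compared to judge the relative positions of a candidate node pair.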
the quality of mapping results, we use standard information-retrieval metrics: Recall (r), Precision (p), and F-Measure (Melnik & Rahm, 2002). Recall is the ratio of the number of relevant entities retrieved to the total number of relevant entities, Precision is the ratio of the number of relevant records retrieved to the total number of irrelevant and relevant records retrieved, and

F-Measure = 2rp / (r + p)    (3)

Figure 2. Precision, Recall, and F-Measure of the mapping at similarity thresholds 0.8, 0.85, and 0.95.

As shown in Figure 2, with the increase of the threshold, the mapping precision for Russia gets higher. The shortcoming of this method is its efficiency. Though we trade off efficiency to get more effective mapping results, our algorithm is still applicable to some offline web applications, like information filtering (Hanani, 2001) according to a user's profile. Part of our future work is to address security problems connected to ontology mapping, such as trust management in the application of web services.

CONCLUSION

The overall research goal presented in this article is to develop a method for ontology mapping that combines syntactic analysis, measuring the difference between tokens by the edit distance, with semantic analysis based on WordNet semantic relations and on the similarity of structured graphs representing the ontologies being compared. Empirically we have shown that our synthesised mapping method works with relatively high precision.

REFERENCES
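Given a set of retrieved mappings and a gold standard, the metrics above, including the F-Measure of Eq. (3), can be computed as:

```python
def evaluate(found, reference):
    """Precision, Recall and F-Measure (Eq. 3) of retrieved mappings
    'found' against a gold-standard set 'reference'."""
    relevant_retrieved = len(found & reference)
    p = relevant_retrieved / len(found)       # relevant among all retrieved
    r = relevant_retrieved / len(reference)   # retrieved among all relevant
    f = 2 * r * p / (r + p) if r + p else 0.0
    return p, r, f

found = {("City", "Town"), ("River", "Stream"), ("Team", "Player")}
reference = {("City", "Town"), ("River", "Stream"), ("Coach", "Trainer")}
p, r, f = evaluate(found, reference)  # p = r = f = 2/3
```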
Approach. In Bussler, C., Davis, J., Fensel, D., & Studer, R. (Eds.), Proceedings of the 1st ESWS, Volume 3053 of Lecture Notes in Computer Science, pp. 76-91. Heraklion, Greece: Springer Verlag, 2004.

Fellbaum, C. (1999). WordNet: An Electronic Lexical Database. MIT Press.

Gahleitner, E., & Woess, W. (2004). Enabling Distribution and Reuse of Ontology Mapping Information for Semantically Enriched Communication Services. In Proceedings of the 15th International Workshop on Database and Expert Systems Applications (DEXA'04).

Gruber, T.R. (1993). A Translation Approach to Portable Ontology Specification. Knowledge Acquisition, 5(2), 199-220.

Hanani, U., Shapira, B., & Shoval, P. (2001). Information Filtering: Overview of Issues, Research and Systems. User Modeling and User-Adapted Interaction, 11, 203-259.

Kalyanpur, A., Parsia, B., & Hendler, J. (2005). A Tool for Working with Web Ontologies. International Journal on Semantic Web and Information Systems, 1(1).

Levenshtein, I.V. (1966). Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. Cybernetics and Control Theory, 10(8), 707-710.

Li, J. (2004). LOM: A Lexicon-based Ontology Mapping Tool. In Proceedings of the Performance Metrics for Intelligent Systems (PerMIS'04), Information Interpretation and Integration Conference (I3CON), Gaithersburg, MD. Retrieved Aug. 30, 2007, from http://reliant.teknowledge.com/DAML/I3con.pdf.

Lian, W., Cheung, D.W., & Yiu, S.M. (2004). An Efficient and Scalable Algorithm for Clustering XML Documents by Structure. IEEE Transactions on Knowledge and Data Engineering, 16(1), 82-96.

Maedche, A., & Staab, S. (2002). Measuring Similarity between Ontologies. In Proceedings of the European Conference on Knowledge Acquisition and Management (EKAW-2002), LNCS/LNAI 2473, pp. 251-263. Springer-Verlag.

Semantic Web as Background Knowledge for Ontology Mapping. In Proceedings of the International Workshop on Ontology Matching (OM-2006), collocated with ISWC'06.

Su, X.M., & Atle Gulla, J. (2006). An Information Retrieval Approach to Ontology Mapping. Data & Knowledge Engineering, 58(1), 47-69.

KEY TERMS

Ontology: As a means to conceptualise and structure knowledge, ontologies are seen as the key to realising the vision of the semantic web.

Ontology Mapping: Ontology mapping is required to achieve knowledge sharing and semantic integration in an environment with different underlying ontologies.

Precision: The ratio of the number of relevant records retrieved to the total number of irrelevant and relevant records retrieved.

Recall: The ratio of the number of relevant entities retrieved to the total number of relevant entities.

Semantic Web: Envisioned by Tim Berners-Lee, the semantic web is a universal medium for data, information, and knowledge exchange. It suggests annotating web resources with machine-processable metadata.

Similarity Measure: A method used to calculate the degree of similarity between mapping sources.

Tokenisation: Tokenisation extracts the valid ontology entities from OWL descriptions.

Tree-Structured Graph: A graphical structure to represent a tree with nodes and a hierarchy of its edges.

ENDNOTE

4. The datasets are available from http://www.aifb.uni-karlsruhe.de/WBS/meh/mapping/.
Mathematical Modeling of Artificial Neural Networks
by biological neural networks (NNs) are changing in response to extrinsic stimuli is called self-organization. The flexible nature of the human brain, represented by self-organization, seems to be responsible for the learning function which is specific to living organisms. Essentially, learning is an adaptive self-organizing process. From the training-assistance point of view, there are supervised and unsupervised neural classifiers. Supervised classifiers seek to characterize predefined classes by defining measures that maximize in-class similarity and out-class dissimilarity. Supervision may be conducted either by direct comparison of the output with the desired target and estimating the error, or by specifying whether the output is correct or not (reinforcement learning). The measure of success in both cases is given by the ability to recover the original classes for similar but not identical input data. Unsupervised classifiers seek similarity measures without any predefined classes, performing cluster analysis or vector quantization. Neural classifiers organize themselves according to their initial state, the types and frequency of the presented patterns, and correlations in the input patterns, by setting up some criteria for classification (Fukushima, 1975) reflecting causal mechanisms. There is no general agreement on the measure of their success, since likelihood optimization always tends to favor single-instance classes.

Classification as performed by ANNs has essentially a dual interpretation, reflected by machine learning too. It could mean either the assignment of input patterns to predefined classes, or the construction of new classes from a previously undifferentiated instance set (Stutz & Cheesman, 1994). However, the assignment of instances to predefined classes can either produce the class that best represents the input pattern, as in classical decision theory, or the classifier can be used as a content-addressable or associative memory, where the class representative is desired and the input pattern is used to determine which exemplar to produce. While the first task assumes that inputs were corrupted by some processes, the second one deals with incomplete input patterns when retrieval of the full information is the goal. Most neural classifiers do not require simultaneous availability of all training data and frequently yield error rates comparable to Bayesian methods without needing prior information. An efficient memory might store and retrieve many patterns, so its dynamics must allow for as many states of activity that are stable against small perturbations as possible. Several approaches dealing with uncertainty such as fuzzy logic, probabilistic, hyperplane, kernel, and exemplar-based classifiers can be incorporated into ANN classifiers in applications where only few data are available (Ng & Lippmann, 1991).

The capacity of analog neural systems to operate in unpredictable environments depends on their ability to represent information in context. The context of a signal may be some complex collection of neural patterns, including those that constitute learning. The interplay of context and adaptation is a fundamental principle of the neural paradigm. As only variations and differences convey information, permanent change is a necessity for neural systems rather than a source of difficulty, as it is for digital systems.

MATHEMATICAL FRAMEWORK OF NEURONS AND ANNS MODELING

An approach to investigate neural systems in a general frame is the mean field theory (Cooper & Scofield, 1988) from statistical physics, suited for highly interconnected systems as cortical regions are. However, there is a big gap between the formal model level of description in associative memory levels and the complexity of neural dynamics in biological nets. Neural modeling needs no information concerning correlations of the input data; rather, nonlinear processing units and a sufficiently large number of variable parameters ensure the flexibility to adapt to any relationship between input and output data. Models can be altered externally, by adopting a different axiomatic structure, and internally, by revealing new inside structural or functional relationships. Ranking several neuromorphic models is ultimately carried out based on some measure of performance.

Neuron Modeling

Central problems in any artificial system designed to mimic NNs arise from (i) the biological features to be preserved, (ii) the connectivity matrix of the processing units, whose size increases with the square of their number, and (iii) the processing time, which has to be independent of the network size. Biologically realistic models of neurons might minimally include:

• Continuous-valued transfer functions (graded response), as many neurons respond to their input in a continuous way, though the nonlinear relationship between the input and the output of cells is a universal feature;
• Nonlinear summation of the inputs and significant logical processing performed along the dendritic tree;
• Sequences of pulses as output, rather than a simple output level. A single state variable y_j representing the firing rate, even if continuous, ignores much information (e.g., pulse phase) that might be encoded in pulse sequences. However, there is no relevant evidence that phase plays a significant role in most neuronal circuits;
• Asynchronous updating and variable delay of data processing, that is, the time unit elapsing per processing step, t → t + 1, is variable among neurons;
• Variability of synaptic strengths caused by the amount of transmitter substance released at a chemical synapse, which may vary unpredictably. This effect is partially modeled by stochastic generalization of the binary neural models' dynamics.

Most neuromimetic models are based on the McCulloch and Pitts (1943) neuron as a binary threshold unit:

y_j(t + 1) = Θ( Σ_{i=1}^{N} w_ij x_i(t) - θ_j )    (1)

where y_j represents the state of neuron j (either 1 or 0) in response to the input signals {x_i}_i, θ_j stands for a certain threshold characteristic of each neuron j, time t is considered discrete, with one time unit elapsing per processing step, and Θ is the unit step (Heaviside) function:

Θ(x) = 1 if x ≥ 0, 0 if x < 0    (2)

The weights w_ij, 1 ≤ i ≤ N, represent the strengths of the synapses connecting neuron i to neuron j, and may be positive (excitatory) or negative (inhibitory). The weighted sum
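The binary threshold unit of Eqs. (1) and (2) can be sketched as follows; the AND gate at the end is a classical illustration, not taken from the article:

```python
def heaviside(x):
    """Unit step of Eq. (2): 1 for x >= 0, else 0."""
    return 1 if x >= 0 else 0

def mcculloch_pitts(weights, theta, x):
    """Eq. (1): y_j(t+1) = step(sum_i w_ij * x_i(t) - theta_j)."""
    return heaviside(sum(w * xi for w, xi in zip(weights, x)) - theta)

# a two-input unit with both weights 1 and threshold 1.5 realises logical AND
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mcculloch_pitts([1.0, 1.0], 1.5, x))  # fires only for (1, 1)
```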
Let N formal neurons link, directly for one-layer networks and indirectly for multi-layered ones, an input space X of signals to an output space Y. The state space of the system is the product X × Y of the input-output pairs (x, y), which are generically called patterns or configurations in pattern recognition (PR), data analysis, and classification problems. When X = Y and the inputs of the patterns coincide with the outputs, (x, y), x = y, the system is called autoassociative; if the input and output patterns are different, (x, y), y ≠ x, then the system is heteroassociative. Among all possible input-output patterns, a subset K ⊂ X × Y is chosen as training set. Most often, the input and output

nection links a postsynaptic neuron j to conjuncts S ∈ P(N) of presynaptic neurons. Each conjunct S preprocesses (or gates) the afferent signals {x_i}_i produced by the presynaptic neurons through a function:

{x_i}_{i=1,2,...,N} → ϑ_S(x)    (4)

If conjuncts are reduced to individual neurons, S = {i}, then the role of control is played by the synaptic matrix:

W = (w_Sj), S ∈ P(N)

Figure 2. Conjuncts of neurons {2,3} and {4,5,6} gate the inputs to neuron 1.

y_j = f_j({w_Sj, ϑ_S(x)}_{S ∈ P(N)})    (6)

for synchronous neurons (discrete dynamical system), and:

y = Σ_{S ∈ P(N)} w_S Π_{i ∈ S} x_i    (8)

Boolean associative memories correspond to X = Y = {0,1}^N, K ⊂ {0,1}^N × {0,1}^N, and fuzzy associative memories to

f_j({w_Sj, ϑ_S(x)}_{S ∈ P(N)}) = g_j( Σ_{S ∈ P(N)} w_Sj ϑ_S(x) )    (11)
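The conjunct-gated sum of Eq. (8) can be sketched for Boolean inputs; the particular conjuncts and weights below are illustrative, echoing the conjuncts {2,3} and {4,5,6} of Figure 2:

```python
from math import prod

def higher_order_unit(conjunct_weights, x):
    """Eq. (8): y = sum over conjuncts S of w_S * prod_{i in S} x_i,
    each conjunct gating (here: multiplying) its presynaptic signals."""
    return sum(w * prod(x[i] for i in S)
               for S, w in conjunct_weights.items())

# presynaptic signals of neurons 2..6, keyed by neuron index
x = {2: 1, 3: 1, 4: 1, 5: 0, 6: 1}
weights = {(2, 3): 0.5, (4, 5, 6): 1.0}
y = higher_order_unit(weights, x)  # 0.5: second conjunct is gated off by x_5 = 0
```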
Though ANNs are able to perform a large variety of tasks, the problems handled practically could be loosely divided into four basic types (Zupan & Gasteiger, 1994): auto- or hetero-association, classification, mapping (Kohonen, 1982), and modeling.

In neural classifiers, the set of examples used for training should necessarily come from the same (possibly unknown) distribution as the set used for testing the networks, in order to provide reliable generalization in classifying unknown patterns. Valid results are only produced if the training examples are adequately selected and their number is comparable to the number of effective parameters in the net. Quite a few classes of learning algorithms have guaranteed convergence; however, they require substantial computational resources.

Generally, ANNs are used as parts of larger systems, employed as preprocessing or labeling/interpretation subsystems. In many cases, the flexibility and non-algorithmic processing of ANNs surpass their incon-

REFERENCES

Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. Addison-Wesley Publishing Company.

Jansson, P. A. (1991). Neural networks: An overview. Analytical Chemistry, 63(6), 357-362.

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59-69.

MacKay, D. J. K. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation, 4, 448-472.

Mao, J., & Jain, A. K. (1995). Artificial neural networks for feature extraction and multivariate data projection. IEEE Transactions on Neural Networks, 6(2), 296-317.

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
Neal, R. M. (1996). Bayesian learning for neural net- Complex Systems: Systems made of several in-
works. Springer-Verlag. terconnected simple parts which altogether exhibit a M
high degree of complexity from each emerges a higher
Ng, K. & Lippmann, R. (1991). Practical charac-
order behaviour.
teristics of neural network and conventional pattern
classifiers. In R. Lippmann, & D. Touretzky (Eds.), Emergence: Modalities in which complex systems
Advances in neural information processing systems like ANNs and patterns come out of a multiplicity of
(pp. 970-976). San Francisco, CA: Morgan Kauffman relatively simple interactions.
Publishers Inc.
Learning (Training) Rule: Iterative process of
Stutz, J. & Cheesman, P. (1994). Autoclass - A Bayesian updating the weights from cases (instances) repeatedly
approach to classification. In J. Skilling, & S. Sibisi presented as input. Learning (adaptation) is essential
(Eds.), Maximum entropy and Bayesian methods (pp. in PR where the training data set is limited and new
117-126). Cambridge: Kluwer Academic Publishers. environments are continuously encountered.
Sutton, R. S. & Barto, A. G. (1998). Introduction to Paradigm of ANNs: Set of (i) pre-processing units
reinforcement learning. Cambridge, MA: MIT Press. form and functions, (ii) network topology that describes
the number of layers, the number of nodes per layer,
Xu, L. & Yuille, A. L. (1995). Robust principal element
and the pattern of weighted interconnections among the
analysis by self-organizing rules based on statistical
nodes, and (iii) learning (training) rule that specifies
physics approach. IEEE Transactions on Neural Net-
the way weights should be adapted during use in order
works, 6(1), 131-143.
to improve network performance.
Welstead, S. T. (1994). Neural network and fuzzy logic
Relaxation: Process by which ANNs minimize
applications in C/C++. John Wiley & Sons.
an objective function using semi- or non-parametric
Zupan, J. & Gasteiger, J. (1994). Neural networks for methods to iteratively updating the weights.
chemists. An introduction. VCH Verlagsgesellschaft
Robustness: Property of ANNs to accomplish
mbH, Germany.
reliable its tasks when handling incomplete and/or cor-
rupted data. Moreover, the results should be consistent
even if some part of the network is damaged.
KEy TERmS Self-Organization Principle: A process in which
the internal organization of a system that continually
Artificial Neural Networks (ANNs): Highly paral- interacts with its environment increases in complexity
lel networks of interconnected simple computational without being driven by an outside source. Self-organiz-
elements (cells, nodes, neurons, units), which mimic ing systems typically exhibit emergent properties.
biological neural network.
Ricardo Colomo
Universidad Carlos III de Madrid, Spain
Marcos Ruano
Universidad Carlos III de Madrid, Spain
Ángel García
Universidad Carlos III de Madrid, Spain
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Microarray Information and Data Integration Using SAMIDI
Kabak, Laleci & Lausen, 2005). Ontologies allow organising terms used in heterogeneous data sources according to their semantic relationships, so that heterogeneous data fragments can be mapped into a consistent frame of reference (Buttler, Coleman, Critchlow, Fileto, Han, Pu, Rocco & Xiong, 2002).

Applying semantics allows capturing the meaning of data a single time. Without semantics, each data element has to be interpreted several times, from its design and implementation until its use, which makes errors more likely. Finally, semantics allows turning a large set of data sources into an integrated, coherent and unique body of information. The architecture of the information itself contains a record that keeps the meaning and location of each data asset, enabling the automation of overlap and redundancy analysis.

One of the most important contributions of ontology to the unification of biological data schemas is the MGED (Microarray Gene Expression Data) ontology (Whetzel, Parkinson, Causton, Fan, Fostel, Fragoso, Game, Heiskanen, Morrison, & Rocca-Serra, 2006), which was prompted by the heterogeneity among MIAME and MAGE formats. MGED is a conceptual model for micro-array experiments that establishes concepts, definitions, terms and resources for the standardized description of a micro-array experiment in support of MAGE. MGED has been accepted as a unifying terminology by the micro-array research community, which makes it a perfect candidate for becoming a universal understanding model.

THE SAMIDI APPROACH

SAMIDI is both a methodology that allows the conversion of a set of micro-array data sources into a unified representation of the information they include, and a software architecture. A general overview of SAMIDI is depicted in Figure 1.

Figure 1. SAMIDI

The Unifying Information Model (UIM)

It brings together all the physical data schemas related to the data sources to be integrated. It doesn't replicate any of the data models, but is built to represent an agreed-upon scientific view and vocabulary which will be the foundation for understanding the data. The UIM must capture the major concepts present in each schema so that, after applying a semantic mapping, the physical schemas of the various data sources can be related to the Model itself. Thus, the semantic mapping encapsulates the meaning of the data, taking into account the agreed-upon terminology. The UIM also provides the basis for the creation of new data assets, assuring they are consistent with the underlying schemas, and serves as a reliable reference for understanding the interrelationship between seemingly unrelated sources and for automatically planning the translation of data elements between them.
The Semantic Information Management Methodology (SIMM)

Its aim is to fill the gap between the fragmented information scenario represented by the set of micro-array data sources to be integrated and the required semantic data integration. The methodology is arranged in such a way that each of the phases generates substantial added value itself, while concurrently progressing towards full semantic integration. Next, the different stages of the methodology (Figure 2) are detailed:

1. Capture Requirements: The project scope is established by surveying the relevant data sources to be integrated and determining the information requirements of the intended information model.
2. Collect Metadata: Data assets are classified and catalogued while the relevant (according to the organization's use of data) metadata are collected.
3. Build UIM: The structure of the UIM is determined by representing the desired business world-view, a comprehensive and consistent vocabulary and a set of business rules.
4. Rationalize Data Semantics: The meaning of the represented data is mapped into the UIM.
5. Deploy: The UIM, the metadata and the semantics are shared among the intended stakeholders and customized to fulfill their needs.
6. Exploit: Business processes are created, ensuring that the architecture has achieved the data management, data integration and data quality objectives.

Figure 2. SIMM

The effective application of the SIMM requires a supporting information system. Key components of the supporting system should include:

• A repository for storing the collected metadata on data assets, schemas and models.
• A set of semantic tools for integrated ontology modeling, for the creation of the UIM and the semantic mapping of the data schemas to the model.
• The standard business terminology stemming from the UIM, which should be used across the supporting system.
• Data management capabilities, including authoring and editing the information model; discovering data assets for any particular business concept; creating qualitative and quantitative reports about data assets; testing and simulating the performance of the UIM; and impact analysis in support of change.
• Full support for data integration, automatically generating code for queries and translation scripts between mapped schemas, harnessing the common understanding acquired from the data semantics.
• Data quality support, including the identification and decommissioning of redundant data assets, comparison mechanisms for ensuring consistency among semantically dissimilar data, and validation/cleansing of individual sources against the central repository of rules.
• Bi-directional communication with other systems for exchanging metadata and models by means of adaptors and
standards such as the XML Metadata Interchange standard. Similarly, the system should be able to collect metadata and other data assets from relational databases or other kinds of repositories.
• Active data integration capability: the system should have a run-time interface for the automatic generation and exporting of queries, translation scripts, schemas and cleansing scripts.
• A rich thick-client user interface for power users in the data management group.
• A transversal platform supporting shared functionalities such as version control, collaboration tools, access control and configuration for all metadata and active content in the system.

The SAMIDI Software Architecture

A detailed description of the components comprised in the SAMIDI software architecture (see Figure 3) is presented now:

• SearchBot: Software agent whose mission is to perform a methodical, systematic and automatic browsing of the integrated information sources.
• Semantic Engine: Integrated set of tools for the semantic mapping of the data schemas to the UIM using the MGED ontology. It will provide a semi-automatic mechanism for the mapping of schemas and concepts or categories in the UIM, so as to lessen the workload of a process that will require human intervention. Fully automatic mapping has been discarded, since it is regarded as not recommendable due to semantic incompatibilities and ambiguities among the different schemas and data formats. The objective of the Semantic Engine is to bridge the gap between cost-efficient machine-learning mapping techniques and pure human interaction.
• YARS: The YARS (Yet Another RDF Store) (Harth & Decker, 2005) system is a semantic data store that allows semantic querying and offers a higher abstraction layer to enable fast storage and retrieval of large amounts of metadata descriptions, while keeping a small footprint and a lightweight architecture approach.
• GUI: Enables user-system interaction. It collects requests following certain search criteria, transfers the appropriate commands to, and displays the results provided as a response from, the RunTime Manager.
• Query Factory: Builds queries into the YARS storage system using a query language. The semantics of the query are defined by an interpretation of the most suitable results of the query instead of
strictly rendering a formal syntax. YARS stores RDF triples, and the Query Factory, for pragmatic reasons, implements the SPARQL Query Language for RDF (Prud'hommeaux & Seaborne, 2004).
• RunTime Manager: This component coordinates the interactions among the rest of the components. First of all, it communicates with the Semantic Engine to verify that the information collected by the SearchBot is adequately mapped on the MGED ontology as a UIM and stored into YARS using RDF syntax. Secondly, it accepts the user's search requests through the GUI and hands them over to the Query Factory, which in turn queries YARS to retrieve all the metadata descriptions related to the particular search criteria. By retrieving a huge amount of metadata information from all the integrated data sources, the user benefits from a knowledge-aware search response which is mapped to the underlying terminology and unified criteria of the UIM, with the added advantage that all resources can be tracked and identified separately.

FUTURE TRENDS

We believe that the SW and SW Services (SWS) paradigm promises a new level of data and process integration that can be leveraged to develop novel high-performance data and process management systems for biological applications.

Using semantic technologies as a key technology for the interoperation of various datasets enables data integration of the vast amount of biological and biomedical data. In a nutshell, the use of knowledge-oriented biomedical data integration would lead to achieving Intelligent Biomedical Data Integration, which will bring biomedical research to its full potential.

A future trend of the SAMIDI effort is to integrate it in a SWS scenario in order to achieve seamless integration, also from the service or process integration perspective. This would enable access to a number of heterogeneous data resources which are accessed via a Web Service interface, and it would open the scope and goals of SAMIDI to a broader base. Promising integration efforts in that direction have already been undertaken by the Biomedical Information Integration and Discovery with SWS (BIRD) platform (Gómez, Rico, García-Sánchez, Liu & Terra, 2007), which fosters the intelligent interaction between natural-language user intentions and the existing SWS execution environments. BIRD is a platform designed to interact with humans as a gateway or man-in-the-middle towards SWS execution environments. The main goal of the system is to help users express their needs in terms of information retrieval and achieve information integration by means of SWS. BIRD allows users to state their needs via natural language or using a list of terms extracted from the Gene Ontology, infer the goals derived from the users' wishes and send them to the suitable SWS execution environment, which will retrieve the outcome resulting from the integration of the applications being accessed (e.g. all the biomedical publications and medical databases).

CONCLUSION

SAMIDI represents a tailor-made contribution aimed at tackling the problem of discovering, searching and integrating multiple micro-array data sources, harnessing the distinctive features of semantic technologies. The main contribution of SAMIDI is that it decomposes the otherwise unmanageable problem of integrating different and independent micro-array data sources.

SAMIDI is a first step to foster and extend the idea of using semantic technologies for the integration of different data sources, not only originating from micro-array research but also stemming from other biomedical research areas.

REFERENCES

Berners-Lee, T., Hendler, J. & Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), 34-43.

Buttler, D., Coleman, M., Critchlow, T., Fileto, R., Han, W., Pu, C., Rocco, D. & Xiong, L. (2002). Querying multiple bioinformatics information sources: can semantic web research help? ACM SIGMOD Record, 31(4), 59-64.

Chen, Y.P.P., Promparmote, S. & Maire, F. (2006). MDSM: Microarray database schema matching using the Hungarian method. Information Sciences, 176(19), 2771-2790.
Cohen, J. (2005). Computer science and bioinformatics. Communications of the ACM, 48(3), 74-78.

Della Valle, E., Cerizza, D., Bicer, V., Kabak, Y., Laleci, G.B. & Lausen, H. (2005). The need for Semantic Web Service in the eHealth. W3C Workshop on Frameworks for Semantics in Web Services.

Denn, S.O. & MacMullen, W.J. (2002). The ambiguous bioinformatics domain: conceptual map of information science applications for molecular biology. Proceedings of the 65th Annual Meeting of the American Society for Information Science & Technology, Philadelphia, 18-21.

Fensel, D. (2002). Ontologies: A silver bullet for knowledge management and electronic commerce. Springer-Verlag.

Gómez, J.M., Rico, M., García-Sánchez, F., Liu, Y. & Terra, M. (2007). Biomedical information integration and discovery with Semantic Web services. Proceedings of the 2nd International Work-Conference on the Interplay between Natural and Artificial Computation, 4527-4528.

Goodman, N. (2002). Biological data becomes computer literate: new advances in bioinformatics. Current Opinion in Biotechnology, 13(1), 68-71.

Harth, A. & Decker, S. (2005). Optimized index structures for querying RDF from the Web. Proceedings of the 3rd Latin American Web Congress.

Ignacimuthu, S. (2005). Basic bioinformatics. Alpha Science International.

Murphy, D. (2002). Gene expression studies using microarrays: principles, problems, and prospects. Advances in Physiology Education, 26(4), 256-270.

Prud'hommeaux, E. & Seaborne, A. (2004). SPARQL Query Language for RDF. World Wide Web Consortium.

Segall, R.S. & Zhang, Q. (2006). Data visualization and data mining of continuous numerical and discrete nominal-valued microarray databases for bioinformatics. Kybernetes, 35(10), 1538-1566.

Sinnott, R.O. (2007). From access and integration to mining of secure genomic data sets across the Grid. Future Generation Computer Systems, 23(3), 447-456.

Verschelde, J.L., Dos Santos, M.C., Deray, T., Smith, B. & Ceusters, W. (2004). Ontology-assisted database integration to support natural language processing and biomedical data-mining. Journal of Integrative Bioinformatics, 1(1).

Vyas, H. & Summers, R. (2005). Interoperability of bioinformatics resources. VINE: The Journal of Information and Knowledge Management Systems, 35(3), 132-139.

Whetzel, P.L., Parkinson, H., Causton, H.C., Fan, L., Fostel, J., Fragoso, G., Game, L., Heiskanen, M., Morrison, N. & Rocca-Serra, P. (2006). The MGED Ontology: a resource for semantics-based description of microarray experiments. Bioinformatics, 22(7), 866.

Xu, L., Maresh, G. A., Giardina, J. & Pincus, S. (2004). Comparison of different microarray data analysis programs and description of a database for microarray data management. DNA & Cell Biology, 23(10), 643-652.

KEY TERMS

Bioinformatics: Application of information science and technologies for the management of biological data, and the use of computers to store, compare, retrieve, analyze or predict the composition or structure of biomolecules.

DNA Micro-Array: Collection of microscopic DNA spots attached to a solid surface, forming an array for the purpose of expression profiling, which monitors expression levels for thousands of genes simultaneously.

MAGE: Standard micro-array data model and exchange format that is able to capture information specified by MIAME.

MGED: Ontology for micro-array experiments that establishes concepts, definitions, terms and resources for the standardized description of a micro-array experiment in support of MAGE.

MIAME: XML-based standard for the description of micro-array experiments, which stands for Minimum Information About a Micro-array Experiment.
Ontology: The specification of a conceptualization of a knowledge domain. It's a controlled vocabulary that describes objects and the relations among them in a formal way, and has a grammar for using the vocabulary terms to express something meaningful within a specified domain of interest.

Semantic Information Model Methodology: Set of activities, together with their inputs and outputs, aimed at the transformation of a collection of micro-array data sources into a semantically integrated and unified representation of the information stored in the data sources.

Unifying Information Model (UIM): Construction that brings together all the physical data schemas related to the data sources to be integrated. It's built to represent an agreed-upon scientific view and vocabulary which will be the foundation for understanding the data.
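The article's Query Factory issues SPARQL queries against the RDF triples held in YARS. As a rough, self-contained illustration of that triple-based storage model, the toy in-memory store below matches (subject, predicate, object) patterns; the data and predicate names are invented, and a real deployment would use an RDF library and SPARQL rather than this sketch:

```python
# Toy triple store: facts as (subject, predicate, object) tuples,
# queried by pattern matching, with None acting as a wildcard.
triples = {
    ("experiment42", "usesArray", "chipA"),
    ("experiment42", "describedBy", "MIAME"),
    ("chipA", "measuresGene", "TP53"),
    ("chipA", "measuresGene", "BRCA1"),
}

def query(s=None, p=None, o=None):
    """Return all triples matching the (s, p, o) pattern, sorted."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# All genes measured by the (invented) array "chipA".
genes = [o for _, _, o in query(s="chipA", p="measuresGene")]
```

Here `query(s="chipA", p="measuresGene")` plays the role that a SPARQL `SELECT` over YARS would play in the SAMIDI architecture.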
The development of autonomous mobile robots is continuously gaining importance, particularly in the military for surveillance as well as in industry for inspection and material handling tasks. Another emerging market with enormous potential is mobile robots for entertainment.

A fundamental requirement for autonomous mobile robots in most of their applications is the ability to navigate from a point of origin to a given goal. The mobile robot must be able to generate a collision-free path that connects the point of origin and the given goal. Some of the key algorithms for mobile robot navigation will be discussed in this article.

BACKGROUND

Many algorithms have been developed over the years for the autonomous navigation of mobile robots. These algorithms are generally classified into three different categories: global path planners, local navigation methods and hybrid methods, depending on the type of environment that the mobile robot operates within and the robot's knowledge of the environment.

In this article, some of the key algorithms for navigation of a mobile robot are reviewed, and their advantages and disadvantages are discussed. The algorithms that are reviewed include the navigation function, roadmaps, the vector field histogram, the artificial potential field, hybrid navigation and the integrated algorithm. Note that all the navigation algorithms discussed in this article assume that the robot is operating in a planar environment.

GLOBAL PATH PLANNING ALGORITHMS

Global path planning algorithms refer to a group of navigation algorithms that plan an optimal path from a point of origin to a given goal in a known environment. This group of algorithms requires the environment to be free from dynamic and unforeseen obstacles. In this section, two key global path planning algorithms, navigation functions and roadmaps, will be discussed.

Navigation Functions

The most widely used global path planning algorithm is perhaps the navigation function computed from the wave-front expansion algorithm (J.-C. Latombe, 1991; Howie Choset et al, 2005), due to its practicality, ease of implementation and robustness. The navigation function N is the Manhattan distance to the goal from the free space in the environment. The algorithm requires the information of the environment provided to the robot to be represented as an array of grid cells.

Figure 1. The shaded cells are the 1-Neighbors of the cell (x, y) and the number shows the priority of the neighboring cells
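A minimal sketch of this wave-front computation over such a grid of cells: every free cell receives a distance value N by breadth-first expansion from the goal, and a path is then extracted by steepest descent. The map, goal and plain 4-neighbourhood below are illustrative only (the article's Figure 1 uses a prioritized neighbourhood), and no safety margin around obstacles is applied:

```python
from collections import deque

FREE, OCC = 0, -1  # grid cell states; N values overwrite FREE cells

def wavefront(grid, goal):
    """Wave-front expansion: assign to every reachable free cell its
    grid distance N to the goal; occupied cells stay OCC."""
    rows, cols = len(grid), len(grid[0])
    N = [[OCC if c == OCC else None for c in row] for row in grid]
    gr, gc = goal
    N[gr][gc] = 0
    q = deque([goal])
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 1-Neighbors
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and N[nr][nc] is None:
                N[nr][nc] = N[r][c] + 1
                q.append((nr, nc))
    return N

def steepest_descent(N, start):
    """Follow the steepest descent of the N values from start to the goal."""
    r, c = start
    out = [(r, c)]
    while N[r][c] != 0:
        candidates = [
            (r + dr, c + dc)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < len(N) and 0 <= c + dc < len(N[0])
            and N[r + dr][c + dc] is not None and N[r + dr][c + dc] != OCC
        ]
        r, c = min(candidates, key=lambda rc: N[rc[0]][rc[1]])
        out.append((r, c))
    return out

# 3x4 map with one obstacle; goal at the top-right corner.
grid = [[0, 0, 0, 0],
        [0, -1, 0, 0],
        [0, 0, 0, 0]]
N = wavefront(grid, goal=(0, 3))
```

The breadth-first queue guarantees that each cell is labelled with its minimal grid distance, which is what makes the steepest-descent path free of local minima.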
Mobile Robots Navigation, Mapping, and Localization
The navigation function assigns a numeric value N to each cell, with the goal cell having the lowest value and the other unoccupied cells having progressively higher values, such that a steepest descent from any cell provides the path to the goal. The value of an unoccupied cell increases with the distance from the goal. Each grid cell is either free or occupied space, denoted by gC_free and gC_occupied. First, the value of N is set to 0 at the goal cell gC_goal. Next, the value of N is set to 1 for every 1-Neighbor (see Figure 1 for the definition of 1-Neighbors) of gC_goal which is in gC_free. It is assumed that the distance between two 1-Neighbors is normalized to 1. In general, the value of each gC_free cell is set to N+1 (e.g., 2) for every unprocessed gC_free 1-Neighbor of the grid cell with value N (e.g., 1). This is repeated until all the grid cells are processed.

Finally, a path to the goal is generated by following the steepest descent of the N values. To prevent the path from grazing the obstacles, the grid cells which are less than a safety distance from the obstacles are omitted in the computation of the navigation function. Figure 2 shows a path generated by the navigation function; the black cells are the obstacles and the grey cells are the unsafe regions.

Roadmaps

A roadmap is a network of one-dimensional curves that captures the connectivity of the free space in the environment (J.-C. Latombe, 1991; Danner et al, 2000; Foskey et al, 2001; Isto P., 2002; T. Simon et al, 2004; Xiaobing Zou et al, 2004; Howie Choset et al, 2005; Bhattacharya et al, 2007). Once a roadmap has been constructed, it is used as a set of standardized paths. Path planning is thus reduced to connecting the initial and goal positions to points in the roadmap. Various methods based on this general idea have been proposed. They include the visibility graph (Danner et al, 2000; Isto P., 2002; T. Simon et al, 2004), the Voronoi diagram (Foskey et al, 2001; Xiaobing Zou et al, 2004; Bhattacharya et al, 2007), the freeway net and the silhouette (J.-C. Latombe, 1991; Howie Choset et al, 2005).

The visibility graph is the simplest form of roadmap. This algorithm assumes that the environment is made up of only polygonal obstacles. The nodes of a visibility graph include the point of origin, the goal and all the vertices of the obstacles in the environment. The graph edges are straight line segments that connect any two nodes within line-of-sight of each other. Finally, the shortest path from the start to the goal can be obtained from the visibility graph.

Advantages and Disadvantages
LOCAL NAVIGATION METHODS

In contrast to the global path planners, local navigation methods do not require a known map of the environment to be provided to the robot. Instead, local navigation methods rely on current and local information from sensors to give a mobile robot online navigation capability. In this section, two of the key algorithms for local navigation, the artificial potential field and the vector field histogram, will be evaluated.

Artificial Potential Field

The artificial potential field method (O. Khatib, 1986), first introduced by Khatib, is perhaps the best known algorithm for local navigation of mobile robots due to its simplicity and effectiveness. The robot is represented as a particle in the configuration space q moving under the influence of an artificial potential produced by the goal configuration q_{goal} and the scalar distance to the obstacles. Typically the goal generates an attractive potential such as

$U_g(q) = \frac{1}{2} K_g \left(q - q_{goal}\right)^2$    (1)

and each obstacle i generates a repulsive potential

$U_{i,o} = \begin{cases} \frac{1}{2} K_o \left(\frac{1}{d_i} - \frac{1}{d_0}\right)^2 & \text{if } d_i < d_0 \\ 0 & \text{otherwise} \end{cases}$    (2)

which pushes the robot away from the obstacle. In cases where there is more than one obstacle, the total repulsive force is computed as the sum of all the repulsive forces produced by the obstacles. K_g and K_o are the respective gains of the attractive and repulsive potentials. d_i is the scalar distance between the robot and obstacle i. The repulsive potential only takes effect on the robot when it moves to a distance smaller than d_0. This implies that d_0 is the minimum safe distance from the obstacle that the robot tries to maintain.

The negated gradient of the potential field gives the artificial force acting on the robot:

$F(q) = -\nabla U(q)$    (3)

Figure 3 shows the attractive force

$F_g(q) = -K_g \left(q - q_{goal}\right)$    (4)
that is generated from the goal, and the repulsive force

$F_{i,o}(q) = \begin{cases} K_o \left(\frac{1}{d_i} - \frac{1}{d_0}\right) \frac{1}{d_i^2} & \text{if } d_i < d_0 \\ 0 & \text{otherwise} \end{cases}$    (5)

that is generated from an obstacle i. F_R is the resultant of all the repulsive forces and the attractive force. Note that n denotes the total number of obstacles which are less than a distance d_0 from the robot. At every position, the direction of this force is considered as the most promising direction of motion for the robot.

$F_R(q) = F_g(q) + \sum_{i=1}^{n} F_{i,o}(q)$    (6)

Vector Field Histogram

The vector field histogram method (Y. Koren et al, 1991; J. Borenstein, 1991; Zhang Huiliang et al, 2003) requires the environment to be represented as a tessellation of grid cells. Each grid cell holds a numerical value that ranges from 0 to 15. This value represents whether the environment represented by the grid cell is occupied or not: 0 indicates absolute certainty that the cell is not occupied and 15 indicates absolute certainty that the cell is occupied. A two-stage data reduction process is carried out recursively to compute the desired motion of the robot at every instance of time.

In the first stage, the values of all grid cells that are in the vicinity of the robot's momentary location are reduced to a one-dimensional polar histogram. Each bin of the polar histogram corresponds to a direction as seen from the current location of the robot, and it contains a value that represents the total sum of the grid cell values along that direction. The values from the polar histogram are also known as the polar obstacle density, and they represent the presence of obstacles in the respective directions.

In the second stage, the robot selects the bin with a low polar obstacle density and a direction closest to the goal. The robot moves in the direction represented by the chosen bin, because this direction is free from obstacles and it will bring the robot closer to the goal.

Advantages and Disadvantages

The advantage of the artificial potential field, the vector field histogram and other local navigation methods is that they do not include an initial processing step aimed at capturing the connectivity of the free space in a concise representation. Hence, prior knowledge of the environment is not needed. At any instant in time, the path is determined based on the immediate surroundings of the robot. This allows the robot to avoid any dynamic obstacles in its vicinity.

The major drawback of the local navigation methods is that they are basically steepest descent optimization methods. This renders the mobile robot susceptible to local minima (Y. Koren et al, 1991; J.-O. Kim et al, 1992; Liu Chengqing et al, 2000; Liu Chengqing, 2002; Min Gyu Park et al, 2004). A local minimum in the potential field method occurs when the attractive and the repulsive forces cancel each other out. The robot will be immobilized when it falls into a local minimum, and loses the capability to reach its goal. Many methods have been proposed to solve the local minima problem (J.-O. Kim et al, 1992; Liu Chengqing et al, 2000; Liu Chengqing, 2002; Min Gyu Park et al, 2004). For example, Liu Chengqing (Liu Chengqing et al, 2000; Liu Chengqing, 2002) has proposed the virtual obstacle method, where the robot detects the local minima and fills the area with artificial obstacles. Consequently, the method closes all concave obstacles and thus avoids local minima failures. Another method was proposed by Jin-Oh Kim (J.-O. Kim et al, 1992) to solve the local minima problem. This method uses local-minima-free harmonic functions based on fluid dynamics to build the artificial potentials for obstacle avoidance.

HYBRID METHODS

Another group of algorithms suggests a combination of the local navigation and global path planning methods. These algorithms aim to combine the advantages of both the local and global methods, and also to eliminate some of their weaknesses. In this section, two key hybrid-method algorithms, hybrid navigation and the integrated algorithm, will be reviewed.
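The attractive and repulsive force computations of Eqs. (4)-(6) can be sketched numerically for a point robot in the plane as follows; the gains, the influence distance and the single obstacle are arbitrary illustrative values, not from the article:

```python
import math

def attractive_force(q, q_goal, Kg=1.0):
    """Eq. (4): F_g(q) = -Kg (q - q_goal), pulling the robot to the goal."""
    return (-Kg * (q[0] - q_goal[0]), -Kg * (q[1] - q_goal[1]))

def repulsive_force(q, q_obs, Ko=1.0, d0=2.0):
    """Eq. (5): magnitude Ko (1/d_i - 1/d0) / d_i^2 for d_i < d0,
    directed away from obstacle i; zero outside the influence distance d0."""
    dx, dy = q[0] - q_obs[0], q[1] - q_obs[1]
    d = math.hypot(dx, dy)
    if d >= d0 or d == 0.0:
        return (0.0, 0.0)
    mag = Ko * (1.0 / d - 1.0 / d0) / (d * d)
    return (mag * dx / d, mag * dy / d)  # unit vector away from the obstacle

def resultant(q, q_goal, obstacles):
    """Eq. (6): F_R(q) = F_g(q) + sum of the repulsive forces."""
    fx, fy = attractive_force(q, q_goal)
    for q_obs in obstacles:
        rx, ry = repulsive_force(q, q_obs)
        fx, fy = fx + rx, fy + ry
    return (fx, fy)

# Robot at the origin, goal on the x-axis, one obstacle in between:
# the repulsive term reduces, but here does not cancel, the pull to the goal.
F = resultant((0.0, 0.0), q_goal=(5.0, 0.0), obstacles=[(1.0, 0.0)])
```

Placing the obstacle exactly between the robot and the goal with a stronger gain would make the two terms cancel, which is precisely the local-minimum failure discussed above.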
Hybrid Navigation

Figure 4 shows an illustration of the hybrid navigation algorithm (Lim Chee Wang, 2002; Lim Chee Wang et al, 2002). This algorithm combines the navigation function with the potential field method. It aims to eliminate local minima failures and at the same time perform online collision avoidance with dynamic obstacles.

The robot first computes the path joining its current position to the goal using the navigation function. The robot then places a circle with an empirical radius centered at its current position. The cell that corresponds to the intersection of the circle with the navigation function path is known as the attraction point. The attraction point is the cell with the lowest N value if there is more than one intersection.

The robot advances towards the attraction point using the potential field method, and the circle moves along with the robot, which causes the attraction point to change. As a result, the robot is always chasing after a dynamic attraction point which progresses towards the goal along the local-minima-free navigation function path. The radius of the circle is made larger to intersect the navigation function path in cases where no intersections are found. The radius of the circle is reduced to smaller than the distance between the robot and its goal when the distance between the robot and its goal becomes smaller than the radius of the circle. This is to make sure that the N value of the next intersection will be smaller than the current N value.

Integrated Algorithm

In recent years, the integrated algorithm (Lee Gim Hee, 2005; Lee Gim Hee et al, 2007) has been proposed to give a mobile robot the ability to plan local-minima-free paths and perform online collision avoidance of dynamic obstacles in an unknown environment. The algorithm modifies the frontier-based exploration method (Brian Yamauchi, 1997), which was originally used for map building, into a path planning algorithm in an unknown environment. The modified frontier-based exploration method is then combined with the hybrid navigation algorithm into a single framework.

Figure 5 shows an overview of the integrated algorithm. The robot first builds a local map (see Part II of this article for details on map building) of its surroundings. It then decides whether the goal is reachable based on the acquired local map. The goal is reachable if it is in free space, and is not reachable if it is in the unknown space. Note that an unknown region is a part of the map which has not been explored during the map building
process. The robot will advance towards the goal using the hybrid navigation algorithm if the goal is reachable, or advance towards a sub-goal and build another local map there if it is not. This map is added to the previous local maps to form a larger map of the environment. The process goes on until the robot finds the goal within free space.

The sub-goal is computed in three steps. First, compute the path that joins the robot's current position and the goal using the navigation function; the unknown cells are taken to be free space in this computation. Second, compute all the frontiers in the map. The boundary between free space and an unknown region is known as a frontier. A frontier is made up of a group of adjacent frontier cells, where a frontier cell is defined as any free cell of the map with at least two unknown cells as its immediate neighbors. The total number of frontier cells that make up a frontier must be larger than the size of the robot for that frontier to be valid. Third, the frontier that intersects the navigation function path is selected, and its centroid is chosen as the sub-goal.

Advantages and Disadvantages

The hybrid navigation algorithm has the advantage of eliminating local minima failures while at the same time doing online collision avoidance with dynamic obstacles. However, it requires the environment to be fully known for the search of a navigation function path to the goal; the algorithm will fail in a fully unknown environment. It also does not possess any capability to re-plan the navigation function path during an operation, so any major change to the environment could cause the algorithm to fail.

The integrated algorithm has the advantages of planning local-minima-free paths and doing online collision avoidance against dynamic obstacles in totally unknown environments. In addition, the algorithm gives the mobile robot a higher level of autonomy, since it does not depend on humans to provide a map of the environment. However, these advantages come with the trade-off of a tedious implementation, because the integrated algorithm requires both the hybrid navigation algorithm and a good mapping algorithm to be implemented at the same time.
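The frontier extraction in the second step of the sub-goal computation can be sketched in code. The grid labels (FREE, OCC, UNK) and the minimum-size threshold below are our own illustrative choices, not taken from the original implementation:

```python
from collections import deque

FREE, OCC, UNK = 0, 1, 2  # hypothetical cell labels for the grid map

def frontier_cells(grid):
    """A free cell is a frontier cell if at least two of its
    8-neighbours are unknown, as defined in the text."""
    rows, cols = len(grid), len(grid[0])
    cells = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            unk = sum(1 for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
                      and grid[r + dr][c + dc] == UNK)
            if unk >= 2:
                cells.add((r, c))
    return cells

def frontiers(grid, min_size):
    """Group adjacent frontier cells, keep groups larger than the
    robot footprint, and return each group's centroid (the sub-goal
    candidates)."""
    remaining = frontier_cells(grid)
    result = []
    while remaining:
        seed = remaining.pop()
        group, queue = [seed], deque([seed])
        while queue:  # flood-fill over 8-connected frontier cells
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    n = (r + dr, c + dc)
                    if n in remaining:
                        remaining.remove(n)
                        group.append(n)
                        queue.append(n)
        if len(group) > min_size:
            cr = sum(r for r, _ in group) / len(group)
            cc = sum(c for _, c in group) / len(group)
            result.append((cr, cc))
    return result
```

In the integrated algorithm, the centroid returned here would be accepted as the sub-goal only if its frontier also intersects the navigation function path.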
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
… p_t,i ∈ (0.5, 1] indicates the confidence level of a cell being occupied, where 1 indicates absolute certainty that the cell is occupied. p_t,i = 0.5 indicates that the cell is an unexplored area.

A robot does not have any knowledge of the world when it is first placed in an unknown environment. It is therefore intuitive to set p_t,i = 0.5 for all i at time t = 0. The map is updated via the log odds (S. Thrun et al, 2005) representation of occupancy. The advantage of the log odds representation is that it avoids numerical instabilities for probabilities near 0 or 1. The i-th grid cell that intercepts the sensor line of sight is updated according to

l_t,i = l_{t-1},i + l_sensor    (1)

where l_{t-1},i is the log odds computed from the occupancy value of the cell at t-1:

l_{t-1},i = log( p_{t-1},i / (1 - p_{t-1},i) )    (2)

l_sensor = l_occ if the cell corresponds to the sensor measurement, and l_sensor = l_free if the range to the cell is shorter than the sensor measurement. The other cells in the map remain unchanged.

Figure 1(a) illustrates the update process for the map. The cell that corresponds to the sensor measurement is shaded black and all the cells that intercept the sensor measurement beam are shaded white. Figure 1(b) shows a case where the sensor measurement equals the maximum sensor range and l_sensor = l_free for all cells that intercept the sensor beam, because it is assumed that no obstacle is detected if the sensor measurement equals the maximum sensor range. l_occ and l_free are computed from

l_occ = log( p_occ / (1 - p_occ) )  and  l_free = log( p_free / (1 - p_free) )    (3)

where p_occ and p_free denote the probabilities of the sensor measurement correctly deducing whether a grid cell is occupied or empty. The two probabilities must add up to 1, and their values depend on the accuracy of the sensor: p_occ and p_free will have values closer to 1 and 0, respectively, for an accurate sensor. The values of p_occ and p_free have to be determined experimentally and remain constant in the map building process.
Figure 1. Updating an occupancy grid map: (a) when an obstacle is detected; (b) when a maximum range measurement is detected, i.e. it is assumed that in this case no obstacle is detected
Figure 2. Occupancy grid map of the corridor along block EA level 3 in the Faculty of Engineering of the National University of Singapore (NUS)

The occupancy value of a grid cell is easily recovered from

p_t,i = 1 - 1 / (1 + exp( l_t,i ))    (4)
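Equations 1 to 4 translate directly into a per-cell update. In the sketch below, P_OCC and P_FREE are assumed sensor-accuracy constants that would in practice be determined experimentally, as the text notes:

```python
import math

P_OCC, P_FREE = 0.9, 0.2  # assumed sensor accuracy, tuned experimentally

L_OCC = math.log(P_OCC / (1 - P_OCC))    # Equation 3
L_FREE = math.log(P_FREE / (1 - P_FREE))

def log_odds(p):
    """Equation 2: log odds of an occupancy probability."""
    return math.log(p / (1 - p))

def update_cell(p_prev, hit):
    """Equation 1: add l_occ for the cell at the measured range, or
    l_free for a cell the beam passed through, then recover the
    occupancy probability with Equation 4."""
    l = log_odds(p_prev) + (L_OCC if hit else L_FREE)
    return 1 - 1 / (1 + math.exp(l))     # Equation 4
```

Starting from an unexplored cell (p = 0.5, i.e. log odds 0), one hit raises the occupancy to P_OCC and repeated hits drive it towards 1 without the numerical instability of multiplying raw probabilities.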
Topological Maps

Figure 3. Example of a topological map. The features are represented as nodes m_i (m1 to m7 in the figure). The connectivity and distance between features are represented as edges u_jk (u12, u23, u25, u34, u56 and u57 in the figure)
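As an illustration, the graph of Figure 3 can be stored as a small adjacency structure and searched directly. The sketch below keeps only the connectivity read off the figure; the figure's distances are not reproduced here:

```python
from collections import deque

# Nodes and edges as read from Figure 3 (connectivity only).
edges = [("m1", "m2"), ("m2", "m3"), ("m2", "m5"),
         ("m3", "m4"), ("m5", "m6"), ("m5", "m7")]

adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

def route(start, goal):
    """Breadth-first search over the feature graph: planning on a
    topological map visits a handful of nodes instead of the many
    thousands of cells of an occupancy grid."""
    parent, queue = {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:  # walk the parent links back to the start
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for n in adj.get(node, []):
            if n not in parent:
                parent[n] = node
                queue.append(n)
    return None  # goal not connected to start
```

For example, route("m1", "m7") passes through m2 and m5, mirroring the edge chain u12, u25, u57 in the figure. This compactness is exactly the memory advantage, and the loss of metric detail the accuracy disadvantage, discussed next.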
between adjacent features is represented as edges u_jk. In many topological maps, distances between adjacent features are also represented by the edges connecting the nodes. The success of topological maps depends greatly on the efficiency of feature extraction. Examples of feature extraction algorithms can be found in (Martin David Adams, 1999; Sen Zhang et al, 2003; João Xavier et al, 2005).

Topological maps are the better choice for mapping if memory space is a major concern, because less memory is required to store the nodes as compared to the large number of grid cells in occupancy grid maps. The lower memory consumption of the topological map, however, comes with the trade-off of lower accuracy, because some important information, such as the precise location of the free spaces in the environment, may not be represented in the map. The limited accuracy of topological maps thus restricts the robot's capability for fast and safe navigation.

LOCALIZATION

Most mobile robots localize their pose x_t with respect to a given map based on odometry readings. Unfortunately, wheel slippage and drift cause incremental localization errors (J. Borenstein et al, 1995; J. Borenstein et al, 1996). These errors cause the mobile robot to lose track of its own pose and hence the ability to navigate autonomously from one given point in the map to another. The solution to the localization problem is to make use of information about the environment from additional sensors. Examples of sensors used are the laser range finder and sonar sensors, which measure the distance between the robot and the nearest obstacles in the environment. The extended Kalman filter (EKF) and the particle filter are two localization algorithms that use odometry and additional sensory data of the environment to localize a mobile robot. Both are probabilistic methods that allow uncertainties in the robot pose estimate and the sensor readings to be accounted for in a principled way.

Localization with Extended Kalman Filter

The EKF (John J. Leonard et al, 1991; A. Kelly, 1994; G. Welch et al, 1995; Martin David Adams, 1999; S. Thrun et al, 2005) is perhaps the most established algorithm for localization of mobile robots because of its robustness and efficiency. The EKF is a recursive algorithm for estimating the pose of the robot with noisy sensor readings. A key feature of the EKF is that it maintains a posterior belief bel(x_t) of the pose estimate, which follows a Gaussian distribution represented by a mean x_t and covariance P_t. The mean x_t represents the most likely pose of the robot at time t, and the covariance P_t represents the error covariance of this estimate. The EKF consists of two steps: the prediction and update steps. In the prediction step, the predicted belief bel(x̄_t) is first computed using a motion model which describes the state dynamics of the robot. bel(x̄_t) is subsequently transformed into bel(x_t) by incorporating the sensor measurements in the update step.

As mentioned above, the predicted belief bel(x̄_t), which is represented by the predicted mean x̄_t and covariance P̄_t, is computed from the prediction step given by

x̄_t = f( x_{t-1}, u_t )    (5)

P̄_t = F_t P_{t-1} F_t^T + Q_t    (6)

where f(.) is the motion model of the mobile robot, F_t is the Jacobian of f(.) evaluated at x_{t-1}, Q_t is the covariance of the motion model and u_t is the control data of the robot.

bel(x̄_t) is subsequently transformed into bel(x_t) by incorporating the sensor measurement z_t in the update step of the EKF, shown in Equations 7, 8 and 9:

K_t = P̄_t H_t^T ( H_t P̄_t H_t^T + R_t )^{-1}    (7)

x_t = x̄_t + K_t ( z_t - h( x̄_t, m ) )    (8)

P_t = ( I - K_t H_t ) P̄_t    (9)

K_t, computed in Equation 7, is called the Kalman gain. It specifies the degree to which z_t should be incorporated into the new pose estimate. Equation 8 computes x_t by adjusting it in proportion to K_t and the deviation of z_t from the predicted measurement h( x̄_t, m ). It is important to note that the sensor measurement z_t = [ z_t^1 z_t^2 ... ]^T refers to the coordinates of a set of observed landmarks instead of the raw sensor readings, and that the sensor measurement model h(.) gives the predicted measurement from the given topological map m and x̄_t. H_t is the Jacobian of h(.) evaluated at x̄_t. Finally, the covariance …
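The prediction-update cycle of Equations 5 to 9 can be sketched for the simplest possible case, a robot on a line, where state, covariance and gain are all scalars. The motion model, measurement model and noise values below are illustrative assumptions, not taken from the article:

```python
def ekf_step(x, P, u, z, landmark, Q=0.1, R=0.05):
    """One EKF cycle for a robot on a line.
    Assumed motion model f(x, u) = x + u and measurement model
    h(x, m) = landmark - x (range to one known landmark), so the
    Jacobians are F = 1 and H = -1."""
    # Prediction step (Equations 5 and 6)
    x_bar = x + u                 # predicted mean
    P_bar = 1.0 * P * 1.0 + Q     # F * P * F^T + Q
    # Update step (Equations 7, 8 and 9)
    H = -1.0
    K = P_bar * H / (H * P_bar * H + R)           # Equation 7
    x_new = x_bar + K * (z - (landmark - x_bar))  # Equation 8
    P_new = (1 - K * H) * P_bar                   # Equation 9
    return x_new, P_new
```

When the measurement z agrees with the prediction, the mean is unchanged and only the covariance shrinks; when the odometry-based prediction is biased, the innovation term in Equation 8 pulls the estimate back towards the pose implied by the landmark range.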
Table 1. Pseudo algorithm for mobile robot localization with particle filter

1. X_t = X̄_t = ∅;
2. for m = 1 to M do
3.   generate a random sample x_t^[m] from p( x_t | u_t, x_{t-1}^[m] );
4.   w_t^[m] = p( z_t | x_t^[m], m );
5.   χ_t^[m] = [ x_t^[m]  w_t^[m] ]^T; add χ_t^[m] to X̄_t;
6. end;
7. for m = 1 to M do
8.   draw χ_t^[m] from X̄_t with probability proportional to w_t^[1], w_t^[2], ..., w_t^[M]; add it to X_t;
9. end;
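Table 1 maps directly onto code. The sketch below instantiates it for a one-dimensional robot with Gaussian motion and measurement noise; the models and noise levels are our assumptions, not from the text:

```python
import math
import random

def gauss_pdf(x, mu, sigma):
    """Density of a Gaussian; used as the measurement model."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def particle_filter(particles, u, z, landmark, motion_sigma=0.1, sense_sigma=0.5):
    """One iteration of Table 1: sample from the motion model (line 3),
    weight by the measurement model (line 4), and resample in
    proportion to the importance factors (lines 7 to 9)."""
    temp, weights = [], []
    for x_prev in particles:                              # lines 2-6
        x = x_prev + u + random.gauss(0.0, motion_sigma)  # line 3
        w = gauss_pdf(z, landmark - x, sense_sigma)       # line 4
        temp.append(x)                                    # line 5
        weights.append(w)
    # lines 7-9: draw M particles with replacement,
    # with probability proportional to the weights
    return random.choices(temp, weights=weights, k=len(particles))
```

Repeated calls with range measurements to a known landmark concentrate an initially dispersed particle set around the true pose, which is exactly the behaviour illustrated in Figure 4.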
… model. The importance factor accounts for the mismatch between bel(x̄_t) and bel(x_t). Finally, the resampling process in lines 7 to 9 draws, with replacement, M particles from the temporary set X̄_t with a probability proportional to the importance factors. The distribution bel(x̄_t) is transformed into bel(x_t) by incorporating the importance factors in the resampling process.

Figure 4(a) to (d) shows an implementation result of a robot localizing itself in a corridor. The particle set is initialized to the initially known pose of the robot, shown in Figure 4(a). The particles are initialized uniformly within a circle with a radius of 100 mm, with the initial position of the robot taken as the center. The orientation of the particles is also initialized uniformly within 5° to …

Figure 4. Implementation of the particle filter to solve the localization problem. Notice that the error from the odometry grows as the robot travels a greater distance
CONCLUSION

A mobile robot has to possess three competencies to achieve full autonomy: navigation, map building and localization. Over the years, many algorithms have been proposed and implemented with notable success to give mobile robots all three competencies. Some of the key algorithms, such as the navigation function, roadmaps, artificial potential field, vector field histogram, hybrid navigation and the integrated algorithm for navigation; occupancy grid and topological mapping; as well as the Kalman filter and particle filter for localization, are reviewed in Parts I and II of this article.

REFERENCES

D. Kortenkamp, & T. Weymouth. (1994). Topological mapping for mobile robots using a combination of sonar and vision sensing. Proceedings of the Twelfth National Conference on Artificial Intelligence. pp. 979-984. AAAI Press/MIT Press.

G. Welch, & G. Bishop. (1995). An introduction to the Kalman filter. Paper TR 95-041. Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3175.

H. Choset. (1996). Sensor based motion planning: The hierarchical generalized voronoi graph. PhD thesis, California Institute of Technology.
H. Choset, & J.W. Burdick. (1996). Sensor based planning: The hierarchical generalized voronoi graph. Proceedings of the Workshop on Algorithmic Foundations of Robotics.

H.P. Moravec. (1988). Sensor fusion in certainty grids for mobile robots. AI Magazine. pp. 61-74.

H.P. Moravec, & D.W. Cho. (1989). A Bayesian method for certainty grids. AAAI 1989 Spring Symposium on Mobile Robots.

Ioannis M. Rekleitis. (2004). A particle filter tutorial for mobile robot localization. Technical Report TR-CIM-04-02, Centre for Intelligent Machines, McGill University, 3480 University St., Montreal, Quebec, CANADA H3A 2A7.

J. Borenstein, & L. Feng. (1995). UMBmark: A benchmark test for measuring odometry errors in mobile robots. Proceedings of the 1995 SPIE Conference on Mobile Robots. pp. 569-574.

J. Borenstein, & L. Feng. (1996). Measurement and correction of systematic odometry errors in mobile robots. IEEE Transactions on Robotics and Automation. Vol. 12, pp. 869-880.

João Xavier, Daniel Castro, Marco Pacheco, António Ruano, & Urbano Nunes. (2005). Fast line, arc/circle and leg detection from laser scan data in a player driver. Proceedings of the 2005 IEEE International Conference on Robotics and Automation. pp. 95-106.

John J. Leonard, & Hugh F. Durrant-Whyte. (1991). Mobile robot localization by tracking geometric beacons. IEEE Transactions on Robotics and Automation. Vol. 7, pp. 376-382.

R. Chatila, & J.-P. Laumond. (1985). Position referencing and consistent world modeling for mobile robots. Proceedings of the 1985 IEEE International Conference on Robotics and Automation.

R. Smith, M. Self, & P. Cheeseman. (1990). Estimating uncertain spatial relationships in robotics. Autonomous Robot Vehicles. Springer-Verlag. pp. 167-193.

Martin David Adams. (1999). Sensor Modelling, Design and Data Processing for Autonomous Navigation. World Scientific. Vol. 13.

Sen Zhang, Lihua Xie, Martin Adams, & Fan Tang. (2003). Geometrical feature extraction using 2D range scanner. The Fourth International Conference on Control and Automation.

S. Thrun, D. Fox, W. Burgard, & F. Dellaert. (2001). Robust Monte Carlo localization for mobile robots. Artificial Intelligence Journal (AIJ).

S. Thrun, Wolfram Burgard, & Dieter Fox. (2005). Probabilistic Robotics. Cambridge, Massachusetts; London, England. The MIT Press.

KEY TERMS

Curse of Dimensionality: This term was first used by Richard Bellman. It refers to the problem of the exponential increase in volume associated with adding extra dimensions to a mathematical space.

Gaussian Distribution: Also known as the normal distribution. It is a family of continuous probability distributions where each member of the family is described by two parameters: mean and variance. This form of distribution is used by the localization with extended Kalman filter algorithm to describe the posterior belief distribution of the robot pose.

Jacobians: The Jacobian is the matrix of first-order partial derivatives of a function. Its importance lies in the fact that it represents the best linear approximation to a differentiable function near a given point.

Odometry: A method of position estimation for a wheeled vehicle during navigation, based on counting the number of revolutions taken by the wheels that are in contact with the ground.

Posterior Belief: The probability distribution of the robot pose estimate conditioned upon information such as control and sensor measurement data. The extended Kalman filter and particle filter are two different methods for computing the posterior belief.

Predicted Belief: Also known as the prior belief. It refers to the probability distribution of the robot pose estimate inferred from the known control data, in the absence of the sensor measurement data.

Recursive Algorithm: A type of computer function that is applied within its own definition. The extended Kalman filter and particle filter are recursive algorithms because the outputs from the filters at the current time step are used as inputs in the next time step.
Natalia Garanina
Russian Academy of Science, Institute of Informatics Systems, Russia
Recent years have seen a surge of interest in applications of modal logics for the specification and validation of complex systems. This holds in particular for combined logics of knowledge, time and actions for reasoning about multiagent systems (Dixon, Nalon & Fisher, 2004; Fagin, Halpern, Moses & Vardi, 1995; Halpern & Vardi, 1986; Halpern, van der Meyden & Vardi, 2004; van der Hoek & Wooldridge, 2002; Lomuscio & Penczek, 2003; van der Meyden & Shilov, 1999; Shilov, Garanina & Choe, 2006; Wooldridge, 2002). In the next paragraph we explain what logics of knowledge, time and actions are from the viewpoint of mathematicians and philosophers. This provides a historic perspective and a scientific context for these logics.

For mathematicians and philosophers, logics of actions, time, and knowledge can be introduced in a few sentences. A logic of actions (e.g., Elementary Propositional Dynamic Logic (Harel, Kozen & Tiuryn, 2000)) is a polymodal variant of the basic modal logic K (Bull & Segerberg, 2001), interpreted over arbitrary Kripke models. A logic of time (e.g., Linear Temporal Logic (Emerson, 1990)) is a modal logic with a number of modalities that correspond to "next time", "always", "sometimes", and "until", interpreted in Kripke models over partial orders (discrete linear orders for LTL in particular). Finally, a logic of knowledge, or epistemic logic (e.g., Propositional Logic of Knowledge (Fagin, Halpern, Moses & Vardi, 1995; Rescher, 2005)), is a polymodal variant of another basic modal logic, S5 (Bull & Segerberg, 2001), interpreted over Kripke models where all binary relations are equivalences.

All modal logics are languages that are characterized by syntax and semantics. Let us define in this way a very simple modal logic, called Elementary Propositional Dynamic Logic (EPDL).

Let true and false be Boolean constants, and let Prp and Rel be disjoint sets of propositional and relational variables respectively. The syntax of classical propositional logic consists of formulas which are constructed from propositional variables and the Boolean connectives ¬ (negation), & (conjunction), ∨ (disjunction), → (implication), and ↔ (equivalence) in accordance with the standard rules. EPDL has additional formula constructors, modalities, which are associated with relational variables: if r is a relational variable and φ is a formula of EPDL, then

- ([r]φ) is a formula which is read as "box r-φ" or "after r always φ";
- (⟨r⟩φ) is a formula which is read as "diamond r-φ" or "after r sometimes φ".

The semantics of EPDL is defined in models, which are called labeled transition systems by computer scientists and Kripke models¹ by mathematicians and philosophers. A model M is a pair (D, I), where the domain (or universe) D is a set, while the interpretation I is a pair of mappings (P, R). Elements of the domain D are called states by computer scientists and worlds by mathematicians and philosophers. The interpretation maps propositional variables to sets of states, P: Prp → 2^D, and relational variables to binary relations on states, R: Rel → 2^(D×D). We write I(p) and
Modal Logics for Reasoning about Multiagent Systems
I(r) instead of P(p) and R(r) whenever it is implicit that p and r are propositional and relational variables respectively.

Every model M = (D, I) can be viewed as a directed graph with nodes and edges labeled by propositional and relational variables respectively. Its nodes are the states of D. A node s ∈ D is marked by a propositional variable p ∈ Prp iff s ∈ I(p). A pair of nodes (s1, s2) ∈ D×D is an edge of the graph iff (s1, s2) ∈ I(r) for some relational variable r ∈ Rel; in this case the edge (s1, s2) is marked by this relational variable r. Conversely, a graph with nodes and edges labeled by propositional and relational variables respectively can be considered as a model.

For every model M = (D, I) the entailment (validity, satisfiability) relation ⊨M between states and formulas can be defined by induction on formula structure:

- for every state s: s ⊨M true and not s ⊨M false;
- for any state s and propositional variable p: s ⊨M p iff s ∈ I(p);
- for any state s and formula φ: s ⊨M (¬φ) iff it is not the case that s ⊨M φ;
- for any state s and formulas φ and ψ: s ⊨M (φ & ψ) iff s ⊨M φ and s ⊨M ψ; s ⊨M (φ ∨ ψ) iff s ⊨M φ or s ⊨M ψ;
- for any state s, relational variable r, and formula φ: s ⊨M ([r]φ) iff s′ ⊨M φ for every state s′ with (s, s′) ∈ I(r); s ⊨M (⟨r⟩φ) iff s′ ⊨M φ for some state s′ with (s, s′) ∈ I(r).

Semantics of this kind is called possible worlds semantics.

Let us explain the pragmatics of EPDL with the following puzzle example.

Alice and Bob play the Number Game. Positions in the game are integers in [1..109]. The initial position is a random number. Alice and Bob make alternating moves: Alice, Bob, Alice, Bob, etc. The available moves are the same for both: if the current position is n ∈ [1..99], then (n+1) and (n+10) are the possible next positions. A player wins the game iff the opponent is the first to enter [100..109]. Problem: Find all initial positions from which Alice has a winning strategy.

A Kripke model for the game is quite obvious:

- States correspond to game positions, i.e. integers in [1..109].
- The propositional variable fail is interpreted by [100..109].
- The relational variable move is interpreted by the possible moves.

The formula ¬fail & ⟨move⟩(¬fail & [move]fail) is valid in those states where the game is not lost, there exists a move after which the game is not lost, and then all possible moves always lead to a loss in the game. Hence this EPDL formula is valid in those states where Alice has a 1-round winning strategy against Bob.

COMBINING KNOWLEDGE, ACTIONS AND TIME

Logic of Knowledge

Logics of knowledge are also known as epistemic logics. One of the simplest epistemic logics is the Propositional Logic of Knowledge for n>0 agents (PLKn) (Fagin, Halpern, Moses & Vardi, 1995). A special terminology, notation and Kripke models are used in this framework. The set of relational symbols Rel in PLKn consists of the natural numbers [1..n] representing the names of agents. The notation for modalities is: if i ∈ [1..n] and φ is a formula, then (Ki φ) and (Si φ) are used instead of ([i] φ) and (⟨i⟩ φ). These formulas are read as "(an agent) i knows φ" and "(an agent) i can suppose φ". For every agent i ∈ [1..n] in every model M = (D, I), the interpretation I(i) is an indistinguishability relation, i.e. an equivalence relation² between states that the agent i cannot distinguish. Every model M where all agents are interpreted in this way is denoted as (D, ~1, … ~n, I), with explicit I(1) = ~1, … I(n) = ~n, instead of the brief standard notation (D, I). An agent knows a fact φ in a state s of a model M if the fact is valid in every state s′ of this model that the agent cannot distinguish from s:

s ⊨M (Ki φ) iff s′ ⊨M φ for every state s′ ~i s.

Similarly, an agent can suppose a fact φ in a state s of a model M if the fact is valid in some state s′ of this model that the agent cannot distinguish from s:
s ⊨M (Si φ) iff s′ ⊨M φ for some state s′ ~i s.

The above possible worlds semantics of knowledge is due to pioneering research (Hintikka, 1962).

Temporal Logic with Actions

Another propositional polymodal logic is Computational Tree Logic with actions (Act-CTL). Act-CTL is a variant of the basic propositional branching-time temporal logic, Computational Tree Logic (CTL) (Emerson, 1990; Clarke, Grumberg & Peled, 1999). In Act-CTL the set of relational symbols consists of action symbols Act. Each action symbol can be interpreted by an instant action that is executable in one indivisible moment of time.

The Act-CTL notation for the basic modalities is: if b ∈ Act and φ is a formula, then (AbX φ) and (EbX φ) are used instead of ([b] φ) and (⟨b⟩ φ). But the syntax of Act-CTL also has some other special constructs associated with action symbols: if b ∈ Act and φ and ψ are formulas, then (AbG φ), (AbF φ), (EbG φ), (EbF φ), Ab(φ U ψ) and Eb(φ U ψ) are also formulas of Act-CTL. In formulas of Act-CTL the prefix "A" is read as "for every future", "E" as "for some future", the suffix "X" as "next state", "G" as "always" or "globally", "F" as "sometimes" or "in the future", the infix "U" as "until", and the sub-index b as "in b-run(s)".

We have already explained the semantics of (AbX φ) and (EbX φ) by reference to ([b] φ) and (⟨b⟩ φ). The constructs AbG, AbF, EbG, and EbF can be expressed in terms of Ab(… U …) and Eb(… U …), for example: (EbF φ) ≡ Eb(true U φ). Thus let us define below the semantics of Ab(… U …) and Eb(… U …) only. Let M = (D, I) be a model. If b ∈ Act is an action symbol, then a partial b-run is a sequence of states s0, … sk, s(k+1), … ∈ D (maybe infinite) such that (sk, s(k+1)) ∈ I(b) for every consecutive pair of states within the sequence. If b ∈ Act is an action symbol, then a b-run is an infinite partial b-run, or a finite partial b-run that cannot be continued³. Then the semantics of the constructs Ab(… U …) and Eb(… U …) can be defined as follows:

- s ⊨M Ab(φ U ψ) iff for every b-run s0, … sk, … that starts in s (i.e. s0 = s) there exists some n ≥ 0 for which sn ⊨M ψ and sk ⊨M φ for every k ∈ [0..(n-1)];
- s ⊨M Eb(φ U ψ) iff for some b-run s0, … sk, … that starts in s (i.e. s0 = s) there exists some n ≥ 0 for which sn ⊨M ψ and sk ⊨M φ for every k ∈ [0..(n-1)].

The standard branching-time temporal logic CTL can be treated as Act-CTL with a single implicit action symbol.

Combined Logic of Knowledge, Actions and Time

There are many combined polymodal logics for reasoning about multiagent systems. Maybe the most advanced is Belief-Desire-Intention (BDI) logic (Wooldridge, 1996; Wooldridge, 2002). An agent's beliefs correspond to the information the agent has about the world. (This information may be incomplete or incorrect; an agent's knowledge in BDI is just a true belief.) An agent's desires correspond to the allocated tasks. An agent's intentions represent desires that it has committed to achieving. Admissible actions are actions of individual agents; they may be constructed from primitive actions by means of composition, non-deterministic choice, iteration, and parallel execution. But the semantics of BDI and reasoning in BDI are too complicated for a short encyclopedia article.

In contrast, let us discuss below a simple example of a combined logic of knowledge, actions and time, namely the Propositional Logic of Knowledge and Branching Time for n>0 agents, Act-CTL-Kn (Garanina, Kalinina, & Shilov, 2004; Shilov, Garanina & Choe, 2006; Shilov & Garanina, 2006). First we provide a formal definition of Act-CTL-Kn, then discuss some pragmatics, and then in the next section introduce model checking as a reasoning mechanism.

Let [1..n] be a set of agents (n > 0), and let Act be a finite alphabet of action symbols. The syntax of Act-CTL-Kn admits the epistemic modalities Ki and Si for every i ∈ [1..n], and the branching-time constructs AbX, EbX, AbG, EbG, AbF, EbF, Ab(… U …), and Eb(… U …) for every b ∈ Act. Semantics is defined in terms of entailment in environments. An (epistemic) environment is a tuple E = (D, ~1, … ~n, I) such that (D, ~1, … ~n) is a model for PLKn, and (D, I) is a model for Act-CTL. The entailment relation ⊨ is defined by induction according to the standard definition for the propositional connectives (see the semantics of EPDL) and the above definitions of the epistemic modalities and branching-time constructs.

We are mostly interested in trace-based perfect recall synchronous environments generated from background finite environments. "Generated" means that the possible worlds are runs of finite-state machine(s). There are several possibilities for defining the semantics of
combined logics on runs. In particular, there are two extreme cases: Forgetful Asynchronous Systems (FAS) and Synchronous systems with Perfect Recall (PRS). Perfect recall means that every agent has a log-file with all his/her observations along a run, while forgetful means that information of this kind is not available. Synchronous means that every agent can distinguish runs of different lengths, while asynchronous means that some runs of different lengths may be indistinguishable.

It is quite natural that in the FAS case the combined logic Act-CTL-Kn can express only as much as it can express in the background finite system. In contrast, in the PRS case Act-CTL-Kn becomes much more expressive than in the background finite environment. The importance of combined logics in the framework of trace-based semantics with synchronous perfect recall rests upon their characteristic as logics of agent learning, or knowledge acquisition. We would like to argue for this characteristic with the following single-agent⁴ Fake Coin Puzzle FCP(N,M).

A set consists of (N+1) enumerated coins. The last coin is a valid one. A single coin with a number in [1..N] is fake, but all other coins with numbers in [1..(N+1)] are valid. All valid coins have the same weight, which differs from the weight of the fake. Is it possible to identify the fake by balancing coins at most M times?

In FCP(N,M) the agent (i.e. the person who has to solve the puzzle) knows neither the number of the fake, nor whether it is lighter or heavier than the valid coins. Nevertheless, this number is in [1..N], and the fake coin is either lighter (l) or heavier (h). The agent can make balancing queries and read the balancing result after each query. Every balancing query is an action b(L,R) which consists in balancing two disjoint sets of coins: those with numbers in L ⊆ [1..N+1] on the left pan, and those with numbers in R ⊆ [1..N+1] on the right pan, |L| = |R|. There are three possible balancing results: "<", ">", and "=", which mean that the left pan is lighter than, heavier than, or equal to the right pan, respectively. Of course, there are initial states (marked by ini) which represent … b(L,R) for disjoint L, R ⊆ [1..N+1] with |L| = |R|. The only information available to the agent (i.e., which gives him/her an opportunity to distinguish states) is a balancing result. The agent should learn fake_coin_number from a sequence which may start from any initial state and then consists of M queries and their corresponding results. Hence the single-agent logic Act-CTL-K1 seems to be a very natural framework for expressing FCP(N,M), as follows: validate or refute whether

s ⊨E ( EBX … M times … EBX ( ∨_{f∈[1..N]} K1( fake_coin_number = f ) ) … )

for every initial state s, where E is a PRS environment generated from the finite space [1..N] × {l, h} × {<, >, =, ini}, and B is the balancing query ∪_{L,R⊆[1..N+1]} b(L,R).

FUTURE TRENDS: MODEL CHECKING FOR COMBINED LOGICS

The model checking problem for a combined logic (Act-CTL-Kn in particular) and a class of epistemic environments (e.g., PRS or FAS environments) is to validate or refute s ⊨E φ, where E is a finitely-generated environment in the class, s is an initial state of the environment E, and φ is a formula of the logic. The above re-formulation of FCP(N,M) is a particular example of a model checking problem for a formula of Act-CTL-K1 and a finitely-generated perfect recall environment.

The papers (Meyden & Shilov, 1999) and (Garanina, Kalinina & Shilov, 2004) have demonstrated that if the number of agents n>1, then the model checking problem in perfect recall synchronous systems is very hard or even undecidable. In particular, it has non-elementary⁵ upper and lower time bounds for Act-CTL-Kn. The papers (Meyden & Shilov, 1999) and (Shilov, Garanina & Choe, 2006) have suggested tree-like data structures to make model checking of combinations of temporal and action logics with the propositional logic of knowledge PLKn feasible. Alternatively, (van der Hoek &
a situation when no query has been made. Wooldridge, 2002; Lomuscio & Penczek, 2003) have
Let us summarize. The agent acts in the environment susggested either to simplify language of logics to
generated from a finite space [1..N]{l,h}{<, >, =, be combined, or to consider agents with bounded
ini}. His/her admissible actions are balancing query recall.
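The epistemic reading of FCP(N,M) can be made concrete with a small sketch (the helper names are ours, not from the article): it enumerates the candidate states [1..N] × {l,h} and shows how a balancing query b(L,R) refines the agent's knowledge set by grouping states that yield the same balancing result. The agent knows fake_coin_number exactly when all states compatible with the observed results agree on it.

```python
from itertools import product

def outcome(state, L, R):
    """Balancing result of query b(L, R) in state (fake, weight).

    Every coin weighs 1 unit, except the fake one: it weighs less
    if weight == 'l' and more if weight == 'h'."""
    fake, weight = state
    delta = -1 if weight == 'l' else +1
    left = len(L) + (delta if fake in L else 0)
    right = len(R) + (delta if fake in R else 0)
    return '<' if left < right else '>' if left > right else '='

def refine(knowledge, L, R):
    """Partition a knowledge set (set of candidate states) by query outcome."""
    parts = {}
    for st in knowledge:
        parts.setdefault(outcome(st, L, R), set()).add(st)
    return parts

# N = 3 fake candidates plus the known-valid coin number 4.
N = 3
states = set(product(range(1, N + 1), 'lh'))  # initial knowledge: 6 states

# Query b({1}, {4}): weigh candidate 1 against the known-valid coin 4.
parts = refine(states, {1}, {4})
# '<' and '>' identify coin 1 as fake (lighter resp. heavier),
# while '=' leaves candidates {2, 3} with unknown relative weight.
```

Iterating `refine` over a strategy of M queries is exactly the knowledge-acquisition process that the nested K1 formula above expresses.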
Logic of Actions: A polymodal logic that associates modalities like always and sometimes with action symbols that are to be interpreted in labeled transition systems by transitions. The so-called Elementary Propositional Dynamic Logic (EPDL) is a sample logic of actions.

Logic of Knowledge or Epistemic Logic: A polymodal logic that associates modalities like know and suppose with enumerated agents or groups of agents. Agents are to be interpreted in labeled transition systems by equivalence (indistinguishability) relations. The so-called Propositional Logic of Knowledge of n agents (PLKn) is a sample epistemic logic.

Logic of Time or Temporal Logic: A polymodal logic with a number of modalities that correspond to next time, always, sometimes, and until, to be interpreted in labeled transition systems over discrete partial orders. For example, Linear Temporal Logic (LTL) is interpreted over linear orders.

Model Checking Problem: An algorithmic problem to validate or refute a property (presented by a formula) in a state of a model (from a class of Kripke structures). For example, the model checking problem for a combined logic of knowledge, actions and time in initial states of perfect recall finitely generated environments.

Multiagent System: A collection of communicating and collaborating agents, where every agent has some knowledge, intentions, abilities, and possible actions.

Perfect Recall Synchronous Environment: An environment for modeling the behavior of a perfect recall synchronous system.

Perfect Recall Synchronous System: A multiagent system where every agent always records his/her observations at all moments of time while the system runs.

ENDNOTES

1 Due to the pioneering papers of Saul Aaron Kripke (born in 1940) on models for modal logics.
2 A symmetric, reflexive, and transitive binary relation on D.
3 That is, for the last state s there is no state s′ such that (s, s′) ∈ I(b).
4 For a multiagent example refer to the Muddy Children Puzzle (Fagin, Halpern, Moses & Vardi, 1995).
5 I.e., it is not bounded by a tower of exponents with any fixed height.
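The indistinguishability semantics behind the Logic of Knowledge entry above can be sketched in a few lines (illustrative names only): Kφ holds at a state if and only if φ holds at every state the agent cannot distinguish from it.

```python
def holds_K(agent_partition, truth, state):
    """K phi holds at `state` iff `truth` holds in every state of the
    agent's indistinguishability block containing `state`."""
    block = next(b for b in agent_partition if state in b)
    return all(truth(s) for s in block)

# Three states; the agent cannot distinguish s0 from s1.
partition = [{'s0', 's1'}, {'s2'}]
p = {'s0': True, 's1': True, 's2': False}  # valuation of a proposition p

# K p holds at s0, since p is true throughout the block {s0, s1}.
assert holds_K(partition, p.get, 's0')
```

Model checking a combined logic extends this idea: the epistemic blocks are evaluated together with temporal and action modalities over the transitions of the environment.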
Cecilio Angulo
Technical University of Catalonia, Spain
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Modularity in Artificial Neural Networks
is produced when an already trained network loses part of its knowledge when either it is re-trained to perform a different task, called temporal cross-talk (Jacobs et al., 1991), or it is trained on two or more different tasks at the same time, the effect being called spatial cross-talk (Jacobs, 1990). Modular systems allow reusing modules in different activities, without re-implementation of the function represented in each different task (De Jong et al., 2004) (Garibay et al., 2004).

Modularity

From a computational point of view, modularity is understood as the property that some complex computational tasks have to be divided into simpler subtasks. Then, each of those simpler subtasks is performed by a specialized computational system called a module, generating the solution of the complex task from the solutions of the simpler subtask modules (Azam, 2000). From a mathematical point of view, modularity is based on the idea of a subset of system variables which may be optimized independently of the other system variables (De Jong et al., 2004). In any case, the use of modularity implies that a structure exists in the problem to be solved.

In modular systems, each of the system modules operates primarily according to its own intrinsically determined principles. Modules within the whole system are tightly integrated but independent from other modules following their own implementations. They have either distinct or the same inputs, but they generate their own response. When the interactions between modules are weak and modules act independently from each other, the modular system is called nearly decomposable (Simon, 1969). Other authors have identified this type of modular systems as separable problems (Watson et al., 1998). This is by far one of the most studied types of modularity, and it can be found everywhere from business to biological systems. In nearly decomposable modular systems, the final optimal solution of a global task is obtained as a combination of the optimal solutions of the simpler ones (the modules).

However, the existence of a decomposition for a problem doesn't imply that the sub-problems are completely independent from each other. In fact, a system may be modular and still have interdependencies between modules. A decomposable problem is defined as a problem that can be decomposed into other sub-problems, but where the optimal solution of one of those problems depends on the optimal solution of some of the others (Watson, 2002). The resolution of such modular systems is more difficult than that of a typical separable modular system, and it is usually treated as a monolithic one in the literature.

Module

Most of the works that use modularity use the definition of module given by (Fodor, 1983), which is very similar to the concept of object in object-oriented programming: a module is a domain-specific processing element, which is autonomous and cannot influence the internal working of other modules. A module can influence another only by its output, that is, the result of its computation. Modules do not know about a global problem to solve or global tasks to accomplish, and are specific stimulus driven. The final response of a modular system in the resolution of a global task is given by the integration of the responses of the different modules by a special unit. The global architecture of the system defines how this integration is performed. The integration unit must decide how to combine the outputs of the modules to produce the final answer of the system, and it is not allowed to feed information back into the modules.

MODULAR NEURAL NETWORKS

When modularity is applied to the design of a modular neural network (MNN) based controller, three general steps are commonly observed: task decomposition, training and multi-module decision-making (Auda and Kamel, 1999). Task decomposition is about dividing the required controller into several sub-controllers, and assigning each sub-controller to one neural module. Modules should be trained either in parallel, or in different processes following a sequence indicated by the modular design. Finally, when the modules have been prepared, a multi-module decision-making strategy is implemented which indicates how all those modules should interact in order to generate the global controller response. This modularization approach can be seen as at the level of the task.

The previous general steps for modularity only apply to a modularization of nearly decomposable or separable problems. Decomposable problems, those
where strong interdependencies between modules exist, are not considered under that decomposition mechanism, and they are treated as monolithic ones.

The article introduces the differentiation between two modular levels: the current modularization level, which concentrates on task sub-division, and a new modularization performed at the level of the devices or elements. Those approaches are called strategic and tactical, respectively.

Strategic and Tactical Modularity

Borrowing the concepts from game theory, strategy deals with what has to be done in a given situation in order to perform a task, by dividing the global target solution into all the sub-targets required to accomplish the global one. Tactics, on the other hand, deals with how plans are going to be implemented, that is, how to use the resources available at that moment to accomplish each of those sub-targets.

Strategic modularity in neural controllers is defined as the modular approach that identifies which sub-goals are required for an agent in order to solve a global problem. Each sub-goal identified is implemented by a monolithic neural net. In contrast, tactical modularity in neural controllers is defined as the one that identifies which inputs and outputs are necessary for the implementation of a given goal, and designs a single module for each input and output. In tactical modularity, modularization is performed at the level of the elements (any meaningful input or output of the neural controller) that are actually involved in the accomplishment of the task.

To our knowledge, all the research based on neural modularity and divide-and-conquer principles focuses its division at the strategic level, that is, on how to divide the global problem into its sub-goals. Each of those sub-goals is then implemented by means of a single neural controller, the final goal being generated by combining the outputs of those sub-goals in some sense. The current paper proposes, first, the definition of two different levels of modularity, and second, the use of tactical modularity as a new level of modularization that allocates space for decomposable modularity. It is expected that tactical modularization will help in the generation of complex neural controllers when many inputs and outputs must be taken into account. This will be confirmed below, where the use of the two types of modularity will be compared against monolithic approaches.

Implementing Modularity

Strategic modularity can be implemented by any of the modular approaches that already exist in the literature. See (Auda and Kamel, 1999) for a complete description. Any of the modularization methods described there is strategic, although it was not given that name, and they can, in general, be integrated with tactical modularity. The term strategic is used for those modular approaches in order to differentiate them from the newly proposed modularity.

Tactical modularity defines modularity at the level of the elements involved in the generation of a sub-goal. By elements, we understand the inputs required to generate the sub-goal and the outputs that define the sub-goal solution. Each of those elements conforms a tactical module, implemented by a simple neural network. That is, tactical modularity is implemented by designing a completely distributed controller composed of small processing modules around each of the meaningful elements of the problem.

The schematics for a tactical module is shown in Figure 1. Tactical modules are connected to their associated elements, controlling them, and processing the information coming in, for input elements, or going out, for output elements. This kind of connectivity means that the processing element is the one that decides which commands must be sent to the output element, or how a value received from an input element must be interpreted. It is said that the processing element is responsible for its associated element.

In order to generate a complete answer for the sub-goal, all the tactical modules are connected to each other, the output of each module being sent back to all the others. By introducing this connectivity, each module is aware of what the others are doing, allowing the different modules to coordinate in the generation of a common answer, and avoiding a central coordinator. The resulting architecture shows a completely distributed MNN, where neural modules are independent but implement strong interactions with the other modules. Figure 2 shows an example of connectivity in the generation of a tactical modular neural controller for a simple system composed of two input elements and two outputs.
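The fully connected, coordinator-free update scheme just described can be sketched as follows (all names are hypothetical; real tactical modules would be small neural networks rather than simple functions): at each step every module sees its own element's value plus the previous outputs of all the other modules.

```python
def step(modules, element_values, prev_outputs):
    """One synchronous update of a fully connected set of tactical modules.

    Each module sees its own element's value plus the previous outputs of
    every other module; there is no central coordinator."""
    new_outputs = {}
    for name, module in modules.items():
        others = {n: o for n, o in prev_outputs.items() if n != name}
        new_outputs[name] = module(element_values.get(name), others)
    return new_outputs

def input_module(value, others):
    return value  # forwards its sensor reading to the rest of the system

def output_module(_, others):
    # an output module has no sensor; it reacts to the other modules
    return sum(others.values()) / len(others)

# Toy controller: two input elements and two output elements.
modules = {
    'sensor_a': input_module,
    'sensor_b': input_module,
    'motor_1': output_module,
    'motor_2': output_module,
}
outputs = {name: 0.0 for name in modules}
for _ in range(3):  # let the modules coordinate over a few steps
    outputs = step(modules, {'sensor_a': 1.0, 'sensor_b': 3.0}, outputs)
```

Because every module receives the others' previous outputs, a common answer emerges from the interactions themselves, with no integration unit on top.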
Figure 1. Schematics of a tactical module for one input element (left) and for one output element (right)
Figure 2. Connectivity of a tactical modular controller with two input elements and two output elements

Training tactical modules is difficult due to the strong relationships between the different modules. Training methods used in strategic modules based on error propagation are not suitable, so a genetic algorithm is used to train the nets, because it allows finding the network weights without defining an error measurement, just by specifying a cost function (Di Ferdinando et al., 2000).
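The training idea just mentioned, evolving weight vectors guided only by a cost function, with no error gradient, can be sketched as follows (a minimal genetic algorithm; population size, mutation scale, and the toy cost are our illustrative choices, not values from the article):

```python
import random

def genetic_search(cost, n_weights, pop_size=20, generations=50,
                   sigma=0.3, seed=0):
    """Minimal GA sketch: evolves weight vectors by truncation selection
    and Gaussian mutation, guided only by a cost function."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)                       # best (lowest cost) first
        parents = pop[:pop_size // 2]            # truncation selection
        children = [[w + rng.gauss(0, sigma) for w in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children                 # elitism: parents survive
    return min(pop, key=cost)

# Toy cost: distance of the weights from a hypothetical target behaviour.
target = [0.5, -0.2, 0.1]
cost = lambda w: sum((wi - ti) ** 2 for wi, ti in zip(w, target))
best = genetic_search(cost, n_weights=3)
```

Only the scalar cost is ever consulted, which is what makes the method usable when the strongly coupled modules admit no per-module error signal.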
When combining both levels in one neural controller, the strategic modularization should be performed first, to identify the different sub-goals required for the implementation. Next, a tactical modularization should be completed, implementing each of those sub-goals by a group of tactical modules. The number of tactical modules for each strategic module will depend on the elements involved in the resolution of the specific sub-goal.

Application Examples

strategic, tactical and a combination of both. The results showed that the combination of both levels obtained the best results (see Figure 3).

In additional experiments, tactical modularity was implemented for an Aibo robot. In this case, 31 tactical modules were required to generate the controller. The controller was generated to solve different tasks like standing up, standing, and pushing the ground (Téllez et al., 2005). The approach was also able to generate one of the first MNN controllers able to make Aibo walk (Téllez et al., 2006).
Figure 3. This figure represents the maximal performance value obtained by different types of modular approaches. Approach (a) is a monolithic approach, (b) and (c) are two different types of strategic approaches, (d) is the tactical approach, and (f) is a reduced version of the tactical approach.
provided with a quite complex robot, it is necessary to see if the system can scale to systems with hundreds of elements.

Additional applications include its use in more classical domains like pattern recognition or speech recognition.

CONCLUSION

The level of modularity in neural controllers can be highly increased if tactical modularity is taken into account. This type of modularity complements typical modularization approaches based on strategic modularizations, by dividing strategic modules into their minimal components, and assigning one single neural module to each of them. This modularization allows the implementation of decomposable problems within a modularized structure. Both types of modularizations can be combined in order to obtain a highly modular neural controller, which shows better results in complex robot control.

REFERENCES

Auda, G., & Kamel, M. (1999). Modular neural networks: a survey. International Journal of Neural Systems, 9(2), 129-151.

Azam, F. (2000). Biologically inspired modular neural networks. PhD thesis, Virginia Polytechnic Institute and State University.

Boers, E., & Kuiper, H. (1992). Biological metaphors and the design of modular artificial neural networks. Master thesis, Leiden University.

Caelli, G. L., & Wen, W. (1999). Modularity in neural computing. Proceedings of the IEEE, 87(9), 1497-1518.

Callebaut, W. (2005). The ubiquity of modularity. In Modularity: Understanding the Development and Evolution of Natural Complex Systems. The MIT Press.

De Jong, E.D., Thierens, D., & Watson, R.A. (2004). Defining modularity, hierarchy, and repetition. Proceedings of the GECCO Workshop on Modularity, Regularity and Hierarchy in Open-ended Evolutionary Computation.

Di Ferdinando, A., Calabretta, R., & Parisi, D. (2000). Evolving modular architectures for neural networks. Proceedings of the Sixth Neural Computation and Psychology Workshop: Evolution, Learning and Development.

Fodor, J. (1983). The modularity of mind. The MIT Press.

Garibay, O., Garibay, I., & Wu, A.S. (2004). No free lunch for module encapsulation. Proceedings of the Modularity, Regularity and Hierarchy in Open-ended Evolutionary Computation Workshop, GECCO 2004.

Jacobs, R.A. (1990). Task decomposition through competition in a modular connectionist architecture. PhD thesis, University of Massachusetts.

Jacobs, R.A., Jordan, M.I., & Barto, A.G. (1991). Task decomposition through competition in a modular connectionist architecture: the what and where vision tasks. Cognitive Science, 15, 219-250.

Simon, H.A. (1969). The sciences of the artificial. The MIT Press.

Téllez, R.A., Angulo, C., & Pardo, D. (2005). Highly modular architecture for the general control of autonomous robots. Proceedings of the 8th International Work-Conference on Artificial Neural Networks.

Téllez, R.A., Angulo, C., & Pardo, D. (2006). Evolving the walking behaviour of a 12 DOF quadruped using a distributed neural architecture. Proceedings of the 2nd International Workshop on Biologically Inspired Approaches to Advanced Information Technology.

Téllez, R., & Angulo, C. (2006). Tactical modularity for evolutionary animats. Proceedings of the International Catalan Conference on Artificial Intelligence.

Téllez, R., & Angulo, C. (2007). Acquisition of meaning through distributed robot control. Proceedings of the ICRA Workshop on Semantic Information in Robotics.

Watson, R.A., Hornby, G.S., & Pollack, J. (1998). Modeling building-block interdependency. Late Breaking Papers at the Genetic Programming 1998 Conference.

Watson, R. (2002). Modular interdependency in complex dynamical systems. Proceedings of the 8th International Conference on the Simulation and Synthesis of Living Systems.

KEY TERMS

Cost Function: A mathematical function used to determine how well or how badly a neural network has performed during the training phase. The cost function usually indicates what is expected from the neural controller.

Element: Any variable of the program that contains a value that is used to feed the neural network controller (input element) or to contain the answers of the neural network (output element). The input elements are usually the variables that contain the information from which the output will be generated. The output elements contain the output of the neural controller.

Evolutionary Robotics: A technique for the creation of neural controllers for autonomous robots, based on genetic algorithms.

Genetic Algorithm: An algorithm that simulates the natural evolutionary process, applied to the generation of the solution of a problem. It is usually used to obtain the value of parameters difficult to calculate by other means (like, for example, the neural network weights). It requires the definition of a cost function.

Modularization: A process to determine the simplest meaningful parts that compose a task. There is no formal process to implement modularization, and in practice, it is very arbitrary.

Neural Controller: A computer program, based on artificial neural networks. The neural controller is a neural net or a group of them which acts upon a series of meaningful inputs and generates one or several outputs.
Morphological Filtering Principles
Dilations and erosions by structuring element perform, respectively, sup and inf operations over an input image that depend on a structuring element B. These dilations and erosions are symbolized, respectively, by δB and εB, and they originate from the Minkowski set addition and subtraction. Let us first discuss the set framework.

In the set framework, if A denotes an input set, the dilation δB(A) computes the locus of points x where the structuring element B translated to x has a non-empty intersection with (i.e., touches) the input set A:

δB(A) = {x | Bx ∩ A ≠ ∅}

Using the lattice framework, the expression for functions is formally identical to the expression for sets:

δB(I) = sup{ Ib : b ∈ B }

Note: the sup operator is that of the function lattice. The function expression can be written in another way that gives a more operational expression to compute the value of the result of δB(I) at each pixel x of I:

[δB(I)](x) = max{ I(x + b) : b ∈ B }
The sup operation has been replaced by the max operation that computes the maximum of a set of integers. Note that [δB(I)](x) is the intensity value of pixel x in that image. The + symbol denotes vector addition.

The expression for sets formulated by means of the inf operation is:

εB(A) = ∩{ Ab : b ∈ B } = inf{ Ab : b ∈ B }

Figure 2. Erosion εB(I) and dilation δB(I) of a 1-D image I by the structuring element B = {-1, 0, 1}:

I:      9 10 11 12 12 11 13 13 12 12 12 10  9  9  9 10
δB(I): 10 11 12 12 12 13 13 13 13 12 12 12 10  9 10 10
εB(I):  9  9 10 11 11 11 11 12 12 12 10  9  9  9  9  9
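The max/min formulations above translate directly into code. A minimal 1-D sketch (border pixels simply use the part of the window that falls inside the image) reproduces the sample rows of the figure with B = {-1, 0, 1}:

```python
def dilate(I, B):
    """[dB(I)](x) = max of I(x + b) over b in B (clipped at the borders)."""
    return [max(I[x + b] for b in B if 0 <= x + b < len(I))
            for x in range(len(I))]

def erode(I, B):
    """[eB(I)](x) = min of I(x + b) over b in B (clipped at the borders)."""
    return [min(I[x + b] for b in B if 0 <= x + b < len(I))
            for x in range(len(I))]

B = [-1, 0, 1]
I = [9, 10, 11, 12, 12, 11, 13, 13, 12, 12, 12, 10, 9, 9, 9, 10]
# dilate(I, B) and erode(I, B) give the dilated and eroded rows shown above.
```

As expected, erosion never exceeds the input and dilation never falls below it at any pixel.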
BASIC MORPHOLOGICAL FILTERS

Openings and Closings

In morphological processing, a filter is an increasing and idempotent transformation. The two most fundamental filters in morphological processing arise when there is an order between the input and the filter output. They are the so-called openings and closings, symbolized, respectively, by γ and φ.

An opening is an anti-extensive morphological filter.
A closing is an extensive morphological filter.

The names algebraic openings and algebraic closings are also used in the literature to refer to these most general types of openings and closings.

The computation of openings and closings that use structuring elements as the shape probes to process input image shapes is discussed next. They are defined in terms of dilations and erosions by structuring element.

For an input set A, an opening by structuring element B, symbolized by γB, is the set of points x that belong to a translated structuring element that fits set A, i.e., that is included in A. Let us establish how γB is computed in the next definition, which applies both to sets and images.

An opening by structuring element B, symbolized by γB, is defined by

γB = δB̆ εB

Dually, a closing by structuring element B, symbolized by φB, is defined by

φB = εB̆ δB

i.e., φB is the sequential composition of a dilation δB followed by an erosion εB̆, where B̆ denotes B transposed.

Alternated Filters

The sequential compositions of an opening and a closing are called alternated sequential compositions. A morphological alternated filter is a sequential composition of an opening and a closing, i.e., γφ and φγ are alternated filters.

An important fact is that there is generally no ordering between the input and output of alternated filters, i.e., in general neither γφ(I) ≤ I nor I ≤ γφ(I) holds, and similarly for φγ. In addition, there is generally no ordering between γφ and φγ.

Alternated filters are quite useful in image processing and analysis because they combine in some way the effects of both openings and closings in one filter.

Parallel Combination Properties

The class of openings (or, respectively, of closings) is closed under the sup (respectively, inf) operation. In other words, the supremum of a family of openings is an opening, and the infimum of a family of closings is a closing.
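A minimal sketch of the compositions defined above (self-contained, reusing the clipped-border 1-D dilation and erosion from earlier; symmetric or transposed structuring elements as stated in the definitions) lets us check the filter properties numerically:

```python
def dilate(I, B):
    return [max(I[x + b] for b in B if 0 <= x + b < len(I))
            for x in range(len(I))]

def erode(I, B):
    return [min(I[x + b] for b in B if 0 <= x + b < len(I))
            for x in range(len(I))]

def transpose(B):
    return [-b for b in B]

def opening(I, B):
    # gB = erosion by B followed by dilation by the transposed B
    return dilate(erode(I, B), transpose(B))

def closing(I, B):
    # fB = dilation by B followed by erosion by the transposed B
    return erode(dilate(I, B), transpose(B))

B = [-1, 0, 1]
I = [9, 10, 11, 12, 12, 11, 13, 13, 12, 12, 12, 10, 9, 9, 9, 10]
g, f = opening(I, B), closing(I, B)
# g is anti-extensive and idempotent; f is extensive and idempotent.
```

The assertions below confirm, on this sample, exactly the ordering and idempotence properties that distinguish openings and closings from plain dilations and erosions.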
Figure 3. Granulometry
γiγj = γjγi = γmax(i,j). Quite often, both a granulometry and an anti-granulometry are computed.

Normally, quantitative increasing measures are computed at each output. These measure values build a curve that can be used to characterize an input image if the measure criterion is appropriate.

An alternating sequential filter ASFi is defined by

ASFi = φiγi ... φjγj ... φ1γ1

where i ≥ j ≥ 1, and where the γi and φi belong, respectively, to a granulometry and an anti-granulometry. Alternating sequential filters satisfy the following absorption property: if i ≥ j, then

ASFi ASFj = ASFi
Haralick, R., & Shapiro, L. (1992). Computer and robot vision. Volume I. Reading: Addison-Wesley Publishing Company.

Heijmans, H. (1994). Morphological image operators. Boston: Academic Press.

Heijmans, H. (1999). Connected morphological operators for binary images. Computer Vision and Image Understanding, 73, 99-120.

Iñesta, J. M., & Crespo, J. (2003). Principios básicos del análisis de imágenes médicas. In M. Belmonte, O. Coltell, V. Maojo, J. Mateu, & F. Sanz (Eds.), Manual de informática médica (pp. 299-333). Barcelona: Editorial Menarini - Caduceo Multimedia.

Jain, A. (1989). Fundamentals of digital image processing (Prentice Hall information and system sciences series; Series Editor: T. Kailath). Englewood Cliffs: Prentice Hall.

Maragos, P., & Schafer, R. W. (1990, April). Morphological systems for multidimensional signal processing. Proceedings of the IEEE, 78(4), 690-710.

Meyer, F. (1998). From connected operators to levelings. In H. J. A. M. Heijmans & J. B. T. M. Roerdink (Eds.), Mathematical morphology and its applications to image and signal processing (pp. 191-198). Dordrecht: Kluwer Academic Publishers.

Meyer, F. (2004, January-March). Levelings, image simplification filters for segmentation. Journal of Mathematical Imaging and Vision, 20(1-2), 59-72.

Oppenheim, A., Schafer, R. W., & Buck, J. (1999). Discrete-time signal processing (2nd ed.). Englewood Cliffs: Prentice-Hall.

Pratt, W. (1991). Digital image processing (2nd ed.). New York: John Wiley and Sons.

Ronse, C. (2005). Guest editorial. Journal of Mathematical Imaging and Vision, 22(2-3), 103-105.

Rosenfeld, A., & Kak, A. (1982). Digital picture processing - Volumes 1 and 2. Orlando: Academic Press.

Russ, J. C. (2002). The image processing handbook (4th ed.). Boca Raton: CRC Press.

Salembier, P., & Serra, J. (1995). Flat zones filtering, connected operators, and filters by reconstruction. IEEE Transactions on Image Processing, 4(8), 1153-1160.

Schmitt, M., & Mattioli, J. (1993). Morphologie mathématique. Paris: Masson.

Serra, J. (1982). Mathematical morphology. Volume I. London: Academic Press.

Serra, J. (Ed.). (1988). Mathematical morphology. Volume II: Theoretical advances. London: Academic Press.

Serra, J., & Salembier, P. (1993, July). Connected operators and pyramids. In Proceedings of SPIE, Nonlinear algebra and morphological image processing, San Diego (Vol. 2030, pp. 65-76).

Soille, P. (2003). Morphological image analysis (2nd ed.). Berlin-Heidelberg-New York: Springer-Verlag.

Vincent, L. (1993, April). Morphological grayscale reconstruction in image analysis: Applications and efficient algorithms. IEEE Transactions on Image Processing, 2, 176-201.

KEY TERMS

Duality: The duality principle states that, for each morphological operator, there exists a dual one. In sets, the duality is established with respect to the set complementation operation (see further details in the text).

Extensivity: A transformation is extensive when its output is larger than or equal to the input. Anti-extensivity is the opposite concept: a transformation is anti-extensive when its output is smaller than or equal to the input.

Idempotence: A transformation Ψ is said to be idempotent if, when sequentially applied twice, it does not change the output of the first application, i.e., ΨΨ = Ψ.

Image Transformation: An operation that processes an input image and produces an output image.
Domingo López-Rodríguez
University of Málaga, Spain
Juan M. Ortiz-de-Lazcano-Lobato
University of Málaga, Spain
MREM, Discrete Recurrent Network for Optimization
However, the biggest problem of this model (the discrete as well as the continuous one) is the possibility of converging to a non-feasible state, or to a local (not global) minimum. Wilson and Pawley (1988) demonstrated, through massive simulations, that, for the travelling salesman problem with 10 cities, only 8% of the solutions were feasible, and most were not good. Moreover, this proportion got worse when the problem size was increased.

After this, many works focused on improving Hopfield's network:

By modifying the energy function (Xu & Tsai, 1991).
By adjusting the numerous parameters present in the network, as in (Lai & Coghill, 1988).
By using stochastic techniques in the dynamics of the network (Kirkpatrick et al., 1983) (Aarts & Korst, 1988).

Particularly, researchers tried to improve the effi-

MULTIVALUED DISCRETE RECURRENT MODEL: APPLICATION TO COMBINATORIAL OPTIMIZATION PROBLEMS

A new generalization of Hopfield's model arises in the works (Mérida-Casermeiro, 2000) (Mérida-Casermeiro et al., 2001), where the MREM (Multivalued REcurrent Model) model is presented.

The Neural MREM Model

This model presents two essential features that make it very versatile and that increase its applicability:

The output of each neuron, si, is a value of the set M = {m1, m2, ..., mL}, which is not necessarily numeric.
The concept of a similarity function f between neuron outputs is introduced. f(x,y) represents