
Conference Title

The Fourth International Conference on Artificial

Intelligence and Pattern Recognition (AIPR2017)

Conference Dates
September 18-20, 2017

Conference Venue
Lodz University of Technology, Lodz, Poland

978-1-941968-43-7 2017 SDIWC

Published by
The Society of Digital Information and Wireless
Communications (SDIWC)
Wilmington, New Castle, DE 19801, USA
Table of Contents

Artificial Intelligence in Modeling and Simulation

Mapping of Relational Schema to Ontology Model .......... 1
Combined Neural Network Model for Real Estate Market Range Value Estimation .......... 11
Multi-Sensor Fusion Method for Localization System: Assistance to Displacement of Persons with Disabilities .......... 17

Case Studies

A Neural Network Approach for Attribute Significance Estimation .......... 21

Cognitive Systems and Applications

A 3-Dimensional Object Recognition Method Using SHOT and Relationship of Distances and Angles in Feature Points .......... 26
Object Detection Method Using Invariant Feature Based on Local Hue Histogram in Divided Area of an Object Area .......... 33
Handwriting Text/Non-Text Classification on Mobile Device .......... 42

Computer Vision

Multi Feature Region Descriptor Based Active Contour Model for Person Tracking .......... 50

Data Mining

Approaches for Optimization of WEB Pages Loading via Analysis of the Speed of Requests to the Database .......... 58
Examining Stock Price Movements on Prague Stock Exchange Using Text Classification .......... 64

Decision Support Systems

Continuous GA-based Optimization of Order-Up-To Inventory Policy in Logistic Networks for Achieving High Service Rate .......... 70
Fusing Geometric and Appearance-based Features for Glaucoma Diagnosis .......... 76
Study on Route Setting and Movement Based on 3D Map Generation by Robot in Hydroponics Managing System .......... 86
Construction of Audio Corpus of Non-Native English Dialects (Arab Speakers) .......... 92
Classification and Data Analysis for Modeling Selected Colon Diseases .......... 98

Proceedings of The Fourth International Conference on Artificial Intelligence and Pattern Recognition (AIPR2017), Lodz, Poland, 2017

Mapping of Relational Schema to Ontology Model

Aniagu Ugochukwu Christian
Department of Information Technology
University of Miskolc
3515 Miskolc-Egyetemvaros, Hungary

Abstract - The aim of this work is to transform Relational Database Management System (RDBMS) metadata to a standard Ontology description. This work is also aimed at the development and manipulation of Oracle RDBMS metadata using Structured Query Language (SQL), as well as Ontology and its uses as a web tool. The simple outline of this task is as follows: survey of relational and Ontology modelling; investigation of the mapping of a database schema into an Ontology model; development of a conversion framework in Java; implementation of the conversion program; testing of the program with example schemas.

KEYWORDS

Ontology, Java, Java Database Connectivity (JDBC), Relational Database Management System (RDBMS), Data Modeling, Document Object Model (DOM), Extensible Markup Language (XML).

I. INTRODUCTION

The rate at which Information Technology, with the impact of Artificial Intelligence (AI), is revolutionizing our society demands more automated approaches to representing knowledge. Intelligent systems have improved geometrically, as can be seen in Natural Language Processing (NLP), for example Google Translate.

Highly sophisticated multi-national companies and schools store and retrieve data using databases with a complex metadata structure. Ontology, on the other hand, can be used in the classification and comprehension of entities and their respective relationships. This work is about establishing a mutual relationship between RDBMS and Ontology.

A database is a collection of data, typically describing the activities of one or more related organizations. For example, a university database might contain information about:

- Entities such as students, faculty, courses, and classrooms.
- Relationships between entities, such as students' enrolment in courses, faculty teaching courses, and the use of rooms for courses [9].

Data modeling is the ability to document complex system processes with the help of techniques and tools, reducing or clarifying system designs into simpler data flows and representations [10]. Relationships can be illustrated by a data model using flow charts and diagrams. A properly documented model aids easy error identification.

Several definitions of Ontology can be found in the literature. One of the most cited is the one proposed by Gruber T. R., which defines Ontology as a formal, explicit specification of a shared conceptualization [3]. Fensel D. analyzes this definition, identifying four main concepts involved: an abstract model of a phenomenon, termed "conceptualization"; a precise mathematical description, hinted at by the word "formal"; the precision of clearly defined concepts and their relationships, expressed by the term "explicit"; and the existence of an agreement between Ontology users, hinted at by the term "shared" [1]. The definition proposed by Gruber is general; however, Ontology can be defined in specific contexts. For example, taking the paradigm of agents into account, [7] establishes that Ontology is a formal description of the concepts and relations which can exist in a community of agents [5].

For an Ontology to be used within an application, the Ontology must be specified, that is, delivered using some concrete representation. There are many languages that can be used for the representation of conceptual models, with varying characteristics in terms of their expressiveness, ease of use and computational complexity.

Ontology is created and manipulated using designed applications called Ontology editors. They express Ontology in one of many languages; some of these languages interoperate, that is, provide export to other Ontology languages. Many Ontology editors can be found today, e.g. Apollo, OntoStudio, Protégé, Swoop and TopBraid Composer Free Edition. This piece of work explores Protégé as an Ontology editor.


This paper shows how a complex data structure was reduced to simple, comprehensible data flows and representations. The sample structure used is the school structure, which is divided into two super classes: school units and persons. The school unit comprises the organization unit class, educational unit class and resources class, while the persons super class comprises the students class, academic staff class and non-academic staff class. These classes also have sub-classes and schemas with relationships. However, within the time frame of the research and other limitations, a simpler structure of persons and modules super classes was implemented.

This structure was modeled using the relational model approach to capture the entities, relationships and attributes. Then database tables and their relationships were created to implement the model. Java Database Connectivity (JDBC) code was used to connect to the created database to extract and map the needed structure, which results in an Ontology structure, using Protégé as the Ontology editor with the help of the Document Object Model (DOM).

The methodology portrays the workflow used, from the creation of the RDBMS metadata to the Java application that maps and transforms the needed data to a standard Ontology description. There is also a detailed report and mapping examples of the initial data and the transformed versions. In the body of this work there are several screenshots, sample codes and diagrammatic designs to aid comprehension of the work.

II. RELATED WORK

There are research projects similar to this work, for example the article titled "Database-to-Ontology Mapping Generation for Semantic Interoperability" by Raji Ghawi and Nadine Cullot. However, this work used a self-created JDBC application with integrated SQL queries to extract the required data from an Oracle database and transform it to Ontology Web Language using Protégé.

III. METHODOLOGY

The objective of this work is to transform Relational Database Management System (RDBMS) metadata to a standard Ontology description. As shown in Figure 1 below, the task involves background knowledge of the basic tools of the database system and programming language. The process flow mentioned in the introduction involves creating an RDBMS using the Persons-Students school structure and relationships as a sample using SQL, creating a conversion Java program using the JDBC API to map and transform the relational database metadata to a standard Ontology description with the help of DOM/XML, and importing the created Ontology description into an Ontology editor (Protégé) to form a standard Ontology structure.

Fig. 1. Project work flow (components: Database, Relational Model, JDBC, DOM, Semantic Model, OWL (Protégé))

A. Relational Model:

Using a sample of the relationship between the persons and students in the university, a relational model (entity-relationship model) exposed the hierarchy of persons, with subclasses of professors and students, and other classes such as module.

As shown in the relational diagram below (Figure 2), the three major parts of this model are entities, relationships and attributes, as noted and described in the introductory part of this research work.

The entities are Persons, Professors, Students and Modules. The relations are shown with the "teaches" and "studies" keywords: a professor teaches a module and a student studies a module. Recalling from the background knowledge that there are various relationship types, it is worth noting that in the sample a professor can teach many modules (1:N) and many students can take a module (N:1).

The last part of the relational model to note is the attributes. In this sample there are shared attributes and personal attributes. For example, the person entity has name and age attributes, which are also shared by the professor and student entities respectively. Because of some constraints, the sample could not involve complicated attributes to showcase multi-valued and derived attributes.

Fig. 2. Relational Representation

B. Semantic Model

The semantic model shows the meaning of its instances from the ER model. It is a conceptual data model that includes expressing information that enables parties to the information exchange to interpret meaning (semantics) from the instances, without the need to know the meta-model [12].

Students:   Nep_no (Int), P_ID (Int), Grade (Varchar2(20))
Professors: Staff_ID (Int), P_ID (Int), Dept (Varchar2(20))
Persons:    P_ID (Int), Name (Varchar2(20)), Age (Int)
Modules:    M_Code (Int), Nep_no (Int), Staff_ID (Int), M_Title (Varchar2(20)), M_Unit (Int), Description (Varchar2(20))


The ER model is represented as four semantic tables which will be implemented in an Oracle database using SQL (Structured Query Language). It is easily identifiable that the entities are represented as table names, the attributes as column names, and the relationships between the tables are shown using foreign key constraints. For example, the Modules table has columns for students (Nep_no) and professors (Staff_ID) to show the students that study a particular module as well as the professors that handle the module.

C. JDBC:

Java Database Connectivity (JDBC) is an Application Programming Interface (API) for the Java programming language which defines how a client may access a database to create, insert into, update and query tables [11]. JDBC establishes a connection to the database, executes queries, and converts the results to Java variables, with some methods to aid the conversion [6].

The result of the query is accessed through the Document Object Model (DOM). The DOM defines a standard for accessing Extensible Markup Language (XML) documents. The W3C DOM is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure, and style of a document. The extracted content, in XML format, is imported into an Ontology web tool (Protégé).

Fig. 3. Architectural diagram (Java application, JDBC API, JDBC Driver Manager, JDBC driver; data sources: Oracle, SQL Server, ODBC)

D. Mapping Database to Ontology:

The goal is to create a mapping between an Ontology and a legacy database. Various levels of overlap between the database domain and the Ontology's domain are seen. The mapping is done in two steps, namely:

Mapping definition: the transformation of the data schema into an Ontology structure.

Data migration: the migration of database instances into ontological instances. In this project the process is query driven; that is, the database instances are transformed as a result of the response to a given query. This means that only the data needed to answer the user's query are called from the sources [8].

Output 1 below shows a sample mapping result between the database and the Ontology. This output shows the metadata tables converted to Classes, ObjectPropertyDomain and ObjectPropertyRange, which are various OWL parameters.

    <Declaration>
        <Class IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#..."/>
    </Declaration>
    <ObjectPropertyDomain>
        <ObjectProperty IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#P_ID"/>
        <Class IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#..."/>
    </ObjectPropertyDomain>
    <ObjectPropertyRange>
        <ObjectProperty IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#NEP_NO"/>
        ...
    </ObjectPropertyRange>
    <DataPropertyDomain>
        <DataProperty IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#GRADE"/>
        <Class IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#STUDENTS"/>
    </DataPropertyDomain>
    <DataPropertyRange>
        <DataProperty IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#GRADE"/>
        <Datatype abbreviatedIRI="xsd:VARCHAR2"/>
    </DataPropertyRange>

Output 1. Demonstration of the relational data mapped to Ontology (excerpt)
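The mapping rules described above (tables to Classes, columns to DataProperties, foreign-key columns to ObjectProperties with a domain and a range) can be illustrated with a small, self-contained sketch. This is not the paper's actual converter: the metadata is supplied in plain Java values rather than read from Oracle, and the `owlClass`, `dataProperty` and `objectProperty` helper names are hypothetical.

```java
// Minimal sketch of the table/column/FK -> OWL mapping rules described above.
// Metadata is supplied in-memory instead of being read from Oracle's data
// dictionary; helper names are illustrative, not the paper's actual code.
public class OwlMappingSketch {

    // A table becomes an OWL Class declaration.
    static String owlClass(String table) {
        return "<Declaration><Class IRI=\"#" + table + "\"/></Declaration>";
    }

    // A column becomes a DataProperty whose domain is its table
    // and whose range is the column's SQL data type.
    static String dataProperty(String column, String table, String sqlType) {
        return "<DataPropertyDomain><DataProperty IRI=\"#" + column + "\"/>"
             + "<Class IRI=\"#" + table + "\"/></DataPropertyDomain>"
             + "<DataPropertyRange><DataProperty IRI=\"#" + column + "\"/>"
             + "<Datatype abbreviatedIRI=\"xsd:" + sqlType + "\"/></DataPropertyRange>";
    }

    // A foreign-key column becomes an ObjectProperty linking the child
    // table (domain) to the referenced table (range).
    static String objectProperty(String fkColumn, String childTable, String parentTable) {
        return "<ObjectPropertyDomain><ObjectProperty IRI=\"#" + fkColumn + "\"/>"
             + "<Class IRI=\"#" + childTable + "\"/></ObjectPropertyDomain>"
             + "<ObjectPropertyRange><ObjectProperty IRI=\"#" + fkColumn + "\"/>"
             + "<Class IRI=\"#" + parentTable + "\"/></ObjectPropertyRange>";
    }

    public static void main(String[] args) {
        // The STUDENTS table from the sample schema.
        System.out.println(owlClass("STUDENTS"));
        System.out.println(dataProperty("GRADE", "STUDENTS", "VARCHAR2"));
        // STUDENTS.P_ID references PERSONS.P_ID.
        System.out.println(objectProperty("P_ID", "STUDENTS", "PERSONS"));
    }
}
```

Here the IRIs are abbreviated to fragment identifiers for readability; the paper's actual output uses full file IRIs as shown in Output 1.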


Fig. 4. Metadata being mapped as data properties

Fig. 5. Screenshot of the development environment


The design steps illustrate diagrammatically the process flow, execution, and conversion of variables which eventually become OWL with the help of DOM.

Fig. 6. Design steps (Oracle metadata tables; SELECT SQL rows; Java program executes the query and converts results to Java variables; DOM; Ontology Web Language; Protégé)

Example code:

This part explains the important parts of the queries used and their specific functions.

    public class Test5 {
        static Connection connection = null;
        static DatabaseMetaData metadata = null;
        ArrayList<Class_desc> Classes = new ArrayList<Class_desc>();
        ArrayList<Data_prop_desc> DataProps = new ArrayList<Data_prop_desc>();
        ArrayList<Object_prop_desc> ObjectProps = new ArrayList<Object_prop_desc>();

The code has a public class that contains the entire program. The class consists of several methods, initialized array lists, and a database connection. There is also a main method used by the JVM to start execution. The main method instantiates prog to independently invoke the get_class_names(), get_data_prop(), get_object_prop() and createOntology() methods respectively.

    public static void main(String args[]) throws Exception {
        Test5 prog = new Test5();
        ...
    }

Each of these methods has a separate task and query to execute, as well as an independent connection to the database. Below is a sample:

    public void get_class_names() {
        Connection connection = null;
        Statement stmt = null;
        ResultSet rs = null;
        String cmd = "";
        Class_desc obj;
        try {
            connection = testClass2.getConnection();
            stmt = connection.createStatement();
            cmd = "select table_name from user_tables";
            rs = stmt.executeQuery(cmd);
            ...

    public void get_data_prop() {
        Connection connection = null;
        Statement stmt = null;
        ResultSet rs = null;
        String cmd = "";
        Data_prop_desc data;
        try {
            connection = testClass2.getConnection();

            stmt = connection.createStatement();
            cmd = "select * from user_tab_columns";
            rs = stmt.executeQuery(cmd);
            ...

    public void get_object_prop() {
        Connection connection = null;
        Statement stmt = null;
        Statement stmt2 = null;
        ResultSet rs = null;
        ResultSet rs2 = null;
        String cmd = "";
        Object_prop_desc object;
        try {
            String c1;
            String c2;
            c1 = null;
            c2 = null;
            connection = testClass2.getConnection();
            stmt = connection.createStatement();
            stmt2 = connection.createStatement();
            cmd = "select * from user_constraints where constraint_type = 'R'";
            rs = stmt.executeQuery(cmd);
            ...

The sample code above executes the SQL statements and gets the required data from the database. The createOntology method, by contrast, takes the records collected by the methods above and gets them ready for transformation to OWL format. Note that the needed metadata are divided into three kinds: classes, data properties and object properties. The sample createOntology method is shown below:

    public void createOntology() {
        int db = 0;
        for (int i = 0; i < Classes.size(); i++) {
            System.out.println("Classes data:" + this.Classes.get(i).c_name
                + " : " + this.Classes.get(i).p_name);
        }
        for (int i = 0; i < DataProps.size(); i++) {
            System.out.println("Data properties are:" + this.DataProps.get(i).name
                + " : " + this.DataProps.get(i).domain + this.DataProps.get(i).range);
        }
        for (int i = 0; i < ObjectProps.size(); i++) {
            System.out.println("Object properties are:" + this.ObjectProps.get(i).name
                + " : " + this.ObjectProps.get(i).domain + this.ObjectProps.get(i).range);
        }

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.newDocument();

        Element rootelement = doc.createElement("Ontology");
        doc.appendChild(rootelement);
        ...
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer();
        Result result = new StreamResult(new File("Out1.xml"));
        Source source = new DOMSource(doc);
        transformer.transform(source, result);

The document builder aids the formation of these data in Ontology format, and the transformation code gets them transformed using the DOM source. The result, saved in a directory, is imported using Protégé.

Example and Testing:

An RDBMS is the basis for SQL and for modern systems like Oracle. The data in an RDBMS is stored in database objects called tables: Persons, Professors, Students, and Modules. These tables are broken up into smaller entities called fields. The fields in the Students table consist of Nep_No, P_ID, and Grade. Creating a basic table involves naming the table and designing its columns and each column's data type. The SQL CREATE TABLE statement is used to create a new table. For example, this syntax was used in this project:

    CREATE TABLE Students
    (
        Nep_No int NOT NULL,

        Grade varchar2(10) NOT NULL,
        P_ID int,
        PRIMARY KEY (Nep_No),
        FOREIGN KEY (P_ID) REFERENCES Persons(P_ID)
    );

To extract the database metadata, the SQL statements listed below were used:

For class descriptions:
    SELECT TABLE_NAME FROM USER_TABLES;

For data properties:
    SELECT * FROM USER_TAB_COLUMNS;
    Name   = COLUMN_NAME
    Domain = TABLE_NAME
    Range  = DATA_TYPE

For object properties:
    SELECT * FROM USER_CONSTRAINTS WHERE CONSTRAINT_TYPE = 'R';
    Domain = TABLE_NAME
    C1     = CONSTRAINT_NAME
    C2     = R_CONSTRAINT_NAME
    SELECT * FROM USER_CONS_COLUMNS WHERE CONSTRAINT_NAME = C1;
    Name   = COLUMN_NAME
    SELECT * FROM USER_CONS_COLUMNS WHERE CONSTRAINT_NAME = C2;
    Range  = TABLE_NAME

These queries extract the required data and save them for conversion with the aid of the Java program.

Transformed Ontology version:

    <Declaration>
        <DataProperty IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#M_CODE"/>
    </Declaration>
    <Declaration>
        <DataProperty IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#M_TITLE"/>
    </Declaration>
    <DataPropertyDomain>
        <DataProperty IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#AGE"/>
        <Class IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#..."/>
    </DataPropertyDomain>
    <DataPropertyDomain>
        <DataProperty IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#..."/>
        <Class IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#PROFESSORS"/>
    </DataPropertyDomain>
    <DataPropertyRange>
        <DataProperty IRI="file:/C:/JDeveloper/mywork/Test1/Client/Out1.xml#..."/>
        <Datatype abbreviatedIRI="xsd:NUMBER"/>
    </DataPropertyRange>

Output 2. Excerpt of the transformed OWL description
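The two-step object-property lookup above (find an 'R' constraint, then resolve C1 to a column name and C2 to the referenced table) can be shown as a small database-free sketch. The two maps stand in for rows of Oracle's USER_CONS_COLUMNS view, and the constraint names FK_STUDENTS_PERSONS and PK_PERSONS are hypothetical sample values, not names from the paper.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of the object-property extraction steps above. The two
// maps simulate USER_CONS_COLUMNS lookups: one maps a constraint name (C1)
// to its column, the other maps the referenced constraint (C2) to the
// referenced table. Constraint names here are hypothetical sample values.
public class FkToObjectPropertySketch {

    // Returns "name|domain|range" for one 'R' (foreign key) constraint row.
    static String resolve(String domainTable, String c1, String c2,
                          Map<String, String> constraintToColumn,
                          Map<String, String> constraintToTable) {
        String name = constraintToColumn.get(c1);   // COLUMN_NAME looked up via C1
        String range = constraintToTable.get(c2);   // TABLE_NAME looked up via C2
        return name + "|" + domainTable + "|" + range;
    }

    public static void main(String[] args) {
        Map<String, String> cols = new HashMap<>();
        cols.put("FK_STUDENTS_PERSONS", "P_ID");    // hypothetical FK constraint
        Map<String, String> tabs = new HashMap<>();
        tabs.put("PK_PERSONS", "PERSONS");          // hypothetical PK constraint

        System.out.println(resolve("STUDENTS", "FK_STUDENTS_PERSONS",
                                   "PK_PERSONS", cols, tabs));
        // -> P_ID|STUDENTS|PERSONS
    }
}
```

In the real program these lookups are the two USER_CONS_COLUMNS queries executed over JDBC; the triple (name, domain, range) is what becomes an ObjectProperty with its domain and range in the OWL output.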


Fig. 7. Screenshot of OWL/XML

Fig. 8. Protg representation of the class properties

Fig. 9. Resulting Ontology OntoGraf


IV. CONCLUSION

Ontology is a great tool to manage the fundamental concepts in a domain, the relations between them, and the operations that can be performed on them. It gives systems the ability to recognize the context they are operating in and to reason about those contexts. Ontology plays an important role in the future of software development and knowledge modelling.

The objective of this work was to develop a conversion program from a relational database system to a standard Ontology description. I have developed a conversion framework in Java using the JDBC and XML/DOM interfaces. This program can be used for rapid Ontology development, and the resulting Ontology OWL description file can be manipulated in standard Ontology editors like Protégé.

REFERENCES

[1] Fensel D., "Ontology: A Silver Bullet for Knowledge Management and Electronic Commerce", Springer, 2001.
[2] Frye M., "Applications of Ontology in Heterogeneous Multi-tier Networks for Network Management", Theses and Dissertations, Paper 1118, 2012.
[3] Gruber T. R., "A Translation Approach to Portable Ontology Specifications", Knowledge Acquisition, 1993, 5:199-220.
[4] Irina A., "Storing OWL Ontology in SQL Relational Databases", 2007.
[5] Maria A. M. N., "An Overview of Ontology", Mexico: Technical Report, 2003.
[6] "OWL Web Ontology Language Overview", http://www.w3.org/TR/owl-features/, accessed March 6, 2017.
[7] Russell S., Norvig P., "Artificial Intelligence: A Modern Approach", Englewood Cliffs, NJ: Prentice Hall, 1995.
[8] Raji G., Nadine C., "Database-to-Ontology Mapping Generation for Semantic Interoperability", France: University of Burgundy, Laboratory LE2I, n.d.
[9] Sweta S., "Database Management System", India: Journal of Management Research and Analysis, 2015, vol. 2(1), pp. 72-80.
[10] TechTarget, "Data Modeling", 2010, accessed March 5, 2017.
[11] Tutorialspoint, http://www.tutorialspoint.com/jdbc/pdf/jdbc-introduction.pdf, 2004, accessed March 5, 2017.
[12] Tutorialspoint.com, "Relational Data Model", 2014, accessed March 5, 2017.
[13] "Web Ontology Language OWL", ...Ontology-language-owl.html, accessed March 6, 2017.
[14] Wikipedia, "Protégé (Software)", 28 August 2015, accessed March 6, 2017.
[15] "XML RDF", http://www.w3schools.com/xml/xml_rdf.asp, accessed March 5, 2017.

Combined Neural Network Model for Real Estate Market Range Value Estimation

V. Yakubovskyi
Institute of International Relations, Taras Shevchenko National University of Kyiv
Ukraine
vyakubovsky@veritex.com.ua

Georgi Petrov Dimitrov
University of Library Studies and Information Technologies
Sofia, Bulgaria
geo.p.dimitrov@gmail.com

Oleksiy Bychkov
Taras Shevchenko National University of Kyiv
Kiev, Ukraine
bos.knu@gmail.com

Galina Panayotova
University of Library Studies and Information Technologies
Sofia, Bulgaria
panayotovag@gmail.com

ABSTRACT

This article describes a neural network model for solving the problem of real estate market valuation. The results of this approach were compared to those of existing algorithms for this problem. The influence of the data parameters and of the structure of the learning dataset on the accuracy of algorithms for real estate market valuation is also analyzed in this article.

KEYWORDS

neural network model, structured approach, valuation, comparative approach, real estate, multilayer perceptron, Kohonen network, clustering, k-means, multiple regression analysis

Introduction.

In the process of real estate valuation, and in particular in the case of mass valuation, where statistical analysis methods are applied, new methods of determining real estate value are sought which would allow higher accuracy of results to be achieved. Artificial neural networks (ANN) represent one of the methods which might be an alternative to the commonly applied method of multiple regression. They may be applied in the process of determining real estate values, as well as at the stage of selecting those real estate features which highly influence their prices.

Estimating the value of real property is necessary for a variety of endeavors, including real estate financing, listing real estate for sale, investment analysis, property insurance, and the taxation of real estate. For most people, determining the asking or purchase price of a property is the most useful application of real estate valuation. This article will provide an introduction to the basic concepts and methods of real estate valuation, particularly as it pertains to real estate sales.

An accurate appraisal depends on the methodical collection of data. Specific data, covering details regarding the particular property, and general data, pertaining to the nation, region, city, and neighborhood wherein the property is located, are collected and analyzed to arrive at a value. Three basic approaches are used during this process to determine a property's value.

Problem statement.

Despite recent advances in machine learning and data mining, real estate market assessment remains a domain which almost always uses manual estimation by professional experts. We can identify the following problems which cause this situation:

1. Pricing characteristics for real estate market valuation vary a lot depending on the country of assessment.


2. Pricing characteristic values have a different influence in different areas (e.g. in (Zurada, Levitan, & Guan, 2011) for the USA market the real estate area was used as an ordinal parameter with 4 possible values, while in Ukraine it is used as a numeric parameter).

3. Some methods perform better with a specific data format or with different types of models. That is why their results differ a lot between articles, which complicates the creation of a unified algorithm suitable for most cases.

4. Most datasets are not public, so it is difficult to compare different algorithms on the same dataset.

Article purpose.

Justification of the expedience of an approach based on neural networks, and development of a general structured neural network model. Comparison of this approach with other existing algorithms for this problem and justification of its usage for providing range price assessment.

Current researches and publications analysis.

The first thing that should be mentioned about current research is the absence of universal datasets. Almost all authors use their own datasets. For example, in (Selim, 2011) a model is built for Turkish house prices; in (Helbrich, Brunauer, & Nijkamp, 2013) the authors analyze a dataset of the Austrian real estate market. The difference in these articles' results shows that the main real estate price characteristics are very dependent on object location. [3, 8, 10]

Because of this ambiguity of data characteristics, algorithms demonstrate different performance on different datasets. That is why researchers develop many different algorithms for solving this problem.

The most traditional approach to real estate market estimation is multiple regression analysis (MRA) (Mark & Goldberg, 1988). [6]

Neural networks (NN) are quite a common approach for solving various classification and estimation tasks. That is why they are also widely used for economic asset valuation. But their performance compared to other algorithms is still an open question. Various articles reach diametrically opposed results. In (Peterson & Flanagan, 2009) and (Limsombunchai, Gan, & Lee, 2004) NN outperform other algorithms including MRA, while in (Zurada, Levitan, & Guan, 2011) the authors did not find any important difference in results between NN and other approaches. Several authors (e.g. (Lenk, Wozrala, & Silva, 1997)) raise concerns about the "black box" nature of NN. [4, 5, 7, 9]

Another approach that is used for real estate valuation is fuzzy logic. Although linguistic variables should precisely describe many important price characteristics, it is still difficult to determine the fuzzy rules. And even if we specify these rules, they will be very dependent on our dataset. In (González & Formoso, 2006) a fuzzy rule-based system showed results slightly better than MRA. [2]

Main research results.

One of the most basic algorithms for real estate valuation is k-means. This approach uses only the object's coordinates to determine its price zone. Despite using only a small part of the available characteristics, k-means shows acceptable results (Bourassa, Cantoni, & Hoesli, 2010) and its performance is often used as a baseline for algorithm comparison. [1]

A one-layer perceptron that uses all characteristics from the dataset has shown almost the same results as k-means on the dataset collected for the Ukrainian real estate market. Both algorithms had one similar problem: each has a numeric characteristic that influences result improvement (i.e. the greater this characteristic is, the better the result is). For k-means it is the number of clusters, and for the perceptron it is the number of neurons. In both cases the algorithm stops improving after some value of this characteristic.

But k-means also has one advantage which is rarely mentioned in the literature: it can be used for price range assessment. Range selection is done as follows:

1) We select some accuracy value.
2) For this accuracy, we calculate the corresponding percentiles (e.g. for 50% we get the 25th and 75th percentiles,


for 60% we get the 20th and 80th percentiles).
3) These values are the resulting price range.
It is obvious that this algorithm will always return correct results for the learning dataset, meaning that for a selected accuracy (e.g. 60%) the ratio of objects with a correct price range will be the same (the same 60%). For the testing dataset that was used by the authors, the result is a bit worse, but still close to the selected accuracy.
Unlike the k-means algorithm, the multilayer perceptron can use all characteristics present in the learning dataset, but this did not affect its results. For a dataset that contains prices of the Ukrainian real estate market, k-means and the multilayer perceptron showed similar accuracy.
Based on the results of the two described algorithms, the authors decided to combine these two approaches into one two-level neural network model.
For the learning of this model we use the following algorithm:
1) On the first level, data are clustered with a Kohonen network using only the objects' coordinates.
2) The average price is calculated for each cluster.
3) A separate perceptron is created for each cluster. It uses all characteristics except price as inputs and the difference between the real price and the average price in the cluster as an output.
4) On the second level, data are processed by the perceptron of the cluster detected on the first level.
5) Each object in a cluster is passed to its cluster's perceptron and the calculated value is subtracted from the object's price.
6) All these results are saved in the cluster.
7) All percentiles are calculated over these values.
Price range assessment is done as follows:
1) The object is clustered by the first-level network.
2) The second-level network calculates its value according to the object's characteristics. This value is saved.
3) We select the price range accuracy.
4) The corresponding percentiles are selected.
5) The percentile values are added to the value from step 2).
6) The result of step 5) is the object's price range.
Usual price assessment is done as follows:
1) The object is clustered by the first-level network.
2) The second-level network calculates its value according to the object's characteristics.
3) The value from step 2) is added to the mean value for the cluster selected in step 1).
4) The result is the object's price.

Testing results.

Testing was divided into two parts.
The first part is usual price assessment (when the algorithm returns only a single value). For this task, three algorithms were compared: k-means, a classical multilayer perceptron with one hidden layer, and the two-layer neural network model. The details of the algorithms are as follows:
1) k-means algorithm: uses only the object's two coordinates, k = 400. This algorithm is mentioned below as K400.
2) multilayer perceptron: has one hidden layer with 100 neurons, uses all of the object's characteristics as input (one neuron in the input layer per characteristic), and has one neuron in the output layer that returns the price per square meter. This algorithm is mentioned below as N100.
3) two-layer neural network model: has 40 clusters on the first level and a multilayer perceptron with 10 neurons in the hidden layer for each cluster. This algorithm is mentioned below as G40/10.
The following metrics are used to compare the results of usual price assessment [11]:
1) Average error: E = (1/n) Σ_{i=1..n} |p_i − r_i|, where p_i is the predicted and r_i the real price of the i-th object.

2) Average percent error: E% = (100/n) Σ_{i=1..n} |p_i − r_i| / r_i.
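The two-level assessment and the percentile-based range selection described in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the nearest-centroid lookup stands in for the first-level Kohonen layer, `net_value` stands for the second-level perceptron output, and all names and sample data are ours.

```python
def percentile(values, p):
    """Percentile of `values` (0 <= p <= 100) with linear interpolation."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def nearest_cluster(coords, centroids):
    """Stand-in for the first-level (Kohonen) layer: index of the nearest
    centroid by squared distance over the object's two coordinates."""
    d2 = [sum((a - b) ** 2 for a, b in zip(coords, c)) for c in centroids]
    return d2.index(min(d2))

def price_range(coords, net_value, cluster_residuals, centroids, accuracy):
    """Range assessment steps 1)-6): cluster the object, take the
    second-level output `net_value`, and widen it by the stored residual
    percentiles of the object's cluster (e.g. 25th/75th for 50%)."""
    c = nearest_cluster(coords, centroids)
    tail = (1.0 - accuracy) / 2.0 * 100.0   # 50% accuracy -> 25th/75th
    low = percentile(cluster_residuals[c], tail)
    high = percentile(cluster_residuals[c], 100.0 - tail)
    return net_value + low, net_value + high

# Toy usage: two clusters, residuals collected during learning (step 7).
centroids = [(0.0, 0.0), (10.0, 10.0)]
residuals = [[-20.0, -10.0, 0.0, 10.0, 20.0], [-5.0, 0.0, 5.0]]
print(price_range((1.0, 1.0), 500.0, residuals, centroids, 0.5))  # (490.0, 510.0)
```

The per-cluster residual lists play the role of the values saved in step 6) of the learning algorithm; widening the point estimate by their percentiles is what guarantees the selected coverage on the learning dataset.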
The second part is price range assessment (when the algorithm returns a price range with a specified accuracy). A classical multilayer perceptron cannot be used for this purpose without extra modifications, so only two algorithms were compared: k-means and the two-layer neural network model. In this case k-means has k = 40 (so both algorithms have the same number of clusters).
Testing was done on two datasets that were aggregated from ads and represent the Ukrainian real estate market. The first one contains approximately 35000 elements and is used for both learning and testing. The second one contains approximately 2500 elements and is used only for testing. All objects with extremely high prices (>2000) and extremely low prices (<80) were removed from the datasets. The average price per square meter of the first dataset was 631.3745; that of the second dataset was 653.4842.
The results of testing for the single price case are presented in Table 1.

            Learning dataset       Test dataset
Algorithm   Error      Percent     Error      Percent
                       error                  error
N100        164.3749   30.9198     167.6795   29.9746
K400        152.7744   27.6209     159.8176   27.0887
G40/10      151.1877   27.1812     154.4892   26.6269
Table 1. Price assessment comparison for specified algorithms

As we see in Table 1 and Diagram 1, the performance of the two-layer neural network with 40 clusters and 10 neurons in each network is better than that of the classical neural network with 100 neurons, and similar to that of the k-means algorithm with 400 clusters. So, despite a much smaller number of neurons and clusters, the combined approach shows similar or slightly better performance.

Diagram 1. Price assessment comparison for the specified algorithms (error).

For the price range case, the combined approach is compared with k-means with the same number of clusters (k = 40). The main metric for this case is the average price range for a specified accuracy. Testing is done for each accuracy from 10% to 90% with a 10% step. The results of this testing are presented in Tables 2-3. The last column of these tables shows by how much, in percent, the average range of the two-layer neural network model is smaller than the average range of the k-means algorithm.

Two-layer NNM           k-means               Results
Accuracy   Avg range    Accuracy   Avg range  difference in percent
0,9        637,761      0,901      835,287    30,972
0,8        446,512      0,801      621,66     39,226
0,7        336,658      0,701      482,94     43,452
0,6        260,666      0,602      383,182    47,001
0,5        200,571      0,503      305,276    52,204
0,4        152,6        0,402      236,59     55,044
0,3        110,569      0,302      172,65     56,147
0,2        70,938       0,204      112,237    58,219
0,1        36,187       0,103      55,101     52,267
Average difference                            48,281
Table 2. Price range assessment comparison for learning dataset
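The last column of Tables 2-3 can be recomputed directly. A one-line sketch (the function name is ours; the inputs are read from the first row of Table 2, whose wrapped trailing digits are an assumption of this reconstruction):

```python
def range_reduction_percent(nnm_range, kmeans_range):
    """By how much, in percent, the k-means average range exceeds the
    two-layer model's average range (the 'difference' column above)."""
    return (kmeans_range - nnm_range) / nnm_range * 100.0

# First row of Table 2 (90% selected accuracy):
print(round(range_reduction_percent(637.761, 835.287), 3))  # 30.972
```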


Diagram 2. Price range assessment comparison for the learning dataset (average range).

Diagram 3. Price range assessment comparison for the test dataset (average range).
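The accuracy columns of Tables 2-3 report the share of objects whose real price actually falls inside the returned range. That check can be sketched as follows (the function and the sample data are illustrative, not taken from the paper):

```python
def empirical_accuracy(true_prices, ranges):
    """Fraction of objects whose real price lies inside its predicted
    (low, high) range - the accuracy actually achieved by the model."""
    hits = sum(1 for price, (low, high) in zip(true_prices, ranges)
               if low <= price <= high)
    return hits / len(true_prices)

# Three objects, one predicted range misses its true price:
print(empirical_accuracy([100, 200, 300], [(90, 110), (150, 190), (250, 350)]))
```

Comparing this empirical value against the selected accuracy is what the "Accuracy" columns of Tables 2-3 do for each tested setting.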
As we can see from Table 2 and Diagram 2, the two-layer neural network model shows much better performance than the k-means algorithm with the same number of clusters. This performance difference is larger for smaller selected accuracy, and on average over all tested accuracy values it is 48.281%. Also, we should mention that the real accuracy is the same as the selected one, so the combined approach showed no accuracy loss for the learning dataset. The small accuracy improvement of the k-means algorithm for the learning dataset can be explained by the fact that some clusters may have fewer than 20 elements, so the percentile values are not accurate enough and they represent slightly incorrect accuracy values (e.g. the 61st percentile instead of the 60th).
See the results in Table 3 and Diagram 3.

Two-layer NNM           k-means               Results
Accuracy   Avg range    Accuracy   Avg range  difference in percent
0,89       646,95       0,902      859,576    32,866
0,783      450,593      0,803      638,929    41,797
0,685      339,624      0,701      495,63     45,935
0,579      262,351      0,607      392,859    49,746
0,478      201,813      0,503      314,739    55,956
0,376      153,383      0,406      243,603    58,819
0,278      111,469      0,302      178,865    60,462
0,171      71,344       0,198      116,831    63,756
0,091      36,203       0,108      57,776     59,59
Average difference                            52,103
Table 3. Price range assessment comparison for test dataset

For the test dataset the performance improvement stays the same. K-means still has slightly better accuracy than the selected values, while the combined approach shows slightly worse accuracy results; in the worst case it is 2.4% lower, for 40% accuracy. This can be explained by neural network performance: the networks do not perform perfectly on testing data, and for some objects this error is larger than the final price range. Despite this, the mean range size is on average 52.103% better. So, we can slightly increase the target accuracy to keep the final one on the same level as the selected one, and the combined approach will still show better performance than the plain k-means algorithm. It also keeps showing a larger performance difference for smaller accuracy. As we see in Tables 2 and 3, the results of the two-layer neural network considerably exceed the corresponding results of the k-means algorithm. On the learning dataset accuracy meets the given values and price ranges are on average 48% smaller. On the test dataset accuracy decreases slightly (by 1-3%), but price ranges keep their size.

Conclusion

Artificial neural networks are an alternative to the multiple regression method used for real estate valuation.
A new two-layer neural network model for real estate valuation is developed in this article. The main idea of this model is to combine clusterization by geographical coordinates with estimation by a multilayer perceptron, to achieve the most effective usage of the existing objects' characteristics for predicting their price.
The test results of the described model show that despite using far fewer clusters than k-means it shows similar results for usual price


valuation, while with the same number of clusters it shows much better results for price range valuation.
The results of price range valuation of the described model suggest prospects for its implementation in real-life systems for real estate market valuation.
The accuracy of determination of real estate values using the ANN method is higher than the accuracy obtained with the multiple regression method for real estate markets with a high number of transactions (above 100 transactions). In such a case utilisation of the ANN is highly recommended. On the other hand, for markets with a medium number of transactions (several dozens of transactions), utilisation of the ANN for real estate valuation results in only slightly higher accuracy of estimation of real estate values; those methods may be interchangeably applied.

Acknowledgment

This work is partly supported by the project SIP-2017-09, /02/03/2017 - "Research and analysis of ecosystem monitoring and management systems from the Internet of Things".

References

[1]. Bourassa, S., Cantoni, E., & Hoesli, M. (2010). Predicting house prices with spatial dependence: A comparison of alternative methods. Journal of Real Estate Research,
[2]. González, M., & Formoso, C. (2006). Mass appraisal with genetic fuzzy rule-based systems. Property Management, 20-30.
[3]. Helbich, M., Brunauer, W., & Nijkamp, P. (2013). Spatial heterogeneity in hedonic house price models: the case of Austria. Urban Studies, 390-411.
[4]. Lenk, M. M., Worzala, E. M., & Silva, A. (1997). High-tech valuation: should artificial neural networks bypass the human valuer? Journal of Property Valuation and Investment, 8-26.
[5]. Limsombunchai, V., Gan, C., & Lee, M. (2004). House price prediction: hedonic price model vs. artificial neural network. American Journal of Applied Sciences, 193-201.
[6]. Mark, J., & Goldberg, M. (1988). Multiple regression analysis and mass assessment: A review of the issues. Appraisal Journal, 56(1), 89-109.
[7]. Peterson, S., & Flanagan, A. (2009). Neural network hedonic pricing models in mass real estate appraisal. Journal of Real Estate Research, 147-164.
[8]. Selim, S. (2011). Determinants of house prices in Turkey: a hedonic regression model. Doğuş Üniversitesi Dergisi, 65-86.
[9]. Zurada, J., Levitan, A., & Guan, J. (2011). A comparison of regression and artificial intelligence methods in a mass appraisal context. Journal of Real Estate Research, 349-387.
[10]. Panayotova, G., & Dimitrov, G. P. (2015). Design of Web-Based Information System for Optimization of Portfolio. The 13th International Symposium on Operations Research in Slovenia, 23-25 September 2015, Bled, Slovenia, pp. 193-198, ISBN 978-961-6165-45-7.
[11]. Panayotova, G. (2014). Mathematical modelling. Sofia, Bulgaria, ISBN 976-619-185-037-2.

Multi-Sensor Fusion Method For Mobile System

Wassila Meddeber, Youcef Touati and Arab Ali-Cherif
Computer Science and Artificial Intelligence Lab. LIASD
University of Paris 8
Saint-Denis, France
meddeber@ai.univ-paris8.fr, touati@ai.univ-paris8.fr, aa@ai.univ-paris8.fr

Abstract—This paper deals with the multi-sensor data fusion problem for mobile robot localization. In this context, we have proposed a Kalman Particle Kernel Filter (KPKF), which is based on a hybrid Bayesian filter combining both extended Kalman and particle filters. The KPKF models a conditional density using a Gaussian mixture in which each component has a small covariance matrix. The Kalman correction updates the weights in order to bring particles back into the most probable area of the state space. This method can be applied in non-linear and multimodal environments and can improve localization performance. The proposed approach is implemented on a LIASD-Wheelchair experimental platform.

Keywords—Localization; data fusion; Bayesian filter; Kalman filter; particle filter; smart wheelchair.

I. INTRODUCTION

Several works have been undertaken to assist and help handicapped and elderly people to gain mobility and lead an independent life, particularly those concerning the development of services related to automated wheelchairs. Making a wheelchair intelligent and autonomous allows us to develop new methodologies taking into account the type of handicap, environment dynamics, and new communication technologies such as sensor networks, wireless mesh networks and so on. In this direction, the localization process is one of the main services that have been prospected in order to ensure assisted people better mobility and assistance in their life [1-2]. It constitutes a key problem in mobile robotics [3] and consists of estimating the robot's pose (position, orientation) with respect to its environment from sensor data. The simplest way is the integration of odometric data, which, however, is associated with unbounded errors resulting from uneven floors, wheel slippage, limited resolution of encoders, etc. Such a technique is therefore not reliable, due to cumulative errors occurring over longer runs. A mobile robot must thus also be able to localize or estimate its parameters with respect to an internal world model, by using the information obtained with its exteroceptive sensing system.

The purpose of using sensory data from a range of disparate multiple sensors is to automatically extract the maximum amount of information possible about the sensed environment under all operating conditions. The main idea of data fusion methods is to provide a reliable estimation of the robot's pose, taking into account the advantages of the different sensors [4].

This paper focuses on robust pose estimation for mobile robot localization. A new hybrid particle filter method called the Kalman-Particle Kernel Filter (KPKF) is proposed to minimize the system estimation error and increase the localization robustness. The paper is organized as follows: in Section II we present and discuss some multi-sensor data fusion methods. In Section III, the proposed KPKF is presented. In Section IV, an example of a localization process applied on the LIASD-Wheelchair is illustrated. Finally, in Section V, conclusions and some perspectives are addressed.

II. RELATED WORKS

The Kalman Filter (KF) is the best known and most widely applied parameter and state estimation algorithm in data fusion methods [5-15]. It can be considered as a prediction-update formulation: the algorithm uses a predefined linear model of the system to predict the state at the next time step, and the prediction and update are combined using the Kalman gain, which is computed to minimize the mean square error of the state estimate. The KF diagram is illustrated in Fig. 1.

Fig. 1. Kalman filter diagram (KF).

The Extended Kalman Filter (EKF) is a version of the KF that can handle non-linear measurement equations. Various EKF-based approaches have been developed. These approaches work well as long as the used information can be described well enough by simple statistics. The lack of relevant information is compensated by the use of various process models. However, they require assumptions about parameters which might be very difficult to determine. Assumptions that guarantee optimum convergence are often violated and, therefore, the process is not optimal or it can even diverge. The stochastic Kalman filtering techniques [5-14] rely on approximated filtering, which requires ad hoc tuning of


stochastic modelling parameters, such as covariance matrices, in order to deal with model approximations and bias on the predicted pose. In order to compensate for such error sources, local iterations, adaptive models and covariance intersection filtering have been proposed [16-20]. An interesting approach was proposed in [17], where observation of the pose corrections is used for updating the covariance matrices. However, this approach seems to be vulnerable to significant geometric inconsistencies of the world models, since inconsistent information can influence the estimated covariance matrices.

The localization problem is often formulated by using a unique model, from both the state and observation processes' point of view. Such an approach inevitably introduces modelling errors, which degrade filtering performance, particularly when the signal-to-noise ratio is low and the noise variances have been poorly estimated. Moreover, to optimize the observation process, it is important to characterize each external sensor not only from the statistic parameter estimation point of view but also from the robustness of the observation process point of view. It is then interesting to introduce an adequate model for each observation area in order to reject unreliable readings. In the same manner, a wrong observation leads to a wrong estimation of the state vector and consequently degrades the localization algorithm.

Particle Filter (PF) based methods are considered as a sequential version of the Monte Carlo methods [21-23]. They represent the most effective methods for nonlinear localization of mobile systems. These methods have the ability to manage a set of particles in order to determine positions and orientations. The principle of the PF is to make the particles evolve in the same way as the robot to determine new positions, and then to compare its perceptions to those of the particles. We retrieve the odometry model values (prediction) between two successive moments, which are then transmitted to the filter function for correction by the observation model. After a small number of iterations, this process converges to a position where the population of particles is very dense. The PF method is illustrated in Fig. 2.

Fig. 2. Particle filter diagram (PF).

However, this filter can be very costly to implement, as a very large number of particles is usually needed, especially in high-dimensional systems. In case of low dynamical noise, we observe that by multiplying the highly weighted particles, the prediction step will explore the state space poorly. The particle clouds will concentrate on a few points of the state space. This phenomenon is called particle degeneracy, and it causes the divergence of the filter.

Despite the research efforts to improve filter performance for data fusion, filter behavior remains unstable for some applications such as navigation and localization.

III. PROPOSED KPKF FILTER APPROACH

The Kalman-Particle Kernel Filter (KPKF) combines both an EKF and a PF for a robust localization system, by adjusting the state of the mobile system and reducing the estimation error. This new filter is a kind of hybrid particle filter. It is based on the kernel representation of the conditional density and on a local linearization as a Gaussian mixture [24]. The KPKF filter method can be implemented according to three steps, as shown in Fig. 3:

Fig. 3. Kalman-Particle Kernel Filter diagram (KPKF).

- Correction step: divided into two parts, a Kalman correction and a particle correction. The correction step ensures a Gaussian mixture distribution of the filter density, in order to increase the probability of the presence of the particles in the state space.
- Prediction step: this step also yields a mixture of Gaussian distributions; the predicted density is modeled in the same way as the corrected density.
- Resampling step: introduced to further reduce the divergence of the particle (Monte Carlo) filter.

The KPKF thus combines the advantages of the EKF and the PF.

We present an application of the Kalman-Particle Kernel Filter (KPKF) approach for robust localization adapted to disabled and elderly people. Our approach is implemented and


tested on a prototype called the LIASD-Smart Wheelchair, developed in our laboratory (Informatics and Artificial Intelligence) [25]. The LIASD-Smart Wheelchair is equipped with data fusion sensors: a laser telemeter and encoders. The LIASD-Wheelchair is an adjustable adult powered wheelchair (Fig. 4). It is suitable for indoor or outdoor use and implements wired and wireless networks for communication. The wireless communication is based on two standards: IEEE 802.11 and IEEE 802.15.4. A wireless router is integrated to ensure communication between the remote computer (server) and the wheelchair devices (camera, embedded computer, etc.).

Fig. 4. Global structure of LIASD-WheelChair.

A. Hardware Architecture

The hardware architecture of the LIASD-Wheelchair consists of a sensory block, control architecture, and communication networks. The presented system includes two optical incremental encoders mounted on the motors, with a resolution of 500 counts per revolution. Four ultrasonic rangefinders (US SFR08) are used to localize the wheelchair in the environment. They have a resolution of 3 cm and can identify obstacles between 3 cm and 6 m. The US sensors interact with the computer via a TCP/IP server board (FMod-TCP DB) using an I2C interface. In order to ensure navigation and anti-collision objectives, a wireless Internet camera server is mounted on the wheelchair headrest.

B. Control Architecture

The LIASD-WheelChair control architecture is divided into three levels: a basic control level, a tactical level and a strategy level, as shown in Fig. 5. The strategy level concerns the way the wheelchair system can achieve the main goal. Algorithms such as trajectory planning, localization, etc. are implemented to fulfill the desired task. Elementary actions are then generated at the tactical level, aiming to satisfy the goals specified previously. The basic control level implements PID controllers in the PWM/encoder boards, with specific parameters for position and speed control.

Fig. 5. LIASD-WheelChair control architecture.

Using its measurements and the characteristics of the geometrical model of the dynamic system, we determine at each moment the position and the orientation of the smart wheelchair. The odometry model introduces a major fault, the slipping of the wheels (measurement noise), leading to an unreliable location. To correct the odometry error, we use the theoretical data fusion method by adding the measurement of the laser sensor (observation model). We use a laser with relatively good accuracy compared to other distance sensors. The measurement of the laser telemeter corresponds to the distance traveled between the localization of the robot and the

V. CONCLUSION

The localization system is a complex multi-sensor process. To solve the problem of multimodality and non-linearity, we have proposed a new adaptive filter for data fusion, called the Kalman-Particle Kernel Filter. The KPKF is a mixture of the extended Kalman filter and the particle filter, combining the advantages of both filters. Our approach is implemented on a mobile platform developed in our laboratory called the LIASD Smart Wheelchair. The aim is to improve the quality of service in terms of mobility and assistance to displacement of persons with disabilities. A full test of our system is still in progress to demonstrate that our filtering approach is very effective for the robustness of system localization. This method could therefore support research work that deals with the issue of the localization and navigation of stand-alone vehicles.

ACKNOWLEDGMENT

This article is part of the research development and innovation project (LIASD-Wheelchair). It deals with the problem of mobile system indoor localization for navigation. Special thanks to professor ARAB ALI CHERIF, Director of the LIASD laboratory at University Saint Denis Paris 8, and all the colleagues involved in this project.

REFERENCES
[1] A. Lankenau, T. Rofer. A Versatile and Safe Mobility Assistant[J]. IEEE Robotics & Automation Magazine, 2001, Vol.8(1), pp.29-37.
[2] E. Prassler, J. Scholz and P. Fiorini. A robotics wheelchair for crowded public environments[J]. IEEE Robotics & Automation Magazine, 2001, Vol.8(1), pp.38-45.
[3] J. Borenstein, B. Everett, L. Feng, Navigating Mobile Robots: Systems and Techniques, A.K. Peters, Ltd., Wellesley, MA, 1996.


[4] C. Harris, A. Bailley, T. Dodd, Multi-sensor data fusion in defense and aerospace, Journal of Royal Aerospace Society 162 (1015) (1998) 229-244.
[5] J.B. Gao, C.J. Harris, Some remarks on Kalman filters for the multi-sensor fusion, Journal of Information Fusion 3 (2002) 191-201.
[6] C. Chui, G. Chen, Kalman filtering with real time applications, Springer Series in Information Sciences, Springer-Verlag, New York 17 (1987) 23-24.
[7] K.O. Arras, N. Tomatis, B.T. Jensen, R. Siegwart, Multisensor on-the-fly localization: precision and reliability for applications, Robotics and Autonomous Systems 34 (2001) 131-143.
[8] H. Wang, M. Kung, T. Lin, Multi-model adaptive Kalman filters design for maneuvering target tracking, International Journal of Systems Sciences 25 (11) (1994) 2039-2046.
[9] S. Borthwick, M. Stevens, H. Durrant-Whyte, Position estimation and tracking using optical range data, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1993, pp. 2172-2177.
[10] J.A. Castellanos, J.D. Tardós, Laser-based segmentation and localization for a mobile robot, in: F.P.M. Jamshidi, P. Dauchez (Eds.), Robotics and Manufacturing: Recent Trends in Research and Applications, vol. 6, ASME Press, New York, 1996, pp. 101-109.
[11] M. Jenkin, E. Milios, P. Jasiobedzki, Global navigation for ARK, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1993, pp. 2165-2171.
[12] P. Jensfelt, H.I. Christensen, Pose tracking using laser scanning and minimalistic environment models, IEEE Transactions on Robotics and Automation 17 (2) (2001) 138-147.
[13] J.J. Leonard, H.F. Durrant-Whyte, Mobile robot localization by tracking geometric beacons, IEEE Transactions on Robotics and Automation 7 (3) (1991) 376-382.
[14] J. Neira, J.D. Tardós, J. Horn, G. Schmidt, Fusing range and intensity images for mobile robot localization, IEEE Transactions on Robotics and Automation 15 (1) (1999) 76-84.
[15] J.A. Pérez, J.A. Castellanos, J.M.M. Montiel, J. Neira, J.D. Tardós, Continuous mobile robot localization: vision vs. laser, in: Proceedings of the IEEE International Conference on Robotics and Automation, 1999, pp. 2917-2923.
[16] G.A. Borges, M.J. Aldon, Robustified estimation algorithms for mobile robot localization based on geometrical environment maps, Robotics and Autonomous Systems 45 (2003) 131-159.
[17] L. Kleeman, Optimal estimation of position and heading for mobile robots using ultrasonic beacons and dead-reckoning, in: Proceedings of the IEEE International Conference on Robotics and Automation, 1992, pp. 2582-2587.
[18] L. Jetto, S. Longhi, G. Venturini, Development and experimental validation of an adaptive Kalman filter for the localization of mobile robots, IEEE Transactions on Robotics and Automation 15 (2) (1999) 219-229.
[19] S.J. Julier, J.K. Uhlmann, A non-divergent estimation algorithm in the presence of unknown correlations, in: Proceedings of the American Control Conference, 1997.
[20] X. Xu, S. Negahdaripour, Application of extended covariance intersection principle for mosaic-based optical positioning and navigation of underwater vehicles, in: Proceedings of the IEEE International Conference on Robotics and Automation, 2001, pp. 2759-
[21] H. A. P. Blom and Y. Bar-Shalom, The interacting multiple model algorithm for systems with Markovian switching coefficients, IEEE Trans. Automat. Contr., vol. 33, pp. 780-783, Aug. 1988.
[22] X. R. Li, Engineer's guide to variable-structure multiple-model estimation for tracking, in Multitarget-Multisensor Tracking: Applications and Advances, Y. Bar-Shalom and D.W. Blair, Eds. Boston, MA: Artech House, 2000, vol. III, ch. 10, pp. 499-567.
[23] X. R. Li, Hybrid estimation techniques, in Control and Dynamic Systems: Advances in Theory and Applications, C. T. Leondes, Ed. New York: Academic, 1996, vol. 76, pp. 213-287.
[24] X. R. Li and Y. Bar-Shalom, Design of an interacting multiple model algorithm for air traffic control tracking, IEEE Trans. Contr. Syst. Technol., vol. 1, pp. 186-194, Sept. 1993. (Special issue on Air Traffic Control).
[25] Y. Touati, H. Aoudia, and A. Ali-Chérif, Intelligent Wheelchair localization in wireless sensor network environment: A fuzzy logic approach, 5th IEEE International Conference on Intelligent Systems, 2010, London, UK, pp. 408-413.


A Neural Network Approach For Attribute Significance Estimation
Peter Italo De Battista, Robert Bosch GmbH, Reutlingen, Germany, peteritalo.debattista@de.bosch.com
Alexander Buhmann, Robert Bosch GmbH, Reutlingen, Germany, alexander.buhmann@de.bosch.com
Yiannos Manoli, University of Freiburg, IMTEK, Freiburg, Germany, ymanoli@imtek.uni-freiburg.de

Abstract—Attribute selection methods explore the interrelationship between the data to avoid less relevant attributes. Some selector methods are also able to estimate the significance rate of input attributes. Removing unnecessary input attributes has several advantages, like a lower variance and complexity of the machine learning model. In this paper, we propose a four-layer feedforward neural network, which estimates the input attribute relevance rate depending on the desired output. The neural network contains a pre-input layer, where every input attribute is connected by a salient weight to the next layer. Therefore, every attribute primarily depends on its salient weight. Two penalty terms related to the salient weights are added. Thus, the relevant and irrelevant attributes can be distinguished. The attribute significance estimation capability of the proposed neural network was evaluated for three artificial and one real regression problem, in addition to a real classification problem.

Index Terms—Feature selection, attribute selection, dimensionality reduction, multilayer perceptron, artificial neural network, feature significance, attribute saliency, feature ranking.

I. INTRODUCTION

Attribute selection methods distinguish relevant data attributes from irrelevant ones. Thus, the variance and complexity of machine learning models can be decreased by using only the relevant input attributes [1]. One subset of attribute selectors are the filter methods, which rank the input attributes before the learning phase and remove the irrelevant attributes below a defined threshold. The Pearson Correlation Coefficient [2] and Mutual Information [3] belong to this subset [4].
Wrapper methods are other attribute selectors and they can be divided into two subsets. One group is the sequential selection algorithms like the Genetic Algorithm [5] and Swarm Optimization [6]; the other is the heuristic search methods like Branch and Bound [7], which become computationally complex with a growing number of attributes.
Unsupervised methods explore the relationships between the attributes without any predefined classes. Such techniques

to the smallest accuracy decrease of the model [11]. Another Multilayer Perceptron solution is based on the signal-to-noise ratio measurement between the first-layer weights and the noise-injected input weights relating to the input attributes. The signal-to-noise ratio fluctuates around zero if the attribute is irrelevant [12]. Another neural network based approach measures the model sensitivity by removing one attribute at a time. The summed errors are calculated between the reduced models and the model with all attributes, where relevant attributes cause a higher error [13].
Specifically for classification tasks, an attribute selector Multilayer Perceptron is introduced in [14]: output gradient based constraining terms are added to the cross-entropy error cost function and the less salient attributes are eliminated automatically by setting them to zero one by one. The smallest-drop attribute is eliminated at every training period. After the elimination, the neural network is retrained and this process is repeated until only one input feature remains.
The Constructive Approach for Feature Selection algorithm is a wrapper based neural network with a self-growing structure [15]: before the training, two groups of the features are generated based on the correlation between them, and the weights of the neural network are initialized randomly. The first build-up of the neural network contains one or two neurons in the input and hidden layer. During the training, the neural network adds features and hidden units by predefined conditions, until the neural network no longer fulfils these conditions.
The above detailed solutions focus mainly on classification problems. This paper presents a four-layer feedforward neural network, which is tested also for regression tasks. Every input attribute is connected to the input layer through a salient weight. Thus, the relevance of the attributes depends primarily
on their salient weight, which are regularized during the
are the clustering algorithms, like the k-means, where the
clustering is based on the distance measurement [8] [9].
Embedded methods classify different subsets by applying Section II describes the build-up of the Attribute Signif-
greedy search algorithms to find the appropriate subset. Sup- icance Estimator Multilayer Perceptron, how it is trained
port Vector Machines [10] and artificial neural networks are and the attribute significance estimation process. Section III
part of this approach. The Neural-network Feature Selector ap- presents the result of three artificial regression tasks, one real
plies two penalty terms to eliminate the unnecessary weights. classification and one real regression task. The paper finishes
The irrelevant attributes are removed one by one according with a summary and conclusion.
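As a small illustration of the filter approach mentioned above, attributes can be ranked by their absolute Pearson correlation with the target. The following is a hypothetical NumPy sketch with synthetic data; `pearson_rank` is our name for illustration, not a function from the cited literature:

```python
import numpy as np

def pearson_rank(X, y):
    # Filter method: score each attribute by |Pearson correlation| with the target.
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]  # attribute indices, most relevant first
    return order, scores

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)                  # relevant attribute
x2 = rng.normal(size=200)                  # irrelevant attribute
y = 2.0 * x1 + 0.1 * rng.normal(size=200)  # target depends (almost) only on x1
X = np.column_stack([x1, x2])
order, scores = pearson_rank(X, y)         # x1 is ranked first
```

Attributes whose score falls below a chosen threshold would then be removed before the learning phase.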

ISBN: 978-1-941968-43-7 ©2017 SDIWC

Proceedings of The Fourth International Conference on Artificial Intelligence and Pattern Recognition (AIPR2017), Lodz, Poland, 2017
II. THE ATTRIBUTE SIGNIFICANCE ESTIMATOR MULTILAYER PERCEPTRON

A. The neural network build-up

The Multilayer Perceptron contains an extra layer between the input attributes and the input layer, where every attribute is connected by a salient weight to the next layer. Every weight in the extra layer is initialized equally to 0.2. Furthermore, the neural network consists of an input, hidden and output layer. Fig. 1 shows the build-up of the Attribute Significance Estimator Multilayer Perceptron.

Fig. 1. The Attribute Significance Estimator Multilayer Perceptron.

The evaluation steps of the neural network output are as follows: an element-wise vector multiplication of the attributes and the salient weights delivers the output of the extra layer, which is fed into the input layer. Then the general forward pass calculation evaluates the neural network output. In vector form, the whole forward pass calculation is:

\tilde{x} = x \circ w_a,  (1)

h = f_a(\tilde{x}^T W_{ih} + b_h),  (2)

y = f_a(h W_{ho} + b_o),  (3)

where x denotes the input attributes, w_a the salient weights between the input attributes and the input layer, \circ the Hadamard product, \tilde{x} the extra layer output, f_a the activation function, h the hidden layer activation, W_{ih} the weights between the input and hidden layer, b_h the hidden layer biases, y the output layer activation, W_{ho} the weights between the hidden and output layer, and b_o the output layer biases.

B. Training and weight regularization

The salient weights are part of the optimization map. Thus, the significance of an attribute depends mainly on its salient weight. By using penalty terms, the irrelevant attributes can be diminished during the training. Two penalty terms are added to the error function regarding the salient weights:

R(w_a) = \lambda_1 \sum_{j=1}^{N_a} \frac{\beta w_{a_j}^2}{1 + \beta w_{a_j}^2} + \lambda_2 \sum_{j=1}^{N_a} w_{a_j}^2,  (4)

where N_a is the number of attributes and \lambda_1, \lambda_2 and \beta are regularization rates. Experimentally, we found that \lambda_1 = \lambda_2 = 0.01 and \beta = 100 are proper values for the attribute significance estimation when the salient weights are initialized to 0.2.

In the training phase, if the attribute is irrelevant or less relevant, the penalty terms become more dominant than the error function based gradient. Thus, the value of the salient weight becomes smaller.

The salience of the input attributes depends also on the weight initialization and the applied optimizer method. Therefore, better optimizers and stochastic or mini-batch gradient descent based methods are recommended, as they are able to find a better local optimum [16].

C. The Attribute Significance Ratio (ASR) calculation

The attribute significance ratio gives the attribute saliency in percentage with regard to the weights after the training:

ASR_j(w_a) = \frac{|w_{a_j}|}{\sum_{k=1}^{N_a} |w_{a_k}|} \cdot 100\%.  (5)

After the attribute significance ratio calculation, the attributes are ranked.

III. RESULTS

A. Basic regression cases

Six input attributes and three combinations of them are defined. These three combinations are the desired outputs of the neural network. The Attribute Significance Estimator Multilayer Perceptron has to determine the input attributes' saliency, in this case for these three functions. Table 1 shows the six defined attributes.

TABLE 1
THE INPUT ATTRIBUTES

Attribute Name | Function    | Range/Value
x1             | sine        | [0, 3]
x2             | linear (x)  | [0, 1]
x3             | cosine      | [0, 3]
x4             | exponential | [0, 1]
x5             | constant    | 2
x6             | sine        | [0, 3]

To test the redundancy elimination capability of the neural network, x1 and x6 are identical. Therefore, the x1 attribute always appears in the desired function. Another consideration is to prove the Multilayer Perceptron's robustness against a constant value. Thus, x5 is never added to the desired function. Table 2 contains the desired functions.
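The forward pass (1)-(3), the penalty term (4) and the ASR calculation (5) can be sketched in a few lines of NumPy. This is an illustrative reconstruction under the paper's stated settings, not the authors' code; in particular, the exact penalty form is assumed from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
Na, Nh = 6, 100                         # attributes and hidden units, as in the basic cases
wa = np.full(Na, 0.2)                   # salient weights, initialized to 0.2
W_ih = rng.normal(scale=0.1, size=(Na, Nh))
b_h = np.zeros(Nh)
W_ho = rng.normal(scale=0.1, size=(Nh, 1))
b_o = np.zeros(1)

def forward(x):
    x_tilde = x * wa                    # (1): element-wise product with the salient weights
    h = np.tanh(x_tilde @ W_ih + b_h)   # (2): hidden activation (hyperbolic tangent)
    return h @ W_ho + b_o               # (3): linear output activation

def penalty(wa, lam1=0.01, lam2=0.01, beta=100.0):
    # (4): two penalty terms on the salient weights (assumed Setiono-Liu-style form)
    return lam1 * np.sum(beta * wa**2 / (1.0 + beta * wa**2)) + lam2 * np.sum(wa**2)

def asr(wa):
    # (5): attribute significance ratio in percent; sums to 100 over all attributes
    return np.abs(wa) / np.sum(np.abs(wa)) * 100.0
```

During training, `penalty(wa)` would be added to the cost function, so the gradients shrink the salient weights of irrelevant attributes.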

TABLE 2
THE DESIRED OUTPUT FUNCTIONS

Case | Desired Function
C1   | y1 = x1 + x2
C2   | y2 = 0.1 x1 + x2
C3   | y3 = (x1 + x4) x3

In all three cases, the neural network build-up is the same: it contains 6 inputs, 100 hidden units and 1 output. The activation function is the hyperbolic tangent between the input and hidden layer, and the output activation is linear. The optimizer is batch gradient descent with a 10^{-2} learning rate. The applied cost function is the mean squared error (MSE):

MSE(y, y_n) = \frac{1}{m} \sum_{i=1}^{m} (y_i - y_{n_i})^2,  (6)

where y is the output of the neural network, y_n is the desired output and m is the number of training instances. The learning stopping criterion is an MSE of 5 \cdot 10^{-4}. Table 3 shows the results of the defined cases.

TABLE 3

Case \ Input | x1    | x2    | x3    | x4    | x5   | x6
C1: ASR[%]   | 50.58 | 48.83 | 0.09  | 0.34  | 0.14 | 0.02
C2: ASR[%]   | 1.36  | 87.61 | 0.08  | 0.52  | 0.00 | 10.42
C3: ASR[%]   | 0.04  | 0.04  | 38.59 | 26.60 | 0.01 | 34.73

B. A real classification problem

The Image Segmentation Dataset is used for the classification study in this paper. This dataset is available at the UCI Machine Learning Repository [17]. The dataset contains seven types of outdoor images: brickface, sky, foliage, cement, window, path and grass. The total number of instances is 2310, where every instance has 19 attributes, which are in a prepared pixel information format. For the attribute significance estimation, the dataset is divided into a training set with 1848 instances and a test set with 462 instances.

The neural network contains 19 inputs, 20 hidden units and 7 outputs. Between the inputs and the hidden units, the sigmoid activation function is used. The output layer is a softmax function [18], which gives the probability per class:

P(y = j | x) = \frac{e^{x^T W_j + b_j}}{\sum_{k=1}^{m} e^{x^T W_k + b_k}}.  (7)

The used cost function is the negative log-likelihood:

J(y, y_d) = -\frac{1}{m} \sum_{i=1}^{m} y_{d_i} \ln(y_i),  (8)

where y_d is the desired output and y is the output of the neural network.

The applied optimizer is ADAM [19] with the following parameters: step size \alpha = 10^{-3}, moment estimates \beta_1 = 0.9 and \beta_2 = 0.999, where the data sets are batched. The Multilayer Perceptron is trained for 10,000 iterations. After the training, the train set error is 6.82% and the test set error is 9.96%. The ranked attributes with their attribute significance ratios are shown in Table 4.

TABLE 4
THE RESULTS OF THE REAL CLASSIFICATION EXAMPLE

Attribute | ASR[%] | Attribute | ASR[%]
x19 | 24.68 | x6  | 0.70
x14 | 23.80 | x3  | 0.59
x16 | 14.42 | x7  | 0.58
x13 | 12.68 | x11 | 0.46
x15 | 10.19 | x1  | 0.36
x17 | 5.67  | x10 | 0.25
x9  | 2.88  | x18 | 0.05
x2  | 1.07  | x4  | 0.01
x8  | 0.85  | x5  | 0.00
x12 | 0.75  |     |

To verify the attribute significance rate estimation, every attribute with a significance rate larger than 1% is selected to train a feed-forward neural network. Thus, 8 attributes are chosen, namely attributes 19, 14, 16, 13, 15, 17, 9 and 2, and the neural network contains 8 inputs, 20 hidden units and 7 outputs. The training conditions are the same as for the saliency estimation, except for the stopping criterion, which is defined as reaching at least the train set error of the attribute estimation network. After the training, the reached train set error is 6.76% and the test set error is 7.79%. Additionally, a Multilayer Perceptron is trained with all attributes to reach this stopping criterion; it reached a 6.76% train set and a 10.17% test set error. Thus, the differences between the train and test sets are: all input attributes 3.41%, all input attributes with the extra layer 3.14%, and the selected attributes 1.03%.

C. A real regression problem

A real regression study is done on the Energy Efficiency Data Set, which is available at the UCI Machine Learning Repository [17]. The data set consists of simulations of 12 different building forms with the same volume of 771.75 m³ and simulates the environment of Athens, Greece. The total number of instances is 768 and every instance belongs to the training set. Every instance contains 8 attributes, which are the relative compactness (x1), surface area (x2), wall area (x3), roof area (x4), overall height (x5), orientation (x6), glazing area (x7) and the glazing area distribution (x8). The two desired outputs are the heating load (y1) and the cooling load (y2) [20].

For both desired outputs, one neural network is trained with the same build-up and training parameters. The neural network contains 8 inputs, 150 hidden units and 1 output. The hyperbolic tangent activation function is applied between the input and hidden layer, while the output activation is linear. The ADAM optimizer is used with 32-instance-sized mini-batches.
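The softmax output (7) and the negative log-likelihood cost (8) used in the classification study can be sketched as follows. This is a hedged NumPy illustration; subtracting the row maximum before exponentiating is a standard numerical-stability trick, not something stated in the paper:

```python
import numpy as np

def softmax(z):
    # (7): class probabilities from the output layer pre-activations
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def nll(probs, targets):
    # (8): negative log-likelihood with one-hot desired outputs,
    # averaged over the m training instances
    m = probs.shape[0]
    return -np.sum(targets * np.log(probs)) / m

probs = softmax(np.array([[2.0, 1.0, 0.1],
                          [0.5, 2.5, 0.2]]))
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
loss = nll(probs, targets)   # small when the correct class receives high probability
```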

The optimizer parameters are: step size \alpha = 5 \cdot 10^{-5}, moment estimates \beta_1 = 0.9 and \beta_2 = 0.999. Both models are trained by applying a Mean Squared Error cost function for 20,000 iterations. The reached Mean Squared Errors relating to the desired outputs are: y1 0.27 and y2 3.56. Table 5 presents the attribute significance ratios for y1 and y2.

TABLE 5
THE RESULTS OF THE REAL REGRESSION EXAMPLE

Attribute | y1: ASR[%] | y2: ASR[%]
x1 | 18.10 | 6.92
x2 | 13.93 | 0.33
x3 | 0.42  | 22.14
x4 | 19.00 | 1.36
x5 | 18.44 | 26.02
x6 | 0.39  | 0.57
x7 | 24.60 | 35.51
x8 | 5.13  | 7.15

To examine the results, a neural network is trained with a growing number of input attributes. The inputs are added one by one in decreasing order of their saliency. If the attribute significance estimation is fully correct, then the error should decrease. Fig. 2 shows the errors relating to the number of attributes.

Fig. 2. Mean squared errors relating to the number of attributes.

For the heating load (y1) estimation, the attribute ranking was correct because the error increases only when all attributes are used. So, overfitting appears when the x3 input is also used. In the case of the cooling load (y2), the error does not decrease monotonically. Nevertheless, an error increase happens only once within the first four most relevant attributes, and the MSE increased by 0.74.

IV. SUMMARY AND CONCLUSION

Attribute selection is a machine learning and data mining task. Attribute selectors explore the interrelationship between the data and reduce the size and variance of machine learning models.

This paper has presented a four-layer feed-forward neural network approach, which can estimate the attribute significance rate related to a given output. Every input attribute is connected by a salient weight to the input layer. During the training, the salient weights are part of the optimization map and are regularized by two penalty terms.

The results of three basic regression tasks have shown the applicability of the proposed neural network, including the elimination of constant and redundant inputs. A real classification problem has presented the model variance decreasing capability of the Multilayer Perceptron. In both cases, batch gradient based optimizers are applied. Furthermore, a real regression task also proved the attribute saliency estimation ability of the proposed neural network, where a mini-batch based gradient optimizer is used.

Compared to other neural network based feature selectors, the build-up and training of the proposed neural network are relatively simple, because it requires only one training phase and it can explore the relationship between the input attributes and the given function properly in one step. Since the dimension reduction process is a manual step, the method seems better suited for data mining tasks, where the relationship between a given function and the input features is unknown.

Although the concepts presented here are general and can be applied to many disciplines, the applications we have in mind are to develop better sensor models by exploring the relationship between sensor attributes and given effects based on measured data. A second application area is to find proper sensor combinations for machine condition monitoring.

REFERENCES

[1] G. Chandrashekar and F. Sahin, "A survey on feature selection methods," Computers and Electrical Engineering, vol. 40, no. 1, pp. 16-28, 2014.
[2] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, no. Mar, pp. 1157-1182, 2003.
[3] J. R. Vergara and P. A. Estevez, "A review of feature selection methods based on mutual information," Neural Computing and Applications, vol. 24, no. 1, pp. 175-186, 2014.
[4] R. Battiti, "Using mutual information for selecting features in supervised neural net learning," IEEE Transactions on Neural Networks, vol. 5, no. 4, pp. 537-550, 1994.
[5] D. E. Goldberg and J. H. Holland, "Genetic algorithms and machine learning," Machine Learning, vol. 3, no. 2, pp. 95-99, 1988.
[6] J. Kennedy and R. Eberhart, "Particle swarm optimization," Proceedings of the International Conference on Neural Networks, Australia, IEEE, vol. 1948, 1995.
[7] P. M. Narendra and K. Fukunaga, "A branch and bound algorithm for feature subset selection," IEEE Transactions on Computers, vol. 26, no. 9, pp. 917-922, 1977.
[8] C. M. Bishop, "Pattern recognition," Machine Learning, vol. 128, pp. 1-58, 2006.
[9] A. Likas, N. Vlassis, and J. J. Verbeek, "The global k-means clustering algorithm," Pattern Recognition, vol. 36, no. 2, pp. 451-461, 2003.
[10] J. Neumann, C. Schnorr, and G. Steidl, "Combined SVM-based feature selection and classification," Machine Learning, vol. 61, no. 1, pp. 129-150, 2005.
[11] R. Setiono and H. Liu, "Neural-network feature selector," IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 654-662, 1997.
[12] K. W. Bauer, S. G. Alsing, and K. A. Greene, "Feature screening using signal-to-noise ratios," Neurocomputing, vol. 31, no. 1, pp. 29-44, 2000.
[13] R. K. De, N. R. Pal, and S. K. Pal, "Feature analysis: Neural network and fuzzy set theoretic approaches," Pattern Recognition, vol. 30, no. 10, pp. 1579-1590, 1997.
[14] A. Verikas and M. Bacauskiene, "Feature selection with neural networks," Pattern Recognition Letters, vol. 23, no. 11, pp. 1323-1335, 2002.

[15] M. M. Kabir, M. M. Islam, and K. Murase, "A new wrapper feature selection approach using neural network," Neurocomputing, vol. 73, no. 16, pp. 3273-3283, 2010.
[16] L. Bottou, "Stochastic gradient learning in neural networks," Proceedings of Neuro-Nîmes, vol. 91, no. 8, 1991.
[17] M. Lichman, "UCI machine learning repository," 2017. [Online]. Available: http://archive.ics.uci.edu/ml
[18] B. Krishnapuram, L. Carin, M. A. Figueiredo, and A. J. Hartemink, "Sparse multinomial logistic regression: Fast algorithms and generalization bounds," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 957-968, 2005.
[19] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[20] A. Tsanas and A. Xifara, "Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools," Energy and Buildings, vol. 49, pp. 560-567, 2012.


A 3-Dimensional Object Recognition Method Using SHOT and Relationship of Distances and Angles in Feature Points

Hiroyuki Kudo, Kazuo Ikeshiro, Hiroki Imamura
Department of Information System Science, Graduate School of Engineering, Soka University
Hachioji-Shi, Tokyo, Japan
e17m5212@soka-u.jp, ikeshiro@soka.ac.jp, imamura@soka-u.jp

ABSTRACT

In recent years, the human support robot has been receiving attention. This robot is required to perform various tasks to support humans, especially the object recognition task, which is important when people request the robot to transport and rearrange objects. Object recognition methods, especially those using 3D sensors, are also receiving attention. As a conventional object recognition method using 3-dimensional information, the Signature of Histograms of OrienTations (SHOT) is commonly used. SHOT performs highly accurate object recognition since the SHOT descriptor is represented by 352 dimensions. However, SHOT misrecognizes objects which have the same features but which are not the same objects, and also fails if there is occlusion of the 3-dimensional object. As a solution, we propose a high-quality object recognition method that uses the positive parts of SHOT.

KEYWORDS

Cognitive system, 3D object, SHOT descriptor, List matching, Human Support Robot

1 INTRODUCTION

In recent years, human support robots have been receiving attention [1] [2]. Especially, the object recognition task is important in case people request the robots to transport and rearrange an object. We consider that there are five necessary properties for recognition in a domestic environment, as follows:

(1) Robustness against occlusion
(2) Fast recognition
(3) Pose estimation with high accuracy
(4) Coping with erroneous correspondence
(5) Recognizing objects in a noisy environment

Firstly, the robots need robust recognition against occlusion because occlusion occurs between different objects in a domestic environment. Secondly, the robots need to recognize a target object fast to achieve the required tasks fast. Thirdly, the robots need to estimate the pose of a target object with high accuracy to manipulate the target object. Fourthly, the robots need to cope with erroneous correspondences to recognize objects which have the same feature in a local region but which are not the same object. For example, a cube and a rectangular box both have the same feature points at their vertices, but their aspect ratios are totally different.

Figure 1. Mismatching in the local regions by using SHOT

Figure 2. The relationship between the feature points by previous research


Finally, the robots need to recognize an object which has some noise.

As a conventional object recognition method using 3-dimensional information, the Signature of Histograms of OrienTations (SHOT) is commonly used [3] [4]. SHOT focuses on the local region and expresses the relationship between the point of interest and the surrounding points as a SHOT descriptor in a histogram. SHOT performs highly accurate object recognition since the SHOT descriptor is represented by 352 dimensions. Therefore, if there is some noise, the value of the SHOT descriptor is hardly interfered with. But SHOT misrecognizes objects which have the same features but which are not the same objects, because SHOT only focuses on local feature points to match objects, as shown in Figure 1. Therefore, if an object has the same features locally, SHOT incorrectly recognizes it as the same object.

Therefore, to compensate for this defect of SHOT, our laboratory has developed the previous research for object recognition by Maehara et al. [5]. The previous research used high curvature points in regions as feature points. Furthermore, the previous research generates a list by listing the relationships of distances and angles between feature points, and matches lists as shown in Figure 2. Thereby, the previous research estimates the pose of a target object with high accuracy and copes with erroneous correspondence by using not only the feature points but also the relationships between feature points.

However, the previous research does not satisfy recognizing objects in a noisy environment. Figure 3 shows the corresponding rate when we added Gaussian noise according to the standard deviation, and shows the result of matching the scene data with noise against its original data. In Figure 3, we added noise to the data of the pack. As can be seen, the corresponding rate gradually decreases; therefore, the previous research is easily interfered with by noise. The calculation method of the corresponding rate will be shown in section 2.7.

Figure 3. The changing of the correspondence rate

Table 1. Properties of conventional methods and the proposed method (columns: robustness against occlusion, fast recognition, pose estimation with high accuracy, coping with erroneous correspondence, recognizing objects in a noisy environment; rows: previous research, proposed method)

Table 1 shows the properties of these methods. As mentioned, the two conventional methods do not satisfy all the properties. To satisfy all the properties of recognition, we propose a 3-dimensional object recognition method using SHOT and relationships of distances and angles in feature points. We use the positive parts of both SHOT and the previous research. As our approach, firstly, to have robustness against noise, the proposed method uses SHOT to extract feature points in a region. SHOT focuses on the local region and expresses the feature amount in a histogram as the SHOT descriptor when extracting feature points. For this reason, it is conceivable that the feature points are less likely to be interfered with by noise, since they are determined by the values of the histogram. Furthermore, the proposed method generates a list of distances and angles between the extracted corresponding points whose SHOT descriptors are matched. In addition, the proposed method matches the lists which are generated from the model data and the scene data.

2 PROPOSED METHOD

2.1 Flow of The Proposed Method

In this section, we describe an overview of the proposed method based on its processing flow, as shown in Figure 4.
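The listing of distances and angles over triples of matched points described above can be sketched as follows. This is a hypothetical Python helper for illustration only; the authors' actual list layout may differ:

```python
import numpy as np
from itertools import combinations

def triangle_features(points):
    # For every combination of three corresponding points, record two side
    # lengths and the angle between them as one list entry.
    feats = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        d_ab = np.linalg.norm(b - a)
        d_ac = np.linalg.norm(c - a)
        cos_a = np.dot(b - a, c - a) / (d_ab * d_ac)
        angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
        feats.append((d_ab, d_ac, angle))
    return feats

# Four corners of a unit square: C(4,3) = 4 triples in the list
model_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
feats = triangle_features(model_points)
```

Matching would then compare such entries between the model list and the scene list, discarding triples whose distance and angle relationships disagree.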


Figure 4. The flow of the proposed method

2.2 Input Object Data

Firstly, the proposed method inputs target objects as teaching data and a scene data, as shown in Figure 5.

Figure 5. The input data: (a) model data, (b) scene data

2.3 SHOT Descriptor Extracting

To extract feature points, the proposed method uses SHOT (Signature of Histograms of OrienTations). The surface features of a three-dimensional model can be described with uniqueness and repeatability by using SHOT. It expresses the relationship between the point of interest and its surroundings by histograms. Since the SHOT descriptor is expressed in 352 dimensions, SHOT is a method that can extract feature points with high accuracy. In this section, we explain how to extract a SHOT descriptor. To extract a SHOT descriptor, we use an isotropic spherical grid that encompasses partitions along the radial, azimuth and elevation axes. Since each volume of the grid encodes a very descriptive entity represented by the local histogram, SHOT can use a coarse partitioning of the spatial grid and hence a small cardinality of the descriptor. In SHOT, the angle from the dot product of the normal vector of the reference point and the normal vector of the point in each grid cell is represented by histograms.

2.4 Corresponding Points Extraction

To match the SHOT descriptors, the proposed method matches each scene feature against all model features. Furthermore, the proposed method computes the ratio between the nearest neighbor and the second best. If the ratio is below a threshold, a correspondence is established between the scene feature and its closest model feature.

2.5 List Generating

In the list generating process, the proposed method generates the list of distances and angles between the extracted corresponding points as the relationships of these points.

Figure 6. Overviews of matched points: (a) in the model, (b) in the scene

Table 2. The LIST in the model

Table 3. The LIST in the scene

At this time, the proposed method extracts combinations of three points as much as


possible in the corresponding points, as shown in Figure 6, Table 2 and Table 3. In recognizing the object, the proposed method achieves less mismatching of list elements, since the generated lists include all the corresponding points obtained by matching SHOT descriptors; furthermore, if there is a point that is mismatched by SHOT matching, the proposed method is able to exclude that point from the matching targets due to the difference in the three-point relationships.

2.6 List Matching

In the list matching process, the proposed method matches the list of the model data and the list of the scene data. As shown in Figure 6, Table 2 and Table 3, a list has distances and an angle as elements. Then, the proposed method matches list number 1 of the model against all the lists of the scene data. Furthermore, in the proposed method, the list with the smallest difference of the sum of the distance between point and point, the distance between point and point, and the angle, which is less than the threshold, is subjected to matching.

2.7 Rigid Registration

To recognize the target object in the scene data, the proposed method applies rigid registration to the teaching data. Firstly, the proposed method fits the teaching data to the matched object in the scene by calculating the optimum rotation matrix R and the translation vector t from the associated corresponding points. Secondly, the proposed method calculates a corresponding rate M between the teaching data and the matched object by using

(1)

(2)

Figure 7. Illustration of the rigid registration

Figure 8. The result of the rigid registration

Where, is the number of points of the teaching data, is the number of points of the object in the scene, is a matched point of the fitted teaching data, and is a matched point of the matched object in the scene. The proposed method counts the number of points which are within a threshold of 1 [mm] by equation (1) as a score. Then, the proposed method calculates the corresponding rate based on the score by equation (2). Finally, the proposed method selects the clustered object which has the highest corresponding rate.

3 EXPERIMENTS

In this section, to evaluate the effectiveness of the proposed method, we compare the proposed method with the previous research regarding the five properties mentioned in section 1, as follows:

(1) Robustness against occlusion
(2) Fast recognition
(3) Pose estimation with high accuracy
(4) Coping with erroneous correspondence
(5) Recognizing objects in a noisy environment

3.1 Object Recognition in Occlusion Scenes

In this experiment, we compared the proposed method with the previous research and SHOT to evaluate three properties, as follows:

(1) Robustness against occlusion


Table 4. The result of average processing time

(a). Pack (b). Spray (c). Cup noodle

Figure 9. Overviews of objects and 3-dimensional
(2) Fast recognition
(3) Pose estimation with high accuracy

We used three actual objects, which are usually found in a domestic environment, as recognition targets, and obtained their 3-dimensional data with a Kinect as shown in Figure 9. In Figure 9, (a) shows a pack, (b) shows a spray and (c) shows a cup noodle.

[Figure 9. Overviews of objects and 3-dimensional data of objects in the experiment.]

To generate occlusion scenes, we deleted part of the 3-dimensional data of each object from three directions (top, bottom and right side) in steps of 10% of the number of points of each object, as shown in Figure 10.

[Figure 10. Examples of occlusion scenes of the spray: (a)-(c) occlusion from the top side (10%, 40%, 70%); (d)-(f) occlusion from the bottom side (10%, 40%, 70%); (g)-(i) occlusion from the right side (10%, 40%, 70%).]

To evaluate the pose estimation accuracy for a target object, we use the corresponding rate M between the scene data and the target object fitted by the optimum rotation matrix R and the translation vector t mentioned in the rigid registration process (Section 2.7). To calculate the corresponding rate M as a measure of pose estimation accuracy, we use equations (1) and (2) with a threshold of 1 [mm]. When the corresponding rate is high, a method estimates the pose of the target object with high accuracy; on the contrary, when the corresponding rate is zero, the method mismatches the target object. The reported processing times were obtained on an Intel(R) Core(TM) i5 at 3.1 GHz with 8.0 GB of main memory.

[Figure 11. The results for the spray occluded from 3 directions: (a) corresponding rate with occlusion from the top side; (b) from the bottom side; (c) from the right side.]

Figure 11 shows the results for the occlusion scenes of the spray object. As shown in Figure 11, the proposed method was able to recognize objects nearly as well as the previous
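As a concrete illustration of this evaluation, the corresponding rate can be sketched as follows. This is a minimal pure-Python sketch; the function name, the point format and the brute-force nearest-neighbour search are our assumptions, not the paper's implementation of equations (1) and (2).

```python
def corresponding_rate(model_pts, scene_pts, R, t, thresh=0.001):
    """Fraction of transformed model points that have a scene point
    within thresh (coordinates in metres; 1 [mm] = 0.001)."""
    def transform(p):
        # apply the optimum rotation matrix R and translation vector t
        return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i]
                     for i in range(3))
    def nearest_dist(p):
        # brute-force nearest-neighbour distance to the scene data
        return min(sum((p[i] - q[i]) ** 2 for i in range(3)) ** 0.5
                   for q in scene_pts)
    matched = sum(1 for p in model_pts
                  if nearest_dist(transform(p)) <= thresh)
    return matched / len(model_pts)
```

With the identity transform and identical point sets the rate is 1.0; with a grossly wrong pose it drops towards zero, matching the interpretation given in the text.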

ISBN: 978-1-941968-43-72017 SDIWC 30

Proceedings of The Fourth International Conference on Artificial Intelligence and Pattern Recognition (AIPR2017), Lodz, Poland, 2017

research and SHOT in the occlusion scenes. In addition, as shown in Table 4, the processing time of the proposed method was nearly equal to those of the SHOT and the previous research. From these results, we consider that the proposed method is robust against occlusion because it is able to match the feature points of the target object with the feature points of the unoccluded scene data by using the SHOT. In addition, we consider that the proposed method is able to estimate the pose of a target object with high accuracy because it uses not only the corresponding points but also the relationships between the corresponding points. In this paper we show only the result for the spray; however, we obtained the same results for the other objects.

3.2 Recognizing Objects in a Noisy Environment

In this section, we evaluate the effectiveness of the proposed method in recognizing objects in a noisy environment. We prepared the same objects as in the first experiment. To generate noisy scenes, we added Gaussian noise to the scenes. Figure 12 shows how the scene data change as noise is added.

[Figure 12. Scene data with added Gaussian noise: (a) 0.001, (b) 0.002, (c) 0.003, (d) 0.005.]

To evaluate the pose estimation accuracy for a target object, we use the corresponding rate M in the same way as in Section 3.1. Figure 13 and Table 5 show the results for the accuracy and processing time of the proposed method, the SHOT and the previous research. As shown in the results, we consider that the accuracy and processing time of the proposed method are equal to or better than those of the previous research and the SHOT. Although we show only the result for the spray here, the pack and the cup noodle obtained equivalent results.

Table 5. The result of average processing time
                     Average processing time [sec.]
                     Spray    Pack    Noodle
Previous research    3.20     2.75    2.62
SHOT                 1.24     1.09    1.12
Proposed method      1.82     0.91    1.24

[Figure 13. The result of the pose estimation of the spray in a noisy environment.]

From these results, we consider that the proposed method is robust against noise, because it uses the SHOT to generate feature points, and the SHOT is hardly affected by noise owing to the high dimensionality of its feature quantities.

3.3 The Experiment in Recognition of Objects Which Have the Same Feature but Which Are Not the Same Object

To qualitatively evaluate how the proposed method copes with erroneous correspondence, we compared the proposed method with the SHOT. As target objects which have the same feature in a local region but which are not the same object, we prepared a 500 ml pack and a 1000 ml pack as shown in Figure 14.

[Figure 14. Overviews of objects and 3-dimensional data of objects in the experiment: (a) 500 ml pack; (b) 1000 ml pack.]

We generated the teaching data from the 1000 ml pack and applied the proposed method and the SHOT to the scene data of the 500 ml pack. Figure 15 and Figure 16 show the results of the SHOT, and Figure 17 shows the result of the proposed method. As shown in these results, erroneous correspondence occurred in the SHOT and it misrecognized the 1000 ml pack as the 500 ml pack.
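The noise injection used to generate the noisy scenes of Section 3.2 can be sketched as follows. This is a pure-Python sketch; the point-cloud representation is our assumption.

```python
import random

def add_gaussian_noise(points, sigma):
    """Return a copy of a 3-D point cloud with zero-mean Gaussian noise
    of standard deviation sigma added to every coordinate."""
    return [tuple(c + random.gauss(0.0, sigma) for c in p) for p in points]
```

Re-running the recognition on clouds perturbed at several sigma levels reproduces the kind of sweep shown in Figure 12.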


[Figure 15. The result of SHOT.]

[Figure 16. The result of SHOT.]

[Figure 17. The result of the proposed method.]

Since the feature quantities of the upper part and the lower part of the pack are exactly the same, when matching is performed, different results are obtained each time depending on a threshold value. Conversely, the proposed method recognizes that they are different objects, since there are no corresponding lines, which means the relationship of distances and angles between points is not correct. From these results, the proposed method is able to cope with erroneous correspondence and is more effective than the SHOT.

4 CONCLUSION

In this paper, we proposed a 3-dimensional object recognition method for the human support robot which uses the relationships of distances and angles between feature points and the SHOT descriptor, and which has the following five properties:

(1) Robustness against occlusion
(2) Fast recognition
(3) Pose estimation with high accuracy
(4) Coping with erroneous correspondence
(5) Recognizing objects in a noisy environment

In the experiments on these five properties, we saw that the proposed method is more effective than the SHOT and the previous research. Summarizing the above, the proposed method extracts the matching candidate points using the SHOT, focuses on the relationships between the candidate points, and eventually uses the list to match. As a result, it becomes possible to recognize, with high accuracy, objects that could not be recognized by the conventional methods used up to now. However, the proposed method cannot recognize objects of the same shape which have different textures, because it uses only the SHOT descriptor, which is calculated from the shape data of objects. Therefore, as future work, we will improve the proposed method by using not only the shape data of objects but also color features.

REFERENCES

[1] S. Sugano, T. Sugaiwa and H. Iwata, "Vision System for Life Support Human-Symbiotic-Robot," The Robotics Society of Japan, 27(6), pp. 596-599, 2009.

[2] Y. Jia, H. Wang, P. Sturmer and N. Xi, "Human/robot interaction for human support system by using a mobile manipulator," ROBIO, pp. 190-195, 2010.

[3] F. Tombari and S. Salti, "Unique signatures of histograms for local surface description," ECCV, pp. 356-369, 2010.

[4] F. Tombari, S. Salti and L. D. Stefano, "A Combined Texture-Shape Descriptor for Enhanced 3D Feature Matching," ICIP, pp. 809-812, 2011.

[5] S. Maehara, H. Imamura and K. Ikeshiro, "A 3-Dimensional Object Recognition Method Using SHOT and Relationship of Distances and Angles in Feature Points," DIPDMWC2015, 2015.


Object Detection Method Using Invariant Feature Based on Local Hue Histogram in
Divided Area of an Object Area

Tomohiro Kanda, Kazuo Ikeshiro and Hiroki Imamura
Department of Information Systems Science, Graduate School of Engineering, Soka University
1-236, Tangi-machi, Hachiouji-Shi, Tokyo, Japan 192-8577
e16m5206@soka-u.jp, ikeshiro@soka.ac.jp, imamura@soka.ac.jp

In recent years, human support robots have been receiving attention, and these robots are required to perform various tasks to support humans. In particular, an object detection task is important in case that people request the robot to transport and rearrange an object. However, detection becomes difficult owing to differences of visual appearance when detecting a target object with a camera equipped on a robot. We consider that there are seven properties necessary for detection in a domestic environment, as follows. 1. Robustness against the rotation change. 2. Robustness against the scale change. 3. Robustness against the illumination change. 4. Robustness against the distortion by perspective projection. 5. Robustness against the occlusion. 6. Detecting an object which has few textures. 7. Detecting different objects which have the same features of the hue histogram. As conventional methods, there are the Scale Invariant Feature Transform (SIFT), Color Indexing and the method proposed by Tanaka et al. These conventional methods do not satisfy all seven properties needed for the robots. Therefore, to satisfy the seven properties of detection, we propose an object detection method using invariant features based on the local hue histogram in divided areas of an object area.

KEYWORDS

Cognitive robot, hue histogram, peak and trough, divided area.

1 INTRODUCTION

In recent years, human support robots have been receiving attention [1-3], and these robots are required to perform various tasks to support humans. In particular, an object detection task is important in case that people request the robots to transport and rearrange an object.

[Figure 1. A target object and an input image. a) A target object. b) An input image.]

Object detection is the technology of detecting a target object (Fig. 1(a)) from an input image (Fig. 1(b)). There is a problem that detection is difficult when detecting a target object with a camera equipped on a robot, because differences of visual appearance occur, such as the rotation change. Therefore, we consider that there are six properties necessary for detection in a domestic environment, as follows.

1. Robustness against the rotation change
2. Robustness against the scale change
3. Robustness against the illumination change
4. Robustness against the distortion by perspective projection
5. Robustness against the occlusion
6. Detecting an object which has few textures

Firstly, the robots need detection robust to the rotation change, because the rotation change occurs in case that an object falls down, as in Fig. 1(b). Secondly, the robots need detection robust to the scale change, because the scale change occurs depending on the position between the robots and an object. Thirdly, the robots need detection robust to the illumination change because


the precision of detection is easily affected by the illumination condition. Fourthly, the robots need detection robust to the distortion by perspective projection, because the distortion by perspective projection occurs in case that the robot moves sideways, looks up or looks down. Fifthly, the robots need detection robust to occlusion, because occlusion occurs between different objects. Finally, the robots need to detect an object which has few textures. As an example, Fig. 2 shows an object which has few textures.

[Figure 2. Example of an object which has few textures.]

As conventional methods, there are the Scale Invariant Feature Transform (SIFT) [4] and Color Indexing [5]. Table 1 shows the properties of these methods.

[Table 1. Properties of conventional methods and the proposed method.]

To detect a target object from an input image, the SIFT compares the local feature amounts of a target object with the local feature amounts of an input image, and extracts correspondence points as shown in Fig. 3. Therefore, as shown in items 1-3 and 5 in Table 1, the SIFT is robust against the rotation change, the scale change, the illumination change and occlusion. However, as shown in item 4 in Table 1, the SIFT is not robust against the distortion by perspective projection, by which the local feature amounts change. In addition, as shown in item 6 in Table 1, it is difficult for the SIFT to detect an object which has few textures, because the SIFT cannot extract feature amounts from an object from which edges cannot be detected, such as an object which has few textures.

[Figure 3. Example of object detection by using SIFT.]

On the other hand, Swain et al. proposed Color Indexing. To detect a target object from an input image, Color Indexing compares the three-dimensional color histogram of a target object with the three-dimensional color histogram of a candidate object. As an example, Fig. 4 shows a three-dimensional color histogram. As shown in Fig. 4, the size of each square in the three-dimensional color histogram expresses the frequency of each color. The frequency of each color does not change in case that the rotation change, the scale change, the distortion by perspective projection or occlusion occurs. In addition, Color Indexing can detect an object which has few textures by using color information. Therefore, as shown in items 1, 2 and 4-6 in Table 1, Color Indexing is robust against the rotation change, the scale change, the distortion by perspective projection, occlusion and an object which has few textures. However, as shown in item 3 in Table 1, it is difficult for Color Indexing to detect in case that the illumination changes, because Color Indexing uses the RGB color system, which is easily affected by the illumination change.

[Figure 4. Example of a three-dimensional color histogram.]

In our laboratory, Tanaka et al. [6] focused on hue. Hue has three merits. Firstly, hue is hardly affected by illumination and shadow. Secondly, hue is invariant against the distortion by perspective projection. Finally, detection using hue can detect an object which has few textures. In addition, as shown in Fig. 5, the positions of the peaks and troughs of the hue histogram are invariant against the rotation change, the scale change, the illumination change, the distortion by perspective projection and occlusion. In Fig. 5, we define the longitudinal axis as the frequency, and the lateral axis as the hue value. In addition, the circles indicate the positions of peaks
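The histogram comparison at the heart of Color Indexing is commonly the histogram-intersection measure described by Swain and Ballard; a minimal sketch, assuming flattened histograms stored as plain lists, could look like this:

```python
def histogram_intersection(model_hist, image_hist):
    """Normalized histogram intersection: 1.0 for identical histograms,
    0.0 for histograms with no overlapping colors."""
    overlap = sum(min(m, i) for m, i in zip(model_hist, image_hist))
    total = sum(model_hist)
    return overlap / total if total else 0.0
```

Because each bin only counts colors, the score is unchanged by rotation, scaling or perspective distortion of the object, which is exactly the invariance the text attributes to Color Indexing.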


and the squares indicate the positions of troughs.

[Figure 5. Example of the positions of peaks and troughs.]

Accordingly, Tanaka et al. proposed an object detection method using invariant features based on the hue histogram. Here, we refer to the method proposed by Tanaka et al. as the conventional method. To detect a target object from an input image, the conventional method extracts the positions of the peaks and troughs of the hue histogram as invariant features. In addition, the conventional method compares the invariant features of a target object with the invariant features of a candidate object. As mentioned above, the positions of the peaks and troughs of the hue histogram do not change in case that the rotation change, the scale change, the illumination change, the distortion by perspective projection or occlusion occurs. In addition, the conventional method can detect an object which has few textures by using hue information. Therefore, as shown in items 1-6 in Table 1, the conventional method is robust against the rotation change, the scale change, the illumination change, the distortion by perspective projection, occlusion and an object which has few textures. However, the conventional method does not satisfy the following property.

7. Distinguishing different objects which have the same features of the hue histogram

The conventional method cannot distinguish objects which have different textures but have the same positions of peaks and troughs of the hue histogram. As an example, Fig. 6(a), (b) shows different objects which have the same features of the hue histogram, and Fig. 6(c), (d) shows the hue histograms of Fig. 6(a), (b). As shown in item 7 in Table 1, the conventional method cannot distinguish the objects in Fig. 6(a), (b) as different objects, because the positions of the peaks and troughs of Fig. 6(a) equal those of Fig. 6(b), as shown in Fig. 6(c), (d). As mentioned above, as shown in Table 1, the SIFT, Color Indexing and the conventional method do not satisfy all seven properties.

[Figure 6. Example of different objects which have the same invariant features. a) The object which has the same invariant features as (b). b) The object which has the same invariant features as (a). c) Hue histogram of (a). d) Hue histogram of (b).]

Therefore, to satisfy the seven properties of detection, we propose an object detection method using invariant features based on the local hue histogram in divided areas of an object area, instead of invariant features based on the hue histogram of the whole object area. As our approach, firstly, the proposed method divides an object area into plural areas. As an example, Fig. 7(a), (b) shows divided objects. Secondly, the proposed method extracts hue from each divided area and generates local hue

[Figure 7. Example of local hue histograms in divided areas of an object. a) Divided areas of Fig. 6(a). b) Divided areas of Fig. 6(b). c) Local hue histograms in each divided area of Fig. 7(a). d) Local hue histograms in each divided area of Fig. 7(b).]
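The core idea — one hue histogram per divided area instead of a single histogram for the whole object — can be sketched as follows. This is a pure-Python sketch over a 2-D grid of hue values; the function name, the grid layout and the row/column split are our assumptions.

```python
def local_hue_histograms(hue_img, rows=2, cols=2, bins=360):
    """Split a 2-D grid of hue values (0-359) into rows x cols areas and
    build one hue histogram per area, in row-major order."""
    h, w = len(hue_img), len(hue_img[0])
    hists = []
    for r in range(rows):
        for c in range(cols):
            hist = [0] * bins
            for y in range(r * h // rows, (r + 1) * h // rows):
                for x in range(c * w // cols, (c + 1) * w // cols):
                    hist[hue_img[y][x]] += 1
            hists.append(hist)
    return hists
```

Two objects whose whole-area histograms coincide can still differ area by area, which is why the divided comparison can separate the objects of Fig. 6(a), (b).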


histograms. As an example, Fig. 7(c), (d) shows the local hue histograms of Fig. 7(a), (b). Thirdly, the proposed method extracts the positions of the peaks and troughs from each local hue histogram as invariant features. As shown in Fig. 7(c), (d), the positions of the peaks and troughs of the local hue histogram change for each divided area by dividing an object area. Finally, the proposed method compares the invariant features of the target object with the invariant features of the candidate object for each local hue histogram. Thereby, the proposed method can distinguish different objects which have the same features of the hue histogram. Furthermore, the proposed method is robust against the rotation change, the scale change, the illumination change, the distortion by perspective projection, occlusion and an object which has few textures, as shown in Table 1.

2.1 Flow of Proposed Method

We show the flow of the proposed method in Fig. 8. We describe each processing step in the next sections.

[Figure 8. Flow of the proposed method: feature matching, candidate object extraction, candidate object rotation, object area division, local hue histogram generation, invariant feature comparison.]

2.2 Feature Matching

Firstly, the proposed method extracts feature points from the registered image (Fig. 9(a)) and the input image (Fig. 9(b)) based on the BRISK feature [7]. As an example, in this case we set a registered image rotated by 90 degrees as the input image. Finally, the proposed method performs matching based on the BRISK descriptor between the registered image and the input image.

[Figure 9. Registered image and input image. a) Registered image. b) Input image.]

2.3 Candidate Object Extraction

Firstly, the proposed method calculates the vector from the most matched feature point to the farthest endpoint of the object area in the registered image, and defines the norm of this vector as the maximum distance. Secondly, the proposed method calculates the vectors from the most matched feature point to the second most matched feature point in the registered image and in the input image, and calculates the ratio of the sizes of the registered image and the input image using the calculated vectors. Thirdly, the proposed method describes a circle whose center is the most matched feature point and whose radius is the maximum distance in the registered image. In addition, the proposed method describes a circle in the input image whose center is the most matched feature point and whose radius is the norm of the vector v calculated by using

    v = s * v_max,    (1)

where v_max is the maximum-distance vector in the registered image, v is the corresponding vector in the input image, and s is the calculated ratio. Fourthly, the proposed method extracts the contour of the circle in the registered image and the input image. Finally, the proposed method trims the registered image and the input image based on the extracted contours. Thereby, it is possible to accurately extract the candidate area even when the scale is different or occlusion occurs, as shown in Fig. 10.

[Figure 10. Candidate area extraction. a) Registered image. b) Input image.]

2.4 Candidate Object Rotation

As shown in Fig. 11, in case that the input image rotates, the feature amounts of the hue histogram change. Therefore, it is necessary to rotate the input
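The scaling step of expression (1) — the registered image's maximum distance scaled by the ratio of the matched-pair vectors — can be sketched as follows (pure Python; the names and 2-D vector format are our assumptions):

```python
def scaled_radius(v_reg_max, v_reg_match, v_in_match):
    """Scale the registered image's maximum-distance norm by the ratio of
    the matched-pair vector norms to get the circle radius in the input
    image, as in expression (1)."""
    def norm(v):
        return sum(c * c for c in v) ** 0.5
    s = norm(v_in_match) / norm(v_reg_match)  # size ratio between images
    return s * norm(v_reg_max)
```

For example, if the matched-pair vector is twice as long in the input image, the search circle's radius doubles as well, which is what lets the extraction track scale changes.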


image. Firstly, the proposed method extracts feature points from the trimmed registered image and the trimmed input image, and performs matching based on the BRISK descriptor between the trimmed registered image and the trimmed input image. Finally, the proposed method rotates the trimmed input image by using the extracted correspondence points and expression (2), which we represent as follows:

        ( x2 )       ( x1 )   ( h11 h12 h13 ) ( x1 )
    λ * ( y2 )  =  H ( y1 ) = ( h21 h22 h23 ) ( y1 ),    (2)
        ( 1  )       ( 1  )   ( h31 h32 h33 ) ( 1  )

where H is the homography matrix, (x1, y1) are the coordinates before the rotation, (x2, y2) are the coordinates after the rotation, and λ is a coefficient.

[Figure 11. In case that the input image rotates. a) Feature amounts of the hue histogram of the registered image. b) Feature amounts of the hue histogram of the input image.]

2.5 Object Area Division

To generate the local hue histograms, the proposed method divides the trimmed registered image and the rotated input image into plural areas. Here, we define the number of division as four as an example.

2.6 Local Hue Histogram Generation

Firstly, the proposed method extracts hue from each divided area and generates local hue histograms. The hue value of the generated histogram is represented from 0 to 359. Fig. 12 shows the divided areas and local hue histograms. We define the vertical axis as the frequency, and the horizontal axis as the hue value.

[Figure 12. Divided areas and local hue histograms.]

Here, we pay attention to one item of Fig. 12. Fig. 13(a) shows an expanded hue histogram of this item of Fig. 12. In Fig. 13(a), because there are small irregularities at hue values 7 and 9, the feature amounts become unstable if the positions of the peaks and troughs of the hue histogram are detected in this state. Secondly, to eliminate those small irregularities, the proposed method smooths the hue histogram by using a Gaussian filter. Fig. 13(b) shows the smoothed hue histogram of Fig. 13(a). In Fig. 13(b), we can see that the small irregularities of the hue histogram are removed and the characteristic positions of the peaks and troughs of the hue histogram remain.

[Figure 13. Smoothing processing. a) A part of a hue histogram. b) A part of the smoothed hue histogram.]

Finally, the proposed method extracts the positions of the peaks of the smoothed hue histogram from each divided area of the registered image and the input image by using

    (h(i-1) < h(i)) AND (h(i) > h(i+1)),    (3)

and the positions of the troughs of the smoothed hue histogram from each divided area of the registered image and the input image by using

    (h(i-1) > h(i)) AND (h(i) < h(i+1)),    (4)

and registers them as invariant features, where h(i) is the frequency of the hue value which is focused on, h(i-1) is the frequency of the hue value one before it, and h(i+1) is the frequency of the hue value one after it. And then, the extracted positions of the peaks and troughs of the registered image are expressed by

    {p1(n), p2(n), ...} = P(n),    (5)
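The smoothing and the peak/trough tests of expressions (3) and (4) can be sketched as follows. This is a pure-Python sketch; the small wrap-around kernel is our stand-in for the paper's Gaussian filter, and hue wrapping around 0-359 is our assumption.

```python
def smooth_hist(hist, kernel=(0.25, 0.5, 0.25)):
    """Circular smoothing of a hue histogram (hue wraps around)."""
    n, k = len(hist), len(kernel)
    half = k // 2
    return [sum(kernel[j] * hist[(i + j - half) % n] for j in range(k))
            for i in range(n)]

def find_peaks_and_troughs(hist):
    """Peaks satisfy h[i-1] < h[i] > h[i+1], as in expression (3);
    troughs satisfy h[i-1] > h[i] < h[i+1], as in expression (4)."""
    peaks = [i for i in range(1, len(hist) - 1)
             if hist[i - 1] < hist[i] > hist[i + 1]]
    troughs = [i for i in range(1, len(hist) - 1)
               if hist[i - 1] > hist[i] < hist[i + 1]]
    return peaks, troughs
```

Running the detector on the smoothed histogram rather than the raw one is what suppresses the one-bin irregularities described for Fig. 13(a).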


    {v1(n), v2(n), ...} = V(n),    (6)

where P(n) is the set of peak positions of the smoothed hue histogram in a divided area n, p1(n), p2(n), ... are the individual peak positions, V(n) is the set of trough positions of the smoothed hue histogram in the divided area n, and v1(n), v2(n), ... are the individual trough positions. In addition, the extracted positions of the peaks and troughs of the input image are expressed by

    {p'1(n), p'2(n), ...} = P'(n),    (7)

    {v'1(n), v'2(n), ...} = V'(n),    (8)

where P'(n) is the set of peak positions of the smoothed hue histogram in the divided area n of the input image, p'1(n), p'2(n), ... are the individual peak positions, V'(n) is the set of trough positions of the smoothed hue histogram in the divided area n, and v'1(n), v'2(n), ... are the individual trough positions.

2.7 Invariant Feature Comparison

Firstly, the proposed method calculates difference values between the invariant features of the registered image and the invariant features of the input image by using

    dp1 = min |p1(n) - p'(n)|,    (9)
    dp2 = min |p2(n) - p'(n)|,    (10)
    dv1 = min |v1(n) - v'(n)|,    (11)
    dv2 = min |v2(n) - v'(n)|,    (12)

    Dp = Σ(i=1 to Np) dpi,    (13)
    Dv = Σ(i=1 to Nv) dvi,    (14)
    D = Dp + Dv,    (15)

where Np is the number of peaks of the smoothed hue histogram in a divided area n, Nv is the number of troughs of the smoothed hue histogram in the divided area n, dp1, dp2, ... are the smallest difference values between each peak position p1(n), p2(n), ... of the registered image and the peak positions in P'(n) of the input image, dv1, dv2, ... are the smallest difference values between each trough position v1(n), v2(n), ... of the registered image and the trough positions in V'(n) of the input image, Dp is the total difference value of the peak positions, Dv is the total difference value of the trough positions, and D is the total difference value. As an example, Fig. 14 shows the comparison of the hue histogram of a registered image with the hue histogram of an input image. As shown in Fig. 14, the proposed method compares the positions of the peaks and troughs of the smoothed hue histogram of the registered image with those of the smoothed hue histogram of the input image, and calculates the difference values. In addition, the proposed method registers the peak and trough having the smallest difference value as the nearest peak and trough. Finally, the proposed method detects the object which has the smallest D.

[Figure 14. Comparing invariant features of the registered image with invariant features of the input image.]

3 EXPERIMENT

3.1 Experiment Relating to the Number of Division of an Image

1) Experiment Overview: In this experiment, to determine the optimum number of division, the proposed method detected a registered object from an input image while the number of division was changed from 1 piece to 25 pieces. Furthermore, we calculated the correct answer ratio by using

    R = (C / O) * 100 [%],    (16)

and defined the number having the highest correct answer ratio as the number of division, where R is the correct answer ratio, C is the number of objects which the proposed method could correctly detect, and O is the number of registered objects.
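The comparison of Section 2.7 — each registered peak/trough matched to the nearest input peak/trough, with the differences summed into D — can be sketched as follows (pure Python; the per-area list-of-lists format and the names are our assumptions):

```python
def total_difference(reg_peaks, reg_troughs, in_peaks, in_troughs):
    """D = Dp + Dv: for every divided area, sum the distance from each
    registered peak (trough) to the nearest input peak (trough)."""
    def nearest_sum(reg, cand):
        return sum(min(abs(r - c) for c in cand) for r in reg) if cand else 0
    d_p = sum(nearest_sum(rp, ip) for rp, ip in zip(reg_peaks, in_peaks))
    d_v = sum(nearest_sum(rt, it) for rt, it in zip(reg_troughs, in_troughs))
    return d_p + d_v
```

The candidate object with the smallest D is taken as the detection result, matching the final step described above.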


We selected 5 pieces of normal objects (Fig. 15(a)), 5 pieces of objects which have few textures (Fig. 15(b)), and 5 pieces of objects which have the same hue value (Fig. 15(c)), totaling 15 pieces of objects from the Amsterdam Library of Object Images [8], as the test images. For the objects which have the same hue value, we selected objects which have hue values of yellow and red.

[Figure 15. Features of the objects. a) Normal object. b) Object which has few textures. c) Object which has the same hue value.]

2) Experimental Results: Fig. 16 shows the result of this experiment. As a result, the correct answer ratio increased as the number of division increased. In addition, the correct answer ratio became highest in case that the number of division was four, and it was maintained afterwards. However, because we could not determine the most suitable number of division from this result, we carried out the experiment on occluded objects. Fig. 17 shows the result of the experiment on the occluded objects. As a result, the correct answer ratio became highest in case that the number of division was four. However, the correct answer ratio repeatedly increased and decreased between 80% and 100%. From these results, we defined the number of division as four pieces in this paper.

[Figure 16. The correct answer ratio for the number of division.]

[Figure 17. The correct answer ratio for the number of division on the occluded objects.]

3) Discussions: From these experimental results, we understood that the optimal number of division differs according to the objects. Therefore, we saw that it is necessary to statistically calculate the optimal parameter to increase the correct answer ratio.

3.2 Experiment of Robustness Against All Differences of Visual Appearance

1) Experiment Overview: In this experiment, to show the robustness of the proposed method against the seven properties mentioned above, we carried out the experiment while giving changes to the objects, and compared the correct answer ratio of the proposed method with the correct answer ratios of the conventional methods. We used the same objects as in the experiment on the number of division as the inspection objects. In addition, we calculated the correct answer ratio in the same way as in Section 3.1. Here, we show the settings of each method. In the proposed method, we defined the number of division as four pieces, the threshold of the peak as 40, the threshold of the trough as 5, and σ of the Gaussian filter as 1.0. In the conventional method, we defined σ of the Gaussian filter as 1.0. In the SIFT, we defined the threshold of the correspondence points as 70.0; in addition, we defined the object which has more than ten correspondence points and the most correspondence points as a detection result. In Color Indexing, we defined 255 gradations divided into four as each axis of the histogram.

2) Experiment Results: For the rotation change and the scale change, we indicate only the result of the proposed method, without comparing it with the conventional methods, because both the proposed method and the conventional methods are robust against the rotation change and the scale change.

a) The scale change: We carried out the experiment while increasing the scale level from 0.8 times to 1.2 times every 0.2. As a result of this experiment, the correct answer ratio was 100% at each scale level.

b) The rotation change: We carried out the experiment while increasing the rotation degree from 0 degrees to 180 degrees every 90 degrees. As a result of this experiment, the correct answer ratio was 100% at each rotation degree.


We indicate the results of the experiments on robustness against the illumination change, the distortion by perspective projection and occlusion.

c) The illumination change: We carried out the experiment while increasing the gradation levels from -20 gradations to +20 gradations every 20 gradations. Fig. 18 shows the result of the experiment on the illumination change. As shown in Fig. 18, the correct answer ratio of the proposed method was high, whereas the correct answer ratio of Color Indexing was low, as we expected.

[Figure 18. The result of the experiment about the illumination change.]

d) The distortion by perspective projection: We carried out the experiment for rotations of 30 degrees and 45 degrees. Fig. 19 shows the result of the experiment on the distortion by perspective projection. As shown in Fig. 19, the correct answer ratio of the proposed method was high, whereas the correct answer ratio of the SIFT was low, as we expected.

[Figure 19. The result of the experiment about the distortion by perspective projection.]

e) The occlusion: We carried out the experiment while increasing the occlusion ratio from 10 percent to 40 percent every 10 percent. As an example, we show an occluded object in Fig. 20. Fig. 21 shows the result of the experiment on occlusion. As shown in Fig. 21, the correct answer ratio of the proposed method was higher than the correct answer ratios of the conventional methods, as we expected.

[Figure 20. The occluded object.]

[Figure 21. The result of the experiment about occlusion.]

In addition, we focus on the features of each object. Fig. 22 shows the correct answer ratio for each object. As shown in Fig. 22, the proposed method and Color Indexing got a high correct answer ratio for all features, whereas the SIFT got a low correct answer ratio for the object which has few textures. In addition, the conventional method got a low correct answer ratio for the object having the same hue value and the normal object.

[Figure 22. The correct answer ratio on features of objects.]

3) Discussions:

a) The proposed method: The correct answer ratio was higher than or nearly equal to those of the conventional methods. However, the correct answer

ISBN: 978-1-941968-43-72017 SDIWC 40

Proceedings of The Fourth International Conference on Artificial Intelligence and Pattern Recognition (AIPR2017), Lodz, Poland, 2017

ratio decreased a little in case of 45 degrees of the the human support robot. To show that the proposed
distortion by perspective projection, because the method satisfies all seven properties as follows,
proposed method uses local feature amounts on
candidate object extraction and rotation processing. 1. Robustness against the rotation change
2. Robustness against the scale change
b) The conventional method: The correct
3. Robustness against the illumination change
answer ratio was less than 60% in all changes. 4. Robustness against the distortion by perspective
However, we see that there are few differences by projection
comparing the correct answer ratio in the object 5. Robustness against the occlusion
which a change is given and the correct answer ratio 6. Detecting an object which has few textures
in the object which a change is not given. Therefore, 7. Detecting different objects which have the same
we focus on the feature of the each object. As features of the hue histogram
shown in fig.22, the correct answer ratio was low in
the object having the same hue value and the We carried out experiments. As a result, we could
normal object. The reason for this, when the show that the proposed method satisfies all seven
candidate objects have plural pieces of color properties. However, the detection accuracy is
information, they have the same hue value with limited, because the proposed method uses only two
high probability, therefore it is thought that dimensional information. Therefore, in the future,
erroneous recognition occurred and the correct we aim to improve the ability for detection by using
answer ratio decreased. three dimensional information such as the shape
c) The SIFT: The correct answer ratio was less information.
than 60% in all changes. Especially, the correct
answer ratio decreased on the distortion by REFERENCES
perspective projection. The reason for this, the local [1] S. Sugano, T. Sugaiwa, and H. Iwata, Vision System for
Life Support Human-Symbiotic-Robot, The Robotics
feature amount changes by the distortion by Society of Japan, vol. 27, pp. 596599, 2009.
perspective projection, therefore it is thought that [2] T. Odashima, M. Onishi, K. Thara, T. Mukai, S. Hirano,
Z. W. Luo, and S. Hosoe, Development and evaluation
the SIFT descriptor changes and the correct answer of a human-interactive robot platform RI-MAN, The
ratio decreased. In addition, we focus on the feature Robotics Society of Japan, vol. 25, pp.554565, 2007.
of the each object. As shown in fig.22, the correct [3] Y. jia, H. wang, P. Sturmer, and N. Xi, Human/robot
interaction for human support system by using a mobile
answer ratio was low in the object which has few manipulator, ROBIO, pp. 190195, 2010.
textures. The reason for this, the SIFT descriptor [4] H. Fujiyoshi, Gradient-Based Feature Extraction SIFT
and HOG-, IPSJ SIG Technical Report CVIMI160,
uses the gradient information of the object, pp.221224, 2007.
therefore it is thought that the SIFT could not detect [5] M. j. Swain, and D. H. Ballard, Color Indexing, IJCV,
vol. 7, pp.1132, 1991.
the feature amount on the object which has few
[6] K. Tanaka, Y. Hagiwara, and H. Imamura, Object
edges such as the object which has few textures, Dtection in Image Using Feature of Invariant based on
and the correct answer ratio decreased. Histogram of Hue, IEICE, pp.187194, 2011.
[7] Leutenegger, S., Chli, M., & Siegwart, R. Y. (2011,
d) The Color Indexing: The correct answer ratio November). BRISK: Binary robust invariant scalable
decreased on the illumination change. The reason keypoints. In Computer Vision (ICCV), 2011 IEEE
International Conference on (pp. 2548-2555). IEEE.
for this, the RGB color system which is used for the [8] http://aloi.science.uva.nl/
Color Indexing is easy to be affected by the
illumination change, therefore it is thought that the
value of the three dimensional color histogram
changed and the correct answer ratio decreased.
In this study, we proposed the object detection
method using invariant feature based on the local
hue histogram in a divided area of an object area for

ISBN: 978-1-941968-43-72017 SDIWC 41


Handwriting Text/Non-Text Classification on Mobile Device

Viacheslav Khomenko, Andriy Volkoviy, Illya Degtyarenko and Olga Radyvonenko

Samsung R&D Institute Ukraine (SRK)
57, Lva Tolstogo Str., Kyiv, 01032, Ukraine
{v.khomenko, a.volkoviy, i.degtyarenk, o.radyvonenk}@samsung.com

ABSTRACT

This paper is dedicated to the classification of handwritten/drawn input made on the screen of mobile devices into two classes: Text and Non-Text. A deep-learning solution using gated recurrent and feed-forward artificial neural networks is proposed. Two approaches are compared: a real-time approach, designed to process data at input time with preliminary stroke grouping, and a batch processing approach, designed for the analysis of completed handwriting documents, which has access to the document context and performs text line grouping after classification.

The presented solutions have been validated on the benchmark IAMonDo dataset [1] and on the specially collected Samsung Mobile HandWriting Document (MHWD) dataset, containing about 10 000 free-form documents combining unconstrained handwriting in seven different languages with heterogeneous non-text elements. The obtained precision for the text class is 98.09% and the recall for the text class is 99.07% for the proposed batch processing approach.

The results of this research have become the basis for the development of a Document Structure Analysis Engine targeting the mobile platform and included in the Samsung Handwriting Recognition Solution.

KEYWORDS

Handwriting, Recognition, Classification, Neural network, Mobile platform.

1 INTRODUCTION

Currently, human-machine interaction is undergoing significant changes under the influence of the advent of computationally powerful touch-screen devices: smartphones, phablets and interactive whiteboards [2]. Note-taking and processing applications [3] as well as intelligent tutoring systems [4] are an example of the continuous development and expanding scope of such solutions.

The information in online digital handwriting documents is presented as a sequence of strokes. A stroke contains point coordinates from pen-down to pen-up. Depending on the particular hardware (sensors) and software (API), additional information can be available; e.g. the SPenSDK provides information about the type of tool that was used for writing (s-pen, finger, mouse, etc.), time stamps, values of pressure, tilt and orientation, as well as features of the brush selected for drawing [5].

The key problem of document recognition with free-form input lies in the selection of the recognition engine that should be applied for processing of the corresponding handwritten textual or hand-drawn non-textual data element [6]. Thus, the purpose of document structure analysis (DSA) is structuring a handwritten document into elements through segmentation and classification [7], and this research is aimed at developing such an analysis system (DSA engine) that can be a subsystem of the complex solution for free-form handwriting recognition presented in Fig. 1.

The DSA engine receives strokes from the input source and performs their classification and grouping into elements. The classified data elements are further sent to the corresponding recognition engines. In this work, 2 classes are used: Text and Non-Text. For strokes classified as Non-Text, two strategies can be used: all engines dedicated to recognition of non-text content receive the non-text strokes (as per Fig. 1), or an additional classifier is involved to perform further separation (as per Fig. 2). The strokes classified as Text must first be recognized as characters to make further processing possible.

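The stroke representation described in the introduction can be made concrete with a minimal container; the field names, the optional pressure channel and the `deltas` helper are illustrative assumptions, not the paper's actual data model:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Point:
    x: float
    y: float
    t: float                          # time stamp
    pressure: Optional[float] = None  # extra channel, if the sensor provides it

@dataclass
class Stroke:
    """All points captured between one pen-down and the next pen-up."""
    points: List[Point] = field(default_factory=list)
    tool: str = "s-pen"               # e.g. s-pen, finger, mouse

    def deltas(self):
        """Per-point (dx, dy) differences, the kind of point-level
        feature used later for real-time stroke grouping."""
        return [(b.x - a.x, b.y - a.y)
                for a, b in zip(self.points, self.points[1:])]

s = Stroke([Point(0, 0, 0.00), Point(3, 4, 0.01), Point(6, 8, 0.02)])
assert s.deltas() == [(3, 4), (3, 4)]
```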


The text recognition engine transforms handwritten strokes into sequences of characters. The shape recognition engine recognizes a set of known shapes, diagram axes and table grids. The pattern recognition engine recognizes correction and edit patterns (such as underlines, strikethrough, insertion, splitting, etc.). Formulas are processed by the formula recognition engine. Drawings and accidental strokes that cannot be attributed to any of the known classes are classified as free-drawing objects. The recognition engine outputs are further combined with the document structure.

Two principal scenarios are possible: real-time and batch processing modes. In the real-time recognition mode, strokes are processed immediately after input. At the same time, the context of the document is not available.

Figure 1. Subsystems of free-form handwriting recognition system.

Figure 2. Data flow in complex solution for free-form handwriting recognition.

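The stroke routing between the DSA engine and the recognition engines (Fig. 1 and Fig. 2) can be sketched as a simple dispatcher; the engine names and both classifier placeholders are illustrative assumptions, not the actual Samsung implementation:

```python
# Hypothetical sketch of the data flow in Fig. 2: a Text/Non-Text classifier
# feeds a second-level classifier that picks a specialized engine.
ENGINES = {"text", "shape", "pattern", "formula", "free-drawing"}

def classify_text_nontext(stroke_group):
    # Placeholder for the GRU-based classifier described in Section 3.
    return "text" if stroke_group.get("looks_like_text") else "non-text"

def classify_nontext_kind(stroke_group):
    # Placeholder second-level separation (shape / pattern / formula / drawing).
    return stroke_group.get("kind", "free-drawing")

def route(stroke_group):
    """Return the engine that should receive this group of strokes."""
    if classify_text_nontext(stroke_group) == "text":
        return "text"                 # must be recognized as characters first
    kind = classify_nontext_kind(stroke_group)
    return kind if kind in ENGINES else "free-drawing"

assert route({"looks_like_text": True}) == "text"
assert route({"kind": "formula"}) == "formula"
assert route({"kind": "unknown"}) == "free-drawing"
```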


The batched mode shows the best accuracy because it makes use of the document context. The analysis of already completed documents should be performed with the highest possible precision. Taking into account the requirements for the text recognition engine, the requirements for DSA are stated as: precision for the text class of 98%, and recall of 99%.

The proposed solution was validated using the benchmark IAMonDo dataset [1] and the Mobile HandWriting Document (MHWD) dataset.

The structure of the paper is as follows: after this introduction, section 2 presents a review of published research. Section 3 describes the proposed approach and feature design. The experimental validation is presented in section 4. The conclusion is given at the end of the paper.

2 PUBLISHED RESEARCH

The issues of automatic processing and recognition of handwritten input have been discussed in the scientific literature over the last decade, and the classification of hand-drawn data into text/non-text objects is one of the main problems that, when solved, creates a basis for successful text recognition. In particular, Delaye et al. [8] proposed a context-based text/non-text classifier which establishes the current state-of-the-art performance for free-form handwriting documents. It is based on conditional random fields and separates text blocks from diagrams and tables. Recognition rates achieved on the IAMonDo dataset [8] are the following: 99.58% for text blocks; 98.95% for tables; 88.88% for diagrams.

Van Phan et al. [9] proposed a novel method for text/non-text classification in online handwritten documents based on the long short-term memory (LSTM) recurrent neural network. It shows a classification rate of 97.68% on the IAMonDo dataset. Stroke grouping into shapes and text-line grouping approaches are analyzed in [10] and [11] respectively. Nevertheless, the complete problem containing both classification and segmentation tasks is rarely analyzed in the published works.

In this work, two machine learning based solutions for real-time and batch processing modes are proposed.

3 PROPOSED SOLUTIONS

3.1 Model architecture

Due to the sequential nature of the stream of strokes, recurrent neural networks (RNN) provide high performance for online handwriting data analysis. Moreover, gated recurrent neural networks learn routing activations with a gating mechanism that controls update and reset actions in the recurrent unit, thus allowing the system to learn long-term dependencies. The gated recurrent unit (GRU) has been recently presented in [12]. Compared to LSTM [13], [14], the GRU shows faster convergence during training in terms of wall clock time and number of iterations [15]. The GRU is also faster in terms of prediction time because it has fewer parameters. The GRU model is described by the following equations [12]:

    r_t = σ_r(W_r x_t + U_r h_{t-1} + b_r)
    u_t = σ_u(W_u x_t + U_u h_{t-1} + b_u)
    c_t = tanh(W_c x_t + U_c (r_t ∘ h_{t-1}) + b_c)        (1)
    h_t = (1 - u_t) ∘ h_{t-1} + u_t ∘ c_t

where x_t is the input vector at time t; r_t and u_t are the reset and update gate vectors; c_t is the cell output vector; h_t is the hidden state vector; W and U denote weight matrices, b are bias vectors, and ∘ denotes element-wise multiplication. The activation functions σ_r and σ_u are sigmoids, and the activation of c_t is the hyperbolic tangent.

The proposed artificial neural network architectures for the real-time and batched approaches are presented in Fig. 3 and Fig. 4. We apply polygonal approximation [16] to the strokes' raw data points to robustly reduce the number of stroke points and improve system quality.

The input layer receives normalized feature vectors. Dropout layers (p = 0.5) are active only during training and are included to improve the network generalization performance. The class conditional probabilities are obtained at



the output of the softmax layer. The objective training function is the categorical cross-entropy. The prediction result is decoded by argmax. Training has been performed using mini-batch gradient descent. The model parameter update rule was ADADELTA [17], ensuring good parameter convergence.

Figure 3. Network architectures for real-time stroke grouping and batched stroke classification.

Figure 4. Network architectures for real-time stroke group classification and batched text line grouping.

3.2 Feature design

3.2.1 Real-time approach

To achieve better performance compared to existing approaches, a modular classifier that processes the stroke stream in real time by merging strokes into groups has been proposed. The grouping is performed by the forward GRU RNN with stroke point features (Table 1). The multi-layer feed-forward neural network performs classification of the groups of strokes into text or non-text objects.

Table 1. Features for real-time stroke grouping.

    Feature            Description
    Δx                 Difference between current and past points along the X coordinate
    Δy                 Difference between current and past points along the Y coordinate
    Stroke-begin flag  Signature of the beginning of the stroke (1, otherwise 0)

A set of features that operate on the groups of strokes, to increase accuracy compared to stroke-level features, has been developed (Table 2).

Table 2. Features for real-time group classification.

    Normalized sum of trace return
    Squared normalized sum of trace return
    Normalized number of writing direction changes
    Vicinity aspect [18]
    Chain code histogram features [19]
    Normalized RMS of linearity error
    Y coordinate histogram features
    and their combinations

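A minimal NumPy sketch of the GRU step of Eq. (1), run over per-point stroke features such as those in Table 1; the weights here are random and purely illustrative:

```python
import numpy as np

def gru_step(x, h, Wr, Ur, br, Wu, Uu, bu, Wc, Uc, bc):
    """One GRU update following Eq. (1): reset gate r, update gate u,
    candidate c, new hidden state h."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    r = sigmoid(Wr @ x + Ur @ h + br)
    u = sigmoid(Wu @ x + Uu @ h + bu)
    c = np.tanh(Wc @ x + Uc @ (r * h) + bc)
    return (1.0 - u) * h + u * c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 8                   # e.g. (dx, dy, stroke-begin flag) per point
params = [rng.standard_normal(s) * 0.1
          for s in [(n_hid, n_in), (n_hid, n_hid), n_hid] * 3]
h = np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):   # five stroke points
    h = gru_step(x, h, *params)
assert h.shape == (n_hid,)
```

Since each update is a gated interpolation between the previous state and a tanh candidate, the state stays bounded in (-1, 1), which is one reason the unit is well behaved over long stroke sequences.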


3.2.2 Batched approach

For the batched mode, a set of computationally light stroke-level features and a bi-directional GRU for classification of strokes into text and non-text classes have been elaborated. The forward-backward RNN has access to both the local and the global context of the complete document, giving the best classification accuracy. The strokes classified as text are then processed by the text line grouping algorithm with a logistic predictor.

A common set of features for stroke classification and text-line grouping has been developed (Table 3). Features from Table 2 are used together with the features presented in Table 3, which amounts to 16 features.

Table 3. Features for batched mode analysis.

    Stroke angle
    Center angle
    Strokes distance
    Delta center X, delta center Y
    RMS of linearity error
    Angle frequencies

4 EXPERIMENTAL VALIDATION

4.1. Datasets description

The open benchmarking dataset IAMonDo has the following limitations:
1) its ground truth labeling contains errors;
2) it doesn't contain all required shapes; drawings are only free-type sketches;
3) it was collected on an interactive whiteboard instead of a smartphone or phablet;
4) it contains only English script.

In order to overcome these limitations, a special MHWD (Mobile HandWriting Document) dataset has been collected for training and testing of the proposed solutions. It contains about 10 000 screens for 7 language scripts: Korean, Chinese, English, Russian, Arabic, Hindi and Greek.

The hierarchy of the data in the dataset is presented in Fig. 5. The elements considered as non-text have gray fill. The document structure is stored in the InkML format [20].

The text blocks are composed of text lines that comprise words. Text lines may be written with any orientation relative to the frame. Each sample contains at least one correction. The correction types are:
1) insertion of 1 or several symbols into a word;
2) splitting a word into parts;
3) deletion of a symbol in a word;
4) erasing with cross-strikethrough or hatching;
5) strikethrough of one or multiple words;
6) underlining a word;
7) merging several words.

Inline shapes are allowed within text blocks. An unnumbered list comprises bullets and text lines. A diagram comprises shapes with text lines. A table comprises a grid and text lines. Each formula in the dataset contains its definition in LaTeX mathematical notation. Different types of simple and complex (open and closed) shapes have been sampled. Free drawings are sketches that do not belong to any particular shape type.

Figure 5. Hierarchy of hand-drawn/hand-written data.

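Since the dataset is stored in InkML [20], strokes can be read back with a few lines of XML parsing. This is a minimal sketch for plain `<trace>` elements, where points are comma-separated and channels space-separated per the W3C recommendation; real files may carry channel definitions and richer point formats:

```python
import xml.etree.ElementTree as ET

INK = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace>10 0, 9 14, 8 28</trace>
  <trace>130 155, 144 159, 158 160</trace>
</ink>"""

def read_traces(inkml_text):
    """Return each trace as a list of (x, y) floats."""
    root = ET.fromstring(inkml_text)
    traces = []
    for el in root.iter():
        if el.tag.endswith("trace") and el.text:   # namespace-agnostic match
            pts = [tuple(float(v) for v in p.split())
                   for p in el.text.split(",") if p.strip()]
            traces.append(pts)
    return traces

traces = read_traces(INK)
assert len(traces) == 2
assert traces[0][0] == (10.0, 0.0)
```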


The MHWD dataset has been collected and labeled with specially developed tools:
- sampling tool: an Android application used by respondents who created hand-drawn documents on smartphones and tablets with different characteristics (first of all, with different screen sizes);
- labeling tool: a desktop application that operates on documents created in the sampling tool and supports labeling and proofreading with subsequent correction of found labeling errors;
- validation and conversion tool: a console application that converts samples from InkML to different formats, including graphics file formats (JPEG and PNG), and automatically checks some aspects of labeling consistency.

Manual labeling defined a consistent correspondence between each stroke of the document and the reference value. Thus, each item in the hierarchical document structure shown in Fig. 5 is associated with at least one stroke, and a stroke can belong to only one item.

4.2. Results evaluation

The results evaluation has been performed on writer-independent test sets, ensuring that the learning system did not see samples from the test set during training.

The evaluation results of the real-time approach on the IAMonDo dataset are given in Table 4. The accuracy of new group detection is essential for the object grouping algorithm (97%). Taking into account that the classifier operates on only a limited amount of context, the text classification precision (91%) and recall (92%) are quite high.

Table 4. Evaluation results for the real-time approach of stroke grouping and group classification.

    Object grouping
    Class        Precision  Recall  Support
    Same group   0.76       0.90    513378
    New group    0.97       0.91    1676913
    Avg./total   0.92       0.91    2190291

    Group classification
    Class        Precision  Recall  Support
    Text         0.91       0.92    414165
    Non-text     0.67       0.64    104028
    Avg./total   0.86       0.87    518193

The batched approach has been tested on the IAMonDo and MHWD datasets (Table 5). The classification of text strokes has a precision of 98% and a recall of 99%.

Table 5. Evaluation results for the batched approach of stroke classification.

    Dataset     Class        Precision  Recall  Support
    IAMonDo     Text         0.98       0.99    42205
                Non-text     0.96       0.94    13312
                Avg./total   0.97       0.97    55517
    MHWD        Text         0.981      0.991   17998
                Non-text     0.948      0.898   3425
                Avg./total   0.965      0.945   21423

The text-line grouping model ROC area is 0.97 (Fig. 6).

Figure 6. Text-line grouping ROC.

The demo application implementing the proposed solutions has been developed and tested on a Samsung Note 5 smartphone. The average processing speed per document is 60 ms (±20 ms), which is significantly faster than the competitors' solutions.

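The Avg./total rows in Tables 4 and 5 are support-weighted averages of the per-class rows. A generic sketch of that computation (not the authors' evaluation code; since the published per-class values are rounded, the averages agree only to within rounding):

```python
def weighted_avg(rows):
    """rows: list of (precision, recall, support) tuples.
    Returns the support-weighted average precision and recall,
    plus the total support, as in the Avg./total table rows."""
    total = sum(s for _, _, s in rows)
    p = sum(prec * s for prec, _, s in rows) / total
    r = sum(rec * s for _, rec, s in rows) / total
    return p, r, total

# Per-class rows for group classification from Table 4.
rows = [(0.91, 0.92, 414165),   # Text
        (0.67, 0.64, 104028)]   # Non-text
p, r, total = weighted_avg(rows)
assert total == 518193
assert abs(p - 0.86) < 0.01     # Table 4 reports 0.86
assert abs(r - 0.87) < 0.01     # Table 4 reports 0.87
```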


5 CONCLUSION

A fast and highly accurate solution for the automatic analysis of unconstrained handwritten documents has been proposed. Two architectures have been developed, for the real-time and batch processing modes. The novelty of the real-time approach is the analysis of the input data during the user's input. The input strokes are grouped into objects, which improves the classification result. The batched approach processing time is on average 60 ms per document. Due to the availability of the document context, this approach has a very high accuracy: precision for the text class is 98.09% and recall for the text class is 99.07%. The representative MHWD dataset has been collected and labeled. It allowed us to train and validate the developed system. Figures 7 and 8 illustrate differences in the classification accuracy of the described approaches.

Considering the possibility of increasing the number of classes for document structure analysis (the Text/Non-Text classifier can be replaced with a new one supporting three or more of the following classes: Text, Shape, Formula, Correction, Bullet, Drawing), further studies will be required to determine the effect of such development.

Figure 7. Example of inaccuracies in the classification of formulae: a) real-time approach; b) batched approach.

REFERENCES

[1] E. Indermühle, M. Liwicki and H. Bunke, "IAMonDo-database: an online handwritten document database with non-uniform contents," in Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, 2010.
[2] R. Plamondon and S. N. Srihari, "Online and off-line handwriting recognition: a comprehensive survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63-84, 2000.
[3] E. M. Stacy and J. Cain, "Note-taking and Handouts in The Digital Age," American Journal of Pharmaceutical Education, vol. 79, no. 7, pp. 1-6, 2015.
[4] S. Cheema, "Pen-based Methods for Recognition and Animation of Handwritten Physics Solutions," Doctoral dissertation, University of Central Florida, Orlando, Florida, 2014.
[5] S-Pen SDK 2.3 Tutorial: Technical Docs [Online]. Available: http://developer.samsung.com/s-pen-sdk/technical-docs/S-Pen-SDK-2-3-Tutorial
[6] E. Indermühle, "Analysis of Digital Ink in Electronic Documents," Doctoral dissertation, University of Bern, 2012.
[7] T. A. Tran, I. S. Na and S. H. Kim, "Page segmentation using minimum homogeneity algorithm and adaptive mathematical morphology," International Journal on Document Analysis and Recognition (IJDAR), pp. 1-19.
[8] A. Delaye and C.-L. Liu, "Text/non-text classification in online handwritten documents with conditional random fields," Pattern Recognition, pp. 514-521, 2012.
[9] T. Van Phan and M. Nakagawa, "Text/non-text classification in online handwritten documents with recurrent neural networks," in Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference, 2014.
[10] E. J. Peterson, T. F. Stahovich, E. Doi and C. Alvarado, "Grouping Strokes into Shapes in Hand-Drawn Diagrams," in AAAI Conference on Artificial Intelligence, 2010.
[11] X.-D. Zhou, D.-H. Wang and C.-L. Liu, "A robust approach to text line grouping in online handwritten Japanese documents," Pattern Recognition, vol. 42, no. 9, pp. 2077-2088.
[12] K. Cho, B. van Merriënboer, D. Bahdanau and Y. Bengio, "On the Properties of Neural Machine Translation: Encoder-Decoder Approaches," in Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, 2014.
[13] E. Indermühle, V. Frinken and H. Bunke, "Mode detection in online handwritten documents using BLSTM neural networks," in Frontiers in Handwriting Recognition (ICFHR), IEEE International Conference, 2012.
[14] S. Otte, D. Krechel, M. Liwicki and A. Dengel, "Local feature based online mode detection with recurrent neural networks," in Frontiers in Handwriting Recognition (ICFHR), IEEE International Conference, 2012.
[15] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
[16] U. Ramer, "An iterative procedure for the polygonal approximation of plane curves," Computer Graphics and Image Processing, vol. 1, no. 3, pp. 244-256, 1972.
[17] M. D. Zeiler, "ADADELTA: An adaptive learning rate method," arXiv preprint arXiv:1212.5701, 2012.
[18] M. Liwicki and H. Bunke, "Feature selection for on-line handwriting recognition of whiteboard notes," in Proceedings of the Conference of the Graphonomics Society, 2007.
[19] I. Siddiqi and N. Vincent, "A set of chain code based features for writer recognition," in Document Analysis and Recognition, ICDAR '09, 10th International Conference, 2009.
[20] Y.-M. Chee, K. Franke, M. Froumentin, S. Madhvanath, J.-A. Magaña, G. Pakosz, G. Russell, M. Selvaraj, G. Seni, C. Tremblay and L. Yaeger, "Ink Markup Language (InkML)," W3C Recommendation, 20 September 2011. [Online]. Available: http://www.w3.org/TR/2011/REC-InkML-20110920/

Figure 8. Example of inaccuracies in the classification of corrections and parts of diagrams: a) real-time approach; b) batched approach.



Multi Feature Region Descriptor based Active Contour Model for Person Tracking

Chadia Khraief, Faouzi Benzarti and Hamid Amiri

National Engineering School of Tunis (ENIT), Signal, Image and Technology Information
(LR-SITI), Manar, Tunisia
chadiaKhraief@gmail.com, benzartif@yahoo.fr, hamidlamiri@gmail.com

ABSTRACT

In this paper, we propose a new region-based active contour method for person tracking. The method combines multiple cues such as color, texture and shape information to track the objects. The extracted features are enrolled in a covariance matrix which captures not only each feature's variation but also their correlations. The tracking is formulated by minimizing an energy functional using the level-set method. The proposed method is robust to illumination and appearance changes, deformations, scale variations and occlusion. Experimental results confirm its efficiency and accuracy for many applications such as smart home monitoring and video surveillance.

KEYWORDS

Person tracking, covariance region descriptor, video sequence analysis, region-based active contour, level-set method, multi feature.

1 INTRODUCTION AND RELATED WORK

Person tracking is one of the most important research topics in computer vision; it is widely used in video surveillance, security, recognition of abnormal behavior and, more recently, support of the elderly living on their own [1]. The aim of tracking is to automatically find the same target in adjacent frames of a sequence once it is initialized. Person tracking remains a challenging problem due to different illumination conditions, viewpoints, pose, scale and occlusion. A large number of tracking algorithms have been proposed; they mostly differ from each other in the object representation [2]: model-based, appearance-based and active contour-based methods. In contrast to general object tracking, which represents objects using predefined shape models such as rectangles or ellipses [3], active contours try to overcome the rigidness of templates by fitting flexible structures to the human body, and so grant more accurate and detailed object shape information.

In this paper we focus on the active contour method, since it has recently proven to be a successful and reliable tool for tracking objects [7-11]. The level-set method is one of the most important active contour models [11] thanks to its advantages over parametric representations such as snakes. In fact, topology changes are handled automatically and parameterization is not required. The proposed works based on active contours are divided into three categories: edge-based, shape-based and region-based methods. Edge-based methods [23] usually rely on the image gradient in order to find the boundaries of objects that have strong intensity edges. Shape-based methods [24] estimate models to define the contours. They are useful in complex scenes, principally when detecting occluded or imprecise object contours, but they are time-consuming and it is very difficult to achieve real-time tracking. The third approach uses regional attributes to stop the deforming curve instead of looking for geometrical boundaries. The authors in [9] propose a tracking method that matches the object color histogram between successive frames of a sequence using active contours. However, the histogram matching efficiency may drastically decrease if the object undergoes intensity variations due to noise or illumination changes. To overcome the dependence on color, multi-cue tracking



algorithms were proposed in [16-17]. These methods are based on different complementary features in order to get more robust tracking results, because the performance of a single cue may degrade due to the complex nature of human appearance and environment challenges [4]. Hu et al. [10] proposed to integrate many features into the level-set method. It shows its effectiveness in many challenging situations, but it consumes much time and memory because each feature is evolved independently.

Recently, the region covariance descriptor was proposed in order to fuse multiple features in a low-dimensional representation [5, 12]. It is able to capture not only each feature's variation but also their correlations. It has been proven to be a very efficient feature for visual tracking [6, 13, 14, 15] in several tracking tasks, and it outperforms histogram matching methods. The original covariance tracker belongs to the appearance-based tracking approach. It estimates the new object position by finding the covariance descriptor that has the minimum covariance distance to the object model. It is a time-consuming algorithm. In addition, the tracking result depends on initialization. Indeed, if the model incorporates information from the background, the resulting model is not only representative of the tracked object but also influenced by the immediate environment of the target.

Motivated by the convincing results of active contours, the benefits of using complementary features and the generic object representation of the covariance region descriptor, we propose a new active contour based on multiple region features encoded in a covariance region descriptor.

The paper is organized as follows. Section 2 provides a brief mathematical formulation of the level-set method. Section 3 outlines the proposed approach. In Section 4, the target appearance model using multiple features and the covariance region descriptor is explained. Level-set based region evolution is detailed in Section 5. The quantitative and qualitative evaluations are presented in Section 6. Section 7 concludes the paper and suggests future perspectives.

2 LEVEL-SET BACKGROUND

The basic idea of the level-set method [13] is to evolve an initial contour until it reaches the boundaries of objects. The object contour is described by an implicit representation where the contour is encoded as a signed distance map. The deformed contour C is considered as the zero level set of a higher-dimensional function φ applied to the image spatial domain, as illustrated by figure 1 and expressed by eq. (1), where t is a point in time and (x, y) is a point in space:

C = {(x, y) | φ(x, y, t) = 0} (1)

The level set function φ can be evolved by solving the partial differential equation (PDE):

∂φ/∂t = F |∇φ| (2)

where F is the speed function and ∇ is the gradient operator.

Figure 1. The level set function.

Thus, C deforms iteratively according to the speed function F until it reaches the border of the object, through the minimization of an energy computed from different criteria. The minimization process moves the points of the curve until it reaches the border of the target object.

The tracking results depend on the accurate choice of the speed function F. We have proposed a new level-set method based on the covariance region descriptor, which is detailed in the next section.

ISBN: 978-1-941968-43-7 ©2017 SDIWC 51

Proceedings of The Fourth International Conference on Artificial Intelligence and Pattern Recognition (AIPR2017), Lodz, Poland, 2017

3 PROPOSED METHOD

The framework of the proposed tracking algorithm is illustrated by Figure 2. The first step of our method is target detection. We use the most common object localization method: a rectangle is drawn around the object manually. The rectangle is used as the initial contour in the first frame. Then, the target is defined by a covariance region descriptor that fuses color, texture and shape features. This initial contour is then evolved by minimizing an energy term using the level-set method. After detecting the accurate boundaries of the tracked object, we use this contour as the initial level-set contour in the next frame.

Figure 2. Overview of the proposed method.

Selecting appropriate and complementary features is crucial for good performance and accurate tracking. The chosen features are described in the next section.

4 TARGET APPEARANCE MODEL

Let I be a three-dimensional color image and let F be the W × H × d dimensional feature image extracted from I. The covariance of a region is computed as:

C_i = (1 / (N − 1)) Σ_{n=1}^{N} (f_n − μ)(f_n − μ)^T (3)

where N is the number of points in the region and μ is the mean vector of all the feature vectors.

We have combined shape features f_Shape, texture features f_LBP and color features f_HS. These features are detailed below.

4.1 Color Features: HS Color Space

HSV is a perceptual color space that describes color intuitively, similarly to human perception. The value component (V) indicates the brightness, while the hue (H) and saturation (S) components represent the chromaticity. The H and S components are scale and shift invariant with respect to light intensity, but the V component is not [19]. So, we have used only the hue (H) and saturation (S) components.

4.2 Texture Features

The local binary pattern (LBP) is a very efficient texture feature descriptor, and it has already proved its high discriminative power in many applications [18]. The LBP is an illumination-invariant texture feature because it is invariant to monotonic gray-level changes. It labels the neighborhood of each pixel with a binary number according to a predefined set of comparisons. The operator takes a local neighbourhood around each pixel, thresholds the pixels of the neighbourhood at the value of the central pixel and uses the resulting binary-valued image patch as a local image descriptor. A binomial factor of 2^i is then assigned to each pixel. The LBP code is computed by multiplying the thresholded values by their corresponding weights and summing the results:

LBP_{P,R} = Σ_{i=0}^{P−1} S(g_i − g_c) 2^i

S(x) = { 1, if x ≥ 0
         0, if x < 0 } (4)

where P is the number of neighbours, R is the radius around the central pixel, g_c denotes the gray value of the central pixel and g_i denotes the gray value of the i-th neighbour, with i = 0, ..., P − 1.
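Eq. (4) can be sketched as follows. This is a minimal NumPy illustration for P = 8, R = 1 on interior pixels only, with an arbitrary neighbour ordering (the text does not fix one, so the bit order here is an assumption of the sketch):

```python
import numpy as np

def lbp_8_1(img):
    """Basic LBP code of eq. (4) for P = 8, R = 1, on interior pixels.
    The neighbour ordering (hence bit order) is an arbitrary choice."""
    img = np.asarray(img, dtype=float)
    gc = img[1:-1, 1:-1]                                  # central pixels g_c
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(gc.shape, dtype=int)
    for i, (dy, dx) in enumerate(offsets):
        gi = img[1 + dy:img.shape[0] - 1 + dy,
                 1 + dx:img.shape[1] - 1 + dx]            # neighbours g_i
        code += ((gi - gc) >= 0).astype(int) << i         # S(g_i - g_c) * 2^i
    return code

# On a constant patch every S(g_i - g_c) = S(0) = 1, so every code is 255.
flat = lbp_8_1(np.full((5, 5), 7.0))
# An isolated bright center pixel makes every comparison fail, giving 0.
spot = lbp_8_1(np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]]))
```

Adding a constant to the whole image leaves the codes unchanged, which is the monotonic gray-level invariance mentioned above.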


Here S(x) is the thresholding function defined in eq. (4).

Later, the original LBP operator was generalized to be invariant to texture orientation and more robust to illumination. It considers the pixels that belong to a circular pattern with radius R around the center pixel. In the generalized LBP approach, the number of neighbours and the radius of the window are not fixed. In practice, the number of possible LBP codes increases with larger neighbourhoods. By defining a measure of uniformity, the 256 different local binary patterns defined in a 3 × 3 neighbourhood can be reduced to 10 by counting the number of bitwise spatial transitions (0/1) in the circular pattern. To quantify the LBP of eq. (4), a uniformity measure U, given in eq. (6), was introduced:

LBP^{riu2}_{P,R} = { Σ_{p=0}^{P−1} S(g_p − g_c), if U(LBP_{P,R}) ≤ 2
                     9, otherwise } (5)

U(LBP_{P,R}) = |S(g_{P−1} − g_c) − S(g_0 − g_c)| + Σ_{p=1}^{P−1} |S(g_p − g_c) − S(g_{p−1} − g_c)| (6)

4.3 Shape Features

The shape features consist of the first derivatives I_x and I_y, the second derivatives I_xx and I_yy with respect to x and y respectively, the magnitude and the orientation. They are described as follows:

f_Shape = (I_x, I_y, I_xx, I_yy, mag, O) (7)

where mag is the magnitude and O is the orientation. They are based on the first derivatives with respect to x and y and are defined as below:

mag(x, y) = sqrt(I_x²(x, y) + I_y²(x, y)) (8)

O(x, y) = arctan(I_y(x, y) / I_x(x, y)) (9)

5 LEVEL-SET BASED REGION TRACKING

Our method consists in dividing each frame into two regions: the background region Ω_out and the moving object region Ω_in, as illustrated in figure 3. The discontinuities between these two regions define the curve C, which is the boundary of the object.

Figure 3. Image partition into background and foreground.

So, the closed contour C is represented as the zero level set of the signed function φ such that:

φ(x) = 0 if x ∈ C
φ(x) > 0 if x is inside C (10)
φ(x) < 0 if x is outside C

Let H(φ) be the Heaviside function defined as:

H(φ(x)) = { 1 if φ(x) ≥ 0
            0 if φ(x) < 0 } (11)

Consequently, the covariance region descriptors of the foreground and the background separated by the curve C are given by C_in and C_out as follows:

C_in = ∫ H(φ) (f(x, y) − μ_in(φ)) (f(x, y) − μ_in(φ))^T dxdy / ∫ H(φ) dxdy (12)

C_out = ∫ (1 − H(φ)) (f(x, y) − μ_out(φ)) (f(x, y) − μ_out(φ))^T dxdy / ∫ (1 − H(φ)) dxdy

where μ_in and μ_out represent the mean features of the candidate object region and the background region.
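In discrete form, eq. (12) amounts to masked mean and covariance computations. The sketch below is a minimal NumPy illustration with a random feature image and a rectangular mask standing in for H(φ) (all sizes are hypothetical); it also computes the Log-Euclidean distance between the two descriptors, the quantity the image energy is built from.

```python
import numpy as np

# Random d-channel feature image standing in for the real HS/LBP/shape
# features; the rectangle is a hypothetical object region.
rng = np.random.default_rng(0)
H_img, W_img, d = 40, 30, 5
feats = rng.random((H_img, W_img, d))
mask = np.zeros((H_img, W_img), dtype=bool)      # discrete H(phi)
mask[10:30, 8:22] = True

def masked_cov(feats, m):
    """Discrete version of eq. (12): covariance and mean over a mask."""
    f = feats[m]                                 # N feature vectors f(x, y)
    mu = f.mean(axis=0)                          # mu_in or mu_out
    diff = f - mu
    return diff.T @ diff / len(f), mu

C_in, mu_in = masked_cov(feats, mask)
C_out, mu_out = masked_cov(feats, ~mask)

def spd_log(C):
    """Matrix logarithm of a symmetric positive definite matrix via
    eigendecomposition (the log acts on the eigenvalues)."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

# Log-Euclidean distance between foreground and background descriptors.
dist = np.linalg.norm(spd_log(C_in) - spd_log(C_out), 'fro')
```

A partition that separates the object well makes this distance large, which is exactly what the energy-based evolution described next rewards.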


They are expressed as follows:

μ_in(φ) = ∫ H(φ) f(x, y) dxdy / ∫ H(φ) dxdy (13)

μ_out(φ) = ∫ (1 − H(φ)) f(x, y) dxdy / ∫ (1 − H(φ)) dxdy

For each frame, we look for the partition which optimizes the Log-Euclidean metric measuring the distance between the two covariance matrices of the object and the background. The image energy model can be defined as:

E(φ) = −||log(C_in) − log(C_out)||² (14)

where ||·|| is the Euclidean norm on symmetric matrices, so that minimizing the energy maximizes the separation between foreground and background.

In order to keep the level set a signed distance function, we use the internal energy term defined as:

P(φ) = (1/2) ∫ (|∇φ| − 1)² dxdy (15)

Therefore the total energy, according to eq. (14) and eq. (15), is

E_total(φ) = λ E(φ) + ν P(φ) (16)

where λ and ν are constants. The gradient descent flow of eq. (16) for the level set function φ is expressed as:

∂φ/∂t = −∂E_total/∂φ = −λ ∂E/∂φ − ν ∂P/∂φ (17)

The final contour of the moving object obtained in the previous frame, at time t−1, is used to initialize the level-set contour in the current frame at time t. The contour evolves until it reaches the boundaries of the object, as illustrated by figure 4.

Figure 4. Tracking initialization on the elderly fall detection sequence: (a) moving object detected in frame 120; (b) level-set contour initialization in frame 121 with the previous contour detected in red; (c) contour evolution in green using the level-set based region; (d) final contour detected in blue.

6 EVALUATION

In this section extensive experiments are performed to prove the robustness of the proposed method. Evaluation is performed qualitatively and also quantitatively, by comparing with other tracking methods and with ground truth data.

6.1 Qualitative Evaluation

We run our tracking algorithm on several challenging video sequences from PETS09 [20] and an indoor dataset [21], where the targets undergo occlusions, lighting changes, scale variations, shadows, pose deformations, complex backgrounds, etc. The obtained results illustrate good contour detection and tracking in these challenging situations. In fact, our method can successfully track persons despite occlusion, as shown in figure 5. From figure 6, we can observe that our method adapts well to changes of target scale, as its output contour fits the target closely. Figure 7 proves the robustness of the proposed algorithm in the presence of frequent shape deformations, similar background colors and illumination variations.

For a fair comparison, we have compared our algorithm to other methods using the same sequence [22]. It shows a walking woman with a multi-colored appearance and large posture changes. She walks behind cars and becomes occluded. This sequence is captured by a moving camera and the colors of the object are similar to the background clutter. In fact, the color distribution of the upper part of the object appears similar to the white car, while the lower part has a color similar to the blue car.

Figure 8 presents some representative frames that prove the performance of the proposed method compared to (a) the particle filter algorithm [23], (b) the level-set based edge within the particle filter framework [24] and (c) the supervised level set model (SLSM) [25] based on a shape prior which is learnt in an online boosting manner. The standard particle filter


does not pick up the whole target accurately because it is based only on an HSV color histogram as the appearance model. The level-set based edge method (b) is based only on intensity edges, without any target information; it is not able to detect a multi-colored object correctly. The third algorithm (c) gets a more precise contour, but it depends on specific prior knowledge, which is why it is inapplicable in complex scenes with various shape deformations. Our algorithm also gives an accurate detection of the object based only on region information, without any prior knowledge.

We have also compared our algorithm to two other algorithms: covariance tracking [12] and the level-set based distribution method [9]. The first method drifts at frame 145 and loses the target at frame 357. Its visual appearance model has absorbed more background when the occlusion occurs. This is due to the object being initialized with a bounding box rather than with an exact and accurate object shape. Despite the use of a detailed shape, the level-set based distribution method does not perform well because the background is similar to the color of the object, which disturbs the evolution of the level set. It builds a color histogram-based model of the target; however, if the color of the target changes, it is bound to fail to track the target. So, a single feature is insufficient to deal with all environment variations. On the other hand, our algorithm integrates many features, namely color, texture and shape, into the same representation: if one feature fails to track the object, the other features can be used in order to get good results.

We notice that accurate results are obtained by the proposed method and objects are well tracked. This is thanks to the regional information merged into the level-set framework and to the benefits of the covariance region descriptor. Our region descriptor encloses many features in a covariance matrix. Therefore, object contours are effectively detected even in complex conditions.

6.2 Quantitative Evaluation

In order to evaluate our algorithm quantitatively, we have used the Percentage of Correctly tracked Frames (PCF) over the total number of frames in the sequence. Tracking is considered to be correct if the overlap of the bounding box of the tracking result and that of the ground truth is greater than 25% of the area of the ground truth. The performances in terms of PCF are presented in Table 1.

Table 1. Information on the sequences and the tracking performances in terms of PCF

Sequence            | indoor dataset | Pets09 view005 | Pets09 view006
Number of frames    | 193            | 795            | 795
Frame size          | 320 × 180      | 720 × 576      | 720 × 576
Initial object size | 16 × 65        | 39 × 138       | 39 × 126
Severe occlusions   | None           | 6 times        | 10 times
Covariance          | 98%            | 75.47%         | 61%
Proposed            | 99%            | 81.30%         | 68.12%

The average execution time T is computed on an Intel Core i5 computer and is defined as

T = T_all / N_frames (18)

where T_all is the total processing time of the whole video sequence and N_frames is the number of frames in the sequence.

The average execution time is about 0.6 s per frame for the PETS2009 sequence, whereas the original covariance tracker takes about 1.189 s and its variant with an adaptive ROI [15] takes 0.988 s. So, the execution time is reduced during tracking using our method.
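The PCF criterion described above can be sketched in a few lines. Boxes are assumed here to be (x, y, w, h) tuples, which is an assumption of this illustration, not a format stated in the paper, and the example boxes are made up:

```python
def overlap_ratio(track, truth):
    """Intersection area of two (x, y, w, h) boxes divided by the
    ground-truth area, as in the 25% correctness criterion."""
    tx, ty, tw, th = track
    gx, gy, gw, gh = truth
    ix = max(0, min(tx + tw, gx + gw) - max(tx, gx))   # overlap width
    iy = max(0, min(ty + th, gy + gh) - max(ty, gy))   # overlap height
    return (ix * iy) / (gw * gh)

def pcf(tracks, truths, thresh=0.25):
    """Percentage of Correctly tracked Frames over a sequence."""
    correct = sum(overlap_ratio(t, g) > thresh for t, g in zip(tracks, truths))
    return 100.0 * correct / len(truths)

# Tiny example: frame 1 matches the ground truth exactly, frame 2 misses,
# so half of the frames are correctly tracked.
score = pcf([(0, 0, 10, 10), (50, 50, 10, 10)],
            [(0, 0, 10, 10), (0, 0, 10, 10)])
```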


Figure 5. Tracking results for persons in PETS data set view 01. The tracking performance is correct even in the presence of occlusions.

Figure 6. Tracking results for persons in PETS data set view 07. The tracking is robust even in the presence of scale changes.

Figure 7. Tracking results for elderly people in the indoor dataset. The tracking is robust despite pose and shape deformations.

Figure 8. Tracking results for the Walking woman sequence. From right to left are the results of the particle filter [23], the level-set based edge and particle filter [24], the supervised level-set based shape [25] and our method, respectively.

Figure 9. Tracking results for the Walking woman sequence. From top to bottom are the results of covariance tracking [12], the level-set based histogram [9] and our method, respectively. Row 1: frame #47. Row 2: frame #114. Row 3: frame #217. Row 4: frame #373. Row 5: frame #447.

7 CONCLUSION

In this paper, we have proposed a level-set based region method for person tracking. The main contributions of our method can be summarized as follows. First, a new region-based covariance descriptor is introduced; our method inherits the advantages of both the covariance region descriptor and the active contour method. Second, in real scenes with low illumination or similarly colored backgrounds, a tracker using only the color feature may easily miss the target. Our contribution is to integrate multiple features into the target model for accurate tracking, with less memory consumption thanks to the compact representation of the covariance matrix. Third, the computation time is reduced by initializing the contour as close as possible to the target boundaries, so the method can perform real-time tracking. Experimental results are very promising and reveal that the proposed method is capable of handling partial occlusion, complex backgrounds, appearance changes, scale variations and illumination variations. In future work, the proposed algorithm will be extended to track multiple persons.

REFERENCES

[1] M. Paul, M. E. Shah, S. Chakraborty, "Human detection in surveillance videos and its applications - a review", EURASIP Journal on Advances in Signal Processing, p. 176, 2013.
[2] X. Li, W. Hu, C. Shen, Z. Zhang, A. Dick, and A. van den Hengel, "A survey of appearance models in visual object tracking", ACM Transactions on Intelligent Systems and Technology, vol. 4, no. 4, pp. 1-58, 2013.
[3] H. Yang, L. Shao, F. Zheng, L. Wang, Z. Song, "Recent advances and trends in visual tracking: A review", Neurocomputing, pp. 3823-3831, 2011.
[4] G. Walia, R. Kapoor, "Recent advances on multicue object tracking: a survey", Artificial Intelligence Review, vol. 46(1), pp. 1-39, 2016.
[5] P. Dash, S. Aitha and D. Patra, "Ohta based covariance technique for tracking object in video scene", in Proceedings of the IEEE Students' Conference on Electrical, Electronics and Computer Science, 2012, pp. 1-4.
[6] S. Singha and S. Datta, "Diverse methodologies to improve covariance based object tracking", International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 8, no. 6, pp. 33-44, 2015.
[7] M. Chakroun, A. Wali, A. M. Alimi, "MAS for video objects segmentation and tracking based on active contours and SURF descriptor", International Journal of Computer Science Issues, 10(2), pp. 105-110, 2013.
[8] C. Khraief, S. Bourouis and K. Hamrouni, "Unsupervised video objects detection and tracking using region based level-set", 2012 International Conference on Multimedia Computing and Systems, 2012, pp. 201-206.
[9] D. Freedman and T. Zhang, "Active contours for tracking distributions", IEEE Transactions on Image Processing, vol. 13, pp. 518-526, 2004.
[10] W. Hu, X. Zhou, W. Li, W. Luo, X. Zhang, and S. Maybank, "Active contour based visual tracking by integrating colors, shapes, and motions", IEEE Transactions on Image Processing, vol. 22, pp. 1778-1792, 2013.
[11] S. Osher and J. A. Sethian, "Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations", Journal of Computational Physics, 79: 12-49, 1988.
[12] F. Porikli, O. Tuzel, and P. Meer, "Covariance tracking using model update based on means on Riemannian manifolds", in CVPR, 2006.
[13] Q. Lei, S. Hichem and F. Abdallah, "Object tracking using adaptive covariance descriptor and clustering-based model updating for visual surveillance", Sensors Journal, 2014.
[14] A. Romero Mier y Terán, L. Lacassagne, M. Gouiffès, A. Hassan Zahraee, "Covariance tracking: architecture optimizations for embedded systems", EURASIP J. Adv. Sig. Proc., 2014: 175, 2014.
[15] Y. H. Hassen, T. Ouni, W. Ayedi and M. Jallouli, "Mono-camera person tracking based on template matching and covariance descriptor", International Conference on Computer Vision and Image Analysis Applications, 2015, pp. 1-4.
[16] G. Walia, R. Kapoor, "Robust object tracking based upon adaptive multi-cue integration for video surveillance", Multimedia Tools Appl., 75(23): 15821-15847, 2016.
[17] A. Babaeian, S. Rastegar, M. Bandarabadi, and M. Rezaei, "Mean shift-based object tracking with multiple features", in 41st Southeastern Symposium on System Theory (SSST 2009), Mar. 2009, pp. 68-72.
[18] T. Ojala and M. Pietikainen, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns", IEEE PAMI, 24(7): 971-988, 2002.
[19] K. E. Van De Sande, T. Gevers, and C. G. Snoek, "Evaluating color descriptors for object and scene recognition", IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1582-1596, 2010.
[20] PETS09 dataset, http://www.cvg.rdg.ac.uk/PETS2009/a.html
[21] Fall detection dataset, http://le2i.cnrs.fr/Fall-detection-Dataset
[22] Walking woman dataset, http://www.cs.technion.ac.il/amita/fragtrack/fragtrack.htm
[23] P. Perez, C. Hue, J. Vermaak, and M. Gangnet, "Color-based probabilistic tracking", European Conference on Computer Vision, pp. 661-675, 2002.
[24] Y. Rathi, N. Vaswani, and A. Tannenbaum, "Particle filtering for geometric active contours with application to tracking moving and deforming objects", IEEE Conf. on CVPR, 2: 2-9, 2005.
[25] X. Sun, H. Yao, and S. Zhang, "A novel supervised level set method for non-rigid object tracking", IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 3393-3400.


Approaches for optimization of WEB pages loading via analysis of the speed of
requests to the database
Georgi Petrov Dimitrov, PhD Galina Panayotova, PhD
University of Library Studies and University of Library Studies and
Information Technologies Information Technologies
Sofia, Bulgaria Sofia, Bulgaria
geo.p.dimitrov@gmail.com panayotovag@gmail.com

Iva Kostadinova
University of Library Studies and Information Technologies
Sofia, Bulgaria

ABSTRACT

In the current article, we provide recommendations for decreasing web page loading time. The biggest reason for slow loading is very frequently not system overload or poorly written code, but the slow execution of SQL queries. The primary approach for optimization considered in this paper is the optimization of SQL queries. The creation of fast and efficient queries, retrieving the minimal required quantity of data, is necessary for achieving good results. There are many approaches for optimization, but in this article we have attempted to analyze one of the most frequently made mistakes: using a VIEW with a large number of columns. We have made recommendations for avoiding similar problems from the web development stage onwards, in order to achieve a faster result.

KEYWORDS

information system, database optimization, SQL, VIEW, query, Web Application, Business application

Why is query optimization important, first of all for you as application developers? The fact is that when your users or your boss monitor the performance of the application you have built, they only see the page load speed. It has to be kept in mind that the productivity of applications is always important. How would you feel if you heard the following: "We no longer need this system, because when we try to execute a query we have to wait 2-3 minutes, and we want that to happen right away."

So you want to make sure that your application works faster. You may find that in just 1-2 days you can optimize the performance of your application so that it begins to satisfy the users. Slowness often arises because we did not pay attention to the small details when we developed the application.

When web applications work slowly, the reasons are usually sought in code optimization, caching or better hardware [7, 9, 12]. When it comes to business applications, which exchange data with an RDBMS, very frequently the reason for the slow operation of the application is hidden in the SQL queries. In this case, there is a single piece of advice: try to bring the execution time of each query down to a minimum [1, 3]. One of the most important skills for every web application developer and database administrator is the


ability to create optimal SQL queries. In the first place, it is necessary to improve the efficiency of SQL queries [5, 6]. Therefore, developers and database administrators must be able to understand the mechanism of the query execution plan and the techniques they can apply for tuning the queries [2].

MySQL has a powerful command that you can use to find out why your queries work slowly. This command is EXPLAIN. EXPLAIN can show you in detail what is actually going on when you execute a statement. This way, you can find the reasons for the slow execution of queries. But this article is not about EXPLAIN.

The best way to learn to optimize for fast execution is to attempt to write your queries in different ways and compare their execution times.

In the current research, we have analyzed the influence of the selected columns on the speed of query execution in one of the most popular databases, MySQL. The research was made with the help of dbForge Studio for MySQL v. 7.2.53. The analyzed tables have different numbers of rows and columns. Our research is based on the queries included in the 48 pages of an application developed for our university. The application is developed using Microsoft Visual Studio 2015. The measurement of the page loading speeds is made in the Visual Studio 2015 Enterprise environment.

Following the development of an application, very frequently, after a certain amount of time in operation, we note that a certain page loads slowly, which brings discomfort to the user's work, i.e. the user experience becomes worse. Conducting an analysis of the reasons for the increase of the loading time of individual pages is required in order to normalize the application's work. Very frequently the reason for this is slow requests [7].

The sample algorithm for finding the inefficient SQL queries and their optimization is shown in Chart 1.

Chart 1: Algorithm for finding the inefficient SQL queries and their optimization.

Based on the conducted analysis, we have found queries that slow down the page loading. The reasons for creating slow queries are different, but in our research we have focused on the following: the use of queries that work with a VIEW [8, 11]. The conducted analysis shows that one of the reasons for slow execution in the MySQL environment is the mechanism for executing SELECT statements based on a VIEW. The reason for this is that MySQL does not support materialized views. The purpose of views in MySQL is to extend functionality by helping developers keep their queries simple, in a way that does not affect server performance. That is why, very often, when developers create a new project, they use them. Views in MySQL are handled using one of two different algorithms: MERGE or TEMPTABLE. MERGE is simply a query expansion with appropriate aliases. TEMPTABLE is just what it sounds like: the view puts the results into a temporary table before running the WHERE clause, and there are no indexes on it. So when users start to work with the new system (the new software product) and the data increases, performance slows dramatically. Additionally, developers are keen to make a view universal, that is, with the maximum number of columns they might "need". This is the wrong approach.
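The "universal view" pitfall can be reproduced in miniature with any SQL engine. The sketch below uses Python's built-in sqlite3 rather than MySQL (SQLite has no MERGE/TEMPTABLE distinction, and the table and column names are made up), so it only illustrates what a wide view hands back to the application, not MySQL's view algorithms:

```python
import sqlite3

# A table with more columns than most pages actually need.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (id INTEGER, number TEXT, code TEXT, "
            "price REAL, c5 TEXT, c6 TEXT, c7 TEXT, c8 TEXT)")
con.execute("INSERT INTO products VALUES (1, 'N1', 'C1', 9.99, '', '', '', '')")

# A "universal" view exposing every column of the table.
con.execute("CREATE VIEW v_products AS SELECT * FROM products")

# SELECT * through the view drags along all 8 columns even when the page
# needs two of them; listing only the needed columns avoids that.
wide = con.execute("SELECT * FROM v_products").fetchall()
narrow = con.execute("SELECT id, price FROM v_products").fetchall()
```

In MySQL the cost is worse than the extra columns alone, because a view that cannot use the MERGE algorithm materializes its full result set into an unindexed temporary table first.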


Even worse, looking at a short query which just gets a single row from a table by its key, we think it is a simple query, while it can instead be a real monster, with the complexity hidden away in the VIEW definition.

You may even find yourself using cascading views, not a single view. That is, you use a view that consists of several other views. And it may happen that when you run a view with a command of the "SELECT *" type, you need only 2-3 columns, but in fact you have retrieved 23, 30 or even more than 100 columns with results that are invisible to you.

The sample algorithm for the execution of a request which includes a VIEW is shown in Chart 2.

Chart 2: Algorithm for execution of a single query with and without VIEW.

The reason for the slower execution of queries which include a VIEW is most frequently the developer's effort to create a universal VIEW, which can be used in almost all situations. Of course, sometimes we have to use a VIEW, but even then it is good to plan its structure carefully.

In order to show the impact of using queries with only the minimum number of columns, we have carried out an experiment, the results of which are shown in the current paper. The measurements are made on tables with different numbers of columns and different numbers of records. The queries are executed multiple times and the execution time is reported as an average. The executed queries are the following:

SELECT * FROM TABLE
SELECT col_1 ... col_N FROM TABLE
SELECT col_1 ... col_30 FROM TABLE
SELECT col_1 ... col_20 FROM TABLE
SELECT col_1 ... col_10 FROM TABLE

Below we show an example of the code before and after optimization.

Code before optimization:

SELECT * FROM Products

Code after optimization:

SELECT p.Product_ID, p.ProductNumber, p.ProductCode, p.Price FROM Products p

In Table 1 we show the execution time of each query.

Table 1: Time of query execution in milliseconds with different numbers of columns

Columns in table | Select * | Select all columns | Select 30 col. | Select 20 col. | Select 10 col. | Select 5 col.
97               | 14       | 14                 | 5.5            | 4.5            | 3              | 3
72               | 9        | 9                  | 5.5            | 4.5            | 3              | 3
52               | 8.7      | 8.7                | 6              | 5              | 4              | 3
41               | 8        | 8                  | 5              | 4              | 3              | 3
13               | 4.2      | 4.2                | 4.1            | 4.1            | 3              | 3
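The shape of this experiment can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 on a synthetic wide table (the table, column names and sizes are arbitrary, and sqlite3 is a stand-in for the paper's MySQL/dbForge setup), timing SELECT * against a short column list:

```python
import sqlite3
import time

# Synthetic wide table: compare fetching every column vs. only a few.
NCOLS, NROWS = 50, 20000
cols = [f"col_{i}" for i in range(1, NCOLS + 1)]
con = sqlite3.connect(":memory:")
con.execute(f"CREATE TABLE t ({', '.join(c + ' REAL' for c in cols)})")
row = tuple(float(i) for i in range(NCOLS))
con.executemany(f"INSERT INTO t VALUES ({', '.join('?' * NCOLS)})",
                [row] * NROWS)

def timed(sql):
    """Execute a query, fetch all rows and return (elapsed seconds, rows)."""
    t0 = time.perf_counter()
    rows = con.execute(sql).fetchall()
    return time.perf_counter() - t0, rows

t_all, r_all = timed("SELECT * FROM t")
t_few, r_few = timed(f"SELECT {', '.join(cols[:5])} FROM t")
print(f"all {NCOLS} columns: {t_all:.4f}s, 5 columns: {t_few:.4f}s")
```

On a typical run the narrow query is noticeably faster, mirroring the column-count trend of Table 1, though the absolute numbers depend entirely on the engine and the machine.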


Chart 4: Time for execution of the queries depending on the number of columns (series for tables with 97, 72, 52, 41 and 13 columns).

It is obvious that the main reason for the slower execution is not the SELECT * command as such, but the number of selected columns, which changes the execution time by more than a factor of three, as is evident in Chart 3.

Chart 3: Comparison of the execution time for the minimum and the maximum number of selected columns.

Meanwhile, we have analyzed the influence of the number of selected records on the speed of the queries. The number of selected records is in the range from 10000 to 400000. The executed queries are the same as the ones shown in the example above. In Table 2, we show the results according to the number of records.

Table 2: Speed of query execution in milliseconds with different numbers of records

Records | Select 30 col. | Select 20 col. | Select 10 col. | Select 5 col.
380000  | 6.1            | 5.2            | 3.2            | 3
174890  | 5.9            | 5.1            | 3.1            | 3
93390   | 5.7            | 5              | 3.1            | 3
68000   | 5.5            | 4.5            | 3.1            | 3
10800   | 5.3            | 4.5            | 3              | 3

The results are also shown in Chart 5.

Chart 5: Time for query execution depending on the number of rows.

From the results shown, it is obvious that the number of selected records has a minimal impact on the query execution time.

In Table 3, we show the results of measuring the page loading time before and after optimization of the query code on part of the project's pages.


Table 3: Decrease of the page loading time in milliseconds following the transition from view to query

Page                           | Before SQL optimization | After SQL optimization
Transfer_Students.aspx         | 3638                    | 432
ElektronenDnevnikDetails.aspx  | 3577                    | 444
Speciality.aspx                | 3298                    | 675
Speciality_ByYear.aspx         | 2948                    | 694
ConfigureGridAndDetail.aspx    | 2609                    | 455
px                             | 2503                    | 1234
Work_Dimi_Ins_Student_Reg.aspx | 2375                    | 344
AccessControl.aspx             | 2291                    | 436
Register_Fast.aspx             | 1871                    | 789
CopyOneSpec.aspx               | 1868                    | 869
.aspx                          | 1832                    | 768
Scheduler_Page.aspx            | 1795                    | 655

The chart below shows the efficiency of the performed optimization.

Chart 6: Decrease of the page loading time following the transition from view to query.

The optimization of queries is a common task, executed jointly by database administrators and software developers. The timely identification of the problematic areas in the software operation is one of the most important tasks, and the optimization of the SQL queries is among them.

From the analysis in the paper, we can make the following conclusions and recommendations:

- The use of VIEWs in SQL queries in web pages must be limited to a minimum.
- Decreasing the number of columns in a query may lead to a substantial decrease of the loading time of individual pages.
- The influence of the number of selected records on the page loading time is minimal and need not be taken into account.

The displayed results and given recommendations should help developers to decrease the development time of quality web-based business applications. In this manner, the optimization may be achieved with minimum effort if we follow certain rules, one of which is to create queries that contain only the required columns.

ACKNOWLEDGMENT

This work is partly supported by the project PPNIP-2017-03, "Optimization approaches to load WEB pages by analyzing the database query execution speed".

REFERENCES

[1] Georgi Petrov Dimitrov, Galina Panayotova, "Queuing systems in insurance companies - analyzing incoming requests", vol. 2, issue 1, September 2013, EDIS - Publishing Institution of the University of Zilina, Slovakia, ISSN: 1339-9977, CD-ROM ISSN: 1338-7871, ISBN: 978-80-554-0762-3.
[2] Proceedings of the 2nd Electronic International Interdisciplinary Conference (EIIC 2013), 2-6 September 2013, Slovak Republic, ISBN 978-80-554-0762-3, ISSN 1338-7871, pp. 139-142.
[3] Georgi Petrov Dimitrov, Galina Panayotova, "Researching and modeling of queuing systems of the insurance company", HASSACC - Virtual Conference on Human and Social Sciences at the Common Conference, 18-22 November 2013, ISBN: 978-80-554-0808-8, ISSN: 1339-522X, pp. 93-95.
[4] Georgi Dimitrov, Ilian Iliev, "Study of methods for front-end webpage optimisation", The 3rd



Examining Stock Price Movements on Prague Stock Exchange Using Text Classification

Jonáš Petrovský, Pavel Netolický and František Dařena

Department of Informatics, Faculty of Business and Economics, Mendel University in Brno,
Zemědělská 1, 613 00 Brno, Czech Republic
jontesek@gmail.com, pavel.netolicky@gmail.com, frantisek.darena@mendelu.cz

ABSTRACT

The goal of the article was to examine the relationship between the content of text documents published on the Internet and the direction of movement of stock prices on the Prague Stock Exchange. The relationship was modeled by text classification. As data we used news articles and discussion posts on Czech websites and the value of the PX stock index and the stock price of the company CEZ. A document's class (plus/minus/constant) was determined by the relative price change that happened between the publication date of the document and the next working day. We achieved a high accuracy of 75% for the classification of discussion posts; however, the classification accuracy for news articles was about 60%. We tried both binary (documents with constant class were discarded) and ternary classification; the former was in all cases more successful.

KEYWORDS

Text Mining, Classification, Stock Market, Machine Learning

1 INTRODUCTION

Efficient markets theory (EMT) says that investors immediately incorporate all available information about a given stock into its price and therefore the stock price is based solely on its fundamental value. However, empirical observations contradict the EMT, because some price movements cannot be explained by a change of fundamental figures [1]. Here comes the behavioral finance theory, which says that emotions may deeply influence the behavior and decision making of individuals as well as whole human societies [2]. This means that prices on capital markets are (more or less) influenced by emotions, moods and opinions of market participants [3]. These characteristics are difficult to obtain, but could be present in text documents published on the internet (news articles, social media posts, etc.), which express both fundamental facts (rationality) and emotions and opinions of people (irrationality) [4]. To determine if the texts actually contain such information and that it is connected with the stock price, we need to show and quantify a connection between texts and stock price movements. In other words, examine the influence of the (i)rationality of investors on the stock market.

2 CURRENTLY USED METHODS

When trying to model the behavior of a stock price we can use classification or regression. Many studies (e.g. [4]) in this area chose the former approach and decided to examine not the actual numeric price value, but only the change of the value; they used the direction of the change (up, down or constant) as a class. We focused on this approach as well, because we are not interested in the actual stock price value, but only in its change.

In fact, the problem represents a typical text classification task: given a document, determine to which class it belongs. For this, virtually any supervised learning algorithm may be used. However, there are two main difficulties. Firstly, document classes have to be defined in a meaningful and useful way. Lee et al. [5] used a 1% threshold value for determining the direction of a stock price change. Secondly, a suitable and effective set of features must be chosen. Many studies used as features just single words (unigrams) in the so-called bag-of-words model with


satisfactory results [6]. The studies used different types of classifiers. The strength of the connection between texts and stock prices was evaluated by classification metrics (e.g. by accuracy), which are based on how many times the classifier assigns the correct class to the given text.

3 DATA AND METHODOLOGY

The goal of the work was to examine the connection between the content of text documents published on the Internet and the direction of stock price movements, by using classification. A suitable approach had to be taken for working with every aspect of this task: handling prices and texts and processing the data via classification algorithms.

3.1 Stock prices

For the main part of the work, we used the PX index which reflects all companies traded on the Prague Stock Exchange (BCPP). The data were downloaded from the stock exchange's website (https://www.pse.cz). For every trading day, we used the closing value of the index. We also decided to examine discussion posts for one company (CEZ). Because BCPP contains data only since 2012, we downloaded it from www.akcie.cz.

3.2 Text data

The examined text data (documents) were downloaded from two sources (see Table 1). All documents were written in the Czech language.

Table 1. Examined text data.

Source     Documents type                   Number of doc.   Period                           Average per day
Patria.cz  News articles about Czech stock  1 244            9. 2. 16 to 27. 5. 17 (15 mon.)  2.63
Akcie.cz   Discussion posts about 17        20 605           14. 3. 08 to 27. 5. 17 (9 y.)    6.13
           Czech companies

Table 2 shows the information available for every text document. In subsequent analysis, all these fields apart from author were used.

Table 2. Available characteristics of a document with a concrete example of a discussion post regarding company CETV.

Field name   Original in Czech                  Translated to English
date         2017-05-18 11:49:00                2017-05-18 11:49:00
author       mmmm                               mmmm
text         Za vodou koncila na 94. ja si      Offshore price ended on 94. I think
             myslim ze se dostane nekam k 85    that it will get to 85, but I dont
             ale nemam kouli samozrejme. :))    have crystal ball of course. :))

For every discussion post (Akcie.cz), it was known to which company on the stock exchange it belongs. However, for news articles (Patria.cz) this information was unknown. Moreover, it was found out that a news article usually comments on multiple companies.

3.3 Classification methodology

We used classification to predict whether a stock price will move up, down or stay constant on the basis of the document's text. Each price movement represented a class. To obtain more diverse and possibly better results, we used both two (only up and down) and three classes for classification. It was expected that the ternary classification would perform worse, as mentioned in [7]. We extracted documents' features from the text by using the bag-of-words model. Every document was represented by a vector with values corresponding to the assigned weights of the words present in the document. For the experiments with all discussion posts (Akcie.cz) and news articles (Patria), values of the PX index were used. For one experiment (referred to as the CEZ experiment) stock prices and discussion posts related to only one company (CEZ) were used.

Document class. Assigning a class to a document was based on the relative price


change between two moments and on the threshold value (v) of minimal percentage price change. Formally, the percentage price return R in time t is:

R_t = (p_t - p_{t-1}) / p_{t-1},   (1)

where price p_{t-1} is the closing price of the day when the document was published (or the last working day) and price p_t is the closing price of the next working day. If the price return was in the constant interval (-v, +v), the document was either discarded from further processing (for binary classification) or assigned the constant class label (for ternary classification). If the price return was equal to or larger than +v, the document was labeled as plus, otherwise as minus. We used 0.25, 0.5 and 1.0% as the threshold values.

Text pre-processing and conversion. The text was processed as follows:
1. Join document title and text into one string.
2. Strip diacritics from the text (convert special Czech letters to their ASCII equivalents).
3. Strip all HTML tags.
4. Lowercase and remove punctuation.
5. Tokenize to get words (using TreebankWordTokenizer).
6. Filter words: minimal length of 3 letters, exclude numbers.

The edited text had to be converted into a structured format. For this, a Python library called scikit-learn and its Vectorizer class were used. Only words which occurred at least 5 times (for discussion posts) and 10 times (for news articles) in the whole document collection were considered. Those words were converted to a bag-of-words representation using three different weighting schemes [8, p. 21-26]:
- Term Presence (TP): 1 if a term is present in a document, 0 if not.
- Term Frequency (TF): how many times a term is present in a document.
- TF-IDF: TF (local weight) multiplied by IDF (global weight).

Classification. Converted data were processed again by scikit-learn. The data were split into training (60%) and testing (40%) datasets. Class balancing was not performed. Each of the generated vector representations was processed by 20 different classifiers (with default settings; we did not optimize the parameters of the classifiers). The performance of a classifier was rated by the achieved accuracy (the proportion of correctly classified instances out of all examined instances [9, p. 268]) on the test set.

4 RESULTS AND DISCUSSION

Three different sets of text data (all discussion posts from Akcie.cz, posts related to the CEZ company, and news articles from Patria), together with information about stock prices, were used to prepare data for classification. Based on the combination of variable experimental parameters, i.e. the number of classes (2 or 3), minimal percentage change (0.25, 0.5 and 1%) and weighting scheme for the term-document matrix (TP, TF, TF-IDF), 54 different sets were created and subsequently processed by 20 classification algorithms.

In total 1080 classification results were obtained. We evaluated the results for each data set separately and for each classification set, the highest accuracy achieved by any combination of vector type and classification algorithm was found. Our findings are presented in Table 3, Table 4 and Table 5. Class 1 means minus, class 2 constant and class 3 plus.
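The labeling rule of Eq. (1) and pre-processing steps 1-6 of Section 3.3 can be sketched as follows. This is a minimal sketch, not the authors' code: a plain whitespace tokenizer stands in for NLTK's TreebankWordTokenizer, and the function names are our own.

```python
import re
import unicodedata

def price_class(p_prev, p_next, v=0.005):
    # Eq. (1): relative price return between the publication-day close
    # and the next working day's close; v is the minimal-change threshold.
    r = (p_next - p_prev) / p_prev
    if r >= v:
        return "plus"
    if r <= -v:
        return "minus"
    return "constant"   # discarded for binary classification

def preprocess(title, text):
    s = title + " " + text                       # 1. join title and body
    s = unicodedata.normalize("NFKD", s)         # 2. strip diacritics
    s = s.encode("ascii", "ignore").decode()
    s = re.sub(r"<[^>]+>", " ", s)               # 3. strip HTML tags
    s = re.sub(r"[^\w\s]", " ", s.lower())       # 4. lowercase, drop punctuation
    tokens = s.split()                           # 5. tokenize (simplified)
    return [t for t in tokens                    # 6. length >= 3, no numbers
            if len(t) >= 3 and not t.isdigit()]

print(price_class(100.0, 101.2))                     # 1.2% rise -> 'plus'
print(preprocess("Title", "<b>koncila</b> na 94."))  # ['title', 'koncila']
```

The surviving tokens would then be fed to a scikit-learn vectorizer with the stated minimum document frequency to obtain the TP, TF or TF-IDF weights.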


Table 3. Classification of Akcie.cz discussion posts.

Num. of   Percent   Accuracy   Total     Class 1   Class 2   Class 3   Num. of
classes   change               samples   samples   samples   samples   words
2         0.25      0.74       16 673    8 169     0         8 504     10 604
2         0.50      0.76       13 449    6 454     0         6 995     9 181
2         1.00      0.78       8 170     3 939     0         4 231     6 359
3         0.25      0.67       20 576    8 169     3 903     8 504     12 337
3         0.50      0.65       20 576    6 454     7 127     6 995     12 337
3         1.00      0.79       20 576    3 939     12 406    4 231     12 337

Table 4. Classification of Patria news articles.

Num. of   Percent   Accuracy   Total     Class 1   Class 2   Class 3   Num. of
classes   change               samples   samples   samples   samples   words
2         0.25      0.62       874       360       0         514       2 347
2         0.50      0.59       587       246       0         341       1 788
2         1.00      0.61       222       100       0         122       872
3         0.25      0.40       1 244     360       370       514       3 036
3         0.50      0.52       1 244     246       657       341       3 036
3         1.00      0.79       1 244     100       1 022     122       3 036

Table 5. Classification of CEZ discussion posts.

Num. of   Percent   Accuracy   Total     Class 1   Class 2   Class 3   Num. of
classes   change               samples   samples   samples   samples   words
2         0.25      0.71       6 162     2 929     0         3 233     5 235
2         0.50      0.72       5 300     2 494     0         2 806     4 601
2         1.00      0.76       3 935     1 797     0         2 138     3 714
3         0.25      0.63       7 281     2 929     1 119     3 233     5 923
3         0.50      0.65       7 281     2 494     1 981     2 806     5 923
3         1.00      0.80       7 281     1 797     3 346     2 138     5 923
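The evaluation protocol behind these tables (a 60/40 split, many scikit-learn classifiers, ranked by test-set accuracy) can be sketched as below. The data are synthetic stand-ins for the real document vectors, only four of the twenty classifiers are shown, and `random_state` is fixed for reproducibility, a small deviation from the paper's pure default settings.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((300, 20))                   # stand-in for TP/TF/TF-IDF vectors
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # stand-in for plus/minus labels

# 60% training / 40% testing, no class balancing
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, random_state=0)

classifiers = {
    "ExtraTrees": ExtraTreesClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(),
    "RidgeClassifier": RidgeClassifier(),
}
scores = {name: accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
          for name, clf in classifiers.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2f}")
```

Reporting, per dataset, the best accuracy over all vector types and classifiers then yields tables of the shape shown above.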


If we look at how balanced the datasets are, we can say that for 2 classes they are in all cases relatively well balanced. For 3 classes, there is a clear imbalance. This can be seen especially in Table 4, where 82% of the samples belong to the constant class.

If we compare the classification accuracy for different datasets, we see that for discussion posts it is far higher (+10%) than for news articles. It is interesting that the accuracy obtained by training a classifier on all discussion posts and the PX index (Table 3) is higher than when using only posts and prices for one company (Table 5).

The highest accuracy was always achieved for 3 classes and 1% change. If we consider only 0.25 and 0.50% changes, accuracy for 2 classes is always better than for 3 classes. Generally, it can be said that the higher the percentage change, the higher the accuracy. However, this does not hold for Patria news articles with 2 classes (Table 4), where the highest accuracy is achieved for the 0.25% change.

Table 6. Comparison of avg. accuracies for vector type.

Vector    Akcie.cz   CEZ    Patria
TP        0.67       0.63   0.51
TF        0.66       0.63   0.51
TF-IDF    0.67       0.63   0.53

Table 6 tells us that for discussion posts the used vector type was not very important; however, for news articles the highest accuracy was achieved by TF-IDF.

Table 7. Comparison of avg. accuracies for classification algorithms.

Akcie.cz discussion posts        Patria news articles
Algorithm            Avg. acc.   Algorithm             Avg. acc.
ExtraTrees           0.72        LogisticRegression    0.56
MLP                  0.72        CalibratedClassifier  0.56
RandomForest         0.71        SVC                   0.56
LogisticRegression   0.71        LogisticRegression    0.53
LinearSVC            0.71        RidgeClassifier       0.53

Table 7 shows the classifiers with the best average performance across all experiments. For discussion posts the Extremely randomized trees ensemble method was the best, closely followed by the Multi-layer Perceptron neural network. However, out of the best five algorithms for news articles, only one (LogisticRegression) was successful also for the posts. This indicates that for each type of document different algorithms are suitable.

5 CONCLUSION

The goal of the article was to examine the relationship between the content of text documents published on the Internet and the direction of movement of stock prices on the Prague Stock Exchange. For this, text classification was used.

The connection was found, as demonstrated by the achieved classification accuracy. When using binary classification (documents with constant class were discarded), we achieved an accuracy of 75-78% for discussion posts and about 60% for news articles. For ternary classification, the accuracy was lower (about 65% and 40-50%). However, for all datasets the accuracy, when using the highest 1% threshold for minimal price change, was about 80%.

During the work, we encountered several problems. The most notable one was a rather small amount of available data, especially for the news articles.

It must be noted that the goal was to examine if there is a connection between texts and stock prices, not to achieve the highest possible accuracy for each classification algorithm. Because of this, we used only default settings (parameter values) for the algorithms. An optimization of these parameters might bring a few percent higher accuracy.

There are many options for further research in this area: use clustering/topic models (e.g. LDA) to find document classes based on their content; use bigrams or tri-grams as features; take into account the importance (popularity) of the document; use more values for minimal price change and also other time intervals (more or less than 1 day).


REFERENCES

[1] SHILLER, R. J. From efficient markets theory to behavioral finance. The Journal of Economic Perspectives. 2003, vol. 17, no. 1, p. 83-104.

[2] BOLLEN, J., MAO, H. and ZENG, X. Twitter mood predicts the stock market. Journal of Computational Science. 2011, vol. 2, no. 1, p. 1-8.

[3] KAPLANSKI, G. and LEVY, H. Sentiment and stock prices: The case of aviation disasters. Journal of Financial Economics. 2010, vol. 95, no. 2.

[4] Forecasting with Twitter Data. ACM Transactions on Intelligent Systems and Technology (TIST). 2013, vol. 5, no. 1, p. 8.

[5] LEE, Heeyoung, et al. On the Importance of Text Analysis for Stock Price Prediction. In: LREC. 2014, p. 1170-1175.

[6] SCHUMAKER, R. P., ZHANG, Y., HUANG, C.-N. and CHEN, H. Evaluating sentiment in financial news articles. Decision Support Systems. 2012.

[7] PRICHYSTAL, J. Analyzing the correlation between online texts and stock price movements at micro-level using machine learning. MENDELU Working Papers in Business and Economics 2016-67, Mendel University in Brno, Faculty of Business and Economics.

[8] T. Fundamentals of Predictive Text Mining. London: Springer, 2010. ISBN 978-1-84996-225-4.

[9] MANNING, C. D. and SCHÜTZE, H. Foundations of Statistical Natural Language Processing. Cambridge, USA: The MIT Press, 1999. ISBN 9780262133609.


Continuous GA-based Optimization of Order-Up-To Inventory Policy in Logistic Networks for Achieving High Service Rate

Przemysław Ignaciuk and Łukasz Wieczorek
Institute of Information Technology, Lodz University of Technology, Łódź, Poland
przemyslaw.ignaciuk@p.lodz.pl, 201010@edu.p.lodz.pl

Abstract: The paper addresses the problem of efficient goods distribution in logistic networks having a mesh structure. The transfer of goods takes place among the interconnected nodes with non-negligible delay. The stock gathered at the nodes is replenished from external sources as well as from other nodes in the controlled network. External demand is imposed on any node without prior knowledge about the requested quantity. The inventory control is realized through the application of the order-up-to policy implemented in a distributed way. The aim is to provide high customer satisfaction while minimizing the total holding costs. In order to determine the optimal reference stock level for the policy operation at the controlled nodes, a continuous genetic algorithm (GA) is applied and adjusted for the analyzed class of application-centered problems.

Keywords: logistic networks, order-up-to policy, optimization, continuous genetic algorithm.

I. INTRODUCTION

The optimization of logistic network operation is a computationally challenging task. Owing to complex mathematical dependencies and the delayed interaction of system components (e.g., in a practical system the goods cannot be transferred immediately among the nodes), the numerical analysis of multi-node networks is resource prohibitive. In particular, determination of the cost (or fitness) function is time consuming. Moreover, the presence of nonlinearities may lead to many local minima. In the scientific literature, the optimization of logistic systems is examined mainly in the case of basic structures, e.g., when each internal node has only one goods supplier [1]. The most common types of such structures are:

- single-echelon [2, 3]: a single provider connected to the controlled node;
- serial interconnection [4, 5]: all the nodes connected one-by-one in a line;
- tree-like organization [6]-[8]: a particular node replenishes the stock of a few children.

These studies are not sufficient for current logistic systems, where the actually deployed architectures are much more complex. One may argue that, nowadays, the general availability of powerful computing machines creates new opportunities for solving realistic optimization problems which until recently did not exist. However, performing extensive numerical treatment becomes possible only when an efficient method is selected, e.g., within the evolutionary computation domain [9].

The purpose of this paper is to evaluate the usefulness of genetic algorithms (GAs) in the optimization of logistic network performance when subjected to the control of the classical order-up-to (OUT) [10] inventory policy. The research is focused on a sophisticated, yet realistic case of a system with mesh-type topology. In the analyzed structure type, a particular node, connected to multiple nodes, may play the role of supplier and goods provider to effectuate the stock replenishment decisions. The decisions are taken according to the indications of the OUT policy, deployed in a distributed way. The optimization objective is to determine the reference stock level for each individual node so that the holding costs in the entire system are minimized while at the same time maintaining a given service level.

Since the considered problem has a continuous search domain, applying a basic GA would require translating the system variables (and associated operations) into the binary form. Therefore, unlike the typical GA binary-value implementation, one that resides in the continuous search space is used. Moreover, as opposed to the standard GA tuning procedures, proposed for the artificial optimization problems where multiple cost function evaluations are permissible [11], the long time of obtaining the fitness function value in the considered class of networked systems shifts the GA tuning effort towards a constrained number of iterations. The effectiveness of the GA in reaching the optimal network state is evaluated in numerous simulations.

II. SYSTEM DESCRIPTION

A. Actors in Logistic Processes

The paper analyzes the process of goods distribution among the nodes (warehouses, stores, etc.) of a logistic network. Each node has limited capacity to store the goods.


The nodes are connected in a direct manner and a mesh topology is permitted. Each connection is characterized by two attributes:

- delivery delay time (DDT): the time from issuing an order for goods acquisition until their delivery to the ordering node;
- supplier fraction (SF): the percentage of the ordered quantity to be retrieved from a particular supply source selected by the ordering node.

Apart from the initial stock at the nodes, the main source of goods in the network are the external suppliers. There are no isolated nodes that would not be linked to any other controlled node or external supplier, nor nodes that would supply the stock for themselves. In addition, there is a finite path from each controlled node to at least one external source, which means that the network is connected. The system driving factor is the external customer demand imposed on the controlled nodes. The demand can be placed at any node and, as in the majority of practical cases [10, 12], its future value is not known at the moment of issuing an order. The business objective is to ensure high customer satisfaction through fulfilling the external demand, at the same time avoiding unnecessary increase of the operational costs. Thus, the optimization purpose is to obtain a high service level at the lowest possible cost of goods storage at the nodes, i.e., minimizing the total network holding cost (HC).

B. Actor Interaction

The considered logistic network consists of N nodes n_i, where index i \in N = {1, 2, ..., N}, and M external sources m_j, where j \in M = {1, 2, ..., M}. The set containing all the indices \Omega = {1, 2, ..., N + M}. Let l_i(t) denote the on-hand stock level (the quantity of goods currently stored) and d_i(t) the external demand imposed on node i in period t, t = 0, 1, 2, ..., T, T being the optimization time span. The connection between two nodes i and j is unidirectional, characterized by two attributes (\alpha_{ij}, \tau_{ij}):

- \alpha_{ij}: the SF between nodes i and j, \alpha_{ij} \in [0, 1];
- \tau_{ij}: the DDT between nodes i and j, \tau_{ij} \in [1, \tau], where \tau denotes the maximum DDT between any two interconnected nodes.

Fig. 1 illustrates the operation sequence at a network node occurring in each period.

Fig. 1. Node operational sequence.

Detailed mathematical description of the node interaction is given in [12]. Below, only the fundamental issues required for the algorithm implementation are covered. Let us introduce:

- u^S_i(t): the quantity of goods sent by node i in period t,
- u^R_i(t): the quantity of goods received by node i in period t.

The stock level at node i evolves according to

l_i(t + 1) = [l_i(t) + u^R_i(t) - d_i(t)]^+ - u^S_i(t),   (1)

where (f)^+ denotes the saturation function (f)^+ = max{f, 0}. The satisfied external demand s_i(t) at node i in period t (the goods actually sold to the customers) may be expressed as

s_i(t) = min{l_i(t) + u^R_i(t), d_i(t)}.   (2)

Consequently, (1) may be rewritten as

l_i(t + 1) = l_i(t) + u^R_i(t) - s_i(t) - u^S_i(t).   (3)

Let o_i(t) denote the total quantity of goods to be ordered by node i in period t. o_i(t) covers the orders to be realized both by the other controlled nodes as well as the external sources. Then, the quantity sent by node i in period t in response to the orders from its neighbors is

u^S_i(t) = \sum_{j \in N} \alpha_{ij}(t) o_j(t).   (4)

On the other hand, the quantity of goods received by node i in period t from all its suppliers is

u^R_i(t) = \sum_{j \in \Omega} \alpha_{ji}(t - \tau_{ji}) o_i(t - \tau_{ji}).   (5)

The nodes try to answer both the external and internal demand. In case of insufficient stock to fulfill all the requests, the ordered quantity is reduced accordingly, yet

\forall_i:  0 <= \sum_{j \in \Omega} \alpha_{ji}(t) <= 1.   (6)

When a node receives a request from another controlled node in the network and is able to fulfill it, then \alpha_{ij}(t) = \alpha_{ij}. Otherwise, \alpha_{ij}(t) < \alpha_{ij}. It is assumed that the external sources are able to satisfy every order originating from the network (uncapacitated external sources).

C. State-Space Description

For the purpose of convenience of further study, a network state-space model will be introduced. The dynamic dependencies can be grouped into

l(t + 1) = l(t) + \sum_{\lambda=1}^{\tau} M_\lambda(t - \lambda) o(t - \lambda) - M_0(t) o(t) - s(t),   (7)
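The per-period balance described above can be rendered as a toy simulation. This is a sketch under stated assumptions, not the authors' implementation: the u^S/u^R notation follows our reconstruction, all numbers (stocks, delays, demands) are made up, node 0 is replenished by an uncapacitated external source, node 1 orders from node 0, every link has DDT 1 and SF 1, and an OUT-style ordering rule is simplified accordingly.

```python
import numpy as np

N, T = 2, 20
l = np.array([30.0, 30.0])        # on-hand stock l_i(t)
lref = np.array([40.0, 25.0])     # reference stock levels l_i^r
demand = np.array([0.0, 8.0])     # external demand d_i(t), node 1 only
in_transit = np.zeros(N)          # goods arriving next period (DDT = 1)
served = asked = 0.0

for t in range(T):
    uR, in_transit = in_transit, np.zeros(N)   # receive shipments (eq. 5)
    s = np.minimum(l + uR, demand)             # satisfied demand (eq. 2)
    # OUT-style order; with DDT = 1 nothing remains pending after receiving
    o = np.maximum(lref - (l + uR - s), 0.0)
    ship = min(o[1], l[0] + uR[0] - s[0])      # node 0 answers node 1's order
    uS = np.array([ship, 0.0])                 # internal shipments (eq. 4)
    l = l + uR - s - uS                        # stock balance (eq. 3)
    in_transit = np.array([o[0], ship])        # external source ships in full
    served += s[1]
    asked += demand[1]

fill_rate = served / asked
print(fill_rate, l)   # 1.0 [24. 17.]
```

With these reference levels the toy network settles into a steady state that satisfies all external demand, illustrating how the fill rate and the held stock depend on the chosen l^r.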


where the applied symbols denote:

- l(t): vector of stock levels (system state), l(t) = [l_1(t), l_2(t), ..., l_N(t)]^T;   (8)
- o(t): vector of stock replenishment orders, o(t) = [o_1(t), o_2(t), ..., o_N(t)]^T;   (9)
- s(t): vector of satisfied (external) demands, s(t) = [s_1(t), s_2(t), ..., s_N(t)]^T;   (10)
- M_\lambda(t): matrices specifying the node interconnections; for each \lambda \in [1, \tau],

M_\lambda(t) = diag{ \sum_{i: \tau_{i1}=\lambda} \alpha_{i1}(t), \sum_{i: \tau_{i2}=\lambda} \alpha_{i2}(t), ..., \sum_{i: \tau_{iN}=\lambda} \alpha_{iN}(t) };   (11)

- M_0(t): matrix describing the stock depletion due to internal shipments, with zero diagonal and off-diagonal entries

(M_0(t))_{ij} = \alpha_{ij}(t), i <> j.   (12)

D. Order-Up-To Inventory Policy

One of the popular stock replenishment strategies applied in logistic systems is the OUT inventory policy. This policy attempts to elevate the current stock level to a predefined reference one. A replenishment order is issued if the sum of the on-hand stock level and the goods quantity from pending orders at a node is below the reference level. The reference level should be set so that a high percentage of the external demand is satisfied. The network optimization procedures discussed in this paper provide guidelines for the reference stock level selection under uncertain demand (the future demand is not known precisely while issuing the stock replenishment orders). The operational sequence of the OUT policy is presented in Fig. 2.

Fig. 2. OUT policy operational sequence.

According to [10], the quantity in the replenishment order placed by node i in period t may be calculated as

o_i(t) = l^r_i - l_i(t) - w_i(t),   (13)

where:
- l^r_i: the reference stock level set at node i, i \in [1, N];
- w_i(t): the quantity of goods from pending orders issued by node i (the orders already placed but not yet realized due to delay).

In order to allow application of the OUT policy in a distributed environment, which is considered explicitly in this work, formula (13) needs to be converted into a vector form

o(t) = l^r - l(t) - \sum_{k=1}^{\tau} \sum_{\lambda=t-k}^{t-1} M_k(\lambda) o(\lambda),   (14)

where l^r denotes the vector of reference stock levels.

A logistic network should retain a high service level despite imprecise knowledge about the future demand evolution. The system performance is quantified through the fill rate, i.e., the percentage of actually realized customer demand imposed on all the nodes. The optimization objective is to indicate a reference stock level for each node so as to preserve the lowest possible holding costs while keeping the fill rate close to a predefined one, ideally 100%. As a first approximation, using only the knowledge about the highest expected demand in the system d_max, the 100% fill rate is obtained if the reference stock level is selected according to the following formula

l^r = (I_N - M - M_0)^{-1} d_max.   (15)

III. GENETIC ALGORITHM

In order to optimize the performance of the considered class of logistic networks according to the objectives stated in Section II, a continuous-domain GA has been implemented. Let the vector containing the reference stock levels of all the controlled nodes be a candidate solution (an individual) in the population used by the GA. The genotypes of each individual correspond to the phenotypes of reference stock levels. Since
ISBN: 978-1-941968-43-7 ©2017 SDIWC 72

Proceedings of The Fourth International Conference on Artificial Intelligence and Pattern Recognition (AIPR2017), Lodz, Poland, 2017

the domain of reference stock level selection is continuous (it can take any value from a given interval), a continuous GA is employed [11]. The reference stock level at a particular node represents an allele in the population individual. As a result, one does not need to represent candidate solutions through binary sequences: the operations of selection, crossover, and mutation are executed directly for real-valued candidates. Fig. 3 outlines the algorithm operational sequence, discussed in detail in a later part of the paper.

Fig. 3. Genetic algorithm flowchart.

A. Initialization

The initialization stage includes calculating the reference stock levels according to formula (15), i.e., under the assumption that the system is faced with fixed external demand equal to its largest value throughout the entire optimization time span. This setting allows one to determine the maximum holding cost HC_max as a boundary point for further calculations. Although full customer satisfaction is then obtained, the holding cost is high and needs to be reduced.

B. Fitness Function

The key element in the optimization problem and the evaluation of the progress of the applied evolutionary method is the choice of fitness function. In the case of the considered application area, two factors influence the solution fitness:

HC – holding cost, HC ∈ [0, HC_max],

FR – fill rate, FR ∈ [0, 1].

The simulation objective is to minimize the total system holding cost while ensuring high customer satisfaction. Accordingly, the following fitness function has been selected:

Fitness = α (1 − HC/HC_max) + β FR,

where α and β are tuning parameters used to investigate the impact of prioritizing cost reduction versus customer satisfaction in finding the optimal solution.

C. Selection

Selection in the considered GA is realized using the roulette-wheel approach illustrated in Fig. 4. It means that each individual is assigned a fraction within the range [0, 1] proportional to its fitness value relative to the rest of the generation. Using a random selector, the entire population is divided into pairs.

Fig. 4. Roulette-wheel selection for a particular generation.

D. Crossover

The crossover operation is performed in the typical way for GAs. First, a uniformly distributed random number is generated. Its value does not exceed the gene size in an individual. Then, each pair from the previous selection is used to form two new candidate solutions. More precisely, each individual from a particular pair is divided into two sub-vectors and two child individuals are formed through swapping these sub-vectors. For two individuals A = [lA1, lA2, ..., lAN] and B = [lB1, lB2, ..., lBN], the crossover at a point determined by a random number γ, γ ∈ [0, N], results in

C1 = [lA1, lA2, ..., lAγ, lB(γ+1), ..., lBN],
C2 = [lB1, lB2, ..., lBγ, lA(γ+1), ..., lAN].
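The one-point crossover described above can be sketched as follows (an illustrative sketch only; the paper's implementation is MATLAB-based [13], and the function names here are ours):

```python
import random

# Illustrative sketch of the one-point crossover described above.
# The cut point gamma is drawn uniformly from [0, N], where N is the
# number of genes (reference stock levels) in an individual.
def crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two real-valued individuals at a random cut point."""
    n = len(parent_a)
    gamma = rng.randint(0, n)  # randint is inclusive on both ends
    child1 = parent_a[:gamma] + parent_b[gamma:]
    child2 = parent_b[:gamma] + parent_a[gamma:]
    return child1, child2
```

Whatever cut point is drawn, the two children together contain exactly the genes of the two parents, position by position.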

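The fitness evaluation of Section III-B can be sketched as follows, assuming an additive weighting of the normalized holding cost and the fill rate (this form is our reading of the paper; the exact expression may differ):

```python
# Sketch of the fitness evaluation, assuming the additive form
# alpha * (1 - HC/HCmax) + beta * FR (an assumption, not a verbatim
# transcription of the paper's formula).
def fitness(holding_cost, fill_rate, hc_max, alpha=1.0, beta=1.0):
    """Higher is better: rewards low holding cost and a high fill rate."""
    return alpha * (1.0 - holding_cost / hc_max) + beta * fill_rate
```

With this form, increasing alpha pushes the search toward cheaper solutions, while increasing beta pushes it toward a higher fill rate, matching the trade-off discussed in the text.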


E. Mutation

The final phase of the algorithm operation is mutation, which depends on the mutation rate coefficient and occurs relatively infrequently. The mutation involves replacing a selected gene with a random value from the considered domain. If the mutation rate equals 0.01, each gene of the individual after the crossover operation has a probability of 1% that its value will mutate.

Table 1. Fill rate and holding cost for different sets of fitness function shaping coefficients.

α    β    Fill rate   Holding cost   Iteration number
1    1    0.9727      10258          663
1    10   0.9994      17452          731
1    40   1.0000      18217          933
5    1    0.8867      6100           728
5    10   0.9865      12782          1401
5    40   0.9979      16869          576
10   1    0.6904      2496           1357
10   10   0.9709      10093          672
10   40   0.9933      15118          1109

IV. NUMERICAL STUDY

In order to evaluate the performance of the GA in finding the optimal solution, a MATLAB-based application has been created (sources available on-line [13]). The application enables generating a mesh-type network topology for given input parameters (node and supplier number, connectivity structure, and demand pattern) and obtaining the fitness function value through the simulation of network behavior within a specified time frame.
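The gene-wise mutation step described in Section III can be sketched as (illustrative; names are ours):

```python
import random

# Gene-wise mutation: with probability `rate` (e.g., 0.01), a gene is
# replaced by a fresh random value drawn from the considered domain
# [low, high]; otherwise it is kept unchanged.
def mutate(individual, low, high, rate=0.01, rng=random):
    return [rng.uniform(low, high) if rng.random() < rate else gene
            for gene in individual]
```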
Fig. 5 shows the topology under consideration in the numerical study. The network consists of two external sources (M = 2) and three internal nodes (N = 3). The numbers above the links joining the nodes indicate the nominal quantity partitioning and delay, e.g., node 3 acquires 20% of the established order from node 1 with a delay of 3 periods. The external demand (imposed on each node) has been generated using the Gamma distribution with parameters shape = 5 and scale = 10. The simulation lasts T = 50 periods. The GA generation size has been set as 10. The overall initial holding cost (considering all three nodes) equals 10^5 units.

Fig. 6 illustrates the progress of the optimization process quantified through the fitness function changes in the GA iterations. The fitness function shaping coefficients have been set as α = 10 and β = 40. Since the optimal solution is not known a priori, the stopping criterion is enforced through a predefined maximum number of iterations. As the second stopping criterion, besides the simulation duration, a threshold for the number of iterations without improving the fitness function value is specified. In the analyzed case the threshold equals 300 iterations. The dashed line in Fig. 6 indicates the best solution established using a full search method.

Fig. 5. Logistic network structure.
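The external demand pattern used in the study (Gamma distribution, shape = 5, scale = 10, over T = 50 periods) can be reproduced with the standard library generator (a sketch; the function name is ours):

```python
import random

# Gamma-distributed external demand, as imposed on each node in the
# numerical study: shape = 5, scale = 10, simulated for T = 50 periods.
def demand_series(periods=50, shape=5.0, scale=10.0, rng=random):
    return [rng.gammavariate(shape, scale) for _ in range(periods)]
```

The mean of this distribution is shape * scale = 50 units per period, which sets the scale of the holding costs reported in Table 1.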

Table 1 groups the data regarding the fill rate and obtained
holding costs for different sets of fitness function shaping
coefficients. It allows one to assess the impact of cost
reduction vs. ensuring high customer satisfaction.
Fig. 6. Fitness adjustment progress.
It follows from the analysis of the obtained data that even a small change of the fitness function coefficients may have a significant impact on the cost structure and the process of determining the optimal solution. Depending on the circumstances in a given scenario, the relative importance of those factors can be balanced to achieve a desirable solution. Increasing α raises the importance of holding cost reduction in the goods distribution process, while increasing β guarantees a higher fill rate (improved customer satisfaction). A simultaneous increase of both coefficients leads to a state of near full customer satisfaction attained with minimum holding costs.

Figs. 7 and 8 display the stock level evolution at the nodes for the initial and final simulation. As can be seen from the graphs, the GA successfully eliminates superfluous resources (and reduces the holding costs) while keeping the stock positive most of the time, which implies a high fill rate.


Fig. 7. Stock level at the nodes for initial generation.

Fig. 8. Stock level at the nodes for final generation.

V. CONCLUSIONS

The paper explores the application of continuous-domain GAs for optimization of mesh-type logistic networks governed by the OUT inventory policy. In the considered class of systems, the objective has been defined as reducing the overall holding cost while ensuring high customer satisfaction by an appropriate choice of the stock reference level at the network nodes. The chosen fitness function allows one to smoothly balance the economic costs and customer satisfaction by changing two algebraic coefficients. These changes have no significant effect on the number of iterations needed to find the optimal solution. The key behind the GA performance is the size of the generation. By increasing the generation size, the number of iterations needed to find the optimum may be reduced. On the other hand, in each iteration the fitness function will be called more times. The tests, executed for different network structures, GA parameters, and fitness function coefficients, indicate that the use of GAs for reference stock level selection is advisable. In contrast to the full-search approach, the desired balance between holding cost reduction and elevating customer satisfaction is achieved with commonly attainable computing resources.

REFERENCES

[1] H. Sarimveis, P. Patrinos, C.D. Tarantilis, and C.T. Kiranoudis, "Dynamic modeling and control of supply chain systems: a review," Computers and Operations Research, vol. 35, pp. 3530-3561, 2008.
[2] K. Hoberg, J.R. Bradley, and U.W. Thonemann, "Analyzing the effect of the inventory policy on order and inventory variability with linear control theory," European Journal of Operational Research, vol. 176, pp. 1620-1642, 2007.
[3] P. Ignaciuk and A. Bartoszewicz, "Linear-quadratic optimal control of periodic-review perishable inventory systems," IEEE Transactions on Control Systems Technology, vol. 20, pp. 1400-1407, 2012.
[4] M. Boccadoro, F. Martinelli, and P. Valigi, "Supply chain management by H-infinity control," IEEE Transactions on Automation Science & Engineering, vol. 5, pp. 703-707, 2008.
[5] K.K. Movahed and Z.-H. Zhang, "Robust design of (s, S) inventory policy parameters in supply chains with demand and lead time uncertainties," International Journal of Systems Science, vol. 46, pp. 2258-2268, 2015.
[6] C.O. Kim, J. Jun, J.K. Baek, R.L. Smith, and Y.D. Kim, "Adaptive inventory control models for supply chain management," The International Journal of Advanced Manufacturing Technology, vol. 26, pp. 1184-1192, 2005.
[7] P. Ignaciuk, "LQ optimal and robust control of perishable inventory systems with multiple supply options," IEEE Transactions on Automatic Control, vol. 58, pp. 2108-2113, 2013.
[8] P. Ignaciuk, "Nonlinear inventory control with discrete sliding modes in systems with uncertain delay," IEEE Transactions on Industrial Informatics, vol. 10, pp. 559-568, 2014.
[9] E. Choodowicz and P. Orłowski, "Comparison of a perpetual and PD inventory control system with Smith Predictor and different shipping delays using bicriterial optimization and SPEA2," Pomiary Automatyka Robotyka, vol. 20, pp. 5-12, 2016.
[10] S. Axsäter, Inventory Control, Springer International Publishing, 2015.
[11] D. Simon, Evolutionary Optimization Algorithms, John Wiley & Sons, 2013.
[12] P. Ignaciuk, "State-space modeling and analysis of order-up-to goods distribution networks with variable demand and positive lead time," in Information Systems Architecture and Technology: Proceedings of the 37th International Conference on Information Systems Architecture and Technology (ISAT 2016), Part IV, pp. 55-65, 2017.
[13] Ł. Wieczorek, Application sources. [Online]. Available: https://github.com/UksonCode/logistic-networks-ga-optimization


Fusing Geometric and Appearance-based Features for Glaucoma Diagnosis

Kangrok Oh^a, Jooyoung Kim^a, Sangchul Yoon^b, Kyoung Yul Seo^b
^a School of Electrical and Electronic Engineering, Yonsei University
50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea
{kangrokoh, harrykim}@yonsei.ac.kr
^b Department of Ophthalmology, Yonsei University
50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea

ABSTRACT

In this paper, we propose to fuse geometric and appearance-based features at the feature-level for automatic glaucoma diagnosis. The cup-to-disc ratio and neuro-retinal rim width variation are extracted as the geometric features based on a coarse-to-fine localization method. For the appearance-based feature extraction, the principal components analysis is adopted. Finally, these features are combined at the feature-level based on the random projection and the total error rate minimization classifier. Experimental results on an in-house data set show that the feature-level fusion can enhance the classification performance compared with that before fusion.

KEYWORDS

Glaucoma diagnosis, feature-level fusion, cup-to-disc ratio estimation, principal components analysis, total error rate minimization

1 INTRODUCTION

Glaucoma is defined as a group of diseases that damage the eye's optic nerve and can result in vision loss and blindness according to [1]. Glaucoma is known as the leading cause of blindness [2], and the number of people with glaucoma in the United States is reported as about 2.7 million in 2010 and estimated to reach about 6.3 million in 2050 [3]. Since the progression of glaucoma is slow and irreversible [4], it is often called the "silent thief of sight." Hence, diagnosing glaucoma at an early stage is important to prevent undesirable impairment or blindness caused by non-awareness.

Clinical glaucoma diagnosis is usually performed based on intra-ocular pressure measurement, visual field tests, appearance analysis of the optic nerve head (ophthalmoscopy), etc. By means of technology development in the fields of image processing and artificial intelligence, several attempts to diagnose glaucoma in an automatic manner using fundus images can be found in the recent literature. A brief summary of the recent works regarding automatic glaucoma diagnosis is presented in Table 1. Depending on the way of extracting features, the recent existing works related to automatic glaucoma diagnosis can be grouped into two categories: i) the appearance-based feature extraction approach, and ii) the geometric feature extraction approach.

The existing works in the appearance-based feature extraction approach category aim to extract features using the intensity values of an image or a sub-image. In [5], a set of appearance-based features are combined based on feature selection or ranking for accuracy performance enhancement. Similarly, a set of different features (high order spectra (HOS), discrete wavelet transform (DWT), etc.) are fused to improve the classification accuracy in [6]. In [7], texture features are extracted based on a local binary patterns (LBP) operator. Those in the geometric feature extraction approach category attempt to localize mainly the optic disc and cup regions, and extract geometric features such as the cup-to-disc ratio (CDR) and neuro-retinal rim thickness variations. Essentially, iterative methods such as the active contour model (ACM) [8], Otsu's threshold [9], and the Hough transform [10] are adopted for the localization task to improve the performance.


Table 1. A brief summary of related works.

Index                    Category           Feature Extraction   Classification     Data Set (#G, #NG)         Reported Accuracy (%)
Acharya et al. [5]       Appearance-based   HOS, GLCM, RLM       SVM, SMO, NB, RF   In-house (30, 30)          91.00
Krishnan and Faust [6]   Appearance-based   HOS, TT, DWT         SVM                In-house (30, 30)          91.67
Ali et al. [7]           Appearance-based   LBP                  NN                 HRF + In-house (13, 28)*   95.10
Fondon et al. [8]        Geometric          CDR                  Thr                In-house (not reported)    78.10
Guerre et al. [9]        Geometric          CDR, NRIM            SVM                In-house (15, 14)          89.00
                                                                                    In-house (18, 8)           71.00
Dutta et al. [10]        Geometric          CDR                  Thr                HRF (15, 15)               90.00

Abbreviations: HOS – high order spectra; GLCM – gray-level co-occurrence matrix; RLM – run length matrix; TT – trace transform; DWT – discrete wavelet transform; LBP – local binary patterns; CDR – cup-to-disc ratio; NRIM – neuro-retinal rim width ratio; SVM – support vector machine; SMO – sequential minimal optimization; NB – naive Bayesian; NN – nearest-neighbor; RF – random forests; Thr – threshold; HRF – High Resolution Fundus image data set; #G/#NG – the number of glaucomatous/non-glaucomatous images; * – only left eye images are utilized.

This may result in obtaining a high computational cost. In addition, even though the geometric and appearance-based features can be easily combined, fusion of the geometric and appearance-based features has not been investigated thoroughly yet.

In this paper, we propose to fuse geometric and appearance-based features at the feature-level. For the geometric feature extraction, a non-iterative coarse-to-fine localization scheme is proposed. Particularly, a matrix multiplication is designed to perform two-dimensional (2-D) mean filtering at the coarse search stage. The principal components analysis (PCA) [11] is adopted for the appearance-based feature extraction. Finally, the total error rate minimization (TER) with a random projection (RP) is adopted for classification. The main contributions of our paper include i) the proposal of a feature-level fusion scheme based on the geometric and appearance-based features, and ii) the proposal of a matrix projection for two-dimensional mean filtering.

This paper is organized as follows: the proposed feature-level fusion scheme is presented in Section 2. Section 3 provides some experimental results and analysis. Finally, some concluding remarks are presented in Section 4.

2 PROPOSED METHOD

In this paper, we propose to fuse geometric and appearance-based features for glaucoma diagnosis. For the geometric features, the cup-to-disc ratio (CDR) and the inferior-superior rim length to nasal-temporal rim length ratio (ISNTR) are extracted from a fundus image. The principal components analysis (PCA) [11] is adopted for the appearance-based feature extraction. The geometric and PCA features are fused at the feature-level by feature concatenation. Subsequently, the feature vector is normalized based on the min-max normalization [12], and expanded by the random projection (RP) [13, 14]. Finally, classification is performed based on the total error rate minimization (TER) classifier [15]. Figure 1 shows an overall flow of the proposed method.

2.1 Image Preprocessing

At the preprocessing stage, image resizing, mask generation, and image cropping are sequentially performed for further feature extraction and classification. 1424 × 2144 RGB images are resized to 650 × 800 based on the bi-cubic interpolation [16]. Then, a weighted image is generated based on the images in the red, green, and blue channels as follows:

W = wR R + wG G + wB B,    (1)

where R ∈ R^{650×800}, G ∈ R^{650×800}, and B ∈ R^{650×800} are image matrices in the red, green, and blue channels, respectively. Here, W ∈ R^{650×800} denotes a weighted image matrix. wR, wG, and wB denote weight values for


the red, green, and blue channels, respectively. The purpose of generating a weighted image is to make the cup region distinguishable for the further localization process.

Figure 1. An overview of the proposed method (flowchart: fundus photo → image preprocessing → weighted image → coarse detection → cup center → analysis / PCA).

2.2 Geometric Feature Extraction

The CDR and the ISNTR are adopted as the geometric features for glaucoma diagnosis. They are estimated by localizing the disc and cup regions from a fundus image. The CDR is estimated based on three different measures while the ISNTR is estimated based on a single measure. Finally, the geometric features extracted from a fundus image become a four-dimensional vector.

2.2.1 Coarse Detection of Disc Region

In order to localize the disc and cup regions from an image, we search for the coordinates of a pixel with the highest intensity value. The purpose of this search is to reduce the range for further disc and cup localization. To accomplish this, vessels are removed by means of the morphological dilation operation [16]. Next, (2K + 1) × (2K + 1) mean filtering is performed to smoothen the image. The 2-D mean filtering is implemented using the following matrix multiplication. For this operation, we define two matrices as follows:

        | 1 ... 1 0 ... 0 |
F1^L =  | 0 1 ... 1 ... 0 |  ∈ I^{R×P},    (2)
        | ...             |
        | 0 ... 0 1 ... 1 |

F2^L ∈ I^{Q×S},    (3)

where each row of F1^L contains L consecutive 1 values, shifted one position to the right in each successive row, and F2^L is built analogously with L consecutive 1 values in each column. F1^L and F2^L are matrices for pre- and post-multiplications with a weighted image matrix, and L = 2K + 1. Here, R = P − 2K and S = Q − 2K are set to exclude the boundary pixels of the weighted image matrix for the filtering operation. We set wR = 0.2, wG = 0.3, and wB = 0.5 to generate the weighted image to obtain the brightest pixel coordinates.

From the matrices defined in (2) and (3), the output matrix from the 2-D mean filtering is obtained as follows:

M = (1/L^2) F1^L W F2^L,    (4)

where W ∈ R^{P×Q} is a weighted image matrix (P = 650 and Q = 800), and M ∈ R^{R×S} is an output matrix from the 2-D mean filtering. The row and column coordinates of a pixel with the brightest intensity are searched from the matrix M. Here, we denote the row and


column coordinates as (R, C). Subsequently, an image sub-region with the size of 251 × 251 centered at the coordinates (R + K, C + K) is cropped from the red, green, and blue channel image matrices R, G, and B. The cropped red, green, and blue channel images are denoted as Rc ∈ R^{251×251}, Gc ∈ R^{251×251}, and Bc ∈ R^{251×251}, respectively. Figure 2 (a) shows examples of intermediate results from the coarse detection of the disc region.

2.2.2 Disc Localization

For disc localization, we utilize the cropped red channel image since it shows the most distinctive information on the disc region. To localize the disc region, a set of operations are sequentially performed on Rc as follows:

1. Vessels are removed based on the morphological dilation operations [16] from Rc.

2. Histogram equalization [16] is performed on the image without vessels for contrast enhancement. The image resulting from the vessel removal and the histogram equalization is defined as Hc ∈ R^{251×251}.

3. A threshold operation is performed on the histogram-equalized image using two threshold values θl and θr. From an observation regarding the uneven spread of intensity values over the left and right regions of Hc, we apply different threshold values on the left and right halves of Hc. We set θl = 0.9 V and θr = 0.8 V for right eye images, and θl = 0.8 V and θr = 0.9 V for left eye images. Here, V denotes the intensity value of the brightest pixel of Hc. After the threshold operation, a binary matrix Bd ∈ I^{251×251}, wherein pixels with 1 values construct a candidate disc region, is obtained.

4. From the binary matrix Bd, a chunk of 1 values whose center of mass is the closest to the coordinates (126, 126) is extracted and the remaining chunks of 1 values are removed. Subsequently, morphological hole filling, dilation, and erosion operations [16] are performed to obtain a candidate disc region.

5. An ellipse fitting based on least squares [17] is applied to the boundaries of the candidate disc region to obtain a fine disc region. The output matrix from the disc localization is defined as Bdisc ∈ I^{251×251}, where pixels inside the disc region have 1 values while those in the non-disc region have 0 values.

Figure 2 (b) shows examples of the intermediate results obtained from the disc localization process.

2.2.3 Cup Localization

For cup localization, a set of operations are sequentially performed on Rc, Gc, and Bc as follows:

1. Vessels are removed using the morphological dilation operations [16] from Rc, Gc, and Bc. The resulting matrices from the vessel removal are defined as Rr ∈ R^{251×251}, Gr ∈ R^{251×251}, and Br ∈ R^{251×251}, respectively.

2. A weighted image Wr ∈ R^{251×251} is generated from Rr, Gr, and Br by setting the weight values as wR = 0.3, wG = 0.5, and wB = 0.2.

3. Element-wise matrix multiplication operations are performed on Wr, Gr, and Gc using Bdisc to exclude the non-disc regions for the cup localization process. The resulting matrices are defined as Wh = Wr ∘ Bdisc, Gh1 = Gr ∘ Bdisc, and Gh2 = Gc ∘ Bdisc, where ∘ denotes the element-wise multiplication operator.

4. A threshold operation is performed on Wh ∈ R^{251×251} using a threshold value θc = 0.9 Ŵ, where Ŵ is the highest intensity value of Wh. We denote the binary matrix resulting from the threshold operation as Bh ∈ I^{251×251}.


Figure 2. Examples of intermediate results: (a) coarse detection of disc region, (b) disc localization, and (c) cup localization.

5. Vessels with low intensity values are localized using a matrix difference operation, Gh1 − Gh2, followed by a threshold operation. The threshold value θv is set as θv = 0.8 J, where J is the intensity value of the brightest pixel in Gh1 − Gh2. We denote the resulting binary matrix as Bv ∈ I^{251×251}.

6. An element-wise OR operation is performed using Bh and Bv. The resulting binary matrix is defined as Bc = Bh ∨ Bv, where ∨ stands for the element-wise OR operator. Subsequently, the morphological dilation operation [16] is also applied to Bc.

7. An ellipse fitting based on least squares is applied to the boundaries of the candidate cup region to obtain a fine cup region. The output matrix from the cup localization is defined as Bcup ∈ I^{251×251}. The elements with 1 values belong to the localized cup region, and those with 0 values belong to the non-cup region.

Figure 2 (c) shows examples of the intermediate results which are acquired from the cup localization process.

2.2.4 CDR Estimation

For estimating the value of the cup-to-disc ratio (CDR), three different measures, namely i) vertical CDR (VCDR), ii) horizontal CDR (HCDR), and iii) area-based CDR (ACDR), are defined as

VCDR = Vcup / Vdisc,    (5)
HCDR = Hcup / Hdisc,    (6)
ACDR = Acup / Adisc,    (7)

where Vdisc, Hdisc, and Adisc respectively denote a maximum value of the vertical disc length, that of the horizontal disc length, and the number of pixels in a disc region. Similarly, Vcup, Hcup, and Acup stand for the corresponding values in a cup region.

2.2.5 ISNTR Estimation

In order to assess the neuro-retinal rim thickness variations, the thicknesses of the inferior, superior, nasal, and temporal rims are calculated from the disc and cup localization results. Subsequently, a ratio of the total length of the nasal and temporal rims over that of the inferior and superior rims (RISNT, ISNTR) is obtained as follows:

RISNT = (LN + LT) / (LI + LS),    (8)


where LI, LS, LN, and LT respectively denote a maximum length of the inferior, superior, nasal, and temporal rims.

2.3 Principal Components Analysis

The principal components analysis (PCA) [11] is adopted for the appearance-based feature extraction. At first, sub-images with the size of 181 × 181 centered at the cup center coordinates are cropped from the red, green, and blue channel images (Rc, Gc, and Bc). Then, the sub-images are resized to 50 × 50 images. For the left eye images, the images are horizontally reversed. Subsequently, a weighted image is generated (wR = 0.2, wG = 0.4, and wB = 0.4) and vectorized. From the vectorized images, a training data matrix Xtr ∈ R^{D×Ntr} is constructed (Ntr = the number of training samples and D = 2500). Here, Xtr concatenates the column vectors of normal eye images and those of glaucomatous eye images sequentially. Similarly, a test data matrix Xte ∈ R^{D×Nte} is constructed (Nte = the number of test samples).

From Xtr, a mean vector m ∈ R^{D×1} is calculated as m = (1/Ntr) Xtr 1, where 1 ∈ N^{Ntr×1} is a vector consisting of 1 values only. Subsequently, a covariance matrix C ∈ R^{D×D} is calculated as follows:

C = (Xtr − m 1^T)(Xtr − m 1^T)^T.    (9)

Then, a subspace S ∈ R^{d×D} is obtained by concatenating d eigenvectors of C corresponding to the d largest eigenvalues in a row-wise manner. We note here that each row vector of S corresponds to an eigenvector. PCA features are extracted by projecting the mean-subtracted training and test matrices onto S as follows:

Ptr = S (Xtr − m 1^T),    (10)
Pte = S (Xte − m 1^T),    (11)

where Ptr ∈ R^{d×Ntr} and Pte ∈ R^{d×Nte} are the resulting training and test feature matrices.

2.4 Feature-level Fusion

The geometric features and the PCA features are fused at the feature-level by feature concatenation. From the four-dimensional geometric features and the d-dimensional PCA features, a feature vector f ∈ R^{M×1} (M = d + 4) is constructed, where the first four elements of f are the geometric features, and the last d elements of f are the PCA features. We note here that the elements of f are represented within [0, 1] based on the min-max normalization [12] using the entire samples. Subsequently, the feature vector f is expanded based on the random projection (RP) [13, 14]. An expanded feature vector g ∈ R^{Drp×1} is defined as g = [σ(r1^T f + b1), ..., σ(rDrp^T f + bDrp)]^T, where rj ∈ R^{M×1} is a random weight vector, and bj is a random bias term for j = 1, ..., Drp. Here, Drp stands for the dimension of an expanded feature vector, and σ(·) denotes a sigmoid activation function.

2.5 Total Error Rate Minimization

The total error rate minimization (TER) [15] is adopted for classification. Firstly, a matrix containing the expanded training feature vectors is defined as follows:

Gtr = [ Gtr^− ; Gtr^+ ],    (12)

where

Gtr^− = [ g1^−, ..., gN⁻^− ]^T,    (13)

and

Gtr^+ = [ g1^+, ..., gN⁺^+ ]^T.    (14)

Here, gi^− ∈ R^{Drp×1} and gj^+ ∈ R^{Drp×1} denote the i-th negative and the j-th positive training samples for i = 1, ..., N^− and for j = 1, ..., N^+. Here, N^− and N^+ stand for the number of negative and positive training samples, respectively.

Next, at a training phase, a weight parameter vector α ∈ R^{Drp×1} is estimated based on the




Figure 3. Sample images of the NMC data set.

TER [15] as follows:

    α = ( bI + (1/N^−) G_tr^{−T} G_tr^− + (1/N^+) G_tr^{+T} G_tr^+ )^{−1}
        × ( ((τ − η)/N^−) G_tr^{−T} 1^− + ((τ + η)/N^+) G_tr^{+T} 1^+ ),    (15)

where 1^− = [1, ..., 1]^T ∈ R^{N^−×1} and 1^+ = [1, ..., 1]^T ∈ R^{N^+×1}. We note that b is a small regularization constant (e.g., 0.0001) and I ∈ R^{D_rp×D_rp} is an identity matrix. In (15), τ and η stand for a preset threshold and an offset value. At the testing phase, the class label y_t of a test sample g_t is estimated as

    y_t = 1 if g_t^T α ≥ τ, and y_t = 0 otherwise.    (16)

3 EXPERIMENTS

In this section, we evaluate the classification performance of the geometric and PCA features before and after fusion. The goal of our experiments is to show whether the feature-level fusion can improve the classification accuracy compared with that before fusion. Firstly, the data set utilized in the experiments is introduced. Next, experimental settings including evaluation protocols and parameter settings are provided. Finally, experimental results and analysis are presented.

3.1 Data Set

In our experiments, an in-house data set collected by the National Medical Center in the Republic of Korea is utilized. Hereafter, the NMC data set will be used to denote the in-house data set. The NMC data set consists of 71 eye fundus images with the size of 1424 × 2144 captured from left and right eyes. It contains 47 non-glaucomatous and 24 glaucomatous images. Figure 3 shows five non-glaucomatous and five glaucomatous images of the NMC data set.

3.2 Experimental Settings

We evaluate the classification performance of the proposed method in terms of the accuracy, which is defined as C/N_te, where C and N_te denote the number of correctly classified test samples and the total number of test samples, respectively. For performance evaluation, stratified ten runs of five-fold cross-validation tests are performed. Additionally, ten different random values are utilized for the RP. Consequently, the classification accuracy is averaged from 500 repetitions.

For the geometric feature extraction, the weight values for the red, green, and blue channel images and the threshold values for 41 images are manually adjusted to obtain the disc and cup regions more accurately. The reduced dimension d for the PCA is set to 60, which is selected from pre-training, and the dimension D_rp of the expanded feature vector by the RP is controlled within {10, 20, ..., 500}. Both τ and η of the TER classifier are set to 0.5, and the regularization constant is set as b = 0.0001. The class-specific normalization parameters are adopted as in [15] (details can be found in [15]). The best class-specific normalization parameter is selected based on training accuracy results and utilized for testing.



Figure 4. Examples of correct disc and cup localization.

3.3 Results

In this section, we provide results and analysis regarding i) the geometric feature extraction and ii) the classification accuracy performances before and after fusion. Figure 4 shows examples of correct disc and cup localization for non-glaucomatous and glaucomatous images. As shown in the figure, glaucomatous images show larger cup regions than those of non-glaucomatous images. Figure 5 shows incorrect localization of disc and cup regions for non-glaucomatous and glaucomatous images. The images in Figure 5 tend to show low contrast between the intensity values of the cup and disc regions. Compared with iterative methods for disc and cup localization [8, 9, 10], the proposed method has a lower computational complexity. However, incorrect disc and cup localization results can be obtained, as shown in Figure 5.

Figure 5. Examples of incorrect disc and cup localization (non-glaucomatous and glaucomatous cases).

Figure 6 shows the estimated CDR values of the entire set of images. As shown in the figure, the vertical and horizontal CDR show better results than the area-based CDR. The mean absolute difference values between the estimated CDRs and the CDR measured by an expert (ophthalmologist) are about 0.13 (vertical CDR), 0.15 (horizontal CDR), and 0.23 (area-based CDR). From these values, it is expected that the classification accuracy of the geometric features is probably degraded due to the localization errors. In our experiments, we aim to improve the classification accuracy of these geometric features by means of the feature-level fusion with appearance-based (PCA) features.

The average test classification accuracy (hereafter, accuracy for short) performances before and after fusion are drawn with respect to the feature dimension D_rp in Figure 7. For the geometric features, accuracy values between about 71.20% and 74.75% are observed over the entire range of feature dimension variations. The PCA features show higher accuracy results (from about 74.50% to 75.27%) than the geometric features for D_rp > 200. Accuracy performance degradation is observed when the dimension of the expanded feature vector by the RP (D_rp) is similar to the original PCA feature dimension (d = 60). After fusing the geometric and PCA features, the accuracy performance improves by 1% to 4% for D_rp ≥ 150 compared with that before fusion. This implies that the geometric and appearance-based (PCA) features provide


Figure 6. CDR values estimated by the proposed method (vertical, horizontal, and area-based) and measured by an expert (ophthalmologist); image indices 1-47 are non-glaucomatous and 48-71 glaucomatous.
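The agreement figures quoted in the text (mean absolute differences of about 0.13, 0.15, and 0.23 between the estimated CDRs and the expert's CDR) correspond to the following computation; a trivial but explicit sketch with illustrative names:

```python
import numpy as np

def mean_absolute_difference(estimated_cdr, expert_cdr):
    """Mean absolute difference between estimated and expert CDR values."""
    estimated_cdr = np.asarray(estimated_cdr, dtype=float)
    expert_cdr = np.asarray(expert_cdr, dtype=float)
    return float(np.mean(np.abs(estimated_cdr - expert_cdr)))
```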

Figure 7. Average test classification accuracy performances of the geometric, PCA (before fusion), and combined features (after fusion) with respect to feature dimension (D_rp) variations.

complementary information for glaucoma diagnosis. The best accuracy performance after fusing the geometric and PCA features is about 78.96%.

4 CONCLUSION

In this paper, we proposed a fusion scheme based on the random projection and the total error rate minimization classifier for automatic glaucoma diagnosis. The proposed method combines the geometric and appearance-based features at the feature level. For the geometric feature extraction, a coarse-to-fine method is proposed for optic disc and cup region localization. In particular, coarse detection of the brightest pixel coordinates is performed by a matrix multiplication which is identical to two-dimensional mean filtering. We adopted principal components analysis for the appearance-based feature extraction. Our experimental results showed that the classification accuracy after fusion outperforms that before fusion. Our future work includes acquiring more fundus images and obtaining more reliable results based on a larger set of images.

ACKNOWLEDGMENTS

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016-31-0650).


REFERENCES

[1] "Facts about glaucoma," accessed: 2017-08-03. [Online]. Available: https://nei.nih.gov/health/glaucoma/glaucoma facts

[2] Y.-C. Tham, X. Li, T. Y. Wong, H. A. Quigley, T. Aung, and C.-Y. Cheng, "Global prevalence of glaucoma and projections of glaucoma burden through 2040: a systematic review and meta-analysis," Ophthalmology, vol. 121, no. 11, pp. 2081-2090, 2014.

[3] "Open-angle glaucoma defined tables," accessed: 2017-08-03. [Online]. Available: https://nei.nih.

[4] "Glaucoma: The silent thief begins to tell its secrets," accessed: 2017-08-03. [Online]. Available: https://nei.nih.gov/news/pressreleases/012114

[5] U. R. Acharya, S. Dua, X. Du, and C. Kuang, "Automated diagnosis of glaucoma using texture and higher order spectra features," IEEE Transactions on Information Technology in Biomedicine, vol. 15, no. 3, pp. 449-455, 2011.

[6] M. M. R. Krishnan and O. Faust, "Automated glaucoma detection using hybrid feature extraction in retinal fundus images," Journal of Mechanics in Medicine and Biology, vol. 13, no. 01, p. 1350011, 2013.

[7] M. A. Ali, T. Hurtut, T. Faucon, and F. Cheriet, "Glaucoma detection based on local binary patterns in fundus photographs," in Proceedings of SPIE Medical Imaging, 2014, p. 903531.

[8] I. Fondon, F. Nunez, M. Tirado, S. Jimenez, P. Alemany, Q. Abbas, C. Serrano, and B. Acha, "Automatic cup-to-disc ratio estimation using active contours and color clustering in fundus images for glaucoma diagnosis," Image Analysis and Recognition, pp. 390-399, 2012.

[9] A. Guerre, J. M. del Rincon, and P. Miller, "Automatic analysis of digital retinal images for glaucoma detection," in Proceedings of the 2014 Irish Machine Vision and Image Processing Conference (IMVIP 2014), 2014.

[10] M. K. Dutta, A. K. Mourya, A. Singh, M. Parthasarathi, R. Burget, and K. Riha, "Glaucoma detection by segmenting the super pixels from fundus colour retinal images," in Proceedings of the 2014 International Conference on Medical Imaging, m-Health and Emerging Communication Systems (MedCom 2014). IEEE, 2014, pp. 86-90.

[11] M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1991 (CVPR 1991). IEEE, 1991, pp. 586-591.

[12] S. Aksoy and R. M. Haralick, "Feature normalization and likelihood-based similarity measures for image retrieval," Pattern Recognition Letters, vol. 22, no. 5, pp. 563-582, 2001.

[13] S. Kaski, "Dimensionality reduction by random mapping: Fast similarity computation for clustering," in Proceedings of the 1998 IEEE International Joint Conference on Neural Networks (IJCNN 1998), vol. 1. IEEE, 1998, pp. 413-418.

[14] W. F. Schmidt, M. A. Kraaijveld, and R. P. Duin, "Feedforward neural networks with random weights," in Proceedings of the 11th IAPR International Conference on Pattern Recognition (ICPR 1992). IEEE, 1992, pp. 1-4.

[15] K.-A. Toh and H.-L. Eng, "Between classification-error approximation and weighted least-squares learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 658-669, 2008.

[16] R. C. Gonzalez and R. E. Woods, Digital Image Processing (Third Edition). Pearson Education International, 2010.

[17] A. Fitzgibbon, M. Pilu, and R. B. Fisher, "Direct least square fitting of ellipses," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 476-480, 1999.

Classification and Data Analysis for Modeling Selected Colon Diseases

Anna Kasperczuk(1) and Agnieszka Dardzinska(2)

Bialystok University of Technology, Department of Biocybernetics and Biomedical Engineering
Wiejska 45a, 15-351 Bialystok, Poland

ABSTRACT

Nowadays, analyzing the vast amount of data, including medical data, whose volume is growing at a rapid rate, becomes a very important task. Therefore, new tools are needed that will be helpful during the process of extracting knowledge, not only the knowledge that is visible and obvious in the data but also the hidden one. This is extremely important in medicine, where the causes and treatment of many diseases are still unknown. Methods are needed that would help physicians make proper decisions, expand the possibilities of proper diagnoses, and contribute to the development of medicine in a modern, automated way. In this paper, a method of constructing a classification model for ulcerative colitis, a disease whose process of formation is not fully known and understood, is presented.

KEYWORDS

Data mining, classification, decision tree, medical database, ID3, J48, k-NN method, colon disease

1 INTRODUCTION

Today we observe a huge development of science, but the efficient analysis of stored data, including medical data, remains an enormous challenge [1]. It generates the need to explore new methods and tools helpful in extracting interesting knowledge from the collected data. The rapid development of data generation and processing technology creates the need to analyze data collections. Data mining technology is the clear answer to the need for advanced and automated analysis of data, databases, and data warehouses. Its main task is to discover non-trivial and previously unknown dependencies and patterns in data. One of the important issues related to data mining and knowledge discovery from databases is the induction of decision trees [2], [3], [4]. In this paper, we show an application of selected classification methods for modeling knowledge extracted from a gastrological information system [5].

Ulcerative colitis is a disease causing long-term inflammation of the colon, creating irritation or ulcers. It can lead to debilitating abdominal pain and potentially life-threatening complications. Unfortunately, there is no single test for ulcerative colitis: specialists have to exclude many other diseases, such as ischemic colitis, irritable bowel syndrome, Crohn's disease, diverticulitis, and colon cancer. An accurate physical examination and an analysis of the history of previous illnesses help to narrow the scope of the study. Therefore, it is extremely important to try to build a classification system that will support doctors' work.

Data mining is a relatively new discipline of learning. It seeks to understand the processes under investigation and the data they generate. When talking about data mining, reference should clearly be made to the analysis of real, large sets of observations, examined to generate results which can be interesting from the point of view of the user's expectations [3]. Knowledge discovery and data mining can be treated as a set of methods and approaches for analyzing observational data sets which uncover unexpected relationships and summarize data in an original way, so that they can be both understandable and useful to their user [6]. One of the popular tasks of data


mining is the classification process [7], [8], [9], [10].

2.1 Classification

Classification is the assignment of objects to appropriate classes based on certain features of these objects. When dividing objects characterized by a variable (qualitative or quantitative), it is necessary to designate certain values of these variables as class limit values, creating a classification scheme.

The simplest classification scheme is a dichotomy, i.e. a simple division of objects into two classes: a class of objects having a given feature and a class that does not have this feature. An example of such a division is the partition of society into adults (here understood as people over the age of 18) and minors. Another is the division into women and men.

Classification is based on finding a way of mapping a data set to predefined classes. Based on the content of the database, a model (such as a decision tree or logical rules) is built. It is then used either to classify new objects in the database or to deepen the understanding of the existing classes. For example, in medical information systems, classification rules describing individual diseases can be extracted from the knowledge base and then applied automatically when diagnosing subsequent patients [4], [11].

2.2 Decision Trees

Decision tree models are the most common form of representation of the knowledge discovered in the data mining process by today's commercially available software. A decision tree can be treated as a form of description of classification knowledge [12].

Compared with other classification methods, decisions with trees can be made very quickly. The primary advantages of using decision trees are a clear and fast representation of knowledge, the ability to use multidimensional data, and the applicability to large data sets. In addition, the accuracy of this method is comparable with the accuracy of other classification methods. On the other hand, the disadvantage of this method is its high sensitivity to missing values of attributes, as there are no open assumptions about the full availability of the information gathered in the database [2]. That is why it is extremely important to prepare the data properly before proceeding to the analysis. We can use the ERID algorithm, which helps to extract knowledge from incomplete information systems [7], [8], [12].

Classification trees are used to determine the affiliation of objects to the qualitative classes of the dependent variable. This can be done by measuring one or more predictive variables. The classification tree represents the process of dividing a set of objects into several homogeneous classes. The division is based on the values of the features of the objects; the leaves correspond to the classes to which the objects belong, while the edges of the tree represent the values of the attributes on which the division was made [13], [14].

Tree nodes are described by the attributes of the explored relationship. The tree branches specify all possible values of the selected attribute, and the tree leaves hold the values of the class attribute. Classification is done by traversing the tree from the root to a leaf through the edges described by attribute values [2], [11], [13], [14].

The algorithm for creating a decision tree can be written as follows [2]:
Step 1: For a given set of objects, check whether they all belong to the same class (if they do, finish the procedure; if not, consider all possible divisions of the given set into the most uniform subsets).
Step 2: Evaluate the quality of each of these divisions according to the previously accepted criterion and select the best one.
Step 3: Divide the set of objects according to the selected division.
Step 4: Repeat the steps for each of the subsets.

For the purposes of this paper, we use a modified C4.5 algorithm. The C4.5 algorithm is one of the two most popular algorithms used in practice. This algorithm is actually an extension of the ID3 algorithm. In this method, we work on an incomplete information system, where using


the containment relation we build a new dataset, which is more complete than the primary one [2], [10].

The C4.5 algorithm recursively passes through all the nodes by selecting a possible division as long as further subdivisions are possible. For qualitative variables, this algorithm by definition creates separate branches for each value of the qualitative attribute. This may lead to greater branching of the tree than is desirable, as some values may be rare and naturally associated with other ones [2].

The general idea of the tree induction algorithm using the modified C4.5 algorithm is as follows:
Step 1: The tree starts with a single node representing the entire training set.
Step 2: If all the examples belong to one decision class, then the examined node becomes a leaf and is labeled with that decision.
Step 3: Otherwise, the algorithm uses the measure of entropy (heuristic function) as the heuristic for selecting the attribute that best divides the set of training examples.
Step 4: For each test result one branch is created and the training examples are appropriately separated into new nodes (subtrees).
Step 5: The algorithm continues in a recursive manner for the whole set of examples.
Step 6: The algorithm ends when the stop criterion is reached.

Because of the high sensitivity of the algorithm to missing data, the k-Nearest Neighbors (k-NN) method can be used. Looking for the single most similar solution, we get the 1-NN algorithm, but sometimes it is better to look for several similar solutions and take what we have usually done in the past. This means that we look for the most similar solutions, then we count how many times we have chosen a specific method, and we choose the most popular solution [15].

Similarity is easily judged by distance calculation: the shorter the distance, the more similar the case [16]. The most common distances are as follows [12], [15], [16]:

1. Euclidean:

    d(x, y) = sqrt( Σ_{i=1..n} (x_i − y_i)² )    (1)

2. Manhattan:

    d(x, y) = Σ_{i=1..n} |x_i − y_i|    (2)

3. Chebyshev:

    d(x, y) = max_i |x_i − y_i|    (3)

In order to improve the performance of the k-NN algorithm, a commonly used technique is standardization or normalization of the data. Its use causes all dimensions over which the distance is calculated to have equal significance; otherwise, some dimensions would be dominated by others. Standardization brings about a situation where the average value of a particular feature is 0 and the standard deviation is equal to 1:

    x_j'(i) = (x_j(i) − mean(x_j)) / std(x_j)    (4)

where:
i - vector index;
j - index of the feature (variable);
mean(x_j) - average value of variable x_j;
std(x_j) - standard deviation of variable x_j.

Normalization brings about a situation where the value of the variable belongs to the range [0, 1]. Normalization is expressed by the formula:

    x_j'(i) = (x_j(i) − min(x_j)) / (max(x_j) − min(x_j))    (5)

where:
i - vector index;
j - index of the feature (variable);
max(x_j) - maximum value of variable x_j;
min(x_j) - minimum value of variable x_j.

The description of the k-NN algorithm, consisting of teaching and testing modules [12], is as follows:

Teaching
1. Choose one of the alternatives: standardize / normalize / leave the data as they are.
2. Remember the entire training set.

Testing
1. Apply the same choice as in teaching: standardize / normalize / leave the test data as they are.
2. Count the distance between the test vector and all the vectors of the training set.
3. Sort the distances from the smallest to the largest.
4. Look at the labels of the k vectors closest to the test vector. Make a histogram of the frequency of each label among the "k nearest" (how many and whose labels were among the nearest ones).
5. Assign the most common label as the test vector label.
6. If there is an impasse (two classes have the same number of votes), solve the problem randomly.

3 EXPERIMENT

Medical data can be mined to derive rules that combine diagnoses with disease symptoms. These rules can be used to automatically classify (detect a disease in) new, previously undiagnosed patients based only on their symptoms. They can also be used to find hidden relationships between the medical condition and the predicates affecting the described medical condition. Medical databases contain patient information registered, among others, during a doctor's visit or hospital stay, as well as diagnostic test results.

The dataset contains clinical information about 152 patients affected by ulcerative colitis. Patients are characterized by 117 attributes and classified into two groups: patients with ulcerative colitis (class 0) and patients with other diseases of the digestive system which are not coexisting diseases for the disease under examination (class 1). Our goal was to build a classification model which helps to reclassify patients into the group of not healthy persons.

First, we use selection methods to obtain the set of attributes that are most strongly associated with the classifier (dependent variable). The selected attributes are the following: age, smoke_now, blood_feces, nu_blood_feces, eosinophils, AlAT, sodium, potassium, stricture_crohn_disease. The number of quantitative predictors is five and the number of qualitative predictors is four. Then we use the k-NN method to complete missing data.

To find the best classifier we should pay attention to the following parameters we receive in the output [17]:
- TP Rate - rate of the instances correctly classified as a given class;
- FP Rate - rate of the instances falsely classified as a given class;
- Precision - proportion of instances that are truly of a class divided by the total instances classified as that class;
- Recall - proportion of instances classified as a given class divided by the actual total in that class;
- F-Measure - indicator of the overall quality of the model;
- ROC Area - the accuracy of the test depends on how well the test separates the group being tested into those with and without the disease in question;
- Kappa Statistic - a measure of conformity between the proposed allocation of instances to classes and the actual one, which informs about the overall accuracy of the model;
- Number of correctly classified instances.

Table 1. Detailed Accuracy by Class

Class       0       1
TP Rate     0.895   0.778
FP Rate     0.212   0.105
Precision   0.846   0.852
Recall      0.895   0.788
F-Measure   0.87    0.819

Table 1 shows a set of quality measures of the J48 model. For the class characterized by ulcerative colitis, the true positive rate is high (90%), which is a satisfactory result. The false positive rate is lower (21%), which may indicate a sufficient quality of the generated model. The F-Measure, which estimates the overall quality of the model, has a high value

(87%) and indicates a properly constructed classification model.

Table 2. Statistics for the J48 model

Factor                              Value
Correctly classified instances      84.87%
Incorrectly classified instances    15.13%
Kappa Statistic                     0.69

The percentage of instances correctly classified by the decision tree is equal to 84.87%. It is a correct result and indicates the good quality of the generated decision tree. The Kappa statistic is relatively low, which means that there are close to 15% of the observations with which the classifier failed to cope.

Generated decision tree J48:

blood_feces = N
| AlAT <= 13: 1 (22.0)
| AlAT > 13
| | stricture_Crohn_disease = 0
| | | sodium <= 133: 1 (4.0)
| | | sodium > 133
| | | | AlAT <= 24
| | | | | age <= 20: 0 (6.0)
| | | | | age > 20
| | | | | | AlAT <= 19
| | | | | | | sodium <= 138
| | | | | | | | eosinophils <= 1.3: 1 (10.0)
| | | | | | | | eosinophils > 1.3
| | | | | | | | | sodium <= 137: 0 (2.0)
| | | | | | | | | sodium > 137
| | | | | | | | | | age <= 30: 0 (4.0)
| | | | | | | | | | age > 30: 1 (6.0)
| | | | | | | sodium > 138: 0 (4.0)
| | | | | | AlAT > 19: 1 (10.0)
| | | | AlAT > 24
| | | | | age <= 19: 1 (2.0)
| | | | | age > 19: 0 (20.0)
| | stricture_Crohn_disease = 1: 1 (2.0)
| | stricture_Crohn_disease = 2: 0 (0.0)
| | stricture_Crohn_disease = 3: 0 (0.0)
blood_feces = Y
| stricture_Crohn_disease = 0
| | age <= 35
| | | sodium <= 136: 1 (2.0)
| | | sodium > 136
| | | | AlAT <= 7: 1 (2.0)
| | | | AlAT > 7: 0 (28.0/4.0)
| | age > 35: 0 (26.0)
| stricture_Crohn_disease = 1: 1 (2.0)
| stricture_Crohn_disease = 2: 0 (0.0)
| stricture_Crohn_disease = 3: 0 (0.0)

The ROC curve is one of the ways to visualize the quality of the classification, showing the relationship between the TPR (True Positive Rate) and the FPR (False Positive Rate). The more convex the graph, the better the classifier [18].

Table 3. Confusion matrix

Predicted        Class_positive   Class_negative
Class_positive   TP               FN
Class_negative   FP               TN

    TPR = TP / (TP + FN)    (6)

    FPR = 1 − specificity = FP / (FP + TN)    (7)

How the ROC curve is obtained:
- We calculate the value of the decision function.
- We test the classifier for different alpha thresholds. Let us recall, alpha is the threshold of the estimated probability above which the observation is classified into one category (Class_positive) and below which into the second category (Class_negative).
- From each classification carried out at the established alpha threshold, we obtain a pair (TPR, FPR), which is a single point of the ROC curve.
- Each classification, carried out at the established alpha threshold, corresponds to a certain error (confusion) matrix.

The quality of the classification using the ROC curve can be estimated by calculating the area

under the curve (AUC). A larger value of the AUC indicates a better model: AUC = 1 (ideal classifier), AUC = 0.5 (random classifier), AUC < 0.5 (invalid classifier, worse than random) [18], [19].

Table 4. Confusion matrix for the J48 algorithm

a    b    Classified as
77   9    a = 0
14   52   b = 1

In our experiment we obtained a ROC curve which indicates a very good classifier. The AUC for the whole model is equal to 0.92.
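Equations (6) and (7) and the confusion matrix in Table 4 can be tied together directly; a small sketch (treating class 0, row a, as the positive class, which is how Table 4 is laid out):

```python
def roc_point(tp, fn, fp, tn):
    """(TPR, FPR) for one confusion matrix, eqs. (6) and (7)."""
    return tp / (tp + fn), fp / (fp + tn)

def accuracy(tp, fn, fp, tn):
    """Fraction of correctly classified instances."""
    return (tp + tn) / (tp + fn + fp + tn)
```

Plugging in the J48 confusion matrix from Table 4 (TP = 77, FN = 9, FP = 14, TN = 52) gives TPR ≈ 0.895 and FPR ≈ 0.212, matching the class-0 column of Table 1, and an accuracy of about 84.87%, matching Table 2.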
5 CONCLUSIONS

Classification methods are very useful in modern medicine. They are very helpful in finding new symptoms and patient treatment methods. In this paper, we built a classification model for the dependent variable. It becomes important to find the symptoms that affect whether the patient is ill or not. In this work, we used the J48 method for a classification task. The decision tree algorithm shows which attributes have the greatest impact on ulcerative colitis or are most linked to it.

ACKNOWLEDGMENTS

Research was performed as a part of project MB/WM/8/2016 and financed with the use of funds for science of MNiSW.

REFERENCES

[1] I. Yoo, P. Alafaireet and M. Marinov, "Data mining in healthcare and biomedicine: a survey of the literature," Journal of Medical Systems, vol. 36, no. 4, pp. 2431-2448, 2012.

[2] L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, Classification and Regression Trees, Wadsworth International Group, Belmont, 1984.

[3] A. Dardzinska, Action Rules Mining, Springer, 2013, p. 90.

[4] W. Frawley, G. Piatetsky-Shapiro and C. Matheus, "Knowledge discovery in databases: an overview," Knowledge Discovery in Databases, pp. 1-27, 1991.

[5] P. S. Levy and K. Stolte, "Statistical methods in public health and epidemiology: a look at the recent past and projections for the next decade," Statistical Methods in Medical Research, vol. 9, pp. 41-55, 2000.

[6] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, Second Edition, 2006, pp. 21-27.

[7] Z. Ras and A. Dardzinska, "On Rule Discovery from Incomplete Information Systems," in Proceedings of the ICDM'03 Workshop on Foundations and New Directions of Data Mining (Eds: T. Y. Lin, X. Hu, S. Ohsuga, C. Liau), Melbourne, Florida, IEEE Computer Society, pp. 31-35, 2003.

[8] Z. Ras and A. Dardzinska, "Rule-based Chase algorithm for partially incomplete information systems," in Proceedings of the Second International Workshop on Active Mining (AM'2003), Maebashi City, Japan, pp. 42-51, October 2003.

[9] Z. Ras and A. Dardzinska, "On Rule Discovery from Incomplete Information Systems," in Proceedings of the ICDM'03 Workshop on Foundations and New Directions of Data Mining (Eds: T. Y. Lin, X. Hu, S. Ohsuga, C. Liau), Melbourne, Florida, IEEE Computer Society, pp. 31-35, 2003.

[10] Z. Ras and A. Dardzinska, "Rule-based Chase algorithm for partially incomplete information systems," in Proceedings of the Second International Workshop on Active Mining (AM'2003), Maebashi City, Japan, pp. 42-51, October 2003.

[11] J. Deogun, V. Raghavan and H. Sever, "Rough set based classification methods and extended decision tables," in International Workshop on Rough Sets and Soft Computing, pp. 302-309, 1994.

[12] J. A. Swets, R. M. Dawes and J. Monahan, "Better decisions through science," Scientific American, pp. 82-87, October 2000.

[13] Y. Freund and L. Mason, "The alternating decision tree algorithm," in Proceedings of the 16th International Conference on Machine Learning, pp. 124-133, 1999.

[14] W. Frawley, G. Piatetsky-Shapiro and C. Matheus, "Knowledge discovery in databases: an overview," Knowledge Discovery in Databases, pp. 1-27, 1991.

[15] W. Wei, E. P. Xing, C. Myers, I. S. Mian and M. J. Bissel, "Evaluation of normalization methods for cDNA microarray data by k-NN classification," BMC Bioinformatics, 2004.

[16] T. Liu, A. Moore and A. Gray, "Efficient Exact k-NN and Nonparametric Classification in High Dimensions," in Advances in Neural Information Processing Systems 16 (NIPS), pp. 8-13, December.

[17] A. Kasperczuk and A. Dardzinska, "Comparative Evaluation of the Different Data Mining Techniques Used for the Medical Database," Acta Mechanica et Automatica, vol. 10, no. 3, pp. 233-238, 2016.

[18] J. A. Hanley, "Receiver operating characteristic (ROC) methodology: the state of the art," Critical Reviews in Diagnostic Imaging, vol. 29, no. 3, pp. 307-335, 1989.

[19] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, pp. 29-36, 1982.

ISBN: 978-1-941968-43-72017 SDIWC 91

Proceedings of The Fourth International Conference on Artificial Intelligence and Pattern Recognition (AIPR2017), Lodz, Poland, 2017

Study on Route Setting and Movement Based on 3D Map Generation by Robot in Hydroponics Managing System

Koji Mukai, Kazuo Ikeshiro and Hiroki Imamura

Department of Information Systems Science, Graduate School of Engineering, Soka University
Mailing Address: 1-236, Tangi-machi, Hachioji-Shi, Tokyo, Japan 192-8577
E-mail: e17m5226@soka-u.jp, ikeshiro@soka.ac.jp, imamura@soka.ac.jp

ABSTRACT

We propose an efficient hydroponics management system in which farmers (users) can cultivate their fields by operating a mobile robot from remote locations. This system can reflect users' requests, and can thus provide highly efficient crop management and high-quality crops. However, in order to put this system into practice, we need to develop a robot that can move in the field under the user's remote control. In previous research, the robot created a 3D map using RGB information. However, with this function, misalignment occurs during alignment when similar RGB information is contained in the image taken by the RGB camera, making it difficult to create a 3D map. Therefore, in this research, we generate maps using ArUco markers to enable map generation even in places with similar RGB information.

KEYWORDS

hydroponics, robot, ArUco marker, 3D map generation, hydroponics managing system.

1. INTRODUCTION

In recent years, hydroponic culture has been attracting attention in the agricultural field. One of the reasons is that a culture solution is used instead of soil, as shown in Fig 1. In addition, as shown in Fig 2, it is possible to cultivate crops in a multistage manner, so the yield per unit area is high. In cultivation methods using soil, crops may suffer disease or insect damage due to soil bacteria. On the other hand, hydroponic culture is not affected by pests and soil because it uses a culture medium. Hydroponic cultivation methods are used in plant factories. A plant factory is a cultivation facility capable of stable crop management throughout the year, through cultivation that controls the light, temperature and humidity, nutrients, and moisture necessary for plant growth. Hydroponic cultivation in a plant factory has the following merits.

Crops can be grown all year round without being affected by external factors such as insects and weather.
Since crops are cultivated indoors, the introduction of environmental sensors and robots is easy.

Because of these advantages, hydroponic culture management systems using machines equipped with environmental sensors, such as temperature and humidity sensors, have been proposed. Machine management creates breeding patterns according to crops and manages crop plants using environmental sensors, so it is not necessary to hire many workers and labor costs can be reduced. However, since good-quality crop production requires delicate management of environmental conditions, it is still difficult to cultivate good-quality crops with a fully automated machine. Therefore, our laboratory, performing delicate crop management that uses the experience of the farmer (user) as shown in Fig 3 to cultivate good-quality crops, has proposed a hydroponic cultivation managing system that places robots on multiple farms so that one person can manage multiple farms [1]-[4]. With this system, users can manage environmental conditions and operate robots from remote locations, thereby making it possible to perform tasks that require the user's experience even on remote farms. The two modes of this system are shown below.

Automatic management mode: robots check the growing state of farm crops automatically.
Remote control mode: a farmer works from a distant place using robots.

In the automatic management mode, when there is no instruction from the user, the farm temperature, humidity, light amount and the like are automatically managed using a sensor mounted on the robot, and management is


performed according to the growing state of the crop.

In the remote control mode, Augmented Reality (AR), as shown in Fig 4, is used in consideration of user operability. AR is a technique to add information created by digital synthesis or the like to real information perceived by a person. In addition, as shown in Fig 5, a head mounted display (HMD) is used. By using AR and HMD, the user can look into the farm displayed by AR in the real space and monitor it by moving the robot, and it is possible to operate the robot intuitively and remotely by gripping objects. In the proposed system, it is thought that the robot can manage a large farm alone, because the robot automatically performs many tasks. In addition, since AR can display information corresponding to markers, it is possible to monitor a plurality of farms at the same time, and improvement of production efficiency is expected.

In a previous research, Ide [1] proposed an estimation method for the self-position of the robot based on the surrounding RGB information, and developed a 3D map generation function that uses the self-position obtained with SLAM (Simultaneous Localization and Mapping) to realize routing of the robot in the farm in the remote control mode. However, with this function, misalignment occurs during alignment when similar RGB information is contained in the image taken by the RGB camera, making it difficult to create a 3D map. Therefore, in this research, we generate maps using ArUco markers [5] to enable map generation even in places with similar RGB information.

2. PROPOSED METHOD

The appearance of this research system is shown in Fig 6. Multiple markers are placed on the farm where the map is to be created. The robot consecutively recognizes the markers, compares each recognized scene with the previously recognized scene, and, when the same marker is recognized, acquires the information of the recognized marker, the RGB information, and the depth information. Point cloud data are created from the acquired RGB information and depth information, and a map is created using the marker information.

2.1 Hardware configuration

The configuration of the hardware of the robot used in this research is shown in Fig 7. The size of the robot is 640 mm in length, 900 mm in width and 900 mm in height. In addition, a Kinect sensor (RGB + D) and a PC are installed on the robot. As shown in Fig 8, the Kinect sensor is equipped with an RGB camera and a depth camera, and by combining these cameras it is possible to acquire three-dimensional information.

2.2 Processing procedure

As shown in Fig 9, the Kinect should always be able to recognize two or more of the markers installed on the farm. We explain the whole process according to Fig 10.

2.2.1 Obtaining images

We use the Kinect's RGB camera and depth camera to create point cloud data. When two or more markers are recognized, RGB information and depth information are acquired from the Kinect.

2.2.2 Marker recognition

The robot detects and recognizes markers based on the information acquired from the Kinect's RGB camera. Since the map is generated considering the angle and position of each marker, the robot acquires the rotation matrix, the translation vector, the marker ID, and the two-dimensional coordinates of the four corners and the center of the marker.

2.2.3 Creating point cloud data

We create 3D point cloud data using the RGB information and depth information obtained from the Kinect.

2.2.4 Map generation

As an example, suppose that there are scene A created at point A and scene B created at point B.

Comparison of marker IDs for each scene

When generating the map, it is necessary to match the same data between the point cloud data of each scene. Therefore, the robot searches for the


same marker ID in consecutive scenes; if the same markers are found, the robot calculates the rotation matrix and translation vector between the scenes.

Calculation of rotation matrix and translation vector between scenes

First, we find the rotation matrix R. Fig 11 shows how to determine the rotation matrix. Let n_m be the normal vector of the marker, n_A the normal vector of the camera of scene A, and n_B the normal vector of the camera of scene B. We denote by R_A the rotation matrix converting normal vector n_A to normal vector n_m. Similarly, we denote by R_B the rotation matrix converting normal vector n_m to normal vector n_B. Since the rotation matrix R is the rotation matrix from scene A to scene B, it is expressed by the following expression.

R = R_B R_A    (1)

Next, we calculate the translation vector. Since the translation vector is the same as the movement amount of the Kinect, it can be obtained from the difference between the center coordinates of the markers in each scene. It is necessary to equalize the angle of the markers between the scenes; this can be done using the rotation matrix R obtained earlier. Let C_A be the center coordinate of the marker of scene A, and C_A' the center coordinate of the marker of scene A whose rotation is matched with the marker of scene B. The center coordinate C_A' can be obtained from the following expression.

C_A' = R C_A    (2)

We assume that the center coordinate of the marker of scene B is C_B. Using the result of equation (2), the translation vector T can be calculated by the following equation.

T = C_B - C_A'    (3)

Map construction

Using the results of equation (1) and equation (3), the robot aligns the positions of the point cloud of scene A with the position of the point cloud of scene B. Assuming that the coordinates of the point cloud of scene A are P, and the coordinates of the point cloud of scene A matched to the position of scene B are P', P' is expressed by the following equation.

P' = R P + T    (4)

Every time this map generation process is performed, the point cloud data is acquired to create a map. The created map is shown in Fig 12.

3. EVALUATION EXPERIMENT

In this experiment, we evaluate the accuracy of the map by comparing the map generation function created by Ide in the previous study with the map generation function created in this research. We made the map of a room in the same place and moved the robot manually. For the map generation function of the previous study, as shown in Fig 13, posters carrying RGB information were pasted in places where RGB information is scarce, and a map was created. In this study, ArUco markers were pasted as shown in Fig 14, and map creation was done. We created a map of the same room and asked 10 subjects to evaluate it on a 5-point scale. The evaluation items are shown below. Figures 15 and 16 show the resulting maps.

[1] Is the map more reproducible compared with the map generation function of the previous research?
[2] Does the created map reproduce the corners of the room compared to the actual room?
[3] Are the walls reproduced in the created map compared with the actual room?

The evaluation results are shown in Table 1.

Table 1: Result of evaluation experiment
Evaluation item | Average | Standard deviation
[1] | 4.9 | 0.30
[2] | 4.6 | 0.49
[3] | 4.4 | 0.49

Because the value of [1] is high, we can see that the map creation of the proposed method has better room reproducibility than the map generation function of the previous research. In addition, since the average values of items [2] and [3] are also high, the corners of the room and the walls are highly reproducible, so the usefulness as a map is high.
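The scene alignment described by equations (1)-(4) can be sketched in numpy. The `rotation_between` helper and all normal vectors, marker centres, and the ground-truth motion below are illustrative assumptions, not values measured by the robot; the sketch only demonstrates the algebra of the alignment step:

```python
import numpy as np

def rotation_between(a, b):
    # Rotation matrix taking unit vector a onto unit vector b (Rodrigues formula).
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, 1.0):            # vectors already aligned
        return np.eye(3)
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + K @ K * (1.0 / (1.0 + c))

# Illustrative normals: the marker's normal and the camera normals of scenes A and B.
n_m = np.array([0.0, 0.0, 1.0])
n_A = np.array([0.0, 0.3, 0.9539392])
n_B = np.array([0.3, 0.0, 0.9539392])

R_A = rotation_between(n_A, n_m)      # scene-A camera attitude -> marker
R_B = rotation_between(n_m, n_B)      # marker -> scene-B camera attitude
R = R_B @ R_A                         # eq. (1): rotation from scene A to scene B

# Marker centre as seen in scene A, and (synthetically) in scene B.
C_A = np.array([0.2, 0.1, 1.5])
T_true = np.array([0.05, -0.02, 0.10])   # illustrative ground-truth Kinect motion
C_B = R @ C_A + T_true

C_A_rot = R @ C_A                     # eq. (2): equalize the marker's angle
T = C_B - C_A_rot                     # eq. (3): translation between the scenes

# eq. (4): align the whole scene-A point cloud with scene B.
P = np.random.default_rng(0).uniform(-1.0, 1.0, size=(100, 3))
P_aligned = (R @ P.T).T + T
```

Because T is computed as the difference of marker centres after the rotation has been equalized, any error in R propagates directly into T; this is why the paper stresses that two or more markers must stay visible.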


REFERENCES

[1] Sinitiro Ide, Hiroki Imamura, "A moving search method of a mobile robot for a hydroponic cultivation managing system in remote areas", Automatic control union conference lecture paper collection, The 57th Automatic Control Association Lecture, pp. 516-521, 2014.
[2] Shun Kodema, Hiroki Imamura, "The construction of a new system related to hydroponic culture based on AR", Technical report of Institute of Image Information and Television Engineers, vol. 35, no. 47, pp. 21-24, 2011.
[3] Akifumi Siozi, Hiroki Imamura, "Construction of managing system for hydroponic cultivation based on AR", Technical report of Institute of Electronics, Information and Communication Engineers, vol. 114, no. 239, pp. 83-87, 2014.
[4] Fumiya Iwasaki, Hiroki Imamura, "A Robust Recognition Method for Occlusion of Mini Tomatoes based on Hue Information and Shape of Edge", SICE Article, pp. 516-521, 2014.
[5] OpenCV: Detection of ArUco Markers, http://docs.opencv.org/3.1

Figure 1: Cultivation using culture solution
Figure 2: Crop cultivation using multistage system
Figure 3: hydroponics managing system
Figure 4: Augmented Reality
Figure 5: head mounted display


Figure 6: Outline drawing of system
Figure 7: Hardware configuration of the robot
Figure 8: Construction of Kinect
Figure 9: Arrangement of markers
Figure 10: the flow chart of this system
Figure 11: Description of rotation matrix
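The point-cloud creation step (sections 2.2.1 and 2.2.3) combines the Kinect's registered RGB and depth images through a pinhole back-projection. A minimal numpy sketch follows; the intrinsics (fx, fy, cx, cy) and the flat synthetic images are illustrative stand-ins, not the Kinect's calibrated values:

```python
import numpy as np

# Illustrative pinhole intrinsics (placeholders, not calibrated Kinect values).
fx, fy = 525.0, 525.0          # focal lengths in pixels
cx, cy = 319.5, 239.5          # principal point
H, W = 480, 640                # depth image resolution

depth = np.full((H, W), 1.5)               # depth image in metres (stand-in)
rgb = np.zeros((H, W, 3), dtype=np.uint8)  # registered RGB image (stand-in)

# Back-project every pixel (u, v) with depth Z to a 3D point (X, Y, Z).
u, v = np.meshgrid(np.arange(W), np.arange(H))
Z = depth
X = (u - cx) * Z / fx
Y = (v - cy) * Z / fy

# One XYZ + RGB point per pixel, as used for the per-scene point clouds.
points = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
colors = rgb.reshape(-1, 3)
```

Each scene's cloud built this way lives in that scene's camera frame, which is why the alignment of equations (1)-(4) is needed before clouds from different scenes can be merged into one map.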


Figure 12: Map generation result of proposed method
Figure 13: Experimental environment of previous research method
Figure 14: Experimental environment of the proposed method
Figure 15: Map of previous research
Figure 16: Map of the proposed method


Construction of Audio Corpus of Non-Native English Dialects -Arab speakers-

Sara Chellali1, Soumaya Al-Maadeed2, Muhammad Asim2, Maamar Ahfir3, Walid Khald Hidouci1
1 Laboratory LCSI, National School of Computer Science, ESI, Algiers, Algeria
2 Dep. of Computer Science and Engineering, College of Engineering, Qatar University, Doha, Qatar
3 Dep. of Computer Science, University Ammar Telidji, Laghouat, Algeria
sa_chellali@esi.dz, s_alali@qu.edu.qa, muhammad.asim@qu.edu.qa, m.ahfir@mail.lagh-univ.dz,

ABSTRACT

Speech recognition for academic and standard languages has achieved great development, but in the face of multiple dialects and their differences, there is still a noticeable lack in recognition rates, especially if these dialects belong to non-native speakers of the target language. The challenges facing the recognition or identification of non-native dialects are numerous, among them the lack of sound databases, whether approved or not.

This article presents part of our work on creating a sound database of non-native English dialects for Arab speakers. This database will be used later by our Automatic System to Help Learning English as Foreign Language (ASHLEFL). It initially contains three non-native English dialects, of Qatari, Egyptian, and Pakistani speakers respectively.

KEYWORDS

Non-native English Dialect; Sound Database; Arab speakers

1. INTRODUCTION

This article is part of a project to develop an automatic system to help learn English as a foreign language. The role of this system is to detect, then to correct, the common pronunciation errors of the selected populations, where we begin with the identification of their dialects. The objective of this paper is the presentation of the construction of an audio database of non-native English dialects of these populations.

The Arab world is the Middle East and North Africa (MENA), divided into 22 countries, 10 of which are African. The Arab region has an area of about 14 million km2, equivalent to 10.2% of the world's area, and contains about 6% of the world population. Arabic is the official language of these countries.

The status of the English language in the Arab world differs between a Second Language (ESL1) and a Foreign Language (EFL2).

In its report of 2016, the organization EF "Education First", a language training, educational travel, academic certification and cultural exchange program [1], revealed that MENA countries have the lowest level of English proficiency, with an average EF EPI3 of 44.92 (72 countries were involved in the tests, including 10 Arab countries). The Arab countries were ranked in the "very low" proficiency group, with the exception of the United Arab Emirates and Morocco, which were classified as "low", as shown in Figure 1.

1 English Second Language
2 English Foreign Language
3 English Proficiency Index

This weakness is attributed to several reasons: historical, religious, cultural, economic, educational, and sometimes political, without forgetting the great difference between the Arabic and English


language systems, which greatly affects the pace of learning English, whether as ESL or EFL [2] and [3]. Improving the proficiency of Arabs in mastering the English language requires the development of the educational system and attention to language learning from the first levels of education.

Figure.1 Regional classification according to the EF EPI [1]

3. PROBLEMATIC

The need for audio databases in the domain of Natural Language Processing (language identification, speech synthesis, speaker identification, speech recognition...) is unavoidable. So, the first step researchers take is either to obtain a database or to create it. Despite three decades of work in this area, the problem of the availability of databases for academic languages as well as their native dialects remains persistent. As for non-native dialects, this is another challenge.

English has taken the lion's share of interest and research as the most widely spoken language (more than 500 million speakers). [4] presents some of the native English dialect databases that were created, as follows: ANDOSL4, SEC5, WSJCAM06, TIMIT7. TIMIT is the most famous and widely used speech database. It contains 630 native speakers of American English (70% male and 30% female). Each speaker reads 10 sentences, taking approximately 30 seconds. [5] provides an overview of existing databases of non-native speakers and notes the total absence of Arab speakers of English (Table 1). We note that free or paid databases are still very limited, especially for non-native English dialects in the Arab world, which sometimes obliges laboratories to create their own database internally. So, to approach this project we need to create our own audio database of non-native English dialects by Arab speakers.

4 Australian National Database of Spoken Language
5 British English, Spoken English Corpus
6 British English speech corpus
7 American English, Texas Instrument and Massachusetts Institute of Technology corpus

Table 1 Overview of non-native English Databases [5]

Corpus & date | Available at | Speak. | Native Language | Utter.
ATR-Gruhn 2004 | ATR | 96 | Chinese, German, French, Japanese, Indonesian | 15000
Berkeley Restaurant 1994 | ICSI | 55 | German, Italian, Hebrew, Chinese, French, Spanish | 2500
Cambridge-Witt 1999 | U. Cambridge | 10 | Japanese, Italian, Korean, Spanish | 1200
Cambridge-Ye 2005 | U. Cambridge | 20 | Chinese | 1600


Children 2000 | CMU | 62 | Japanese, Chinese | 7500
CLSU 2007 | LDC | | 22 countries | 5000
CMU | CMU | 64 | German | 452
Cross Towns 2006 | U. Bochum | 161 | English, French, German, Italian, Spanish | 72000
Duke-Arslan | Duke University | 93 | 15 countries | 2200
ERJ 2002 | U. Tokyo | 200 | Japanese | 68000
Fraenki | U. Erlangen | 19 | German | 2148
Hispanic | | 22 | Spanish |
IBM-Fischer 2002 | IBM | 40 | Spanish, French, German, Italian | 2000
ISLE 2000 | EU/ELDA | 46 | German, Italian | 4000
MIST 1996 | ELRA | 75 | Dutch | 2200
NATO HIWIRE 2007 | NATO | 81 | French, Greek, Italian, Spanish | 8100
NATO M-ATC 2007 | NATO | 622 | French, German, Italian, Spanish | 9833
PF-STAR 2005 | U. Erlangen | 57 | German | 4627
Sunstar 1992 | EU | 100 | German, Spanish, Italian, Portuguese, Danish | 40000
TC-STAR 2006 | ELDA | unknown | EU countries |
Verbmobil | U. Munich | 44 | German |

4. CONSTRUCTION OF THE CORPUS

Qatar is a peninsula surrounded by the Arabian Gulf, located to the east of the Arabian Peninsula, bordered by Saudi Arabia, with maritime borders with both the UAE and Bahrain. It covers an area of 11,437 square kilometers. It contains a mixture of ethnicities with the following distribution: 40% Arabs, 18% Indians, 18% Pakistanis, 10% Iranians, 14% others.

A. Workplace

Audio samples were collected at Qatar University in Doha, Qatar, where audio recordings were made in different places on the campus.

In the collection of samples, we took into consideration all the places in which a learner could be present, and the presence of external influences. In order to obtain pure and clean recordings, we recorded in the Virtual Reality Laboratory of the Department of Computer Science, College of Engineering. Recordings were also made at the General Library of the University, to introduce reverberation and echo elements into the phonograms, as well as outdoors, to obtain a natural environment containing various types of noise (sounds of nature, voices of other people...).

The audio recordings included various levels, spread between teachers, staff, students, and researchers (09 female and 25 male).

B. Dialects concerned in sample collection

The Qatari and Egyptian speakers (whose native language is Arabic) were chosen along with the Pakistani speakers (for whom Arabic is a religious language: Pakistani Muslims represent about 95% of Pakistanis).

We begin with these three non-native English dialects, namely the Qatari, the Egyptian and the Pakistani (as a neutral dialect for comparison). The database would then be expanded to include the rest of the non-native English dialects of Arab speakers.

5. CREATE THE AUDIO DATABASE

A. Processing of Voice Data collected


The voice recording was performed over several sessions in a real acoustic environment using a Sony Dictaphone.

We asked each speaker to read, five times in one session, the first ten numbers, twelve isolated words divided into five groups, five short sentences, and a paragraph of 69 words. It was not a spontaneous reading but one prepared by each speaker.

The length of the recordings ranged from 2 to 6 minutes (some speakers did not respect the number of repetitions of reading). After we received all the recordings, we processed them. We started by cutting them into recordings of length ranging from one second to 29 seconds, using the free demo program Power MP3 Cutter Professional Version 6.2 (Figure 2). Each record contains one utterance: the ten numbers, one group of isolated words, one sentence, or the paragraph.

Figure.2 Segmentation of audio recording with Power MP3 Cutter Pro

After the recordings of two speakers (one Qatari speaker and one Egyptian speaker) were canceled because they could not read in English, and some utterances were deleted because of bad recording, we got a corpus comprised of 1535 audio utterances from 34 people.

TABLE 2 DIVIDE FOR SPEAKERS

| Female | Male | Total
Egypt | 04 | 03 | 07
Pakistani | 02 | 13 | 15
Qatari | 03 | 09 | 12
Total | 09 | 25 | 34

B. Codification

After cutting the recordings into audio files, we moved to the coding process, where we gave each audio file a nine-digit code with information about the country, city, speaker, and utterance number.

Figure.3 Code of recording sound

Figure.4 Code of country-dialect

Code of country: we need a 2-digit code to encode the 22 countries of the Arab world, e.g.: 01: Egypt, 02: Pakistani, 03: Qatar.
Code of dialect: we find several dialects in the same Arab country, so we gave a different code to each dialect, e.g.: 00 (not identified), 01 (dialect of Cairo), 02 (dialect of Alexandria).

Example:

0 1 0 2 0 2 5 1 0

The utterance n10 of speaker n25, who speaks the dialect of Alexandria (Egypt).


C. Text and the phonetic transcription

In the choice of text, we tried to take into account the above characteristics. The text:

i. Digits: One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten
/wʌn/, /tu/, /θri/, /fɔr/, /fajv/, /sɪks/, /sɛvn/, /et/, /najn/, /tɛn/
ii. Isolated words:
1. Public, Jupiter, Parking, Pepsi, Pakistan
/pʌblɪk/, /dʒupɪtər/, /parkɪŋ/, /pɛpsi/,
2. Victoria, Vacation, Vegetable, Verb, Video
/vɪktɔriə/, /vekeʃən/, /vɛdʒtəbl/, /vɜrb/, /vɪdio/
3. Kick, Key, Keyboard, King, Kite
/kɪk/, /ki/, /kibɔrd/, /kɪŋ/, /kajt/
4. Jupiter, Jug, Jumping, John, Jogging
/dʒupɪtər/, /dʒʌg/, /dʒʌmpɪŋ/, /dʒan/,
iii. Prepared sentences:
1. I like to eat vegetables
/aj lajk tu it vɛdʒtəblz/
2. Victoria is jumping
/vɪktɔriə ɪz dʒʌmpɪŋ/
3. John is drinking Pepsi
/dʒan ɪz drɪŋkɪŋ pɛpsi/
4. The car is clean
/ðə kar ɪz klin/
iv. Paragraph:
For the selected paragraph, the same paragraph used on the website "the speech accent archive" has been adopted [6]:

Please call Stella. Ask her to bring these things with her from the store: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob. We also need a small plastic snake and a big toy frog for the kids. She can scoop these things into three red bags, and we will go meet her Wednesday at the train station.

/pliz kɔl stɛlə æsk hər tu brɪŋ ðiz θɪŋz wɪθ hər frʌm ðə stɔr sɪks spunz ʌv frɛʃ sno piz fajv θɪk slæbz ʌv blu tʃiz ænd mebi e snæk fɔr hər brʌðər bab wi ɔlso nid e smɔl plæstɪk snek ænd e bɪg tɔj frag fɔr ðə kɪdz ʃi kæn skup ðiz θɪŋz ɪntu θri rɛd bægz ænd wi wɪl go mit hər wɛnzdi æt ðə tren steʃən/.

6. CONCLUSION

The database created is a gain, in the sense that there exists no database of non-native English dialects for Arabic speakers (to be completed). However, it is only the first step in our work. The next step is to conduct experiments on human experts to validate the hypothesis of their ability to identify non-native dialects.

ACKNOWLEDGMENT

This paper was made possible by a QUCP award [QUCP-CENG-CSE-15-16-1] from Qatar University. The statements made herein are solely the responsibility of the authors.

REFERENCES

[1] http://www.ef.dz/epi/
[2] S.S. Sabbah, "Negative Transfer: Arabic Language Interference to Learning English", English Language Center, Community College of Qatar, Arab World English Journal (AWEJ) Special Issue on Translation No. 4, May 2015, pp. 269-288.
[3] M. Elkhair, H. Idriss, "Pronunciation Problems: A Case Study of English Language Students at Sudan University of Science and Technology", English Language and Literature Studies, Vol. 4, No. 4, 2014, ISSN 1925-4768, E-ISSN 1925-4776, Published by Canadian Center of Science and Education.
[4] M. Alghamdi, F. Alhargan, M. Alkanhal, A. Alkhairy, M. Eldesouki, A. Alenazi, "Saudi Accented Arabic Voice Bank", Computer and Electronic Research Institute, King Abdulaziz City for Science and Technology (KACST), Riyadh, Saudi Arabia, 25/06/2008.
[5] M. Raab, R. Gruhn, E. Noeth, "Non-Native Speech Databases", Automatic Speech Recognition and Understanding Workshop (ASRU), IEEE, 2007.
[6] http://accent.gmu.edu/browse_language.php?function=detail&speakerid=145