TRIPLES DB CAS CONSUMER
USER'S GUIDE
VERSION: 3.3
DATE: 8 DECEMBER 2021
Revision History
Date | Version | Description | Author
30 July 2009 | 0.1 | Initial version | Vesela Grigorova
03 August 2009 | 0.2 | Reviewed | Maya Malakova
04 August 2009 | 0.3 | Updated (DB2 added) | Vesela Grigorova
04 August 2009 | 0.4 | Reviewed | Elena Stankova
04 August 2009 | 1.0 | Reviewed | Maya Malakova
07 August 2009 | 1.1 | Updated | Maya Malakova
12 August 2009 | 1.2 | Updated | Vesela Grigorova
12 August 2009 | 2.0 | Reviewed | Maya Malakova
01 September 2009 | 2.1 | Updated | Vesela Grigorova
02 September 2009 | 2.2 | Updated | Elena Stankova
02 September 2009 | 2.3 | Reviewed | Maya Malakova
28 October 2009 | 2.4 | Updated – feature relations added | Maya Malakova
02 November 2009 | 2.5 | Updated – Oracle support | Maya Malakova
09 November 2009 | 2.6 | Updated – feature types support | Maya Malakova
17 December 2009 | 2.7 | Updated – logging configuration | Maya Malakova
24 September 2010 | 2.8 | Updated – MSSQL support | Blagoy Borisov
04 October 2010 | 2.9 | Updated – Oracle support | Blagoy Borisov
07 October 2010 | 3.0 | Updated | Blagoy Borisov
13 October 2010 | 3.1 | Updated | Blagoy Borisov
27 October 2010 | 3.2 | Updated – relationType added | Blagoy Borisov
18 November 2010 | 3.3 | Updated | Blagoy Borisov
Table of Contents
1. INTRODUCTION
2. PROJECT FEATURES AND FUNCTIONALITY
3. ABBREVIATIONS AND ACRONYMS
4. SOFTWARE PREREQUISITES
5. USING TRIPLES DB CAS CONSUMER
5.1 DATABASE CONFIGURATION
5.1.1 DB2 Configuration
5.1.1.1 Creating Database and Tables
5.1.1.2 Deploying the Stored Procedure
5.1.1.3 Configuration File
5.1.1.4 JDBC Driver
5.1.2 MSSQL Configuration
5.1.2.1 Creating Database and Tables
5.1.2.2 Deploying the Stored Procedure
5.1.2.3 Configuration File
5.1.2.4 JDBC Driver
5.1.3 Oracle Configuration
5.1.3.1 Creating Schema and Tables
5.1.3.2 Deploying the Stored Procedure
5.1.3.3 Configuration File
5.1.3.4 JDBC Driver
5.2 UIMA ANNOTATION TYPES CONFIGURATION
5.3 BAT FILE OVERVIEW
5.4 LOGGING CONFIGURATION
5.5 CPE CONFIGURATION
5.6 CPE USAGE
5.7 VIEWING THE PROCESSING RESULTS
6. APPENDIX: USING THE TRIPLES DB CAS CONSUMER WITH ORACLE 10G ENTERPRISE EDITION
6.1 CREATE SCHEMA AND GRANT SYSTEM PRIVILEGES
6.2 CREATE TRIPLE TABLES
6.3 DEPLOY STORED PROCEDURES
6.4 CONFIGURATION FILE
1. Introduction
The purpose of this document is to describe the steps that need to be performed in order to use the
Triples DB CAS Consumer.
4. Software Prerequisites
Microsoft Windows XP SP2
Java 1.5 or higher
DB2
5. Using Triples DB CAS Consumer
5.1 Database Configuration
5.1.1 DB2 Configuration
5.1.1.1 Creating Database and Tables
To create the database and tables for storing the extracted triples:
Open a DB2 Command Window.
Create a new database, for example TRIPLES, by executing the following command:
db2 create db <databaseName>
Connect to the database by executing the following command:
db2 connect to <databaseName> user <username> using <password>
Navigate to the <install_package>/bin/db/db2 directory, which contains the starSchemaDB2.sql file.
Execute the following command:
db2 -tf starSchemaDB2.sql
Check the output for errors. You may receive several warnings stating that the statistics are inconsistent; these can be ignored.
5.1.1.3 Configuration File
Note: If your database has a name other than TRIPLES, change the database name in the configuration file.
Note: Change the configuration file to match your database administrator username and password.
dbDriver=com.ibm.db2.jcc.DB2Driver
dbUrl=jdbc:db2://localhost:50000/triples
dbUsername=db2admin
dbPassword=db2admin
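Assuming the consumer reads this file with java.util.Properties (the key=value entries suggest so, but this guide does not say), a Java component could load the four keys and open a connection roughly as follows. TriplesDbConfig and its methods are hypothetical names used only for illustration; connect() needs the DB2 JDBC driver JAR on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper: reads the dbDriver/dbUrl/dbUsername/dbPassword keys
// shown in the configuration file and opens a JDBC connection with them.
// Assumes the file uses java.util.Properties key=value syntax.
final class TriplesDbConfig {
    final String driver;
    final String url;
    final String username;
    final String password;

    private TriplesDbConfig(String driver, String url, String username, String password) {
        this.driver = driver;
        this.url = url;
        this.username = username;
        this.password = password;
    }

    // Parses configuration text; fails fast if any of the four keys is missing.
    static TriplesDbConfig parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new TriplesDbConfig(
                require(props, "dbDriver"),
                require(props, "dbUrl"),
                require(props, "dbUsername"),
                require(props, "dbPassword"));
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing configuration key: " + key);
        }
        // Properties keeps trailing spaces in values, so trim them off.
        return value.trim();
    }

    // Loads the driver class (its JAR must be on the classpath) and connects.
    Connection connect() throws ClassNotFoundException, SQLException {
        Class.forName(driver);
        return DriverManager.getConnection(url, username, password);
    }
}
```

The Class.forName call is required for drivers that predate JDBC 4.0 (the Java 1.5 era); newer drivers register themselves, so the call is harmless there. A caller would typically wrap the result in try (Connection c = TriplesDbConfig.parse(text).connect()) { ... } so that the connection is closed afterwards.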
5.1.2 MSSQL Configuration
5.1.2.3 Configuration File
Note: If your database has a name other than TRIPLES, change the database name in the configuration file.
Note: Change the configuration file to match your database administrator username and password.
dbDriver=com.microsoft.sqlserver.jdbc.SQLServerDriver
dbUrl=jdbc:sqlserver://127.0.0.1:1433;databaseName=triples;selectMethod=cursor
dbUsername=admin
dbPassword=admin
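When the database is not called TRIPLES, only the databaseName property inside dbUrl needs to change. The hypothetical SqlServerUrl helper below builds a URL with the same shape as the sample and reads the database name back out of an existing one, which can help when checking an edited configuration file. It only manipulates strings and does not contact the server.

```java
// Hypothetical helper for the Microsoft SQL Server dbUrl shown above.
// build() produces a URL of the same shape for another host or database
// name; databaseName() reads the database name back out of an existing URL.
final class SqlServerUrl {
    private SqlServerUrl() {}

    static String build(String host, int port, String databaseName) {
        if (databaseName == null || databaseName.isEmpty()) {
            throw new IllegalArgumentException("databaseName is required");
        }
        // Connection properties follow host:port and are separated by ';'.
        return "jdbc:sqlserver://" + host + ":" + port
                + ";databaseName=" + databaseName
                + ";selectMethod=cursor";
    }

    // Returns the databaseName property of the URL, or null if it is absent.
    static String databaseName(String url) {
        String[] parts = url.split(";");
        // parts[0] is the "jdbc:sqlserver://host:port" prefix, so start at 1.
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq > 0 && parts[i].substring(0, eq).trim().equalsIgnoreCase("databaseName")) {
                return parts[i].substring(eq + 1).trim();
            }
        }
        return null;
    }
}
```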
5.1.3 Oracle Configuration
5.1.3.1 Creating Schema and Tables
To create the schema and tables for storing the extracted triples:
Open a console.
Start SQL*Plus by executing the following command:
sqlplus system/<syspassword>@<oracle_sid>
where <syspassword> is the password for the system account and <oracle_sid> is the Oracle System ID (usually orcl).
Create a permanent tablespace by executing the following command:
create tablespace <ptname> datafile '<ptpath>' size <ptsize> autoextend on;
where <ptname> is the name of the permanent tablespace, <ptpath> is the path of the tablespace file (for example, 'C:/triples.dbf') and <ptsize> is the size of the tablespace (for example, 5m for 5 megabytes or 1g for 1 gigabyte).
Create a temporary tablespace by executing the following command:
create temporary tablespace <ttname> tempfile '<ttpath>' size <ttsize> autoextend on;
where <ttname> is the name of the temporary tablespace, <ttpath> is the path of the tablespace file (for example, 'C:/triples_temp.dbf') and <ttsize> is the size of the tablespace (for example, 5m for 5 megabytes or 1g for 1 gigabyte).
Create the schema (that is, the user) by executing the following command:
create user <username> identified by <password> default tablespace <ptname> temporary tablespace <ttname>;
Grant the user privileges to connect and create tables:
grant connect, resource to <username>;
Exit SQL*Plus by executing the exit command:
exit
From the console, navigate to <install_package>/bin/db/oracle.
Start SQL*Plus as the newly created user by executing the following command:
sqlplus <username>/<password>@<oracle_sid>
where <username> is the new username, <password> is its password and <oracle_sid> is the Oracle System ID (usually orcl).
Once connected, execute the following command to create the tables:
@starSchemaOracle.sql;
Check the output for errors.
5.1.3.3 Configuration File
Note: Change the configuration file to match your database username, password and Oracle System ID.
dbDriver=oracle.jdbc.driver.OracleDriver
dbUrl=jdbc:oracle:thin:@localhost:1521:orcl
dbUsername=triples
dbPassword=triples
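The values chosen while creating the schema map directly onto the four entries above: <username> and <password> become dbUsername and dbPassword, and <oracle_sid> becomes the last part of dbUrl. The hypothetical helper below renders the entries from those values; localhost and 1521 (Oracle's default listener port) are simply the values used in the sample.

```java
// Hypothetical helper that renders the four Oracle configuration entries
// from the values chosen while creating the schema. The thin-driver URL
// form jdbc:oracle:thin:@host:port:SID is the one used in the sample.
final class OracleConfigFile {
    private OracleConfigFile() {}

    static String render(String host, int port, String sid, String username, String password) {
        StringBuilder sb = new StringBuilder();
        sb.append("dbDriver=oracle.jdbc.driver.OracleDriver\n");
        // host, listener port and Oracle System ID, separated by ':'
        sb.append("dbUrl=jdbc:oracle:thin:@")
          .append(host).append(':').append(port).append(':').append(sid).append('\n');
        sb.append("dbUsername=").append(username).append('\n');
        sb.append("dbPassword=").append(password).append('\n');
        return sb.toString();
    }
}
```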
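5.2 UIMA Annotation Types Configuration
Figure: structure of the mapping file. The root triplesMapping element contains a set of annotation elements (attributes: name, extractSentence, extractParagraph); each annotation element can contain feature elements (attributes: name, type) and relation elements (attributes: from, to, fromType, toType).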
The triplesMapping element contains a set of annotation elements, each of which has the following attributes:
name – the name of the annotation type whose annotations will be extracted as the head of a triple;
extractSentence – a flag that specifies whether IN_SAME_SENTENCE relationships will be extracted for annotations of the given type. The default value is “false”;
Note: IN_SAME_SENTENCE is a type of relationship between a Sentence annotation and an annotation that is contained within its text.
extractParagraph – a flag that specifies whether IN_SAME_PARAGRAPH relationships will be extracted for annotations of the given type. The default value is “false”.
Note: IN_SAME_PARAGRAPH is a type of relationship between a Paragraph annotation and an annotation that is contained within the text of the paragraph.
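This guide does not say how the consumer decides that an annotation lies inside a Sentence or Paragraph. In UIMA every annotation carries begin and end character offsets, so a natural reading is an offset containment test, sketched below under that assumption. Span is a hypothetical stand-in for uima.tcas.Annotation (which exposes getBegin() and getEnd()).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the containment test that IN_SAME_SENTENCE and IN_SAME_PARAGRAPH
// describe, assuming UIMA-style begin/end character offsets.
// Span is a hypothetical stand-in for uima.tcas.Annotation.
final class Containment {
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    private Containment() {}

    // True if inner lies entirely within the text covered by outer.
    static boolean contains(Span outer, Span inner) {
        return outer.begin <= inner.begin && inner.end <= outer.end;
    }

    // Annotations of the given type that fall inside one Sentence or Paragraph span.
    static List<Span> containedOfType(Span container, List<Span> annotations, String type) {
        List<Span> result = new ArrayList<Span>();
        for (Span a : annotations) {
            if (a.type.equals(type) && contains(container, a)) {
                result.add(a);
            }
        }
        return result;
    }
}
```

An annotation that starts in one sentence and ends in the next is not contained in either, so under this test it would produce no IN_SAME_SENTENCE relationship.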
The annotation elements can contain multiple feature sub-elements. Each of them names a feature that will be used to extract AGGREGATE_FEATURE relationships, that is, relationships between an annotation and the value of the feature with the given name. Each feature element can have the following attributes:
name – the name of the feature. It can be a simple name such as “crimeDetails” or a sequence of feature names separated by “:”, such as “crimeDetails:sentence”, which is called a feature path. Feature paths allow defining relationships not only with the direct child features of an annotation but also with the features of those child features, and so on;
type – an optional attribute that defines the name of a type that will be associated with the feature. If it is not specified, the consumer tries to retrieve the UIMA type of the feature; if the feature has no type (meaning that it is a literal), the type associated with it is Text.
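To make the feature path and type rules concrete, the sketch below models a feature structure as nested Java maps. This is a deliberate simplification: real UIMA feature structures are accessed through the CAS API, not maps. resolve walks a path such as crimeDetails:sentence one name at a time, and associatedType applies the fallback order described for the type attribute. Both method names are hypothetical.

```java
import java.util.Map;

// Hypothetical model: a feature structure is a Map from feature name to
// value, where a value may itself be such a Map. Illustrates how a feature
// path is walked and which type name the rules above associate with a feature.
final class FeaturePaths {
    private FeaturePaths() {}

    // Follows each ':'-separated name in turn; returns null if any step is missing.
    static Object resolve(Map<String, ?> featureStructure, String path) {
        Object current = featureStructure;
        for (String name : path.split(":")) {
            if (!(current instanceof Map)) {
                return null; // reached a literal before the end of the path
            }
            current = ((Map<?, ?>) current).get(name);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Declared type wins; otherwise the UIMA type; a literal (no type) becomes Text.
    static String associatedType(String declaredType, String uimaType) {
        if (declaredType != null) {
            return declaredType;
        }
        return uimaType != null ? uimaType : "Text";
    }
}
```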
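The annotation elements can also contain multiple relation sub-elements. Each of them defines a relationship between two features of the annotation type and has the following attributes:
from – the name (or feature path) of the first feature of the relationship;
to – the name (or feature path) of the second feature of the relationship;
fromType – the type associated with the value of the from feature;
toType – the type associated with the value of the to feature.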
For example, a mapping can specify that, for the com.ibm.langware.CelebrityCrime annotation type, the pipeline extracts IN_SAME_SENTENCE and IN_SAME_PARAGRAPH relationships, AGGREGATE_FEATURE relationships for the crimeDetails and name features, FEATURE_RELATION relationships between the values of the crimeDetails and name features, and CRIMEDETAILS_REALNAME_RELATION relationships between the values of the crimeDetails and realName features. In that case, the type associated with the name feature is com.ibm.langware.PersonName and the type associated with the crimeDetails feature in the declared feature relations is com.ibm.langware.Details.
Another mapping can specify that, for the com.ibm.langware.CelebrityCrime annotation type, the pipeline extracts IN_SAME_SENTENCE and IN_SAME_PARAGRAPH relationships, AGGREGATE_FEATURE relationships for the crimeDetails:sentence feature path and the name feature, and FEATURE_RELATION relationships between the values of crimeDetails:sentence and name.
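The sample mapping files themselves are not reproduced in this copy of the guide. As an illustration only, the sketch below encodes a hypothetical mapping modeled on the first CelebrityCrime description, using just the element and attribute names from the mapping-file structure, and reads it with the standard DOM API to list the relationship kinds it requests. The CRIMEDETAILS_REALNAME_RELATION declaration is left out because the attribute that names a custom relationship is not shown here, and the output strings are this sketch's own format, not the consumer's.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Sketch: reads a mapping file with the DOM API and lists the relationship
// kinds it requests. SAMPLE is a hypothetical mapping, not the shipped file.
final class MappingReader {
    static final String SAMPLE =
            "<triplesMapping>"
          + "<annotation name='com.ibm.langware.CelebrityCrime'"
          + " extractSentence='true' extractParagraph='true'>"
          + "<feature name='crimeDetails'/>"
          + "<feature name='name' type='com.ibm.langware.PersonName'/>"
          + "<relation from='crimeDetails' to='name'"
          + " fromType='com.ibm.langware.Details' toType='com.ibm.langware.PersonName'/>"
          + "</annotation>"
          + "</triplesMapping>";

    private MappingReader() {}

    static List<String> describe(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse mapping file", e);
        }
        List<String> out = new ArrayList<String>();
        NodeList annotations = doc.getElementsByTagName("annotation");
        for (int i = 0; i < annotations.getLength(); i++) {
            Element annotation = (Element) annotations.item(i);
            String type = annotation.getAttribute("name");
            // getAttribute returns "" when the attribute is absent and
            // parseBoolean("") is false, matching the documented default.
            if (Boolean.parseBoolean(annotation.getAttribute("extractSentence"))) {
                out.add(type + " IN_SAME_SENTENCE");
            }
            if (Boolean.parseBoolean(annotation.getAttribute("extractParagraph"))) {
                out.add(type + " IN_SAME_PARAGRAPH");
            }
            NodeList features = annotation.getElementsByTagName("feature");
            for (int j = 0; j < features.getLength(); j++) {
                Element f = (Element) features.item(j);
                out.add(type + " AGGREGATE_FEATURE " + f.getAttribute("name"));
            }
            NodeList relations = annotation.getElementsByTagName("relation");
            for (int j = 0; j < relations.getLength(); j++) {
                Element r = (Element) relations.item(j);
                out.add(type + " FEATURE_RELATION " + r.getAttribute("from") + " -> " + r.getAttribute("to"));
            }
        }
        return out;
    }
}
```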
Note: If you use a different annotator, edit the mapping file to match the annotation types it produces.
5.4 Logging Configuration
java.util.logging.FileHandler.level = WARNING
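The setting above uses standard java.util.logging syntax: java.util.logging.FileHandler.level sets the minimum level (SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, or ALL/OFF) that the file handler writes. The sketch below applies such settings in-process and reads the level back. LoggingSetup is a hypothetical name, and where the consumer keeps its own logging configuration file is not shown in this section.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;

// Sketch: applies java.util.logging settings such as the FileHandler level
// line above and reads the configured level back.
final class LoggingSetup {
    private LoggingSetup() {}

    // Replaces the current logging configuration with the given properties text.
    static void apply(String config) {
        try {
            // LogManager parses the stream with Properties.load, i.e. as ISO-8859-1.
            LogManager.getLogManager().readConfiguration(
                    new ByteArrayInputStream(config.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Level configured for FileHandler, or null if the property is not set.
    static Level fileHandlerLevel() {
        String value = LogManager.getLogManager()
                .getProperty("java.util.logging.FileHandler.level");
        return value == null ? null : Level.parse(value.trim());
    }
}
```

In a normal run the same lines would live in a file passed to the JVM with -Djava.util.logging.config.file=<path>. Note that readConfiguration resets any logging configuration already in place.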
5.5 CPE Configuration
Note: Choose File > Clear All to remove any configuration remaining from previous runs of the CPE.
Set the values of the CPE components using the following steps:
Collection Reader
o Descriptor – browse to the location of the descriptor file of the reader component:
<install_package>/bin/reader/desc/FileSystemCollectionReader.xml
o Input Directory – browse to the location of the directory containing the input files. Sample
data files are provided with the package in the <install_package>/bin/reader/data folder.
o Language – keep the field blank.
o Encoding – keep the field blank.
Analysis Engines
Choose the Add… button. In the dialog that opens, browse to the location of the XML file of
the annotator component:
<annotator_dir>/com.ibm.dltj.ruleannotator/com.ibm.dltj.ruleannotator_pear.xml
Note: <annotator_dir> is the folder where the Celebrity Crimes annotator is installed.
The annotator appears as a tab in the Analysis Engines section.
CAS Consumers
Choose the Add… button. In the dialog that opens, browse to the location of the descriptor
file of the consumer component:
<install_package>/bin/consumer/desc/TriplesDbConsumer.xml
The consumer will appear as a tab in the CAS Consumers section.
When you choose the Stop button, a Performance Report window appears and displays information about the document processing run.
5.7 Viewing the Processing Results
The triple_store table lists the object annotations created for the content of the input files.
Note: If an annotation has the IN_SAME_SENTENCE association type, the BODY column of the triple_store table contains a number that points to a row in the doc_sentences table. That table stores the sentence in which the annotation text is found.