
IBM InfoSphere Streams Version 2.0.0.4

Database Toolkit


Note: Before using this information and the product it supports, read the general information under Notices on page 43.

Edition Notice

This document contains proprietary information of IBM. It is provided under a license agreement and is protected by copyright law. The information contained in this publication does not include any product warranties, and any statements provided in this manual should not be interpreted as such.

You can order IBM publications online or through your local IBM representative:
v To order publications online, go to the IBM Publications Center at www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss
v To find your local IBM representative, go to the IBM Directory of Worldwide Contacts at www.ibm.com/planetwide

When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

Copyright IBM Corporation 2009, 2012. US Government Users Restricted Rights: Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Summary of changes
This topic describes updates to this documentation for IBM InfoSphere Streams Version 2.0 (all releases).

Note: The following revision characters are used in the InfoSphere Streams documentation to indicate updates for Version 2.0.0.4:
v In PDF files, updates are indicated by a vertical bar (|) to the left of each new or changed line of text.
v In HTML files, updates are surrounded by double angle brackets (>> and <<).

Updates for Version 2.0.0.4 (Version 2.0, Fix Pack 4)


v The following restrictions apply to Red Hat Enterprise Linux Version 6 (RHEL 6) only: On x86 systems running RHEL 6, Oracle databases are not supported. On IBM POWER7 systems running RHEL 6, IBM solidDB, Netezza, and Oracle databases are not supported. These restrictions are added to Chapter 2, How to use the Database Toolkit, on page 3 and Chapter 3, Known issues and restrictions, on page 33.
v Streams applications can use the ODBCRun operator to run generic user-defined SQL statements to manage data, work with tables, and call stored procedures. For more information about this operator, see ODBCRun on page 10.
v The ODBCAppend, ODBCRun, and DB2PartitionedAppend operators have the optional commitOnPunctuation parameter, which allows you to specify whether transactions are committed when the operator receives a punctuation. For more information about this parameter, see ODBCAppend on page 7, ODBCRun on page 10, and DB2PartitionedAppend on page 18.
v The Connection Specifications Document now supports a statement element as part of an access_specification element. The statement element specifies information used by the ODBCRun operator to run an SQL statement. For more information, see Statement element on page 26.

Updates for Version 2.0.0.3 (Version 2.0, Fix Pack 3)


v The Database Toolkit includes operators that write data to a partitioned DB2 database. Streams applications can use these operators to write data to DB2 partitioned tables using parallel write operations for each of the partitions. For more information about these operators, see DB2SplitDB on page 17 and DB2PartitionedAppend on page 18.
v The Database Toolkit includes the source files to build a db2helper program for the DB2 libraries installed on your system. You can use this program to determine the number of partitions in the database. For more information about building the db2helper program, see Building db2helper on page 37. For more information about the db2helper options, see Using db2helper on page 37.
v An optional sleepTime parameter is added to the ODBCSource operator, which specifies the minimum time that the operator waits before it can execute a query again. For more information, see ODBCSource on page 13.

v An optional key attribute is added to the attribute element to identify the key field in a table. For more information, see Attribute element on page 31.
v The droppedTuples metric is added to the ODBCAppend, ODBCEnrich, DB2SplitDB, and DB2PartitionedAppend operators to track the number of input tuples that are associated with ODBC or DB2 failures. For more information about how each of these operators uses this metric, see the metrics section of the related operator.
v Due to internal performance enhancements made to the ODBCAppend operator, the operator might use more memory at runtime than it did in previous releases.

Updates for Version 2.0.0.2 (Version 2.0, Fix Pack 2)


The Database Toolkit includes the source files to build an odbchelper program for the UnixODBC package installed on your system. You can now use the odbchelper program options to run SQL commands in the sample applications and to test the connection to an external data source.
v For more information about the odbchelper options, see Using odbchelper on page 35.
v For more information about building the odbchelper program, see Building odbchelper on page 35.

Updates for Version 2.0.0.1 (Version 2.0, Fix Pack 1)


This guide was not updated for Version 2.0.0.1.

Updates for Version 2.0


The Database Toolkit, which is made up of a subset of the operators that were formerly part of the Adapters Toolkit, was originally written for the SPADE language in earlier versions of the Streams product. This version of the toolkit was written for the IBM Streams Processing Language (SPL). The function of the earlier SPADE version and the new SPL version of the toolkit is equivalent. All operator and parameter names are the same. The output from the operators in SPL is the same as the output of the operators in SPADE if the same input data, parameters, and external data configurations are used.


Contents

Summary of changes . . . . . . . . . . . . . . . . . . . . . . iii

Chapter 1. Overview . . . . . . . . . . . . . . . . . . . . . . 1

Chapter 2. How to use the Database Toolkit . . . . . . . . . . . . . 3
  Operator common parameters . . . . . . . . . . . . . . . . . . 5
  Operator error output port . . . . . . . . . . . . . . . . . . . 6
  Operators . . . . . . . . . . . . . . . . . . . . . . . . . 7
    ODBCAppend . . . . . . . . . . . . . . . . . . . . . . . 7
    ODBCEnrich . . . . . . . . . . . . . . . . . . . . . . . 8
    ODBCRun . . . . . . . . . . . . . . . . . . . . . . . . 10
    ODBCSource . . . . . . . . . . . . . . . . . . . . . . . 13
    SolidDBEnrich . . . . . . . . . . . . . . . . . . . . . . 15
    DB2SplitDB . . . . . . . . . . . . . . . . . . . . . . . 17
    DB2PartitionedAppend . . . . . . . . . . . . . . . . . . . 18
  Operator runtime error conditions . . . . . . . . . . . . . . . . 20
    ODBC and DB2 operators runtime error conditions . . . . . . . . . 20
    SolidDBEnrich operator runtime error conditions . . . . . . . . . . 22
  Connection Specifications Document . . . . . . . . . . . . . . . 22
    Connection_specification Element . . . . . . . . . . . . . . . 23
    Access_specification element . . . . . . . . . . . . . . . . . 24

Chapter 3. Known issues and restrictions . . . . . . . . . . . . . 33

Chapter 4. Connection setup and debug . . . . . . . . . . . . . . 35
  Building odbchelper . . . . . . . . . . . . . . . . . . . . . 35
  Using odbchelper . . . . . . . . . . . . . . . . . . . . . . 35

Chapter 5. DB2 partition layout and debug . . . . . . . . . . . . . 37
  Building db2helper . . . . . . . . . . . . . . . . . . . . . 37
  Using db2helper . . . . . . . . . . . . . . . . . . . . . . 37

Chapter 6. Sample applications . . . . . . . . . . . . . . . . . 39
  Working with the samples in the command-line environment . . . . . . 39
    Updating database configuration information . . . . . . . . . . . 39
  Working with the samples in Streams Studio . . . . . . . . . . . . 40
    Updating database configuration information . . . . . . . . . . . 41

Notices . . . . . . . . . . . . . . . . . . . . . . . . . . 43


Chapter 1. Overview
IBM InfoSphere Streams applications process streams of data flowing from external sources and convert result streams to external formats to be used by components that are not part of InfoSphere Streams. Additionally, Streams applications can merge data from external repositories with internal streams, enriching their contents.

The IBM Streams Processing Language (SPL) Standard Toolkit includes source and sink operators, which provide generic adapters for files and network sockets. However, much of the world's data is stored in and made available by data systems and products with higher-level interfaces than files and sockets. The Database Toolkit provides a set of SPL operators that allow easy integration with such external data systems.

The Database Toolkit includes these operators:
v The ODBCAppend operator stores a stream in a DBMS table. A row is appended to the table for each input stream tuple, using an SQL INSERT statement.
v The ODBCEnrich operator generates a stream from an input tuple and the result set of an SQL SELECT statement.
v The ODBCRun operator generates a stream from a generic user-defined SQL statement.
v The ODBCSource operator generates a stream from the rows of the result set of an SQL SELECT statement.
v The SolidDBEnrich operator generates a stream from an input tuple and the result set of a solidDB table query.

The Database Toolkit also provides a set of operators that write data to a partitioned DB2 database using parallel write operations for each partition. Streams applications that process huge volumes of data can use these operators for improved performance when writing data to partitioned tables. The Database Toolkit includes the following operators to write to a partitioned DB2 database:
v The DB2SplitDB operator determines the partition to use to write the input tuples.
v The DB2PartitionedAppend operator appends input tuples to a table in the specified partition. A row is appended to the table for each input tuple, using an SQL INSERT statement.

Important: The DB2SplitDB and DB2PartitionedAppend operators require version 9.7 of the DB2 database to be installed on your system.

Note that the Database Toolkit is installed as part of the IBM InfoSphere Streams product installation. For information about installing this product, see the IBM InfoSphere Streams: Installation and Administration Guide.


Chapter 2. How to use the Database Toolkit


The Database Toolkit operators must be configured to connect to an external data service and to access specific data from that service. This configuration information is specified in an XML document that is separate from the SPL application. There are two main reasons for this.

First, this configuration information is often complex, detailed, and specific to a particular vendor or vendor product. The same configuration information is often shared by many operator declarations, either in a single application or across multiple applications. Repeating the same information in several operator declarations multiplies the opportunity for errors and is difficult to keep consistent. We choose to consolidate the configuration information to make it easier to maintain both the information itself and the SPL programs that access it.

Second, the people who understand how to configure the external data services are often not the same people who are developing the SPL applications. Separating the configuration information from the SPL application allows the people in the two roles to work more independently of each other, with less need for low-level coordination.

We describe how the operators in the Database Toolkit specify the external data service configuration file in Operators on page 7. The format of this XML file is described in Connection Specifications Document on page 22.

Although the operators of the Database Toolkit access data from external data services, they do not define entities in those services or otherwise manage the data or the service. External data services are managed by tools and processes supplied by their vendors, independently from the operators of the Database Toolkit and the SPL applications that use them. For example, the ODBCAppend operator inserts rows into a table in a DBMS (see ODBCAppend on page 7). The ODBCAppend operator does not attempt to create the table; if the table does not already exist, the ODBCAppend operator issues an error.
Applications that contain Database Toolkit operators are compiled with the SPL compiler command, sc. To compile an application containing Database Toolkit operators, you must specify the toolkit install directory in either the STREAMS_SPLPATH environment variable, or with the -t option of the sc compiler command. The following is an example using the STREAMS_SPLPATH environment variable:
export STREAMS_SPLPATH=$STREAMS_INSTALL/toolkits/com.ibm.streams.db

The following is an example using the -t option of the sc compiler command:


sc -t $STREAMS_INSTALL/toolkits/com.ibm.streams.db -M MyMain


Each operator requires a set of environment variables to be set at application compile time. These environment variables provide information needed to compile the application, as described below.

v ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource

These operators support access to databases that implement the ODBC specification. The following table lists the specific databases that are supported by these operators, and the environment variable that needs to be defined in order to use them. Exactly one of the environment variables from the table must be defined, according to which database you choose to be targeted by the operator's code generation at compile time. The value that the environment variable is assigned is not important; only the fact that the variable is defined to some value matters.

Restrictions: The following restrictions apply to Red Hat Enterprise Linux Version 6 (RHEL 6) only:
v On x86 systems running RHEL 6, Oracle databases are not supported.
v On IBM POWER7 systems running RHEL 6, IBM solidDB, Netezza, and Oracle databases are not supported.
Table 1. Supported databases and the corresponding environment variables

Database Product                                            Version         Environment Variable
DB2 Runtime Client and DB2 Client                           9.7             STREAMS_ADAPTERS_ODBC_DB2
IBM Data Server Client and IBM Data Server Runtime Client   9.5             STREAMS_ADAPTERS_ODBC_DB2
IBM Informix Dynamic Server                                 11.50           STREAMS_ADAPTERS_ODBC_IDS
Oracle Database                                             11g Release 2   STREAMS_ADAPTERS_ODBC_ORACLE
IBM solidDB                                                 6.5             STREAMS_ADAPTERS_ODBC_SOLID
MySQL                                                       5.1             STREAMS_ADAPTERS_ODBC_MYSQL
Microsoft SQL Server                                        2008            STREAMS_ADAPTERS_ODBC_SQLSERVER
Netezza                                                     6.0             STREAMS_ADAPTERS_ODBC_NETEZZA

In addition to assigning some value to exactly one of the environment variables listed above, you must set the environment variables STREAMS_ADAPTERS_ODBC_INCPATH and STREAMS_ADAPTERS_ODBC_LIBPATH to the installed directory locations of the header files and the libraries for the database product that you are using.

The operators also allow additional databases that support ODBC via the UnixODBC driver to be configured. To use this capability, define the environment variable STREAMS_ADAPTERS_ODBC_UNIX_OTHER. The following table lists the drivers that are supported for the databases listed in Table 1.

Table 2. Supported drivers

Database Product                                            Version         Supported Driver
DB2 Runtime Client and DB2 Client                           9.7             DB2 ODBC
IBM Data Server Client and IBM Data Server Runtime Client   9.5             IBM Data Server ODBC
IBM Informix Dynamic Server                                 11.50           IBM Informix ODBC
Oracle Database                                             11g Release 2   UnixODBC
IBM solidDB                                                 6.5             UnixODBC
MySQL                                                       5.1             UnixODBC
Microsoft SQL Server                                        2008            UnixODBC
Netezza                                                     6.0             UnixODBC
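As an illustration of the compile-time setup described above, the following sketch targets a DB2 9.7 database. The install paths shown are hypothetical; substitute the directories where your DB2 client headers and libraries are actually installed.

```shell
# Select DB2 as the ODBC target; the value is ignored, the variable
# only needs to be defined to some value.
export STREAMS_ADAPTERS_ODBC_DB2=1
# Point the compiler at the DB2 client header files and libraries
# (illustrative paths; adjust for your installation).
export STREAMS_ADAPTERS_ODBC_INCPATH=/opt/ibm/db2/V9.7/include
export STREAMS_ADAPTERS_ODBC_LIBPATH=/opt/ibm/db2/V9.7/lib64
```

Remember that exactly one of the database-selection variables from Table 1 may be defined at a time.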

v SolidDBEnrich

This operator supports access to solidDB databases using IBM solidDB 6.5. The environment variables STREAMS_ADAPTERS_SOLIDDB_LIBPATH and STREAMS_ADAPTERS_SOLIDDB_INCPATH must be defined as the path names of the directories where the external libraries and header files for solidDB 6.5 are installed.

v DB2SplitDB, DB2PartitionedAppend

These operators support access to DB2 databases using a partitioned database configuration. The environment variables STREAMS_ADAPTERS_DB2_LIBPATH and STREAMS_ADAPTERS_DB2_INCPATH must be defined as the path names of the directories where the external libraries and header files for DB2 are installed.

Note: If you are using both ODBC and DB2 operators in your application to access a DB2 database, you must set both sets of _LIBPATH and _INCPATH variables described above.

Operator common parameters


The Database Toolkit operators are similar to the adapter operators in the SPL Standard Toolkit. For more information about the adapter operators, see the IBM Streams Processing Language Standard Toolkit Reference. The Database Toolkit operators are defined as SPL primitive operators.

All of the operators in the Database Toolkit use three common parameters. These parameters identify the connection specifications document that the operator uses (for details about the contents of a connection specifications document, see Connection Specifications Document on page 22) as well as the particular connection specification and access specification within that document. We describe these parameters here.

v connectionDocument

The connectionDocument parameter specifies the path name of a file containing the connection and access specifications identified by the connection and access parameters (see Connection Specifications Document on page 22). The connection and access specifications defined by the operator's invocation parameters are set at SPL compile time; any change in the referenced document or parameter settings requires a recompile to take effect. Once compiled, the connections.xml document is not required for job submission. The connectionDocument parameter is optional. If present, it must have exactly one value of type rstring. If the parameter is absent, the operator looks for a file called connections.xml in the etc subdirectory of the SPL application directory (the current working directory where the sc command is invoked). For example, if you invoke the sc command from /home/myapp, the compiler looks for a connection document at /home/myapp/etc/connections.xml.

v connection

The connection parameter specifies the name of a connection_specification element in the connection specifications document that identifies the external service to which this operator connects (see Connection_specification Element on page 23). This parameter is required and must have exactly one value of type rstring.

v access

The access parameter specifies the name of an access_specification element in the connection specifications document (see Access_specification element on page 24). This access specification specifies how this operator accesses specific data in the external service identified by the connection parameter. This parameter is required and must have exactly one value of type rstring. The connection specification named in the connection parameter of this operator declaration must be the value of the connection attribute of a uses_connection element of the named access specification.

Important: Although the SPL compiler, in conjunction with the Database Toolkit, does check that the connection specification and the access specification associated with a Database Toolkit operator are semantically valid XML, it cannot check at compile time that the operator can connect to the external data service and access data as configured. These operators have internal checks for correct configuration of the external data service that might result in a runtime failure, captured in the processing element logs. For information about tracing and logging, see the IBM Streams Processing Language Streams Debugger Reference. The Database Toolkit also provides utilities to help find setup and configuration issues. For more information about these utilities, see Using odbchelper on page 35 and Using db2helper on page 37.
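To illustrate how the three common parameters relate to the document they reference, the following is a rough, hypothetical sketch of a connections.xml fragment. Only the element names mentioned in this chapter (connection_specification, access_specification, uses_connection, external_schema, attribute, and the transaction_batchsize attribute of the table element) are taken from the text; the surrounding nesting, attribute names, and values shown here are assumptions for illustration, and the authoritative format is defined in Connection Specifications Document on page 22.

```xml
<!-- Hypothetical sketch only; see the Connection Specifications Document
     chapter for the exact element and attribute definitions. -->
<connections>
  <connection_specification name="PersonDB">
    <!-- vendor-specific connection details (data source, user, password) -->
  </connection_specification>
  <access_specification name="PersonSink">
    <table tablename="personsink" transaction_batchsize="10" />
    <uses_connection connection="PersonDB" />
    <external_schema>
      <attribute name="id" type="int32" />
      <attribute name="fname" type="rstring" />
    </external_schema>
  </access_specification>
</connections>
```

An operator invocation then selects one connection_specification by name with the connection parameter and one access_specification by name with the access parameter.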

Operator error output port


The ODBC and DB2 operators in the Database Toolkit provide the capability to specify an optional output port containing information about SQL errors that occur at runtime. This port gives an application writer the capability to handle errors within the application as they occur, if desired.

The output port can have up to four attributes, all of which are optional. The first attribute is an embedded tuple containing all of the attributes of the input tuple pertaining to the SQL error. This embedded tuple is only valid for operators that have input ports (for example, ODBCSource has no input port and thus cannot include the embedded tuple in its error output port). The remaining three attributes correspond to the SQL return code, SQL message, and SQL state returned on an SQL error. The data types of these attributes are int32, rstring, and rstring respectively. The attributes can have any name. The first rstring attribute, if specified, contains the SQL message data, and the second rstring attribute, if specified, contains the SQL state.

The error output port is non-mutating and its punctuation mode is Free. Tuples are generated on this port when certain runtime SQL error conditions occur. For a complete list of these runtime error conditions, see Operator runtime error conditions on page 20.

The following is an example of the ODBCAppend operator invocation, using a configured error output port:
stream <tuple<PersonSchema> inTuple, int32 sqlcode, rstring sqlmessage,
        rstring sqlstate> errors = ODBCAppend(MyInputDataStream)
{
  param
    connectionDocument : "connections.xml";
    connection         : "PersonDB";
    access             : "PersonSink";
}

The following is an example of the ODBCSource operator invocation, using a configured error output port. Note that the operator has two output ports, the first one being the port containing the operator's generated source data tuples. Also note that ODBCSource has no input port, so the error output port does not contain the embedded input tuple.

IBM InfoSphere Streams Version 2.0.0.4: Database Toolkit

(stream <int32 id, rstring fname, rstring lname>;
 stream <int32 sqlcode, rstring sqlmessage, rstring sqlstate> errors) = ODBCSource()
{
  param
    connectionDocument : "connections.xml";
    connection         : "PersonDB";
    access             : "InfPeronIFL";
    initDelay          : 3.0;
}

Operators
The root namespace for all toolkit operators is com.ibm.streams.db. The following table summarizes the namespace declarations needed (in an SPL application) for each operator.

Table 3. Namespace for Database Toolkit operators

Operator                                                     Namespace declaration
ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, SolidDBEnrich   use com.ibm.streams.db::*;
DB2SplitDB, DB2PartitionedAppend                             use com.ibm.streams.db.db2::*;

ODBCAppend
Namespace
com.ibm.streams.db

Description
The ODBCAppend operator stores an input stream into a DBMS table. A row is appended to the table for each input stream tuple, using an SQL INSERT statement based on the information specified in the table element of the access specification named by the access parameter.

Input Ports
The ODBCAppend operator has one required input port. The input port is non-mutating and its punctuation mode is Oblivious (there is no reasonable mapping to a table row). The names and types of input stream tuple attributes must correspond to the names and types of the attribute elements of the external_schema element (see External_schema element on page 30) of the access specification named by the access parameter. The external_schema attribute names must be the same as the names of the columns of the table being appended to. Also, the attribute types must be the SPL type that corresponds to the ODBC type of the table column (see Table 6 on page 32).

Note: It is not required that all columns of a table be represented in the input stream schema and access specification external_schema. Columns that have an automatically assigned or default value at insert time can be excluded.

Output Ports
The ODBCAppend operator has one optional output port. This output port submits a tuple when an error occurs while trying to insert a tuple record into the table. For detailed information about the error output port, see Operator error output port on page 6.
Chapter 2. How to use the Database Toolkit

Parameters
The ODBCAppend operator has the following parameter besides the set of common Database Toolkit operator parameters (see Operator common parameters on page 5).

commitOnPunctuation
This optional boolean parameter allows you to specify whether or not transactions are committed when the operator receives a punctuation. The default is false. If the parameter is set to true, the operator performs the following actions when a window punctuation is received:
v If the current rowset is not empty, the rowset is inserted.
v If the number of rows inserted since the last transaction commit is greater than 0, the transaction is committed and the uncommitted row counter is reset to 0.
If no window punctuation is received, a commit continues to occur when the number of rows inserted reaches the number specified in the transaction_batchsize attribute of the table element. For more information, see Table element on page 25 in the Connection Specifications Document.

Windowing
The ODBCAppend operator does not accept any windowing configurations.

Assignments
The ODBCAppend operator does not allow assignments to output attributes.

Metrics
The ODBCAppend operator provides the following metric:
v droppedTuples: The number of input tuples that are dropped (not inserted into the table) because of an insert failure.

Exceptions
The ODBCAppend operator does not throw any exceptions. For a list of error conditions that are logged, see Operator runtime error conditions on page 20.

The following is an example of the ODBCAppend operator invocation:
() as mySink = ODBCAppend(persondata)
{
  param
    connection         : "PersonDB";
    access             : "PersonSink";
    connectionDocument : "connections.xml";
}
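As a further illustration, the same invocation can enable the optional commitOnPunctuation parameter described above. This sketch is not taken from the product samples; the stream and specification names are the same illustrative ones used in the example above.

```
() as mySink = ODBCAppend(persondata)
{
  param
    connection          : "PersonDB";
    access              : "PersonSink";
    connectionDocument  : "connections.xml";
    // Commit any pending rows whenever a window punctuation arrives,
    // instead of waiting for transaction_batchsize rows to accumulate.
    commitOnPunctuation : true;
}
```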

ODBCEnrich
Namespace
com.ibm.streams.db

Description
The ODBCEnrich operator executes an SQL SELECT statement for each input tuple, and submits an output tuple for each record in the result set of the SELECT statement. If an invocation of the SELECT statement does not result in any records, no output tuples are submitted for that invocation. The SELECT statement is specified as the value of the query attribute of the query element of the access specification named by the access parameter (see Query element on page 25). Any valid SELECT statement for the database specified in the connection specification named by the connection parameter can be used.

For each row in the result set, the operator produces a tuple. The operator automatically assigns the values of the columns of the result set to the output stream attributes with the same names and data types. Null data in a result set column produces 0 for numeric attributes and an empty string for rstring attributes. These automatic assignments do not override an explicit assignment to an output attribute or an auto-assignment of an input stream attribute with the same name.

The names and data types of the columns of the result set of the SELECT statement are specified by the attribute elements of the external_schema element of the access specification (see External_schema element on page 30). However, no external schema attribute can have the same name as an input stream attribute, because there is no syntactic means to indicate whether such a name refers to the input stream attribute or the external schema attribute. Additionally, the operator checks that every output stream attribute either has an explicit assignment, has the same name as an input stream attribute, or has the same name as an external schema attribute.

Input Ports
The ODBCEnrich operator has one required input port. The input stream tuples must contain attributes corresponding to the columns of the table being queried. The input port is non-mutating and its punctuation mode is Oblivious.

Output Ports
The ODBCEnrich operator has one required output port and one optional output port. The required output port submits a tuple for each row in the result set of the SELECT statement executed for an input tuple. The resulting tuple contains the input tuple attributes, plus any additional desired attributes corresponding to columns in the result set. The output port is mutating and its punctuation mode is Preserving. The optional output port submits a tuple when an error occurs on the SQL SELECT executed as the result of an input tuple. For detailed information about the error output port, see Operator error output port on page 6.

Parameters
In addition to the set of common Database Toolkit operator parameters (see Operator common parameters on page 5), the ODBCEnrich operator supports query-specific parameters that permit the parameterization of the SQL SELECT statement that is executed. In the access specification named by the access parameter, a parameter element is associated with each ODBC parameter marker in the SELECT statement (see Parameter element on page 30). The operator declaration must specify a parameter with the same name as each of the parameter elements in its access specification. The values of a query-specific parameter must have the data type specified by its parameter element, as well as the specified number of values. In the following example, the access specification PersonIAGST has these query and parameter elements:
<query query="SELECT id, age, gender, score, total FROM personsrc WHERE ID = ?" />
<parameters>
  <parameter name="pid" type="int32" />
</parameters>

The corresponding operator invocation is:

stream <int32 id, rstring fname, rstring lname, int16 age, rstring gender,
        float32 score, float64 total> MyCompletePersonStream =
    ODBCEnrich(MyPersonNamesStream)
{
  param
    connection         : "PersonDB";
    access             : "PersonIAGST";
    connectionDocument : "connections.xml";
    pid                : id;
}

Note that the SELECT statement has an ODBC parameter marker (the ?) in its WHERE clause. The parameter element named pid is associated with that parameter marker. In the operator declaration above, the corresponding operator parameter named pid has a single value, id, which denotes the input stream attribute with that name. The data type of the value of the pid parameter in the operator declaration, int32, matches that of the pid parameter element. For each incoming tuple, the operator executes this SELECT statement, providing the current value of the input stream attribute id as the value of that ODBC parameter marker.

A SELECT statement can have multiple ODBC parameter markers ("?"s). The number of parameter markers must match the number of parameter elements. ODBC parameter markers are processed in the order in which they appear in the SELECT statement, and correspond to the order of the parameter elements in the configuration file.

Windowing
The ODBCEnrich operator does not accept any windowing configurations.

Assignments
The ODBCEnrich operator does not allow assignments to output attributes.

Metrics
The ODBCEnrich operator provides the following metric:
v droppedTuples: The number of input tuples that are dropped (not processed) because of an SQL failure.

Exceptions
The ODBCEnrich operator does not throw any exceptions. For a list of error conditions that are logged, see Operator runtime error conditions on page 20. For an example of the ODBCEnrich operator, see the Parameters section.

ODBCRun
Namespace
com.ibm.streams.db

Description
The ODBCRun operator runs a generic user-defined SQL statement as part of an application. This operator is commonly used to update, merge, and delete data. The ODBCRun operator can also be used to create tables, drop tables, and call stored procedures.

Input Ports
The ODBCRun operator is configurable with one required input port. The input port is non-mutating and its punctuation mode is Oblivious. The user-defined SQL statement is run each time a tuple is received on the input port. If the statement has ODBC parameter markers, the operator can be configured to use input tuple attribute values for these parameter markers at statement run time. For additional details about statement parameters, see the Parameters section of this operator.


IBM InfoSphere Streams Version 2.0.0.4: Database Toolkit


Output Ports
The ODBCRun operator has one required output port and one optional output port. The required output port is mutating and its punctuation mode is Preserving. The output port submits a tuple for each row in the result set of the user-defined statement, if the statement produces a result set. The output tuple can contain any of the following results, assigned in this order:
v Any of the columns returned in the result set
v Any of the attributes from the input tuple
v Any operator-configured output assignments

The optional output port submits one or more tuples when an error occurs while running the user-defined SQL statement. For detailed information about the error output port, see Operator error output port on page 6.

Note: If the statement element of the access_specification is configured with a transaction_batchsize greater than 1, the error output port submits a tuple for each record in the rowset processed by the statement.

Parameters
In addition to the set of Operator common parameters on page 5 in the Database Toolkit, the ODBCRun operator supports the following parameter:

commitOnPunctuation
This optional Boolean parameter specifies whether transactions are committed when the operator receives a window punctuation. The default value is false. If the parameter is set to true, the operator performs the following actions when a window punctuation is received:
v If the current rowset is not empty, the rowset is inserted.
v If the number of rows inserted since the last transaction commit is greater than 0, the transaction is committed and the uncommitted row counter is reset to 0.
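As a sketch of the parameter in use, an ODBCRun invocation that commits pending work on each window punctuation might look like the following. The connection, access, and stream names are illustrative (they reuse names from the example later in this section) and are not a definitive configuration:

```spl
// Run the user-defined SQL statement for each incoming tuple and
// commit the transaction whenever a window punctuation arrives.
() as runUpdates = ODBCRun(updateStream) {
  param
    connection          : "DBPerson";             // connection specification name
    access              : "PersonUpdate";         // access specification containing the statement
    connectionDocument  : "./etc/connections.xml";
    commitOnPunctuation : true;                   // commit on window punctuation
}
```

With commitOnPunctuation set to true, an upstream operator that emits punctuation at logical batch boundaries effectively controls the transaction boundaries.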

If no window punctuation is received, a commit still occurs whenever the number of rows inserted reaches the number specified in the transaction_batchsize attribute of the table element. For more information, see Table element on page 25 in the Connection Specification Document.

Additionally, the ODBCRun operator supports statement-specific parameters that establish the framework of the user-defined statement being run. In the access specification named by the access parameter, a parameter element is associated with each ODBC parameter marker in the statement (see Parameter element on page 30). The operator declaration must specify a parameter with the same name as each of the parameter elements in its access specification. The value of a statement-specific parameter must have the data type specified by its parameter element, as well as the specified number of values. For a usage example, see the Parameters section of the ODBCEnrich operator.

Assignments
The ODBCRun operator can assign the following output tuple attributes:
v If the statement produces a result set, the external_schema section of the access_specification element specified in the access parameter of the

Chapter 2. How to use the Database Toolkit


operator must correspond to the output stream schema. Output tuple attributes are assigned the values of the columns returned in the result set.
v After result set values are assigned (or if there is no result set), unassigned output tuple attributes with a matching input tuple attribute are automatically assigned the input tuple attribute value.
v All remaining unassigned output tuple attributes must have an explicit assignment specified in the output section of the operator configuration.

Windowing
The ODBCRun operator does not accept any window configurations.

Metrics
The ODBCRun operator has the following metric:
v droppedTuples: The number of input tuples that result in a statement failure.

Exceptions
The ODBCRun operator does not throw any exceptions. For a list of error conditions that are logged, see Operator runtime error conditions on page 20.

Example
The following sample SPL application creates a table at startup, inserts records read from a file, and updates them with new values.
use com.ibm.streams.db::*;

composite Main {
  type
    TableSchema  = rstring tablename;
    PersonSchema = int32 id, rstring name, float32 salary;

  graph
    ////////////////////////////////////////////////
    // Create a new table
    ////////////////////////////////////////////////
    stream<TableSchema> tablebeacon = Beacon() {
      param
        iterations : 1u;
    }

    stream<TableSchema> createtable = ODBCRun(tablebeacon) {
      param
        connection         : "DBPerson";
        access             : "PersonCreate";
        connectionDocument : "./etc/connections.xml";
    }

    ////////////////////////////////////////////////
    // Read from file #1
    ////////////////////////////////////////////////
    stream<PersonSchema> persondata = FileSource() {
      param
        file      : "testdata.csv";
        format    : csv;
        initDelay : 5.0;
    }

    ////////////////////////////////////////////////
    // Insert records into the table created earlier
    ////////////////////////////////////////////////
    () as tablePopulate = ODBCAppend(persondata) {
      param
        connection         : "DBPerson";
        access             : "PersonSink";
        connectionDocument : "./etc/connections.xml";
    }

    ////////////////////////////////////////////////
    // Read updated records from file #2
    ////////////////////////////////////////////////
    stream<PersonSchema> persondata_updated = FileSource() {
      param
        file      : "testdata_updated.csv";
        format    : csv;
        initDelay : 5.0;
    }

    ////////////////////////////////////////////////
    // Update table with new values
    ////////////////////////////////////////////////
    () as tableUpdate = ODBCRun(persondata_updated) {
      param
        connection         : "DBPerson";
        access             : "PersonUpdate";
        connectionDocument : "./etc/connections.xml";
        salary             : salary;
        id                 : id;
    }
}

---connections.xml snippet---
<access_specification name="PersonCreate">
  <statement statement="CREATE TABLE ? (ID INTEGER NOT NULL, NAME CHAR(20), SALARY FLOAT)" />
  <parameters>
    <parameter name="tablename" type="rstring" />
  </parameters>
</access_specification>
<access_specification name="PersonUpdate">
  <statement statement="UPDATE person SET salary = ? WHERE id = ?" />
  <parameters>
    <parameter name="salary" type="float32" />
    <parameter name="id" type="int32" />
  </parameters>
</access_specification>
---end connections.xml snippet---

ODBCSource
Namespace
com.ibm.streams.db


Description
The ODBCSource operator generates a stream from the result set of an SQL SELECT statement. The SELECT statement is specified as the value of the query attribute of the query element of the access specification named by the access parameter (see Query element on page 25). Any valid SELECT statement for the database specified in the connection specification named by the connection parameter can be used.

For each row in the result set, the operator produces a tuple by automatically assigning the values of the columns of the result set to the output stream attributes with the same name and data type. When a column of the result set contains null data, the corresponding output stream attribute is set to 0 for numeric data types and to the empty string for the rstring data type. The columns of the result set of the SELECT statement, as specified by the attribute elements of the external_schema element of the access specification (see External_schema element on page 30), must be a superset of the attributes of the output stream. The values of the columns of the SELECT statement result set are assigned to output stream attributes by name.

If you want to stream the result set of your query into your application more than once, specify the number of times as the value of the replays attribute of the query element in the access specification. The operator executes the query that number of times. If the value of the replays attribute is 0, the ODBCSource operator executes the query repeatedly until the application is canceled. If the value of the replays attribute is not 0, a final punctuation is generated when all queries for the operator invocation have completed.

Input Ports
The ODBCSource operator has no input ports.

Output Ports
The ODBCSource operator has one required output port and one optional output port. The required output port submits a tuple for each row in the result set of the SELECT statement.
The output port is mutating and its punctuation mode is Generating. A punctuation is generated after all of the tuples have been submitted as the result of a query. This allows downstream operators to distinguish between tuples from different query executions.

The optional output port submits a tuple when an error occurs on the SQL SELECT statement being executed. For detailed information about the error output port, see Operator error output port on page 6.

Parameters
In addition to the set of common Database Toolkit operator parameters (see Operator common parameters on page 5), the ODBCSource operator has the following additional parameters:

initDelay
The initDelay parameter specifies an initial processing delay, in seconds, before the operator begins emitting tuples. It is equivalent to the initDelay parameter of the FileSource or TCPSource operator in the SPL Standard Toolkit. This parameter is optional. If present, it must have exactly one value of type float64.

sleepTime
This parameter of type float64 specifies the minimal time in


seconds that the operator waits before it can execute a query again. It is equivalent to the sleepTime parameter of the DirectoryScan operator in the SPL Standard Toolkit. If this parameter is not specified, the operator does not wait between query executions. If the time difference between the last executed query and the current time is less than sleepTime seconds, the operator sleeps until sleepTime seconds have passed since the last executed query. If more than sleepTime seconds have already passed, the query is executed immediately.

Additionally, the operator supports query-specific parameters that permit the parameterization of the SQL SELECT statement that is executed. In the access specification named by the access parameter, a parameter element is associated with each ODBC parameter marker in the SELECT statement (see Parameter element on page 30). The operator declaration must specify a parameter with the same name as each of the parameter elements in its access specification. The value(s) of a query-specific parameter must have the data type specified by its parameter element, as well as the specified number of values. For a usage example, see the Parameters section of the ODBCEnrich operator.

Windowing
The ODBCSource operator does not accept any windowing configurations.

Assignments
The ODBCSource operator does not allow assignments to output attributes.

Metrics
The ODBCSource operator does not provide any metrics.

Exceptions
The ODBCSource operator does not throw any exceptions. For a list of error conditions that are logged, see Operator runtime error conditions on page 20.

Example
stream<int32 id, rstring fname, rstring lname> MyPersonNamesStream = ODBCSource() {
  param
    connection         : "PersonDB";
    access             : "InfPersonIFL";
    connectionDocument : "connections.xml";
    initDelay          : 3.0;
    sleepTime          : 6.0;
}
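The access specification referenced by this invocation might look like the following abridged sketch. The table and column names are hypothetical (not taken from the product documentation); the replays attribute shown here controls how many times the query result set is streamed, as described above:

```xml
<access_specification name="InfPersonIFL">
  <!-- Stream the result set once; replays="0" would repeat until the application is canceled -->
  <query query="SELECT id, fname, lname FROM person" replays="1" />
  <!-- external_schema element omitted; see External_schema element on page 30 -->
</access_specification>
```

The columns listed by the external_schema element must be a superset of the output stream attributes, and matching is done by name and data type.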

SolidDBEnrich
Namespace
com.ibm.streams.db

Description
The SolidDBEnrich operator is similar to the ODBCEnrich operator, but its implementation is based on the proprietary solidDB SA API, which achieves very high performance by bypassing the full SQL query engine. In particular, the query is executed against a solidDB table, and the search conditions on the query are much more restricted than those offered by the WHERE clause of an SQL SELECT statement.

For each incoming tuple, the operator executes a query against the table specified in the tablename attribute of the tablequery element of the access

specification named by the access parameter (see Tablequery element on page 27). The search conditions for the query are specified by the parameter_condition and static_condition elements of the tablequery element. An output tuple is produced for each row in the result set by automatically assigning the values of the columns of the result set to the output stream attributes with the same name and data type. These automatic assignments do not override an explicit assignment to an output attribute or an auto-assignment of an input stream attribute with the same name.

The names and data types of the columns of the solidDB table are specified by the attribute elements of the external_schema element of the access specification (see External_schema element on page 30). However, no external schema attribute can have the same name as an input stream attribute, because there is no syntactic means to indicate whether such a name refers to the input stream attribute or the external schema attribute. Additionally, the operator checks that every output stream attribute either has an explicit assignment, has the same name as an input stream attribute, or has the same name as an external schema attribute.

Input Ports
The SolidDBEnrich operator has one required input port. The input stream tuples must contain the attribute(s) corresponding to the columns of the table being queried. The input port is non-mutating and its punctuation mode is Oblivious.

Output Ports
The SolidDBEnrich operator has one required output port. The output port submits a tuple for each row in the result set of the query executed for an input tuple. The resulting tuple contains the input tuple attributes, plus additional desired attribute(s) corresponding to columns in the result set. The output port is mutating and its punctuation mode is Preserving.
Parameters
In addition to the set of common Database Toolkit operator parameters (see Operator common parameters on page 5), the SolidDBEnrich operator supports tablequery-specific parameters that permit the values of search constraints on the solidDB query to be specified in the operator declaration rather than in the access specification. For example, the values of search constraints can be taken from input stream attributes.

In the access specification named by the access parameter, a parameter element is associated with each parameter_condition element of the tablequery element. The operator declaration must specify a parameter with the same name as each of the parameter elements in its access specification. The value(s) of a tablequery-specific parameter must have the data type specified by its parameter element, as well as the specified number of values.

In the example below, the access specification bargainIndexRatings has these parameter_condition and parameter elements:
stream<rstring ticker, rstring ttype, float64 price, float64 volume,
       float64 ratingInternal, float64 ratingExternal,
       float64 epsExternal> EnrichedTrades = SolidDBEnrich(TradeQuote) {
  param
    connection : "FinancialDB";
    access     : "bargainIndexRatings";
    ticker     : ticker;
    ttype      : ttype;
}

<tablequery tablename="ratings">
  <parameter_condition column="symbol" condition="equal" />
  <parameter_condition column="type" condition="equal" />
</tablequery>
<parameters>
  <parameter name="ticker" type="rstring" />
  <parameter name="ttype" type="rstring" />
</parameters>

In the example operator declaration above, there are two tablequery-specific parameters, ticker and ttype, whose values are the input stream attributes of the same names. The data type of the values of these parameters in the operator declaration, rstring, matches those of the parameter elements. For each incoming tuple, the operator executes the query, providing the current values of the input stream attributes ticker and ttype to the query search condition.

Windowing
The SolidDBEnrich operator does not accept any windowing configurations.

Assignments
The SolidDBEnrich operator does not allow assignments to output attributes.

Metrics
The SolidDBEnrich operator does not provide any metrics.

Exceptions
The SolidDBEnrich operator does not throw any exceptions. For other errors, messages are written to the log. For a list of error conditions that are logged, see Operator runtime error conditions on page 20.

For an example of the SolidDBEnrich operator, see the Parameters section.

DB2SplitDB
Namespace
com.ibm.streams.db.db2

Description
The DB2SplitDB operator uses DB2 database table key information from an input tuple to determine its corresponding partition number in the DB2 database table. The input tuples must contain the attributes that correspond to the key fields in the database table. The operator has n output streams, where n is equal to the number of partitions in the database. The operator submits the input tuple to the output stream that corresponds to the partition number.

The access specification named in the operator declaration must contain the tablename attribute within the table element. The access specification must also contain an external_schema element with attribute elements for each column in the table specified by the tablename attribute within the table element.

Input Ports
The DB2SplitDB operator has one required input port. The input port tuples must contain the attribute(s) corresponding to the key field(s) for the database table. The input port is non-mutating and its punctuation mode is Oblivious.

Output Ports
The DB2SplitDB operator has one output port open set. The required

number of output streams is the same as the number of partitions in the database. The output port is mutating and its punctuation mode is Preserving.

Parameters
The DB2SplitDB operator does not have any additional parameters beyond the set of common Database Toolkit operator parameters (see Operator common parameters on page 5).

Windowing
The DB2SplitDB operator does not accept any window configurations.

Assignments
The DB2SplitDB operator does not allow assignments to output attributes.

Metrics
The DB2SplitDB operator provides the following metric:
v droppedTuples: The number of input tuples that are dropped (not processed) because of a failure in retrieving the partition number.

Exceptions
The DB2SplitDB operator throws an exception and terminates in the following cases:
v The operator is unable to determine the number of partitions.
v The number of output streams does not match the number of partitions.
v The number of attributes in the external_schema element of the access_specification element that have the key=true attribute does not match the number of hash keys configured for the database table.

For other errors, messages are written to the log. For a list of error conditions that are logged, see Operator runtime error conditions on page 20.

Example
(stream<PersonSchema> out0;
 stream<PersonSchema> out1;
 stream<PersonSchema> out2;
 stream<PersonSchema> out3) = DB2SplitDB(persondata) {
  param
    connection         : "DBPerson";
    access             : "PersonSinkDefault";
    connectionDocument : "./etc/connections.xml";
}
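An abridged sketch of what the PersonSinkDefault access specification might contain follows. The table and column names are hypothetical, and the exact element syntax is described under Table element on page 25 and External_schema element on page 30; the key="true" attribute marks the attribute(s) that correspond to the hash keys used for partitioning:

```xml
<access_specification name="PersonSinkDefault">
  <table tablename="PERSON" />
  <external_schema>
    <!-- key="true" identifies the partitioning key column(s) -->
    <attribute name="id" type="int32" key="true" />
    <attribute name="name" type="rstring" />
    <attribute name="salary" type="float32" />
  </external_schema>
</access_specification>
```

The number of attributes marked key="true" must match the number of hash keys configured for the DB2 table, or the operator terminates with an exception.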

DB2PartitionedAppend
Namespace com.ibm.streams.db.db2 Description The DB2PartitionedAppend operator performs the same function as the ODBCAppend operator, but allows the user to specify a DB2 partition number, used on the SQL INSERT operation. This allows an application to take advantage of DB2 high performance partitioned databases by writing directly to partitions. The operator is intended to be used in conjunction with the DB2SplitDB operator, which determines the partition number using the key fields of an input tuple. Input Ports The DB2PartitionedAppend operator has one required input port. The input


stream tuples must contain the attribute(s) corresponding to the columns of the table being appended to. The input port is non-mutating and its punctuation mode is Oblivious (there is no reasonable mapping of a punctuation to a table row).

Output Ports
The DB2PartitionedAppend operator has one optional output port. This output port submits a tuple when an error occurs while trying to insert a tuple record into the table. For detailed information about the error output port, see Operator error output port on page 6.

Parameters
In addition to the set of common Database Toolkit operator parameters (see Operator common parameters on page 5), the DB2PartitionedAppend operator has the following additional parameters:

partitionNumber
This required int32 parameter specifies the DB2 database partition number to be used on the SQL INSERT operation.

commitOnPunctuation
This optional Boolean parameter specifies whether transactions are committed when the operator receives a punctuation. The default value is false. If the parameter is set to true, the operator performs the following actions when a window punctuation is received:
v If the current rowset is not empty, the rowset is inserted.
v If the number of rows inserted since the last transaction commit is greater than 0, the transaction is committed and the uncommitted row counter is reset to 0.

If no window punctuation is received, a commit still occurs whenever the number of rows inserted reaches the number specified in the transaction_batchsize attribute of the table element. For more information, see Table element on page 25 in the Connection Specification Document.

Windowing
The DB2PartitionedAppend operator does not accept any windowing configurations.

Assignments
The DB2PartitionedAppend operator does not allow assignments to output attributes.
Metrics
The DB2PartitionedAppend operator provides the following metric:
v droppedTuples: The number of input tuples that are dropped (not inserted into the table) because of an insert failure.

Exceptions
The DB2PartitionedAppend operator does not throw any exceptions. For a list of error conditions that are logged, see Operator runtime error conditions on page 20.

Example
The following example assumes a database with four partitions, with partition information determined by the example listed in the DB2SplitDB operator.


() as appendTable0 = DB2PartitionedAppend(out0) {
  param
    connection         : "DBPerson";
    access             : "PersonSinkDefault";
    connectionDocument : "./etc/connections.xml";
    partitionNumber    : 0;
}

() as appendTable1 = DB2PartitionedAppend(out1) {
  param
    connection         : "DBPerson";
    access             : "PersonSinkDefault";
    connectionDocument : "./etc/connections.xml";
    partitionNumber    : 1;
}

() as appendTable2 = DB2PartitionedAppend(out2) {
  param
    connection         : "DBPerson";
    access             : "PersonSinkDefault";
    connectionDocument : "./etc/connections.xml";
    partitionNumber    : 2;
}

() as appendTable3 = DB2PartitionedAppend(out3) {
  param
    connection         : "DBPerson";
    access             : "PersonSinkDefault";
    connectionDocument : "./etc/connections.xml";
    partitionNumber    : 3;
}
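If transactions should also be committed at window punctuation boundaries, each invocation can additionally set the optional commitOnPunctuation parameter described above. This sketch reuses the names from the example and is illustrative rather than a definitive configuration:

```spl
// Same as appendTable0 above, but also commit pending rows
// whenever a window punctuation arrives on the input port.
() as appendTable0 = DB2PartitionedAppend(out0) {
  param
    connection          : "DBPerson";
    access              : "PersonSinkDefault";
    connectionDocument  : "./etc/connections.xml";
    partitionNumber     : 0;
    commitOnPunctuation : true;
}
```

Without this parameter, commits happen only when the number of inserted rows reaches the transaction_batchsize configured in the table element.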

Operator runtime error conditions


The following conditions will result in an error message being generated in the processing element log during the operator runtime. For information about tracing and logging, see the IBM Streams Processing Language Streams Debugger Reference.

ODBC and DB2 operators runtime error conditions


The ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2SplitDB, and DB2PartitionedAppend operators automatically respond to error conditions.
Table 4. Runtime error conditions for ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2SplitDB, and DB2PartitionedAppend operators

For each of the following error conditions, the operator retries the operation after a wait period (starting with 1 second); the wait period increases by a power of two after every attempt:
v Unable to allocate ODBC environment handle (all six operators)
v Unable to allocate ODBC connection handle (all six operators)
v Unable to connect to database (all six operators)
v Unable to allocate SQL statement (all six operators)
v Unable to set isolation level (ODBCEnrich, ODBCRun, ODBCSource)
v Unable to prepare SQL statement (ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2PartitionedAppend)
v Unable to bind SQL parameters (ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2PartitionedAppend)
v Unable to set AutoCommit (ODBCAppend, ODBCRun, DB2PartitionedAppend)

The remaining error conditions and system actions are as follows:
v Unable to run SQL INSERT statement (ODBCAppend, DB2PartitionedAppend): If the optional output error stream is specified, a tuple is generated on this stream.
v Unable to run SQL SELECT statement (ODBCEnrich, ODBCSource): The operator retries the operation after a wait period (starting with 1 second); the wait period increases by a power of two after every attempt. If the optional output error stream is specified, a tuple containing the SQL error information is generated on this stream.
v Unable to run SQL statement (ODBCRun): If the optional error stream is specified, a tuple is generated on this stream.
v Unable to determine number of statement result columns (ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2PartitionedAppend): If the optional error stream is specified, a tuple is generated on this stream.
v Unable to fetch SQL SELECT results (ODBCEnrich, ODBCRun, ODBCSource): Not applicable.
v Unable to close SQL SELECT results cursor (ODBCEnrich, ODBCRun, ODBCSource): Not applicable.
v Unable to determine partition information for a tuple (DB2SplitDB): Not applicable.

SolidDBEnrich operator runtime error conditions


The SolidDBEnrich operator automatically responds to error conditions.
Table 5. Runtime error conditions for the SolidDBEnrich operator

For each of the following error conditions, the operator retries the operation after a wait period (starting with 1 second); the wait period increases by a power of two after every attempt:
v Unable to connect to solidDB database
v Unable to bind solidDB table column
v Unable to open solidDB table cursor

The remaining error conditions and system actions are as follows:
v Unable to set solidDB search constraints: Not applicable.
v Unable to execute solidDB query: The operator retries the operation after a wait period (starting with 1 second); the wait period increases by a power of two after every attempt. If the optional output error stream is specified, a tuple containing the solidDB error information is generated on this stream.
v Unable to advance solidDB table cursor: The operator does not attempt to process any more results for the current input tuple.
v Unable to reset solidDB table cursor: Not applicable.

Connection Specifications Document


A connection specifications document is an XML document that describes how operators in the Database Toolkit connect to and access specific external data services. Each document contains a collection of connection specification elements and access specification elements. These connection and access specifications can be for a single operator, for all the operators in an application, or organized by adapter type or any other criterion you choose. The only restriction is that the connection specification and the access specification for a given operator declaration must be in the same connection specifications document.

The relationship between connection specifications and access specifications is many-to-many. Operators can connect to the same external data service (one connection specification) and access several different data resources from that


service (many access specifications). On the other hand, operators can access equivalent data (one access specification) from several different external data services (many connection specifications), for example, accessing data from both a test system and a production system.

When the SPL compiler encounters a Database Toolkit operator declaration, it must read the connection specifications document named by that operator's connectionDocument parameter. It checks that the document conforms to the semantic rules of the XML schema defined for these documents. The SPL compiler uses the information given in the connection and access specifications to configure the operator. The compiler does not attempt to connect to the external data service or access its data to verify correct configuration at compile time. The operators have runtime checks to validate the configuration; if the configuration is incorrect, these checks might result in runtime failures, which are captured in the processing element logs. For information about tracing and logging, see the IBM Streams Processing Language Streams Debugger Reference.

A valid connection specifications document consists of a connections root element, which contains one connection_specifications element and one access_specifications element. These elements serve as containers for the connection specifications and access specifications, which are explained in detail in the following sections. Here is an abridged example of a complete connection specifications document, with all connection_specification and access_specification elements omitted.
<?xml version="1.0" encoding="UTF-8"?> <st:connections xmlns:st="http://www.ibm.com/xmlns/prod/streams/adapters" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <connection_specifications> . . . </connection_specifications> <access_specifications> . . . </access_specifications> </st:connections>

Connection_specification Element
A connection_specifications element is a sequence of zero or more connection_specification elements. Each connection_specification element has a single attribute, name, whose value can be specified in the connection parameter of a toolkit operator declaration to identify the named connection specification. A connection_specification element must contain exactly one child element, which for the Database Toolkit is either an ODBC element or a solidDB element.

ODBC element
The ODBC element specifies the information needed to establish a connection to a database using the ODBC SQLConnect() function. Here is an example connection specification containing an ODBC element.
<connection_specification name="PersonDB">
  <ODBC database="person"
        user="user1"
        password="somepw" />
</connection_specification>

The ODBC element has three attributes: database, user, and password.


Chapter 2. How to use the Database Toolkit

23

- database: The value of the database attribute is the data source name (DSN) of the target database. This attribute is required.
- user: The value of the user attribute is the user identification under which the connection to the database will be attempted. This attribute is optional; if omitted, the corresponding parameter to the ODBC SQLConnect() function call is NULL.
- password: The value of the password attribute is the authentication credentials (password) for the user ID. This attribute is optional; if omitted, the corresponding parameter to the ODBC SQLConnect() function call is NULL.

solidDB Element
The solidDB element specifies the information needed to establish a connection to a database using the solidDB SA API SaConnect() function. Here is an example connection specification containing a solidDB element.
<connection_specification name="ratingsDB">
  <solidDB protocol="tcp"
           hostname="server1"
           port="33333"
           user="user2"
           password="anotherpw" />
</connection_specification>

The solidDB element has five attributes:
- protocol: The value of the protocol attribute is a communications protocol supported by solidDB for establishing a connection to a database. It is part of the data source connect string for that database. This attribute is required.
- hostname: The value of the hostname attribute is the host computer name of the database to which a connection will be established. It is part of the data source connect string for that database. This attribute is optional.
- port: The value of the port attribute is the port number on which the host computer of the database is listening for connections. It is part of the data source connect string for that database. This attribute is optional.
- user: The value of the user attribute is the user identification under which the connection to the database will be attempted. This attribute is required.
- password: The value of the password attribute is the authentication credentials (password) for the user ID. This attribute is required.

Access_specification element
An access_specifications element is a sequence of zero or more access_specification elements. Each access_specification element has a single attribute, name, whose value can be specified in the access parameter of a Database Toolkit operator declaration to identify the named access specification. An access_specification element has a choice of a query, table, statement, or tablequery element; an optional parameters element; one or more uses_connection elements; and exactly one external_schema element.


The SPL compiler checks that the type of access specification used in a Database Toolkit operator declaration is valid for that operator. Specifically, the access specification named in an ODBCAppend, DB2SplitDB, or DB2PartitionedAppend declaration must contain a table element; in an ODBCEnrich or ODBCSource declaration, a query element; in an ODBCRun declaration, a statement element; and in a SolidDBEnrich declaration, a tablequery element.

Query element
The query element specifies information used by an ODBCEnrich or ODBCSource operator that uses the access specification to query a database and produce a result set. Here is an example of an abridged access specification containing a query element.
<access_specification name="PersonRemainder">
  <query query="SELECT id, fname, lname FROM personsrc"
         isolation_level="READ_UNCOMMITTED"
         replays="0" />
</access_specification>

The query element has three attributes:
- query: The value of the query attribute is any valid SQL SELECT statement for the database specified in the connection specification of the associated operator. The columns of the result set of the SELECT statement must correspond to the attribute elements of the external_schema element of the access specification. The SELECT statement can contain ODBC parameter markers; if so, the access specification must have a parameter element corresponding to each ODBC parameter marker (for an example, see ODBCEnrich on page 8). This attribute is required.
- isolation_level: The value of the isolation_level attribute specifies the isolation level at which the query in the database will be executed. The values for this attribute are the ODBC isolation levels. This attribute is optional; if omitted, the query is executed at level SQL_TXN_READ_UNCOMMITTED.
- replays: The value of the replays attribute specifies the number of times an ODBCSource operator will execute the query. This attribute is ignored by an ODBCEnrich operator. If the value is 0, the ODBCSource operator will execute the query repeatedly until the application is canceled. This attribute is optional; if omitted, the query will be executed once.
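To make the pairing between parameter markers and parameter elements concrete, here is a sketch of a parameterized query access specification; the name, connection, and schema are hypothetical and not taken from the product samples:

```xml
<access_specification name="PersonById">
  <!-- hypothetical example: one "?" marker, so exactly one parameter element -->
  <query query="SELECT id, fname, lname FROM personsrc WHERE id = ?"
         isolation_level="READ_COMMITTED" />
  <parameters>
    <parameter name="id" type="int32" />
  </parameters>
  <uses_connection connection="PersonDB" />
  <external_schema>
    <attribute name="id" type="int32" />
    <attribute name="fname" type="rstring" length="15" />
    <attribute name="lname" type="rstring" length="20" />
  </external_schema>
</access_specification>
```

An ODBCEnrich or ODBCSource declaration that names this access specification would then supply a value for the id parameter.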

Table element
The table element specifies information used by the ODBCAppend, DB2SplitDB, or DB2PartitionedAppend operator associated with the access specification to efficiently insert rows in a database table. One attribute, tablename, directly contributes to the SQL INSERT statement used by the operator. The other attributes control how many rows at a time are sent to the database server and how many rows are committed per transaction. Here is an example of an abridged access specification containing a table element.
<access_specification name="PersonSink">
  <table tablename="personSink"
         transaction_batchsize="10"
         rowset_size="4" />
</access_specification>

The table element has three attributes:
- tablename: The value of the tablename attribute identifies the target of the SQL INSERT statement, usually a table. This attribute is required.
  Note: When using a DB2 database with the Database Toolkit operators, the tablename attribute value must be qualified with the schema name (for example, schemaname.tablename) for the DB2SplitDB operator. The other operators do not require a schema-qualified value, but they do allow one. So, if you use the DB2SplitDB operator together with other operators in your application, you can qualify the tablename attribute value with the schema name and use the same connection configuration for all the operators.
- transaction_batchsize: The value of the transaction_batchsize attribute specifies the number of rows to commit per transaction. It must be greater than or equal to the value of the rowset_size attribute. This attribute is optional; if omitted, one thousand row inserts are committed per transaction.
- rowset_size: The value of the rowset_size attribute specifies the number of rows that are sent to the database server per network flow. It must be less than or equal to the value of the transaction_batchsize attribute. This attribute is optional; if omitted, one hundred rows are sent per network flow.
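As an illustration of the schema-qualification note, a table element that can be shared by DB2SplitDB and the other append operators might look like this; the schema name, table name, and batch sizes are hypothetical:

```xml
<table tablename="MYSCHEMA.personSink"
       transaction_batchsize="100"
       rowset_size="50" />
```

Here 100 rows are committed per transaction and 50 rows are sent per network flow, satisfying the rule that rowset_size must not exceed transaction_batchsize.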

Statement element
The statement element specifies information used by the ODBCRun operator that uses the access specification to run an SQL statement. The statement element supports four attributes:
- statement: The value of the statement attribute is any valid SQL statement. The statement can contain ODBC parameter markers; if so, the access specification must have a parameter element corresponding to each ODBC parameter marker. This attribute is required.
- isolation_level: The value of the isolation_level attribute specifies the isolation level at which the statement will be run. This attribute is optional; if omitted, the statement is run at level SQL_TXN_READ_UNCOMMITTED. This attribute applies only if the statement produces a result set.
- rowset_size: The value of the rowset_size attribute specifies the number of rows that are sent to the database server per network flow. The value must be less than or equal to the value of the transaction_batchsize attribute. This attribute is optional; if not specified, the default rowset size is 1. This attribute applies only if the statement is transactional in nature (for example, UPDATE).
- transaction_batchsize: The value of the transaction_batchsize attribute specifies the number of rows to commit per transaction. It must be greater than or equal to the value of the rowset_size attribute. This attribute is optional; if not specified, the default transaction size is 1 and transactions are auto-committed. This attribute applies only if the statement is transactional in nature (for example, UPDATE).

Consider these three examples of access specifications containing a statement element:
<access_specification name="PersonCreate">
  <uses_connection connection="DBPerson" />
  <statement statement="CREATE TABLE PERSON (ID INTEGER NOT NULL,
      FNAME CHAR(15), LNAME CHAR(20), AGE SMALLINT, GENDER CHAR(1),
      SCORE FLOAT, TOTAL DOUBLE PRECISION)" />
  <parameters>
    <parameter name="table" type="rstring"/>
  </parameters>
</access_specification>

<access_specification name="PersonQuery">
  <uses_connection connection="DBPerson" />
  <statement statement="SELECT * FROM PERSON"
             isolation_level="READ_COMMITTED" />
  <parameters>
  </parameters>
  <external_schema>
    <attribute name="id" type="int32" />
    <attribute name="fname" type="rstring" length="15" />
    <attribute name="lname" type="rstring" length="20" />
    <attribute name="age" type="int32" />
    <attribute name="gender" type="rstring" length="1" />
    <attribute name="score" type="float32" />
    <attribute name="total" type="float64" />
  </external_schema>
</access_specification>

<access_specification name="PersonUpdate">
  <uses_connection connection="DBPerson" />
  <statement statement="UPDATE PERSON SET FNAME=?, LNAME=?, AGE=?,
      GENDER=?, SCORE=?, TOTAL=? WHERE ID=?" />
  <parameters>
    <parameter name="fname" type="rstring" length="15" />
    <parameter name="lname" type="rstring" length="20" />
    <parameter name="age" type="int32" />
    <parameter name="gender" type="rstring" length="1" />
    <parameter name="score" type="float32" />
    <parameter name="total" type="float64" />
    <parameter name="id" type="int32"/>
  </parameters>
  <external_schema>
    <attribute name="id" type="int32" />
    <attribute name="fname" type="rstring" length="15" />
    <attribute name="lname" type="rstring" length="20" />
    <attribute name="age" type="int32" />
    <attribute name="gender" type="rstring" length="1" />
    <attribute name="score" type="float32" />
    <attribute name="total" type="float64" />
  </external_schema>
</access_specification>

Tablequery element
The tablequery element specifies information used by the SolidDBEnrich operator associated with the access specification to query a solidDB table and produce a result set. Here is an example of an abridged access specification containing a tablequery element.


<access_specification name="bargainRatings">
  <tablequery tablename="ratings">
    <parameter_condition column="symbol" condition="equal" />
    <parameter_condition column="type" condition="equal" />
    <static_condition column="ratingInternal" condition="atLeast"
        value="1" type="float64" />
    <static_condition column="epsExternal" condition="atMost"
        value="6.3" type="float64" />
  </tablequery>
</access_specification>

Each tablequery element has a single attribute, tablename, whose value is the name of the database table against which the query is executed. The search constraints for the query are specified using the parameter_condition and static_condition elements, which are described in the next sections.

Static_condition element:

The static_condition element specifies a search constraint on the query declared by its parent tablequery element. A tablequery element can have zero or more static_condition elements. The constraint consists of the column of the database table to which it applies, the condition that must be met, and the value to which to compare the column's value.

The static_condition element has four attributes:
- column: The value of the column attribute is the name of a column of the table which is being queried. This column name must be specified as one of the attribute elements of the external_schema element of the same access specification. This attribute is required.
- condition: The value of the condition attribute specifies the type of constraint the column value must meet. Valid values are equal, atLeast, atMost, and like. This attribute is required.
- value: The value of the value attribute is compared to the specified column's value to determine if the search constraint is met. Its value is a literal of the SPL data type specified by the type attribute. This attribute is required.
- type: The value of the type attribute is the SPL data type of the value attribute. Valid values are int32, int64, float32, float64, and rstring. This attribute is required.

Parameter_condition element:

The parameter_condition element specifies a search constraint on the query declared by its parent tablequery element. A tablequery element can have zero or more parameter_condition elements. The constraint consists of the column of the database table to which it applies, the condition that must be met, and the value to which to compare the column's value.
This value is specified as a parameter in the SolidDBEnrich operator declarations


that use this access specification. For each parameter_condition element, the access specification must have a corresponding parameter element to establish the association between the parameter_condition element and a parameter in the SolidDBEnrich operator declaration.

The parameter_condition element has two attributes:
- column: The value of the column attribute is the name of a column in the table that is being queried. This column name must be specified as one of the attribute elements of the external_schema element of the same access specification. This attribute is required.
- condition: The value of the condition attribute specifies the type of constraint the column value must meet. Valid values are equal, atLeast, atMost, and like. This attribute is required.

Parameters element
The parameters element provides the linkage between a parametrized query, statement, or tablequery element in the same access specification and parameters in ODBCEnrich, ODBCRun, ODBCSource, or SolidDBEnrich operator declarations that use the access specification. This element declares the names for the parameters in the operator declarations and provides information about the values those parameters can have, for example, their datatypes. Here is an example of an abridged access specification containing a parameter element; it extends the example in Tablequery element on page 27.
<access_specification name="bargainIndexRatings">
  <tablequery tablename="ratings">
    <parameter_condition column="symbol" condition="equal" />
    <parameter_condition column="type" condition="equal" />
  </tablequery>
  <parameters>
    <parameter name="ticker" type="rstring" />
    <parameter name="ttype" type="rstring" />
  </parameters>
</access_specification>

The parameters element has no attributes, only parameter elements. The number and order of the parameter elements depend on the contents of the query, statement, or tablequery element in the same access specification.
- If the SQL SELECT statement in a query element, or the user-defined SQL statement in a statement element, contains ODBC parameter markers ("?"), the access specification must contain a parameter element for each ODBC parameter marker. The parameter elements and ODBC parameter markers are associated in order of occurrence; that is, the first ODBC parameter marker in the statement is associated with the first parameter element, and so on.
- If an access specification contains a tablequery element that has parameter_condition elements, it must also contain a parameter element for each parameter_condition element. A parameter element is associated with the parameter_condition element at the same ordinal position within its parent tablequery element as the parameter element is within its parent parameters element.


Parameter element:

The parameter element declares the name for an access specification dependent parameter of a Database Toolkit operator declaration. It also provides information about the values that the parameter can have. The parameter element has five attributes:
- name: The value of the name attribute specifies the name by which this parameter will be identified in a Database Toolkit operator declaration. This attribute is required.
- type: The value of the type attribute specifies what SPL data type the value(s) of the parameter in a Database Toolkit operator declaration must have. This attribute is required.
- default: The value of the default attribute specifies the value that this parameter will have if none is specified in an operator declaration. The attribute's value is a literal of the SPL data type specified by the type attribute. This attribute is optional.
- length: The value of the length attribute is the maximum length of a parameter value whose SPL data type is rstring. This attribute is ignored for other data types. This attribute is optional.
- cardinality: The value of the cardinality attribute is the number of values for this parameter that must be specified in an operator declaration. Valid values for this attribute are -1, 0, or any positive integer. The special value -1 indicates that this parameter must have one or more values specified in an operator declaration. This attribute is optional; if omitted, its value is 1.
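A sketch combining several of these attributes follows; the parameter names and values are illustrative only and not taken from the product samples:

```xml
<parameters>
  <!-- required parameter that accepts one or more rstring values (cardinality -1) -->
  <parameter name="symbols" type="rstring" length="8" cardinality="-1" />
  <!-- optional parameter that falls back to 0.0 if the operator declaration omits it -->
  <parameter name="minScore" type="float64" default="0.0" />
</parameters>
```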

Uses_connection element
A uses_connection element identifies a connection specification that can be used with the access specification. Here is an example of an abridged access specification containing uses_connection elements.
<access_specification name="PersonSink">
  . . .
  <uses_connection connection="testSystem" />
  <uses_connection connection="productionSystem" />
  . . .
</access_specification>

The uses_connection element has a single attribute, connection, whose value is the name of a connection_specification element in the same connection specifications document. An access_specification element must have at least one uses_connection element.

External_schema element
The external_schema element specifies the schema of the data received from or sent to an external data service.


Here is an example of an abridged access specification containing an external_schema element; it extends the example in Query element on page 25.
<access_specification name="PersonRemainder">
  <query query="SELECT id, fname, lname FROM personsrc"
         isolation_level="READ_UNCOMMITTED"
         replays="0" />
  . . .
  <external_schema>
    <attribute name="id" type="int32" />
    <attribute name="fname" type="rstring" length="15" />
    <attribute name="lname" type="rstring" length="20" />
  </external_schema>
</access_specification>

The external_schema element has no attributes, only attribute elements. The number and order of the attribute elements depend on the contents of the query, table, or tablequery element in the same access specification.
- If an access specification contains a query element, its external_schema element must have an attribute element for each column in the result set of the SQL SELECT statement in the query element. The attribute elements must be in the same order as the columns of the result set.
- If an access specification contains a table element, its external_schema element must have an attribute element for each column in the database table named by the tablename attribute of the table element. The attribute elements must be in the same order as the columns of that table.
- If an access specification contains a tablequery element, its external_schema element must have an attribute element for each column named in a parameter_condition or static_condition element of the tablequery element, as well as each column that will be assigned to an output stream attribute in a SolidDBEnrich operator that uses this access specification. Because the solidDB SA API accesses columns based on their names, the order of the attribute elements is not significant.

Important: It is a common mistake for the information in external_schema elements to be inconsistent with that of the external data service. The SPL compiler cannot flag these inconsistencies at compile time, but the Database Toolkit operators have internal checks that might result in a runtime failure, captured in the processing element logs. For information about tracing and logging, see the IBM Streams Processing Language Streams Debugger Reference.

Attribute element:

The attribute element specifies information about a database table column from an external data service. The attribute element has four attributes:
- name: The value of the name attribute specifies the identifier by which a table column is known in an external data service. The Database Toolkit operators use these identifiers exactly as specified to access data in the external service; for example, the operators do not change the case of the identifiers. This attribute is required.
- type: The value of the type attribute specifies the SPL data type that a table column will map to as an SPL attribute. Note that the value is not a type from the native type system of the external data service. This attribute is required.

Note: The SPL data types supported by a specific database are limited to the data types that are actually supported by that database.

The following table lists the valid type values (and their corresponding ODBC types) for the access specifications used by the ODBCAppend, ODBCEnrich, ODBCRun, ODBCSource, DB2SplitDB, and DB2PartitionedAppend operators.
Table 6. SPL to ODBC type mapping

SPL type value   ODBC type
rstring          SQLCHAR
int8             SQLCHAR
int16            SQLSMALLINT
int32            SQLINTEGER
int64            SQLBIGINT
float32          SQLREAL
float64          SQLDOUBLE
boolean          SQLCHAR

The following table lists the valid type values (and their corresponding solidDB types) for the access specifications used by the SolidDBEnrich operator.
Table 7. SPL to solidDB type mapping

SPL type value   solidDB SA type
float32          Float
int32            Int
int64            Long
float64          Double
rstring          Str

- length: The value of the length attribute specifies the maximum length of a table column whose SPL type is rstring. The length value is representative of the size of the column in the database. If the length value specified is smaller than the size of the database column, the data might be truncated. The length attribute is required for type rstring and is ignored for all other data types.
- key: The key attribute specifies whether the attribute is a key field in the table. This attribute is optional. If specified, the only valid value is true. This attribute is only used for the DB2SplitDB operator and is ignored for all other operators.
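For example, an external_schema intended for use with the DB2SplitDB operator might mark one column as a key field; the column names here are hypothetical:

```xml
<external_schema>
  <attribute name="id" type="int32" key="true" />
  <attribute name="fname" type="rstring" length="15" />
  <attribute name="lname" type="rstring" length="20" />
</external_schema>
```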


Chapter 3. Known issues and restrictions


The following table lists the problems that were encountered with the versions of databases and ODBC that are supported by the toolkit. Future versions of these databases and ODBC might resolve these problems.

Table 8. Database Toolkit known issues and restrictions

Issue: Data is truncated or lost using the ODBCAppend operator configured to do batch inserts.
DB/OS/Driver: DB - SQLServer; OS - all 64-bit versions of Streams-supported operating systems; Driver - UnixODBC driver, using the FreeTDS client driver.
Workaround/Resolution: The problem appears to be with the FreeTDS client driver. Tests were successful using an alternate driver (EasySoft).

Issue: ODBC libraries for Oracle do not support the SQL_C_SBIGINT ODBC type, which corresponds to the int64 SPL type.
DB/OS/Driver: DB - Oracle; OS - all operating systems; Driver - all drivers.
Workaround/Resolution: There is no workaround. Users of an Oracle database should not use the SPL int64 type in their applications.

Issue: Issue with the int32/SQLINTEGER type during batch inserts; SQLINTEGER is defined as 8 bytes instead of 4 bytes.
DB/OS/Driver: DB - Oracle; OS - all operating systems; Driver - all drivers.
Workaround/Resolution: The default code generation for an ODBCAppend operator configured to use Oracle and transaction_batchsize > 1 is to define an int32 schema type as the C type SQLLEN. If this causes your application to insert an incorrect number of bytes for your int32 field, define the environment variable STREAMS_ADAPTERS_ODBC_ORACLE_SQLINTEGER=1 and recompile the application. This will generate code for int32 schema types as a C type of SQLINTEGER instead.

Issue: Locking issues can occur when running multiple ODBCAppend operators writing to the same table when transaction_batchsize > 1.
DB/OS/Driver: DB - Informix; OS - all operating systems; Driver - all drivers.
Workaround/Resolution: When using Informix, if an application requires multiple ODBCAppend operator instances writing to the same table in parallel, set transaction_batchsize=1 for the table element being used by the ODBCAppend operator in the connections.xml file.

Issue: Several issues with using UnixODBC version 2.2.
DB/OS/Driver: DB - all databases; OS - all operating systems; Driver - UnixODBC 2.2.
Workaround/Resolution: It is recommended that users use UnixODBC 2.3 or greater with Database Toolkit applications.

Issue: Rowsets greater than one result in only the first row of the rowset being inserted when using MySQL ODBC Connector version 5.1.7 or earlier.
DB/OS/Driver: DB - MySQL; OS - all operating systems; Driver - ODBC Connector version 5.1.7 or earlier.
Workaround/Resolution: Use the MySQL ODBC Connector driver version 5.1.8 or later when using the MySQL database in an application.

Issue: Oracle is not currently supported on Red Hat Enterprise Linux Version 6. The Oracle client is needed to interface with the Oracle database.
DB/OS/Driver: DB - Oracle; OS - RHEL 6; Driver - all drivers.
Workaround/Resolution: This is a current restriction for users using an Oracle database.

Issue: On IBM POWER7 systems running Red Hat Enterprise Linux Version 6, IBM solidDB, Netezza, and Oracle databases are not supported.
DB/OS/Driver: DB - Oracle, IBM solidDB, Netezza; OS - RHEL 6; Driver - all drivers.
Workaround/Resolution: This is a current restriction for users using IBM POWER7 systems.

Chapter 4. Connection setup and debug


While setup of your external data source and ODBC configuration is outside the scope of the Database Toolkit documentation, the Database Toolkit provides a program that can be used to help find setup and configuration issues. The source for this program, called odbchelper, is provided, along with the Makefile, which allows you to build the program using your own version of UnixODBC. Note: The odbchelper program is provided to help debug connection and configuration issues. It is not intended to be used in an application.

Building odbchelper
About this task
Before using the odbchelper program, you must create the program from source using the following procedure.

Procedure
1. Create a new directory. For example, you can create a directory in your home directory.
mkdir $HOME/odbchelper

2. Copy the odbchelper source and Makefile to the directory.


cp -R $STREAMS_INSTALL/toolkits/com.ibm.streams.db/etc/odbchelper/* $HOME/odbchelper

3. Go to the odbchelper directory.


cd $HOME/odbchelper

4. Set the environment variables STREAMS_ADAPTERS_ODBC_INCPATH and STREAMS_ADAPTERS_ODBC_LIBPATH to the locations where the UnixODBC include and library files are located, respectively.
export STREAMS_ADAPTERS_ODBC_INCPATH=$HOME/unixodbc/include export STREAMS_ADAPTERS_ODBC_LIBPATH=$HOME/unixodbc/lib

Note: If you use the operating-system-supplied UnixODBC package, you do not need to set these environment variables.
5. Run make to build the odbchelper program.
make

Using odbchelper
The odbchelper program has several action flags that can be used:
- help: Displays the options and parameters available.
- testconnection: Tests the connection to an external data source instance with a user ID and password.
- runsqlstmt: Runs an SQL statement, either passed in on the command invocation, or in a specified file.
- runsqlquery: Runs an SQL query, either passed in on the command invocation, or in a specified file. The results of the query are returned to STDOUT.
- loaddelimitedfile: Allows you to pass in a comma-delimited file, used to create and populate a database table.

To use one of the action options, invoke odbchelper followed by the action option (and any additional parameters required). For example,
$HOME/odbchelper/odbchelper help
$HOME/odbchelper/odbchelper runsqlstmt -i myinstance -u myuserid -p mypassword -stmt "DROP TABLE MYTABLE"

A common use of odbchelper is to test that the external data source information in the connections.xml file is correct for your external data source setup. The testconnection action flag allows you to do this. For example, if the connection_specification portion of the connections.xml file is:
<connection_specification name="mydatabaseconnection">
  <ODBC database="mydatabase"
        user="myuserid"
        password="mypassword" />
</connection_specification>

You run the following odbchelper invocation to test the connection:


$HOME/odbchelper/odbchelper testconnection -i mydatabase -u myuserid -p mypassword


Chapter 5. DB2 partition layout and debug


The Database Toolkit provides a program that can be used to determine the number of partitions in the database. The source for this program, called db2helper, is provided, along with the Makefile, which allows you to build the program using your own DB2 installation libraries. This program gives you the partition information you require to set up your application; for example, the number of DB2PartitionedAppend operators to include in the application, and whether partitions are logical or configured on different physical nodes.

Building db2helper
About this task
Before using the db2helper program, you must create the program from source using the following procedure.

Procedure
1. Create a new directory. For example, you can create a directory in your home directory.
mkdir $HOME/db2helper

2. Copy the db2helper source and Makefile to the directory.


cp -R $STREAMS_INSTALL/toolkits/com.ibm.streams.db/etc/db2helper/* $HOME/db2helper

3. Go to the db2helper directory.


cd $HOME/db2helper

4. Set the environment variables STREAMS_ADAPTERS_DB2_INCPATH and STREAMS_ADAPTERS_DB2_LIBPATH to the locations for the DB2 client include and library files, respectively.
export STREAMS_ADAPTERS_DB2_INCPATH=/opt/ibm/db2/V9.7/include export STREAMS_ADAPTERS_DB2_LIBPATH=/opt/ibm/db2/V9.7/lib64

5. Run make to build the db2helper program.


make

Using db2helper
The db2helper program has several action flags that can be used:
- help: Displays the options and parameters available.
- testconnection: Tests the connection to an external data source instance with a user ID and password.
- partitionconfig: Lists the partition configuration for the external DB2 data source instance.

To use one of the action options, invoke db2helper followed by the action option (and any additional parameters required). For example,
$HOME/db2helper/db2helper help
$HOME/db2helper/db2helper partitionconfig -i myinstance -u myuserid -p mypassword -table tablename [-key keyvalue]


38

IBM InfoSphere Streams Version 2.0.0.4: Database Toolkit

Chapter 6. Sample applications


The Database Toolkit contains a set of simple sample applications illustrating how to use the operators. In the $STREAMS_INSTALL/toolkits/com.ibm.streams.db/samples directory, there are four subdirectories, one named for each of the four operators. Each of these directories contains an SPL source file for the sample application, a Makefile, an info.xml file, and three subdirectories named data, etc, and .settings. The etc subdirectory contains a connections.xml document that is used by the application.

Before using the samples, make sure that Streams is installed and that the STREAMS_INSTALL environment variable is set to the Streams install directory. You should also set the appropriate toolkit environment variables for the database that you will be using to run the sample. See Chapter 2, How to use the Database Toolkit, on page 3 for instructions on setting the necessary compile-time environment variables for the database you are using.

Each sample contains two files, Setup.sql and Cleanup.sql. The Setup.sql file contains SQL statements for creating and populating the table used by the sample. The Cleanup.sql file contains SQL statements for removing the table. Refer to your database's documentation for details on how to run SQL statements for your particular database. Alternatively, you can use the odbchelper program to run the SQL commands in these files, using the odbchelper runsqlstmt action flag. For more information, see Using odbchelper on page 35.
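For DB2, one way to run these files is with the DB2 command-line processor. The sketch below is one possible approach, not the only supported one; the database name MYDB is a placeholder, and the function reports cleanly if the db2 CLP is not on your PATH.

```shell
# Sketch: run a sample's Setup.sql through the DB2 command-line processor.
# "MYDB" is a placeholder database name; substitute your own.
run_sql_file() {
    db_name="$1"
    sql_file="$2"
    if ! command -v db2 >/dev/null 2>&1; then
        echo "db2 CLP not found; run the statements in $sql_file another way"
        return 0
    fi
    # -t: statements end with ";", -v: echo each statement, -f: read from file
    db2 connect to "$db_name" &&
        db2 -tvf "$sql_file" &&
        db2 connect reset
}
```

For example, run_sql_file MYDB Setup.sql before building a sample, and run_sql_file MYDB Cleanup.sql when you are done.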

Working with the samples in the command-line environment


About this task
Create your own copy of the samples before compiling and running them, because you will need to modify some of the configuration files.

Procedure
1. Create a new directory. For example, you can create a directory in your home directory.
mkdir $HOME/dbsamples

2. Copy the samples to this directory.
cp -R $STREAMS_INSTALL/toolkits/com.ibm.streams.db/samples/* $HOME/dbsamples/

What to do next
Update the database configuration information needed to access the external data source. For more information about updating the database configuration, see Updating database configuration information.

Updating database configuration information


Before you begin
Create a copy of the sample applications.

About this task


After you create a copy of the sample applications, you need to update the connections.xml file with the database configuration information needed to access the external data source.

Procedure
1. Open the connections.xml file in an editor, such as emacs, and find the following section. For example, to compile and run the ODBCSource sample application, open the $HOME/dbsamples/ODBCSource/etc/connections.xml file.
<connection_specification name="DBPerson">
  <ODBC database="replace-with-database-name"
        user="replace-with-userid"
        password="replace-with-password" />
</connection_specification>

2. Update the database configuration information:
a. Replace replace-with-database-name with the name of the database you are connecting to.
b. Replace replace-with-userid and replace-with-password with the user ID and password you are using to connect to the database.
c. For the SolidDBEnrich sample, replace replace-with-host, replace-with-port, replace-with-userid, and replace-with-password. For more information about these values, see Connection_specification Element on page 23.
d. For the DB2ParallelWriter sample, replace replace-with-schema with the DB2 database schema name for your database.
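After the replacements, the specification looks like the following. The database name, user ID, and password shown here are placeholder values for illustration only; use your own.

```xml
<connection_specification name="DBPerson">
  <ODBC database="MYDB"
        user="myuserid"
        password="mypassword" />
</connection_specification>
```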

What to do next
After you have updated the connections.xml file, you can perform any of the following tasks:
v Run make in the application directory. By default, the sample is compiled as a distributed application.
v To compile the application as a stand-alone application, run make standalone.
v To remove all the generated files and return the sample to its original state, run make clean.

Working with the samples in Streams Studio


About this task
To import the sample application into Streams Studio, you must first add the Database Toolkit to the Toolkit Locations section. You need to add the toolkit location only once.

Procedure
1. Add the toolkit location:
a. From the Streams Explorer, right-click Toolkit Locations.
b. Select Add Toolkit Location.
c. Enter the directory or click Directory to select the install location of the Database Toolkit, and click OK. The Database Toolkit is located in the $STREAMS_INSTALL/toolkits/com.ibm.streams.db directory.
2. Import the sample application:


a. Click File > Import.
b. Expand the InfoSphere Streams folder, and select SPL Project.
c. Enter the directory or click Browse to select the directory of the sample you wish to import, and click Finish.

What to do next
Update the database configuration information needed to access the external data source. For more information about updating the database configuration, see Updating database configuration information.

Updating database configuration information


Before you begin
Import the sample application into Streams Studio.

About this task


After you import the sample application project, you need to update the connections.xml file with the database configuration information needed to access the external data source.

Procedure
1. In the Project Explorer, under the project for the sample you imported, expand Resources and etc.
2. Open the connections.xml file in the Eclipse editor, and find the following section.
<connection_specification name="DBPerson">
  <ODBC database="replace-with-database-name"
        user="replace-with-userid"
        password="replace-with-password" />
</connection_specification>

3. Update the database configuration information:
a. Replace replace-with-database-name with the name of the database you are connecting to.
b. Replace replace-with-userid and replace-with-password with the user ID and password you are using to connect to the database.
c. For the SolidDBEnrich sample, replace replace-with-host, replace-with-port, replace-with-userid, and replace-with-password. For more information about these values, see Connection_specification Element on page 23.
d. For the DB2ParallelWriter sample, replace replace-with-schema with the DB2 database schema name for your database.

What to do next
After you have updated the connections.xml file, you can use Streams Studio to build and run the application. For more information about using Streams Studio, see the IBM InfoSphere Streams: Studio Installation and User's Guide.


Notices
This information was developed for products and services offered in the U.S.A. Information about non-IBM products is based on information available at the time of first publication of this document and is subject to change.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
1623-14, Shimotsuruma, Yamato-shi
Kanagawa 242-8502 Japan

The following paragraph does not apply to the United Kingdom or any other country/region where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.


Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information that has been exchanged, should contact:

IBM Canada Limited
Office of the Lab Director
8200 Warden Avenue
Markham, Ontario L6G 1C7
CANADA

Such information may be available, subject to appropriate terms and conditions, including, in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement, or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems, and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements, or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information may contain examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious, and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs, in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided AS IS, without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.

Each copy or any portion of these sample programs or any derivative work must include a copyright notice as follows: (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. Copyright IBM Corp. _enter the year or years_. All rights reserved.

Trademarks
IBM, the IBM logo, ibm.com and InfoSphere are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. A current list of IBM trademarks is available on the Web at Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.

The following terms are trademarks or registered trademarks of other companies:
v Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
v Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
v UNIX is a registered trademark of The Open Group in the United States and other countries.
v Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Other product and service names might be trademarks of IBM or other companies.

