
Course Guide

Programming for IBM InfoSphere Streams


v4 with SPL
Demonstrations, Exercises and Exercise
Solutions
Course code DW724 ERC 1.0

IBM Training

Preface

September, 2015
NOTICES
This information was developed for products and services offered in the USA.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for
information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to
state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any
non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
United States of America
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in
certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these
changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the
program(s) described in this publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of
those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information
concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available
sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the
examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and
addresses used by an actual business enterprise is entirely coincidental.
TRADEMARKS
IBM, the IBM logo, InfoSphere and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in
many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is
available on the web at Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
Adobe and the Adobe logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Copyright International Business Machines Corporation 2015.
This document may not be reproduced in whole or in part without the prior written permission of IBM.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Copyright IBM Corp. 2009, 2015


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

P-2


Streams Processing
Language Development

IBM InfoSphere Streams V4.0



Unit 3 Streams Processing Language Development


Demonstration 1
Streams Development Environment

In this demonstration, you will:
- Test your development environment
- Create a new Streams project
- Compile and run a Streams program using:
  - The Eclipse development environment
  - The command line
- View the results


Demonstration 1:
Streams Development Environment
Purpose:
This demonstration allows the student to work with the Eclipse development
environment on an InfoSphere Streams Processing Language program. The
student will also invoke the Streams Processing Language compiler from the
command line.
Estimated Time: 45 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password

Task 1. Log on to your system.


Your lab image operating system is RHEL 6.5. Once your Linux image has booted,
log in with a user ID of student and a password of ibm2blue.
You must make sure that you have a valid hostname for your system and that the
hostname resolves to the correct IP address. This is a critical step.
1. Open a Linux terminal window. Right-click on the desktop background and
choose Open in Terminal.
2. Execute the following command in the terminal:
xhost +
3. Switch your user to root (the password is password):
su
4. Execute ifconfig.
5. Make a note of the IP address next to inet addr:.
6. Edit the /etc/hosts file:
gedit /etc/hosts &
7. Change the IP address of the following line to the value from ifconfig.
(192.168.1.1 is used below as just an example of an IP address; you will use
your actual IP.)
192.168.1.1   streamshost.localdomain streamshost
127.0.0.1     localhost.localdomain localhost
8. Save the file. Close gedit.
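Steps 5 through 7 can also be scripted. The sketch below is a hypothetical illustration: it operates on a throwaway copy (/tmp/hosts.demo) rather than the real /etc/hosts, and 10.0.2.15 is a placeholder address. In the lab you would run the equivalent sed against /etc/hosts as root, substituting the address that ifconfig reported.

```shell
# Hedged sketch: rewrite the leading IP on the streamshost line.
# /tmp/hosts.demo stands in for /etc/hosts; 10.0.2.15 is a placeholder IP.
cat > /tmp/hosts.demo <<'EOF'
127.0.0.1   localhost.localdomain localhost
192.168.1.1 streamshost.localdomain streamshost
EOF
NEWIP=10.0.2.15   # substitute the inet addr value from ifconfig
sed -i "/streamshost/s/^[0-9.]*/${NEWIP}/" /tmp/hosts.demo
grep streamshost /tmp/hosts.demo   # verify the new address took effect
```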

9. Type exit to switch back to student.

Important: Each time you restart your image after it has been suspended, check the
IP address to make sure that it did not change. If it did change, update /etc/hosts
using the above instructions to reflect the new IP address.

Task 2. Start up the Streams Domain and Instance.


EVERY time your Streams image is rebooted, you need to start up your Streams
domain and instance. There are multiple ways to do this:
- Start the domain and instance by using streamtool commands on the Linux
command line.
- Open the Domain Manager GUI application and start the domain. Then open
the Streams Web Console for that domain and start the instance.
- Create a script that contains the streamtool commands and either run it
manually or have Linux run it for you.

For this course, a script has already been created for you that runs multiple
streamtool commands when invoked. If you would like to look at the script, it is
located at /home/student/streams_reset.sh. The script does several things:
- In case your Streams domain and instance were not cleanly shut down when
Linux was last rebooted, streamtool commands force the domain and instance
to shut down.
- The script starts up your Streams domain and your Streams instance (in this
case the domain is named "StreamsDomain" and the instance is named
"StreamsInstance").

A launcher called "Streams - Start Domain" has been placed on the student
desktop. When this launcher is double-clicked, the above script is run. A Linux
console opens and shows various log messages as your domain and instance are
started. Once the script completes, the console closes and you can continue.
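The logic just described can be sketched with streamtool commands. This is a hypothetical reconstruction, not the actual contents of /home/student/streams_reset.sh; the exact flags the lab's script uses may differ, and writing the sketch to /tmp avoids touching the real script.

```shell
# Hypothetical sketch of a domain/instance reset script (not the lab's actual file).
cat > /tmp/streams_reset_demo.sh <<'EOF'
#!/bin/sh
# Force-stop whatever is left over from an unclean shutdown (errors ignored).
streamtool stopinstance -d StreamsDomain -i StreamsInstance --force || true
streamtool stopdomain   -d StreamsDomain --force                   || true
# Start the domain, then the instance that lives inside it.
streamtool startdomain   -d StreamsDomain
streamtool startinstance -d StreamsDomain -i StreamsInstance
EOF
chmod +x /tmp/streams_reset_demo.sh
```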
1. Click the "Streams - Start Domain" desktop launcher icon. Wait for the script
to finish running.
Important: Each time you restart your image you MUST start your domain and
instance. If you do not, you will not be able to connect to your domain and instance
from any of the client tools, such as Streams Studio and the Streams Web Console.


Task 3. Eclipse development environment.


1. Click on the Streams Studio icon on the desktop. When prompted for a
workspace, take the default by clicking the OK pushbutton. (The workspace
should be /home/student/StreamsStudio/workspace.)
2. When prompted for a Secure Storage password, enter ibm2blue and click
OK.

Note: If you receive a warning about the Streams domain refreshing and/or a toolkit
file missing, ignore them - Streams is looking for a specific toolkit directory which
you will create in a later lab.

Eclipse Perspectives
An Eclipse perspective is a named arrangement of views that you can tailor to
the task at hand.
3. When you started Eclipse, the last perspective used is displayed. Let's see how
we got there. In the menu bar select Window->Open Perspective->Other.
4. Select the InfoSphere Streams (default) perspective and click OK.
5. The Eclipse interface has a number of views, or frames, and you can resize any
of them. If you double-click on the title bar of a particular view (it also has
icons in the right-hand corner for minimizing and maximizing), the view is
either maximized or restored to its previous size.

Task 4. First contact with a Streams application.


1. In the Project Explorer view, expand the project called Introduction. Then
expand the Resources folder.
2. Right-click on Main.spl and select Open With->SPL Editor.
3. Look at the first operator. This TCPSource operator acts as a TCP server and
receives data from clients that connect to it. The format of the data is
described by the News stream schema. The TCPSource operator emits the
data on a stream called News.
4. Next there is a Filter operator. This operator reads the stream emitted by
the TCPSource operator and searches the summary attribute for the
character string Clinton. If the string exists, the tuple is emitted on the
output stream called NewsSearch.

Copyright IBM Corp. 2009, 2015


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

3-34

U n i t 3 S t r e a m s P r o c e s s i n g L a n g u a g e D e ve l o p m e n t

5. At the bottom of the code is the FileSink operator, which writes out a file,
result.txt. Since only the file name is specified, the file is written into the data
directory under the project directory. (This is the default data location in
our Streams Studio project. Note that in a production environment this would
need to be changed to a location that is available on whatever resources your
job is running on that require data directory access.)
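Pieced together from the descriptions above, the Introduction project's Main.spl plausibly looks something like the following sketch. The stream names (News, NewsSearch), the summary attribute, the Clinton search term, and result.txt come from the lab text; the remaining attributes, the port number (taken from the nc command used later), and the exact filter expression are assumptions.

```spl
composite Main {
    graph
        // TCP server: clients connect and push news records (schema assumed
        // beyond the summary attribute the lab mentions)
        stream<rstring ticker, rstring summary> News = TCPSource() {
            param
                role : server;
                port : 1234u;
        }

        // Keep only tuples whose summary contains "Clinton"
        stream<News> NewsSearch = Filter(News) {
            param
                filter : findFirst(summary, "Clinton") != -1;
        }

        // Relative path: written into the project's data directory by default
        () as Sink = FileSink(NewsSearch) {
            param
                file : "result.txt";
        }
}
```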

Task 5. Using the SPL graphical editor.


1. In the Project Explorer view, right-click again on Main.spl and this time select
Open With->SPL Graphical Editor. The three operators are shown as
graphical representations.
2. In the Outline view, the yellow area indicates the portion of the application that
is currently displayed in the graphical editor. Since our applications are
relatively simple examples, you can gain some additional screen real estate by
closing the Outline view.
3. Click on the News graphical operator. Then, at the bottom of the screen, click
on the Properties tab. (You might want to maximize the Properties view in
order to see things more clearly.)
4. Take a look at some of the properties.
   a) Click the General tab. You can rename the schema or change the
      operator.
   b) Click the Output Ports tab. You can modify the schema.
   c) Click the Param tab. You can add or remove parameters.

5. Restore the Properties view.
6. Click on the arrow between News and NewsSearch and look in the Properties
view. This is another place where you can modify the schema.
7. Click the Console tab to bring it into focus.

Task 6. Build your Distributed Application.


1. In the Project Explorer view, expand <default_namespace>. Then expand
Main. Two items are listed. The Distributed item allows you to build and launch
a distributed application. The Standalone item allows you to build and launch a
standalone application.
2. Right-click on Distributed [Active] and select Build.


3. Click on the Console tab to see the build process.

In the lower right corner of the Eclipse window, you can see a status indicator
while the build is in progress. (By the time you have read the previous
sentence, the indicator will probably have disappeared, but now you know to
look for it in the future.)

Task 7. Launch your Distributed Application.


Because you are running a distributed application, you must have a Streams
domain and instance that are started.
1. In the left-hand frame, click on the Streams Explorer tab. (It may just show the
word Streams.)
2. Expand Streams Domains and explore the subtree. You will see a
domain called StreamsDomain that has an instance within it, under
Instances->StreamsInstance.
Note: What would you do if your Streams domain were not already started? You
could follow one of the steps previously mentioned, such as running the provided
script or using the GUI tools to start up the domain and instance. This is generally
a task for a Streams administrator; however, you should understand how to do it
as well.
3. Look under Streams Domains->StreamsDomain->Resources->Resource
Tags and observe the tags that have been applied to the one resource that will
be running your applications. This resource has been tagged with all the built-in
Streams tags; it will run both the management services AND Streams
applications.
4. Expand Instances in the menu. Right-click on
default:StreamsInstance@StreamsDomain and notice the menu options.
You could stop or start an instance from here. If your instance is not already
running (it is in stopped status), right-click it and start it now by selecting Start
Instance. The instance should now show that it is running:
default:StreamsInstance@StreamsDomain [Running]. Once the instance has
started, right-click it and select the Show Instance Graph option.
5. Return to the Project Explorer.
6. Under the Introduction project, right-click on Distributed [Active]. Select
Launch.
7. In the Edit Configuration dialog, look at the instance that is being used. The
default instance, which in your case is StreamsInstance@StreamsDomain, is
being used. Click Apply and then Continue.


8. Once again, to see things more clearly, you might want to maximize the
Instance Graph view.
9. On the right side, under Color Schemes, you should see that Health is
checked. Click the triangle to expand Health and see what the various colors
represent.
10. Move your mouse over the News operator. An information window is displayed.
Notice that your operator is healthy; the alert is simply that no tuples have
flowed yet.
11. In the Color Schemes area, click on Flow[nTuples/s]. Expand it as well to see
what the colors represent.

Task 8. Show data.


1. In the Instance Graph, right-click on the output port of the NewsSearch
operator and select Show data.
2. The attributes for the schema are listed. Click OK to select all attributes. If you
had maximized your Instance Graph view, you will see that it gets restored.
A second Properties view has also been opened; this is where your data will
be displayed.
3. Expand the size of this new Properties view to see all of the columns in the
table.
4. Open a command window (click on the Red Hat Applications icon (the red hat)
at the top of the screen and select Applications->System Tools->Terminal) or
use one that is already open.
5. Position the command window so that the Streams instance graph operators
are visible.
6. In the command window, change to the /home/labfiles directory.
cd /home/labfiles
7. Next, execute netcat in order to pass data to your Streams application's
TCPSource operator.
nc streamshost 1234 < news_ticker_nodelay.dat
8. Press CTRL-C if necessary to kill the command. Then press the up arrow to
recall the previous command and press Enter to execute it again. Do this
several times. Notice that when data is flowing, the colors of the operators in
the graph change to show the byte rate.
9. Look in the second Properties view to see the filtered data.
10. Either minimize or close the command window.
11. Return to the Eclipse IDE (make sure that it has the focus) and position the
cursor over one of the operators.


12. You can also get specific port information by positioning the cursor over each
port.
13. In the Streams Explorer view, right-click on Streams Jobs->0:Main_0
[Healthy] StreamsInstance@StreamsDomain. Select Metrics->Show
Metrics. Note that the number 0 in 0:Main_0 may vary on your image
depending on how many jobs you have run.
14. In the Metrics tab area, drill down on
default:StreamsInstance@StreamsDomain->Main_0.
15. Click on any of the expanded items. In the right-hand frame you can see some
metrics about that item.
16. Expand Main->News. Click Output[0]::News and you can see tuple
information.
17. Close the Metrics tab, the Instance Graph tab, and the Properties tab where
the filtered data was displayed.
18. Now you need to stop your application. In the Streams Explorer, under
Streams Jobs, right-click on 0:Main_0 and select Cancel job.
19. Return to the Project Explorer and expand the data directory. (If the data
folder cannot be expanded, right-click on it and select Refresh.)
20. Double-click on result.txt.
Note: Sometimes when using a remote lab environment, double-clicking an item
does not work. If that happens to you, select the item and press Enter, and possibly
select Open.
(This is the file that was output by the FileSink operator.) If there is a
message that the resource is out of sync, right-click on result.txt and select
Refresh. The displayed tuples should be those where the name Clinton was
found in the summary attribute.
21. You can close any opened editor tabs and collapse your Introduction project.


Task 9. Command-line environment: compile and invoke your
program from the command line.

You will be working with essentially the same application as in the Introduction
project, except that the data comes from a flat file located in the data directory.
1. Open a command-line window, if one is not already open. (You can click on
the Red Hat Applications icon->System Tools and select Terminal.)
2. Change to the CommandLineExercise directory.
cd ~/CommandLineExercise
3. List the directory.
ls
4. Listed is Main.spl. This is essentially the same application with which you
worked in Eclipse, but this application reads the source data from a flat file.
5. Starting with Streams version 4.0, the Streams compiler no longer creates a
data directory for you. We will create one here:
mkdir data
6. Compile the program. At the command line, execute the following:
sc -M Main
7. List the directory again. Note that additional directories and files were
created.
8. Since this application was built as a distributed application, you must make
sure you have a started Streams domain that contains an instance. Let's
check from the command line:
streamtool getdomainstate -d StreamsDomain
streamtool lsinstance -d StreamsDomain
9. Next, submit your program by executing the following. (The bundle (.sab) file
was created when you built the application.) Notice how the -C flag is used to
specify the location of the data directory:
streamtool submitjob -d StreamsDomain -i StreamsInstance
-C data-directory=/home/student/CommandLineExercise/data
output/Main.sab
10. Your program will not take long to execute. List the contents of the data
directory and you should see that a new output file has been created.
ls data
11. Return to the Streams Explorer in the Eclipse IDE.
12. Expand the Streams Domains menu item.
13. Right-click on StreamsDomain and select Open Streams Console.

14. If prompted, log in with a user ID of student and a password of ibm2blue.
15. Click on Jobs at the top. Select your job and click Monitor Job from the
right-hand menu.
16. Investigate some of the information that you can find about your job. When you
are done, click on Jobs at the top of the screen again and click the Cancel job
link. Click OK to cancel the job.
17. Log out of the console and close the browser.
18. In the Streams Explorer, refresh Streams Jobs. Note that your job is no longer
displayed.
Results:
This demonstration allowed you to work with the Eclipse development
environment on an InfoSphere Streams Processing Language program. You
also invoked the Streams Processing Language compiler from the command
line.


SPL Programming
Introduction

IBM InfoSphere Streams V4.0



Unit 4 SPL Programming Introduction


Demonstration 1
Main Composite Operators
In this demonstration, you will:
- Code a simple SPL application
- Code a main composite operator
- Define a namespace in an SPL project
- Explain how to build and launch an SPL program


Demonstration 1:
Main Composite Operators
Purpose:
This demonstration gives you a basic understanding of how to code a main
composite operator using the SPL Graphical editor and how to launch both a
standalone and a distributed application.
This demonstration will accomplish several goals. One goal is to code an SPL
application. There are two ways to do that: one is to code each operator and
its parameters by hand; the second, which is the way you will proceed, is to
use the SPL Graphical editor. The other goal is to cover some capabilities
that were not presented in the course material, namely the design capabilities
of the SPL Graphical editor.
Estimated Time: 30 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password

Task 1. Create an SPL Project.


Put on your application designer hat.
You will start by pretending that you are an application designer. You want a simple
application that generates a single Hello World message, which is then written to
a file.
1. If your Eclipse development environment is not open, double-click on the
Streams Studio icon on your desktop. Accept the default workspace.
2. Select File->New->Project. Expand InfoSphere Streams Studio and select
SPL Project. Click Next.
3. Type in a Project name of FirstProject. Click Next.
4. By default, all the toolkits are selected. You can unselect any unneeded ones.
For this demonstration, none of the selected toolkits are needed. Click Finish.
5. Make use of the namespace capabilities of SPL by creating a namespace in
your project. In the Project Explorer, right-click on FirstProject. Select
New->SPL Namespace.
6. Type in a Namespace of sample.proj.


7. Look at Folder path and click on the drop-down. Listed are various ways to
implement this namespace: it can be implemented as a single directory
called sample.proj or as a multi-level directory, /sample/proj. For this
demonstration choose sample.proj. Then click Finish.
8. Expand FirstProject and then expand Resources.
9. Now create the source file that will contain the main composite. Right-click on
FirstProject and select New->Main Composite.
Look at the namespace: since only one is defined, it is selected by default.
10. Keep the default name of the Main Composite and the default File name. Click
Finish.
11. Now expand the sample.proj folder (not the namespace). You can see that
your source file was placed in that directory. Also look in the Edit view: you
have the beginning of your main composite. By default, the SPL Graphical
editor is used.

Task 2. Design the Main Composite Operator.


Put on your application designer hat and design the main composite operator:
have an operator emit a tuple whose attribute has a value of Hello World, then
write that piece of data to a file.
1. As a designer, you are not particularly familiar with all of the Streams operators,
but you do know that you want an operator that will generate the Hello World
message. In the Palette area, under Design, click Operator and drag it onto the
canvas. Drop it in Main.
2. You also know that you need a second operator to write the message to a file,
so drag a second operator and drop it in Main to the right of the first operator.
The Main composite area will automatically enlarge.
3. These two operators are to be connected via a stream. In the Palette view, click
Stream and drop it on the left side of Op_1. A light gray square should appear.
4. Position the cursor over the small square. The square should turn green. Click
the left mouse button once.
5. Move the cursor to the left side of Op_2. A second green square should appear.
Click the left mouse button once. You have now defined an output port of Op_1
connected to an input port of Op_2.
6. Now you want to define the composition of the message that is sent on the
stream. Select the Op_1 icon. It should turn aqua-blue.
7. Click on the Properties tab towards the bottom half of the screen. In the
Properties view, in the list of menu items on the left side, click Output Ports.


8. Scroll down; in the Output stream schema table area you already have an
attribute defined, called varName. Overtype varName and change it to
message.
9. In the Type column, click on varType. A little light bulb should appear on the left
edge of the entry field. (You may have to do this twice. Go figure.) Make sure
that varType is highlighted or that you have completely deleted varType from
the field. Once again, as a designer, you may not know all of the Streams data
types. So, while holding down the Ctrl key, press the space bar. A list of data
types appears. Scroll down and double-click rstring.

Task 3. Annotate each operator.


1. In the canvas area, right-click Op_1. Select Annotations->Add Note.
2. Select the generated post-it note and, in the Properties view, type in a
description of: This operator is to generate a single message consisting of
Hello World.
3. In the canvas area, right-click Op_2. Select Annotations->Add Note.
4. Select the generated post-it note. In the Properties view, type a description of:
This operator is to write the message to a file named result.txt.
5. Save your work.

Important: Each time you save your Streams application, a build is started to
compile the application; the progress messages are shown in the Console view (in
the SPL Build console). If you scroll back through the build progress messages, you
will see some builds that terminated with errors, shown in red. This is a common
occurrence while you are building up your application; often there will be
compilation errors until your operators have been built up to a working state. By the
time you get to the Build and Launch section of each lab, the compilation errors
should be gone (if you have coded your application correctly).

Put on your application programmer hat.
Now you are going to use your imagination (sorry, I know that it takes a lot of
energy) and pretend that some designer has handed you this FirstProject to code.
6. In the canvas, click on Op_1. As the programmer, having read the
annotation, you know that you want Op_1 to be a Beacon operator.
7. In the Properties view, select General. Click the Change pushbutton for
Operator. From the displayed list, select Beacon and click OK.
8. In the Description field, type: This operator will only generate a single message.
9. On the left of the Properties, click Output Ports.


10. The stream schema is already defined. But you want a different stream name.
Click the Rename pushbutton for Output stream name. Specify a new name of
Hi and click OK.
11. On the left of the Properties, click Param. (You may have to scroll to find it.)
Click the Add pushbutton. Select Iterations and click OK.
12. In the Value column, type in 1u. (Unsigned value of 1)
13. On the left of the Properties, click Output. Expand Hi. For message : rstring,
give it a value of Hello World. Make sure to include the quotes.
14. In the canvas, click on Op_2. As the programmer and having read the
annotation, you know that you will want Op_2 to be a FileSink operator.
15. In the Properties view, select General. Click the Change pushbutton for
Operator. From the displayed list, select FileSink and click OK.
16. Click the Rename pushbutton. Change the name of this operator to sink.
17. In the Description field, type The result.txt file will be written to the default
data directory.
Since the input port of sink is connected to the output port of Op_1, the schema
definition is automatically passed. So no changes are needed here.
Note: Beginning with version 4.0, Streams applications do not have a default data
directory unless you explicitly set one in the build specification. Here, we are simply
taking advantage of a feature of Streams Studio, which will provide that specification
by default. It works because we only have a single host. Because Streams is a
distributed system that does not require a shared file system, you have to be careful
when specifying file paths. A process accessing a file must run on a host that can
reach it; in general this means specifying absolute paths and constraining where a
particular process can run; using relative paths and a default data directory makes
the application less portable.
18. On the left of the Properties, click Param. The file parameter is required, so it is
already displayed. Change parameterValue to "result.txt". Include the
quotation marks.
19. Save your work.
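With both operators configured, the generated Main.spl should correspond roughly to the following SPL. This is a sketch based on the steps above; the formatting of the generated code (and the namespace line) may differ in your workspace:

```
composite Main {
  graph
    // Emit a single tuple carrying the greeting
    stream<rstring message> Hi = Beacon() {
      param
        iterations : 1u;
      output
        Hi : message = "Hello World";
    }

    // Write the tuple to result.txt in the default data directory
    () as sink = FileSink(Hi) {
      param
        file : "result.txt";
    }
}
```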


20. Click on the Console tab to display the Console view. The console view might
still be the Streams Studio Console. To switch to the SPL Build console, click
the down arrow for the screen icon that is fourth from the right on the Console
titlebar and select SPL Build.

Note that by just saving your work, your program was automatically built.
21. In the Project Explorer view, right click on Main.spl. Select Open With->SPL
Editor to see the generated SPL code. Any changes made to the code are
reflected in the SPL Graphical editor as well.

Task 4. Build and Launch a Standalone Application.


1. In the Project Explorer under FirstProject, expand the sample.proj namespace.
(The one with the letter N in a purple circle.) Then expand Main. By default,
distributed applications are built, and Distributed is labeled as Active. This
means that when the program is saved, it is a distributed application that gets
built. Right-click on Main and select New->Standalone Build. Click OK on the
configuration dialog.
2. Now you have the option to build your program as a standalone application as
well. Right-click on Standalone and select Set active. Now when you update
your code and do a save, it is a standalone application that gets built. (If
you still have the Console view displayed, then you saw that your application
was rebuilt.)
3. Right-click on Standalone and select Launch. Click the Apply pushbutton and
then click Continue.
4. The Beacon operator emitted a final punctuation after it emitted its last tuple.
When a standalone application sees a final punctuation, it terminates. This does
not happen with distributed applications; they have to be canceled in order to
terminate.
5. Right-click on the Resources->data folder and select Refresh.
6. Expand the data folder and double-click on result.txt to open it. Close
result.txt.
7. Right-click on result.txt and select Delete.


Task 5. Launch a Distributed Application.


1. Before you can launch a distributed application, your Streams domain and
instance must be started. We will check to make sure these are started. Click on
the Streams Explorer tab.
2. Expand Streams Instances. If your instance is running, you will see something
similar to: default:StreamsInstance@StreamsDomain [Running, hh:mm:ss]
3. If for some reason your instance was not running, you could right-click on the
instance here and select Start Instance. (You might want to switch your console
window to the Streams Studio Console to verify that the instance started
correctly. You can also scroll the Streams Explorer to the right to view the
status of your instance.)
4. Return to the Project Explorer. Under FirstProject right-click on Distributed
and select Launch. (The distributed application was built when you saved your
work. Remember, it was originally set to being active.)
5. Click the Apply pushbutton and then the Continue pushbutton. Click OK on the
Warning: Launch dialog.
6. Right-click on data and select Refresh. Drill down and open the result.txt file.
You should see the same results as you saw with the standalone application.
Close result.txt.
7. Since this is a distributed application, you must terminate the running
application. Return to the Streams Explorer and expand Streams Jobs.
8. Right-click on your job and select Cancel job.
9. Return to the Project Explorer. Close your opened tabs and contract your
FirstProject project.

Results:
This demonstration gave you a basic understanding as to how to code a main
composite operator using the SPL Graphical editor and how to launch both a
standalone and a distributed application.
This demonstration accomplished several goals. One goal was to code an
SPL application. There were two ways to do that. One was to code each
operator and its parameters. The second, which you did, is to use the SPL
Graphical editor. The second goal was to cover some capabilities that were
not presented in the course material, that is, the design capabilities of the SPL
Graphical editor.


Adapter Operators

IBM InfoSphere Streams V4.0


Copyright IBM Corporation 2015
Course materials may not be reproduced in whole or in part without the written permission of IBM.

Unit 5 Adapter Operators


Demonstration 1
Source and Sink Type Operators

In this demonstration, you will:

Write your first Streams program to use Source type and Sink type
operators to copy a file

Code a FileSource operator

Code a FileSink operator


Use export and import to exchange streams between two applications



Demonstration 1:
Source and Sink Type Operators
Purpose:
This demonstration has the student code a FileSource operator to read from a
file and then use a FileSink operator to externalize the data.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 30 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password

Graphical Editor
Task 1. Requirements for Source and Sink operators.
Here are the requirements for your source and sink operators. This portion of the
demonstration will step you through using the SPL Graphical editor. If you desire,
you can use the SPL Editor to actually code the operators.
Use a FileSource operator to read the file /home/labfiles/stock_report_nodelay.dat
that is in a csv format. It will emit a stream called StockReport. This stream will be
observed by a FileSink operator. That operator will write the tuple data to a file
called copyfile.dat that will be located in the default data directory created by
Streams Studio. This file will also be in a csv format.
If you choose to code the operators then proceed to the demonstration instructions
for coding operators portion of this demonstration.
1. If your Eclipse development environment is not open, double-click on the
Streams Studio icon on your desktop. Accept the default workspace. Enter
ibm2blue as the secure storage password if prompted.
2. In Eclipse create an SPL application project by clicking File->New->Project.
Expand InfoSphere Streams Studio. This time, select SPL Application
Project. Click Next.


3. Creating an application project allows you to create a project, a namespace,
and the Main composite at the same time.
4. Name the project SourceSink. Take the default of application for the
namespace and SourceSink for the Main composite. Click Next.
5. This application is not dependent on any toolkits, so uncheck Toolkit
Locations. Click Finish.

Task 2. Using the graphical editor.


1. In the Palette, expand Toolkits->spl->spl.adapter.
2. Drag FileSource to the Main composite.
3. Select the FileSource_1 operator.
4. In the Properties view, click Output Ports.
a) Rename the Output stream name to StockReport.
b) Define the Output stream schema as specified below. The order of the
attributes is important.
ticker - rstring
tradeDate - rstring
closingPrice - rstring
volume - rstring

5. In the Properties view, click Param.


a) Click the Add pushbutton. Select file and format. Click OK. You should
now have both file and format parameters listed.
b) The input file is not located relative to the data directory. So you will have to
specify an absolute address. "/home/labfiles/stock_report_nodelay.dat"
c) For format, type csv. (If you did not know the different options, you could
use the ctrl-space technique to get a list. Wherever you see the tiny light
bulb icon, you can use the ctrl-space option.)

6. Drag FileSink to the SourceSink main composite.
7. Connect the output port of the FileSource operator to the input port of the
FileSink operator.
8. Select the FileSink operator.
9. In the Properties view, click Param.
10. For the file parameter, set it to "copyfile.dat". Make sure to include the
quotation marks.
11. Save your work.
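If you open SourceSink.spl with the SPL Editor at this point, the graph you just built should correspond roughly to the following SPL. This is a sketch based on the steps above; the generated code's formatting may differ:

```
namespace application;

composite SourceSink {
  graph
    // Read the comma-delimited stock report file from an absolute path
    stream<rstring ticker, rstring tradeDate, rstring closingPrice,
           rstring volume> StockReport = FileSource() {
      param
        file   : "/home/labfiles/stock_report_nodelay.dat";
        format : csv;
    }

    // Copy each tuple to copyfile.dat in the default data directory
    () as Sink = FileSink(StockReport) {
      param
        file   : "copyfile.dat";
        format : csv;
    }
}
```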


Task 3. Build and Launch your application.


Note: In this demonstration and any of the following demonstrations, except where
specifically noted, you can either build and launch distributed applications or
standalone applications to test your demonstrations. It is your choice. If you decide
to build distributed applications, then remember to make sure the Streams domain
and instance are running and then launch your application.
If you build and run standalone applications, and if you are using a FileSource or a
Beacon operator that is just doing a normal file read (not a hotfile read) or emitting a
set number of tuples, then your application will terminate automatically when all
input records have been exhausted. If you are reading from some other source type
operator, to terminate the running application, click on the Console tab that is in the
bottom view of the Eclipse window. (Make sure you are viewing the Streams Studio
Console.) It will show that your program is running. Click on the red square to
terminate your program.
All demonstrations are set up so that if the output is written to a file, that file will be
located in the default data folder that is defined under your SPL project within
Streams Studio. You may have to right-click on the data folder and select Refresh
in order to see your output file.
1. In the Project Explorer, under SourceSink expand the application
namespace. Then expand SourceSink.
By default in the IDE your project is initially set to build a distributed application. If
you wish to work with a distributed application, then right-click on Distributed
and select Launch.
2. If you wish to work with a standalone application, right-click on SourceSink and
select New->Standalone Build. Click OK. You might want to display the SPL
build in the console.
3. Right-click on Standalone and select Set active.
4. Right-click on whichever application type is marked as active and select
Launch.
5. Click Apply on the Edit Configuration dialog. Then click Continue.
6. In the Project Explorer view, expand Resources and refresh the data folder.
Expand it and double-click on copyfile.dat.
7. Close any opened tabs in the Editor view.
8. Also, contract the SourceSink project as well.
9. If you ran the application as distributed, go to the Streams Explorer tab, find
the running job, right click and cancel the job.


Task 4. Export and Import.


Let's take a look at the capability of one Streams application passing stream data to
another through the use of export and import. This will require that you code two
applications. And since the schema of the exported stream and the schema of the
imported stream must be the same, you will create a third file that defines the
shared schema as a type.

Export
1. In Eclipse create a new SPL project by clicking File->New->Project. Then
select SPL Project. Click Next.
2. Name the project ExportImport. Click Finish.
For this project, you are not going to use a namespace. (No particular reason.
Just to show you that you can.) First you are going to create a file that only
contains a schema definition defined as a type. You will then reference this type
from both of your export and import programs.
3. Right click on ExportImport and select New->SPL Source File.
4. Uncheck Generate Main composite.
5. Change the File name to Common.spl. Then click Finish.
6. The Common.spl file is automatically opened in the graphical editor. (But since
it is not a Main composite, you do not see the dotted-lined box.)
7. Double click the canvas to open the Properties editor. In the Properties view,
click Types. Graphically you are going to code the equivalent of the following
type definition.
type NewsTicker = tuple<rstring agency, rstring category,
rstring summary>;
8. Click Add New Type. Give it a name of NewsTicker.
9. Add the three attributes (agency - rstring, category - rstring, summary - rstring)
and make sure that they are in the order defined above.
10. Save your work.
11. Right click on ExportImport and select New->SPL Source File. Set the Main
Composite name to be MyExport and click Finish.
Use the TCPSource operator to get access to the data.
12. In the Palette, expand Toolkits->spl->spl.adapter. Drag TCPSource into the
MyExport composite.
13. Also drag the Export into the MyExport composite.
14. Connect the two ports.
15. Click on the TCPSource operator and then click the Properties view.


16. In the Properties view, click Output Ports.


a) Change the Output stream name to News.
b) In the Output stream schema table, select the row in the table titled
varName and Remove it.
Important: In upcoming demos, whenever you see a default varName
row in this table remove it before adding new attributes.
c) Click in the Add attribute... field to add a new attribute to the output stream
schema. Press CTRL-Space. Double-click <extends>. (If you are having a
hard time with the ctrl-space, just type <extends>)
d) In the Type column, press ctrl-space again. Double-click NewsTicker. (Once
again, you could just type NewsTicker.)
17. In the Properties view, click Param.
a) The role parameter is already displayed. Set its value to server.
b) Click the Add pushbutton. Select both name and port. Then click OK.
c) The value for name is "streamshost". Don't forget the quotes.
d) The value for port is 1234u.
18. Select the Export operator.
19. In the Properties view, click Param.
a) Click the Add pushbutton. Select streamId and press OK.
b) Set the value to "ExportedNews". Don't forget the quotes.
20. Save your work.

Import
21. In the Project Explorer, create another SPL Source File under ExportImport.
Give it a Main Composite name of MyImport. (Keep the Generate Main
composite checked.)
22. In the Palette, expand Toolkits->spl->spl.adapter. Drag Import into the
MyImport composite.
23. Also drag the FileSink into the MyImport composite.
24. Connect the two ports.


25. Select the Import operator and in the Properties view, click Param.
a) There are two ways for the Import operator to get access to a stream, by
subscription and by stream id. The demonstration uses the stream id
technique. So select the subscription parameter and click the Remove
pushbutton.
b) Click the Add pushbutton. Select applicationName and streamId. Then
click OK.
c) Value for applicationName is "MyExport".
d) Value for streamId is "ExportedNews".
26. Select Output Ports.
a) Change the Output stream name to In.
b) In the Output stream schema table, select varName and type <extends>.
c) In the Type column, key in NewsTicker.
27. Click the FileSink operator.
28. In the Properties view, click General.
a) Rename the Alias to sink.
29. Click Param. Set the value for file to "result.dat".
a) Click the Add pushbutton. Select format and click OK.
b) Set the value for format to csv.
30. Save your work.
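For reference, the three files you just created graphically correspond roughly to the following SPL. This is a sketch based on the steps above; the generated code's formatting and operator aliases may differ:

```
// Common.spl - the shared schema, defined once and used by both applications
type NewsTicker = tuple<rstring agency, rstring category, rstring summary>;

// MyExport.spl - read from TCP and export the stream by id
composite MyExport {
  graph
    stream<NewsTicker> News = TCPSource() {
      param
        role : server;
        name : "streamshost";
        port : 1234u;
    }

    () as Exporter = Export(News) {
      param
        streamId : "ExportedNews";
    }
}

// MyImport.spl - import the exported stream and write it to a file
composite MyImport {
  graph
    stream<NewsTicker> In = Import() {
      param
        applicationName : "MyExport";
        streamId        : "ExportedNews";
    }

    () as sink = FileSink(In) {
      param
        file   : "result.dat";
        format : csv;
    }
}
```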


Task 5. Run the applications.


This is one of those times where you must run the demonstration in distributed
mode.
At this point both of your programs should have been built (since Distributed is
set to active by default). You need to make sure your Streams domain and
instance are running and then launch the programs. You will once again use
netcat to send data to the TCPSource operator. The externalized data is placed
in the result.dat file.
1. Click on the Streams Explorer tab. If you need to, expand Streams Instances.
Confirm your instance is running. If not, right click on the instance and select
Start Instance. If you had to start your instance, display the Console view and
select Streams Studio Console. This will let you see if your instance started
correctly. You could also scroll to the right in the Streams Explorer to see the
status of your instance.
2. Return back to the Project Explorer.
3. Under ExportImport expand <default_namespace>.
4. Right click on MyExport and MyImport individually and select Launch. On the
Edit Configuration dialogs, select Apply and then select Continue.
5. Open a command window by right-clicking on the desktop and selecting Open
in Terminal.
6. Change directory to /home/labfiles
cd /home/labfiles
7. You are now going to execute netcat in order to pass data to your Streams
application's TCPSource operator.
nc streamshost 1234 < news_ticker_nodelay.dat
8. In the Project Explorer right click on the data folder and select Refresh.
9. Expand the data folder and open result.dat. The data read by the TCPSource
in MyExport.spl should be displayed.
10. Close all of your open tabs.
11. Contract the ExportImport project.
12. Return to the Streams Explorer.
13. Under Streams Jobs, right click on MyExport and select Cancel job. Then do
the same for MyImport.
14. You can close the command window as well.

End of SPL Graphical Editor demonstration


SPL Editor
Code solutions follow this demonstration.

Task 1. Source and Sink operators.


1. If your Eclipse development environment is not open, double-click on the
Streams Studio icon on your desktop. Accept the default workspace.
2. In Eclipse create an SPL application project by clicking File->New->Project.
Expand InfoSphere Streams Studio. This time, select SPL Application Project.
Click Next.
3. Name the project SourceSink. Take the default of application for the
namespace and SourceSink for the Main composite. Click Finish.
4. By default, the graphical editor is opened. Close the SourceSink.spl tab in the
Editor view.
5. In the Project Explorer view, under SourceSink, expand Resources->application. Right-click SourceSink.spl and select Open With->SPL Editor.
6. In the Editor view, add a graph clause to your main composite:
namespace application;

composite SourceSink {
  graph
}
and then code a FileSource operator whose
a) Output stream name is StockReport
b) The schema is
ticker - rstring
tradeDate - rstring
closingPrice - rstring
volume - rstring
c) File is at a specific location - /home/labfiles/stock_report_nodelay.dat
(Remember that the value for the file parameter is in double quotes.)
d) Format is comma delimited (csv)


7. Code a FileSink operator that:
a) Observes the StockReport stream
b) Writes to a file, copyfile.dat, in the default data folder
c) Format is csv
8. Save your updates.

Task 2. Run your applications.


Note: In this demonstration and any of the following demonstrations, except where
specifically noted, you can either build and launch distributed applications or
standalone applications to test your demonstrations. It is your choice. If you decide
to build distributed applications, then remember to make sure the Streams domain
and instance are running and then launch your application.
If you build and run standalone applications, and if you are using a FileSource or a
Beacon operator that is just doing a normal file read (not a hotfile read) or emitting a
set number of tuples, then your application will terminate automatically when all
input records have been exhausted. If you are reading from some other source type
operator, to terminate the running application, click on the Console tab that is in the
bottom view of the Eclipse window. (Make sure you are viewing the Streams Studio
Console.) It will show that your program is running. Click on the red square to
terminate your program.
All demonstrations are set up so that if the output is written to a file, that file will be
located in the default data folder that is defined under your SPL project. You may
have to right-click on the data folder and select Refresh in order to see your output
file.
1. In the Project Explorer, under SourceSink expand the application namespace.
Then expand SourceSink.
By default in the IDE your project is initially set to build a distributed application. If
you wish to work with a distributed application, then right-click on Distributed and
select Launch.
2. If you wish to work with a standalone application, right-click on SourceSink and
select New->Standalone Build. Click OK. Make sure that the build completed
successfully. Right-click on Standalone and select Set active.
3. Right-click on whichever application type is marked as active and select
Launch.
4. Click Apply on the Edit Configuration dialog. Then click Continue.
5. In the Project Explorer view, refresh the data folder, expand it and double-click
on copyfile.dat.

6. Close any opened tabs in the Editor view.
7. Also, contract the SourceSink project as well.
8. If you ran the application as distributed, go to the Streams Explorer tab, find the
running job, right-click and cancel the job.

Task 3. Export and Import.


Let's take a look at the capability of one Streams application passing stream data to
another through the use of export and import. This will require that you code two
applications. And since the schema of the exported stream and the schema of the
imported stream must be the same, you will create a third file that defines the
shared schema as a type.

Export
1. In Eclipse create a new SPL project by clicking File->New->Project. Then
select SPL Project. Click Next.
2. Name the project ExportImport. Click Finish.
For this project, you are not going to use a namespace. (No particular reason.
Just to show you that you can.) First you are going to create a file that only
contains a schema definition defined as a type. You will then reference this type
from both of your export and import programs.
3. Expand ExportImport and then Resources.
4. Right click on ExportImport and select New->SPL Source File.
5. Uncheck Generate Main composite.
6. Change the File name to Common.spl. Then click Finish.
7. Close the Common.spl tab in the Editor view.
8. In the Project Explorer view, right-click Common.spl and select Open With->SPL Editor.
9. Code the following type definition and then save it.
type NewsTicker = tuple<rstring agency, rstring category,
rstring summary>;
10. Right-click on ExportImport and create another SPL Source File. Make the
Main Composite name to be MyExport and click Finish.
11. Close the MyExport.spl tab in the Editor view.
12. In the Project Explorer view, right-click MyExport.spl and select Open With->SPL Editor.
13. Add a graph clause to your main composite.


14. Use the TCPSource operator to get access to the data. Code the following
operator:
stream<NewsTicker> News = TCPSource() {
  param
    role : server;
    name : "streamshost";
    port : 1234u;
}
15. Next add an Export operator to your MyExport.spl source that:
a) Observes the News stream
b) Exports a stream with a streamId of ExportedNews
16. Save your work.

Import
17. Under ExportImport create another SPL Source File. Give it a name of
MyImport.
18. Close the MyImport.spl tab in the Editor view.
19. In the Project Explorer view, right-click MyImport.spl and select Open
With->SPL Editor.
20. Add a graph clause to MyImport.spl.
21. Code an Import operator
a) Stream name is - In
b) Use the NewsTicker type for your schema definition
c) The application name will be - MyExport (Be aware that the application
name is a fully qualified name. Since we are not using a namespace, we
only have to reference the name of our exporting application. But if that
application was in a namespace, then the namespace must be specified as
part of the application name.)
d) The stream Id is - ExportedNews
22. Add a FileSink to the MyImport.spl code that observes the In stream and writes
it out in a csv format to a file called result.dat in the default data directory.
() as Sink = FileSink(In) {
  param
    file   : "result.dat";
    format : csv;
}
23. Save your work.


Task 4. Run the applications.


This is one of those times where you must run the demonstration in distributed
mode.
At this point both of your programs should have been built (since Distributed is
set to active by default). You need to make sure your Streams domain and
instance are running and then launch the programs. You will once again use
netcat to send data to the TCPSource operator. The externalized data is placed
in the result.dat file.
1. Click on the Streams Explorer tab. If you need to, expand Streams Instances.
Confirm your instance is running. If not, right-click on the instance and select
Start Instance. If you had to start your instance, display the Console view and
select Streams Studio Console. This will let you see if your instance started
correctly. You could also scroll to the right in the Streams Explorer to see the
status of your instance.
2. Return back to the Project Explorer.
3. Under ExportImport expand <default_namespace>.
4. Right-click on MyExport and MyImport individually and select Launch. On the
Edit Configuration dialogs, select Apply and then select Continue.
5. Open a command window by right-clicking on the desktop and selecting Open
in Terminal.
6. Change directory to /home/labfiles:
cd /home/labfiles
7. You are now going to execute netcat in order to pass data to your Streams
application's TCPSource operator:
nc streamshost 1234 < news_ticker_nodelay.dat
8. In the Project Explorer right-click on the data folder and select Refresh.
9. Expand the data folder and open result.dat. The data read by the TCPSource
in MyExport.spl should be displayed.
10. Close all of your open tabs.
11. Contract the ExportImport project.
12. Return to the Streams Explorer.
13. Under Streams Jobs, right-click on MyExport and select Cancel job. Then do
the same for MyImport.
14. You can close the command window as well.


Code solutions
namespace application;

composite SourceSink {
  graph
    stream<rstring ticker, rstring tradeDate, rstring closingPrice,
           rstring volume> StockReport = FileSource() {
      param
        file   : "/home/labfiles/stock_report_nodelay.dat";
        format : csv;
    }

    () as Sink = FileSink(StockReport) {
      param
        file   : "copyfile.dat";
        format : csv;
    }
}
Common.spl
type NewsTicker = tuple<rstring agency, rstring category, rstring summary>;
MyExport.spl
composite MyExport {
  graph
    stream<NewsTicker> News = TCPSource() {
      param
        role : server;
        name : "streamshost";
        port : 1234u;
    }

    () as Exporter = Export(News) {
      param
        streamId : "ExportedNews";
    }
}


MyImport.spl
composite MyImport {
  graph
    stream<NewsTicker> In = Import() {
      param
        applicationName : "MyExport";
        streamId        : "ExportedNews";
    }

    () as Sink = FileSink(In) {
      param
        file   : "result.dat";
        format : csv;
    }
}

End of SPL Editor demonstration


Results:
This demonstration had you code a FileSource operator to read from a file and
then you used a FileSink operator to externalize the data.


Relational and Utility Operators: The Journey Begins

IBM InfoSphere Streams V4.0


Copyright IBM Corporation 2015
Course materials may not be reproduced in whole or in part without the written permission of IBM.

Unit 6 Relational and Utility Operators


Demonstration 1
The Beacon and Custom operators

In this demonstration, you will:

Work with the Beacon operator


Work with the Custom operator



Demonstration 1:
The Beacon and Custom operators
Purpose:
You want to work with two utility operators. The Beacon operator is a way to
generate test tuples. The Custom operator gives you a skeleton operator
where you can submit tuples.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 20 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password


Graphical Editor
Task 1. Beacon Operator.
1. In Eclipse create a new SPL project by clicking File->New->Project. Then select SPL Project. Click Next.
2. Name the project BeaconCustom. Click Finish.
3. For simplicity we will not specify a namespace.
4. Right-click on the BeaconCustom project and select New->SPL Source File. Take the defaults for the Main composite name and click Finish.
The first operator for you to code is a Beacon operator. This operator is to emit ten tuples on a stream. Each tuple has only a single integer attribute, which starts with a value of zero and gets incremented by 1 for each emitted tuple. Fortunately, there is a function IterationCount() that can be called to get the iteration count. Use the graphical editor to code a Beacon operator with the following criteria:
Stream name - Beat
Iterations - 10
Attribute - uint64 val
5. In the Palette, expand Toolkits->spl->spl.utility.
6. Drag a Beacon operator to the Main composite.
7. Select the Beacon operator and click on the Properties tab.
8. In the Properties view, click Output Ports.
9. Rename the output stream to Beat.
10. Modify the Output stream schema and add an attribute called val of type uint64.
11. Click the Param menu item. Remove the period parameter. Add a new iterations parameter.
12. Set the value of the iterations parameter to 10u. Now the Beacon operator is going to do something ten times. Next you have to figure out what those ten things are going to be.
13. You need to specify what is to happen when a tuple is about to be emitted. In the Properties view, scroll the menu down and click Output.
14. Expand Beat. For the value of the val attribute, click ctrl-space. Select IterationCount() : uint64.
15. If there is an entry underneath Beat that looks like outputExpression: <unknown type>, select it and then click the Clear button to remove it.
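
If you later open Main.spl with the SPL editor, the code generated by these steps should look roughly like the following sketch (names match the steps above; the exact formatting the editor produces may differ):

composite Main {
    graph
        // Emit ten tuples; val carries the iteration count 0 through 9
        stream<uint64 val> Beat = Beacon() {
            param
                iterations : 10u;
            output
                Beat : val = IterationCount();
        }
}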


Task 2. Custom Operators.

Next code a Custom operator. This operator is to observe the Beat stream and is to emit two streams, one called Even and one called Odd. It will alternate between the two output streams for each input tuple: tuples 1, 3, 5, 7, and 9 will be emitted on the Odd stream, and tuples 0, 2, 4, 6, and 8 will be emitted on the Even stream.
1. Drag a Custom operator to the Main composite.
2. Click the Custom operator and in the Properties view, click Input Ports.
3. For Input Ports, click the Add pushbutton.
4. Now in the canvas the Custom operator has an input port. Connect the output
port of the Beacon operator to the input port of the Custom operator.
5. Select the Custom operator. In the Properties view click Output Ports.
6. Add an output port.
7. Rename the output stream name to Even.
8. The output schema for the Even stream should have a single attribute called
val that is of type uint64.
9. Click the Add pushbutton again in order to create a second output port.
10. Rename the output stream for Port 1 to Odd.
11. Make sure that Port 1 is selected for the Output port. Then set the output stream schema to have a single attribute called val that is of type uint64.
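
Expressed as SPL, the operator invocation built by these steps amounts to the following sketch (the logic clause is still empty here; it is added in the next task):

// One input port (Beat) and two output ports (Even, Odd)
(stream<uint64 val> Even; stream<uint64 val> Odd) = Custom(Beat) {
}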


Task 3. Code the logic for the Custom operator.


The logic of this Custom operator is simple. There will be a boolean variable
whose value will alternate between true and false. When the value is true, then
the input tuple gets emitted on the Even stream. When it is false, the input tuple
gets emitted on the Odd stream.
1. In the Properties view, click Logic.
2. In this case, the Properties area is actually an editor. Although you can click the
hyperlink of Click here to invoke the SPL editor, you do not have to. Add the
following code for the logic clause.
state : mutable boolean sw = true;
onTuple Beat : {
    if (sw) {
        sw = false;
        submit(Beat, Even);
    }
    else {
        sw = true;
        submit(Beat, Odd);
    }
}
3. Save your work.
4. You need to have some way to verify that your program is working properly.
Use another Custom operator to print the values emitted on the Even port. Drag
a second Custom operator from the palette and drop it in the Main composite.
Code this Custom operator to print the lone attribute value in the tuple from the
Even stream. Remember, you will have to cast that integer attribute to a string.
5. Select the second Custom operator on the canvas.
6. In the Properties view, click General. To differentiate the two Custom
operators, rename the alias of the second Custom operator to PrintEven.
7. On the canvas, click on the top output port for the Custom_1 operator. This
should be port 0 which is the port on which even-numbered tuples are emitted.
8. Note that when you clicked on this output port and began to drag the port, some
potential input ports are displayed. Connect the selected output port to the
potential input port for the PrintEven operator.
9. Look at the properties for the PrintEven operator. If you look at Input Ports, you see that Port 0 was automatically defined and it has an input schema defined.
10. In Properties view, click on Logic. Code the following logic statement:
onTuple Even : println((rstring)val);
11. Save your work.


Task 4. Build and run your application.

This is a good time to build a standalone application, because an operator is printing output to the console. If you build a distributed application, it takes a few more steps to view that operator output.
1. In the Project Explorer expand BeaconCustom-><default_namespace>
->Main.
2. Right-click on Main and select New->Standalone Build. Click OK on the
displayed dialog.
3. Expand Main and right-click on Standalone. Select Set Active.
4. Right-click on Standalone and select Launch. Click Apply and then Continue.
5. Listed in the Console should be even numbers zero through eight.
6. If you chose to run as a distributed application, open the Streams Explorer
tab. Find your job and right-click it. Select Get Job Logs. This will create a new
project in Streams Studio called StreamsStudioLogs. If you open that project,
you will see all the log files for the PEs in your application. The PE with the
largest number is likely the one that will have the output you are looking for. On
my machine, this file was named
job_12.pe_32_streamshost.localdomain.console. Double-click your file to see its
contents.
7. Close all opened tabs. Contract the BeaconCustom project.

End of SPL Graphical Editor demonstration


SPL Editor
The code solution follows this demonstration.

Task 1. Beacon Operator.
1. In Eclipse create a new SPL project by clicking File->New->Project. Then select SPL Project. Click Next.
2. Name the project BeaconCustom. Click Finish. For simplicity we will not specify a namespace.
3. Right-click on the BeaconCustom project and select New->SPL Source File. Take the defaults for the main composite's name and click Finish.
4. Close the Main.spl tab that is open in the Edit view.
5. In the Project Explorer view, expand BeaconCustom->Resources. Right-click Main.spl and select Open With->SPL Editor.
6. Add a graph clause to your main composite.
4. Close the Main.spl tab that is open in the Edit view.
5. In the Project Explorer view, expand BeaconCustom->Resources. Right-click
Main.spl and select Open With->SPL Editor.
6. Add a graph clause to your main composite.

Note: This will be the last time that you will be told to code the graph clause. By now
you should grasp the concept that it is needed.
The first operator for you to code is a Beacon operator. This operator is to emit ten tuples on a stream. Each tuple has only a single integer attribute, which starts with a value of zero and gets incremented by 1 for each emitted tuple. Fortunately, there is a function IterationCount() that can be called to get the iteration count.
7. Code a Beacon operator with the following criteria:
Stream name - Beat
Iterations - 10
Attribute - uint64 val


Task 2. Custom Operators.

1. Next code a Custom operator. This operator is to observe the Beat stream and is to emit two streams, one called Even and one called Odd. It will alternate between the two output streams for each input tuple: tuples 1, 3, 5, 7, and 9 will be emitted on the Odd stream, and tuples 0, 2, 4, 6, and 8 will be emitted on the Even stream.
You will need to make use of a logic clause. Define a boolean variable called sw, set it initially to true, and allow it to be changed. Then each time a new tuple arrives on the Beat stream, check the value of sw to determine on which output port the tuple is to be written. Your code should look as follows:
logic
    state : mutable boolean sw = true;
    onTuple Beat : {
        if (sw) {
            sw = false;
            submit(Beat, Even);
        }
        else {
            sw = true;
            submit(Beat, Odd);
        }
    }
2. You need to have some way to verify that your program is working properly. Code another Custom operator that will print the lone attribute value in the tuple from the Even stream. Remember, you will have to cast the integer attribute to a string.
3. Save your work.

Task 3. Build and run your application.

This is a good time to build a standalone application, because an operator is printing output to the console. If you build a distributed application, it takes a few more steps to view that operator output.
1. In the Project Explorer expand BeaconCustom-><default_namespace>->Main.
2. Right-click on Main and select New->Standalone Build. Click OK on the displayed dialog.
3. Expand Main and right-click on Standalone. Select Set Active.
4. Right-click on Standalone and select Launch. Click Apply and then Continue.
5. Listed in the Console should be the even numbers zero through eight.


6. If you chose to run as a distributed application, open the Streams Explorer tab. Find your job and right-click it. Select Get Job Logs. This will create a new project in Streams Studio called StreamsStudioLogs. If you open that project, you will see all the log files for the PEs in your application. The PE with the largest number is likely the one that will have the output you are looking for. On my machine, this file was named job_12.pe_32_streamshost.localdomain.console. Double-click your file to see its contents.
7. Close all opened tabs. Contract the BeaconCustom project.


Code solutions
composite Main {
    graph
        stream<uint64 val> Beat = Beacon() {
            param
                iterations : 10u;
            output
                Beat : val = IterationCount();
        }
        (stream<uint64 val> Even; stream<uint64 val> Odd) = Custom(Beat) {
            logic
                state : mutable boolean sw = true;
                onTuple Beat : {
                    if (sw) {
                        sw = false;
                        submit(Beat, Even);
                    }
                    else {
                        sw = true;
                        submit(Beat, Odd);
                    }
                }
        }
        () as PrintEven = Custom(Even) {
            logic
                onTuple Even : println((rstring)val);
        }
}

End of SPL Editor demonstration


Results:
You worked with two utility operators. The Beacon operator allowed you to
generate test tuples. The Custom operator gave you a skeleton operator in
which you could submit tuples.


Demonstration 2
Functors and Filters

In this demonstration, you will:
Work with the Functor operator
Work with the Filter operators


Demonstration 2:
Functors and Filters
Purpose:
This demonstration has you code a Functor operator to select only those
records that are for rooms that are laboratories and convert the recorded
room temperature from Fahrenheit to Celsius. You will then code user logic in
a Filter operator and finally you will conclude with coding a DynamicFilter
operator.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 30 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password


Graphical Editor
Task 1. Background.
Read data in a csv format from the /home/labfiles/room_sensor.dat file. Filter the
tuples so that you are only working with data from labs (roomType is equal to an
L). Enrich and transform the tuple data. Concatenate the character string
Laboratory to the roomId and convert the temperature of the room from
Fahrenheit to Celsius. Then write the results to a file called labtempdata.dat. This
will require FileSource, Filter, Functor, and FileSink operators. (You could eliminate
the Filter operator and just include the predicate in the Functor operator, but then
you would not get to code a Filter operator.)

Task 2. Functor operator.

1. In Eclipse create a new SPL project by clicking File->New->Project. Select SPL Project and click Next.
2. Name the project FunctorProj. Click Finish.
3. Right-click on FunctorProj and select New->SPL Source File. Take the defaults and click Finish.
4. Drag a FileSource operator to the Main composite.
5. Modify the FileSource properties so that:
a) Click Output Ports. The emitted stream name is SensorSource and the output schema is:
time uint32
roomId rstring
roomType rstring
temp float64
lastMotion uint32
b) Click Param. The input file is /home/labfiles/room_sensor.dat and it is in a csv format. (The file name must be in double quotes.) Add a format parameter and set the value to csv.
6. Drag a Filter operator into the Main composite. (The Filter operator is under spl.relational.)
7. Connect the output port of the FileSource operator to the input port of the Filter operator.


8. For the properties of the Filter operator:
a) Set the name of the output stream for Port 0 to LabOnly. (Remember that tuples emitted on Port 0 will be those that match the predicate.)
b) The output schema must be the same as the input schema. (You can code all of the attributes that you coded for SensorSource. You also could have initially defined the schema as a type and referenced the type when referring to the schema. Or you can reference a previously defined stream and your new schema will be the same as the schema for the referenced stream.) I would suggest that for the output schema variable name you code <extends>. Then for the variable type, code SensorSource.
c) The predicate or Value field should be roomType == "L". (The filter parameter is defined in the Param properties.)
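
In SPL terms, these Filter settings correspond to the following sketch:

// Forward only laboratory tuples; the output schema extends SensorSource
stream<SensorSource> LabOnly = Filter(SensorSource) {
    param
        filter : roomType == "L";
}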

9. Drag a Functor operator into the Main composite.
10. Connect the output port of the Filter operator to the input port of the Functor operator.
11. Specify the properties of the Functor operator so that:
a) The name of the output stream is NewLabData.
b) The schema of the output stream is:
labName rstring
cTemp float64
c) Click on Output. Expand NewLabData. Output attributes:
Output the attribute labName so that the character string Laboratory is concatenated with the roomId attribute ("Laboratory " + roomId)
Output the attribute cTemp as ((temp - 32.0) * 5.0 / 9.0)
12. Drag a FileSink operator into the Main composite.
13. Connect the output port of the Functor operator to the input port of the FileSink operator.
14. Specify the properties of the FileSink operator:
a) Click General. Rename the alias to sink.
b) The output file name is labtempdata.dat and it will be located in this project's data directory.
c) The format of the output file is csv.
15. Save your updates.
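
The Functor and FileSink settings above correspond roughly to this SPL sketch:

// Enrich the room id and convert Fahrenheit to Celsius
stream<rstring labName, float64 cTemp> NewLabData = Functor(LabOnly) {
    output
        NewLabData : labName = "Laboratory " + roomId,
                     cTemp = ((temp - 32.0) * 5.0 / 9.0);
}
// Write the enriched tuples to labtempdata.dat in csv format
() as sink = FileSink(NewLabData) {
    param
        file   : "labtempdata.dat";
        format : csv;
}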


Task 3. Build and run your application.


1. Build and launch your application.
If you launch a standalone application, it will automatically terminate. If you
launch a distributed application, you will have to terminate the application.
2. In the Project Explorer view, refresh the data folder, expand it and open
labtempdata.dat.
3. Close any open tabs in the Editor view.
4. Contract the FunctorProj project.

Task 4. Code a DynamicFilter Operator.


The idea for this program is to demonstrate the DynamicFilter operator. This will be
done by using a Beacon operator to generate thirty tuples with an interval of 0.1
seconds. Each tuple will have a single integer attribute whose value varies from
zero to nine. Initially there will not be a filtering criterion. So there will not be any
matches. One and a half seconds into the process, a file is read which contains
values that are to be added to the filtering criterion. Tuples that are equal to those
values will match. The DynamicFilter operator emits two streams, one that has
matched tuples and one that has unmatched tuples. Custom operators print out the
results.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select
SPL Project and click Next.
2. Name the project DynamicFilter. Click Finish.
3. Right click on DynamicFilter and select New->SPL Source File. Take the
defaults and click Finish.
4. Drag a Beacon operator into the Main composite.
5. The properties for the Beacon operator are:
a) The output stream name is Data
b) The schema for the stream is:
num uint64
otherData rstring
c) Both the iterations and period parameters are to be used. iterations is 30u and period is 0.1.
d) The output for the Data stream will set
num equal to IterationCount() % (uint64)10
otherData equal to "Other Data"
6. Drag a FileSource operator into the Main composite.


7. Set the properties for the FileSource operator:
a) Output stream name is AddKey
b) The schema for AddKey is:
key uint64
c) The file to be read is keyfile.txt
d) Format for the file is csv
e) Specify an initDelay value of 1.5

8. Drag a DynamicFilter into the Main composite.
9. Connect the output port of the Beacon operator to the top input port of the DynamicFilter operator. (The top port equates to Port 0.)
10. Connect the output port of the FileSource operator to the bottom input port of the DynamicFilter operator.
Code the properties for the DynamicFilter operator.
11. For Output Ports
a) For output Port 0, rename the stream to Matched.
b) Set the output stream schema for Port 0
<extends> Data
c) Add a second output port, Port 1. Rename the stream to Unmatched.
d) Set the output stream schema for Port 1
<extends> Data

12. For Params:
a) Set key to Data.num
b) Set addKey to AddKey.key
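
Viewed as SPL, the DynamicFilter wiring from these steps is roughly:

// Port 0 carries matching tuples, port 1 the rest;
// AddKey.key values extend the match set at run time
(stream<Data> Matched; stream<Data> Unmatched) =
    DynamicFilter(Data; AddKey) {
    param
        key    : Data.num;
        addKey : AddKey.key;
}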
Code the following two Custom operators to print out the results.
13. Drag a Custom operator to Main composite.
14. Click on the top output port for the DynamicFilter operator. Connect that port to
the input port that appears on the Custom operator.
15. For the properties of the Custom operator
a) In the General properties, rename the Alias to Msink.
b) In the Logic edit area, code the following:
onTuple Matched: println("Matched: "+ (rstring)num);
16. Drag a second Custom operator to Main composite.


17. Click on the bottom output port for the DynamicFilter operator. Connect that
port to the input port that appears on the new Custom operator.
18. For the properties of the second Custom operator
a) In the General properties, rename the Alias to Usink.
b) In the Logic edit area, code the following:
onTuple Unmatched: println("Unmatched: "+ (rstring)num);
19. Save your work if needed.
20. Finally create a file called keyfile.txt in the default data directory. In the Project
Explorer, right-click on Resources->data and select New->File. Give it a File
name of keyfile.txt. Click Finish.
21. Add two values to the file, 2 and 5 on separate lines. Save the file and close it.
2
5

Task 5. Build and run your application.


1. Once again, since we are printing out the results to the console, you may like to build a standalone application. Then launch your application. It will terminate on its own. If you set your build type to standalone before you do a save of your code, then a standalone application will automatically be built.
2. Look at the printed results in the console. You should see several initial iterations of the data where all of the values are Unmatched. Then once keyfile.txt is read by the FileSource operator and that data is passed to the DynamicFilter operator, you should begin to see that the values of 2 and 5 are being Matched.
3. Close all opened tabs. Contract your DynamicFilter project.

End of SPL Graphical Editor demonstration


SPL Editor
Task 1. Background.
Read data in a csv format from the /home/labfiles/room_sensor.dat file. Filter the
tuples so that you are only working with data from labs (roomType is equal to an
L). Enrich and transform the tuple data. Concatenate the character string
Laboratory to the roomId and convert the temperature of the room from
Fahrenheit to Celsius. Then write the results to a file called labtempdata.dat. This
will require FileSource, Filter, Functor, and FileSink operators. (You could eliminate
the Filter operator and just include the predicate in the Functor operator, but then
you would not get to code a Filter operator.)
Code solution follows this demonstration.

Task 2. Filter and Functor operator.

1. In Eclipse create a new SPL project by clicking File->New->Project. Select SPL Project and click Next.
2. Name the project FunctorProj. Click Finish.
3. Right-click on FunctorProj and select New->SPL Source File. Take the defaults and click Finish.
4. In the Editor view, close the Main.spl tab. Then open Main.spl using the SPL editor.
5. Code a FileSource operator that will read the data that is to be processed:
a) Stream name is SensorSource
b) SensorSource schema is:
time uint32
roomId rstring
roomType rstring
temp float64
lastMotion uint32
c) File name is /home/labfiles/room_sensor.dat
d) Format is csv
6. Code a Filter operator that observes the SensorSource stream and only forwards those tuples where the roomType is equal to "L". The output stream's name is LabOnly, which will have the same schema as the input SensorSource stream.


7. Code a Functor operator that:
a) Observes the stream, LabOnly, created by the Filter operator.
b) Emits a stream called NewLabData, which will be observed by a FileSink operator, with a schema as follows:
labName rstring
cTemp float64
c) Output attributes:
Output the attribute labName so that the character string Laboratory is concatenated with the roomId attribute ("Laboratory " + roomId)
Output the attribute cTemp as ((temp - 32.0) * 5.0 / 9.0)
8. Code a FileSink operator which observes the NewLabData stream and writes the tuples to a file, labtempdata.dat, where the output data is in a csv format.
9. Save your updates.

Task 3. Build and run your application.


1. Build and launch your application.
If you launched a standalone application, it will automatically terminate. If you
launched a distributed application, you will have to terminate the application.
2. In the Project Explorer view, refresh the data folder, expand it and open
labtempdata.dat.
3. Close out all open tabs in the Editor view and contract your FunctorProj
project.

Task 4. Code a DynamicFilter Operator.


The idea for this program is to demonstrate the DynamicFilter operator. This will be
done by using a Beacon operator to generate thirty tuples with an interval of 0.1
seconds. Each tuple will have a single integer attribute whose value varies from
zero to nine. Initially there will not be a filtering criterion. So there will not be any
matches. One and a half seconds into the process, a file is read which contains
values that are to be added to the filtering criterion. Tuples that are equal to those
values will match. The DynamicFilter operator emits two streams, one that has
matched tuples and one that has unmatched tuples. Custom operators print out the
results.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select
SPL Project and click Next.
2. Name the project DynamicFilter. Click Finish.


3. Right-click on DynamicFilter and select New->SPL Source File. Take the defaults and click Finish.
4. Close the SPL Graphical editor and open Main.spl using the SPL editor.
5. In the Editor view, code a Beacon operator that generates 30 tuples with a 0.1 second pause between emitting each tuple.
The properties for the Beacon operator are:
a) The output stream name is Data
b) The schema for the stream is:
num uint64
otherData rstring
c) Both the iterations and period parameters will be used. iterations is 30u and period is 0.1.
d) The output for the Data stream will set
num equal to IterationCount() % (uint64)10
otherData equal to "Other Data"
6. Next code a FileSource operator that will read a file that contains the values that are to be used in the filter predicate. This operator is to wait one and a half seconds before it reads in the data.
The properties for the FileSource operator:
a) Output stream name is AddKey
b) The schema for AddKey is:
key uint64
c) The file to be read is keyfile.txt
d) Format for the file is csv
e) Specify an initDelay value of 1.5

7. Code a DynamicFilter that observes the Data stream, emitted by the Beacon operator, on input port 0 and also observes the AddKey stream, emitted by the FileSource operator, on input port 1. It emits two streams. The stream on output port 0 is called Matched. The stream on output port 1 is called Unmatched. Both output streams have the same schema as the Data stream. The attribute that is examined by the filter is the num attribute in the Data stream. The key attribute in the AddKey stream is used to add new values to the filtering criterion.
Code two Custom operators to print out the results.


8. The first Custom operator is to have an alias of Msink. This is to observe the Matched stream. Using the onTuple of the logic clause, execute the following:
println("Matched: "+ (rstring)num)
9. Code a second Custom operator that has an alias of Usink. This operator is to
observe the Unmatched stream. Using the onTuple of the logic clause execute
the following:
println("Unmatched: "+ (rstring)num)
10. Save your work.
11. Finally create a file called keyfile.txt in the default data directory. In the Project
Explorer view, right-click on data and select New->File. Give it a File name of
keyfile.txt. Click Finish.
12. Add two values to the file, 2 and 5 on separate lines. Save the file and close it.
2
5

Task 5. Build and run your application.


1. Once again, since we are printing out the results, you may like to build a standalone application. Then launch your application. It will terminate on its own. If you set your build type to standalone before you do a save of your code, then a standalone application will automatically be built.
2. Look at the printed results. You should see several initial iterations of the data where all of the values are Unmatched. Then once keyfile.txt is read by the FileSource operator and that data is passed to the DynamicFilter operator, you should begin to see that the values of 2 and 5 are being Matched.
3. Close all opened tabs. Contract your DynamicFilter project.


Code solutions
Code for Filter and Functor operator
composite Main {
    graph
        stream<uint32 time, rstring roomId, rstring roomType,
               float64 temp, uint32 lastMotion> SensorSource = FileSource() {
            param
                file   : "/home/labfiles/room_sensor.dat";
                format : csv;
        }
        stream<SensorSource> LabOnly = Filter(SensorSource) {
            param
                filter : roomType == "L";
        }
        stream<rstring labName, float64 cTemp> NewLabData = Functor(LabOnly) {
            output
                NewLabData : labName = "Laboratory " + roomId,
                             cTemp = ((temp - 32.0) * 5.0 / 9.0);
        }
        () as Sink1 = FileSink(NewLabData) {
            param
                file   : "labtempdata.dat";
                format : csv;
        }
}
Code for DynamicFilter operator
composite Main {
    graph
        stream<uint64 num, rstring otherData> Data = Beacon() {
            param
                iterations : 30u;
                period     : 0.1;
            output
                Data : num = IterationCount() % (uint64)10,
                       otherData = "Other Data";
        }
        stream<uint64 key> AddKey = FileSource() {
            param
                file      : "keyfile.txt";
                format    : csv;
                initDelay : 1.5;
        }
        (stream<Data> Matched; stream<Data> Unmatched) =
            DynamicFilter(Data; AddKey) {
            param
                key    : Data.num;
                addKey : AddKey.key;
        }
        () as Msink = Custom(Matched) {
            logic
                onTuple Matched : println("Matched: " + (rstring)num);
        }
        () as Usink = Custom(Unmatched) {
            logic
                onTuple Unmatched : println("Unmatched: " + (rstring)num);
        }
}

End of SPL Editor demonstration


Results:
You coded a Functor operator to select only those records that are for rooms
that are laboratories and convert the recorded room temperature from
Fahrenheit to Celsius. You then coded user logic in a Filter operator and
finally you concluded with coding a DynamicFilter operator.


Demonstration 3
Splits
In this demonstration, you will:

Work with the Split operator



Demonstration 3:
Splits
Purpose:
You want to code a Split operator to split a single stream into two streams.
The split criterion will be based upon an input attribute that is of type list.
You are going to code a FileSource operator that reads records from a file.
Within the data will be a list of integers. The emitted stream from the
FileSource is observed by a Split operator. Based upon the values in the
integer list, the current tuple gets forwarded to the correct stream. The data
is then printed in order to see how the tuples were split.
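The routing rule can be sketched outside of SPL. The Python model below (not SPL; the tuples are invented, not taken from splitfile.dat) assumes the Split operator's documented behavior: each non-negative value in the index list selects an output port, taken modulo the number of output ports, and negative values drop the tuple.

```python
# Rough Python model of SPL's Split operator with a list-valued index
# parameter (illustrative; the example tuples are invented).

def split(tuples, n_ports):
    ports = [[] for _ in range(n_ports)]
    for name, idx in tuples:
        for v in idx:
            if v >= 0:                      # a negative index value drops the tuple
                ports[v % n_ports].append(name)
    return ports

# Two tuples routed to both ports, two routed to a single port:
data = [("a", [0, 1]), ("b", [0, 1]), ("c", [0]), ("d", [1])]
out1, out2 = split(data, 2)
```

A tuple whose list holds both 0 and 1 is emitted on both output ports, which matches what you will observe when running the demonstration.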
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 20 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password


Graphical Editor
Task 1. Split operator.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select
   SPL Project and click Next.
2. Name the project Splitter. Click Finish.
3. Right-click on Splitter and select New->SPL Source File. Take the defaults
   and click Finish.
4. Drag a FileSource operator into the Main composite.
5. The properties of the FileSource are as follows:
a) The output stream name is In.
b) The schema is
name rstring
num int32
idx list<int32>
c) The input file is /home/labfiles/splitfile.dat
d) The format of the file is csv

6. Drag a Split operator (under spl.utility) to the Main composite.

This Split operator is to observe stream In and have two output ports. The stream
for output port 0 is to be called Out1. The stream for output port 1 is to be called
Out2. The output schemas will be the same as the schema of the input stream.
Use the values in the attribute idx to determine to which output port the tuple is to
be directed.
A wrinkle is being added to this operator. I would like for you to add user logic so
that when a tuple is observed on the input port, the following is executed.
println("out: " + name + (rstring)idx);
7. Connect the output port of the FileSource operator to the input port of the Split
operator.
8. Properties for the Output Ports
a) For Port 0, rename the stream name to Out1.
b) Set the output stream schema for Port 0 to <extends> In
c) Add a second output port.
d) For Port 1, rename the stream name to Out2.
e) Set the output stream schema for Port 1 to <extends> In

9. In the Properties view, click Logic. In the logic edit area, code the following:
   onTuple In : println("out: " + name + (rstring)idx);
10. In the Properties view, click Param
a) Add the index parameter
b) The value for the index parameter is idx.

11. Use a single Custom operator to print whether a tuple is arriving on port 0 or
port 1. Drag a Custom operator to the Main composite.
12. Connect the top output port of the Split operator to the newly appeared input
port for the Custom operator.
13. Connect the second output port of the Split operator to a second "magically
appearing" port for the Custom operator.
14. In the Properties view for the Custom operator, click General.
15. Rename the alias to PrintSink.
16. In the Properties view, click Logic. In the Logic edit area, code the following:
onTuple Out1: println("Out1");
onTuple Out2: println("Out2");
17. Save your work.

Task 2. Build and run your application.


1. Build your application as a standalone application. Once again we are printing
   to the console.
2. Launch your application. The application will terminate automatically.
3. Looking at the output, you will see that the first two tuples were emitted on both
   output ports. The next two tuples are only emitted on a single output port.
4. Close the opened tabs in the Editor view and contract the Splitter project as
   well.

End of SPL Graphical Editor demonstration


SPL Editor
Code solution follows this demonstration.

Task 1. Split operator.


1. In Eclipse create a new SPL project by clicking File->New->Project. Select
   SPL Project and click Next.
2. Name the project Splitter. Click Finish.
3. Right-click on Splitter and select SPL Source File. Take the defaults and
   click Finish.
4. Close the SPL Graphical editor and open Main.spl using the SPL editor.
5. Code a FileSource operator with the following specifications.
a) The output stream name is In.
b) The output schema is:
name rstring
num int32
idx list<int32>
c) The file to be read is /home/labfiles/splitfile.dat
d) The format of the file is csv

6. Code a Split operator. This Split operator will observe stream In and have two
output ports. The stream for output port 0 is to be called Out1. The stream for
output port 1 is to be called Out2. The output schemas for both output ports will
be the same as the input schema. You will use the values in the attribute idx to
determine to which output port the tuple is to be directed.
We are going to add a wrinkle to this operator. I would like for you to add user
logic so that when a tuple is observed on the input port, the following is
executed.
println("out: " + name + (rstring)idx);
The Split operator's properties:
a) The logic clause will be
logic onTuple In : println("out: " + name + (rstring)idx);
b) The param clause will be
param index: idx;
Code a single Custom operator to print whether a tuple is arriving on port 0 or
port 1.


7. The properties for the Custom operator:
   a) The alias for the operator is PrintSink
   b) Code the following logic clause:
      logic onTuple Out1 : println("Out1");
            onTuple Out2 : println("Out2");
8. Save your updates.

Task 2. Build and run your application.


1. Build your application as a standalone application. Once again we are printing
   to the console.
2. Launch your application. The application will terminate automatically.
3. Looking at the output, you will see that the first two tuples were emitted on both
   output ports. The next two tuples are only emitted on a single output port.
4. Close the opened tabs in the Editor view and contract the Splitter project as
   well.


Code solution
composite Main {
    graph
        stream<rstring name, int32 num, list<int32> idx> In =
            FileSource() {
            param
                file   : "/home/labfiles/splitfile.dat";
                format : csv;
        }
        (stream<In> Out1; stream<In> Out2) = Split(In) {
            logic
                onTuple In : println("out: " + name + (rstring)idx);
            param
                index : idx;
        }
        () as PrintSink = Custom(Out1; Out2) {
            logic
                onTuple Out1 : println("Out1");
                onTuple Out2 : println("Out2");
        }
}

End of SPL Editor demonstration


Results:
You coded a Split operator to split a single stream into two streams. The split
criterion was based upon an input attribute that was of type list.


Windowing and Joins

IBM InfoSphere Streams V4.0
Unit 7 Windowing and Joins

Demonstration 1
Joins
In this demonstration, you will:
Use the Join operator



Demonstration 1:
Joins
Purpose:
You want to code a Join operator to join an electrical company's rate data
with customers' usage data in order to determine billing information.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 20 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password


Graphical Editor
Task 1. Background information.
In order for you to write the correct Join operator, you need some background
information and some assumptions.
Hourly records for electrical power consumption are captured for each home
using smart meters and are streamed every twenty-four hours. (For testing
purposes, this data will be in a file, elec_usage.dat.)
The electrical power company has rates based upon time of day usage that differ
with the summer and winter seasons. (For testing purposes, this data will be in a
file, elec_pricing.dat.)
Power consumption tuples are to be joined with the appropriate rate tuple in
order to calculate billing information.
1. Look at the test data. In Eclipse select File->Open File. Drill down on
   File System->home->labfiles. Double-click on elec_pricing.dat.

There are five records for the summer season: two sets of hours designated as
off-peak, two designated as intermediate, and one designated as peak, each with
its own rate. There are also records for the winter season; note that the first
winter rate record has a delay value.
For this to work, rate tuples for a particular season must be held for a period of
time. Then when it is time to switch to another season's rates, the rates for the
new season are read and held. To make this process flexible, a version number
has been assigned to each group of rate records. Through the use of an attribute
delta sliding window that is based upon this version number and a delta amount
of 0, the power company could at some time in the future increase the billing
granularity without changing the application.
For testing purposes, there are both summer and winter rates in the pricing file
and there are both summer and winter usage records in the usage file. To be
able to simulate matching summer pricing with summer usage and winter pricing
with winter usage, delay values in the data must be used.
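As an illustration only (this is a Python sketch, not the operator's implementation), an attribute-delta eviction policy with a delta of 0 behaves roughly like this: a newly inserted tuple evicts every stored tuple whose version is more than 0 below its own, so the window always holds a single version's worth of rate records.

```python
# Illustrative model of a sliding window with eviction policy delta(version, 0):
# on insertion, tuples whose version attribute is more than `delta` below the
# newest tuple's version are evicted. Rate labels below are invented.

def insert(window, tuple_, delta=0):
    window = [t for t in window if tuple_["version"] - t["version"] <= delta]
    window.append(tuple_)
    return window

w = []
for t in [{"version": 1, "rate": "summer-offpeak"},
          {"version": 1, "rate": "summer-peak"},
          {"version": 2, "rate": "winter-offpeak"}]:
    w = insert(w, t)
```

After the first version-2 tuple arrives, all version-1 tuples have been evicted, which is why the pricing window only ever holds the current season's rates.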


2. Use the same technique to open the elec_usage.dat file.


Notice that the first record has a delay of 5 seconds. This is to make sure that the
summer season rate records are read before beginning to process the
consumers' records.
3. Scroll about mid-way down in the file and view the DAY 2 records.
Day 2 tuples are to be matched against the winter season rate tuples so there
has to be a delay reading day 2 records until the winter season rate records have
been read.
Was there some mystical reason for the specified delay values? No. They were
chosen because they allowed the simulation to work properly. It may be that you
have to adjust these delay values.

Task 2. Join operator.


1. In Eclipse create a new SPL project by clicking File->New->Project. Select
   SPL Project and click Next.
2. Name the project ElectricJoin. Click Finish.
3. Right click on ElectricJoin and select New->SPL Source File. Take the
defaults and click Finish.
Two streams will be used to read in the electric company's pricing rates and the
customers' usage data. Those two streams will be joined based upon the
criterion listed below.
4. Drag a FileSource operator to the Main composite.
a) Rename the Alias to Company.
b) Output stream name is Pricing
c) Schema for the Pricing stream:
startTime uint32
endTime uint32
version uint32
season rstring
ratecat rstring
price decimal32
d) Param
file - /home/labfiles/elec_pricing.dat
format - csv
hasDelayField - true


5. Drag a second FileSource operator to the Main composite.
   a) Rename the Alias to Consumer.
   b) Output stream name is Usage
   c) Schema for the Usage stream:
      time       uint32
      device     rstring
      wattshours decimal32
   d) Param
      file - /home/labfiles/elec_usage.dat
      format - csv
      hasDelayField - true
6. Drag a Join operator to the Main composite.
7. Connect the output port for the Consumer FileSource to the top port of the
   Join operator.
8. Connect the output port for the Company FileSource to the bottom port of the
   Join operator.
9. In the Properties view for the Join operator, click Input Ports. (Change the
   port aliases. Although this is not a requirement, later on, it will make things
   less confusing.)
   a) For Port 0 rename the Alias to ElecUse.
   b) For Port 1 rename the Alias to ElecRate.

10. The output stream for the Join operator is ElectricBill.
11. The schema for ElectricBill is
    time    uint32
    season  rstring
    ratecat rstring
    device  rstring
    kW      decimal32
    rate    decimal32
    bill    decimal32


12. In the Properties view, click Window.
    a) Edit the ElecUse port. Join operators must use a Sliding window. This
       application requires a one-sided join for the usage data. The Eviction
       policy should stay as count(value). The Eviction policy value needs to
       be 0. Click OK.
    b) Edit the ElecRate port. It, too, must be a Sliding window. To keep all
       tuples for a particular version of the pricing data, select Attribute delta.
       For Eviction policy value, type in version, 0u.
13. For Param
a) Select the match parameter. For its value:
time >= startTime && time < endTime
14. For Output (you will need to set and calculate output attribute values.)
a) Expand ElectricBill.
b) You can see that several of the output attributes will automatically be
assigned values. But three of them need to have a value specified.
kW - wattshours / 1000.00dw
rate - Pricing.price
bill - price * wattshours / 1000.00dw
15. Save your updates.
16. Drag a FileSink operator to Main composite.
17. Connect the output port from the Join operator to the input port of the FileSink
operator.
18. In the Properties view for the FileSink operator for Param
a) file - pricingdetail.dat
b) format - csv
19. Save your work.
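The match condition and output calculations in steps 13 and 14 can be sanity-checked with a small Python model (not SPL; the rates and meter readings below are invented, not taken from the lab files):

```python
# Illustrative Python model of the one-sided join: each hourly usage tuple is
# matched against the pricing tuple whose [startTime, endTime) range covers it,
# and the bill is computed as rate * kWh.

pricing = [
    {"startTime": 0,  "endTime": 7,  "price": 0.05},   # invented off-peak rate
    {"startTime": 7,  "endTime": 19, "price": 0.12},   # invented peak rate
    {"startTime": 19, "endTime": 24, "price": 0.05},
]

def bill_usage(time, wattshours):
    for rate in pricing:
        if rate["startTime"] <= time < rate["endTime"]:   # time >= startTime && time < endTime
            kw = wattshours / 1000.0                      # kW = wattshours / 1000
            return round(rate["price"] * kw, 4)           # bill = price * (wattshours / 1000)
    return None

b = bill_usage(8, 2000.0)
```

Each usage tuple falls inside exactly one pricing interval, which is why the count(0) window on the usage side produces one billing record per meter reading.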


Task 3. Build and run your application.


1. Build and run your application.
2. Remember that there are delay values in the data. So if you are running a
   distributed application, let it run for about 45 seconds, and then terminate your
   application. Due to the type of data source, a standalone application will
   automatically terminate.
3. In the Project Explorer view, refresh the data folder, expand it and open
   pricingdetail.dat.
4. Close the opened tabs in the Editor view.
5. Also, contract the ElectricJoin project as well.

End of SPL Graphical Editor demonstration


SPL Editor
See Background section in the Graphical Editor steps above before performing the
following steps.
Code solution appears after this demonstration.

Task 1. Join operator.


1. In Eclipse create a new SPL project by clicking File->New->Project. Select
   SPL Project and click Next.
2. Name the project ElectricJoin. Click Finish.
3. Right click on ElectricJoin and select SPL Source File. Take the defaults and
click Finish.
4. Close the SPL Graphical editor and open Main.spl using the SPL Editor.
Two streams will be used to read in the electric company's pricing rates and the
customers' usage data. Those two streams will be joined based upon the
criterion listed below.
5. In the Editor view code a FileSource operator that will read the data in the
elec_pricing.dat file.
a) Output stream name is Pricing
b) Schema for the Pricing stream:
startTime uint32
endTime uint32
version uint32
season rstring
ratecat rstring
price decimal32
c) Param
file - /home/labfiles/elec_pricing.dat
format - csv
hasDelayField - true


6. Code a second FileSource operator that will read the data in the elec_usage.dat
   file.
   a) Output stream name is Usage
   b) Schema for the Usage stream:
      time       uint32
      device     rstring
      wattshours decimal32
   c) Param
      file - /home/labfiles/elec_usage.dat
      format - csv
      hasDelayField - true
7. In your Main.spl program add a Join operator that will join the Pricing stream
   with the Usage stream. Your Join operator will
   a) Emit a stream called ElectricBill.
   b) Read the Pricing stream into a sliding window that keeps all tuples for a
      particular version of pricing (delta(version, 0u)).
   c) For the Usage stream be a one-sided join. So the Usage stream will have
      count(0).
d) Remember, usage records are hourly-based but the pricing records are
time period based. You want to match where the time attribute from the
Usage stream is greater than or equal to the startTime attribute from the
Pricing stream and less than the endTime attribute from the Pricing
stream.
e) Since you are outputting customer billing data, you want to emit the
following using an output clause for the ElectricBill stream:
time - from the Usage stream
season - from the Pricing stream
ratecat - from the Pricing stream
device - from the Usage stream
kW - wattshours/ 1000.00dw
rate - price from the Pricing stream
bill - (price * (wattshours / 1000.00dw))

8. Code a FileSink operator that observes the ElectricBill stream.


9. For Param
   a) file - pricingdetail.dat
   b) format - csv
10. Save your updates. (If you are going to use a standalone application, define
    your standalone build and set it as active before you do the save.)

Task 2. Build and run your application.


1. Build and run your application.
2. Remember that there are delay values in the data. So if you are running a
   distributed application, let it run for about 45 seconds, and then terminate your
   application. Due to the type of data source, a standalone application will
   automatically terminate.
3. In the Project Explorer view, refresh the data folder, expand it and open
   pricingdetail.dat.
4. Close the opened tabs in the Editor view.
5. Also, contract the ElectricJoin project as well.


Code solution
composite Main {
    graph
        stream<uint32 startTime, uint32 endTime, uint32 version,
               rstring season, rstring ratecat, decimal32 price> Pricing =
            FileSource() {
            param
                file          : "/home/labfiles/elec_pricing.dat";
                format        : csv;
                hasDelayField : true;
        }
        stream<uint32 time, rstring device, decimal32 wattshours>
            Usage = FileSource() {
            param
                file          : "/home/labfiles/elec_usage.dat";
                format        : csv;
                hasDelayField : true;
        }
        stream<uint32 time, rstring season, rstring ratecat,
               rstring device, decimal32 kW, decimal32 rate,
               decimal32 bill> ElectricBill = Join(Pricing; Usage) {
            window
                Pricing : sliding, delta(version, 0u);
                Usage   : sliding, count(0);
            param
                match : time >= startTime && time < endTime;
            output
                ElectricBill : kW = wattshours / 1000.0dw,
                               rate = price,
                               bill = price * (wattshours / 1000.0dw);
        }
        () as Sink = FileSink(ElectricBill) {
            param
                file   : "pricingdetail.dat";
                format : csv;
        }
}


End of SPL Editor demonstration


Results:
You coded a Join operator to join an electrical company's rate data with
customers' usage data in order to determine billing information.


Aggregation, Punctuation, and Sorting

IBM InfoSphere Streams V4.0
Unit 8 Aggregation, Punctuation and Sorting


Demonstration 1
Aggregate

In this demonstration, you will:

Use the Aggregate operator



Demonstration 1:
Aggregate
Purpose:
This demonstration has you code an Aggregate operator to calculate the
number of records for a particular stock, the average closing price and the
total volume of shares. It will also demonstrate the difference between
groupBy and partitioning.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 20 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password


Graphical Editor
Task 1. Background.
Use a tumbling window to access three tuples. Then calculate the total number of
tuples (makes sense, doesn't it? Tumbling window with three tuples and you need to
count them.), the average price, and the total number of shares. Do this using the
groupBy parameter and then via partitioning in order to see the difference.

Task 2. Aggregate operator - groupBy.


1. In Eclipse create a new SPL project by clicking File->New->Project. Select
   SPL Project and click Next.
2. Name the project Aggregation. Click Finish.
3. Right-click on Aggregation and select New->SPL Source File. Take the
   defaults and click Finish.
4. Drag a FileSource operator to the Main composite.
5. Specify the following properties for the FileSource operator.
   a) Output stream name - StockReport
   b) Output schema
      symbol       rstring
      dateTime     rstring
      closingPrice decimal32
      volume       uint32

c) file - /home/labfiles/stock_report_nodelay.dat
d) format - csv
6. Drag an Aggregate operator to the Main composite.
   a) Connect the output port of the FileSource operator to the input port of the
      Aggregate operator.
   b) Output stream name - StockReduced
   c) Output schema will be
      symbol    rstring
      recordCnt int32
      avgPrice  decimal32
      volume    uint32
   d) In the Properties view, click Param.
      Add a groupBy parameter.
      Set the groupBy parameter value to symbol.

e) Use a tumbling window that keeps three tuples (count(3)). In the
   Properties view, click Window. Change Eviction policy value to 3.
f) In the Properties view, click Output. Expand StockReduced.
   symbol - (This will be the symbol value for one of the three tuples that
   are in a window. But since you will have specified groupBy, the value
   for all three symbols will be the same. So you could choose Any(symbol).)
   recordCnt - the count of the number of tuples ( Count() ). (Count() is of
   type int32.)
   avgPrice - the average closingPrice ( Average(closingPrice) )
   volume - the total volume ( Sum(volume) )
7. Drag a FileSink operator to the Main composite.
   a) Connect the output port of the Aggregate operator to the input port of the
      FileSink operator.
   b) file - stock_reduced.dat
   c) format - csv
8. Save your updates.

Task 3. Build and run your application.


1. Build and run your application.
2. After it has run for a few seconds, terminate your application (if you are
   running a distributed application).
3. In the Project Explorer view, refresh the data folder, expand it and open
   stock_reduced.dat.


Task 4. Discuss the results.


The viewed results are probably not what you expected. The count for each symbol
is only one but the window size was count(3).
1. Look at the input data: /home/labfiles/stock_report_nodelay.dat.
The input shows that in each set of three tuples, there are no symbols that are
duplicated. Even though you specified groupBy, if the symbol values are unique
in the window, then you are just going to have groups with only a single tuple.
Grouping only makes sense when there are multiple values to group and when
you want to apply the Aggregate operator against all of the groups in the
window.
Change the application to work as intended.
2. Change the Param for the Aggregate operator. Remove the groupBy
parameter and replace it with the partitionBy parameter. The value for
partitionBy is still symbol.
3. Since you are using the partitionBy parameter, then you need to define the
window as being partitioned.
a) In the Properties view, select Window.
b) Edit the input stream and select the Partitioned checkbox.
4. Save your work. Your application will be rebuilt.
5. Launch your application again.
6. Open stock_reduced.dat again. This time the results should be what you
   expected.
   Because your window is defined as partitioned, the application will actually
   create a window for each unique symbol value. Whenever any one of those
   sub-windows has three tuples, the Aggregate operator fires for just the tuples
   in that one sub-window.
7. Cancel your running job (if running as a distributed application).
8. Close any opened tab and contract the Aggregation project.
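The contrast between groupBy and partitionBy seen in this task can be modeled in a few lines of Python (not SPL; the stock symbols and prices are invented): with groupBy, a single tumbling window of three tuples fires and aggregates each group it happens to contain, while with partitionBy each symbol fills its own three-tuple window.

```python
from collections import defaultdict

# Illustrative models of Aggregate with a tumbling count(3) window.

def group_by(tuples):
    # One shared window of 3 tuples; when it fires, aggregate per group inside it.
    out = []
    for i in range(0, len(tuples), 3):
        window = tuples[i:i + 3]
        groups = defaultdict(list)
        for sym, price in window:
            groups[sym].append(price)
        out += [(sym, len(p)) for sym, p in groups.items()]
    return out

def partition_by(tuples):
    # A separate window per symbol; fires only when that partition holds 3 tuples.
    out, parts = [], defaultdict(list)
    for sym, price in tuples:
        parts[sym].append(price)
        if len(parts[sym]) == 3:
            out.append((sym, 3))
            parts[sym] = []
    return out

# Symbols never repeat within any set of three consecutive tuples:
data = [("IBM", 1), ("ACME", 2), ("XYZ", 3)] * 3
```

With this data, group_by always reports a count of 1 per symbol (each window holds three distinct symbols), while partition_by eventually reports a count of 3 per symbol once each partition fills.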

End of SPL Graphical Editor demonstration


SPL Editor
Use a tumbling window to access three tuples. Then calculate the total number of
tuples (makes sense, doesn't it? Tumbling window with three tuples and you need to
count them.), the average price, and the total number of shares. Do this using the
groupBy parameter and then via partitioning in order to see the difference.
The solution code follows this demonstration.

Task 1. Aggregate operator - groupBy.


1. In Eclipse create a new SPL project by clicking File->New->Project. Select
   SPL Project and click Next.
2. Name the project Aggregation. Click Finish.
3. Right-click on Aggregation and select SPL Source File. Take the defaults and
   click Finish.
4. Close the SPL Graphical editor and open Main.spl in the SPL editor.
5. In the Editor view code a FileSource operator with the following properties.
   a) Output stream name - StockReport
   b) Output schema
      symbol       rstring
      dateTime     rstring
      closingPrice decimal32
      volume       uint32

c) file - /home/labfiles/stock_report_nodelay.dat
d) format - csv
6. Code an Aggregate operator with the following properties.
   a) Input stream name - StockReport
   b) Output stream name - StockReduced
   c) Output schema will be
      symbol    rstring
      recordCnt int32
      avgPrice  decimal32
      volume    uint32
   d) Specify the groupBy parameter.
   e) Use a tumbling window that keeps three tuples (count(3)).


f) The output clause for StockReduced
   symbol - (This will be the symbol value for one of the three tuples that
   are in a window. But since you will have specified groupBy, the value
   for all three symbols will be the same. So you could choose Any(symbol).)
   recordCnt - the count of the number of tuples ( Count() ). (Count() is of
   type int32.)
   avgPrice - the average closingPrice ( Average(closingPrice) )
   volume - the total volume ( Sum(volume) )
7. Code a FileSink operator with the following properties.
   a) The input stream name - StockReduced
   b) file - stock_reduced.dat
   c) format - csv
8. Save your updates.

Task 2. Build and run your application.


1. Build and run your application.
2. After it has run for a few seconds, terminate your application (if you are
   running a distributed application).
3. In the Project Explorer view, refresh the data folder, expand it and open
   stock_reduced.dat.

Task 3. Discuss the results.


The viewed results are probably not what you expected. The count for each symbol is
only one but the window size was count(3).
1. Look at the input data: /home/labfiles/stock_report_nodelay.dat.
   The input shows that in each set of three tuples, there are no symbols that are
   duplicated. Even though you specified groupBy, if the symbol values are unique
   in the window, then you are just going to have groups with only a single tuple.
   Grouping only makes sense when there are multiple values to group and when
   you want to apply the Aggregate operator against all of the groups in the
   window.


Change the application to work as intended.


2. Change the Param for the Aggregate operator. Remove the groupBy parameter
and replace it with the partitionBy parameter. The value for partitionBy is still
symbol.
3. Since you are using the partitionBy parameter, then you must define the
window as being partitioned.
a) Add the Partitioned parameter to the window definition.
4. Save your work. Your application will be rebuilt.
5. Launch your application again.
6. Open stock_reduced.dat again. This time the results should be what you
   expected.
   Because your window is defined as partitioned, the application will actually
   create a window for each unique symbol value. Whenever any one of those
   sub-windows has three tuples, the Aggregate operator fires for just the tuples
   in that one sub-window.
7. Cancel your running job (if running as a distributed application).
8. Close any opened tab and contract the Aggregation project.


Code solution
Aggregate operator using groupBy
composite Main {
    graph
        stream<rstring symbol, rstring dateTime, decimal32 closingPrice,
               uint32 volume> StockReport = FileSource() {
            param
                file   : "/home/labfiles/stock_report_nodelay.dat";
                format : csv;
        }
        stream<rstring symbol, int32 recordCnt, decimal32 avgPrice,
               uint32 volume> StockReducer = Aggregate(StockReport) {
            window
                StockReport : tumbling, count(3);
            param
                groupBy : symbol;
            output
                StockReducer : symbol = Any(symbol),
                               avgPrice = Average(closingPrice),
                               recordCnt = Count(),
                               volume = Sum(volume);
        }
        () as Sink = FileSink(StockReducer) {
            param
                file   : "stock_reduced.dat";
                format : csv;
        }
}


Aggregate operator using partitionBy


stream<rstring symbol, int32 recordCnt, decimal32 avgPrice,
       uint32 volume> StockReducer = Aggregate(StockReport) {
    window
        StockReport : tumbling, count(3), partitioned;
    param
        partitionBy : symbol;
    output
        StockReducer : symbol = Any(symbol),
                       avgPrice = Average(closingPrice),
                       recordCnt = Count(),
                       volume = Sum(volume);
}

End of SPL Editor demonstration


Results:
This demonstration had you code an Aggregate operator to calculate the
number of records for a particular stock, the average closing price and the
total volume of shares. It also demonstrated the difference between groupBy
and partitioning.


Demonstration 2
Sort and Punctor

In this demonstration, you will:


Use the Sort and Punctor operators



Demonstration 2:
Sort and Punctor
Purpose:
You want to code a Sort operator and a Punctor operator in order to group
transactions for the same state, as well as to highlight transactions that are
equal to or greater than 1000.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 40 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password


Graphical Editor
Gather ten tuples in a window and sort them on the specified sort keys. Then use
the Punctor operator to insert punctuation whenever the value for the state changes.
But just doing that would be too easy so you are to also isolate any transaction
where the amount attribute is 1000 or greater. In this case, isolate means to output
a punctuation both before and after the tuple.

Task 1. Sort operator.


1. In Eclipse create a new SPL project by clicking File->New->Project. Select
   SPL Project and click Next.
2. Name the project SortPunctor. Click Finish.
3. Right click on SortPunctor and select New->SPL Source File. Take the
   defaults and click Finish.
4. Drag a FileSource operator to the Main composite.
5. Specify the following FileSource properties.
   a) Output stream name - SalesDetail
   b) Output stream schema
      storeNumber  uint32
      city         rstring
      st           rstring
      amount       decimal32
   c) file - /home/labfiles/storesales.dat
   d) format - csv
6. Drag a Sort operator to the Main composite.
7. Connect the output port of the FileSource operator to the input port of the
   Sort operator.
8. Set the following Sort operator properties.
   a) Output stream name - SortedDetail.
   b) The schema for the output port is the same as for the input port.
   c) Use a tumbling window that keeps ten tuples (count(10)).
   d) Add a sortBy parameter. Its value is storeNumber, city, st, amount in
      ascending (asc) sequence using the order parameter set to ascending.


Task 2. Punctor operator.


1. Drag a Punctor operator to the Main composite.
2. Connect the output port from the Sort operator to the input port of the
   Punctor operator, and set the following properties.
   a) The Punctor's output stream name - DividedDetail.
   b) The output stream schema is the same as the input stream schema.
   c) Specify the punctuate parameter. You want to compare the value of the
      st attribute between the current tuple and the previous tuple and add a
      punctuation before the current tuple if the state values are not equal
      (st != SortedDetail[1].st).
   d) Or (||), if the amount attribute is greater than or equal to 1000.00, then
      you want to add a punctuation before the current tuple.
   e) Now there is one other predicate that you have to consider. The goal is to
      isolate transactions of 1000.00 or more. We already said that, when the
      amount was greater than or equal to 1000.00, we were going to put a
      punctuation before the current tuple. But you also need to figure out a
      way to get a punctuation after the current tuple as well. This presents a
      problem because you can only specify one position parameter (before /
      after), not both. Change your focus a bit. Move off of the current tuple (the
      nth tuple) to the next tuple (the nth + 1). If you place a punctuation mark
      before the nth + 1 tuple, that is the same thing as placing a punctuation
      mark after the nth tuple. So check the previous tuple's amount value and,
      if it is equal to or greater than 1000.00, write a punctuation before the
      current tuple, which is, as we have determined, after the previous tuple.
      This places punctuation both before and after any tuple with an amount
      greater than or equal to 1000.00. (If you are confused by all of this, great.
      I worked very hard to try to word this step to cause the greatest amount of
      confusion. Draw it out on a piece of paper to see how the logic works.)
      The punctuate and position parameter values should be as follows:
      punctuate - st != SortedDetail[1].st || amount >= 1000.00dw
                  || SortedDetail[1].amount >= 1000.00dw
      position - before
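To see where the punctuation actually lands, trace a hypothetical run of sorted tuples for a single state (the amounts are invented for illustration):

```
amount:       200   650   1500   300   410
punctuation:             ^      ^
```

The first punctuation comes from amount >= 1000.00dw when 1500 is the current tuple; the second comes from SortedDetail[1].amount >= 1000.00dw when 300 is the current tuple, which lands after 1500 and leaves the 1500 tuple isolated between two punctuation marks.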

3. Drag a FileSink operator to the Main composite.
4. Connect the output port of the Punctor operator to the input port of the
   FileSink operator.


5. Specify the following properties.
   a) file - divideddetail.dat
   b) format - csv
6. Save your updates.

Task 3. Build and run your application.


1. Build and run your application.
2. After it has run for a few seconds, terminate your application. (If you are
   running a distributed application.)
3. In the Project Explorer view, refresh the data folder, expand it and open
   divideddetail.dat.
Whoa, what happened to your output? Where is the punctuation?
The FileSink operator does not expect punctuation. So by default it is not going
to output any punctuation. But it can be forced to do so.
4. Add the parameter writePunctuations = true to your FileSink operator.
5. Relaunch your application and look at the data.
6. Close the opened tabs in the Editor view.
7. Also, contract the SortPunctor project as well.

End of SPL Graphical Editor demonstration


SPL Editor
Gather ten tuples in a window and sort them on the specified sort keys. Then use
the Punctor operator to insert punctuation whenever the value for the state changes.
But just doing that would be too easy so you are to also isolate any transaction
where the amount attribute is 1000 or greater. In this case, isolate means to output
a punctuation both before and after the tuple.
The solution code follows this demonstration.

Task 1. Sort operator.


1. In Eclipse create a new SPL project by clicking File->New->Project. Select
   SPL Project and click Next.
2. Name the project SortPunctor. Click Finish.
3. Right click on SortPunctor and select New->SPL Source File. Take the
   defaults and click Finish.
4. In the Editor view, close the open tab and then open Main.spl using the SPL
   editor.
5. Code a FileSource operator with the following properties.
   a) Output stream name - SalesDetail
   b) Output stream schema
      storeNumber  uint32
      city         rstring
      st           rstring
      amount       decimal32
   c) file - /home/labfiles/storesales.dat
   d) format - csv
6. Code a Sort operator with the following properties.
   a) Input stream - SalesDetail
   b) Output stream name - SortedDetail.
   c) The schema for the output port is the same as for the input port.
   d) Use a tumbling window that keeps ten tuples (count(10)).
   e) The sort key (sortBy) is storeNumber, city, st, amount in ascending
      (asc) sequence.


Task 2. Punctor operator.


1. Code a Punctor operator with the following properties.
   a) Input stream - SortedDetail
   b) Output stream name - DividedDetail.
   c) The output stream schema is the same as the input stream schema.
   d) You want to compare the value of the st attribute between the current
      tuple and the previous tuple and add a punctuation before the current
      tuple if the state values are not equal (st != SortedDetail[1].st).
   e) Or (||), if the amount attribute is greater than or equal to 1000.00, then
      you want to add a punctuation before the current tuple.
   f) Now there is one other predicate that we have to consider. Our goal is to
      isolate transactions of 1000.00 or more. We already said that, when the
      amount was greater than or equal to 1000.00, we were going to put a
      punctuation before the current tuple. But we also need to figure out a way
      to get a punctuation after the current tuple as well. This presents a
      problem because we can only specify one position parameter (before /
      after), not both. Let's change our focus a bit. Let's move off of the current
      tuple (the nth tuple) to the next tuple (the nth + 1). If we place a
      punctuation mark before the nth + 1 tuple, that is the same thing as
      placing a punctuation mark after the nth tuple. So if we check the previous
      tuple's amount value and it is equal to or greater than 1000.00, then we
      can write a punctuation before the current tuple, which is, as we have
      determined, after the previous tuple, and thereby place punctuation
      before and after any tuple with an amount greater than or equal to
      1000.00. (If you are confused by all of this, great. I worked very hard to
      word this step to cause the greatest amount of confusion. Draw it out on a
      piece of paper to see how the logic works.)
      The punctuate and position parameter values should be as follows:
      punctuate - st != SortedDetail[1].st || amount >= 1000.00dw
                  || SortedDetail[1].amount >= 1000.00dw
      position - before


2. Code a FileSink operator with the following properties.
   a) Input stream - DividedDetail
   b) file - divideddetail.dat
   c) format - csv
3. Save your updates.

Task 3. Build and run your application.


1. Build and run your application.
2. After it has run for a few seconds, terminate your application. (This assumes
   you are running a distributed application.)
3. In the Project Explorer view, refresh the data folder, expand it and open
   divideddetail.dat.
Whoa, what happened to your output? Where is the punctuation?
The FileSink operator does not expect punctuation. So by default it is not going
to output any punctuation. But it can be forced to do so.
4. Add the parameter writePunctuations: true to your FileSink operator.
5. Relaunch your application and look at the data.
6. Close the opened tabs in the Editor view.
7. Also, contract the SortPunctor project as well.


Code solution
With no punctuation output
composite Main
{
    graph
        stream<uint32 storeNumber, rstring city, rstring st,
               decimal32 amount> SalesDetail = FileSource() {
            param
                file   : "/home/labfiles/storesales.dat";
                format : csv;
        }
        stream<SalesDetail> SortedDetail = Sort(SalesDetail) {
            window
                SalesDetail : tumbling, count(10);
            param
                sortBy : storeNumber, city, st, amount;
        }
        stream<SortedDetail> DividedDetail = Punctor(SortedDetail) {
            param
                punctuate : st != SortedDetail[1].st
                            || amount >= 1000.00dw
                            || SortedDetail[1].amount >= 1000.00dw;
                position  : before;
        }
        () as FileSink_1 = FileSink(DividedDetail) {
            param
                file   : "divideddetail.dat";
                format : csv;
        }
}
FileSink that allows punctuation
() as FileSink_1 = FileSink(DividedDetail) {
    param
        file              : "divideddetail.dat";
        format            : csv;
        writePunctuations : true;
}


End of SPL Editor demonstration


Results:
You coded a Sort operator and a Punctor operator in order to group
transactions for the same state, as well as to highlight transactions that are
equal to or greater than 1000.


Timing and Coordination

IBM InfoSphere Streams V4.0

Unit 9 Timing and coordination

Demonstration 1
Barriers and Switches

In this demonstration, you will


Code a Barrier operator
Code a Switch operator



Demonstration 1:
Barriers and Switches
Purpose:
You will code two applications. One will employ a Barrier operator to combine
related tuples from multiple streams, and the other will demonstrate the
working of the Switch operator.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated time: 30 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password

Task 1. Barrier operator.


Consider the situation where two sets of data, within a tuple, need to be processed
separately. If the processing of one of the sets is not dependent upon the other,
then one might consider processing these sets in parallel. However, normally, data
is in a tuple because ultimately, all of that data will be used together. So if you split
the two sets of data, in order to process them in parallel, you will need to later rejoin
the two sets into a single tuple. There needs to be a mechanism to delay the faster
processing set of data so that it can be married to the appropriate slower
processing set of data. Hence the need for the Barrier operator.
Although it is possible to develop a Barrier demonstration that accomplishes the
above scenario, on a single node system, it would not be obvious as to how the
Barrier operator actually functions. The following demonstration does not utilize the
Barrier operator in the way that it was designed to be used. But it will demonstrate
how the Barrier operator delays data arriving on one port until data arrives on a
second port.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select
SPL Project and click Next.
2. Name the project BarrierProj. Click Finish.
3. Right click on BarrierProj and select New->SPL Source File. Take the defaults
and click Finish.
Set up a Beacon operator that emits a total of ten tuples with each tuple being
emitted every half second.

4. Drag a Beacon operator to the Main composite.
5. Specify the following properties for the Beacon_1 operator:
a) Output stream - S1
b) Output schema
barrierInfo1 rstring
c) Param
period - 0.5
iterations - 10
d) In the Properties view, click Output. Expand S1. Specify a value of
"source1" for the barrierInfo1 attribute.
   Code and connect a Custom operator to the Beacon_1 operator. This Custom
   operator is to print a line each time a tuple is observed and immediately submit
   the tuple.
6. Drag a Custom operator to the Main composite.
7. Click on the output port of the Beacon_1 operator and an input port will appear
   for the Custom operator. Connect the two.
8. Rename the Custom operator's alias to PrtFast.
9. Specify the following properties for the Custom operator
a) Add a new Output Port
b) Output stream name - FastData
c) Output stream schema is the same as the input schema
d) In the Properties view, click Logic. In the logic edit area code the following:
onTuple S1 : {println("Read record");
submit(S1, FastData); }
Set up a second Beacon operator that emits a tuple every half second and do
this ten times.
10. Drag a second Beacon operator to the Main composite.
11. Specify the following properties for the Beacon_2 operator:
a) Output stream - S2
b) Output schema
barrierInfo2 rstring
c) Param
period - 0.5
iterations - 10
d) In the Properties view, click Output. Expand S2. Specify a value of
"source2" for the barrierInfo2 attribute.


12. Drag a Delay operator to the Main composite. Position it to the right of the 2nd
Beacon operator.
13. Connect the output port of the Beacon operator to the input port of the Delay
operator.
14. Specify the following properties for the Delay operator.
a) Output stream - DelayedData
b) Output schema - same as the input stream
c) For Param, set the delay to 5.0
15. Drag a Barrier operator to the Main composite.
16. Connect the output port of PrtFast to one of the input ports of the Barrier
operator. Connect the output port of the Delay operator to the other input port of
the Barrier operator.
17. Specify the following properties for the Barrier operator.
a) Output stream - CombinedData
b) Output schema
source1 rstring
source2 rstring
c) In the Properties view, click Output. Expand CombinedData and set
   source1 - barrierInfo1
   source2 - barrierInfo2
18. Drag a Custom operator to the Main composite.
19. Connect the output port of the Barrier operator to the input port of the Custom
operator.
20. In the Properties view for this Custom operator, click General and rename the
Alias to PrtCombined.
21. In the Properties view, click Logic. In the edit area, code the following:
onTuple CombinedData : println(source1 + " " + source2);
22. Save your work.
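If you want to check your graph against code, the application assembled above corresponds roughly to the following SPL. This is a hand-written sketch, not the source the graphical editor generates - the editor may name streams, aliases, and composites differently:

```spl
composite Main {
    graph
        // Fast path: ten tuples, one every half second
        stream<rstring barrierInfo1> S1 = Beacon() {
            param
                period     : 0.5;
                iterations : 10;
            output
                S1 : barrierInfo1 = "source1";
        }
        // Print each tuple as it is seen, then pass it along
        stream<rstring barrierInfo1> FastData = Custom(S1) {
            logic
                onTuple S1 : {
                    println("Read record");
                    submit(S1, FastData);
                }
        }
        // Slow path: same ten tuples, held for five seconds by Delay
        stream<rstring barrierInfo2> S2 = Beacon() {
            param
                period     : 0.5;
                iterations : 10;
            output
                S2 : barrierInfo2 = "source2";
        }
        stream<rstring barrierInfo2> DelayedData = Delay(S2) {
            param
                delay : 5.0;
        }
        // Barrier pairs one tuple from each input port before emitting
        stream<rstring source1, rstring source2> CombinedData
            = Barrier(FastData; DelayedData) {
            output
                CombinedData : source1 = barrierInfo1,
                               source2 = barrierInfo2;
        }
        () as PrtCombined = Custom(CombinedData) {
            logic
                onTuple CombinedData : println(source1 + " " + source2);
        }
}
```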


Task 2. Build and run your application.


This is an explanation as to what you should expect to see. You have two Beacon
operators that are generating a tuple every half second. The PrtFast operator is
going to print a message each time it receives a tuple and then emit that tuple. You
should see ten printed messages in the first five seconds. The ten tuples emitted by
the Beacon_1 operator will reach the Barrier operator in the first five seconds.
But the ten tuples emitted by the Beacon_2 operator flow through a Delay operator
where they are held for five seconds. So those ten tuples will reach the Barrier
operator five seconds after the other set of tuples. Because there must be a tuple
received on each of the input ports of the Barrier operator in order for the operator
to emit a tuple, the first set of tuples is held until the second set arrives. Then the
tuples are combined and passed to the PrtCombined operator where the data is
printed. This printed data should appear after the printing by the PrtFast operator
has completed.
1. Build a standalone application and then launch it. (Since you are printing data,
you must run as a standalone application.)
2. Close the opened tabs in the Editor view.
3. Also, contract the BarrierProj project as well.

Task 3. The Switch operator.


Use a Beacon operator to generate sixty tuples at a rate of one every half second. A
second Beacon operator is to generate control tuples. Both Beacon operators feed a
Switch operator. The output of the Switch operator feeds a Custom operator that prints
the tuple data. Read the control tuples after a five second delay. The Switch operator
initially is to allow tuples to pass through it. Because of this, you will see printing for five
seconds. Then the second Beacon operator generates and passes control data that
turns off the Switch operator. So for five seconds there will not be any printing. Then the
second Beacon operator generates and sends control data that turns on the Switch
operator. This process is to continue for several iterations.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select
   SPL Project and click Next.
2. Name the project SwitchProj. Click Finish.
3. Right click on SwitchProj and select New->SPL Source File. Take the defaults
   and click Finish.
   This Beacon operator will generate the test data.
4. Drag a Beacon operator to the Main composite.


5. Specify the following properties for the Beacon_1 operator:
a) Output stream - DataFlow
b) Output schema
beaconInfo rstring
c) Param
period - 0.5
iterations - 60
d) In the Properties view, click Output, expand DataFlow, and set beaconInfo
to "Beacon Data".
   This second Beacon operator is to generate the control data. It is to wait five
   seconds before generating the first tuple. The value of the lone attribute in its
   tuple is incremented from zero to nine, repeatedly. Each tuple is to be
   emitted once every second.
6. Drag a second Beacon operator to the Main composite.
7. Specify the following properties for the Beacon_2 operator:
a) Output Stream - ControlData
b) Output schema
setSwitch uint64
c) Param
initDelay - 5.0
iterations - 40
period - 1.0
d) In the Properties view for the Beacon_2 operator, click Output, expand
   ControlData, and set the value for setSwitch to
   IterationCount() % (uint64)10
8. Drag a Switch operator to the Main composite. Position it to the right of the
   Beacon operators.
9. Connect the output port of the Beacon_1 operator to the upper input port for
the Switch operator.
10. Connect the output port of the Beacon_2 operator to the lower input port for the
Switch operator.
The Switch operator, initially, is to allow tuples to pass. Every second it is to
receive a control tuple and check the value of the setSwitch attribute. If the
value is greater than four, the status is set to true; otherwise it is set to false.


11. Specify the following properties for the Switch operator:


a) The output stream schema is the same as the input stream schema from
Beacon_1.
b) Param
initialStatus - true
status - setSwitch > (uint64)4
12. Drag a Custom operator to the Main composite.
13. Click on the output port for the Switch operator and connect it to the input port
for the Custom operator.
14. Specify the following properties for the Custom operator:
a) Click Logic and add the following code. (It assumes that the input stream
name is Switch_1_out0.)
onTuple Switch_1_out0 : println(beaconInfo);
15. Save your work.
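Assembled in code, the Switch application looks roughly like this SPL sketch (again an approximation of what the graphical editor produces; the stream name Switch_1_out0 matches the assumption made in the Custom logic above):

```spl
composite Main {
    graph
        // Data tuples: sixty of them, one every half second
        stream<rstring beaconInfo> DataFlow = Beacon() {
            param
                period     : 0.5;
                iterations : 60;
            output
                DataFlow : beaconInfo = "Beacon Data";
        }
        // Control tuples: start after five seconds, one per second,
        // cycling setSwitch through 0..9
        stream<uint64 setSwitch> ControlData = Beacon() {
            param
                initDelay  : 5.0;
                iterations : 40;
                period     : 1.0;
            output
                ControlData : setSwitch = IterationCount() % (uint64)10;
        }
        // Pass data tuples only while the most recent control tuple
        // evaluated status to true
        stream<DataFlow> Switch_1_out0 = Switch(DataFlow; ControlData) {
            param
                initialStatus : true;
                status        : setSwitch > (uint64)4;
        }
        () as Prt = Custom(Switch_1_out0) {
            logic
                onTuple Switch_1_out0 : println(beaconInfo);
        }
}
```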

Task 4. Build and run your application.


1. Build a standalone application and then launch it. (Since you are printing data,
   you must run as a standalone application.)
   You should see "Beacon Data" print for about five seconds. Then you should
   see the printing pause for about five seconds. Then this process repeats several
   times.
2. Close the opened tabs in the Editor view.
3. Also, contract the SwitchProj project as well.

Results:
You coded two applications. One employed a Barrier operator to combine
related tuples from multiple streams, and the other demonstrated the
working of the Switch operator.


Consistent Regions

IBM InfoSphere Streams V4.0

Unit 11 Consistent regions

Demonstration 1
Consistent Regions

In this demonstration, you will:


Implement a consistent region
Recover from a fault



Demonstration 1:
Consistent Regions
Purpose:
You want to create a Consistent Region over a Beacon and FileSink operator
to create a fault tolerant application.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated time: 30 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password


Graphical Editor
Task 1. Create the operators.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select
   SPL Project from the InfoSphere Streams Studio branch and click Next.
2. Name the project ConsistentRegions. Click Finish.
3. Right click on ConsistentRegions and select New->SPL Source File. Take
   the defaults and click Finish.
   First we will add the Beacon operator that generates our data to be written.
4. Drag a Beacon operator to the Main composite.
5. Edit the Beacon operator's properties:
   a) Name the output stream Sequence
b) Schema for the Sequence stream:
num uint64
c) Param values:
initDelay 5f
iterations 500u
period 1.0f
d) Go to the output tab and set the value of num to IterationCount()

   Now we will add a FileSink to write results to. The operator will include logic
   to simulate a fault.
6. Drag a FileSink operator to the Main composite.
7. Connect the Beacon to the FileSink operator.


8. Edit the FileSink operator's properties:
a) Param values:
file results.txt
b) Go to the Logic tab and write the following code:
onTuple Sequence :
{
// fail once on the 60th tuple
if(num == (uint64)60 && getRelaunchCount() == 0u)
{
abort() ;
}
}
The abort() function causes the processing element for the FileSink operator to
stop execution, simulating a failure in the application. Streams will automatically
attempt to relaunch it. abort() will only execute if the element has not yet been
relaunched (getRelaunchCount() == 0u), so that it won't keep failing every
time it reaches the 60th tuple.

Task 2. Add a Consistent Region.


1. Right click the Beacon operator in the graphical editor.
2. Select Run Annotations -> Add Consistent Region.
   The trigger determines how often a consistent state will be established. It can
   be periodic or operator-driven. For operator-driven triggers, the interval is
   determined by a corresponding parameter in the operator. For example, in the
   case of a Beacon, you would need to set triggerCount, which establishes a
   consistent region every n iterations.
3. Set the trigger to Periodic.
4. Set a period of 0.05. Note that this very short interval is overkill; we are only
   using it so that you will be able to see the checkpoint operation happening a
   little later.
5. Click Apply to save the changes.
6. There should now be a grey bar across the Beacon and FileSink operators in
   the graphical editor, as well as an @ symbol indicating that there are
   annotations for that operator.
7. All applications with consistent regions require a JobControlPlane operator.
   Search for the JobControlPlane operator and add it to your main composite. It
   does not need to be connected to anything or require any configuration.
8. Save your work.
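In code, the annotated application looks roughly like the following SPL sketch. Treat it as an approximation: the graphical editor writes the @consistent annotation for you, and the alias names (Results, JCP) are invented here for illustration.

```spl
composite Main {
    graph
        // The @consistent annotation on the start operator defines the region;
        // a periodic trigger checkpoints every 0.05 seconds
        @consistent(trigger = periodic, period = 0.05)
        stream<uint64 num> Sequence = Beacon() {
            param
                initDelay  : 5.0;
                iterations : 500u;
                period     : 1.0;
            output
                Sequence : num = IterationCount();
        }
        () as Results = FileSink(Sequence) {
            logic
                onTuple Sequence : {
                    // fail once on the 60th tuple
                    if (num == (uint64)60 && getRelaunchCount() == 0u) {
                        abort();
                    }
                }
            param
                file : "results.txt";
        }
        // Required by any application that contains a consistent region
        () as JCP = JobControlPlane() {}
}
```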


Task 3. Build and run your application.


Before running your application, we will do a couple of things so you can see the
consistent regions in action.
1. In the Project Explorer, expand ConsistentRegions -> Resources.
2. Right click data and select New -> File.
3. Name the file results.txt.
4. Open a terminal.
5. Run the following command to go to the output directory of the application:
   cd /home/student/StreamsStudio/workspace/ConsistentRegions/data
6. Now run tailf on the results file so we can watch it being updated.
tailf results.txt
7. Go back to Streams Studio. Keep this terminal open.
Note that when your Streams instance was first set up, two properties were set:
instance.checkpointRepository
instance.checkpointRepositoryConfiguration
These properties are necessary so that your instance can use consistent
regions. The checkpointRepositoryConfiguration property on your image is set
to use /tmp/streamscheckpoint as the directory for checkpointing. This
directory must already be created in order for the consistent region feature of
this demo to work.
Open a Linux console and cd to the /tmp directory.
cd /tmp
Take a look to see if the streamscheckpoint directory has already been
created within /tmp.
ls -l
Look for streamscheckpoint within /tmp. Not there already? If not, then
create the directory and set its permissions to 777.
mkdir /tmp/streamscheckpoint
chmod -R 777 /tmp/streamscheckpoint
ls -l
Confirm the directory is now there and move on to the next step.
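The check-then-create steps above can also be collapsed into one idempotent sequence; mkdir -p succeeds whether or not the directory already exists:

```shell
# Create the checkpoint directory if it is missing, then open up its permissions.
# mkdir -p is a no-op when /tmp/streamscheckpoint already exists.
mkdir -p /tmp/streamscheckpoint
chmod -R 777 /tmp/streamscheckpoint
ls -ld /tmp/streamscheckpoint
```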
8. Build and run your application. Make sure you are running in Distributed mode.
9. Open the Streams Explorer tab.
10. Expand Streams Jobs.

11. Right-click the job (the most recent topmost one, if you have other un-cancelled
jobs) and select Show Instance Graph.
12. Bring up the terminal you opened and position it so you can see it alongside the
instance graph.
13. Once the operator is up and running (The instance graph is all green) you
should see the sequence of numbers being printed in the terminal as they are
written to the file.
14. Also notice the grey bars in the instance graph will periodically change to
>>>>>> symbols. This indicates a checkpoint for the consistent region is being
established.
15. Look for a few things to happen as our FileSink runs into its programmed crash
at the 60th tuple:
a) The terminal will stop printing numbers after 59.
b) The instance graph will turn red and alerts will be thrown.
c) As the FileSink operator is relaunched, the grey bars in the graph will change
to <<<<<<, indicating a reset to the last checkpoint.
d) The terminal will resume printing numbers, starting with 60. No tuples have
been lost.
16. When you're ready, close the terminal and cancel the job by right-clicking it in
the Streams Explorer and selecting Cancel Job.
17. Close the opened tabs in the Editor view.
18. Also, contract the ConsistentRegions project.

End of SPL Graphical Editor demonstration


SPL Editor
Task 1. Create the operators.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select SPL Project from the InfoSphere Streams Studio branch and click Next.
2. Name the project ConsistentRegions. Click Finish.
3. Right click on ConsistentRegions and select New->SPL Source File. Take the defaults and click Finish.
4. Close the SPL Graphical editor and open main.spl using the SPL Editor.
5. First we will add the Beacon operator that generates our data to be written. In the editor view code a Beacon operator.
   a) Name the stream Sequence
   b) Schema for the Sequence stream:
      num uint64
   c) Param values:
      initDelay 5f
      iterations 500u
      period 1.0f
   d) Set the output as follows:
      output
        Sequence : num = IterationCount() ;
   Now we will add a FileSink to write results to. The operator will include logic to simulate a fault.


6. Code a FileSink operator that takes Sequence as input.
   a) Code the following logic clause:
onTuple Sequence :
{
// fail once on the 60th tuple
if(num == (uint64)60 && getRelaunchCount() == 0u)
{
abort() ;
}
}
b) Param values:
file results.txt
The abort() function causes the processing element for the FileSink operator to stop execution, simulating a failure in the application. Streams will automatically attempt to relaunch it. abort() will only execute if the element has not yet been relaunched, so that it won't keep failing every time it reaches the 60th tuple.

Task 2. Add a Consistent Region.


1. Adding a consistent region is simple. Add the following line, directly above your Beacon operator in the editor:
   @consistent(trigger = periodic, period = 0.05)
   The trigger determines how often a consistent state will be established. It can be periodic or operator-driven. For operator-driven triggers the interval is determined by a corresponding parameter in the operator. For example, in the case of a Beacon, you would need to set triggerCount, which establishes a consistent region every n iterations.
   The period is 0.05. Note that this very short interval is overkill; we are only using it so that you will be able to see the checkpoint operation happening in a little while.
2. If you go back to the graphical editor, there should now be a grey bar across the Beacon and FileSink operators, as well as an @ symbol indicating that there are annotations for that operator.
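For comparison, an operator-driven region over the same Beacon might look like the sketch below (an illustration only; triggerCount is the Beacon parameter mentioned above, and 100u is an arbitrary interval):

```
// Sketch: operator-driven consistent region.
// The Beacon itself decides when a consistent state is established,
// here after every 100 tuples (triggerCount is specific to Beacon).
@consistent(trigger = operatorDriven)
(stream<uint64 num> Sequence) = Beacon()
{
    param
        iterations   : 500u ;
        period       : 1.0f ;
        triggerCount : 100u ;
    output
        Sequence : num = IterationCount() ;
}
```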


3. All applications with consistent regions require a JobControlPlane operator. Code one into your application:
   () as JCP = JobControlPlane()
   {
   }
   It does not need to be connected to anything or require any configuration.
4. Save your work.

Task 3. Build and run your application.

Before running your application we will do a couple of things so you can see the consistent regions in action.
1. In the Project Explorer, expand Consistent Regions -> Resources.
2. Right click data and select New -> File.
3. Name the file results.txt.
4. Open a terminal.
5. Run the following command to go to the output directory of the application:
   cd /home/student/StreamsStudio/workspace/ConsistentRegions/data
6. Now run tailf on the results file so we can watch it being updated:
   tailf results.txt
7. Go back to Streams Studio. Keep this terminal open.
   Note that when your Streams instance was first set up, two properties were set:
   instance.checkpointRepository
   instance.checkpointRepositoryConfiguration
   These properties are necessary so that your instance can use consistent regions. The checkpointRepositoryConfiguration property on your image is set to use /tmp/streamscheckpoint as the directory for checkpointing. This directory must already be created in order for the consistent region feature of this demo to work.
   Open a Linux console and cd to the /tmp directory:
   cd /tmp
   Take a look to see if the streamscheckpoint directory has already been created within /tmp:
   ls -l


   Look for streamscheckpoint within /tmp. Not there already? If not, then create the directory and set its permissions to 777:
   mkdir /tmp/streamscheckpoint
   chmod -R 777 /tmp/streamscheckpoint
   ls -l
   Confirm the directory is now there and move on to the next step.
8. Build and run your application. Make sure you are running in Distributed mode.
9. Open the Streams Explorer tab.
10. Expand Streams Jobs.
11. Right-click the job (the most recent topmost one, if you have other un-cancelled jobs) and select Show Instance Graph.
12. Bring up the terminal you opened and position it so you can see it alongside the instance graph.
13. Once the operator is up and running (The instance graph is all green) you should see the sequence of numbers being printed in the terminal as they are written to the file.
14. Also notice the grey bars in the instance graph will periodically change to >>>>>> symbols. This indicates a checkpoint for the consistent region is being established.
15. Look for a few things to happen as our FileSink runs into its programmed crash at the 60th tuple:
   a) The terminal will stop printing numbers after 59.
   b) The instance graph will turn red and alerts will be thrown.
   c) As the FileSink operator is relaunched, the grey bars in the graph will change to <<<<<<, indicating a reset to the last checkpoint.
   d) The terminal will resume printing numbers, starting with 60. No tuples have been lost.
16. When you're ready, close the terminal and cancel the job by right-clicking it in the Streams Explorer and selecting Cancel Job.
17. Close the opened tabs in the Editor view.
18. Also, contract the ConsistentRegions project.


Code Solution: Main.spl


composite Main
{
    graph
        @consistent(trigger = periodic, period = 0.05)
        (stream<uint64 num> Sequence) = Beacon()
        {
            param
                initDelay  : 5f ;
                iterations : 500u ;
                period     : 1.0f ;
            output
                Sequence : num = IterationCount() ;
        }

        () as JCP = JobControlPlane()
        {
        }

        () as FileSink_1 = FileSink(Sequence)
        {
            logic
                onTuple Sequence :
                {
                    // fail once on the 60th tuple
                    if(num == (uint64)60 && getRelaunchCount() == 0u)
                    {
                        abort() ;
                    }
                }
            param
                file : "results.txt" ;
        }
}

End of SPL Editor demonstration


Results:
You created a consistent region over a Beacon and FileSink operator to create a fault-tolerant application.


Debugging

IBM InfoSphere Streams V4.0


Copyright IBM Corporation 2015
Course materials may not be reproduced in whole or in part without the written permission of IBM.

Unit 13 Debugging

Demonstration 1
Debugging

In this demonstration, you will:
- Set up debugging for an application
- Use debug commands to:
  - Set breakpoints
  - Insert new tuples
  - Update tuples
  - Remove tuples
- Work with metrics



Demonstration 1:
Debugging
Purpose:
You want to work with the debugging capabilities for the InfoSphere Streams
Processing Language. You want to investigate Streams metrics information.
Estimated time: 30 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password

Task 1. Breakpoints.

You are to add a breakpoint on the output port of your FileSource operator. Add a tracepoint on the output port of the Functor operator. Then at some point create an inject point on the input port of the Functor operator.
1. In the Eclipse Project Explorer view, expand the application project called Debugging. Then expand Resources.
2. Open the Main.spl file using the SPL editor. (Right-click Main.spl and select Open With->SPL Editor.) Once again there is a FileSource operator that reads from a data file. Note the filter parameter that only allows tuples that have a ticker attribute equal to either IBM or GOOG to be emitted.
3. You are to use a standalone application for the debug demonstration. You could use a distributed application as well, but for this demonstration, debugging a standalone application will be easier. First prepare your project to create a standalone application. Expand <default_namespace>->Main.
4. Right-click on Main and select New->Standalone Build.
5. To run in debug mode, you must build your application with the -g option. Since you are using the IDE, this can be done automatically. On the Main Standalone dialog, select Debug.
6. From the Streams Debugger (SDB) drop down, select Debug application with SDB. Then click OK.
7. Right-click on Standalone and select Set Active. This rebuilds the application with the debug option.


8. Once the building of the application completes, right-click on Standalone and select Launch. Click the Apply pushbutton and then the Continue pushbutton. The debug window appears. (The window may be behind the Eclipse window.)
9. In the debug window, set a breakpoint on the output port of the FileSource operator. (The name of an operator instance is the name of the stream emitted. In the case of sink type operators, there is no output stream, hence the requirement that you name each instance.) For the FileSource operator, there is only one output port, which is referenced as port 0.
   b StockReport o 0
10. In the debug window, set a tracepoint on the output port of the Filter operator.
   t SelectedStocks o 0
11. In the debug window, allow your application to execute. Enter a g.
12. Look in the debug window. Displayed is the first tuple.
13. In the debug window, enter c 0 to continue the breakpoint. The next tuple is displayed.
14. In the debug window, enter s 1 t. This displays the tracepoint cache in a table format. There is only one tuple in the cache and that is the first tuple, which has a ticker of IBM.
15. The current tuple that is being displayed for the breakpoint has a ticker value of GOOG. Based upon the Filter predicate, this tuple is to be emitted by the Filter and therefore should be inserted into the tracepoint cache. But you are not going to let that happen. In the debug window, enter x 0. This deletes the current tuple.
16. Next enter c 0 to continue the breakpoint and a new tuple is displayed.
17. Enter s 1 t to list out the tracepoint cache. The IBM tuple is still the only tuple in the cache. The GOOG tuple was deleted and therefore was never placed into the cache.
18. Update the current breakpoint tuple. Change the ticker attribute from HPQ to IBM.
   u 0 ticker "IBM"
   A ticker equal to HPQ would not have passed the Filter predicate and you would not expect to see a second tuple in the trace cache. But since you just updated the ticker value before the tuple reached the Filter, you should expect to see a second tuple in the trace cache.
19. Continue the breakpoint processing. Enter c 0.
20. Display the tuples in the tracepoint cache. s 1 t.


21. Now set an injection point for the input port of the Filter operator. This creates a
tuple with all attributes set to their default values. (The command starts with a
lower case i. The font below may look like a lowercase l.)
i SelectedStocks i 0
22. Update some of the attribute values as follows. (Note that you must refer to the
probe point that was assigned to the injection point):
u 2 ticker "GOOG"
u 2 volume "10000"
23. Continue the injection point. Enter c 2.
24. Display the tuples in the tracepoint cache. s 1 t. Your new tuple was emitted by
the Functor and was written to the tracepoint cache.
25. Remove the breakpoint and the tracepoint.
r 0
r 1
c *
26. Return to the Eclipse IDE. Terminate your standalone application. (Click on the
red square.)
27. Refresh your data folder and view the contents of result.dat. The second record
is the one for which you changed the ticker from HPQ to IBM. The GOOG tuple
that you deleted is not in the output but the GOOG tuple that you inserted is.
28. Close all of the opened tabs and contract your Debugging project.
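For quick reference, these are the Streams Debugger (sdb) commands exercised in this task, with arguments as used above:

```
b <stream> o <port>        set a breakpoint on an output port
t <stream> o <port>        set a tracepoint on an output port
i <stream> i <port>        create an inject point on an input port
g                          resume (go) application execution
c <probe>                  continue a stopped probe point (c * continues all)
s <probe> t                show a tracepoint cache in table format
x <probe>                  delete the current tuple at a breakpoint
u <probe> <attr> <value>   update an attribute of the current tuple
r <probe>                  remove a probe point
```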

Task 2. Metrics.

This section is to allow you to become more familiar with the metrics capabilities of InfoSphere Streams. To work with the Streams metrics, you have to be running a distributed application.
1. Start your Streams instance. In the Eclipse IDE, click the Streams Explorer tab.
2. Under Instances, right-click default:StreamsInstance@StreamsDomain and select Start instance if your instance is not already running.
3. Click the Project Explorer tab.
4. Build the Main composite in the DebugInstGraph project. Expand the DebugInstGraph project.
5. Drill down on DebugInstGraph-><default_namespace>->Main. Right-click Distributed and select Build.
6. Right-click Distributed and select Launch.
7. Click Apply and then Continue.


8. Return to the Streams Explorer.
9. Right-click Streams Instances->default:StreamsInstance@StreamsDomain and select Show Instance Graph.
10. Move the cursor over some of the operators. The pop-up displays information about that operator: number of tuples observed and emitted, tuple rate, and health of the operator.
11. Select some other color schemes to get an idea as to what they depict.
12. You've previously seen one way to open up the metrics view. Let's try another way. In the top menu of Streams Studio click Window->Show View->Other. Select InfoSphere Streams->Metrics from the listing. The Metrics screen should now be open. Click the Load View button in the upper right (the button looks like a down arrow on top of a folder).
13. A new dialogue box will open asking you to select an instance to add to the Metrics. Select default:StreamsInstance@StreamsDomain from the list and click OK.
14. In the Metrics view, expand default: ->0:Main->Main. You can view metrics on a PE basis or on an operator basis. This application has each operator in its own PE, hence the reason that there are ten operators listed as well as ten PEs.
15. Expand Aggregate_1 and select Input[0]. You can see the number of tuples processed, dropped, queued, etc. over time.
16. Right-click Output[0]:Aggregate_1_out0 and select Show Data.
17. As it turns out for this simple application, there is only a single attribute in the Aggregate operator's output tuple. Select value:uint64. Then click OK.
18. In a second or two a new Properties view is opened and the actual value of the value attribute is being displayed.
19. Below the Properties view titlebar, click Stop. Then close this Properties view.
20. Expand PE:0[Healthy]->Output[0]::Beacon_1_out0. First note that PE:0[Healthy] indicates that PE:0 is running properly.


21. Click on any of the expanded items and you can see the metrics collected at
that particular level.
22. Right-click PE:0[Healthy] and you can see that you can access tracing and log files.
23. In the Streams Explorer right-click the instance and select Set Service Trace
Levels.
24. From this dialog you can set the logging and tracing levels. You can do so for
the overall instance or just focus on a particular service. Click Cancel.
25. Close both the Metrics and Instance Graph view tabs.
26. Make sure that the Properties tab has the focus.
27. In the Streams Explorer, drill down to Streams Domains -> StreamsDomain -> Instances -> default: -> 0:Main. Note the information in the Properties view.
28. Expand 0:Main and drill down on some of the items. Look at the information
that gets displayed for each item in the Properties view. This is another great
source of information that you can use when trying to debug a problem.
29. Right-click 0:Main and select Cancel Job.
30. Right-click default:StreamsInstance@StreamsDomain and select Stop
Instance.
31. Return to the Project Explorer and contract the DebugInstGraph project.
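Much of the same job-level information is also available from the command line; a sketch using streamtool (domain and instance names as used in this lab):

```shell
# Sketch: list jobs and processing elements for the lab instance.
streamtool lsjobs -d StreamsDomain -i StreamsInstance
streamtool lspes -d StreamsDomain -i StreamsInstance
```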
Results:
You worked with the debugging capabilities of the InfoSphere Streams Processing Language and investigated Streams metrics information.


Toolkits

IBM InfoSphere Streams V4.0



Unit 14 Toolkits

Demonstration 1
Utilize the database and data mining toolkits

In this demonstration, you will:
- Score stream data using the IBM-supplied clustering operator
- Write to a database with the ODBCAppend operator
- Explain the toolkit structure for InfoSphere Streams



Demonstration 1:
Utilize the database and data mining toolkits
Purpose:
You will utilize the database and data mining toolkits. You will be introduced
to some of the additional operators that are not part of the SPL Standard
Toolkit. You will use a clustering model to score Stream data that you will
read from a flat file.
Although there are additional toolkits supplied by IBM for InfoSphere Streams, this
demonstration will only deal with the data mining toolkit.
A clustering mining model has been created and exported in a PMML format. That
clustering model will be used to score Stream data that will be read from a flat file.
Note: This exercise has two sets of instructions: one using the SPL Graphical Editor,
and another for coding manually using the SPL Editor, with code solutions posted at the
end.
Estimated time: 30 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password


Graphical Editor
Task 1. Data Mining.

Although there are several data mining operators supplied by the Mining Toolkit for InfoSphere Streams, the demonstration just focuses on the Clustering operator.
The data mining operators require that you create a scoring model using some other data mining tool and export that model into a PMML format, which is then referenced in the Streams data mining operator.
This demonstration has a FileSource operator read a file of client banking information. That client data is scored using a clustering mining model. A Functor removes some of the attributes from the tuple and, initially, the results are written to a file. Then the program will be expanded to have the results written into a MySQL table.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select SPL Project and click Next.
2. Name the project Clustering. Click Finish.
3. Right click on Clustering and select New->SPL Source File. Take the defaults and click Finish.
4. To speed things along, use a schema that was already created for you. Copy the file that has the schema definition to your project directory. Open a command-line session. (Click on the red hat icon and select Applications->System Tools->Terminal.)


5. Type the following copy command:
   cp /home/labfiles/ClientSchema.spl ~/StreamsStudio/workspace/Clustering
   The schema looks as follows:
   type
     Client = tuple<
       rstring client_id,
       rstring age,
       rstring gender,
       rstring marital_status,
       rstring profession,
       rstring nbr_years_cli,
       rstring savings_account,
       rstring online_access,
       rstring joined_accounts,
       rstring creditcard,
       rstring average_balance >;
   The mining schema looks as follows:
   CLIENT_ID
   AGE
   BANKCARD
   JOINED_ACCOUNTS
   NBR_YEARS_CLI
   AVERAGE_BALANCE
   PROFESSION
   SAVINGS_ACCOUNT
   ONLINE_ACCESS
   MARITAL_STATUS
   GENDER
   Of the above attributes, CLIENT_ID is a supplementary attribute. (It was not used in the creation of the model.) All others are active attributes.
6. From the command line, display the STREAMS_SPLPATH environment variable. You will see that the environment variable points to the parent folder of the installed InfoSphere Data Mining Toolkit.
   echo $STREAMS_SPLPATH


7. Drag a FileSource operator to the Main composite.
8. For the FileSource operator, code the following properties.
   a) Click Output Ports
      i) Replace varName with <extends>
      ii) For varType type Client.
   b) Click Param
      i) file - "/home/labfiles/bankcustomers.dat"
      ii) format - csv
9. Next, in the palette area, under Toolkits, expand com.ibm.streams.mining->com.ibm.streams.mining.scoring and drag a Clustering operator to the Main composite. Place it to the right of the FileSource operator.
10. Connect the output port of the FileSource operator to the input port of the Clustering operator.
11. Update the properties for the Clustering operator.
   a) Output Ports:
      Once again use the <extends> option to use the previously defined type called Client when defining the output schema. Also add two additional attributes to the output schema.
      clusterindex - int64
      score - float64
   b) Param:
      model - "/home/labfiles/mining/BankClustering.pmml"


12. Save your work.
   The names of the attributes defined in the Client type must be matched to the attribute names defined in the mining schema. This cannot be accomplished directly using the SPL Graphical editor, so open Main.spl using the SPL editor.
   Before you map the tuple attributes to the mining schema, first look at the top statement in your editor window. Note the use statement. Remember that operators, other than the supplied SPL operators, must be qualified when referenced, and that a way to get around having to qualify an operator is to code a use statement that references the appropriate namespace. The SPL Graphical editor added this use statement for you.
13. Following the model parameter in the Clustering operator, code these statements to map the tuple's attributes to the mining schema. (If there are any errors in the mapping, they are displayed in the Console view.)
   age             : "AGE";
   creditcard      : "BANKCARD";
   joined_accounts : "JOINED_ACCOUNTS";
   nbr_years_cli   : "NBR_YEARS_CLI";
   average_balance : "AVERAGE_BALANCE";
   profession      : "PROFESSION";
   savings_account : "SAVINGS_ACCOUNT";
   online_access   : "ONLINE_ACCESS";
   marital_status  : "MARITAL_STATUS";
   gender          : "GENDER";
14. Save your work. Close the Main.spl tab for the SPL editor. Then return to the
Main.spl tab for the SPL Graphical editor. Specify to replace editor content.
15. Drag a Functor operator to the Main composite. Connect the output port of the
Clustering operator with the input port of the Functor operator.
The output schema for the Functor is as follows:
client_id - rstring
age - rstring
gender - rstring
clusterindex - int64
score - float64
Because this Functor operator is not going to apply a filter, there will be no
parameters defined.


16. Drag a FileSink operator to the Main composite. Connect the output port of the
Functor to the input port of the FileSink.
17. The parameters for the FileSink are:
file - "clusteredresults.dat"
format - txt
18. Save your work.

Task 2. Build and run your application.


1. Build and run your application. If you build a distributed application, after it has run for a few seconds, terminate your application.
2. In the Project Explorer view, for the Clustering project, refresh the data folder, expand it and open clusteredresults.dat.
3. Close the tab for clusteredresults.dat.

Task 3. Write your clustering output to a MySQL table.


1. Next, in the palette, expand com.ibm.streams.db and drag an ODBCAppend operator to the Main composite. Place it under the FileSink operator.
2. Connect the output port of the Functor operator with the input port of the ODBCAppend operator.


3. In the properties for the ODBCAppend operator, click on Param. Specify the following parameters.
   connectionDocument - "/home/labfiles/mining/connection.xml"
   connection - "Mining"
   access - "ClusterSink"
   The unqualified ODBCAppend operator requires its own use statement. But once again, the SPL Graphical editor supplies the use statement for you.
   use com.ibm.streams.db::*;
   The key to the ODBCAppend operator is the connection document. A copy of the file is located at the end of this demonstration. Take a look at it.
   The ODBCAppend operator references a connection called Mining. This is matched to the connection_specification name found in connection.xml. It specifies the database name, mining, and the userid and password used to connect to that database.
   The access parameter of the ODBCAppend operator is matched to the access_specification name. It specifies the table to be accessed. It also specifies which Stream attributes are to be used to insert data into the table. The order of the attributes in the native_schema matches the order of columns in the table. This means that the value in the stream attribute, client_id, gets inserted into the first column in the mining.bankclustering table.
4. Save your work.
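For reference, the target table must line up column-for-column with the native_schema. A plausible DDL for mining.bankclustering, inferred from the connection file at the end of this demonstration (the table already exists on the lab image; this is an illustration only):

```sql
-- Hypothetical DDL inferred from the <native_schema> column list;
-- column order must match the order of attributes in the native_schema.
CREATE TABLE mining.bankclustering (
    client_id    CHAR(9),
    age          CHAR(2),
    gender       CHAR(2),
    clusterindex BIGINT,
    score        DOUBLE
);
```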

Task 4. Build and Launch Your Application.


1. Build your application.
2. MySQL should automatically be started, but one never knows about a lab system. So execute the following from a command line (enter the root password when prompted):
   su
   service mysqld start
   logout
3. Launch your application. If you built a distributed application, after it has run for a few seconds, terminate your application.
4. From a command line, connect to the MINING database. Enter your password, ibm2blue, when prompted.
   mysql mining -p


5. View the contents of the bankclustering table.
   select * from bankclustering;
   You should see 47 rows.
6. Close the opened tabs in the Editor view.
7. Contract the Clustering project as well.

Code Solution: Connection file


<?xml version="1.0" encoding="UTF-8"?>
<st:connections
    xmlns:st="http://www.ibm.com/xmlns/prod/streams/adapters"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <connection_specifications>
    <connection_specification name="Mining">
      <ODBC database="mining" user="student" password="ibm2blue" />
    </connection_specification>
  </connection_specifications>
  <access_specifications>
    <access_specification name="ClusterSink">
      <table tablename="mining.bankclustering"
             transaction_batchsize="2" rowset_size="1" />
      <uses_connection connection="Mining" />
      <native_schema>
        <column name="client_id" type="CHAR" length="9" />
        <column name="age" type="CHAR" length="2" />
        <column name="gender" type="CHAR" length="2" />
        <column name="clusterindex" type="BIGINT" />
        <column name="score" type="DOUBLE" />
      </native_schema>
    </access_specification>
  </access_specifications>
</st:connections>

End of SPL Graphical Editor demonstration


SPL Editor
Task 1. Data Mining.

Although there are several data mining operators supplied by the Mining Toolkit for InfoSphere Streams, the demonstration just focuses on the Clustering operator.
The data mining operators require that you create a scoring model using some other data mining tool and export that model into a PMML format, which is then referenced in the Streams data mining operator.
This demonstration has a FileSource operator read a file of client banking information. That client data is scored using a clustering mining model. A Functor removes some of the attributes from the tuple and, initially, the results are written to a file. Then the program will be expanded to have the results written into a MySQL table.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select SPL Project and click Next.
2. Name the project Clustering. Click Finish.
3. Right click on Clustering and select New->SPL Source File. Take the defaults and click Finish.
4. To speed things along, use a schema that was already created for you. Copy the file that has the schema definition to your project directory. Open a command-line session. (Click on the red hat icon and select Applications->System Tools->Terminal.)


5. Type the following copy command:
cp /home/labfiles/ClientSchema.spl ~/StreamsStudio/workspace/Clustering
The schema looks as follows:
type
    Client = tuple<
        rstring client_id,
        rstring age,
        rstring gender,
        rstring marital_status,
        rstring profession,
        rstring nbr_years_cli,
        rstring savings_account,
        rstring online_access,
        rstring joined_accounts,
        rstring creditcard,
        rstring average_balance >;
The mining schema looks as follows:
CLIENT_ID
AGE
BANKCARD
JOINED_ACCOUNTS
NBR_YEARS_CLI
AVERAGE_BALANCE
PROFESSION
SAVINGS_ACCOUNT
ONLINE_ACCESS
MARITAL_STATUS
GENDER
Of the above attributes, CLIENT_ID is a supplementary attribute. (It was not used in the creation of the model.) All others are active attributes.
6. From the command line, display the STREAMS_SPLPATH environment variable. You will see that the environment variable points to the parent folder of the installed InfoSphere Data Mining Toolkit.
echo $STREAMS_SPLPATH


7. You will have to code a use directive to point to the Clustering operator. Add the following statement to the beginning of your Main.spl source file:
use com.ibm.streams.mining.scoring::*;
8. In the Editor view code a FileSource operator:
The output schema - Client
Output stream name - ClientData
The file being read - /home/labfiles/bankcustomers.dat
File format - csv
9. Next code a Clustering operator that reads the ClientData stream and scores the tuple data using the /home/labfiles/mining/BankClustering.pmml model. Then emit a stream called ResultClustering that is comprised of the Client type plus two additional attributes: clusterindex of type int64 and score of type float64.
10. Save your work.
11. The names of the attributes defined in the Client type must be matched to the attribute names defined in the mining schema. To help you, the following shows the matching. (CLIENT_ID does not need to be matched since it is a supplementary attribute.)
model           : "/home/labfiles/mining/BankClustering.pmml";
age             : "AGE";
creditcard      : "BANKCARD";
joined_accounts : "JOINED_ACCOUNTS";
nbr_years_cli   : "NBR_YEARS_CLI";
average_balance : "AVERAGE_BALANCE";
profession      : "PROFESSION";
savings_account : "SAVINGS_ACCOUNT";
online_access   : "ONLINE_ACCESS";
marital_status  : "MARITAL_STATUS";
gender          : "GENDER";


12. Code a Functor that observes the ResultClustering stream and emits a stream called Shrink with the following attributes:
client_id - rstring
age - rstring
gender - rstring
clusterindex - int64
score - float64
13. Using a FileSink operator, write the stream emitted from the Functor to a file called clusteredresults.dat, which is to be located in your default data directory. Write the data out using the txt format.
14. Save your work.

Task 2. Build and run your application.

1. Build and run your application. If you build a distributed application, after it has run for a few seconds, terminate your application.
2. In the Project Explorer view, for the Clustering project, refresh the data folder, expand it, and open clusteredresults.dat.
3. Close the tab for clusteredresults.dat.

Task 3. Write your clustering output to a MySQL table.

1. Code an ODBCAppend operator. The input stream should be the output stream of the Functor operator (Shrink).
The connection document is - "/home/labfiles/mining/connection.xml"
The connection is - "Mining"
The access is - "ClusterSink"
2. To be able to reference this ODBCAppend operator, another use statement is required. Add the following at the top of your code (under the existing use statement):
use com.ibm.streams.db::*;


3. The key to the ODBCAppend operator is the connection document. A copy of the file is located at the end of this demonstration. Take a look at it.
The ODBCAppend operator references a connection called Mining. This is matched to the connection_specification name found in connection.xml. It specifies the database name, mining, and the userid and password used to connect to that database.
The access parameter of the ODBCAppend operator is matched to the access_specification name. It specifies the table to be accessed. It also specifies which stream attributes are to be used to insert data into the table. The order of the attributes in the native_schema matches the order of columns in the table, so the value in the stream attribute client_id will be inserted into the first column of the mining.bankclustering table.
4. Save your work.

Task 4. Build and Launch Your Application.

1. Build your application.
2. MySQL should automatically be started, but one never knows about a lab system. So execute the following from a command line (enter the root password when prompted):
su
service mysqld start
logout
3. Launch your application.
4. If you built a distributed application, after it has run for a few seconds, terminate your application.
5. From a command line, connect to the MINING database. Enter your password, ibm2blue, when prompted.
mysql mining -p
6. View the contents of the bankclustering table. You should see 47 rows.
select * from bankclustering;
7. Close the opened tabs in the Editor view. Also, contract the Clustering project.


Code Solution: Connection file


<?xml version="1.0" encoding="UTF-8"?>
<st:connections
xmlns:st="http://www.ibm.com/xmlns/prod/streams/adapters"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<connection_specifications>
<connection_specification name="Mining">
<ODBC database="mining" user="student" password="ibm2blue" />
</connection_specification>
</connection_specifications>
<access_specifications>
<access_specification name="ClusterSink">
<table tablename="mining.bankclustering"
transaction_batchsize="2" rowset_size="1" />
<uses_connection connection="Mining" />
<native_schema>
<column name="client_id" type="CHAR" length="9"/>
<column name="age" type="CHAR" length="2" />
<column name="gender" type="CHAR" length="2" />
<column name="clusterindex" type="BIGINT" />
<column name="score" type="DOUBLE" />
</native_schema>
</access_specification>
</access_specifications>
</st:connections>


Code Solution: Writing output to a file

use com.ibm.streams.mining.scoring::*;
composite Main {
    graph
        stream<Client> ClientData = FileSource() {
            param
                file   : "/home/labfiles/bankcustomers.dat";
                format : csv;
        }
        stream<Client, tuple<int64 clusterindex, float64 score>>
                ResultClustering = Clustering(ClientData) {
            param
                model           : "/home/labfiles/mining/BankClustering.pmml";
                age             : "AGE";
                creditcard      : "BANKCARD";
                joined_accounts : "JOINED_ACCOUNTS";
                nbr_years_cli   : "NBR_YEARS_CLI";
                average_balance : "AVERAGE_BALANCE";
                profession      : "PROFESSION";
                savings_account : "SAVINGS_ACCOUNT";
                online_access   : "ONLINE_ACCESS";
                marital_status  : "MARITAL_STATUS";
                gender          : "GENDER";
        }
        stream<rstring client_id, rstring age, rstring gender,
                int64 clusterindex, float64 score> Shrink =
                Functor(ResultClustering) {
        }
        () as Sink = FileSink(Shrink) {
            param
                file   : "clusteredresults.dat";
                format : txt;
        }
}


Code Solution: Writing to a MySQL table

use com.ibm.streams.mining.scoring::*;
use com.ibm.streams.db::*;
composite Main {
    graph
        stream<Client> ClientData = FileSource() {
            param
                file   : "/home/labfiles/bankcustomers.dat";
                format : csv;
        }
        stream<Client, tuple<int64 clusterindex, float64 score>>
                ResultClustering = Clustering(ClientData) {
            param
                model           : "/home/labfiles/mining/BankClustering.pmml";
                age             : "AGE";
                creditcard      : "BANKCARD";
                joined_accounts : "JOINED_ACCOUNTS";
                nbr_years_cli   : "NBR_YEARS_CLI";
                average_balance : "AVERAGE_BALANCE";
                profession      : "PROFESSION";
                savings_account : "SAVINGS_ACCOUNT";
                online_access   : "ONLINE_ACCESS";
                marital_status  : "MARITAL_STATUS";
                gender          : "GENDER";
        }
        stream<rstring client_id, rstring age, rstring gender,
                int64 clusterindex, float64 score> Shrink =
                Functor(ResultClustering) {
        }
        () as Sink = FileSink(Shrink) {
            param
                file   : "clusteredresults.dat";
                format : txt;
        }
        () as DBSink = ODBCAppend(Shrink) {
            param
                connectionDocument : "/home/labfiles/mining/connection.xml";
                connection         : "Mining";
                access             : "ClusterSink";
        }
}


End of SPL Editor demonstration


Results:
You utilized the database and data mining toolkits. You were introduced to some of the additional operators that are not part of the SPL Standard Toolkit. You used a clustering model to score stream data that you read from a flat file.


SPL Functions

IBM InfoSphere Streams V4.0

Unit 15 SPL functions

Demonstration 1
Implement a native function and a non-native function

In this demonstration, you will:

Describe the concept of an SPL user-defined function
List the steps necessary to implement a C++ user-defined function
Describe how to put a user-defined function in a toolkit and make it generally accessible


Demonstration 1:
Implement a native function and a non-native function
Purpose:
In this demonstration, you will implement an SPL user-defined function and a
function written in C++.
After completing this demonstration, you should be able to:
Describe the concept of an SPL user-defined function
List the steps necessary to implement a C++ user-defined function
Describe how to put a user-defined function in a toolkit and make it generally
accessible.
This demonstration works with both a composite function and a native function. The composite function raises an integer to some positive integer power. The native function determines whether a passed value is even or odd.
Estimated time: 45 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password

Task 1. Setup for an SPL Function.

1. In Eclipse create a new SPL project by clicking File->New->Project. Select SPL Project and click Next.
2. Name the project SPLFunction. Click Finish.

Task 2. Code the SPL Function.

This function will calculate an integer raised to some power. We will pass in two integer values and return an integer value. The name of the function is power. It will reside in a namespace of SPLType.
1. Right-click on the SPLFunction project and select New->SPL Source File.
2. Type in a Namespace of SPLType.
3. Uncheck Generate Main composite.
4. Change the File name to Power.spl. Click Finish. Notice that Power.spl was opened in the Editor view but, since it is not a Main composite, there are no graphical capabilities.


5. Right-click the editor canvas and select Open with SPL Editor.
6. The only code displayed should be namespace SPLType;. If it is not there, then code the following:
namespace SPLType;
7. Next code the following after the namespace statement:
public int32 power(int32 num, uint32 exp) {
    mutable uint32 i = 1;
    mutable int32 val = num;
    if (exp == 0u)
        val = 1;
    else
        while (i < exp) {
            val = val * num;
            i++;
        }
    return(val);
}
So what did you just code? The first statement defines a public function called power that returns an integer value. Two integers are passed when the function is invoked. Remember that a variable, by default, cannot be changed. To allow a variable to be changed, it must be defined as mutable. If the exponent is zero, the function returns a value of 1; otherwise it loops to calculate the integer raised to the appropriate exponent.
8. Save your work.

Task 3. Code a Simple Main Composite to Invoke Your Function.

Use the Beacon operator to generate tuples. Use the power function to set one of the tuple's attributes. Use a Custom operator to print out the attribute values.
1. Now code a simple Main composite. Right-click the SPLFunction project and select New->SPL Source File.
2. To show that a called function can reside in a namespace that is different from the Main composite, erase the value in the Namespace field. Click Finish.
3. Since you are calling a function that is not located in the same namespace as the Main composite, you need to code a use directive to give access to the function. In the graphical editor view, right-click inside the Main composite and select Open with SPL Editor.
4. Above the composite Main() line, code the following use statement:
use SPLType::*;


5. Save your work. Close the SPL Editor tab for Main.spl. Accept the replacement of the editor content.
Use the Beacon operator to generate four tuples where each tuple will have the integer 3 raised to an increasing power through the use of the power function.
6. Drag a Beacon operator to the Main composite.
7. In the Properties for the Beacon operator, select Output Ports.
8. The output schema is:
total - int32
exp - uint32
9. Select Param. Specify iterations with a value of 4u.
10. Select Output. Expand Beacon_1_out0.
The value for total - power(3, (uint32)IterationCount())
The value for exp - (uint32)IterationCount()
11. Drag a Custom operator to the Main composite. Connect the output port of the Beacon operator to the input port of the Custom operator. (Remember that the input port for the Custom operator will "magically" appear.)
12. In the Properties for the Custom operator click Input Ports. Note that the name of the input stream is Beacon_1_out0. (You will need to know the name of the generated input stream when coding the onTuple clause.)
13. Select Logic. Type the following code:
onTuple Beacon_1_out0 : println((rstring)exp + " " + (rstring)total);
14. Save your work.

Task 4. Build and run your application.

1. Build your application as a standalone application and launch your application. It will terminate automatically. The results are the exponent and the value three raised to that power.

Task 5. Move Your Function to a Toolkit.

1. From a command line, view the STREAMS_SPLPATH environment variable.
echo $STREAMS_SPLPATH
This STREAMS_SPLPATH variable is one way you can tell Streams where your toolkit(s) are located. STREAMS_SPLPATH is especially useful if you will be developing in both Streams Studio and the command line. If you are doing all your development in Streams Studio, there is an easier way to let Streams Studio know about your toolkit(s). The following steps walk you through this method.


2. A directory has already been created for you in /home/toolkits/inhouse. You are now going to create a directory within that directory, called SPLType, which will hold your new toolkit. From a command line, type the following:
mkdir /home/toolkits/inhouse/SPLType
3. Copy your function to this directory.
cp ~/StreamsStudio/workspace/SPLFunction/SPLType/Power.spl /home/toolkits/inhouse/SPLType
4. The toolkit needs to be indexed before it can be used. From the command line, create and index the toolkit. Give the toolkit a name of inhouse.
spl-make-toolkit -i /home/toolkits/inhouse -n inhouse
5. Make the IDE aware of your new toolkit. Click on the Streams Explorer tab.
6. Expand InfoSphere Streams 4.0.1.0->Toolkit Locations.
7. Right-click Toolkit Locations and select Add Toolkit Location.
8. In the Add Toolkit Location box, click the Directory button. Navigate to /home/toolkits/inhouse and click OK. Then click OK back on the Add toolkit location.
9. You will now notice that there is a (Local) /home/toolkits/inhouse entry under the Toolkit Locations. Expand that and check it out!
10. Return to the Project Explorer. Right-click on info.xml (under Resources) and select Open With->SPL Info Model Editor.
11. Expand Info and right-click on Dependencies. Select New Child->Toolkit.
12. With Toolkit(<toolkit-name>) selected, click on the Properties tab.
13. Change the value for Name to inhouse. (This was the name that you gave the toolkit when you did the toolkit indexing.)
14. Save your work. You will likely notice an error displayed in the console. Now you have the function residing in both your project and in the toolkit. Which one will be used? Let's take the guessing out of it. In the Project Explorer, rename the Power.spl that is in your project (under the Resources->SPLType folder, not under the namespace) to Power.spl.xxx.
15. Your application should have been rebuilt, but if you did not see that happen, then you can build it again.
16. Launch your application.
17. Close your tabs and contract the SPLFunction project.


Task 6. Code a Native Function Called evenodd.

This simple function will be passed an unsigned integer and will return a string that specifies if the integer is even or odd.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select SPL Project and click Next.
2. Name the project CFunction. Click Finish.
3. Right-click on CFunction and select New->SPL Source File. Take the defaults and click Finish.
4. Create a namespace. Right-click on CFunction and select New->SPL Namespace.
5. Type in a namespace name of CType.MyFunctions. Click on the Folder path drop-down. Notice that this namespace can be implemented in two different ways. Select CFunction/CType.MyFunctions. Click Finish.
6. Right-click on CFunction and select New->C++ Native Function.
7. For Namespace, select CType.MyFunctions.
8. Give a name of evenodd.
9. Set the return type to String. Click Next.
10. Change the Header file to evenodd.h. Leave CPPNamespace blank.
11. Click Finish.
12. Expand the CFunction project, then Resources, and then impl. Your C++ artifacts reside in the directories under the impl directory. Right-click on src and select New->File. Give it a name of evenodd.cpp and click Finish.
13. Right-click on include and select New->File. Give it a name of evenodd.h. Click Finish.
14. In evenodd.h type the following header information:
#ifndef evenodd_H_
#define evenodd_H_

// Define SPL types and functions
#include "SPL/Runtime/Function/SPLFunctions.h"

namespace CType {
namespace MyFunctions {
    SPL::rstring evenodd(SPL::uint32 const & num);
}
}
#endif
15. Save your work.


16. In evenodd.cpp type the following implementation code:
#include "evenodd.h"

SPL::rstring CType::MyFunctions::evenodd(SPL::uint32 const & num) {
    if (num % 2)
        return "odd";
    else
        return "even";
}
17. Save your work.
18. In the Main.spl tab, right-click inside the Main composite and select Open with SPL Editor.
19. Above the composite Main() line, type the following:
use CType.MyFunctions::evenodd;
20. Save your work and close this SPL Editor tab of Main.spl. Replace the editor content.
21. Drag a Beacon operator into the Main composite.
22. In the Properties for the Beacon operator, select Output Ports.
23. Define the output schema as:
num - uint32
24. In the Properties, click Param.
25. Set iterations to 10u.
26. In the Properties, click Output.
27. Expand Beacon_1_out0 and set the value of num to (uint32)IterationCount().
28. Use a Functor operator to invoke your function. Drag a Functor operator to the Main composite.
29. Connect the output port of the Beacon operator to the input port of the Functor
operator.
30. In the Properties for the Functor operator, click on Output Ports.
31. Define the output schema to be as follows:
num - uint32
kind - rstring
32. In the Properties, click Output.
33. Expand Functor_2_out0.
34. Set the value of kind to evenodd(num)

35. Use a Custom operator to print the results. Drag a Custom operator to the
Main composite.
36. Connect the output port of the Functor operator to the input port of the Custom
operator.
37. In the Properties for the Custom operator, click Input Ports. Note the name of
the input stream.
38. In the Properties, click Logic.
39. Type the following: (It assumes that the input stream name is Functor_2_out0; update yours with the name you noted on the Input Ports tab if necessary.)
onTuple Functor_2_out0 : printStringLn((rstring)num + " " + kind);
40. Save your work. You will get an error since the evenodd function does not exist.
The evenodd function needs to be compiled and a shared library created. You
need a makefile to do that. Actually, you will use a makefile to build the
application as well. Using a makefile to build the application also allows for the
compilation and linkage of the function at the same time. That way a two-step
process is handled by a single step.
To save you some time, the makefiles are supplied. You can find examples of
these makefiles in the InfoSphere Streams Studio Installation and Users Guide.
41. Here is the gist of what needs to be done. /home/labfiles/CFunction/makefile needs to be copied to your CFunction application directory. From a command line, execute the following:
cp /home/labfiles/CFunction/makefile ~/StreamsStudio/workspace/CFunction
42. /home/labfiles/evenodd/makefile needs to be copied to the impl directory that is under your application directory. From a command line, execute the following:
cp /home/labfiles/evenodd/makefile ~/StreamsStudio/workspace/CFunction/impl
43. Since you are printing out the results, create a standalone application. Under
<default_namespace> right click on Main and select New->Standalone
Build. Then set it to active.
44. Right-click on the CFunction project and select Configure SPL Build.
45. From the Builder type drop-down, select External builder. Keep the filled in
options and click OK.
46. Next you have to work on your function model. Open Resources->CType.MyFunctions->native.function->function.xml.
47. Expand Native functions->Function Set evenodd.h->Prototypes. Select
Function.


48. Then click on the Properties tab.


49. In the Properties, scroll down and change the Value that is under Prototype to
public rstring evenodd(uint32 num)
50. In the Function Model editor, right click on Prototypes and select New
Sibling->Libraries.
51. Expand Libraries and select Library.
52. In the Properties view, scroll down. You are going to update the Include Path,
Lib, and Lib Path.
53. This is where it gets interesting. You need to specify an Include Path but what is
your starting point? Your starting point is the location of the function model
(function.xml). And what is your ultimate destination? It is the include directory
under the impl directory. Click on the Value for the Include Path. An ellipsis
pushbutton is displayed on the right side. Click it.
54. In the Value field, enter ../../impl/include and click Add. (Look at the Project
Explorer. You need to traverse up two levels before you can travel down to the
impl/include directory.) Click OK.
55. Click on the Value for Lib and select the ellipsis pushbutton. Type in evenodd,
click Add and then OK.
56. Click on the Value for the Lib Path and select the ellipsis pushbutton. Type in
../../impl/lib, click Add and then OK.
57. Save the model. (Note: if you see a message that there is an error while toolkits
were loading, just go into the Streams Explorer and refresh the toolkit locations
again)
58. Build a standalone application. Then launch it.
59. If you are stopping here, then close all open tabs and contract the CFunction
project.


Task 7. Extra - Moving a Native Function to a Toolkit.


In your project, the directories CType.MyFunctions and impl are on the same level. This
seems to complicate things when moving your function to a toolkit. Namespaces are
designed to keep objects with the same name from clashing. But since the impl
directory, which ultimately contains your shared library, does not reside under the
namespace directory, the potential exists for function names to clash. So I do not see a
good way of moving a function into a toolkit without requiring modifications to the
function model. Here is merely a suggestion.
1. Under /home/toolkits/inhouse, create a directory called CType.MyFunctions.
2. Under CType.MyFunctions, create two directories, native.function and impl.
3. Copy the function.xml that is in your project
(../CType.MyFunctions/native.function) under native.function.
4. Copy the lib directory that is under impl in your project under impl.
5. Copy the include directory that is under impl in your project under impl.
6. Edit function.xml under CType.MyFunctions/native.function and change the
libPath value to ../impl/lib
7. Change the includePath to ../impl/include
8. Re-index the toolkit:
spl-make-toolkit -i /home/toolkits/inhouse -n inhouse
9. To test, you can quickly create a new project CFunctionTest. Create a main
composite and copy the code for Main.spl from CFunction to the Main.spl for
CFunctionTest. Add a dependency to info.xml and specify a toolkit with the
name of inhouse. Build a normal standalone application. You do not have to
worry about a makefile since you will just be referencing your function.
10. Close all open tabs and contract all open projects.
Results:
In this demonstration, you implemented an SPL user-defined function and a
function written in C++.


SPL C++ Non-Generic Primitive Operators

IBM InfoSphere Streams V4.0

Unit 16 SPL C++ Non-Generic Primitive Operators

Demonstration 1
Code a C++ non-generic primitive operator

In this demonstration, you will:


Outline the steps to create a non-generic primitive operator
Describe how to move a non-generic primitive operator to a toolkit


Demonstration 1:
Code a C++ non-generic primitive operator

You are to write a primitive operator in C++ that observes tuples on a single input port. It is to add two integer attributes from the input tuple together and then multiply the result by an optional parameter value. A new tuple is then emitted on a single output port. Most non-generic primitive operators are written to accomplish a particular task and have known schemas.

Estimated time: 45 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password

The input schema will be:
num1 - int32
num2 - int32
desc - rstring
The output schema will be:
num1 - int32
num2 - int32
desc - rstring
result - int32

Task 1. Non-generic Operator.

1. In Eclipse create a new SPL project by clicking File->New->Project. Select SPL Project and click Next.
2. Name the project CppPrimitive. Click Finish.
3. Right-click on CppPrimitive and select New->SPL Source File. Take the defaults and click Finish.
4. Create a namespace. Right-click on CppPrimitive and select New->SPL Namespace.


5. Type in a namespace name of MyOperators.Utils. Keep the folder path as CppPrimitive/MyOperators.Utils and click Finish.
6. Right-click on CppPrimitive and select New->C++ Primitive Operator.
7. For Namespace select MyOperators.Utils.
8. Specify a name of CppAddOp.
9. Uncheck Generic operator. Click Finish.
10. In the Project Explorer, expand CppPrimitive->Resources->MyOperators.Utils->CppAddOp. Note that a few skeleton templates were created for you.
11. Open CppAddOp_h.cgt to define some of your own variables. Listed are all of the methods for the MY_OPERATOR class. Scroll to the bottom. Under Private: add the following variable definitions:
int multiply;
Mutex _mutex;
12. Save your work.
13. Open CppAddOp_cpp.cgt. In the MY_OPERATOR::MY_OPERATOR() constructor method, place the code that determines if a parameter has been passed to this instance of the operator; if there is a parameter, access it and place its value into the variable multiply. If there is not a parameter, then set the variable multiply equal to 1.
The parameter name is to be multiplier. The good news is that you do not have to worry about whether the value passed for this parameter is of the correct type. The operator model has the required information and lets the compiler do the checking for you. Add the following code:
if (hasParameter("multiplier"))
    multiply = getParameter("multiplier");
else
    multiply = 1;
14. Scroll down through CppAddOp_cpp.cgt until you come to the MY_OPERATOR::process(...) method. (There are several such methods; you want the one for mutating ports.)


15. First add code that forces this method to a single thread and then define the
input and output tuples. Use the assign method that automatically generates the
code to copy input attribute values to their corresponding output attribute values.
Next do the required calculation and assign the result to the output attribute
called result. Finally submit the output tuple on the first (and only) output port.
To simplify things I suggest that you remove all of the sample code from this
method. Then type the following:
AutoPortMutex apm(_mutex, *this);
IPort0Type & ituple = static_cast<IPort0Type &>(tuple);
OPort0Type otuple;
otuple.assign(ituple);
otuple.set_result((ituple.get_num1() + ituple.get_num2())
* multiply);
submit(otuple, 0);
Note: Because there is an input attribute called num1, you are able to access its value using a generated function called ituple.get_num1(). Similarly, to set the value of the output attribute result, you use the generated function otuple.set_result(somevalue).

16. Save your work.

Task 2. Update the Operator Model.

1. There should be a CppAddOp.xml tab that is open. Select it. (If it is not, in the same folder where the skeleton templates are located, right click on CppAddOp.xml and select Open With->Other. Then select SPL Operator Model Editor.)
2. Click on the Properties tab.
3. In the Operator Model editor, expand Operator->C++. Then select Context.
4. In the Properties view, scroll through the different property values that can be modified.
5. In the Operator Model editor, right click on Context and select New Child. Listed are additional elements that you can add to the operator model. Close the menu.
6. For the Operator Model, select Parameters. In the Properties view, expand Misc. You can see that you can specify if any parameters are allowed. By default, this is set to true.


7. Right click on Parameters and select New Child->Parameter. You are now going to specify properties for a particular parameter.
8. In the Properties view, scroll down and expand Misc.
9. In the Name field type in multiplier. (This is why you checked for a parameter called multiplier in the C++ code.)
10. Keep Optional as true. This states that this is an optional parameter. Change the Type to int32. (Once again, this is why you did not have to worry about checking to see if the value passed was numeric.)
11. In the Operator Model editor, expand both Input Ports and Output Ports. The properties for the Port Open Set can be used to specify properties for all input or output ports. You can right click on either Input Ports or Output Ports and add properties that will describe particular input and output ports. For this demonstration, you do not require any changes.
12. Save your Operator Model.
13. In the Edit view, click on the Main.spl tab to return to the SPL Graphical editor for the Main composite.
14. Drag a FileSource operator to the Main composite. You are to read a file called input.dat that is in the default data directory. The format of this file is csv and each tuple has three attributes: two int32 attributes and one rstring attribute.
15. In the Properties view for the FileSource operator, select Output Ports.
16. Define the following schema:
num1 - int32
num2 - int32
desc - rstring
17. In the Properties view, select Param.
18. Add the following parameters:
file - "input.dat"
format - csv
19. Next add the CppAddOp operator that you just coded to the Main composite. In the palette area, expand Toolkits->CppPrimitive->MyOperators.Utils. Then drag CppAddOp to the Main composite.
20. Connect the output port of the FileSource operator to the input port of the CppAddOp operator. (Once again this port will "magically" appear.)
21. Select the CppAddOp operator and in the Properties view, select Param.
22. Add the multiplier parameter. Set the value to 3.
23. In the Properties view, select Output Ports.


24. For the Output port click the Add pushbutton.


25. Define the output schema to be:
num1 - int32
num2 - int32
desc - rstring
result - int32
26. Finally code a Custom operator to print the results. Drag a Custom operator to
the Main Composite.
27. Connect the output port of the CppAddOp operator to the hidden input port of
the Custom operator.
28. Select the Custom operator and in the Properties view, select Input Ports.
Note the name of the input stream. You will need this name when you code the
onTuple statement in the logic clause.
29. In the Properties view, select Logic.
30. Where the cursor is positioned, type the following code. (Note that the name of
your input stream may be different.)
onTuple CppAddOp_2_out0:
println((rstring)num1 + " " + (rstring)num2 + " " +
desc + " " + (rstring)result);
31. Save your work.
32. Now create a file that your application is to read. In your CppPrimitive project, right click on the data folder and select New->File. Give it a name of input.dat.
33. Add the following lines to input.dat and then save it.
25,2,"value should be 81"
30,4,"value should be 102"
34. Build a standalone application and set it active. The Main composite is located
in the default namespace. (Notice that you did not have to worry about
makefiles like you did with your function.)
35. Launch the standalone application.
36. Just for kicks, change the multiplier parameter for the CppAddOp operator to 3 *
1. Save your work and relaunch the application.
37. Close all open tabs and contract your project.
Information: The SPL Graphical editor was aware of the location of the CppAddOp operator and automatically added a use statement. Had you coded the application using the SPL editor, you would have had to code the appropriate use statement.


Task 3. Deploying to a Toolkit.


1. Deploying a C++ non-generic primitive operator to a toolkit is straightforward:
a. Create the directory structure for your namespace under the toolkit root
directory.
b. Copy the primitive operator directory (in your case this would be
CppAddOp) under the namespace directory structure.
c. Index the toolkit by running spl-make-toolkit.
d. Update the information model for your application and define the toolkit
dependency.
Results:
You wrote a primitive operator in C++ that observes tuples on a single input
port. It added two integer attributes from the input tuple together and then
multiplied the result by an optional parameter value. A new tuple was then
emitted on a single output port.


SPL Java Non-Generic Primitive Operators

IBM InfoSphere Streams V4.0

Copyright IBM Corporation 2015
Course materials may not be reproduced in whole or in part without the written permission of IBM.

Unit 17 SPL Java Non-Generic Primitive Operators

Demonstration 1
Code a Java primitive operator

In this demonstration you will:


Outline the steps to create a non-generic primitive operator in Java.
Describe how to move a non-generic primitive operator to a toolkit.



Demonstration 1:
Code a Java primitive operator
Purpose:
You will implement the same non-generic operator that you did in the
previous demonstration except you will be coding in Java.
Estimated time: 45 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password

Task 1. Create an SPL Project.

1. In Eclipse create a new SPL project by clicking File->New->Project. Select SPL Project and click Next.
2. Name the project JavaPrimitive. Click Finish.
3. Right click on JavaPrimitive and select New->SPL Source File. Take the defaults and click Finish.
4. Create a namespace. Right click on JavaPrimitive and select New->SPL Namespace.
5. Type in a namespace name of MyOperators.Utils. Keep the folder path as JavaPrimitive/MyOperators.Utils. Click Finish.
6. Right click on JavaPrimitive and select New->Java Primitive Operator.
7. For Namespace select MyOperators.Utils.
8. Specify a name of JavaAddOp. Click Finish. JavaAddOp.java will open in the editor with template code for a Java primitive operator.


Task 2. Implement the Java Operator.


Now we are going to implement our operator logic by adding some code to
JavaAddOp.java. The entire code for the Java class is at the end of these
demonstration instructions.
The class extends AbstractOperator. Notice the annotations (beginning with @). These serve in place of an XML operator model: they identify the class as a primitive operator and define the ports, libraries, parameters, and so on.
1. Define an integer variable called multiply and initialize it to 1. Place it directly
underneath public class JavaAddOp extends AbstractOperator {
private int multiply = 1;
2. Now code the process method that gets invoked whenever a new tuple arrives.
Here you have to define your input and your output tuples. Remember, for this
operator, there is only one input port and one output port. Use the assign
method to move input tuple attribute values to corresponding output tuple
attribute values. Do your calculation and submit your tuple. Replace the existing
process method with the following code:
@Override
public synchronized void process(StreamingInput<Tuple> port, Tuple tuple) throws Exception {
    StreamingOutput<OutputTuple> out = getOutput(0);
    OutputTuple outTuple = out.newTuple();
    outTuple.assign(tuple);
    outTuple.setInt("result",
        multiply * (tuple.getInt("num1") + tuple.getInt("num2")));
    out.submit(outTuple);
}
3. Finally, you need to add an annotation and get/set methods so we can use
multiply as an optional parameter in our SPL. Add the following code at the
bottom of the file, right before the last closing bracket:
@Parameter(name="multiplier", optional=true)
public void setFilter(int multiply) {
this.multiply = multiply;
}
public int getFilter() {
return multiply;
}


4. Add the following import statement to use the parameter annotation:
import com.ibm.streams.operator.model.Parameter;
5. Save your work.

Task 3. Code Your Main.spl.

1. To speed things along, copy code from the CppPrimitive project. Open the Main.spl for the CppPrimitive project.
2. Right-click the FileSource operator for the CppPrimitive Main composite and select Copy.
3. Right-click in the Main composite for the JavaPrimitive project and select Paste.
4. In the palette for the JavaPrimitive project, expand Toolkits->JavaPrimitive->MyOperators.Utils and drag JavaAddOp to the Main composite.
5. Connect the output port for the FileSource operator to the input port of the JavaAddOp operator.
6. Select the JavaAddOp operator and in the Properties view, select Output Ports.
7. Define the following schema for output port 0:
num1 - int32
num2 - int32
desc - rstring
result - int32
8. In the Properties view, select Param.
9. Add the multiplier parameter and set it equal to 3.
10. Copy the Custom operator from the Main composite of the CppPrimitive project to the Main composite of the JavaPrimitive project.
11. Connect the output port of the JavaAddOp operator to the input port of the Custom operator.
12. Save your work.
13. Go through the steps to build a standalone application and set it to be Active. (Remember that for this demonstration, the Main is under the default namespace.)


Task 4. Launch the Application.

1. Copy the input.dat file under the data folder in the CppPrimitive project to the data folder in the JavaPrimitive project. (Or create a new file and add the following records.)
25,2,"value should be 81"
30,4,"value should be 102"
2. Launch your standalone application.
3. Since the multiplier parameter is optional, you might try removing it to see the effect.
4. Close all opened tabs and contract your projects.

Information: Once again you did not have to code a use statement when using the
SPL Graphical editor.


Full Code Solution - JavaAddOp.java


import com.ibm.streams.operator.AbstractOperator;
import com.ibm.streams.operator.OperatorContext;
import com.ibm.streams.operator.OutputTuple;
import com.ibm.streams.operator.StreamingInput;
import com.ibm.streams.operator.StreamingOutput;
import com.ibm.streams.operator.Tuple;
import com.ibm.streams.operator.model.InputPortSet;
import com.ibm.streams.operator.model.InputPorts;
import com.ibm.streams.operator.model.OutputPortSet;
import com.ibm.streams.operator.model.OutputPorts;
import com.ibm.streams.operator.model.PrimitiveOperator;
import com.ibm.streams.operator.model.InputPortSet.WindowMode;
import com.ibm.streams.operator.model.InputPortSet.WindowPunctuationInputMode;
import com.ibm.streams.operator.model.OutputPortSet.WindowPunctuationOutputMode;
import com.ibm.streams.operator.model.Parameter;

@PrimitiveOperator(name="JavaAddOp", namespace="MyOperators.Utils",
    description="Java Operator JavaAddOp")
@InputPorts({@InputPortSet(description="Port that ingests tuples",
    cardinality=1, optional=false, windowingMode=WindowMode.NonWindowed,
    windowPunctuationInputMode=WindowPunctuationInputMode.Oblivious),
    @InputPortSet(description="Optional input ports", optional=true,
    windowingMode=WindowMode.NonWindowed,
    windowPunctuationInputMode=WindowPunctuationInputMode.Oblivious)})
@OutputPorts({@OutputPortSet(description="Port that produces tuples",
    cardinality=1, optional=false,
    windowPunctuationOutputMode=WindowPunctuationOutputMode.Generating),
    @OutputPortSet(description="Optional output ports", optional=true,
    windowPunctuationOutputMode=WindowPunctuationOutputMode.Generating)})
public class JavaAddOp extends AbstractOperator {
    private int multiply = 1;

    @Override
    public synchronized void initialize(OperatorContext context) throws Exception {
        super.initialize(context);
    }

    @Override
    public synchronized void process(StreamingInput<Tuple> port, Tuple tuple) throws Exception {
        StreamingOutput<OutputTuple> out = getOutput(0);
        OutputTuple outTuple = out.newTuple();
        outTuple.assign(tuple);
        outTuple.setInt("result",
            multiply * (tuple.getInt("num1") + tuple.getInt("num2")));
        out.submit(outTuple);
    }

    @Parameter(name="multiplier", optional=true)
    public void setFilter(int multiply) {
        this.multiply = multiply;
    }

    public int getFilter() {
        return multiply;
    }
}

Results:
You implemented the same non-generic operator that you did in the previous
demonstration except you coded in Java.


SPL Generic Primitive Operators

IBM InfoSphere Streams V4.0

Unit 18 SPL generic primitive operators

Demonstration 1
Code a generic primitive operator

In this demonstration, you will

Write a generic primitive operator


Implement the same type of operation that you did with the non-generic primitive operators

Describe the concept of defining a generic primitive operator


List the steps necessary to implement a generic primitive operator



Demonstration 1:
Code a generic primitive operator
Purpose:
You want to implement the C++ non-generic primitive operator as a generic
primitive operator. The operator will have a single input stream and a single
output stream.
You are essentially going to implement the C++ non-generic primitive operator as a
generic primitive operator. The operator is to only have a single input stream and a
single output stream. The assumption is that the input and output schemas match each
other, except for the addition of an integer attribute called result that is added to the
output stream schema. Other than this one attribute, you will not know the names of the
attributes, the number of attributes, nor their types.
There is a subtle change to the operator as well. Instead of adding two attributes and
multiplying the results by a parameter value, add all input integer attributes together and
multiply the result by a parameter value.
Estimated time: 45 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password

Task 1. Create Your SPL Project.

1. In Eclipse create a new SPL project by clicking File->New->Project. Select SPL Project and click Next.
2. Name the project GenPrimitive. Click Finish.
3. Right click on GenPrimitive and select New->SPL Source File. Take the defaults and click Finish.
4. Create a namespace. Right click on GenPrimitive and select New->SPL Namespace.
5. Type in a namespace name of MyOperators.Utils. Keep the folder path as GenPrimitive/MyOperators.Utils. Click Finish.
6. Right click on GenPrimitive and select New->C++ Primitive Operator.
7. For Namespace select MyOperators.Utils.
8. Specify a name of GenAddOp. Keep Generic operator checked. Click Finish.


Task 2. Code Your Generic Operator.

1. Expand GenPrimitive->Resources->MyOperators.Utils->GenAddOp. The same set of skeleton templates that you saw with the C++ non-generic operator were created for you.
2. Open GenAddOp_h.cgt. Listed are all of the methods for the MY_OPERATOR class. Scroll to the bottom. Under Private:, define the following variables:
int totInt;
Mutex _mutex;
The logic for the generic primitive operator will have to be a little different because you are coding greater flexibility into the operator. Since you do not know how many input attributes are to be totaled or what their names are, you will use a variable, totInt, to sum all of the integer attributes in the input tuple.
3. Save your work.
4. Open GenAddOp_cpp.cgt. Before the
<%SPL::CodeGen::implementationPrologue($model);%>
statement, you need to code some Perl statements that give you access to input port 0, output port 0, and your input and output tuples. Remember, Perl code goes between <% and %> pairs.
<%
my $inputPort = $model->getInputPortAt(0);
my $outputPort = $model->getOutputPortAt(0);
my $inTupleName = $inputPort->getCppTupleName();
my $outTupleType = $outputPort->getCppTupleType();
%>
5. In the constructor method for your operator, MY_OPERATOR::MY_OPERATOR(), you need to write the Perl code that checks to see if there is a parameter passed and, if so, stores the value of that parameter in a Perl variable. If no parameter gets passed, then set the Perl variable equal to 1.
<%
my $multiplyParam = $model->getParameterByName("multiplier");
my $multiply = (not $multiplyParam) ? "1" : $multiplyParam->getValueAt(0)->getCppExpression();
%>


6. Scroll down through GenAddOp_cpp.cgt until you come to the MY_OPERATOR::process(...) method. (Once again, you want the one for mutating ports.)
7. You need to code a combination of both Perl and C++ code. The Perl code generates C++ code; where the C++ code remains the same for every instance of this operator, static C++ code can be written. Here is the code that is required. An explanation of what is going on follows this code snippet.
IPort0Type const & ituple = static_cast<IPort0Type const&>(tuple);
AutoPortMutex apm(_mutex, *this);
<%=$outTupleType%> otuple;
<%my $total = "totInt = 0";
foreach my $attr (@{$inputPort->getAttributes()}) { %>
otuple.set_<%=$attr->getName()%>(ituple.get_<%=$attr->getName()%>());
<%if ($attr->getSPLType() eq "int32") {
$total = $total . " + ituple.get_" . $attr->getName() . "()";
}
}%>
<%=$total%>;
otuple.set_result(totInt*<%=$multiply%>);
submit(otuple, 0);
So what the heck is going on here?
First you are getting access to the input tuple.
IPort0Type const & ituple = static_cast<IPort0Type
const&>(tuple);
Then there is a C++ statement to make this method single threaded.
AutoPortMutex apm(_mutex, *this);


Next is the definition of the output tuple. This prints the value of $outTupleType, followed by otuple.
<%=$outTupleType%> otuple;
And finally, a Perl variable is initialized to the string "totInt = 0".
<%my $total = "totInt = 0";
Next begins the iteration through all of the attributes in the input tuple. For each input attribute, you generate a call to the set method of the corresponding output attribute, passing it the result of the input attribute's get method. Essentially you are setting the output attributes to their corresponding input attribute values. You could have coded the assign method, as was done in the other primitive operator demonstrations, but this technique allows you to see something different.
foreach my $attr (@{$inputPort->getAttributes()}) { %>
otuple.set_<%=$attr->getName()%>(ituple.get_<%=$attr->getName()%>());
Also, for each input attribute, you are checking its type to see if it is a 32-bit integer. If it is, then the get method of that attribute, as well as a plus sign, is concatenated to the Perl variable that was initialized earlier. The value of the Perl variable $total eventually will be a statement that adds each input integer attribute together and places the result in totInt.
<%if ($attr->getSPLType() eq "int32") {
$total = $total . " + ituple.get_" . $attr->getName() . "()";
}
}%>
Finally you print the value of the $total Perl variable.
<%=$total%>;
You set the output attribute, result, equal to the sum of all integer attributes multiplied by the parameter passed in.
otuple.set_result(totInt*<%=$multiply%>);
And then the tuple is submitted on port 0.
submit(otuple, 0);
8. Save your work.


Task 3. Update the Operator Model.

1. Click on the GenAddOp.xml tab.
2. In the Operator Model editor, expand Operator->C++.
3. Right click on Parameters and select New Child->Parameter.
4. In the Properties view, scroll down and expand Misc.
5. In the Name field type in multiplier.
6. Keep Optional as true and change the Type to int32.
7. Save the model. This should build your application.

Task 4. Code the Main.spl.

Once again, the easiest thing to do is to copy operators from the Main.spl in CppPrimitive to your Main composite in GenPrimitive.
1. Copy the FileSource operator from the CppPrimitive project into the Main composite for the GenPrimitive project.
2. In the SPL Graphical editor palette, expand Toolkits->GenPrimitive->MyOperators.Utils.
3. Drag the GenAddOp operator to the Main composite.
4. Connect the output port of the FileSource operator to the input port of the GenAddOp operator.
5. Select the GenAddOp operator. In the Properties view, select Output Ports.
6. Add an output port.
7. Make this output port schema the same as the input port schema with the exception that you are going to add an additional attribute: result - int32. (Suggestion is to use the <extends> option and then add the additional attribute.)
8. In the Properties view, select Param.
9. Add the multiplier parameter and set it to 3.
10. Copy the Custom operator from the CppPrimitive project to the Main composite of the GenPrimitive project.
11. Connect the output port of the GenAddOp operator to the input port of the Custom operator.
12. Save your work.
13. Copy the input.dat that is under the data folder in the CppPrimitive project to the data folder in the GenPrimitive project, or create a new file with the following values:
25,2,"value should be 81"
30,4,"value should be 102"


14. Do the necessary steps to create a Standalone Build and set it to Active.
(Remember that your Main composite is in the default namespace.)

Task 5. Launch the Application.

1. Launch your standalone application.

Task 6. Extra.

1. Since this is the last lab of the class, I am sure that every student will want to try additional permutations. You might try adding additional int32 attributes to your stream (and obviously to your data as well) and verify that the operator is generic in the regard that it will add any number of integer attributes and multiply the result by the parameter value.


Code solution: process()


Here is the code for the process() method.
void MY_OPERATOR::process(Tuple & tuple, uint32_t port)
{
IPort0Type const & ituple = static_cast<IPort0Type const&>(tuple);
AutoPortMutex apm(_mutex, *this);
<%=$outTupleType%> otuple;
<%my $total = "totInt = 0";
foreach my $attr (@{$inputPort->getAttributes()}) { %>
otuple.set_<%=$attr->getName()%>(ituple.get_<%=$attr->getName()%>());
<%if ($attr->getSPLType() eq "int32") {
$total = $total . " + ituple.get_" . $attr->getName() . "()";
}
}%>
<%=$total%>;
otuple.set_result(totInt*<%=$multiply%>);
submit(otuple, 0);
}


Generated C++ Code


void MY_OPERATOR_SCOPE::MY_OPERATOR::process(Tuple & tuple, uint32_t port)
{
IPort0Type const & ituple = static_cast<IPort0Type const&>(tuple);
AutoPortMutex apm(_mutex, *this);
SPL::BeJwrMck0ySvNNQSTRsUmKanFyZlmRanFpTklAJFXApq otuple;
otuple.set_num1(ituple.get_num1());
otuple.set_num2(ituple.get_num2());
otuple.set_desc(ituple.get_desc());
totInt = 0 + ituple.get_num1() + ituple.get_num2();
otuple.set_result(totInt*lit$0);
submit(otuple, 0);
}
Results:
You implemented the C++ non-generic primitive operator as a generic
primitive operator. The operator had a single input stream and a single output
stream.


IBM Training

Copyright IBM Corporation 2015. All Rights Reserved.
