IBM Training
Preface
September, 2015
NOTICES
This information was developed for products and services offered in the USA.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for
information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to
state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any
non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
United States of America
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in
certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these
changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the
program(s) described in this publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of
those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information
concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available
sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the
examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and
addresses used by an actual business enterprise is entirely coincidental.
TRADEMARKS
IBM, the IBM logo, InfoSphere and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in
many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is
available on the web at Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
Adobe and the Adobe logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Copyright International Business Machines Corporation 2015.
This document may not be reproduced in whole or in part without the prior written permission of IBM.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Unit 3. Streams Processing Language Development
Demonstration 1:
Streams Development Environment
Purpose:
This demonstration allows the student to work with the Eclipse development
environment when working with an InfoSphere Streams Processing Language
program. The student will invoke the Streams Processing Language compiler
using the command line as well.
Estimated Time: 45 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password
Important: Each time you restart your image after it has been suspended, check the IP address to make sure that it did not change. If it did change, update /etc/hosts using the above instructions to reflect the new IP address.
Note: If you receive a warning about the Streams domain refreshing and/or a toolkit file missing, ignore it; Streams is looking for a specific toolkit directory which you will create in a later lab.
Eclipse Perspectives
Eclipse perspectives define the sets of views (frames) that are displayed, letting you tailor the workbench to the task at hand.
3. When you started Eclipse, the last perspective used is displayed. Let's see how we got there. In the menu bar select Window->Open Perspective->Other.
4. Select the InfoSphere Streams (default) perspective and click OK.
5. The Eclipse interface has a number of views or frames. You can resize any of
the views. If you double-click on the titlebar for a particular view (it will also have
icons in the right hand corner for minimizing and maximizing), it will either
maximize the view or restore it to the size that it was.
4. In the Project Explorer view, expand the project called Introduction. Then expand the Resources folder. Right click on Main.spl and select Open With->SPL Editor.
Look at the first operator. This TCPSource operator acts as a TCP server and
gets data from clients that connect to this TCP server. The format of the data is
described by the News stream schema. This TCPSource operator will emit the
data via a stream called News.
Next there is a Filter operator. This operator observes the stream emitted by
the TCPSource operator and searches through the summary attribute looking
for the character string Clinton. If the string exists, the tuple is emitted on the
output stream called NewsSearch.
5. At the bottom of the code is the FileSink operator, which writes out a file, result.txt. Since only the file name is specified, the file is written into the data directory that is under the project directory. (This is the default data location in our Streams Studio project. Note that in a production environment this would need to be changed to a location that is available on whatever resources your job is running on that require data directory access.)
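For reference, here is a minimal sketch of what the three operators just examined might look like in SPL. This is an approximation rather than the exact contents of Main.spl: the News schema attributes (agency, category, summary) are taken from the NewsTicker type used later in this course, and the filter is assumed to do a substring search with the standard-library findFirst function.

composite Main {
    graph
        // TCP server: clients connect to port 1234 and push csv records
        stream<rstring agency, rstring category, rstring summary> News = TCPSource() {
            param
                role   : server;
                port   : 1234u;
                format : csv;
        }

        // Forward only tuples whose summary contains "Clinton"
        stream<News> NewsSearch = Filter(News) {
            param
                filter : findFirst(summary, "Clinton", 0) != -1;
        }

        // Write the matching tuples to result.txt in the data directory
        () as Sink = FileSink(NewsSearch) {
            param
                file : "result.txt";
        }
}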
2. In the Project Explorer view, right click again on Main.spl and this time select Open With->SPL Graphical Editor. Listed are graphical representations of the three operators.
3. In the Outline view, the yellow area indicates the portion of the application that is currently displayed in the graphical editor. Since our applications will be relatively simple examples, you can gain some additional real estate by closing the Outline view.
4. Click on the News graphical operator. Then, at the bottom of the screen, click on the Properties tab. (Actually, you might want to maximize the Properties view in order to see things more clearly.)
Take a look at some of the properties.
a) Click the General tab. You can rename the schema or change the
operator.
b) Click the Output Ports tab. You can modify the schema.
c) Click the Param tab. You can add or remove parameters.
3. In the lower right corner of the Eclipse window, you can see a status indicator while the build is in process. (However, by the time you have read the previous sentence, the indicator will have disappeared. But you now know to look for it in the future.)
8. Once again, to see things more clearly, you might want to maximize the Instance Graph view.
9. On the right side, under Color Schemes, you should see that Health is
checked. Click on the triangle to expand Health to see what the various colors
represent.
10. Move your mouse over the News operator. An information window is displayed.
Notice that your operator is healthy. It is just that no tuples have flowed. That is
the alert.
11. In the Color Schemes area, click on Flow[nTuples/s]. Expand it as well to see
what the colors represent.
1. In the Instance Graph, click on the output port of the NewsSearch operator and then right-click. Select Show data.
2. Listed are the attributes for the schema. Click OK to select all attributes. If you
had maximized your Instance Graph view, then you will see that it gets
restored. Also a second Properties view has been opened. This is where your
data is to be displayed.
3. Expand the size of this new Properties view to see all of the columns in the
table.
4. Open a command window (click on the Red Hat Application icon (the red hat) at
the top of the screen and select Applications->System Tools->Terminal) or
use one that is already open.
5. Position the command window so that the Streams instance graph operators
are visible.
6. In the command window, change to the /home/labfiles directory.
cd /home/labfiles
7. Next, execute netcat in order to pass data to your Streams application's TCPSource operator.
nc streamshost 1234 < news_ticker_nodelay.dat
8. Press CTRL-C if necessary to kill the command. Then you can press the up
cursor arrow to recall the previous command and press enter to execute it
again. Do this several times. Notice that when data is flowing, the colors of the
operators in the graph change to show the byte rate.
9. Look in the second Properties view to see the filtered data.
10. Either minimize or close the command window.
11. Return to the Eclipse IDE (make sure that it has the focus) and position the
cursor over one of the operators.
12. You can also get specific port information by positioning the cursor over each
port.
13. In the Streams Explorer view, right click on Streams Jobs->0:Main_0
[Healthy] StreamsInstance@StreamsDomain. Select Metrics->Show
Metrics. Note that the number 0 in 0:Main_0 may vary on your image
depending on how many jobs you have run, etc...
14. In the Metrics tab area, drill down on
default:StreamsInstance@StreamsDomain->Main_0.
15. Click on any of the expanded items. In the right hand frame you can see some
metrics about that item.
16. Expand Main->News. Click Output[0]::News and you can see tuple
information.
17. Close the Metrics tab, the Instance Graph tab, and the Properties tab where
filtered data was displayed.
18. Now you need to stop your application. In the Streams Explorer, under
Streams Jobs, right-click on 0:Main_0 and select Cancel job.
19. Return to the Project Explorer, expand the data directory. (If the data folder
cannot be expanded, then right click on it and select Refresh.)
20. Double-click on result.txt.
Note: Sometimes when using a remote lab environment double-clicking an item
does not work. If that happens to you, select the item and press Enter and possibly
select Open.
(This is the file that was written by the FileSink operator.) If there is a
message that the resource is out of sync..., right click on result.txt and select
Refresh. Displayed should be the tuples where the name Clinton was found in
the summary attribute.
21. You can close any opened Editor tabs and contract your Introduction project.
Unit 4. SPL Programming Introduction
Demonstration 1:
Main Composite Operators
Purpose:
This demonstration gives you a basic understanding as to how to code a main
composite operator using the SPL Graphical editor and how to launch both a
standalone and a distributed application.
This demonstration will accomplish several goals. One goal is to code an SPL
application. There are two ways to do that. One is to code each operator and
its parameters. The second, which will be the way that you will proceed, is to
use the SPL Graphical editor. The second goal is to cover some capabilities
that were not presented in the course material, that is, the design capabilities
of the SPL Graphical editor.
Estimated Time: 30 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password
7. Look at Folder path. Click on the drop-down. Listed are various ways to implement this namespace. It can be implemented using a single directory called sample.proj or it can be implemented as a multi-level directory, /sample/proj. For this demonstration choose sample.proj. Then click Finish.
8. Expand FirstProject and then expand Resources.
9. Now create the source file that will contain the main composite. Right click on
FirstProject and select New->Main Composite.
Look at the namespace. Since only one is defined, it is selected by default.
10. Keep the default name of the Main Composite and the default File name. Click
Finish.
11. Now expand the sample.proj folder (not the namespace). You can see that your source file was placed in that directory. Also look in the Edit view. You have the beginning of your main composite. By default the SPL Graphical editor is used.
2. As a designer, you are not particularly familiar with all of the Streams operators. But you do know that you want an operator that will generate the Hello World message. In the Palette area, under Design, click Operator and drag it onto the canvas. Drop it in Main.
3. You also know that you need a second operator to write the message into a file. So drag a second operator and drop it in Main to the right of the first operator. The Main composite area will automatically enlarge.
4. These two operators are to be connected via a stream. In the Palette view, click Stream and drop it on the left side of Op_1. A light gray square should appear. Position the cursor over the small square. The square should turn green. Click the left mouse button once.
5. Move the cursor to the left side of Op_2. A second green square should appear. Click the left mouse button once. You have now defined an output port for Op_1 being connected to an input port for Op_2.
6. Now you want to define the composition of the message that is sent on the stream. Select the Op_1 icon. It should turn aqua-blue.
7. Click on the Properties tab towards the bottom half of the screen. In the Properties view, in the list of menu items on the left side, click Output Ports.
8. Scroll down; in the Output stream schema table area you already have an attribute called varName defined. Overtype varName and change it to message.
9. In the Type column, click on varType. A little light bulb should appear on the left edge of the entry field. (You may have to do this twice. Go figure.) Make sure that varType is highlighted or that you have totally deleted varType from the field. Once again, as a designer, you may not know all of the Streams data types. So, while holding down the Ctrl key, press the space bar. A list of data types appears. Scroll down and double-click rstring.
Important: Each time you save your Streams application, a build is started to compile the application; the progress messages are shown in the Console view (in the SPL Build console). If you scroll back through the build progress messages, you will see some builds that terminated on errors, with messages in red. This will be a common occurrence while you are building up your application - often there will be errors in compilation until your operators have been built up to a working state. By the time you get to the Build and Launch section of each lab, the compilation errors should go away (if you have coded your application correctly).
10. The stream schema is already defined. But you want a different stream name.
Click the Rename pushbutton for Output stream name. Specify a new name of
Hi and click OK.
11. On the left of the Properties, click Param. (You may have to scroll to find it.)
Click the Add pushbutton. Select Iterations and click OK.
12. In the Value column, type in 1u. (Unsigned value of 1)
13. On the left of the Properties, click Output. Expand Hi. For message : rstring, give it a value of "Hello World". Make sure to include the quotes.
14. In the canvas, click on Op_2. As the programmer and having read the
annotation, you know that you will want Op_2 to be a FileSink operator.
15. In the Properties view, select General. Click the Change pushbutton for
Operator. From the displayed list, select FileSink and click OK.
16. Click the Rename pushbutton. Change the name of this operator to sink.
17. In the Description field, type The result.txt file will be written to the default
data directory.
Since the input port of sink is connected to the output port of Op_1, the schema
definition is automatically passed. So no changes are needed here.
Note: Beginning with version 4.0, Streams applications do not have a default data directory unless you explicitly set one in the build specification. Here, we are simply taking advantage of a feature of Streams Studio, which will provide that specification by default. It works because we only have a single host. Because Streams is a distributed system that does not require a shared file system, you have to be careful when specifying file paths. A process accessing a file must run on a host that can reach it; in general this means specifying absolute paths and constraining where a particular process can run; using relative paths and a default data directory makes the application less portable. (A sketch of the finished composite follows step 21.)
18. On the left of the Properties, click Param. The file parameter is required, so it is
already displayed. Change parameterValue to "result.txt". Include the
quotation marks.
19. Save your work.
20. Click on the Console tab to display the Console view. The console view might
still be the Streams Studio Console. To switch to the SPL Build console, click
the down arrow for the screen icon that is fourth from the right on the Console
titlebar and select SPL Build.
Note that by just saving your work, your program was automatically built.
21. In the Project Explorer view, right click on Main.spl. Select Open With->SPL Editor to see the generated SPL code. Any changes made to the code are reflected in the SPL Graphical editor as well.
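For reference, the generated code should look roughly like the following sketch (your generated names and layout may differ slightly):

composite Main {
    graph
        // Op_1: a Beacon that emits a single tuple carrying the greeting
        stream<rstring message> Hi = Beacon() {
            param
                iterations : 1u;
            output
                Hi : message = "Hello World";
        }

        // sink: writes result.txt into the Studio-provided data directory.
        // In production, prefer an absolute path reachable from whatever
        // host this operator is placed on.
        () as sink = FileSink(Hi) {
            param
                file : "result.txt";
        }
}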
2. Before you can launch a distributed application, your Streams domain and instance must be started. We will check to make sure these are started. Click on the Streams Explorer tab.
3. Expand Streams Instances. If your instance is running, you will see something similar to: default:StreamsInstance@StreamsDomain [Running, hh:mm:ss]
4. If for some reason your instance was not running, you could right click on the instance here and select Start Instance. (You might want to switch your console window to Streams Studio Console to verify that the instance started correctly. You can also scroll the Streams Explorer to the right to view the status of your instance.)
5. Return to the Project Explorer. Under FirstProject right click on Distributed and select Launch. (The distributed application was built when you saved your work. Remember, it was originally set to being active.)
6. Click the Apply pushbutton and then the Continue pushbutton. Click OK on the Warning: Launch dialog.
7. Right click on data and select Refresh. Drill down and open the result.txt file. You should see the same results as you saw with the standalone application. Close result.txt.
8. Since this is a distributed application, you must terminate the running application. Return to the Streams Explorer and expand Streams Jobs. Right click on your job and select Cancel job.
9. Return to the Project Explorer. Close your opened tabs and contract your FirstProject project.
Results:
This demonstration gave you a basic understanding as to how to code a main
composite operator using the SPL Graphical editor and how to launch both a
standalone and a distributed application.
This demonstration accomplished several goals. One goal was to code an SPL application. There were two ways to do that. One was to code each operator and its parameters. The second, which you did, was to use the SPL Graphical editor. The second goal was to cover some capabilities that were not presented in the course material, that is, the design capabilities of the SPL Graphical editor.
Unit 5. Adapter Operators
Demonstration 1:
Source and Sink Type Operators
Purpose:
This demonstration has the student code a FileSource operator to read from a
file and then use a FileSink operator to externalize the data.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 30 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password
Graphical Editor
Task 1. Requirements for Source and Sink operators.
Here are the requirements for your source and sink operators. This portion of the
demonstration will step you through using the SPL Graphical editor. If you desire,
you can use the SPL Editor to actually code the operators.
Use a FileSource operator to read the file /home/labfiles/stock_report_nodelay.dat
that is in a csv format. It will emit a stream called StockReport. This stream will be
observed by a FileSink operator. That operator will write the tuple data to a file
called copyfile.dat that will be located in the default data directory created by
Streams Studio. This file will also be in a csv format.
If you choose to code the operators manually, then proceed to the SPL Editor portion of this demonstration.
1. If your Eclipse development environment is not open, double-click on the
Streams Studio icon on your desktop. Accept the default workspace. Enter
ibm2blue as the secure storage password if prompted.
2. In Eclipse create an SPL application project by clicking File->New->Project.
Expand InfoSphere Streams Studio. This time, select SPL Application
Project. Click Next.
Export
Import
21. In the Project Explorer, create another SPL Source File under ExportImport.
Give it a Main Composite name of MyImport. (Keep the Generate Main
composite checked.)
22. In the Palette, expand Toolkits->spl->spl.adapter. Drag Import into the
MyImport composite.
23. Also drag the FileSink into the MyImport composite.
24. Connect the two ports.
25. Select the Import operator and in the Properties view, click Param.
a) There are two ways for the Import operator to get access to a stream: by subscription and by stream id. The demonstration uses the stream id technique (both styles are sketched after step 30). So select the subscription parameter and click the Remove pushbutton.
b) Click the Add pushbutton. Select applicationName and streamId. Then
click OK.
c) Value for applicationName is "MyExport".
d) Value for streamId is "ExportedNews".
26. Select Output Ports.
a) Change the Output stream name to In.
b) In the Output stream schema table, select varName and type <extends>.
c) In the Type column, key in NewsTicker.
27. Click the FileSink operator.
28. In the Properties view, click General.
a) Rename the Alias to sink.
29. Click Param. Set the value for file to "result.dat".
a) Click the Add pushbutton. Select format and click OK.
b) Set the value for format to csv.
30. Save your work.
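For reference, the two Import styles mentioned in step 25 look roughly like this in SPL. The stream id form is what this lab builds; the subscription form is a hedged illustration that assumes the exporter publishes a matching property (kind is a hypothetical property name):

// Import by application name and stream id (used in this lab)
stream<NewsTicker> In = Import() {
    param
        applicationName : "MyExport";
        streamId        : "ExportedNews";
}

// Import by subscription (the alternative). The exporter would have to
// publish a matching property, for example:
//   () as Exporter = Export(News) { param properties : { kind = "news" }; }
stream<NewsTicker> InBySub = Import() {
    param
        subscription : kind == "news";
}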
SPL Editor
Code solutions follow this demonstration.
Export
14. Use the TCPSource operator to get access to the data. Code the following operator:

stream<NewsTicker> News = TCPSource() {
    param
        role : server;
        name : "streamshost";
        port : 1234u;
}
15. Next add an Export operator to your MyExport.spl source that:
    Observes the News stream
    Exports a stream with a streamId of ExportedNews
16. Save your work.
Import
17. Under ExportImport create another SPL Source File. Give it a name of
MyImport.
18. Close the MyImport.spl tab in the Editor view.
19. In the Project Explorer view, right-click MyImport.spl and select Open With->SPL Editor.
20. Add a graph clause to MyImport.spl.
21. Code an Import operator
a) Stream name is - In
b) Use the NewsTicker type for your schema definition
c) The application name will be - MyExport (Be aware that the application
name is a fully qualified name. Since we are not using a namespace, we
only have to reference the name of our exporting application. But if that
application was in a namespace, then the namespace must be specified as
part of the application name.)
d) The stream Id is - ExportedNews
22. Add a FileSink to the MyImport.spl code that observes the In stream and writes
it out in a csv format to a file called result.dat in the default data directory.
() as Sink = FileSink(In) {
    param
        file   : "result.dat";
        format : csv;
}
23. Save your work.
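As an aside to step 21c, if the exporting application had been placed in a namespace (say, a hypothetical my.apps), the applicationName value would need to be qualified accordingly:

// Hypothetical: exporter defined in namespace my.apps
applicationName : "my.apps::MyExport";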
2. Click on the Streams Explorer tab. If you need to, expand Streams Instances.
3. Confirm your instance is running. If not, right click on the instance and select Start Instance. If you had to start your instance, display the Console view and select Streams Studio Console. This will let you see if your instance started correctly. You could also scroll to the right in the Streams Explorer to see the status of your instance.
4. Return back to the Project Explorer.
5. Under ExportImport expand <default_namespace>.
6. Right click on MyExport and MyImport individually and select Launch. On the Edit Configuration dialogs, select Apply and then select Continue.
7. Open a command window by right-clicking on the desktop and selecting Open in Terminal.
8. Change directory to /home/labfiles:
   cd /home/labfiles
9. You are now going to execute netcat in order to pass data to your Streams application's TCPSource operator:
   nc streamshost 1234 < news_ticker_nodelay.dat
10. In the Project Explorer right click on the data folder and select Refresh.
11. Expand the data folder and open result.dat. The data read by the TCPSource in MyExport.spl should be displayed.
12. Close all of your open tabs.
13. Contract the ExportImport project.
14. Return to the Streams Explorer. Under Streams Jobs, right click on MyExport and select Cancel job. Then do the same for MyImport. You can close the command window as well.
Code solutions
namespace application;

composite SourceSink {
    graph
        stream<rstring ticker, rstring tradeDate, rstring closingPrice,
               rstring volume> StockReport = FileSource() {
            param
                file   : "/home/labfiles/stock_report_nodelay.dat";
                format : csv;
        }

        () as Sink = FileSink(StockReport) {
            param
                file   : "copyfile.dat";
                format : csv;
        }
}
Common.spl
type NewsTicker = rstring agency, rstring category, rstring summary;
MyExport.spl
composite MyExport
{
    graph
        stream<NewsTicker> News = TCPSource() {
            param
                role : server;
                name : "streamshost";
                port : 1234u;
        }

        () as Exporter = Export(News) {
            param
                streamId : "ExportedNews";
        }
}
MyImport.spl
composite MyImport
{
    graph
        stream<NewsTicker> In = Import() {
            param
                applicationName : "MyExport";
                streamId        : "ExportedNews";
        }

        () as Sink = FileSink(In) {
            param
                file   : "result.dat";
                format : csv;
        }
}
Demonstration 1:
The Beacon and Custom operators
Purpose:
You want to work with two utility operators. The Beacon operator is a way to
generate test tuples. The Custom operator gives you a skeleton operator
where you can submit tuples.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 20 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password
Graphical Editor
Task 1. Beacon Operator.
SPL Editor
The code solution follows this demonstration.
Note: This will be the last time that you will be told to code the graph clause. By now
you should grasp the concept that it is needed.
The first operator for you to code is a Beacon operator. This operator is to emit
ten tuples on a stream. Each tuple has only a single integer attribute which starts
with a value of zero and gets incremented by 1 for each emitted tuple.
Fortunately there is a function IterationCount() that can be called to get the
iteration count.
7. Code a Beacon operator with the following criteria:
   Stream name - Beat
   Iterations - 10
   Attribute - uint64 val
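A minimal sketch of such a Beacon (it matches the code solution at the end of this demonstration):

stream<uint64 val> Beat = Beacon() {
    param
        iterations : 10u;
    output
        // IterationCount() starts at 0 and increments per emitted tuple
        Beat : val = IterationCount();
}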
Next code a Custom operator. This operator is to observe the Beat stream and is to emit two streams, one called Even and one called Odd. It will alternate between the two output streams for each input tuple, meaning tuples 1, 3, 5, 7, and 9 will be emitted on the Odd stream and tuples 0, 2, 4, 6, and 8 will be emitted on the Even stream.
You will need to make use of a logic clause. Define a boolean variable called sw,
set it initially to true, and allow it to be changed. Then each time a new tuple
arrives on the Beat stream, check the value of sw to determine on which output
port the tuple is to be written. Your code should look as follows:
logic
    state : mutable boolean sw = true;
    onTuple Beat : {
        if (sw) {
            sw = false;
            submit(Beat, Even);
        }
        else {
            sw = true;
            submit(Beat, Odd);
        }
    }
2. You need to have some way to verify that your program is working properly. Code another Custom operator that will print the lone attribute value in the tuple from the Even stream. Remember, you will have to cast the integer attribute to a string.
3. Save your work.
Code solutions
composite Main {
    graph
        stream<uint64 val> Beat = Beacon() {
            param
                iterations : 10u;
            output
                Beat : val = IterationCount();
        }

        (stream<uint64 val> Even; stream<uint64 val> Odd) = Custom(Beat) {
            logic
                state : mutable boolean sw = true;
                onTuple Beat : {
                    if (sw) {
                        sw = false;
                        submit(Beat, Even);
                    }
                    else {
                        sw = true;
                        submit(Beat, Odd);
                    }
                }
        }

        () as PrintEven = Custom(Even) {
            logic
                onTuple Even : println((rstring)val);
        }
}
Demonstration 2:
Functors and Filters
Purpose:
This demonstration has you code a Functor operator to select only those
records that are for rooms that are laboratories and convert the recorded
room temperature from Fahrenheit to Celsius. You will then code user logic in
a Filter operator and finally you will conclude with coding a DynamicFilter
operator.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 30 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password
Graphical Editor
Task 1. Background.
Read data in a csv format from the /home/labfiles/room_sensor.dat file. Filter the
tuples so that you are only working with data from labs (roomType is equal to an
L). Enrich and transform the tuple data. Concatenate the character string
Laboratory to the roomId and convert the temperature of the room from
Fahrenheit to Celsius. Then write the results to a file called labtempdata.dat. This
will require FileSource, Filter, Functor, and FileSink operators. (You could eliminate
the Filter operator and just include the predicate in the Functor operator, but then
you would not get to code a Filter operator.)
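The enrichment boils down to one output clause on the Functor, sketched here consistently with the code solution at the end of this demonstration:

output
    NewLabData : labName = "Laboratory " + roomId,
                 cTemp   = (temp - 32.0) * 5.0 / 9.0;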
The FileSource operator's output stream schema (attribute names as used in the code solution):
   time         uint32
   roomId       rstring
   roomType     rstring
   temp         float64
   lastMotion   uint32

Drag a Filter operator into the Main composite. (The Filter operator is under spl.relational.)
Connect the output port of the FileSource operator to the input port of the Filter operator.
17. Click on the bottom output port for the DynamicFilter operator. Connect that
port to the input port that appears on the new Custom operator.
18. For the properties of the second Custom operator
a) In the General properties, rename the Alias to Usink.
b) In the Logic edit area, code the following:
onTuple Unmatched: println("Unmatched: "+ (rstring)num);
19. Save your work if needed.
20. Finally create a file called keyfile.txt in the default data directory. In the Project
Explorer, right-click on Resources->data and select New->File. Give it a File
name of keyfile.txt. Click Finish.
21. Add two values to the file, 2 and 5 on separate lines. Save the file and close it.
2
5
2. Once again, since we are printing out the results to the console, you may like to build a standalone application. Then launch your application. It will terminate on its own. If you set your build type to standalone before you do a save of your code, then a standalone application will automatically be built.
3. Look at the printed results in the console. You should see several initial iterations of the data where all of the values are Unmatched. Then once keyfile.txt is read by the FileSource operator and that data is passed to the DynamicFilter operator, you should begin to see that the values of 2 and 5 are being Matched. Close all opened tabs. Contract your DynamicFilter project.
SPL Editor
Task 1. Background.
Read data in a csv format from the /home/labfiles/room_sensor.dat file. Filter the
tuples so that you are only working with data from labs (roomType is equal to an
L). Enrich and transform the tuple data. Concatenate the character string
Laboratory to the roomId and convert the temperature of the room from
Fahrenheit to Celsius. Then write the results to a file called labtempdata.dat. This
will require FileSource, Filter, Functor, and FileSink operators. (You could eliminate
the Filter operator and just include the predicate in the Functor operator, but then
you would not get to code a Filter operator.)
Code solution follows this demonstration.
The FileSource operator's output stream schema (attribute names as used in the code solution):
   time         uint32
   roomId       rstring
   roomType     rstring
   temp         float64
   lastMotion   uint32

Code a Filter operator that observes the SensorSource stream and only forwards those tuples where the roomType is equal to "L". The output stream's name is LabOnly, and it has the same schema as the input SensorSource stream.
7. Code a Functor operator that emits the NewLabData stream. Its schema is:
      labName   rstring
      cTemp     float64
   Output attributes:
   Output the attribute labName so that the character string Laboratory is concatenated with the roomId attribute ("Laboratory " + roomId)
   Output the attribute cTemp as ((temp - 32.0) * 5.0 / 9.0)
8. Code a FileSink operator that observes the NewLabData stream and writes the tuples to a file, labtempdata.dat, where the output data is in a csv format.
9. Save your updates.
3. Right click on DynamicFilter and select SPL Source File. Take the defaults and click Finish.
4. Close the SPL Graphical editor and open Main.spl using the SPL editor.
5. In the Editor view, code a Beacon operator that generates 30 tuples with a 0.1
second pause between emitting each tuple.
The properties for the Beacon operator are:
a) The output stream name is Data
b) The schema for the stream is:
   num        uint64
   otherData  rstring
c) Both the iterations and period parameters will be used. iterations is 30 and period is 0.1.
d) The output for the Data stream will set
num equal to IterationCount() % (uint64)10
otherData equal to "Other Data"
6. Next code a FileSource operator that will read a file that contains the values that are to be used in the filter predicate. This operator is to wait one and a half seconds before it reads in the data. The properties for the FileSource are:
a) Output stream name is AddKey
b) The schema for AddKey
key uint64
c) The file to be read is keyfile.txt
d) Format for the file is csv
e) Specify an initDelay value of 1.5
7. Code a DynamicFilter that observes the Data stream that is emitted by the Beacon operator on input port 0 and also observes the AddKey stream, emitted by the FileSource operator, on input port 1. It emits two streams. The stream on output port 0 is called Matched. The stream on output port 1 is called Unmatched. Both output streams have the same schema as the Data stream. The attribute that is examined by the filter is the num attribute in the Data stream. The key attribute in the AddKey stream is used to add new values to the filtering criterion.
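A sketch of the DynamicFilter invocation described in step 7 (it matches the code solution at the end of this demonstration):

(stream<Data> Matched; stream<Data> Unmatched) = DynamicFilter(Data; AddKey) {
    param
        key    : Data.num;    // attribute tested against the current key set
        addKey : AddKey.key;  // control attribute that adds new keys
}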
Code two Custom operators to print out the results.
8. The first Custom operator is to have an alias of Msink. This is to observe the Matched stream. Using the onTuple of the logic clause, execute the following:
   println("Matched: " + (rstring)num)
9. Code a second Custom operator that has an alias of Usink. This operator is to
observe the Unmatched stream. Using the onTuple of the logic clause execute
the following:
println("Unmatched: "+ (rstring)num)
10. Save your work.
11. Finally create a file called keyfile.txt in the default data directory. In the Project
Explorer view, right-click on data and select New->File. Give it a File name of
keyfile.txt. Click Finish.
12. Add two values to the file, 2 and 5 on separate lines. Save the file and close it.
2
5
2. Once again, since we are printing out the results, you may like to build a standalone application. Then launch your application. It will terminate on its own. If you set your build type to standalone before you do a save of your code, then a standalone application will automatically be built.
3. Look at the printed results. You should see several initial iterations of the data where all of the values are Unmatched. Then once keyfile.txt is read by the FileSource operator and that data is passed to the DynamicFilter operator, you should begin to see that the values of 2 and 5 are being Matched. Close all opened tabs. Contract your DynamicFilter project.
Code solutions
Code for Filter and Functor operator
composite Main {
    graph
        stream<uint32 time, rstring roomId, rstring roomType,
               float64 temp, uint32 lastMotion> SensorSource = FileSource() {
            param
                file   : "/home/labfiles/room_sensor.dat";
                format : csv;
        }

        stream<SensorSource> LabOnly = Filter(SensorSource) {
            param
                filter : roomType == "L";
        }

        stream<rstring labName, float64 cTemp> NewLabData = Functor(SensorSource) {
            param
                filter : roomType == "L";
            output
                NewLabData : labName = "Laboratory " + roomId,
                             cTemp = ((temp - 32.0) * 5.0 / 9.0);
        }

        () as Sink1 = FileSink(NewLabData) {
            param
                file   : "labtempdata.dat";
                format : csv;
        }
}
Code for DynamicFilter operator
composite Main {
    graph
        stream<uint64 num, rstring otherData> Data = Beacon() {
            param
                iterations : 30u;
                period     : 0.1;
            output
                Data : num = IterationCount() % (uint64)10,
                       otherData = "Other Data";
        }

        stream<uint64 key> AddKey = FileSource() {
            param
                file      : "keyfile.txt";
                format    : csv;
                initDelay : 1.5;
        }

        (stream<Data> Matched; stream<Data> Unmatched) = DynamicFilter(Data; AddKey) {
            param
                key    : Data.num;
                addKey : AddKey.key;
        }

        () as Msink = Custom(Matched) {
            logic
                onTuple Matched : println("Matched: " + (rstring)num);
        }

        () as Usink = Custom(Unmatched) {
            logic
                onTuple Unmatched : println("Unmatched: " + (rstring)num);
        }
}
Demonstration 3:
Splits
Purpose:
You want to code a Split operator to split a single stream into two streams.
The split criterion will be based upon an input attribute that is of type list.
You are going to code a FileSource operator that reads records from a file.
Within the data will be a list of integers. The emitted stream from the
FileSource is observed by a Split operator. Based upon the values in the
integer list, the current tuple gets forwarded to the correct stream. The data
then is printed in order to see how the tuples were split.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 20 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password
Graphical Editor
Task 1. Split operator.
This Split operator is to observe stream In and have two output ports. The stream
for output port 0 is to be called Out1. The stream for output port 1 is to be called
Out2. The output schemas will be the same as the schema of the input stream.
Use the values in the attribute idx to determine to which output port the tuple is to
be directed.
A wrinkle is being added to this operator. I would like for you to add user logic so
that when a tuple is observed on the input port, the following is executed.
println("out: " + name + (rstring)idx);
7. Connect the output port of the FileSource operator to the input port of the Split
operator.
8. Properties for the Output Ports
a) For Port 0, rename the stream name to Out1.
b) Set the output stream schema for Port 0 to <extends> In
c) Add a second output port.
d) For Port 1, rename the stream name to Out2.
e) Set the output stream schema for Port 1 to <extends> In
9. In the Properties view, click Logic. In the logic edit area, code the following:
   onTuple In : println("out: " + name + (rstring)idx);
10. In the Properties view, click Param
a) Add the index parameter
b) The value for the index parameter is idx.
11. Use a single Custom operator to print whether a tuple is arriving on port 0 or
port 1. Drag a Custom operator to the Main composite.
12. Connect the top output port of the Split operator to the newly appeared input
port for the Custom operator.
13. Connect the second output port of the Split operator to a second "magically
appearing" port for the Custom operator.
14. In the Properties view for the Custom operator, click General.
15. Rename the alias to PrintSink.
16. In the Properties view, click Logic. In the Logic edit area, code the following:
onTuple Out1: println("Out1");
onTuple Out2: println("Out2");
17. Save your work.
SPL Editor
Code solution follows this demonstration.
6. Code a Split operator. This Split operator will observe stream In and have two output ports. The stream for output port 0 is to be called Out1. The stream for output port 1 is to be called Out2. The output schemas for both output ports will be the same as the input schema. You will use the values in the attribute idx to determine to which output port the tuple is to be directed.
We are going to add a wrinkle to this operator. I would like for you to add user
logic so that when a tuple is observed on the input port, the following is
executed.
println("out: " + name + (rstring)idx);
The Split operator's properties:
a) The logic clause will be
logic onTuple In : println("out: " + name + (rstring)idx);
b) The param clause will be
param index: idx;
Code a single Custom operator to print whether a tuple is arriving on port 0 or
port 1.
Code solution
composite Main {
    graph
        stream<rstring name, int32 num, list<int32> idx> In = FileSource() {
            param
                file   : "/home/labfiles/splitfile.dat";
                format : csv;
        }

        (stream<In> Out1; stream<In> Out2) = Split(In) {
            logic
                onTuple In : println("out: " + name + (rstring)idx);
            param
                index : idx;
        }

        () as PrintSink = Custom(Out1; Out2) {
            logic
                onTuple Out1 : println("Out1");
                onTuple Out2 : println("Out2");
        }
}
Unit 7. Windowing and Joins
Demonstration 1:
Joins
Purpose:
You want to code a Join operator to join an electrical company's rate data with customers' usage data in order to determine billing information.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 20 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password
Graphical Editor
Task 1. Background information.
In order for you to write the correct Join operator, you need some background
information and some assumptions.
Hourly records for electrical power consumption are captured for each home
using smart meters and are streamed every twenty-four hours. (For testing
purposes, this data will be in a file, elec_usage.dat.)
The electrical power company has rates based upon time of day usage that differ
with the summer and winter seasons. (For testing purposes, this data will be in a
file, elec_pricing.dat.)
Power consumption tuples are to be joined with the appropriate rate tuple in
order to calculate billing information.
1. Look at the test data. In Eclipse select File->Open File. Drill down on File System->home->labfiles. Double-click on elec_pricing.dat. There are five records for the summer season: two sets of hours that are designated as off-peak hours, two designated as intermediate, and one designated as peak, each with their own rates. Notice that there are records for the winter season as well and that the first winter rate record has a delay value.
For this to work, rate tuples for a particular season must be held for a period of time. Then when it is time to switch to another season's rates, the rates for the new season are read and held. To make this process flexible, a version number has been assigned to each group of rate records. Through the use of an attribute-delta sliding window that is based upon this version number and a delta amount of 0, the power company could at some time in the future increase the billing granularity without changing the application.
For testing purposes, there are both summer and winter rates in the pricing file
and there are both summer and winter usage records in the usage file. To be
able to simulate matching summer pricing with summer usage and winter pricing
with winter usage, delay values in the data must be used.
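In window terms, that design comes out as follows (a sketch matching the code solution at the end of this demonstration). delta(version, 0u) keeps only tuples whose version matches the most recent one, so exactly one season's rate tuples are held at a time; count(0) on the Usage side makes this a one-sided join in which each usage tuple is matched against the held rate tuples as it arrives:

window
    Pricing : sliding, delta(version, 0u);
    Usage   : sliding, count(0);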
5. For the second FileSource operator (the Consumer usage data), set the following parameters:
   file - /home/labfiles/elec_usage.dat
   format - csv
   hasDelayField - true
6. Drag a Join operator to the Main composite.
7. Connect the output port for the Consumer FileSource to the top port of the Join operator.
8. Connect the output port for the Company FileSource to the bottom port of the Join operator.
9. In the Properties view for the Join operator, click Input Ports. (Change the port aliases. Although this is not a requirement, later on, it will make things less confusing.)
   a) For Port 0 rename the Alias to ElecUse.
   b) For Port 1 rename the Alias to ElecRate.
SPL Editor
See Background section in the Graphical Editor steps above before performing the
following steps.
Code solution appears after this demonstration.
6. Code a second FileSource operator that will read the data in the elec_usage.dat file.
   a) Output stream name is Usage
   b) Schema for the Usage stream:
      time        uint32
      device      rstring
      wattshours  decimal32
   c) Param
      file - /home/labfiles/elec_usage.dat
      format - csv
      hasDelayField - true
7. In your Main.spl program add a Join operator that will join the Pricing stream with the Usage stream. Your Join operator will
a) Emit a stream called ElectricBill.
b) Read the Pricing stream into a sliding window that keeps all tuples for a particular version of pricing (delta(version, 0u)).
c) For the Usage stream, make this a one-sided join; the Usage stream window will be count(0).
d) Remember, usage records are hourly-based but the pricing records are time-period based. You want to match where the time attribute from the Usage stream is greater than or equal to the startTime attribute from the Pricing stream and less than the endTime attribute from the Pricing stream.
e) Since you are outputting customer billing data, you want to emit the following using an output clause for the ElectricBill stream:
   time - from the Usage stream
   season - from the Pricing stream
   ratecat - from the Pricing stream
   device - from the Usage stream
   kW - wattshours / 1000.00dw
   rate - price (from the Pricing stream)
   bill - (price * (wattshours / 1000.00dw))
8. Code a FileSink operator that observes the ElectricBill stream.
9. For Param
   a) file - pricingdetail.dat
   b) format - csv
Code solution
composite Main
{
    graph
        stream<uint32 startTime, uint32 endTime, uint32 version,
               rstring season, rstring ratecat, decimal32 price> Pricing = FileSource() {
            param
                file          : "/home/labfiles/elec_pricing.dat";
                format        : csv;
                hasDelayField : true;
        }

        stream<uint32 time, rstring device, decimal32 wattshours> Usage = FileSource() {
            param
                file          : "/home/labfiles/elec_usage.dat";
                format        : csv;
                hasDelayField : true;
        }

        stream<uint32 time, rstring season, rstring ratecat, rstring device,
               decimal32 kW, decimal32 rate, decimal32 bill> ElectricBill = Join(Pricing; Usage) {
            window
                Pricing : sliding, delta(version, 0u);
                Usage   : sliding, count(0);
            param
                match : time >= startTime && time < endTime;
            output
                ElectricBill : kW = wattshours / 1000.0dw,
                               rate = price,
                               bill = price * (wattshours / 1000.0dw);
        }

        () as Sink = FileSink(ElectricBill) {
            param
                file   : "pricingdetail.dat";
                format : csv;
        }
}
Unit 8. Aggregation, Punctuation, and Sorting
Demonstration 1:
Aggregate
Purpose:
This demonstration has you code an Aggregate operator to calculate the
number of records for a particular stock, the average closing price and the
total volume of shares. It will also demonstrate the difference between
groupBy and partitioning.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 20 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password
Graphical Editor
Task 1. Background.
Use a tumbling window to access three tuples. Then calculate the total number of tuples (makes sense, doesn't it? Tumbling window with three tuples and you need to count them.), the average price, and the total number of shares. Do this using the groupBy parameter and then via partitioning in order to see the difference.
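The two variants differ only in the window and parameter clauses. Here is a sketch of the contrast; the groupBy form matches the code solution at the end of this demonstration, and the partitioned form is the standard SPL alternative:

// groupBy: one shared window of 3 tuples; one output tuple per symbol
// group each time the window tumbles
window StockReport : tumbling, count(3);
param  groupBy     : symbol;

// partitionBy: a separate window of 3 tuples per symbol; each partition
// tumbles independently
window StockReport : tumbling, count(3), partitioned;
param  partitionBy : symbol;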
The schema for the FileSource output stream (StockReport, attribute names as used in the code solution):
   symbol        rstring
   dateTime      rstring
   closingPrice  decimal32
   volume        uint32
c) file - /home/labfiles/stock_report_nodelay.dat
d) format - csv
6. The output schema for the Aggregate operator (StockReducer stream):
   symbol     rstring
   recordCnt  int32
   avgPrice   decimal32
   volume     uint32
SPL Editor
Use a tumbling window to access three tuples. Then calculate the total number of tuples (makes sense, doesn't it? Tumbling window with three tuples and you need to count them.), the average price, and the total number of shares. Do this using the groupBy parameter and then via partitioning in order to see the difference.
The solution code follows this demonstration.
The schema for the FileSource output stream (StockReport, attribute names as used in the code solution):
   symbol        rstring
   dateTime      rstring
   closingPrice  decimal32
   volume        uint32
c) file - /home/labfiles/stock_report_nodelay.dat
d) format - csv
6. The output schema for the Aggregate operator (StockReducer stream):
   symbol     rstring
   recordCnt  int32
   avgPrice   decimal32
   volume     uint32
Code solution
Aggregate operator using groupBy
composite Main {
    graph
        stream<rstring symbol, rstring dateTime, decimal32 closingPrice,
               uint32 volume> StockReport = FileSource() {
            param
                file   : "/home/labfiles/stock_report_nodelay.dat";
                format : csv;
        }

        stream<rstring symbol, int32 recordCnt, decimal32 avgPrice,
               uint32 volume> StockReducer = Aggregate(StockReport) {
            window
                StockReport : tumbling, count(3);
            param
                groupBy : symbol;
            output
                StockReducer : symbol = Any(symbol),
                               avgPrice = Average(closingPrice),
                               recordCnt = Count(),
                               volume = Sum(volume);
        }

        () as Sink = FileSink(StockReducer) {
            param
                file   : "stock_reduced.dat";
                format : csv;
        }
}
Demonstration 2:
Sort and Punctor
Purpose:
You want to code a Sort operator and a Punctor operator in order to group
transactions for the same state as well as highlighting transactions that are
equal to or greater than 1000.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated Time: 40 minutes
User/Password:
- student/ibm2blue
- streamsadmin/ibm2blue
- root/password
Graphical Editor
Gather ten tuples in a window and sort them on the specified sort keys. Then use
the Punctor operator to insert punctuation whenever the value for the state changes.
But just doing that would be too easy so you are to also isolate any transaction
where the amount attribute is 1000 or greater. In this case, isolate means to output
a punctuation both before and after the tuple.
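The punctuate expression this calls for looks roughly like the following (consistent with the code solution at the end of this demonstration). SortedDetail[1] is the previous tuple on the stream, so the three terms fire on a state change, on a large current tuple, and on the tuple immediately after a large one:

param
    punctuate : st != SortedDetail[1].st ||
                amount >= 1000.00dw ||
                SortedDetail[1].amount >= 1000.00dw;
    position  : before;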
The schema for the FileSource output stream (SalesDetail, attribute names as used in the code solution):
   storeNumber  uint32
   city         rstring
   st           rstring
   amount       decimal32
c) file - /home/labfiles/storesales.dat
d) format - csv
SPL Editor
Gather ten tuples in a window and sort them on the specified sort keys. Then use
the Punctor operator to insert punctuation whenever the value for the state changes.
But just doing that would be too easy so you are to also isolate any transaction
where the amount attribute is 1000 or greater. In this case, isolate means to output
a punctuation both before and after the tuple.
The solution code follows this demonstration.
The schema for the FileSource output stream (SalesDetail, attribute names as used in the code solution):
   storeNumber  uint32
   city         rstring
   st           rstring
   amount       decimal32
c) file - /home/labfiles/storesales.dat
d) format - csv
Code solution
With no punctuation output
composite Main
{
    graph
        stream<uint32 storeNumber, rstring city, rstring st,
               decimal32 amount> SalesDetail = FileSource() {
            param
                file   : "/home/labfiles/storesales.dat";
                format : csv;
        }

        stream<SalesDetail> SortedDetail = Sort(SalesDetail) {
            window
                SalesDetail : tumbling, count(10);
            param
                sortBy : storeNumber, city, st, amount;
        }

        stream<SortedDetail> DividedDetail = Punctor(SortedDetail) {
            param
                punctuate : st != SortedDetail[1].st ||
                            amount >= 1000.00dw ||
                            SortedDetail[1].amount >= 1000.00dw;
                position  : before;
        }

        () as FileSink_1 = FileSink(DividedDetail) {
            param
                file   : "divideddetail.dat";
                format : csv;
        }
}
FileSink that allows punctuation

() as FileSink_1 = FileSink(DividedDetail) {
    param
        file              : "divideddetail.dat";
        format            : csv;
        writePunctuations : true;
}
Unit 9. Timing and coordination
Demonstration 1:
Barriers and Switches
Purpose:
You will code two applications. One will employ a Barrier operator to combine
related tuples from multiple streams and another one will demonstrate the
working of the Switch operator.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated time: 30 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password
12. Drag a Delay operator to the Main composite. Position it to the right of the 2nd
Beacon operator.
13. Connect the output port of the Beacon operator to the input port of the Delay
operator.
14. Specify the following properties for the Delay operator.
a) Output stream - DelayedData
b) Output schema - same as the input stream
c) For Param, set the delay to 5.0
15. Drag a Barrier operator to the Main composite.
16. Connect the output port of PrtFast to one of the input ports of the Barrier
operator. Connect the output port of the Delay operator to the other input port of
the Barrier operator.
17. Specify the following properties for the Barrier operator.
a) Output stream - CombinedData
b) Output schema:
   source1 - rstring
   source2 - rstring
c) In the Properties view, click Output. Expand CombinedData and set:
   source1 - barrierInfo1
   source2 - barrierInfo2
18. Drag a Custom operator to the Main composite.
19. Connect the output port of the Barrier operator to the input port of the Custom
operator.
20. In the Properties view for this Custom operator, click General and rename the
Alias to PrtCombined.
21. In the Properties view, click Logic. In the edit area, code the following:
onTuple CombinedData : println(source1 + " " + source2);
22. Save your work.
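As a reference, here is roughly the SPL that steps 15 through 21 produce (a
sketch, not the exact generated code: FastData stands in for the output stream
of PrtFast, and barrierInfo1 and barrierInfo2 are assumed to be attributes of
the two input streams):

    stream<rstring source1, rstring source2> CombinedData
        = Barrier(FastData ; DelayedData)
    {
        output
            // Pair one tuple from each input port into a combined tuple.
            CombinedData : source1 = FastData.barrierInfo1,
                           source2 = DelayedData.barrierInfo2 ;
    }

    () as PrtCombined = Custom(CombinedData)
    {
        logic
            onTuple CombinedData : println(source1 + " " + source2) ;
    }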
Build a standalone application and then launch it. (Since you are printing data,
you must run as a standalone application.)
You should see "Beacon Data" print for about five seconds. Then the printing
pauses for about five seconds. This process repeats several times.
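The Switch part of the application follows this pattern (a minimal sketch under
assumed stream names and periods; your generated code will differ):

    stream<rstring data> BeaconData = Beacon()
    {
        param
            period : 0.1 ;
        output
            BeaconData : data = "Beacon Data" ;
    }

    stream<boolean status> Control = Beacon()
    {
        param
            period : 5.0 ;
        output
            // Toggle the switch status every five seconds.
            Control : status = (IterationCount() % 2ul) == 0ul ;
    }

    // Switch forwards BeaconData tuples only while its status is true.
    stream<BeaconData> Gated = Switch(BeaconData ; Control)
    {
        param
            status : Control.status ;
            initialStatus : true ;
    }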
Close the opened tabs in the Editor view, and contract the SwitchProj project.
Results:
You coded two applications. One employed a Barrier operator to combine
related tuples from multiple streams, and the other demonstrated how the
Switch operator works.
Unit 11 Consistent Regions
Demonstration 1:
Consistent Regions
Purpose:
You want to create a Consistent Region over Beacon and FileSink operators
to create a fault-tolerant application.
Note: This demonstration has two sets of instructions: one using the SPL Graphical
Editor, and another for coding manually using the SPL Editor, with code solutions
posted at the end.
Estimated time: 30 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password
Graphical Editor
Task 1. Create the operators.
Now we will add a FileSink to write results to. The operator will include logic to
simulate a fault.
Drag a FileSink operator to the Main composite.
Connect the Beacon to the FileSink operator.
11. Right-click the job (the most recent topmost one, if you have other un-cancelled
jobs) and select Show Instance Graph.
12. Bring up the terminal you opened and position it so you can see it alongside the
instance graph.
13. Once the operator is up and running (the instance graph is all green), you
should see the sequence of numbers being printed in the terminal as they are
written to the file.
14. Also notice the grey bars in the instance graph will periodically change to
>>>>>> symbols. This indicates a checkpoint for the consistent region is being
established.
15. Look for a few things to happen as our FileSink runs into its programmed crash
at the 60th tuple:
a) The terminal will stop printing numbers after 59.
b) The instance graph will turn red and alerts will be thrown.
c) As the FileSink operator is relaunched, the grey bars in the graph will change
to <<<<<<, indicating a reset to the last checkpoint.
d) The terminal will resume printing numbers, starting with 60. No tuples have
been lost.
16. When you're ready, close the terminal and cancel the job by right-clicking it in
the Streams Explorer and selecting Cancel Job.
17. Close the opened tabs in the Editor view.
18. Also, contract the ConsistentRegions project.
SPL Editor
Task 1. Create the operators.
Adding a consistent region is simple. Add the following line, directly above your
Beacon operator in the editor:
@consistent(trigger = periodic, period = 0.05)
The trigger determines how often a consistent state will be established. It can be
periodic or operator-driven. For operator-driven triggers the interval is
determined by a corresponding parameter in the operator. For example, in the
case of a Beacon, you would need to set triggerCount, which establishes a
consistent region every n iterations.
The period is 0.05 seconds. This very short interval is overkill; we are only
using it so that you will be able to see the checkpoint operation happening a
little later.
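For reference, the two trigger styles look like this in SPL (a sketch; the
solution code at the end of this demonstration shows the full graph, including
the required JobControlPlane operator):

    // Periodic trigger: a consistent state is established every 0.05 seconds.
    @consistent(trigger = periodic, period = 0.05)
    stream<uint64 num> Sequence = Beacon()
    {
        output
            Sequence : num = IterationCount() ;
    }

    // Operator-driven trigger: the start operator decides when a consistent
    // state is established; for a Beacon, triggerCount fires every n tuples.
    @consistent(trigger = operatorDriven)
    stream<uint64 num2> Sequence2 = Beacon()
    {
        param
            triggerCount : 1000u ;
        output
            Sequence2 : num2 = IterationCount() ;
    }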
If you go back to the graphical editor, there should now be a grey bar across the
Beacon and FileSink operators, as well as an @ symbol indicating that there are
annotations for that operator.
Before running your application, we will do a couple of things so you can see the
consistent regions in action.
1. In the Project Explorer, expand ConsistentRegions -> Resources.
2. Right-click data and select New -> File.
3. Name the file results.txt.
4. Open a terminal.
5. Run the following command to go to the output directory of the application:
   cd /home/student/StreamsStudio/workspace/ConsistentRegions/data
6. Now run tailf on the results file so we can watch it being updated:
   tailf results.txt
7. Go back to Streams Studio. Keep this terminal open.
Note that when your Streams instance was first set up, two properties were set:
instance.checkpointRepository
instance.checkpointRepositoryConfiguration
These properties are necessary so that your instance can use consistent
regions. The checkpointRepositoryConfiguration property on your image is set
to use /tmp/streamscheckpoint as the directory for checkpointing. This
directory must already be created in order for the consistent region feature of
this demo to work.
Open a Linux console and cd to the /tmp directory.
cd /tmp
Take a look to see if the streamscheckpoint directory has already been
created within /tmp.
ls -l
Look for streamscheckpoint within /tmp. Not there already? If not, then
create the directory and set its permissions to 777.
mkdir /tmp/streamscheckpoint
chmod -R 777 /tmp/streamscheckpoint
ls -l
Confirm the directory is now there and move on to the next step.
Build and run your application. Make sure you are running in Distributed mode.
Open the Streams Explorer tab.
Expand Streams Jobs.
Right-click the job (the most recent topmost one, if you have other un-cancelled
jobs) and select Show Instance Graph.
Bring up the terminal you opened and position it so you can see it alongside the
instance graph.
Once the operator is up and running (the instance graph is all green), you
should see the sequence of numbers being printed in the terminal as they are
written to the file.
Also notice the grey bars in the instance graph will periodically change to
>>>>>> symbols. This indicates a checkpoint for the consistent region is being
established.
Look for a few things to happen as our FileSink runs into its programmed crash
at the 60th tuple:
a) The terminal will stop printing numbers after 59.
b) The instance graph will turn red and alerts will be thrown.
c) As the FileSink operator is relaunched, the grey bars in the graph will change
to <<<<<<, indicating a reset to the last checkpoint.
d) The terminal will resume printing numbers, starting with 60. No tuples have
been lost.
When you're ready, close the terminal and cancel the job by right-clicking it in
the Streams Explorer and selecting Cancel Job.
Close the opened tabs in the Editor view.
Also, contract the ConsistentRegions project.
        output
            Sequence : num = IterationCount() ;
    }

    () as JCP = JobControlPlane()
    {
    }

    () as FileSink_1 = FileSink(Sequence)
    {
        logic
            onTuple Sequence :
            {
                // Simulate a fault: abort on the 60th tuple, but only on the
                // first launch (getRelaunchCount() is 0 before any restart).
                if (num == (uint64) 60 && getRelaunchCount() == 0u)
                {
                    abort() ;
                }
            }
        param
            file : "results.txt" ;
    }
}
Unit 13 Debugging
Demonstration 1:
Debugging
Purpose:
You want to work with the debugging capabilities for the InfoSphere Streams
Processing Language. You want to investigate Streams metrics information.
Estimated time: 30 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password
Task 1. Breakpoints.
You are to add a breakpoint on the output port of your FileSource operator. Add
a tracepoint on the output port of the Functor operator. Then at some point
create an inject point on the input port of the Functor operator.
In the Eclipse Project Explorer view, expand the application project called
Debugging. Then expand Resources.
Open the Main.spl file using the SPL editor. (Right-click Main.spl and select
Open With->SPL Editor.) Once again there is a FileSource operator that
reads from a data file.
Note the filter parameter, which allows only tuples that have a ticker attribute
equal to either IBM or GOOG to be emitted.
You are to use a standalone application for the debug demonstration. You could
use a distributed application as well, but for this demonstration, debugging a
standalone application will be easier.
First prepare your project to create a standalone application. Expand
<default_namespace>->Main.
Right-click on Main and select New->Standalone Build.
To run in debug mode, you must build your application with the -g option. Since
you are using the IDE, this can be done automatically. On the Main Standalone dialog, select Debug.
From the Streams Debugger (SDB) drop down, select Debug application
with SDB. Then click OK.
Right-click on Standalone and select Set Active. This rebuilds the application
with the debug option.
21. Now set an injection point for the input port of the Functor operator. This creates a
tuple with all attributes set to their default values. (The command starts with a
lowercase i; in the font below it may look like a lowercase L.)
i SelectedStocks i 0
22. Update some of the attribute values as follows. (Note that you must refer to the
probe point that was assigned to the injection point):
u 2 ticker "GOOG"
u 2 volume "10000"
23. Continue the injection point. Enter c 2.
24. Display the tuples in the tracepoint cache. s 1 t. Your new tuple was emitted by
the Functor and was written to the tracepoint cache.
25. Remove the breakpoint and the tracepoint.
r 0
r 1
c *
26. Return to the Eclipse IDE. Terminate your standalone application. (Click on the
red square.)
27. Refresh your data folder and view the contents of result.dat. The second record
is the one for which you changed the ticker from HPQ to IBM. The GOOG tuple
that you deleted is not in the output but the GOOG tuple that you inserted is.
28. Close all of the opened tabs and contract your Debugging project.
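For reference, here is what each SDB command used in this task does:

    i SelectedStocks i 0    Create an injection point on input port 0 of SelectedStocks
    u 2 ticker "GOOG"       Update the ticker attribute of the tuple at probe point 2
    u 2 volume "10000"      Update the volume attribute of the tuple at probe point 2
    c 2                     Continue probe point 2 (submit the injected tuple)
    s 1 t                   Show the tuples in the tracepoint cache of probe point 1
    r 0                     Remove probe point 0 (the breakpoint)
    r 1                     Remove probe point 1 (the tracepoint)
    c *                     Continue all probe points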
Task 2. Metrics.
This section is to allow you to become more familiar with the metrics capabilities
of InfoSphere Streams.
1. To work with the Streams metrics, you have to be running a distributed
application. Start your Streams instance.
2. In the Eclipse IDE, click the Streams Explorer tab.
3. Under Instances, right-click default:StreamsInstance@StreamsDomain and
select Start instance if your instance is not already running.
4. Click the Project Explorer tab.
5. Build the Main composite in the DebugInstGraph project. Expand the
DebugInstGraph project. Drill down on DebugInstGraph-><default_namespace>->Main.
Right-click Distributed and select Build.
A new dialog box will open asking you to select an instance to add to the
Metrics view. Select default:StreamsInstance@StreamsDomain from the list and click
OK.
In the Metrics view, expand default: ->0:Main->Main. You can view metrics on
a PE basis or on an operator basis. This application has each operator in its
own PE, hence there are ten operators listed as well as ten PEs.
Expand Aggregate_1 and select Input[0]. You can see the number of tuples
processed, dropped, queued, and so on, over time.
Right-click Output[0]:Aggregate_1_out0 and select Show Data.
As it turns out for this simple application, there is only a single attribute in the
Aggregate operator's output tuple. Select value:uint64. Then click OK.
In a second or two a new Properties view is opened and the actual value of the
value attribute is being displayed.
Below the Properties view title bar, click Stop. Then close this Properties view.
Expand PE:0[Healthy]->Output[0]::Beacon_1_out0.
First note that PE:0[Healthy] indicates that PE:0 is running properly.
21. Click on any of the expanded items and you can see the metrics collected at
that particular level.
22. Right-click PE:0[Healthy] and you can see that you can access tracing and log
files.
23. In the Streams Explorer right-click the instance and select Set Service Trace
Levels.
24. From this dialog you can set the logging and tracing levels. You can do so for
the overall instance or just focus on a particular service. Click Cancel.
25. Close both the Metrics and Instance Graph view tabs.
26. Make sure that the Properties tab has the focus.
27. In the Streams Explorer, drill down to Streams Domains -> StreamsDomain
-> Instances -> default: -> 0:Main. Note the information in the Properties view.
28. Expand 0:Main and drill down on some of the items. Look at the information
that gets displayed for each item in the Properties view. This is another great
source of information that you can use when trying to debug a problem.
29. Right-click 0:Main and select Cancel Job.
30. Right-click default:StreamsInstance@StreamsDomain and select Stop
Instance.
31. Return to the Project Explorer and contract the DebugInstGraph project.
Results:
You worked with the debugging capabilities of the InfoSphere Streams
Processing Language and investigated Streams metrics information.
Unit 14 Toolkits
Demonstration 1:
Utilize the database and data mining toolkits
Purpose:
You will utilize the database and data mining toolkits. You will be introduced
to some of the additional operators that are not part of the SPL Standard
Toolkit. You will use a clustering model to score Stream data that you will
read from a flat file.
Although there are additional toolkits supplied by IBM for InfoSphere Streams, this
demonstration deals only with the database and data mining toolkits.
A clustering mining model has been created and exported in a PMML format. That
clustering model will be used to score Stream data that will be read from a flat file.
Note: This exercise has two sets of instructions: one using the SPL Graphical Editor,
and another for coding manually using the SPL Editor, with code solutions posted at the
end.
Estimated time: 30 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password
Graphical Editor
Task 1. Data Mining.
Although there are several data mining operators supplied by the Mining Toolkit
for InfoSphere Streams, the demonstration just focuses on the Clustering
operator.
The data mining operators require that you create a scoring model using some
other data mining tool and export that model into a PMML format, which is then
referenced in the Streams data mining operator.
This demonstration has a FileSource operator read a file of client banking
information. That client data is scored using a clustering mining model. A
Functor removes some of the attributes from the tuple. Initially, the results
are written to a file; then the program will be expanded to have the results
written into a MySQL table.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select
SPL Project and click Next.
2. Name the project Clustering. Click Finish.
3. Right-click Clustering and select New->SPL Source File. Take the defaults
and click Finish.
4. To speed things along, use a schema that was already created for you. Copy the
file that has the schema definition to your project directory. Open a
command-line session. (Click the red hat icon and select Applications->System
Tools->Terminal.)
9. Next in the palette area, under Toolkits, expand
com.ibm.streams.mining->com.ibm.streams.mining.scoring and drag a Clustering
operator to the Main composite. Place it to the right of the FileSource operator.
10. Connect the output port of the FileSource operator to the input port of the
Clustering operator.
11. Update the properties for the Clustering operator.
a) Output Ports:
Once again use the <extends> option to use the previously defined type
called Client when defining the output schema.
Also add two additional attributes to the output schema.
clusterindex - int64
score - float64
b) Param:
model - "/home/labfiles/mining/BankClustering.pmml"
16. Drag a FileSink operator to the Main composite. Connect the output port of the
Functor to the input port of the FileSink.
17. The parameters for the FileSink are:
file - "clusteredresults.dat"
format - txt
18. Save your work.
In the properties for the ODBCAppend operator, click on Param. Specify the
following parameters.
connectionDocument - "/home/labfiles/mining/connection.xml"
connection - "Mining"
access - "ClusterSink"
The unqualified ODBCAppend operator requires its own use statement. But
once again, the SPL Graphical editor supplies the use statement for you.
use com.ibm.streams.db::*;
The key to the ODBCAppend operator is the connection document. A copy of
the file is located at the end of this demonstration. Take a look at it.
The ODBCAppend operator references a connection called Mining. This is
matched to the connection_specification name found in connection.xml. It
specifies the database name, mining, and the userid and password used to
connect to that database.
The access parameter of the ODBCAppend operator is matched to the
access_specification name. It specifies the table to be accessed. It also
specifies which Stream attributes are to be used to insert data into the table.
The order of the attributes in the native_schema matches the order of columns
in the table. This means that the value in the stream attribute, client_id, gets
inserted into the first column in the mining.bankclustering table.
Save your work.
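Put together, the ODBCAppend invocation looks roughly like this (a sketch; the
input stream name Shrink is an assumption, standing in for the stream produced
by the earlier Functor):

    () as ClusterDB = ODBCAppend(Shrink)
    {
        param
            connectionDocument : "/home/labfiles/mining/connection.xml" ;
            connection         : "Mining" ;
            access             : "ClusterSink" ;
    }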
SPL Editor
Task 1. Data Mining.
Although there are several data mining operators supplied by the Mining Toolkit
for InfoSphere Streams, the demonstration just focuses on the Clustering
operator.
The data mining operators require that you create a scoring model using some
other data mining tool and export that model into a PMML format, which is then
referenced in the Streams data mining operator.
This demonstration has a FileSource operator read a file of client banking
information. That client data is scored using a clustering mining model. A
Functor removes some of the attributes from the tuple. Initially, the results
are written to a file; then the program will be expanded to have the results
written into a MySQL table.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select
SPL Project and click Next.
2. Name the project Clustering. Click Finish.
3. Right-click Clustering and select New->SPL Source File. Take the defaults
and click Finish.
4. To speed things along, use a schema that was already created for you. Copy the
file that has the schema definition to your project directory. Open a
command-line session. (Click the red hat icon and select Applications->System
Tools->Terminal.)
7. You will have to code a use directive to point to the Clustering operator. Add
the following statement to the beginning of your Main.spl source file:
use com.ibm.streams.mining.scoring::*;
8. In the Editor view code a FileSource operator (see the sketch below):
   Output schema - Client
   Output stream name - ClientData
   The file being read - /home/labfiles/bankcustomers.dat
   File format - csv
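A minimal sketch of that FileSource invocation (Client is the type imported from
the supplied schema file):

    stream<Client> ClientData = FileSource()
    {
        param
            file   : "/home/labfiles/bankcustomers.dat" ;
            format : csv ;
    }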
9. Next code a Clustering operator that reads the ClientData stream and scores
the tuple data using the /home/labfiles/mining/BankClustering.pmml model.
Then emit a stream called ResultClustering that is comprised of the Client
type, and two additional attributes, clusterindex that is of type int64 and score
that is of type float64.
10. Save your work.
11. The names of the attributes defined in the Client type must be matched to the
attribute names defined in the mining schema. To help you, the following shows
the matching. (CLIENT_ID does not need to be matched since it is a
supplementary attribute.)
model           : "/home/labfiles/mining/BankClustering.pmml";
age             : "AGE";
creditcard      : "BANKCARD";
joined_accounts : "JOINED_ACCOUNTS";
nbr_years_cli   : "NBR_YEARS_CLI";
average_balance : "AVERAGE_BALANCE";
profession      : "PROFESSION";
savings_account : "SAVINGS_ACCOUNT";
online_access   : "ONLINE_ACCESS";
marital_status  : "MARITAL_STATUS";
gender          : "GENDER";
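Putting steps 9 through 11 together, the Clustering invocation might look like
this (a sketch; the parameter set comes from the matching shown above):

    stream<Client, tuple<int64 clusterindex, float64 score>> ResultClustering
        = Clustering(ClientData)
    {
        param
            model           : "/home/labfiles/mining/BankClustering.pmml" ;
            age             : "AGE" ;
            creditcard      : "BANKCARD" ;
            joined_accounts : "JOINED_ACCOUNTS" ;
            nbr_years_cli   : "NBR_YEARS_CLI" ;
            average_balance : "AVERAGE_BALANCE" ;
            profession      : "PROFESSION" ;
            savings_account : "SAVINGS_ACCOUNT" ;
            online_access   : "ONLINE_ACCESS" ;
            marital_status  : "MARITAL_STATUS" ;
            gender          : "GENDER" ;
    }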
12. Code a Functor that observes the ResultClustering stream and emits a
stream called Shrink with the following attributes.
client_id - rstring
age - rstring
gender - rstring
clusterindex - int64
score - float64
13. Using a FileSink operator, write the stream emitted from the Functor to a file
called clusteredresults.dat, which is to be located in your default data
directory. Write the data out using the txt format.
14. Save your work.
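A sketch of the Functor and FileSink described in steps 12 and 13 (the operator
alias ResultsFile is an assumption):

    stream<rstring client_id, rstring age, rstring gender,
           int64 clusterindex, float64 score> Shrink = Functor(ResultClustering)
    {
        // No explicit logic is needed: output attributes with matching names
        // are copied from the input tuple automatically.
    }

    () as ResultsFile = FileSink(Shrink)
    {
        param
            file   : "clusteredresults.dat" ;
            format : txt ;
    }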
Unit 15 SPL Functions
Demonstration 1:
Implement a native function and a non-native function
Purpose:
In this demonstration, you will implement an SPL user-defined function and a
function written in C++.
After completing this demonstration, you should be able to:
Describe the concept of an SPL user-defined function
List the steps necessary to implement a C++ user-defined function
Describe how to put a user-defined function in a toolkit and make it generally
accessible.
This demonstration works with both a composite function and a native function. The
composite function raises an integer to some positive integer power. The native
function determines if a passed value is even or odd.
Estimated time: 45 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password
This function will calculate an integer raised to some power. We will pass in two
integer values and return an integer value. The name of the function is power. It
will reside in a namespace of SPLType.
1. Right-click the SPLFunction project and select New->SPL Source File.
2. Type in a Namespace of SPLType.
3. Uncheck Generate Main composite.
4. Change the File name to Power.spl. Click Finish.
Notice that Power.spl was opened in the Editor view, but since it is not a Main
composite, there are no graphical capabilities.
5. Right-click the editor canvas and select Open with SPL Editor.
6. The only code displayed should be namespace SPLType;. If it is not there, then
code the following:
   namespace SPLType;
7. Next code the following after the namespace statement:
   public int32 power(int32 num, uint32 exp) {
       mutable uint32 i = 1;
       mutable int32 val = num;
       if (exp == 0u)
           val = 1;
       else
           while (i < exp) {
               val = val * num;
               i++;
           }
       return(val);
   }
So what did you just code? The first statement defines a public function called
power that returns an integer value. Two integers are passed when the function
is invoked. Remember that a variable, by default, cannot be changed; to allow
a variable to be changed, it must be defined as mutable. If the exponent is
zero, then you return a value of 1; otherwise the function loops to calculate
the integer raised to the appropriate exponent.
8. Save your work.
5. Save your work. Close the SPL Editor tab for Main.spl. Accept the
replacement of the editor content.
Use the Beacon operator to generate four tuples where each tuple will have the
integer 3 raised to an increasing power through the use of the power function.
6. Drag a Beacon operator to the Main composite.
7. In the Properties for the Beacon operator, select Output Ports.
8. The output schema is:
total - int32
exp - uint32
9. Select Param. Specify iterations with a value of 4u.
10. Select Output. Expand Beacon_1_out0.
The value for total - power(3, (uint32)IterationCount())
The value for exp - (uint32)IterationCount()
11. Drag a Custom operator to the Main composite. Connect the output port of the
Beacon operator to the input port of the Custom operator. (Remember that the
input port for the Custom operator will "magically" appear.)
12. In the Properties for the Custom operator click Input Ports. Note that the
name of the input stream is Beacon_1_out0. (You will need to know the name
of the generated input stream when coding the onTuple clause.)
13. Select Logic. Type the following code:
onTuple Beacon_1_out0 : println((rstring)exp + " " +
(rstring)total);
14. Save your work.
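Assembled manually, the composite from steps 6 through 13 would look roughly
like this (a sketch; the graphical editor generates its own stream names and
use statements, and PrintTotals is an assumed alias):

    use SPLType::* ;

    composite Main
    {
        graph
            stream<int32 total, uint32 exp> Beacon_1_out0 = Beacon()
            {
                param
                    iterations : 4u ;
                output
                    // Raise 3 to an increasing power on each iteration.
                    Beacon_1_out0 : total = power(3, (uint32) IterationCount()),
                                    exp = (uint32) IterationCount() ;
            }

            () as PrintTotals = Custom(Beacon_1_out0)
            {
                logic
                    onTuple Beacon_1_out0 :
                        println((rstring) exp + " " + (rstring) total) ;
            }
    }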
This simple function will be passed an unsigned integer and will return a string
that specifies if the integer is even or odd.
1. In Eclipse create a new SPL project by clicking File->New->Project. Select
SPL Project and click Next.
2. Name the project CFunction. Click Finish.
3. Right-click on CFunction and select New->SPL Source File. Take the defaults
and click Finish.
4. Create a namespace. Right-click on CFunction and select New->SPL
Namespace.
5. Type in a namespace name of CType.MyFunctions. Click on the Folder path
drop-down. Notice that this namespace can be implemented in two different
ways. Select CFunction/CType.MyFunctions. Click Finish.
6. Right-click on CFunction and select New->C++ Native Function.
7. For Namespace, select CType.MyFunctions.
8. Give a name of evenodd.
9. Set the return type to String. Click Next.
10. Change the Header file to evenodd.h. Leave CPPNamespace blank.
11. Click Finish.
12. Expand the CFunction project, then Resources, and then impl. Your C++
artifacts reside in the directories under the impl directory. Right-click on src and
select New->File. Give it a name of evenodd.cpp and click Finish.
13. Right-click on include and select New->File. Give it a name of evenodd.h. Click
Finish.
14. In evenodd.h type the following header information:
    #ifndef evenodd_H_
    #define evenodd_H_
    // Define SPL types and functions
    #include "SPL/Runtime/Function/SPLFunctions.h"
    namespace CType {
        namespace MyFunctions {
            SPL::rstring evenodd(SPL::uint32 const & num);
        }
    }
    #endif
15. Save your work.
35. Use a Custom operator to print the results. Drag a Custom operator to the
Main composite.
36. Connect the output port of the Functor operator to the input port of the Custom
operator.
37. In the Properties for the Custom operator, click Input Ports. Note the name of
the input stream.
38. In the Properties, click Logic.
39. Type the following: (It assumes that the input stream name is Functor_1_out0;
update yours with the name you noted on the Input Ports tab if necessary.)
onTuple Functor_1_out0 : printStringLn((rstring)num + " " + kind);
40. Save your work. You will get an error since the evenodd function does not exist
yet. The evenodd function needs to be compiled and a shared library created; you
need a makefile to do that. Actually, you will use a makefile to build the
application as well. Using a makefile to build the application also allows for the
compilation and linkage of the function at the same time, so a two-step
process is handled by a single step.
To save you some time, the makefiles are supplied. You can find examples of
these makefiles in the InfoSphere Streams Studio Installation and User's Guide.
41. Here is the gist of what needs to be done. /home/labfiles/CFunction/makefile
needs to be copied to your CFunction application directory. From a command
line, execute the following:
cp /home/labfiles/CFunction/makefile
~/StreamsStudio/workspace/CFunction
42. /home/labfiles/evenodd/makefile needs to be copied to the impl directory that is
under your application directory. From a command line, execute the following:
cp /home/labfiles/evenodd/makefile
~/StreamsStudio/workspace/CFunction/impl
43. Since you are printing out the results, create a standalone application. Under
<default_namespace> right click on Main and select New->Standalone
Build. Then set it to active.
44. Right-click on the CFunction project and select Configure SPL Build.
45. From the Builder type drop-down, select External builder. Keep the filled in
options and click OK.
46. Next you have to work on your function model. Open
Resources->CType.MyFunctions->native.function->function.xml.
47. Expand Native functions->Function Set evenodd.h->Prototypes. Select
Function.
Unit 16 SPL C++ Non-Generic Primitive Operators
Demonstration 1:
Code a C++ non-generic primitive operator
Purpose:
You are to write a primitive operator in C++ that observes tuples on a single
input port. It is to add two integer attributes from the input tuple together and
then multiply the result by an optional parameter value. A new tuple is then
emitted on a single output port.
Most non-generic primitive operators are written to accomplish a particular
task and have known schemas.
Estimated time: 45 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password
The input schema will be:
num1 - int32
num2 - int32
desc - rstring
The output schema will be:
num1 - int32
num2 - int32
desc - rstring
result - int32
15. First add code that forces this method to a single thread and then define the
input and output tuples. Use the assign method, which automatically generates the
code to copy input attribute values to their corresponding output attribute values.
Next do the required calculation and assign the result to the output attribute
called result. Finally, submit the output tuple on the first (and only) output port.
To simplify things, I suggest that you remove all of the sample code from this
method. Then type the following:

    // Serialize access to the operator across threads.
    AutoPortMutex apm(_mutex, *this);
    IPort0Type & ituple = static_cast<IPort0Type &>(tuple);
    OPort0Type otuple;
    // Copy input attribute values to matching output attributes.
    otuple.assign(ituple);
    otuple.set_result((ituple.get_num1() + ituple.get_num2()) * multiply);
    submit(otuple, 0);
Note: Because there is an input attribute called num1, you are able to access its
value using a generated function called, ituple.get_num1(). Also, to set the value for
the output attribute result, you use the generated function
otuple.set_result(somevalue).
There should be a CppAddOp.xml tab that is open. Select it. (If it is not,
right-click CppAddOp.xml, in the same folder where the skeleton templates are
located, and select Open With->Other. Then select SPL Operator Model Editor.)
Click on the Properties tab.
In the Operator Model editor, expand Operator->C++. Then select Context.
In the Properties view, scroll through the different property values that can be
modified.
In the Operator Model editor, right click on Context and select New Child.
Listed are additional elements that you can add to the operator model. Close
the menu.
For the Operator Model, select Parameters. In the Properties view, expand
Misc. You can see that you can specify if any parameters are allowed. By
default, this is set to true.
7. Right-click Parameters and select New Child->Parameter. You are now
going to specify properties for a particular parameter.
8. In the Properties view, scroll down and expand Misc.
9. In the Name field, type in multiplier. (This is why you checked for a parameter
called multiplier in the C++ code.)
10. Keep Optional as true. This states that this is an optional parameter. Change
the Type to int32. (Once again, this is why you did not have to worry about
checking to see if the value passed was numeric.)
11. In the Operator Model editor, expand both Input Ports and Output Ports. The
properties for the Port Open Set can be used to specify properties for all input
or output ports. You can right-click either Input Ports or Output Ports and
add properties that describe particular input and output ports. For this
demonstration, you do not require any changes.
12. Save your Operator Model.
13. In the Edit view, click the Main.spl tab to return to the SPL Graphical editor
for the Main composite.
14. Drag a FileSource operator to the Main composite. You are to read a file
called input.dat that is in the default data directory. The format of this file is
csv and each tuple has three attributes: two int32 attributes and one rstring
attribute.
15. In the Properties view for the FileSource operator, select Output Ports.
16. Define the following schema:
num1 - int32
num2 - int32
desc - rstring
17. In the Properties view, select Param.
18. Add the following parameters:
file - "input.dat"
format - csv
19. Next add the CppAddOp operator that you just coded to the Main composite. In
the palette area, expand Toolkits->CppPrimitive->MyOperators.Utils. Then
drag CppAddOp to the Main composite.
20. Connect the output port of the FileSource operator to the input port of the
CppAddOp operator. (Once again, this port will "magically" appear.)
21. Select the CppAddOp operator and in the Properties view, select Param.
22. Add the multiplier parameter. Set the value to 3.
23. In the Properties view, select Output Ports. (A sketch of the assembled
composite follows these steps.)
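Putting steps 14 through 23 together, the resulting SPL looks roughly like this
(a sketch: the stream names InData and Added are assumptions, and the SPL
Graphical editor supplies the use statement for you):

    use MyOperators.Utils::CppAddOp ;

    composite Main
    {
        graph
            stream<int32 num1, int32 num2, rstring desc> InData = FileSource()
            {
                param
                    file   : "input.dat" ;
                    format : csv ;
            }

            stream<int32 num1, int32 num2, rstring desc, int32 result> Added
                = CppAddOp(InData)
            {
                param
                    multiplier : 3 ;
            }
    }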
Demonstration 1:
Code a Java primitive operator
Purpose:
You will implement the same non-generic operator that you did in the
previous demonstration, except you will code it in Java.
Estimated time: 45 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password
To speed things along, copy code from the CppPrimitive project. Open the
Main.spl for the CppPrimitive project.
Right-click the FileSource operator for the CppPrimitive Main composite and
select Copy.
Right-click in the Main composite for the JavaPrimitive project and select
Paste.
In the palette for the JavaPrimitive, expand
Toolkits->JavaPrimitive->MyOperators.Utils and drag JavaAddOp to the Main
composite.
Connect the output port for the FileSource operator to the input port of the
JavaAddOp operator.
Select the JavaAddOp operator and in the Properties view, select Output
Ports.
Define the following schema for the output port 0:
num1 - int32
num2 - int32
desc - rstring
result - int32
In the Properties view, select Param.
Add the multiplier parameter and set it equal to 3.
Copy the Custom operator from the Main composite of the CppPrimitive
project to the Main composite of the JavaPrimitive project.
Connect the output port of the JavaAddOp operator to the input port of the
Custom operator.
Save your work.
Go through the steps to build a standalone application and set it to be Active.
(Remember that for this demonstration, the Main is under the default
namespace.)
Copy the input.dat file under the data folder in the CppPrimitive project to the
data folder in the JavaPrimitive project. (Or create a new file and add the
following records.)
25,2,"value should be 81"
30,4,"value should be 102"
Launch your standalone application.
Since the multiplier parameter is optional, you might try removing it to see the
effect.
Close all opened tabs and contract your projects.
Information: Once again you did not have to code a use statement when using the
SPL Graphical editor.
@PrimitiveOperator(name="JavaAddOp", namespace="MyOperators.Utils",
        description="Java Operator JavaAddOp")
@InputPorts({
    @InputPortSet(description="Port that ingests tuples",
        cardinality=1, optional=false, windowingMode=WindowMode.NonWindowed,
        windowPunctuationInputMode=WindowPunctuationInputMode.Oblivious),
    @InputPortSet(description="Optional input ports", optional=true,
        windowingMode=WindowMode.NonWindowed,
        windowPunctuationInputMode=WindowPunctuationInputMode.Oblivious)})
@OutputPorts({
    @OutputPortSet(description="Port that produces tuples",
        cardinality=1, optional=false,
        windowPunctuationOutputMode=WindowPunctuationOutputMode.Generating),
    @OutputPortSet(description="Optional output ports", optional=true,
        windowPunctuationOutputMode=WindowPunctuationOutputMode.Generating)})
public class JavaAddOp extends AbstractOperator {

    private int multiply = 1;

    @Override
    public synchronized void initialize(OperatorContext context) throws Exception {
        super.initialize(context);
    }

    @Override
    public synchronized void process(StreamingInput<Tuple> port, Tuple tuple)
            throws Exception {
        StreamingOutput<OutputTuple> out = getOutput(0);
        OutputTuple outTuple = out.newTuple();
        // Copy matching input attributes, then compute the result attribute.
        outTuple.assign(tuple);
        outTuple.setInt("result",
            multiply * (tuple.getInt("num1") + tuple.getInt("num2")));
        out.submit(outTuple);
    }

    @Parameter(name="multiplier", optional=true)
    public void setFilter(int multiply) {
        this.multiply = multiply;
    }

    public int getFilter() {
        return multiply;
    }
}
Results:
You implemented the same non-generic operator that you did in the previous
demonstration except you coded in Java.
Demonstration 1:
Code a generic primitive operator
Purpose:
You want to implement the C++ non-generic primitive operator as a generic
primitive operator. The operator will have a single input stream and a single
output stream.
You are essentially going to implement the C++ non-generic primitive operator as a
generic primitive operator. The operator is to only have a single input stream and a
single output stream. The assumption is that the input and output schemas match each
other, except for the addition of an integer attribute called result that is added to the
output stream schema. Other than this one attribute, you will not know the names of the
attributes, the number of attributes, nor their types.
There is a subtle change to the operator as well. Instead of adding two attributes and
multiplying the results by a parameter value, add all input integer attributes together and
multiply the result by a parameter value.
Estimated time: 45 minutes
User/Password:
student/ibm2blue
streamsadmin/ibm2blue
root/password
Next is the definition of the output tuple. This prints the value of $outTupleType
and follows it with otuple.

    <%=$outTupleType%> otuple;

And finally is the initialization of a Perl variable to totInt = 0.

    <%my $total = "totInt = 0";

Next begins the iteration through all of the attributes in the input tuple. For
each input attribute you are generating a call to a set method on the
corresponding output attribute, passing the result of the input tuple's get
method. Essentially you are setting the output attributes to their corresponding
input attribute values. You could have coded the assign method, as was done in
the other primitive operator demonstrations, but this technique lets you see
something different.

    foreach my $attr (@{$inputPort->getAttributes()}) { %>
        otuple.set_<%=$attr->getName()%>(ituple.get_<%=$attr->getName()%>());

Also, for each input attribute, you are checking its type to see if it is a 32-bit
integer. If it is, then a plus sign and the get method of that attribute are
concatenated to the Perl variable that was initialized earlier. The value of the
Perl variable $total eventually will be a statement that adds each input integer
attribute together and places the result in totInt.

    <%if ($attr->getSPLType() eq "int32") {
        $total = $total . " + ituple.get_" . $attr->getName() . "()";
        }
    }%>

Finally you print the value of the $total Perl variable.

    <%=$total%>;

You set the output attribute, result, equal to the sum of all integer attributes
multiplied by the parameter passed in.

    otuple.set_result(totInt*<%=$multiply%>);

And then the tuple is submitted on port 0.

    submit(otuple, 0);

Save your work.
Once again, the easiest thing to do is to copy operators from the Main.spl in
CppPrimitive to your Main composite in GenPrimitive.
1. Copy the FileSource operator from the CppPrimitive project into the Main
composite for the GenPrimitive project.
2. In the SPL Graphical editor palette, expand
Toolkits->GenPrimitive->MyOperators.Utils.
3. Drag the GenAddOp operator to the Main composite.
4. Connect the output port of the FileSource operator to the input port of the
GenAddOp operator.
5. Select the GenAddOp operator. In the Properties view, select Output Ports.
6. Add an output port.
7. Make this output port schema the same as the input port schema, with the
exception that you are going to add an additional attribute: result - int32.
(The suggestion is to use the <extends> option and then add the additional
attribute.)
8. In the Properties view, select Param.
9. Add the multiplier parameter and set it to 3.
10. Copy the Custom operator from the CppPrimitive project to the Main
composite of the GenPrimitive project.
11. Connect the output port of the GenAddOp operator to the input port of the
Custom operator.
12. Save your work.
13. Copy the input.dat that is under the data folder in the CppPrimitive project to
the data folder in the GenPrimitive project, or create a new file with the
following values:
25,2,"value should be 81"
30,4,"value should be 102"
14. Do the necessary steps to create a Standalone Build and set it to Active.
(Remember that your Main composite is in the default namespace.)
Task 6. Extra.
Since this is the last lab of the class, I am sure that every student will want to try
additional permutations.
1. You might try adding additional int32 attributes to your stream (and obviously to
your data as well) and verify that the operator is generic in the regard that it will
add any number of integer attributes and multiply the result by the parameter
value.
IBM Training