Lab: Pentaho Data Integration Overview

2011, Pentaho. All Rights Reserved. www.pentaho.com.

Lab: Install the Training Platform


Objective: Ensure all students have their technical environment set up
and ready for hands-on labs
Tasks
Unzip the training content (pentahotraining.zip) to the directory
C:\pentahotraining (no spaces in the name)
After unzipping, verify that your chosen folder contains, among others,
the folders labs, mysql and pdi-ee
Start the MySQL database for the training with
[]\mysql\start_mysql.bat
Notes:
Use mysql\stop_mysql.bat to shut down MySQL before you shut down
your computer
Remember to restart the MySQL database the next morning


US and Worldwide: +1 (866) 660-7555 | Slide 2

Lab: Install the Training Platform


Tasks
Please start []\pdi-ee\start-servers.bat
Wait until the Pentaho Enterprise Console is started:

Wait until the Data Integration Server is started:


Lab: Install the Training Platform


Tasks
Log in to the Pentaho Enterprise Console from your browser:
http://localhost:8088
Log in with the username admin and the password password
Check Running Data Integration Server only
If you already see the installed PDI license and valid dates, you
can skip this page and continue with the next one
Otherwise, add your test license file


Lab: Install the Training Platform


Tasks
You see the installed PDI license and the valid dates.
Scroll down and press OK

If necessary, scroll down


Lab: Install the Training Platform


Tasks
You see the following screen of the Pentaho Enterprise Console

The license installation is complete; you can close the browser


Lab: Install the Training Platform


Tasks (only when the license installation did not succeed)
Alternative for the license installation:
To install a Pentaho Enterprise Edition key from the command line,
follow the instructions below:
Open a command prompt with CMD
Navigate to the installer directory with
cd []\pdi-ee\license-installer
Run the install_license.bat script with the install switch and the
location and name of your license file (quoted, because the file name
contains spaces):
install_license.bat install "[]\Pentaho PDI Enterprise Edition.lic"
Upon completing this task, you should see the message "The
license has been successfully processed. Thank you."


Lab: First steps


Objective: Gain experience with Spoon, the general handling of steps
and hops, and running and previewing transformations
Tasks
Please start Spoon with []\pdi-ee\launch-designer.bat
When prompted for a repository, choose Cancel to work without one


Lab: First steps


Tasks
Create a new transformation


Lab: First steps


Tasks
Drag the Generate Rows step onto the canvas.
Drag the Dummy step onto the canvas.
Connect the two steps with a hop.


Lab: First steps


Tasks
Double-click Generate Rows to edit the step.
Change the Limit (number of rows) in Generate Rows to 100,000
Add the field FirstCol as illustrated below.


Lab: First steps


Tasks
Click the Preview button in the step window.
Close the preview window and click OK in the step window.
Select one of your steps and click Preview on the toolbar.
Note that the Stop button appears after you click Quick launch.

Examine the different behavior of the Close and Stop buttons and
notice the Resume button in the Log view.
Preview the transformation again when the Resume button is active.


Lab: First steps


Tasks
Be sure your transformation is stopped.
Change the number of generated rows in the step to 1,000,000 rows.
Start your transformation and look at the Log view.
Right-click the Generate Rows step and select Change the
number of copies from the context menu.
Set the number of copies to start for the Generate Rows step to 5.


Lab: First steps


Tasks
Start your transformation and look at the Log view.

Notice that the transformation produced five times as much data and the
Dummy step has to handle a higher load.


Lab: First steps


Tasks
Create five more Dummy steps by copying and pasting.
Connect the steps to the Generate Rows step and select Distribute when
prompted.

Start the transformation and look at the Log view.

Notice:
Each Dummy step now takes a part of the workload.
In this case, this has the same effect as changing the number of copies
to start to 5 for the Dummy step.
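
The difference between distributing and copying rows over several target steps can be sketched in a few lines of Python. This is only an illustration, not PDI code: the function names are made up, and handing out rows round-robin is an assumption about how the distribution works.

```python
from itertools import cycle


def distribute(rows, n_targets):
    """Hand each row to exactly one target, in round-robin order."""
    targets = [[] for _ in range(n_targets)]
    for target, row in zip(cycle(targets), rows):
        target.append(row)
    return targets


def copy(rows, n_targets):
    """Hand every row to every target."""
    rows = list(rows)
    return [list(rows) for _ in range(n_targets)]


rows = range(10)
distributed = distribute(rows, 5)
copied = copy(rows, 5)
print([len(t) for t in distributed])  # each target handles a share of the rows
print([len(t) for t in copied])       # each target handles all rows
```

With Distribute, the row counts of the targets add up to the number of generated rows; with Copy, every target shows the full row count in the Log view.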


Lab: Data flow


Objective: Understand the impact of parallel processing
Tasks
Create a new transformation and draw the following on the canvas.

Set each Generate rows step to produce 1000 rows or more (no
fields are needed).
Set each Add sequence step to use a different counter name.


Lab: Data flow


Tasks
Preview the data at the Dummy step (the union of the two data
streams).
Look for the interleaved number sequences somewhere in the list (their
position in the list depends on your CPU).
Sort on the valuename column by clicking its header to see the
duplicate values.
Change the counter names of both Add sequence steps to lab2
and view the result again. (Note: The number of rows to preview
must be equal to or greater than the total number of rows, e.g. 2000 rows
when each step produces 1000 rows.)
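
Why the counter name matters can be sketched in Python. This is a simplified, sequential illustration with made-up row values; in Spoon both streams run in parallel, so the actual interleaving differs from run to run.

```python
from itertools import count


def add_sequence(rows, counter):
    """Append the next value of the given counter to each row."""
    return [(row, next(counter)) for row in rows]


# Different counter names: each stream numbers its rows on its own.
separate = add_sequence("abc", count(1)) + add_sequence("xyz", count(1))

# Same counter name (lab2 in the lab): both streams draw from one counter.
lab2 = count(1)
shared = add_sequence("abc", lab2) + add_sequence("xyz", lab2)

separate_values = sorted(value for _, value in separate)
shared_values = sorted(value for _, value in shared)
print(separate_values)  # every value occurs twice
print(shared_values)    # every value occurs once
```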


Lab: Safe Mode


Objective: Mix two data streams into one stream and get familiar with
safe mode, error analysis and error fixing
Tasks
Create a new transformation and draw the following on the canvas.

Enter the following fields identically in each Generate Rows step.


Lab: Safe Mode


Tasks
Execute the transformation in Safe Mode and preview the data.
Change the field type of the ThisIsANumber field in the Generate Rows
2 step to Integer.
Execute the transformation in Safe Mode.


Lab: Safe Mode


Tasks
Notice that an error occurs and is shown in the Log view.
Click Show error lines to see the error lines.


Lab: Safe Mode


Tasks
Select Check Selected Steps from the context menu of the Dummy
step to see the same error.
Click Verify in the toolbar to verify the entire transformation.
Check Show successful results in the bottom left-hand corner to see all
results.


Lab: Safe Mode


Tasks
Choose Show input fields from the context menu of the Dummy step
to view its input fields.
Choose Show output fields from the context menu of the Generate
Rows 2 step to view its output fields.

Select the second field and click Edit origin step to correct the error
by changing the field type of the ThisIsANumber field to Number.
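
The kind of check that safe mode performs when two streams are merged can be sketched as follows. This is a conceptual illustration only, not PDI's implementation; the function name and the field ThisIsAString are hypothetical, while ThisIsANumber and the step names come from the lab.

```python
def check_row_layouts(streams):
    """Raise ValueError if the streams do not share one field layout."""
    layouts = {name: tuple(fields) for name, fields in streams.items()}
    reference_name, reference = next(iter(layouts.items()))
    for name, layout in layouts.items():
        if layout != reference:
            raise ValueError(
                f"{name} has layout {layout}, "
                f"but {reference_name} has layout {reference}"
            )


ok = {
    "Generate Rows": [("ThisIsAString", "String"), ("ThisIsANumber", "Number")],
    "Generate Rows 2": [("ThisIsAString", "String"), ("ThisIsANumber", "Number")],
}
broken = {
    "Generate Rows": [("ThisIsAString", "String"), ("ThisIsANumber", "Number")],
    "Generate Rows 2": [("ThisIsAString", "String"), ("ThisIsANumber", "Integer")],
}

check_row_layouts(ok)  # identical layouts: no error
try:
    check_row_layouts(broken)
except ValueError as error:
    print(error)  # reports the mismatching field type
```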


Labs: Input and Output


Lab: Create a Database Connection


Objective: Become familiar with creating database connections and
using the database explorer.
Tasks
Create a new transformation.
Choose New from the context menu of the Database connections.


Lab: Create a Database Connection


Enter these properties for the pentaho_oltp database
Make sure to change the port to 3999 and use pentaho as the
password


Lab: Create a Database Connection


Click the Test button.

Click OK to close the Connection report window.


Click OK to confirm and close the new connection window.
Choose Share in the context menu of the connection.


Lab: Create a Database Connection


Repeat these steps to create a connection to the pentaho_olap database.


Lab: Create a Database Connection


Click the Feature List button in one of the new connections

The listed driver class and URL might be useful for other Pentaho
products.


Lab: Create a Database Connection


Ensure both connections are shared. They will be used in subsequent
transformations.


Lab: Database Explorer


Objective: Get familiar with using the database explorer.
Tasks
Choose Explore from the context menu of the pentaho_oltp
connection.


Lab: Database Explorer


Preview the first 100 rows of the table customer.
If Excel is installed, select and copy all rows, then paste them into Excel.
Look at the other tables to get familiar with the training data.
Try the other buttons, but do not truncate the table. Model and
Visualize are covered in later chapters.


Lab: Table Input & Table Output


Objective: Transfer data between different databases. Become familiar
with the SQL generator and field alterations.
Tasks
Draw the following steps and hops on the canvas.

Use the pentaho_oltp connection in the Table Input.


Click the Get SQL select statement button in the Table Input step.


Lab: Table Input & Table Output


Select the table orderdetails and include all fields.

Click OK to confirm and close the window.


Enter the following parameters in the Table output step.


Lab: Table Input & Table Output


Click the SQL button below to generate the SQL statement needed to
create the new table in the pentaho_olap database.

Ensure the generated SQL is similar to the text below.

Click Execute to create the target table.


Lab: Table Input & Table Output


Now close all windows (close the Table output step with OK).
Run the transformation. This is the Log view after it finishes:

Mind the difference between the columns Read/Written and Input/Output.


Lab: Table Input & Table Output


Change the SQL statement in the Table input step: add a new field called
total, calculated as quantityordered * priceeach.

Execute the transformation and notice the error.


Click the SQL button in the Table output step to correct the problem.

Execute the generated Table output SQL.


Execute the transformation.
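
The sequence above (new field in the query, failing load, altered target table, successful load) can be reproduced outside Spoon. The following sketch uses Python's built-in sqlite3 module as a stand-in for the training MySQL database; the target table name and the sample values are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orderdetails "
    "(ordernumber INTEGER, quantityordered INTEGER, priceeach REAL)"
)
con.execute("INSERT INTO orderdetails VALUES (10100, 30, 136.0)")
# Hypothetical target table, created before the total field existed.
con.execute(
    "CREATE TABLE target "
    "(ordernumber INTEGER, quantityordered INTEGER, priceeach REAL)"
)

rows = con.execute(
    "SELECT ordernumber, quantityordered, priceeach, "
    "quantityordered * priceeach AS total FROM orderdetails"
).fetchall()

insert_sql = (
    "INSERT INTO target (ordernumber, quantityordered, priceeach, total) "
    "VALUES (?, ?, ?, ?)"
)
try:
    con.executemany(insert_sql, rows)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)  # the target table has no column for total yet

# This is the kind of statement the SQL button generates to fix the target.
con.execute("ALTER TABLE target ADD COLUMN total REAL")
con.executemany(insert_sql, rows)
loaded = con.execute("SELECT total FROM target").fetchall()
print(error)
print(loaded)
```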


Lab: Text File Output


Objective: To get familiar with the Text File Output step.
Tasks
Ensure the previous transformation is saved.
Choose Save As from the File menu to save it with a new name.
In the transformation settings, notice the changed
transformation name.
Replace the Table Output step with a Text File Output step

Enter a directory and filename within the Text File Output step


Lab: Text File Output


Do not select any fields in the Text File Output step; all fields are
then included in the text file
Execute the transformation
Open the created text file with an editor and examine the content


Lab: CSV File Input & Insert/Update


Objective: Get familiar with the CSV File Input and Insert/Update steps.
Tasks
Create a new transformation.
Drag the CSV File Input step onto the canvas,
enter the filename from the previous lab, and
change the delimiter to ;


Lab: CSV File Input & Insert/Update


Click the Get Fields button and check if the output looks similar to this
(depending on your locale, the Decimal and Group characters might be
different)


Lab: CSV File Input & Insert/Update


Add the Insert/Update step to the canvas.
Add a hop between the CSV File Input step and the Insert/Update step.
Set the following properties on the Insert/Update step, press the Get
fields button and select the two key fields as shown below


Lab: CSV File Input & Insert/Update


Press the Get update fields button and keep all fields


Lab: CSV File Input & Insert/Update


In the Insert/Update step, click the SQL button and execute the
generated SQL.


Lab: CSV File Input & Insert/Update


Execute the transformation and the step metrics should look like this

Execute the transformation again and it should look like this


Lab: CSV File Input & Insert/Update


Change the source data file:
Add a line like the first data line in the example below (make sure the
orderlinenumber is 99 so that it is unique).
Also change the quantityordered of another line.


Lab: CSV File Input & Insert/Update


Execute the transformation again with the changed source file.
Notice one output and one updated line.

Execute the transformation again with the same source file.


Notice no outputs and no updates.
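
The behaviour observed in these four runs can be modelled with a small Python sketch. It is a conceptual model, not the step's implementation: the function is made up, the target table is a plain dictionary, and using ordernumber and orderlinenumber as the two key fields is an assumption.

```python
def insert_update(table, rows, key_fields):
    """Insert rows with unknown keys and update rows whose values changed.

    Returns the number of inserted and updated rows.
    """
    inserted = updated = 0
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in table:
            table[key] = dict(row)
            inserted += 1
        elif table[key] != row:
            table[key] = dict(row)
            updated += 1
    return inserted, updated


keys = ("ordernumber", "orderlinenumber")
source = [
    {"ordernumber": 10100, "orderlinenumber": 1, "quantityordered": 30},
    {"ordernumber": 10100, "orderlinenumber": 2, "quantityordered": 50},
]
target = {}

first = insert_update(target, source, keys)   # initial load: all rows are new
second = insert_update(target, source, keys)  # same file again: nothing to do

# Change the source: one new line and one changed quantity.
source.append({"ordernumber": 10100, "orderlinenumber": 99, "quantityordered": 5})
source[0]["quantityordered"] = 31
third = insert_update(target, source, keys)   # one insert, one update
fourth = insert_update(target, source, keys)  # same file again: nothing to do
print(first, second, third, fourth)
```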


Lab: Table Input with Parameters


Objective: Learn how to include a SQL WHERE clause that is set dynamically
by a parameter, for example to load only segments (deltas) of data.
Tasks
Draw the following on the canvas.

Edit the Generate Rows step and change the limit to 1.

Enter the following fields in the Generate Rows step.


Lab: Table Input with Parameters


Edit the Table Input step and enter the following information.


Lab: Table Input with Parameters


Edit the Table Output step and enter the following information.

Click the SQL button and generate the table.


Lab: Table Input with Parameters


Execute the transformation.
Examine the results and the target table.
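
The idea behind this lab, a WHERE clause whose value comes from a row produced by a previous step, can be sketched with Python's sqlite3 module standing in for the training database. The sample data, the cut-off date and the choice of orderdate as the filter column are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (ordernumber INTEGER, orderdate TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2004-12-31"), (2, "2005-01-15"), (3, "2005-03-01")],
)

# The single row delivered by the upstream step (Generate Rows, limit 1).
parameter_row = ("2005-01-01",)

# The ? placeholder is filled from the incoming row, so only the delta is read.
delta = con.execute(
    "SELECT ordernumber FROM orders WHERE orderdate >= ? ORDER BY ordernumber",
    parameter_row,
).fetchall()
print(delta)
```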


Optional Lab: Copy Table Wizard


Objective: Get familiar with the Copy Table Wizard.
Tasks
Create a new transformation.
Choose Copy table wizard (not the Copy tables wizard) from the
Wizard menu.


Optional Lab: Copy Table Wizard


Select a table and examine the results.


Labs: Data Warehouse Steps


Lab: Source to Target Mapping


Objectives: Get familiar with the mapping from the source
(pentaho_oltp) to target database (pentaho_olap).
Tasks:
Please review the database schemas for pentaho_oltp and
pentaho_olap
Create a database mapping


Lab: Slowly Changing Dimensions


Objectives: Get familiar with the concept of slowly changing dimensions
and the Dimension lookup/update step.
Tasks:
Draw the following on the canvas (steps: Table Input, Dimension
lookup/update)

Load all fields from the customers table:


Lab: Slowly Changing Dimensions


Enter the following in the dim_customers table:


Lab: Slowly Changing Dimensions


Enter the following in the dim_customers table for the fields:

Now run the transformation to load the target dimension table
dim_customers, then verify its contents.
Note: If you attend the Building Analytic Solutions Using Pentaho course,
you will need this table later on.


Lab: Slowly Changing Dimensions


Now we test the Type I and Type II functionality with the following
tasks:
Save this transformation with a different name and change the target
dimension table to test_dim_customer.
Let it run for an initial load.
Replace the Dimension lookup/update step with a Text File Output and
use only the needed fields:


Lab: Slowly Changing Dimensions


Run it; the contents of your text file output should look like this:

Now create a new transformation.

Use the Text File Input step and the Dimension lookup/update step:


Lab: Slowly Changing Dimensions


Let it run and you will see one output row:

Try to analyze and find the version 2 for a specific row and compare the
contents:

We found two versions because the Text File Input trimmed the spaces
in the postalcode field.


Lab: Slowly Changing Dimensions


Change other fields in the source text file and see what happens.
Also try the other options for Type of dimension update, such as
Punch through (change all versions) and Update (change only the latest
version).
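
The three update types can be compared with a small Python model of the version history of one customer. This is a sketch of the concept, not the step's implementation: the function, the column names version, date_from and date_to, and the sample postal code values are assumptions, and an open-ended version is marked here with None.

```python
def apply_change(versions, new_values, mode, today):
    """Apply changed attribute values to the version history of one customer.

    versions: list of dicts, oldest first.
    mode: "insert" (Type II), "update" or "punch through".
    """
    latest = versions[-1]
    changed = {k: v for k, v in new_values.items() if latest[k] != v}
    if not changed:
        return versions                    # nothing changed, nothing to do
    if mode == "insert":                   # close the old version, add a new one
        latest["date_to"] = today
        versions.append({**latest, **changed,
                         "version": latest["version"] + 1,
                         "date_from": today, "date_to": None})
    elif mode == "update":                 # overwrite the latest version only
        latest.update(changed)
    elif mode == "punch through":          # overwrite every version
        for version in versions:
            version.update(changed)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return versions


history = [{"customernumber": 177, "postalcode": " 1734", "version": 1,
            "date_from": "1900-01-01", "date_to": None}]

apply_change(history, {"postalcode": "1734"}, "insert", "2011-06-01")
print([(v["version"], v["postalcode"]) for v in history])  # two versions now

apply_change(history, {"postalcode": "9999"}, "punch through", "2011-07-01")
print([(v["version"], v["postalcode"]) for v in history])  # both rewritten
```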


Lab: Slowly Changing Dimensions


Now try to look up specific versions of your customer dimension.
This can be achieved, for example, as follows:

Enter the following for the Generate Rows properties:


Lab: Slowly Changing Dimensions


Do a dimension lookup on the test_dim_customer table. Update the
dimension must be unchecked. (Note: This means fields are only added to
the stream and the table is not changed.)

Enter the date field from before:


Lab: Slowly Changing Dimensions


And get the version as an additional lookup field:

Depending on the date you use, you should get different versions of the
dimension row for customer 177 (preview the step), in this case with
or without a leading space. Also try leaving the date field empty.
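
A date-based version lookup can be sketched as follows. The validity dates are invented, the column names date_from and date_to are assumptions, and treating an empty date as "today" is also an assumption about the step's behaviour that you can verify in the preview.

```python
from datetime import date

# Hypothetical version history of customer 177 (half-open validity ranges).
versions = [
    {"customernumber": 177, "version": 1,
     "date_from": date(1900, 1, 1), "date_to": date(2011, 6, 1)},
    {"customernumber": 177, "version": 2,
     "date_from": date(2011, 6, 1), "date_to": date.max},
]


def lookup_version(versions, customernumber, when=None):
    """Return the version valid at `when`; an empty date means today."""
    when = when or date.today()
    for v in versions:
        if (v["customernumber"] == customernumber
                and v["date_from"] <= when < v["date_to"]):
            return v["version"]
    return None  # no matching dimension row


print(lookup_version(versions, 177, date(2010, 1, 1)))  # the old version
print(lookup_version(versions, 177))                    # the current version
```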


Optional Lab: Junk Dimensions


Objectives: To get familiar with the Combination lookup/update step
and junk dimensions.
Tasks:
Draw the following on the canvas:

One possible use case: Generate a technical key as an ID for
customer data entered in a web front end.
For this use case, we leave out our existing customernumber.


Optional Lab: Junk Dimensions


Enter the following in the Combination lookup/update step:


Optional Lab: Junk Dimensions


Press SQL to generate the table, then run the transformation.
On subsequent executions after the first run, you will see no output
rows.
Look at the data and you will see a hashcode for speedy access.

Try to add new rows or change existing ones from your source text file.
Note: A junk dimension is often used to combine simple dimension
attributes, where each unique combination is the distinct key.
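
The principle of a combination lookup can be sketched in Python: each unique combination of attribute values receives one technical key, and known combinations are found again without creating a new row. The function and the field names are hypothetical; the dictionary, which is hash-based, plays the role of the hashcode column here.

```python
def combination_lookup(dimension, row, fields):
    """Return the technical key for this combination, creating one if new."""
    combination = tuple(row[field] for field in fields)
    if combination not in dimension:
        dimension[combination] = len(dimension) + 1  # next technical key
    return dimension[combination]


fields = ("customername", "city")  # hypothetical attribute fields
rows = [
    {"customername": "Atelier graphique", "city": "Nantes"},
    {"customername": "Signal Gift Stores", "city": "Las Vegas"},
    {"customername": "Atelier graphique", "city": "Nantes"},
]

dimension = {}
keys = [combination_lookup(dimension, row, fields) for row in rows]
print(keys)            # the repeated combination gets the same key
print(len(dimension))  # only unique combinations are stored
```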


Labs: Lookups and Field Transforms


Lab: Load the Table stage_countryterritory


Objectives: Create a dimension table for country to territory mappings
in the staging area.
Tasks overview:
Load all countries from table customers
The country UK must be mapped to U.K.
We have a mapping file country_territory.txt to use.
Japan should be treated differently: its territory should also be
Japan.
All unknown mappings should be marked as UNKNOWN
Store this in a dimension table stage_countryterritory
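
The mapping rules listed above can be sketched in Python before building them with steps. The country and territory values are invented stand-ins for the contents of country_territory.txt, and the function names are hypothetical.

```python
# Stand-in for the mapping file country_territory.txt (values are made up).
territory_by_country = {"USA": "NA", "France": "EMEA", "U.K.": "EMEA"}


def map_country(country):
    """Trim the value and map UK to U.K."""
    country = country.strip()
    return "U.K." if country == "UK" else country


def lookup_territory(country):
    """Japan is its own territory; anything unmapped becomes UNKNOWN."""
    if country == "Japan":
        return "Japan"
    return territory_by_country.get(country, "UNKNOWN")


customer_countries = ["USA", "UK", "Japan", " France", "Atlantis", "USA"]
stage_countryterritory = sorted(
    {(c, lookup_territory(c)) for c in map(map_country, customer_countries)}
)
for country, territory in stage_countryterritory:
    print(country, territory)
```

Without the strip() call, " France" and "France" would count as two different countries and the untrimmed one would end up as UNKNOWN.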


Lab: Load the Table stage_countryterritory


Detailed tasks example:
(load customer countries is a Table Input step,
country territory.txt is a Text File Input step,
stage_countryterritory is a Dimension lookup/update step)


Lab: Load the Table stage_countryterritory


Detailed tasks example:

Note: Leave the source value blank to map null values.


Lab: Load the Table stage_countryterritory


Result example:


Lab: Load the Table stage_countryterritory


Remarks:
In this example only the stream lookup makes sense because the
lookup (source) data does not come from a database.
If you get multiple entries for a country, check whether you used the
trim() function in "load customer countries". Can you explain why
multiple entries could occur in this case?
How can you ensure that stage_countryterritory is up to date? Are there
other ways to maintain it? (Hint: fact loading)
If this error occurs: Data truncation: Data too long for column
'territory' at row 1, change the length of territory to 7 (Text File Input)
Check whether there are any unknown values, e.g. make sure UK is mapped
to U.K.


Labs for: Set Transformations


Lab: Build the Dimension dim_salesrep


Objectives: Build the dimension dim_salesrep with a three-level hierarchy.
Get familiar especially with the Join Rows, Merge Join and Sort steps.
Tasks overview:
Get the president by loading from the employees table
(WHERE reportsto IS NULL)
Build the level 1 salesrepname1 from firstname and lastname
In another stream, load all employees and build the full name
Build the subsequent levels of the hierarchy with a Join Rows or Merge Join step
At the end do a lookup at table offices and get the fields:
city, state, country, territory
Store this in a dimension table dim_salesrep
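
The top-down construction of the hierarchy can be sketched in Python with nested joins on reportsto. The employee data, the employeenumber field and the names salesrepname2 and salesrepname3 are assumptions made for illustration; only reportsto, firstname, lastname and salesrepname1 appear in the task list.

```python
employees = [
    {"employeenumber": 1, "firstname": "Ann", "lastname": "Lee", "reportsto": None},
    {"employeenumber": 2, "firstname": "Bob", "lastname": "Ray", "reportsto": 1},
    {"employeenumber": 3, "firstname": "Cy", "lastname": "Orr", "reportsto": 2},
    {"employeenumber": 4, "firstname": "Di", "lastname": "Fox", "reportsto": 2},
]


def full_name(employee):
    """Build the full name from firstname, a space and lastname."""
    return f"{employee['firstname']} {employee['lastname']}"


def reports_of(manager_number):
    """All employees whose reportsto matches the given manager."""
    return [e for e in employees if e["reportsto"] == manager_number]


dim_salesrep = []
for president in reports_of(None):  # WHERE reportsto IS NULL
    for level2 in reports_of(president["employeenumber"]):
        for level3 in reports_of(level2["employeenumber"]):
            dim_salesrep.append({
                "salesrepname1": full_name(president),
                "salesrepname2": full_name(level2),
                "salesrepname3": full_name(level3),
            })

for row in dim_salesrep:
    print(row)
```

Like an inner join, this sketch drops employees who have nobody reporting to them at the next level.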


Lab: Build the Dimension dim_salesrep


Detailed tasks example:


Lab: Build the Dimension dim_salesrep


Detailed tasks example:

Enter a space here.


Lab: Build the Dimension dim_salesrep


Detailed tasks example:
Repeat this for the next levels
Now, add the office information:


Lab: Build the Dimension dim_salesrep


Detailed tasks example:
Store this in the dimension table and rename the fields accordingly


Lab: Build the Dimension dim_salesrep


Result example:


Lab: Build the Dimension dim_salesrep


Remarks:
In this case, Join Rows and Merge Join can be used interchangeably.
Both steps need the input to be sorted on the join fields.
Another approach to build this dimension would be from bottom to
top.
We use two table input steps that load from the same table within
the same transformation. Depending on the database this could lead
to locking situations.


Labs for: Pivot Transformations


Lab: Load the Budget from an Excel-file


Objectives: Get familiar with the Row Normaliser step.
Tasks overview:
Load the Excel file manufacturer_budget.xls
Pivot this table into a more database-friendly format and store the
month in a separate column.
Write this into the staging table stage_budget
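
What the Row Normaliser does to the budget sheet can be sketched in Python: one row per manufacturer with a column per month becomes one row per manufacturer and month. The column names and values are invented; the real layout is in manufacturer_budget.xls.

```python
# Wide layout as it might come from the Excel sheet (made-up sample data).
wide = [
    {"manufacturer": "Acme", "jan": 100, "feb": 120},
    {"manufacturer": "Best", "jan": 80, "feb": None},  # feb was not entered
]
months = ["jan", "feb"]

# Normalise: one output row per manufacturer and month, skipping empty cells.
stage_budget = [
    {"manufacturer": row["manufacturer"], "month": month, "budget": row[month]}
    for row in wide
    for month in months
    if row[month] is not None
]

for row in stage_budget:
    print(row)
```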


Lab: Load the Budget from an Excel-file


Detailed tasks example:


Lab: Load the Budget from an Excel-file


Detailed tasks example:
The source Excel file looks like this. Feel free to enter more data.


Lab: Load the Budget from an Excel-file


Result example:


Lab: Load the Budget from an Excel-file


Remarks:
Values that were not entered can be filtered out.


Optional Lab: Create a Reporting Budget


Objectives: Create an Excel file with the budget and months in the
columns. This is used to get familiar with the Row Denormaliser step.
Tasks overview:
To simulate budget data: Load orders from orders_basic (limited
data set)
Use a database join with orderdetails_basic
Calculate the month from the year
Calculate the budget year by adding 1 to the old year
Sort by budgetyear, productcode
Strip off the fields that are not needed
Use the Denormalizer step
Write to an Excel file
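
The denormalising part of these tasks can be sketched in Python: rows with one amount per month are grouped by budgetyear and productcode and turned into one row with a column per month, summing amounts that fall into the same cell. The sample rows, the amount field and the month_1 and month_2 column names are assumptions.

```python
from collections import defaultdict

# Input sorted by budgetyear and productcode (made-up sample data).
rows = [
    {"budgetyear": 2005, "productcode": "S10", "month": 1, "amount": 100.0},
    {"budgetyear": 2005, "productcode": "S10", "month": 1, "amount": 50.0},
    {"budgetyear": 2005, "productcode": "S10", "month": 2, "amount": 70.0},
    {"budgetyear": 2005, "productcode": "S12", "month": 1, "amount": 30.0},
]

# Group key -> month -> summed amount (the aggregate of the denormaliser).
pivot = defaultdict(lambda: defaultdict(float))
for row in rows:
    pivot[(row["budgetyear"], row["productcode"])][row["month"]] += row["amount"]

report = [
    {"budgetyear": year, "productcode": code,
     **{f"month_{m}": sums.get(m, 0.0) for m in (1, 2)}}
    for (year, code), sums in sorted(pivot.items())
]
for line in report:
    print(line)
```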


Optional Lab: Create a Reporting Budget


Detailed tasks example:


Optional Lab: Create a Reporting Budget


Result example:


Optional Lab: Create a Reporting Budget


Remarks:
Try replacing the Select Values step with a Grouping step. The result
should be the same. Do you know why? (Hint: aggregate function in
the Denormalizer step)
Try leaving out the Select Values step (also without the Grouping
step) and look at the result.


Lab: Load Fact Table


Lab: Load Fact Table


Objectives: Understand how to build an efficient, streaming
transformation to process large volumes of data doing lookup and
calculations
Tasks overview:
Source data

Transaction Data

Lookup Order data


Dimension Lookups

Time Dimension Lookups

Product Dimension Lookup

Customer Dimension Lookup


Calculations
Prepare values for Insert
Insert into Sales Fact Table


Transaction Data
Create a new Transformation fact_sales_basic.ktr
Add a Table Input step to the canvas
Name: orderdetails
Connection: pentaho_oltp
SQL:
SELECT
ordernumber
,productcode
,quantityordered
,priceeach
,orderlinenumber
FROM orderdetails


Lookup Order Data


Add a Database Lookup step to
the canvas

Name: orders
Connection: pentaho_oltp
Lookup Table: orders
Enable Cache: Ticked
Cache Size: 1000
Keys
ordernumber | = | ordernumber
Values
orderdate | | | Date
requireddate | | | Date
shippeddate | | | Date
status | | | String
customernumber | | | Integer


Time Dimension Lookups


Add a Table Input step to the
canvas (do NOT connect it to
anything)

Name: dim_time
Connection: pentaho_olap
SQL:
SELECT
timeid
, timedate
FROM dim_time

Set Data Movement for this step to Copy


Time Dimension Lookups (cont)


Add a Stream Lookup step to the canvas and connect both orders and
dim_time to it


Time Dimension Lookups (cont)


Configure the Stream Lookup as
follows

Name: lookup ordertimeid


Lookup Step: dim_time
Keys
orderdate | timedate
Fields to retrieve
timeid | ordertimeid | Integer
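
What the Stream Lookup does with these settings can be sketched as an in-memory index. This is a plain JavaScript illustration, not the step's implementation: the helper name lookupTimeId, the date strings and the sample rows are assumptions. The dim_time rows are loaded once into a map keyed on timedate, and each order row gets a new field with the matching timeid.

```javascript
// Illustrative sketch only; sample dim_time rows are made up.
const dimTime = [
  { timeid: 1, timedate: "2003-01-06" },
  { timeid: 2, timedate: "2003-01-09" },
];

// Build the in-memory index once, keyed on timedate.
const timeIndex = new Map(dimTime.map((d) => [d.timedate, d.timeid]));

// Return a copy of the row with newField set to the timeid found for
// row[dateField], or null when the date is not in the dimension.
function lookupTimeId(row, dateField, newField) {
  const found = timeIndex.get(row[dateField]);
  return { ...row, [newField]: found === undefined ? null : found };
}

let order = {
  ordernumber: 10100,
  orderdate: "2003-01-06",
  requireddate: "2003-01-09",
  shippeddate: "2003-01-10",
};
order = lookupTimeId(order, "orderdate", "ordertimeid");
order = lookupTimeId(order, "requireddate", "requiredtimeid");
order = lookupTimeId(order, "shippeddate", "shippedtimeid");
console.log(order);
```

The three calls correspond to the three lookups for orderdate, requireddate and shippeddate in this lab. A date that is missing from the dimension yields null here.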


Time Dimension Lookups (cont)


Repeat this lookup two more times, once for requireddate (new field
requiredtimeid) and once for shippeddate (new field shippedtimeid)


Product Dimension Lookup


Add a Dimension lookup / update
to the canvas and configure as
follows

Name: dim_product
Update the dimension: Unticked
Connection: pentaho_olap
Target Table: dim_product
Cache Size in Rows: 5000
Keys
productcode | productcode
Fields
productvendor | productvendor
Technical Key Field: productid


Customer Dimension Lookup


Add a Dimension lookup / update to
the canvas and configure as follows
Name: dim_customer
Update the dimension: Unticked
Connection: pentaho_olap
Target Table: dim_customer
Cache Size in Rows: 5000
Keys
customernumber|customernumber
Fields
customersalesrepid|customersalesrepid
Technical Key Field: customerid


Checkpoint
Your overall transformation should look like this. Run it in Preview and
ensure you are getting sensible data


Calculations
Add a Calculator Step to the
canvas and connect

Name: totalprice
Fields
New Field: totalprice
Calculation: A * B
Field A: quantityordered
Field B: priceeach
Value Type: Number
Remove: N


Prepare values for Insert


Add a Select Values step to the
canvas

Name: Select the fact columns


Select and Alter
ordernumber
orderlinenumber
status
ordertimeid
requiredtimeid
shippedtimeid
productid
customerid
customersalesrepid
quantityordered
totalprice


Insert into Sales Fact Table


Add a Table Output step to the
canvas

Name: fact_sales
Connection: pentaho_olap
Target Table: fact_sales
Truncate Table: Ticked

NOTE: This is a full reload of the fact table


Run the Transformation


The number of records from orderdetails should match the number of
inserts for fact_sales


Optional Lab
Load only the orders from 2001 by adding a Filter step between orders
and the first time lookup


Lab: Create a Job


Lab: Create a Job


Objectives: Understand how to build a job with transformations,
jobs, and error notifications
Tasks overview:

Create a job stage1


Create a job stage2
Create a job stage3
Create a job create_a_job with subjobs
Run job and review results
Add an Error Email to job
Create an error, test error


Create job stage1


Create a new job (File -> New -> Job)
Save the job as stage1.kjb
Add a Start entry to the canvas
Add a Transformation entry to the canvas and connect it to the Start

Edit the details (double click)


Configure the name to be 01_demo_dim_product
Click Browse and navigate to 01_demo_dim_product.ktr


Create job stage1 (cont)


Repeat the addition of a Transformation entry for the following
02_lab_dim_customer_basic.ktr
03_demo_stage_countryterritory_sales_org.ktr
04_lab_stage_countryterritory_geo.ktr

Add a Success entry to the end of the job and connect it

Run the job
Review the Log window to ensure that those transformations are being
executed


Create job stage2


Follow the same steps from stage1 for a new job stage2
Create a new Job, save it as stage2.kjb
Add a Start and a Success entry to the ends of the job
Configure the following transformations and connect them

05_lab_dim_salesrep.ktr
05_lab_load_budget.ktr
05_lab_manufacturer_report.ktr


Create job stage3


Follow the same steps from stage1 for a new job stage3
Create a new Job, save it as stage3.kjb
Add a Start and a Success entry to the ends of the job
Configure the following transformations and connect them

06_demo_dim_time.ktr
07_demo_dim_customer_advanced.ktr


Create job create_a_job


Create a new Job, save it as create_a_job.kjb
Add a Start and a Success entry to the ends of the job
Add a Job entry to the canvas and connect it to Start

Configure the Job Filename


Browse (...) to find stage1.kjb

Add two additional Job entries to the canvas for

stage2.kjb
stage3.kjb


Create job create_a_job (cont)


Add a Transformation entry to this job to see that you can mix all
types of job entries
08_lab_fact_sales_basic

Run the Job


Notice the hierarchical job
execution (tree structure)


Create job create_a_job (cont)


Now we add a Mail job entry to send mails on error:


Create job create_a_job (cont)


Start the local mail server in your Pentaho training directory with:
start_james.bat
Enter the following parameters in the Mail job entry (replace the mail
address with your own; you will receive an e-mail if you are
connected to the internet).


Create job create_a_job (cont)


Now enter the server parameters:


Create job create_a_job (cont)


Now enter the message:


Create job create_a_job (cont)


Simulate an error, e.g. by changing the filename of the first job entry
to a filename that does not exist.

Run the job and you will see the failure and the Send Mail execution:
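
The control flow of this job (run the entries in order, and on the first failure follow the error hop to the Mail entry) can be sketched as follows. This is a plain JavaScript illustration under assumed names: runJob, the entry objects and the notify callback are hypothetical and do not correspond to any PDI API.

```javascript
// Illustrative sketch only; runJob and notify are hypothetical names.
// Runs entries in order, stops at the first failure and calls notify.
function runJob(entries, notify) {
  const log = [];
  for (const entry of entries) {
    try {
      entry.run();
      log.push(`${entry.name}: ok`);
    } catch (err) {
      log.push(`${entry.name}: failed`);
      notify(`Job failed at ${entry.name}: ${err.message}`);
      return { success: false, log };
    }
  }
  return { success: true, log };
}

const sent = [];
const result = runJob(
  [
    // Simulated error: the first sub-job points to a missing file.
    { name: "stage1", run: () => { throw new Error("file not found"); } },
    { name: "stage2", run: () => {} },
  ],
  (message) => sent.push(message)
);
console.log(result, sent);
```

In the sketch, stage2 never runs once stage1 fails, and exactly one notification is sent.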


Labs for: Advanced Job Concepts


Lab: Process and Load a Set of Files


Objectives: Get familiar with processing a set of files and with passing
variables between jobs and transformations.
Tasks overview:
Process all test*.txt files in the current directory of the job
Convert each of them to Excel.


Lab: Process and Load a Set of Files


Detailed tasks example: Overview


Lab: Process and Load a Set of Files


Detailed tasks example: Job: Process all files


Lab: Process and Load a Set of Files


Detailed tasks example: Transformation: Get list of filenames


Lab: Process and Load a Set of Files


Detailed tasks example: Job: Process one file

For both, uncheck this:


Lab: Process and Load a Set of Files


Detailed tasks example: Transformation: Set FILENAME variable


Lab: Process and Load a Set of Files


Detailed tasks example: Transformation: Process one file


Lab: Process and Load a Set of Files


Detailed tasks example: Transformation: Process one file


Lab: Process and Load a Set of Files


Result example:
Your Job log:

You have two Excel files, e.g.: test1.txt.xls
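
The file selection and naming in this lab can be sketched in plain JavaScript. The sketch assumes that the wildcard test*.txt is matched against the whole filename and that the output name is the input name with .xls appended, as the example test1.txt.xls suggests. The function name planConversions and the sample file list are made up.

```javascript
// Illustrative sketch only; the file list is made up.
// Regular expression equivalent of the wildcard test*.txt.
const pattern = /^test.*\.txt$/;

// For each matching file, record the value the FILENAME variable would
// carry and the Excel file that would be written for it.
function planConversions(filenames) {
  return filenames
    .filter((name) => pattern.test(name))
    .map((name) => ({ FILENAME: name, output: `${name}.xls` }));
}

const plan = planConversions(["test1.txt", "test2.txt", "readme.txt", "test1.txt.xls"]);
console.log(plan);
```

Anchoring the pattern at the end of the name keeps an existing test1.txt.xls from being picked up again on a second run.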


Optional Lab: Store the Files in a Table


Objectives: As an alternative for the Text File Input step, use the
option "Accept filenames from previous steps". This lets you simplify
the transformation, because no variables are needed.
Tasks overview:
Store the contents in a table test_jobimport instead.
Remove all variables and try to simplify your jobs and
transformations.


Labs: Common JavaScript Uses


Lab: Check your Credit Card Number


Objectives: Get familiar with the built-in functions and the general use
of the JavaScript step.
Tasks:
Create a new Transformation.
Draw the following steps on the canvas:


Lab: Check your Credit Card Number


Rename Dummy (do nothing) to valid
Rename Dummy (do nothing) 2 to fraud
In the Text File Input use the file CreditCardNumbers.txt
When you press the Get fields button you get a numerical field;
change the type to String and set the length to 50.


Lab: Check your Credit Card Number


Open the JavaScript step and use the built-in function LuhnCheck().
The Luhn algorithm was designed to protect against accidental
errors, not malicious attacks. Most credit cards and many
government identification numbers use the algorithm as a simple
method of distinguishing valid numbers from collections of random
digits.
Uncheck the compatibility mode.
Look at the Input Fields and double click on CCNumber
Look at the Special Functions and double click on LuhnCheck
Edit your script
Press on Get variables.


Lab: Check your Credit Card Number


Your JavaScript step could look like this:


Lab: Check your Credit Card Number


Change your Filter step:

Let's run it and check the numbers: you will see that all of them are
reported as not valid, although only two of them should be invalid, so
the result is wrong.
If you want, check your own number by adding it to the text file.
Can you think of what the problem is?


Lab: Check your Credit Card Number


Try different methods of getting rid of the non-digit characters:
replace()
getDigitsOnly()
E.g.: var result=LuhnCheck(getDigitsOnly(CCNumber));
Now the result should look like this and you can split the stream in
valid and fraud correctly.

Warning: If I were you, I wouldn't try to charge anything with them.
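
For readers who want to see what happens behind the scenes, here is a plain JavaScript sketch of the Luhn checksum and of stripping non-digit characters. The functions digitsOnly and luhnValid are stand-ins written for this example; they are not the source of PDI's LuhnCheck() and getDigitsOnly(), and rejecting any input that contains non-digits is an assumption about how a checker might behave. The number used is a widely known test number, not a real card.

```javascript
// Illustrative stand-ins; not the implementation of PDI's built-ins.

// Keep only the characters 0-9.
function digitsOnly(text) {
  return text.replace(/\D/g, "");
}

// Luhn checksum: starting from the right, double every second digit,
// subtract 9 from results above 9, and require the total to be a
// multiple of 10. Input containing anything but digits is rejected.
function luhnValid(number) {
  if (!/^\d+$/.test(number)) return false;
  let sum = 0;
  let doubleIt = false;
  for (let i = number.length - 1; i >= 0; i--) {
    let digit = Number(number[i]);
    if (doubleIt) {
      digit *= 2;
      if (digit > 9) digit -= 9;
    }
    sum += digit;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

const raw = "4111 1111 1111 1111";
console.log(luhnValid(raw));             // false: the spaces are not digits
console.log(luhnValid(digitsOnly(raw))); // true once the spaces are removed
```

This mirrors the lab: the check only gives a meaningful answer after the separators have been removed from the input.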


Lab (optional): Analyze Accounting Data


Objectives: Get familiar with if statements, string functions and
conversions.
Tasks:
Create a new Transformation.
Load the file Accounting.txt
Process the data with a JavaScript step using the following rules:
When the account number is below 5000 and the number starts with a D
for Debit, change the sign by multiplying the amount by minus one.
When the account number is 5000 or above and the number starts
with a C for Credit, change the sign by multiplying the amount by
minus one.
Use case example: In a balance with assets and liabilities, the sign of
an amount depends by definition on the account number.
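
The two rules can be written as one small function. This is a sketch under assumptions: the layout of Accounting.txt is not shown here, so the example takes the account number as a number and the D/C marker as a separate one-character string. The name signedAmount and its parameters are hypothetical.

```javascript
// Illustrative sketch only; the parameter layout is an assumption.
// accountNumber: numeric account number
// marker: "D" for Debit or "C" for Credit
// amount: numeric amount
function signedAmount(accountNumber, marker, amount) {
  const flipDebit = accountNumber < 5000 && marker === "D";
  const flipCredit = accountNumber >= 5000 && marker === "C";
  return flipDebit || flipCredit ? -amount : amount;
}

console.log(signedAmount(4000, "D", 100));   // sign flipped
console.log(signedAmount(4000, "C", 100));   // unchanged
console.log(signedAmount(5000, "C", 12.5));  // sign flipped
console.log(signedAmount(6000, "D", 7));     // unchanged
```

If the marker is the first character of a text field in your file, a string function such as charAt(0) would extract it before calling a function like this one.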


Lab (optional): Analyze Accounting Data


Check whether your result looks like this and whether the decimal
amounts are shown.

Note for version 3.0.1/2: You need to change the meta data (format is
0.00) with a Select values step to show 2 decimal places with the
preview (internally the data is stored correctly), also see PDI-812.


Labs: Using XML


Lab: XML Enrichment


Objectives: Get familiar with loading XML, adding additional data and
writing it back to XML.
Tasks overview:
Load the file Products.xml
Look up the missing product description from table
pentaho_oltp.products
Write out the file ProductsDesc.xml


Lab: XML Enrichment


Tasks:
Add the following steps to the canvas

For the Get data from XML step:


Use the Products.xml file
Select the following XPath location:


Lab: XML Enrichment


Continued for the Get data from XML step:
Load the fields by pressing the Get fields button
You need to change the Number to String since there are leading
spaces in the test data within the numeric fields

For the Database Lookup step (other alternatives are possible)


Use the connection pentaho_oltp
and the table products


Lab: XML Enrichment


Continued for the Database Lookup step
Take the following keys and values:


Lab: XML Enrichment


For the XML Output step
Select the file ProductsDesc.xml
Choose meaningful parent and row XML element names:

You don't have to enter fields if you want to use all of them.
After execution, look at the resulting XML file; you should have
one additional element with the product description like this:


Lab: Get Data from XML and XML Join


Objectives: Get familiar with the usage of the Get Data from XML and
XML Join step.
Tasks overview:
Create an XML file with statistics elements for each product like this:


Lab: Get Data from XML and XML Join


Tasks:
Look at the sample provided in the labs:
Get data from XML and XML Join.ktr
Look at the step Get data from XML: Products.xml, the XPath and
the Fields. Would it still work if you changed the XPath to
/products only? (Remember to set the trim type to both,
especially for the numeric fields.)
Do you understand why the Generate rows stream is needed?
Where could you change it so that the same product code is not
included within the Quantities node?


Labs: Portable Transformations and Jobs


Lab: Database Connections with Variables


Objectives: Get familiar with the use of variables in database
connections. Also see how variables behave when they are set from
different places, and how to set them for a test environment.
Tasks:
Create a new transformation.
Create a new connection named pentaho_var and use the variable
${pentaho_db} instead of the database name (Password: pentaho)
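
The idea behind a connection field such as ${pentaho_db} can be sketched as simple text substitution. This is a plain JavaScript illustration, not PDI's implementation: the function name substitute is hypothetical, and leaving an unknown variable untouched is an assumption made for the sketch.

```javascript
// Illustrative sketch only; substitute is a hypothetical helper.
// Replaces every ${name} in text with variables[name]; names that are
// not defined are left as they are.
function substitute(text, variables) {
  return text.replace(/\$\{([^}]+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(variables, name) ? variables[name] : match
  );
}

console.log(substitute("${pentaho_db}", { pentaho_db: "pentaho_oltp" }));
console.log(substitute("${pentaho_db}", {}));
```

With the variable undefined, the database name stays the literal text ${pentaho_db}, which no database server will know. That is one way to picture why a connection test fails as long as the variable has no value.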


Lab: Database Connections with Variables


Press Test and you will get an error.
Press OK and you see your database connections like this:

Now select Edit / Set Environment Variables from the menu and enter
pentaho_oltp for the variable pentaho_db:


Lab: Database Connections with Variables


In your database connection pentaho_var, press Test and you will get an
error again.
So there is a problem testing a database connection that uses variables
set by Set Environment Variables.
Workaround: Execute your transformation and enter the right values
in the variables section if necessary:


Lab: Database Connections with Variables


You have no steps defined, so nothing happens.
But: Now test your pentaho_var connection again:

You can also Explore your database.


Lab: Database Connections with Variables


Let's try another way of setting a variable:
Create a new transformation.
Draw the following on the canvas:

Enter the following for the Generate Rows step (note that we now use a
different database name, pentaho_olap):


Lab: Database Connections with Variables


Enter the following for the Set Variables step:

Let this transformation run.


Move to your first transformation and Explore your pentaho_var
database: you will find that it is still the old pentaho_oltp and not
pentaho_olap.


Lab: Database Connections with Variables


Make sure to save both of your transformations.
Close Spoon and restart it.
First open the transformation that sets the variable to
pentaho_olap.
Run it.
Make sure your transformation with the pentaho_var connection is
not open.
Now open your transformation with the pentaho_var connection.
The Test or Explore will fail.


Lab: Database Connections with Variables


Execute your transformation and you will see a new default for the
pentaho_db variable:

After launching, you can Test and Explore your pentaho_olap database
correctly.


Lab: Database Connections with Variables


Optional: Test another variable behaviour:
Close Spoon and restart it.
First open the transformation that sets the variable to
pentaho_olap.
Run it.
Draw a Text File Input step on the canvas and open it.
Check whether pentaho_db is in the list of variables. It is missing:


Lab: Database Connections with Variables


Open a new transformation and do the same:
Draw a Text File Input step on the canvas and open it.
Check whether pentaho_db is in the list of variables. Now it is there,
and it is set correctly:

Conclusion:
Be careful when testing your variables environment in the design
tool. It can be different from your run-time behavior.
In most cases, variables are set at the time a transformation is
opened.


Optional Lab: Shared Connections


Objectives: Get familiar with the use of shared connections.
Tasks:
Create a new transformation.
Define another location for your shared objects file.
Create a new connection and share it.
Create another transformation and define the location from above
for the shared objects file.
Try to find out how changing and using this shared connection
behaves.


Lab: Configure Logging


Lab: Configure Logging


Objectives: Understand how to configure Jobs and Transformations to log
to a database
Tasks overview:

Create Transformation log table


Configure logging_trans1 to log to table
Run, review results
Configure steps for logging_trans1
Configure logging_trans2 to log to table
Run, review results
Create Job log table
Configure logging_job to log to table
Run, review results


Setup
Open the following three documents
logging_trans1.ktr
logging_trans2.ktr
logging_job.kjb
Run each of them to ensure they run with no issues before starting the
lab


Create Transformation log table


Place the editor on logging_trans1
Edit the Transformation Settings
(Transformation -> Settings)
On the Logging tab configure the
connection and table as follows

Log Connection: pentaho_olap


Log table: pdi_log_trans
Check LOG_FIELD

Click on the SQL button

This DDL will create the table PDI needs for logging
Click Execute


Configure logging_trans1 for Logging


Assuming you just completed the previous step, no additional
configuration steps are needed
Click OK
Use the Database Explorer to look at the following table in pentaho_olap:
pdi_log_trans
There should be NO entries


Run, review results


Run the Transformation
Look again at the log table with the database explorer.
You should see one row with your transformation execution


Run, review results (cont)


Run the Transformation again
You should see a new Batch and one more record


Configure Steps for logging_trans1


Open the Transformation Settings dialog of logging_trans1 and set the
following steps
READ log step: dummy
WRITE log step: five thousand rows
Run the transformation and you should now see the number of ROWS from
those two steps in the logging table
select * from pdi_log_trans


Configure logging_trans2 for Logging


Open logging_trans2
Open the Transformation Settings
Dialog
Configure as follows

Write log step: 100 rows
Output log step: Text File Output
Log Connection: pentaho_olap
Table: pdi_log_trans
Check LOG_FIELD


Run, review results


Execute logging_trans2
You should see two new entries along with results for LINES_WRITTEN and
LINES_OUTPUT
select * from pdi_log_trans


Create Job log table


Open logging_job.kjb
Open the Job Settings dialog (Job > Settings)
Configure as follows

Log Connection : pentaho_olap


Log Table: PDI_JOB_LOG
Check Logfield

Click SQL to get the DDL for the table

Click Execute to create the table


Run, review results


If you just completed the previous step, you're already done with
configuring this job for logging
Run the job
Look at the results of the Job logging

You see that the line counts are taken from the last transformation
You can copy/paste the Log field into a text editor to see the whole log


Optional labs
You can look at the logs for your job / transformation and compare the
results
You can test on your own with the logging of
Steps
Performance
Logging channels


Lab: Error Handling within Transformations


Lab: Input / Output Error Handling


Objectives: Get familiar with Step Error Handling and test it by
replacing the logic of the Insert / Update step. (The technique used in
Insert / Update is first to do a lookup and then perform an insert or an
update when needed.)
Tasks:
Create a new transformation.
Draw the following on the canvas:


Lab: Input / Output Error Handling


In the Text File Input:
Load the file test_products.txt
Change the field msrp and set the type to Number and Decimal to .


Lab: Input / Output Error Handling


In the Table Output step:
Use table test_products and create the table by pressing the button
SQL and add a primary key to the SQL:


Lab: Input / Output Error Handling


Run the transformation and you get the following error:

Our aim is to update the changed prices (msrp field) for the duplicate
rows.
Now use the Step Error Handling logic and add an Update step:


Lab: Input / Output Error Handling


In the Update step:
Use table test_products and enter the keys and update fields:


Lab: Input / Output Error Handling


For the Table Output step add the Error Handling logic:
Set the Target step to Update. No other settings are
required for our use case.


Lab: Input / Output Error Handling


Now the transformation looks like this:

When you execute it, the log shows 2 updated lines and 4 rejected
(it should also be 2; this is a small bug, PDI-422).
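
The pattern built in this lab (try the insert first, and send rows that violate the primary key to an update) can be sketched in plain JavaScript. The sketch makes several assumptions: an in-memory Map stands in for the table test_products, productcode is assumed to be the primary key, and the function names and sample rows are made up.

```javascript
// Illustrative sketch only; a Map stands in for the table.
const table = new Map(); // assumed primary key productcode -> msrp

// Insert fails on a duplicate key, like Table Output on a primary key.
function insertRow(row) {
  if (table.has(row.productcode)) {
    throw new Error(`Duplicate entry '${row.productcode}'`);
  }
  table.set(row.productcode, row.msrp);
}

// Update overwrites the price of an existing row.
function updateRow(row) {
  table.set(row.productcode, row.msrp);
}

// Rows rejected by the insert are routed to the update (error hop).
function load(rows) {
  const stats = { inserted: 0, updated: 0 };
  for (const row of rows) {
    try {
      insertRow(row);
      stats.inserted++;
    } catch (err) {
      updateRow(row);
      stats.updated++;
    }
  }
  return stats;
}

const stats = load([
  { productcode: "A", msrp: 10 },
  { productcode: "B", msrp: 20 },
  { productcode: "A", msrp: 12 },
  { productcode: "B", msrp: 25 },
]);
console.log(stats, [...table.entries()]);
```

Compared with a lookup before every write, this approach only pays the extra cost for rows that actually collide, which suits loads where most rows are new.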


Optional: Input / Output Error Handling


Objectives: Get familiar with the error fields.
Tasks:
Add special checking: If the error code is not TOP001, abort the
transformation.
The transformation could look like this:

When you have extra time, try to figure out how to check for a
more specific part of the error description, such as Duplicate entry.


Labs: Calculate Time between Orders


Lab: Calculate Time Between Orders


Objectives: Understand how to use Kettle to
implement a common pattern of elapsed time
between events
Tasks overview:

Source Order Data


Sort
Calculate Previous Date
Calculate Elapsed Time
Select output values
Output to Text File


Source Order Data


Create a new, blank
transformation
Save it
Add a Table Input to the canvas
Use pentaho_oltp as your source
Use the following SQL to get the
order data
SELECT
ordernumber, orderdate,
customernumber FROM orders


Sort
Add a Sort Step to the canvas
and connect it to source data
Sort by

CUSTOMERNUMBER
ORDERDATE

NOTE: Refer to the ETL Pattern slides for why this sort order was
chosen.


Calculate Previous Date


Add an Analytic Query step to the canvas and connect it to Sort
Use the Group field customernumber
Enter a new field PREV_ORDER_DATE with Subject orderdate, Type LAG "N"
rows BACKWARD in get Subject, and set N to 1


Calculate Previous Date (cont)


Do a Preview on this Analytic Query step to make sure you have NULL
values for a customer's first order and the PREVIOUS order date for all
others


Calculate Elapsed Time


Add a Calculator step to the canvas and connect it to the previous
Analytic Query step
Enter the following calculation:


Calculate Elapsed Time (cont)


Do a Preview on the Calculator step to make sure you have NULL values
for a customer's first order and, for all other orders, the correct
number of days since that customer's previous order
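
The sort, LAG and date difference of this lab can be sketched together in plain JavaScript. This is an illustration of the pattern, not PDI code: the function name daysBetweenOrders, the ISO date strings and the sample orders are assumptions; the field names PREV_ORDER_DATE and DAYS_BETWEEN_ORDERS are the ones used in this lab.

```javascript
// Illustrative sketch only; sample orders are made up.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Sort by customernumber, then orderdate; for each order take the date
// of the previous row of the same customer (LAG 1) and compute the
// number of days between the two dates. First orders get null.
function daysBetweenOrders(orders) {
  const sorted = [...orders].sort(
    (a, b) => a.customernumber - b.customernumber || a.orderdate.localeCompare(b.orderdate)
  );
  let previous = null;
  return sorted.map((order) => {
    const sameCustomer = previous !== null && previous.customernumber === order.customernumber;
    const prevDate = sameCustomer ? previous.orderdate : null;
    const days = prevDate === null
      ? null
      : Math.round((Date.parse(order.orderdate) - Date.parse(prevDate)) / MS_PER_DAY);
    previous = order;
    return { ...order, PREV_ORDER_DATE: prevDate, DAYS_BETWEEN_ORDERS: days };
  });
}

const elapsed = daysBetweenOrders([
  { ordernumber: 3, customernumber: 112, orderdate: "2003-02-01" },
  { ordernumber: 2, customernumber: 103, orderdate: "2003-01-16" },
  { ordernumber: 1, customernumber: 103, orderdate: "2003-01-06" },
]);
console.log(elapsed);
```

The sort order matters: the previous row is only the previous order of the same customer if the rows arrive grouped by customernumber and ordered by orderdate within each group.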


Select output Columns


Add a Select Values step to the
canvas and connect it to the
previous Calculator step
Select the values we wish to
output to a text file

ordernumber
DAYS_BETWEEN_ORDERS

Preview this step to ensure you have only these two fields


Output to Text File


Add a Text File Output step to the canvas and connect it to the previous
Select Values step
Change the Filename of the text file output to:
${Internal.Transformation.Filename.Directory}/order_date_elapsed_days
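The filename above uses a variable that is resolved at run time to the directory holding the transformation. The following Python sketch only illustrates that kind of substitution; it is not how PDI implements it. The directory value is a hypothetical example, and the .txt suffix reflects the file name that appears later in this lab.

```python
import re


def expand_variables(template, variables):
    """Replace each ${name} with its value; unknown names are left untouched."""
    def replace(match):
        return variables.get(match.group(1), match.group(0))
    return re.sub(r"\$\{([^}]+)\}", replace, template)


filename = expand_variables(
    "${Internal.Transformation.Filename.Directory}/order_date_elapsed_days",
    # Hypothetical location of the saved transformation
    {"Internal.Transformation.Filename.Directory": "C:/pentahotraining/labs"},
)
output_file = filename + ".txt"  # the step adds the configured extension
print(output_file)
```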


Output to Text File (cont)


On the Fields tab in the Text File Output step, click the Get Fields button
You should now see the two fields ordernumber and DAYS_BETWEEN_ORDERS in the
list of fields to be output to the text file


Output to Text File (cont)


Execute the transformation
Using Notepad (or another text editor), navigate to the directory and open
the file order_date_elapsed_days.txt
This file can now be used in subsequent transformations to add the
days_between_orders to the order fact table


Optional Lab: Calculate Quantity Difference


Change the SQL for the source to the following:
SELECT o.ordernumber, o.orderdate, o.customernumber,
sum(od.quantityordered) as quantity
from orders o, orderdetails od
where o.ordernumber = od.ordernumber
group by o.ordernumber, o.orderdate, o.customernumber

Modify the calculation of the days between orders to include a new
calculation QUANT_DIFFERENCE (this quantity - previous quantity)
Add this to the text file
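To check the expected result of this optional lab outside of PDI, the aggregate query and the quantity difference can be sketched in Python with an in-memory SQLite database. This is an illustration under assumptions: the ORDER BY stands in for the Sort step, the table contents are made-up sample rows, and the function name quantity_differences is hypothetical.

```python
import sqlite3

# The lab's source SQL, plus an ORDER BY that stands in for the Sort step
SOURCE_SQL = """
SELECT o.ordernumber, o.orderdate, o.customernumber,
       SUM(od.quantityordered) AS quantity
FROM orders o, orderdetails od
WHERE o.ordernumber = od.ordernumber
GROUP BY o.ordernumber, o.orderdate, o.customernumber
ORDER BY o.customernumber, o.orderdate
"""


def quantity_differences(conn):
    """Return (ordernumber, QUANT_DIFFERENCE) pairs; None for a customer's first order."""
    previous_quantity = {}
    result = []
    for ordernumber, _orderdate, customer, quantity in conn.execute(SOURCE_SQL):
        prev = previous_quantity.get(customer)
        result.append((ordernumber, None if prev is None else quantity - prev))
        previous_quantity[customer] = quantity
    return result


conn = sqlite3.connect(":memory:")
# Made-up sample data, for illustration only
conn.executescript("""
CREATE TABLE orders (ordernumber INTEGER, orderdate TEXT, customernumber INTEGER);
CREATE TABLE orderdetails (ordernumber INTEGER, quantityordered INTEGER);
INSERT INTO orders VALUES (1, '2004-01-05', 10), (2, '2004-02-04', 10), (3, '2004-03-01', 20);
INSERT INTO orderdetails VALUES (1, 30), (1, 20), (2, 45), (3, 10);
""")
differences = quantity_differences(conn)
print(differences)
```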

Lab: Enterprise Repository



Objectives: Get comfortable with the Enterprise Repository: importing,
locking, moving and copying transformations, and the security concepts.
Tasks overview:
Start the DI Server
Define and use an Enterprise Repository
Create a public project folder with dev, test and prod subfolders
Import an existing transformation from the file system
Move a transformation
Lock the transformation
Create a new user
Log in with another user
Optional: Change the user group, copy a transformation, delete and
restore, use another version


Setup
Tasks
If the EE Data Integration Server is not started yet, please start
it with []\pdi-ee\start-servers.bat
Check if the Pentaho Enterprise Console is started:

Check if the Data Integration Server is started


Setup
Tasks
When Spoon is already started:
Please close all open jobs and transformations
Select from the menu: Tools / Repository / Connect
Otherwise start it with []\pdi-ee\launch-designer.bat
You will be prompted to connect to a repository
Add a new Enterprise Repository


Setup
Enter an ID (server reference) and name (local reference) for
your repository connection

Log on to the Enterprise Repository by entering the following credentials:
user name = joe, password = password.


Add a new folder structure


Explore the repository

You see your home directory joe


Add a new folder structure


Below public: add a new folder pdi2000
To do this, right-click on public and choose New Folder
Below pdi2000: add new folders dev, test and prod


Import a Transformation
Close the Repository Explorer
Select from the menu: File / Import from an XML file
Select the file 08_lab_fact_sales_basic.ktr


Import a Transformation
Save the transformation in the repository into folder
pdi2000 / dev


Move a Transformation
Start the repository explorer
Drag and drop the transformation from
public/pdi2000/dev to home/joe


Lock a Transformation
Lock the transformation
Add a Log Note


Create a new User


Change to the Security tab
Add a new user and assign him the Admin role


Log in with another user


Close the repository explorer
Close the fact_sales_basic transformation
Disconnect from the repository (Tools / Repository / Disconnect)
Connect with the new user test
Explore the repository
You can see all other users' home directories (including joe's home
directory) since you have Admin rights
Change the rights of the test user and remove the Admin rights
Disconnect, then connect with the test user again and check your new
permissions (you no longer see other users' home directories, but you
still see the public folders)


Optional
Joe has locked the transformation
Look at the Lock Notes
Log in as joe and move the transformation to a public folder
Can the test user with Admin rights open and save the locked
transformation?
Can the test user without Admin rights open and save the locked
transformation?
Use the Save as functionality to copy a transformation
Delete a transformation and restore it from the Trash
Select a different version to Open and Restore


Lab: Scheduling and Monitoring



Objectives: Get comfortable with the scheduling and monitoring features.
Tasks overview:
Basic Scheduling
Monitoring via Spoon
Monitoring via the Pentaho Enterprise Console


Scheduling
Tasks
Please make sure the EE Data Integration Server is started
Connect to your Enterprise Repository
Open the transformation from the previous lab (in home/joe or
the public folder)


Scheduling
Tasks
Run the transformation

Check if everything runs fine (without scheduling)


Scheduling
Tasks
Schedule the transformation: select from the menu Action / Schedule
Select Run Now and repeat every 10 minutes
[otherwise the transformation will no longer show up once it has
finished and no more runs are scheduled]


Scheduling
Tasks
Change to the Schedule perspective

Look at the details of your transformation

Change back to the Data Integration perspective


Monitoring within Spoon: Slave Server Monitoring


Tasks
Add a new Slave server (the Data Integration Server to monitor)
Enter the following details (password is password)


Monitoring within Spoon: Slave Server Monitoring


Tasks
Monitor the DI Server


Monitoring with the Enterprise Console


Tasks
Log in to the Pentaho Enterprise Console from your browser:
http://localhost:8088
Log in with username admin, password is password
Check Running Data Integration Server only, press OK
Click on Carte Configuration

Enter the following details (password is password)


Monitoring with the Enterprise Console


Tasks
Click on Register

Select Register from Repository and press Add (+)

Enter the following:


Monitoring with the Enterprise Console


Tasks
Select the new repository reference and click on Browse

Select the transformation and click on Register


Monitoring with the Enterprise Console


Tasks
Click on Monitoring Status

Double click on one of the transformations


Monitoring with the Enterprise Console


Tasks
After double-clicking on one of the transformations:
Have a look at the step metrics and Carte log


Monitoring with the Enterprise Console


Tasks
Define the logging table in the transformation

Save the transformation


Change to the Enterprise Console
Select the transformation and press Run


Monitoring with the Enterprise Console


Tasks
You see the status change to Running

Press Refresh and wait until the status returns to Waiting

To get meaningful results you may need to run the transformation again
Double-click again on the transformation


Monitoring with the Enterprise Console


Tasks
After double-clicking on one of the transformations:
Have a look at the Performance Trend


Monitoring with the Enterprise Console


Tasks (Optional)
Set the Minimum and Maximum Duration and press Apply, e.g.


Monitoring with the Enterprise Console


Tasks (Optional)
Change to the Status page and see the Alerts when they apply


Lab: Agile BI and PDI



Objectives: Get comfortable with the Agile BI functionality of PDI.
Tasks overview:
Modify the existing fact loading transformation to get more
fields in the output table
Get comfortable working with the Analyzer


Modify the existing fact loading transformation


Tasks
Please make sure the EE Data Integration Server is started
Connect to your Enterprise Repository
Open the transformation from the previous lab (in home/joe or
the public folder)


Modify the existing fact loading transformation


Tasks
Save this transformation as fact_sales_basic_AgileBI
Open the dim_product step

Modify the fields section and add these fields:


Modify the existing fact loading transformation


Tasks
Open the dim_customer step

Modify the fields section and add these fields:


Modify the existing fact loading transformation


Tasks
Open the step named "select the fact columns"

Modify the fields section and select these fields:


Modify the existing fact loading transformation


Tasks
Open the fact_sales step

Rename the step to fact_sales_agilebi


Change the table name to fact_sales_agilebi
Press SQL to create the target table


Modify the existing fact loading transformation


Tasks
Execute the transformation to fill the new table

After the transformation is finished, click on the Analyzer at the
fact_sales_agilebi table output step:


Work with the Analyzer


Tasks
The Analyzer started up and you can drag and drop the following
fields on the canvas (see the result on the next page):
Productvendor, QUANTITYORDERED


Work with the Analyzer


Tasks
Also drag and drop the following field onto the columns (not the
rows): Status


Work with the Analyzer


Tasks
The result looks like this:


Work with the Analyzer


Tasks
Try removing Status again by dragging it to the trash can (at the
bottom right of the canvas)


Work with the Analyzer


Tasks
Get familiar with the other functions on your own:

Undo
Redo
Show report options (e.g. Show Grand Totals for Rows)
Toggle Filters (e.g. add a filter for Customercountry)
Toggle Fields
Toggle Layout
[]


Lab: Constraint and Index Management



Objectives: Understand how to configure and execute pre- and
post-processing tasks and how they can improve performance
Tasks overview:
Test Transformation with Indexes enabled during INSERTs
Add Drop Indexes Entry
Add Create Indexes Entry
Test Transformations with Indexes dropped during INSERTs


Test Transformation with Indexes


Open the Job job.kjb and the Transformation transformation.ktr
Notice the transformation does NOTHING except truncate/insert
records on the prepost_table
Run the JOB and write down how long it takes to run (approx 3-4m)
Starting time : << time here >> Spoon - Starting job...
Ending time : << time here >> Spoon - Job has ended.


Add Drop Indexes Entry


Add a SQL Script Entry to the canvas
Name: drop
Connection: pentaho_olap
SQL:

<< copy and paste from note in job>>


Add Create Indexes Entry


Add a SQL Script entry to the canvas
Name: create
Connection: pentaho_olap
SQL:

<< copy and paste from note in job>>


Test Transformation without Indexes


Run the JOB and write down how long it takes to run (approx 2m)
Starting time : <<time here >> Spoon - Starting job...
Ending time : << time here >> Spoon - Job has ended.
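The pattern behind this lab is: drop the indexes before the bulk insert, load the rows, then recreate the indexes once. The Python sketch below reproduces that pattern against an in-memory SQLite database so the two variants can be timed side by side. It is an illustration only: the real drop and create statements come from the note in the job and are not shown here, so the index name idx_prepost_value, the column names, and the row count are all assumptions, and the timings will not match the MySQL numbers in the lab.

```python
import sqlite3
import time

ROW_COUNT = 20000  # small, hypothetical volume so the sketch runs quickly


def run_load(conn, drop_indexes_first):
    """Empty and reload prepost_table; return the elapsed seconds."""
    started = time.perf_counter()
    if drop_indexes_first:
        # Pre-processing entry: remove the index before the inserts
        conn.execute("DROP INDEX IF EXISTS idx_prepost_value")
    conn.execute("DELETE FROM prepost_table")  # stands in for the truncate
    conn.executemany(
        "INSERT INTO prepost_table (id, value) VALUES (?, ?)",
        ((i, (i * 7919) % ROW_COUNT) for i in range(ROW_COUNT)),
    )
    if drop_indexes_first:
        # Post-processing entry: rebuild the index once, after all rows are in
        conn.execute("CREATE INDEX idx_prepost_value ON prepost_table (value)")
    conn.commit()
    return time.perf_counter() - started


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prepost_table (id INTEGER, value INTEGER)")
conn.execute("CREATE INDEX idx_prepost_value ON prepost_table (value)")

with_indexes = run_load(conn, drop_indexes_first=False)
without_indexes = run_load(conn, drop_indexes_first=True)
print(f"indexes kept: {with_indexes:.3f}s, dropped and recreated: {without_indexes:.3f}s")
```

With a volume this small the difference may be negligible or even reversed; the point is the order of the three operations, not the absolute numbers.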


Optional Lab
See if you can further improve the
performance by tweaking the
prepost_table step in the
transformation


Lab: Clustering and Partitioning


Lab: Basic clustering


Objectives: Understand how to set up and use a basic cluster
Tasks overview:
Set up 3 carte cluster nodes
Run a simple transformation in that cluster


Lab: Basic clustering


Tasks
Start three instances of carte.bat, passing your local IP address
(localhost is 127.0.0.1) and three different ports, or create a batch
file for this. For instance, enter this on the command prompt:
start carte.bat 127.0.0.1 8081
start carte.bat 127.0.0.1 8082
start carte.bat 127.0.0.1 8083
After starting the instances, you see the following (example of the
first cluster-node):


Lab: Basic clustering


Tasks
Create a new transformation and save it
Configure one master and two slave servers as cluster-nodes:
Click on New in the context menu of Slave Server

Create one master server
(Password is cluster)


Lab: Basic clustering


Tasks
Create two slave servers


Lab: Basic clustering


Tasks
Configure the cluster and select cluster-nodes:
Click on New in the context menu of Kettle cluster schemas
Enter the details and select the slave servers


Lab: Basic clustering


Tasks
Draw the following steps on the canvas:

Leave the defaults for Generate rows and Add sequence


Change the Table output step to the following

Press SQL to generate the target table


Lab: Basic clustering


Tasks
Execute the transformation without clustering
Check the result in the target table (valuename is in the range from 1
to 10)
Change the Add sequence step to run clustered
Right click on Add sequence and select Clustering
Select ClusterSchema


Lab: Basic clustering


Tasks
Execute the transformation in the cluster
(Optionally: Check only Show transformations to see what is running
on the nodes)

You will not see any new results on the Step metrics tab
Check the Logging tab and you will see this at the bottom


Lab: Basic clustering


Tasks
You could check the results:
Look at the console output of the carte servers
Click on Monitor for one of the slave servers and select the
transformation, for instance:

Open a browser and enter the URL: http://localhost:8081 (for the
master) or http://localhost:8082 or http://localhost:8083 (for the
slaves)
Log in with cluster / cluster
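The same check can be scripted instead of done in a browser. The Python sketch below only builds the authenticated requests for the three nodes; nothing is sent until one of them is passed to urllib.request.urlopen while the Carte instances are running. The status path /kettle/status/ with the xml=Y parameter is an assumption about the Carte version in use, and the mapping of slave1 and slave2 to ports 8082 and 8083 is assumed as well.

```python
import base64
import urllib.request

# Ports from this lab; which slave listens on which port is an assumption
NODES = {"master": 8081, "slave1": 8082, "slave2": 8083}


def carte_status_request(host, port, user, password):
    """Build a request for a Carte status page using HTTP basic authentication."""
    url = f"http://{host}:{port}/kettle/status/?xml=Y"  # assumed status path
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


requests_by_node = {
    name: carte_status_request("localhost", port, "cluster", "cluster")
    for name, port in NODES.items()
}
for name, request in requests_by_node.items():
    print(name, request.full_url)

# With the Carte instances running, a status page could be fetched like this:
# body = urllib.request.urlopen(requests_by_node["master"]).read()
```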


Lab: Basic clustering


Tasks
You will see how many rows are processed by each cluster-node
Also look at the target table now (you will see the sequence 1 to 5
twice)
Do you understand why this is different from the non-clustered
execution?
Preparation for the next lab:
Extend the lab to get advanced information about the clustering and
nodes.
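One way to reason about the clustered result is a small simulation. The Python sketch below assumes that the 10 generated rows are handed to the slaves in round-robin order and that each slave runs its own copy of Add sequence with its own counter. Both points are assumptions made for illustration, and the function name run_add_sequence is hypothetical.

```python
from itertools import cycle


def run_add_sequence(row_count, node_names):
    """Distribute rows round-robin; every node numbers its own rows from 1."""
    counters = {name: 0 for name in node_names}
    target_table = []
    for _, node in zip(range(row_count), cycle(node_names)):
        counters[node] += 1  # each copy of Add sequence counts independently
        target_table.append((node, counters[node]))
    return target_table


non_clustered = run_add_sequence(10, ["local"])
clustered = run_add_sequence(10, ["slave1", "slave2"])
print([value for _, value in non_clustered])
print(sorted(value for _, value in clustered))
```

Under these assumptions the single local copy counts from 1 to 10, while each of the two slave copies only sees five rows and counts from 1 to 5.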


Lab: Basic partitioning


Objectives: Understand how to use basic partitioning
Tasks overview:
Run a transformation in a cluster and partition the data that goes into
each slave server by a partitioning scheme.


Lab: Basic partitioning


Tasks
Change the transformation and include the Table output step into the
cluster

Add a Partition schema and enter the following:


Lab: Basic partitioning


Tasks
Change the steps Get Variables and Table output to use the
Partition (select Partitioning in the context menu) and select:


Lab: Basic partitioning


Tasks
Execute it in the cluster
When you look at the result in the target table you see:
Values 1, 3 and 5 of valuename always go to slave2
Values 2 and 4 of valuename always go to slave1
This differs from the earlier distribution without partitioning
Optional tasks
Create a partitioned database connection and use this in the table
output step
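The routing described on this slide is consistent with partitioning on the remainder of valuename divided by the number of partitions. The partition schema settings are only shown in a screenshot that is not reproduced here, so treat the rule and the partition-to-slave order in this Python sketch as assumptions.

```python
# Assumed order: partition 0 runs on slave1, partition 1 runs on slave2
SLAVES = ["slave1", "slave2"]


def slave_for(valuename, slaves=SLAVES):
    """Pick the slave by the remainder of valuename divided by the partition count."""
    return slaves[valuename % len(slaves)]


routing = {}
for valuename in range(1, 6):
    routing.setdefault(slave_for(valuename), []).append(valuename)
print(routing)
```

Because the target depends only on the value, the same valuename always lands on the same slave, regardless of which node produced the row.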


Lab: Basic partitioning


Optional Tasks
Change the Partitioning schema
Deselect Dynamically create the schema
Press Import Partitions


Lab: Basic partitioning


Optional Tasks
In the Table Output step
Select the partitioned transformation
Change the target table name(!)
Press SQL (the tables will be created in both databases)
Observe that Dynamic (Dx2) has changed to Partitioned (Px2) on the
canvas


Lab: Basic partitioning


Optional Tasks
Execute the transformation and check one of the tables (in one of the
partitioned databases, pentaho_olap or pentaho_oltp):

As expected, each partitioned target table received only certain values,
all written by one of the slaves.
