Academic Documents
Professional Documents
Culture Documents
Eventually
Scroll down
Examine the different behavior of the Close and Stop buttons and
notice the Resume button in the Log view.
Preview the transformation again when the Resume button is active.
Notice that the transformation produced five times as much data, so the Dummy
step has to handle a higher load.
Set each Generate rows step to produce 1000 rows or more (no
fields are needed).
Set each Add sequence step to use a different counter name.
Select the second field and click Edit origin step to correct the error
by changing the field type of the ThisIsANumber field to Number.
The listed driver class and URL might be useful for other Pentaho
products.
Enter a directory and filename within the Text File Output step
Now run the transformation so that the target dimension table
dim_customer is loaded, then verify its contents.
Note: If you attend the Building Analytic Solutions Using Pentaho course,
you will need this table later on.
Try to find version 2 of a specific row and compare the
contents:
We found two versions because the Text File Input step trimmed the spaces
in the postalcode field.
Depending on the date you used, you should get different versions of the
dimension row for customer 177 (do a preview on the step), in this
case with or without a leading space (also try leaving the date field empty).
Try to add new rows or change existing ones from your source text file.
Note: A junk dimension is often used to combine simple dimension
attributes, where each unique combination is the distinct key.
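The note above can be sketched in a few lines: each distinct combination of the simple attributes receives one surrogate key. This is only an illustrative sketch; the attribute values below are invented, not from the lab data.

```python
# Sketch of a junk dimension: every unique combination of the simple
# attributes gets exactly one surrogate key (the distinct key).
# The attribute values ("web", "paid", ...) are invented for illustration.
source_rows = [
    ("web", "paid"), ("phone", "free"), ("web", "paid"), ("web", "free"),
]

junk_dim = {}
for combo in source_rows:
    if combo not in junk_dim:
        junk_dim[combo] = len(junk_dim) + 1  # next surrogate key

print(junk_dim)
```

The fact table would then store only the small surrogate key instead of the individual attribute columns.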
Enter a space
here.
Transaction Data
Create a new Transformation fact_sales_basic.ktr
Add a Table Input step to the canvas
Name: orderdetails
Connection: pentaho_oltp
SQL:
SELECT
ordernumber
,productcode
,quantityordered
,priceeach
,orderlinenumber
FROM orderdetails
Name: orders
Connection: pentaho_oltp
Lookup Table: orders
Enable Cache: Ticked
Cache Size: 1000
Keys
ordernumber | = | ordernumber
Values
orderdate | | | Date
requireddate | | | Date
shippeddate | | | Date
status | | | String
customernumber | | | Integer
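The Database Lookup configured above behaves roughly like a left join on the ordernumber key, adding the listed value columns to each orderdetails row. A minimal sqlite3 sketch of that lookup, using the table and column names from the lab but invented sample rows:

```python
import sqlite3

# In-memory stand-in for the pentaho_oltp connection (sample data invented).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orderdetails (ordernumber INT, productcode TEXT, "
            "quantityordered INT, priceeach REAL, orderlinenumber INT)")
con.execute("CREATE TABLE orders (ordernumber INT, orderdate TEXT, requireddate TEXT, "
            "shippeddate TEXT, status TEXT, customernumber INT)")
con.execute("INSERT INTO orderdetails VALUES (10100, 'S18_1749', 30, 136.00, 3)")
con.execute("INSERT INTO orders VALUES (10100, '2003-01-06', '2003-01-13', "
            "'2003-01-10', 'Shipped', 363)")

# The lookup matches on ordernumber = ordernumber and appends the value fields.
row = con.execute("""
    SELECT d.ordernumber, d.productcode, o.orderdate, o.status, o.customernumber
    FROM orderdetails d
    LEFT JOIN orders o ON o.ordernumber = d.ordernumber
""").fetchone()
print(row)
```

In PDI the cache settings (Enable Cache, size 1000) avoid re-querying the database for keys already seen.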
Name: dim_time
Connection: pentaho_olap
SQL:
SELECT
timeid
, timedate
FROM dim_time
Name: dim_product
Update the dimension: Unticked
Connection: pentaho_olap
Target Table: dim_product
Cache Size in Rows: 5000
Keys
productcode | productcode
Fields
productvendor | productvendor
Technical Key Field: productid
Checkpoint
Your overall transformation should look like this. Run it in Preview and
ensure you are getting sensible data
Calculations
Add a Calculator Step to the
canvas and connect
Name: totalprice
Fields
New Field: totalprice
Calculation: A * B
Field A: quantityordered
Field B: priceeach
Value Type: Number
Remove: N
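The Calculator step above adds one new field per row using the A * B calculation. Stripped of the GUI, the logic is simply the following sketch (field names from the lab, sample rows invented):

```python
# Sketch of the Calculator step: new field totalprice = A * B,
# with Field A = quantityordered and Field B = priceeach.
# The two sample rows are invented for illustration.
rows = [
    {"quantityordered": 30, "priceeach": 136.00},
    {"quantityordered": 50, "priceeach": 55.09},
]

for r in rows:
    # Value type Number; Remove: N keeps the input fields in the stream.
    r["totalprice"] = r["quantityordered"] * r["priceeach"]

print(rows[0]["totalprice"])
```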
Name: fact_sales
Connection: pentaho_olap
Target Table: fact_sales
Truncate Table: Ticked
Optional Lab
Load only 2001 orders by adding a Filter step between the orders lookup and
the first time lookup
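The Filter step for this optional lab keeps only rows whose orderdate falls in 2001. A sketch of that condition in plain Python (the orderdate field comes from the orders lookup; the sample rows are invented):

```python
from datetime import date

# Rows as they arrive from the orders lookup (invented sample data).
rows = [
    {"ordernumber": 10100, "orderdate": date(2001, 1, 6)},
    {"ordernumber": 10101, "orderdate": date(2002, 1, 9)},
]

# Filter condition: keep only 2001 orders.
orders_2001 = [r for r in rows if r["orderdate"].year == 2001]
print([r["ordernumber"] for r in orders_2001])
```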
05_lab_dim_salesrep.ktr
05_lab_load_budget.ktr
05_lab_manufacturer_report.ktr
06_demo_dim_time.ktr
07_demo_dim_customer_advanced.ktr
stage2.kjb
stage3.kjb
Let the job run; you will see the failure and the Send Mail execution:
Let's run it and check the numbers: you will see that none are valid, but
only two of them should be invalid, so the result is wrong.
If you want, add your own rows to the text file and check them.
Can you think of what the problem is?
Note for versions 3.0.1/3.0.2: you need to change the metadata (format
0.00) with a Select values step to show 2 decimal places in the
preview (internally the data is stored correctly); see also PDI-812.
You don't have to enter fields if you want to use all of them.
After execution, look at the resulting XML file; you should have
one additional element with the product description, like this:
Now select from the menu Edit / Set Environment Variables and enter
pentaho_oltp for the variable pentaho_db:
Enter the following for the Generate Rows step (note that a different
database name, pentaho_olap, is now used):
Conclusion:
Be careful when testing your environment variables in the design
tool; the behavior can differ at run time.
Variables are usually set at the time a transformation is
opened.
Setup
Open the following three documents
logging_trans1.ktr
logging_trans2.ktr
logging_job.ktr
Run each of them to ensure they work with no issues before starting the
lab
You will see that the line numbers are taken from the last transformation
You can copy/paste the Log field into a text editor to see the whole log
Optional labs
You can look at the logs for your job / transformation and compare the
results
You can test on your own with the logging of
Steps
Performance
Logging channels
Our aim is to update the changed prices (msrp field) for the duplicate
rows.
Now use the Step Error Handling logic and add an Update step:
When you execute it, the log shows 2 updated lines and 4 rejected
(it should also be 2; this is a small bug, PDI-422).
If you have extra time, try to figure out how to check for a more
exact part of the error description, such as "Duplicate entry".
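The pattern in this lab (try to insert, and route duplicate-key failures to an Update step instead of failing the transformation) can be sketched with sqlite3. The table and field names (products, productcode, msrp) follow the lab; the rows are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (productcode TEXT PRIMARY KEY, msrp REAL)")
con.execute("INSERT INTO products VALUES ('S18_1749', 170.00)")

# Incoming rows: one duplicate key with a changed price, one new product.
incoming = [("S18_1749", 180.00), ("S24_2000", 99.00)]

inserted = updated = 0
for code, msrp in incoming:
    try:
        con.execute("INSERT INTO products VALUES (?, ?)", (code, msrp))
        inserted += 1
    except sqlite3.IntegrityError:
        # Step error handling: the rejected duplicate-key row is
        # redirected to an Update step rather than aborting the run.
        con.execute("UPDATE products SET msrp = ? WHERE productcode = ?",
                    (msrp, code))
        updated += 1

print(inserted, updated)
```

A refinement, as the optional task suggests, would be to inspect the error text and only redirect rows whose message contains "Duplicate entry".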
Sort
Add a Sort Step to the canvas
and connect it to source data
Sort by
CUSTOMERNUMBER
ORDERDATE
ordernumber
DAYS_BETWEEN_ORDERS
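Once the rows are sorted by customer and order date, a DAYS_BETWEEN_ORDERS field can be derived by comparing each row with the previous row of the same customer. A sketch of that logic with invented sample rows:

```python
from datetime import date

# Rows as the Sort step delivers them: sorted by CUSTOMERNUMBER, ORDERDATE.
# The sample data is invented for illustration.
rows = [
    {"CUSTOMERNUMBER": 103, "ORDERDATE": date(2003, 1, 6)},
    {"CUSTOMERNUMBER": 103, "ORDERDATE": date(2003, 5, 20)},
    {"CUSTOMERNUMBER": 112, "ORDERDATE": date(2003, 3, 3)},
]

prev = None
for r in rows:
    if prev and prev["CUSTOMERNUMBER"] == r["CUSTOMERNUMBER"]:
        r["DAYS_BETWEEN_ORDERS"] = (r["ORDERDATE"] - prev["ORDERDATE"]).days
    else:
        r["DAYS_BETWEEN_ORDERS"] = None  # first order for this customer
    prev = r

print([r["DAYS_BETWEEN_ORDERS"] for r in rows])
```

The sort is essential: comparing with the previous row only works when all orders of one customer are adjacent and in date order.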
Setup
Tasks
If the EE Data Integration Server is not started yet, please start
it with []\pdi-ee\start-servers.bat
Check if the Pentaho Enterprise Console is started:
Setup
Tasks
When Spoon is already started:
Please close all open jobs and transformations
Select from the menu: Tools / Repository / Connect
Otherwise start it with []\pdi-ee\launch-designer.bat
You will be prompted to connect to a repository
Add a new Enterprise Repository
Setup
Enter an ID (server reference) and name (local reference) for
your repository connection
Import a Transformation
Close the Repository Explorer
Select from the menu: File / Import from an XML file
Select the file 08_lab_fact_sales_basic.ktr
Import a Transformation
Save the transformation in the repository into folder
pdi2000 / dev
Move a Transformation
Start the repository explorer
Drag and drop the transformation from
public/pdi2000/dev to home/joe
Lock a Transformation
Lock the transformation
Add a Log Note
Optional
Joe has locked the transformation
Look at the Lock Notes
Login with Joe and move the transformation to a public folder
Can the test user with Admin rights open and save the locked
transformation?
Can the test user without Admin rights open and save the locked
transformation?
Scheduling
Tasks
Please make sure the EE Data Integration Server is started
Connect to your Enterprise Repository
Open the transformation from the previous lab (in home/joe or
the public folder)
Scheduling
Tasks
Run the transformation
Scheduling
Tasks
Schedule the transformation and select from the menu:
Action / Schedule
Select Run Now and repeat every 10 minutes
[otherwise the transformation will not show up once it has
finished and no more schedules are planned]
Scheduling
Tasks
Change to the Schedule perspective
Undo
Redo
Show report options (e.g. Show Grand Totals for Rows)
Toggle Filters (e.g. add a filter for Customercountry)
Toggle Fields
Toggle Layout
Optional Lab
See if you can further improve the
performance by tweaking the
prepost_table step in the
transformation
You will not see any new results on the Step metrics tab
Check the Logging tab and you will see this at the bottom
As expected, only certain values entered one of the target partitioned
database tables from one of the slaves.