
Pipeline Partitions in Bulk Data Movement Sessions

© 2009 Informatica
Abstract
You can increase the number of pipeline partitions in a bulk data movement session to improve performance. Increasing the
number of partitions allows the PowerCenter Integration Service to process partitions of data concurrently.

Supported Versions
- PowerExchange 9.0

Table of Contents
Introduction
Partitioning Schemes
Setting Up Pass-Through Partitioning Without SQL Overrides for Offloaded Data Sources
Setting Up Pass-Through Partitioning with SQL Overrides
Reading Data into the First Partition Only
Setting Up Key-Range Partitioning

Introduction
Using pipeline partitions allows the Integration Service to process partitions of data concurrently. The degree of concurrent
processing depends on the data source and the partitioning scheme that you use. For offloaded DB2 unload, VSAM, and
sequential data sets, PowerExchange opens a single connection to the data source and uses multithreaded processing to
read and process the source data in multiple partitions. For other data sources, PowerExchange either opens multiple
connections to the data source or reads the source data into a single partition. If PowerExchange reads the data into a
single partition, you can redistribute the data at a repartition point to use partitioning in subsequent pipeline stages.

Because the partitions process data at different rates, the Integration Service might process rows out of sequence.

For more information about pipeline partitioning, see the PowerCenter Advanced Workflow Guide.

Partitioning Schemes
The following table summarizes the partitioning schemes that you can use for different types of bulk data sources:

| Partitioning Scheme | Supported Data Sources | Description |
| --- | --- | --- |
| Key range | All relational bulk data sources | Rows from the data source are partitioned based on key range values. This is the recommended partitioning scheme for relational data sources. |
| Pass-through partitioning without overrides | Offloaded DB2 unload, VSAM, and sequential data sets | PowerExchange reads the source data once and automatically distributes the rows among the partitions. PowerExchange ignores the Worker Threads connection attribute when pass-through partitioning without overrides is used with these data sources. This is the recommended partitioning scheme for these data sources. |
| Pass-through partitioning without overrides | All other nonrelational bulk data sources | Data is read into the first partition only. You can specify round-robin partitioning at a subsequent partition point to redistribute the data. |
| Pass-through partitioning with overrides | All | Each partition reads the rows that its own SQL override selects. The partitions run independently of each other and treat each query as an independent PowerExchange request. |
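The partitioning schemes table can be read as a lookup from source type to scheme. The following Python sketch encodes it that way; the `SourceKind` categories and the `recommended_scheme` function are hypothetical illustrations, not part of any Informatica API. The entry for other nonrelational sources reflects the first-partition-only behavior that the table describes rather than an explicit recommendation, and pass-through partitioning with overrides is omitted because it applies to all sources.

```python
from enum import Enum


class SourceKind(Enum):
    """Hypothetical source categories that mirror the rows of the table."""
    RELATIONAL = "relational"
    # DB2 unload, VSAM, and sequential data sets with offload processing
    OFFLOADED_NONRELATIONAL = "offloaded nonrelational"
    OTHER_NONRELATIONAL = "other nonrelational"


# Partitioning behavior per source kind, paraphrased from the table.
SCHEME_FOR = {
    SourceKind.RELATIONAL: "key range",
    SourceKind.OFFLOADED_NONRELATIONAL: "pass-through without overrides",
    # Rows land in the first partition only; add a round-robin
    # partition point downstream to spread them across partitions.
    SourceKind.OTHER_NONRELATIONAL: "pass-through without overrides, then round-robin",
}


def recommended_scheme(kind: SourceKind) -> str:
    """Return the partitioning scheme the table suggests for a source kind."""
    return SCHEME_FOR[kind]


print(recommended_scheme(SourceKind.RELATIONAL))  # key range
```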

Setting Up Pass-Through Partitioning Without SQL Overrides for Offloaded Data Sources
Use this procedure to set up pass-through partitioning without SQL overrides for offloaded DB2 unload, VSAM, and
sequential data sets. For this partitioning scheme, PowerExchange reads the source data once and automatically distributes
the rows across the partitions.

To set up pass-through partitioning without SQL overrides for offloaded data sources:

1. In the Workflow Manager, connect to a PowerCenter repository.
2. To configure a PowerExchange application connection, click Connections > Application.
The Application Connection Browser dialog box appears.
3. In the Select Type field, select PWX NRDB Batch, select an object, and click Edit.
4. In the Connection Object Definition dialog box, select one of the following values in the Offload Processing list:
   - Filter Before. Offload column-level processing to the Integration Service machine, but continue to filter data on
     the source system.
   - Filter After. Offload column-level processing and data filtering to the Integration Service machine.
5. Click OK.
6. In the Task Developer, double-click the session to open the Edit Tasks dialog box.
7. On the Properties tab, select one of the following values for the PWX Partition Strategy attribute:
   - Single Connection. PowerExchange creates a single connection to the data source. Any overrides specified
     for the first partition are used for all partitions. With this option, if you specify any overrides for other partitions
     that are different from the overrides for the first partition, the session fails with an error message.
   - Overrides Driven. With this option, if the specified overrides are the same for all partitions, PowerExchange
     creates a single connection to the data source. If the overrides are not identical for all partitions,
     PowerExchange creates multiple connections.
8. On the Mapping tab, click the Partitions view.
9. Select the Source Qualifier transformation, and click Edit Partition Point.
The Edit Partition Point dialog box appears.
10. Click Add for each partition that you want to add.
11. Verify that the Partition Type is Pass Through.
12. Click OK.
13. On the Mapping tab, click the Transformations view.
The Properties area displays the SQL Query attribute for each partition.
14. Verify that the SQL Query attribute is empty or is identical for each partition.
If the queries are not identical, perform the following steps for each partition, in the order shown:
   - Click the browse button in the SQL Query attribute.
   - Clear the query in the SQL Editor dialog box, or enter the same query for each partition.
   - Click OK.
15. Click OK.
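Steps 7 and 14 interact: the PWX Partition Strategy value and the per-partition SQL Query overrides together decide how many source connections PowerExchange opens. The following Python sketch models those rules as the option descriptions state them. The `connection_count` function is hypothetical, and treating an empty string as "no override specified" is an assumption. The option text says only that Overrides Driven creates "multiple connections" when overrides differ; one connection per partition is an assumption of this sketch.

```python
from typing import Sequence


def connection_count(strategy: str, overrides: Sequence[str]) -> int:
    """Model the PWX Partition Strategy session attribute.

    strategy:  "Single Connection" or "Overrides Driven".
    overrides: the SQL Query override for each partition, in partition
               order; an empty string means no override was specified.
    Returns the number of source connections PowerExchange opens.
    """
    if not overrides:
        raise ValueError("at least one partition is required")
    first = overrides[0]
    if strategy == "Single Connection":
        # The first partition's overrides apply to all partitions. A different
        # override specified on any other partition fails the session.
        conflicts = [i for i, o in enumerate(overrides[1:], start=1) if o and o != first]
        if conflicts:
            raise RuntimeError(f"session fails: partitions {conflicts} differ from partition 0")
        return 1
    if strategy == "Overrides Driven":
        # Identical overrides share one connection. Otherwise PowerExchange
        # opens multiple connections (ASSUMPTION: one per partition).
        if all(o == first for o in overrides):
            return 1
        return len(overrides)
    raise ValueError(f"unknown strategy: {strategy!r}")


print(connection_count("Overrides Driven", ["", "", ""]))  # 1
```

Clearing the SQL Query attribute on every partition, as step 14 describes, keeps the model on the single-connection path for either strategy.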

Setting Up Pass-Through Partitioning with SQL Overrides
Use this procedure to set up pass-through partitioning with SQL overrides for nonrelational data sources that are not
offloaded. The partitions run independently of each other, treating each query as an independent PowerExchange request.

To set up pass-through partitioning with SQL overrides:

1. In the Task Developer, double-click the session to open the session properties.
2. On the Mapping tab, click the Partitions view.
3. Select the Source Qualifier transformation, and click Edit Partition Point.
The Edit Partition Point dialog box appears.
4. Click Add for each partition that you want to add.
5. Verify that the Partition Type is Pass Through.
6. Click OK.
7. On the Mapping tab, click the Transformations view.
The Properties area displays the SQL Query attribute for each partition.
8. For each partition, click the browse button in the SQL Query field. Then enter the query in the SQL Editor dialog
box, and click OK.
Tip: If you entered a query in the Designer when you configured the Source Qualifier transformation, the query
appears in the SQL Query field for each partition. To override the query, edit it in the SQL Editor dialog box, and
click OK.
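Because each partition treats its query as an independent PowerExchange request, the overrides you enter in step 8 decide which rows each partition reads, and overlapping queries would read the same rows more than once. The following Python sketch generates non-overlapping overrides from split points on a numeric field. The `partition_queries` helper and the `CUSTOMER_MAP` and `CUST_ID` names are hypothetical, and the sketch assumes that your data map supports a WHERE clause of this form.

```python
def partition_queries(base_select: str, column: str, bounds: list[int]) -> list[str]:
    """Build one SQL override per partition from strictly increasing split points.

    bounds [b0, b1, ..., bn] yields n queries; query i selects rows with
    b_i <= column < b_(i+1), so no two partitions read the same row.
    """
    if len(bounds) < 2 or any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("bounds must contain at least two strictly increasing values")
    return [
        f"{base_select} WHERE {column} >= {lo} AND {column} < {hi}"
        for lo, hi in zip(bounds, bounds[1:])
    ]


# Hypothetical data map and field names, for illustration only.
for query in partition_queries("SELECT * FROM CUSTOMER_MAP", "CUST_ID", [0, 1000, 2000]):
    print(query)
```

Rows whose key falls below the first bound or at or above the last bound are read by no partition, so choose bounds that cover the full key space.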

Reading Data into the First Partition Only
Use this procedure to read data into the first partition only for nonrelational data sources that are not offloaded. You must set
up pass-through partitioning without overrides. With this partitioning scheme, you can still use partitioning in subsequent
pipeline stages by redistributing the rows at a later partition point.

To read data into the first partition only:

1. In the Task Developer, double-click the session to open the session properties.
2. On the Mapping tab, click the Partitions view.
3. Select the Source Qualifier transformation, and click Edit Partition Point.
The Edit Partition Point dialog box appears.
4. Click Add for each partition that you want to add.
5. Verify that the Partition Type is Pass Through.
6. Click OK.
7. On the Mapping tab, click the Transformations view.
The Properties area displays the SQL Query attribute for each partition.
8. Verify that the SQL Query attribute is empty or is identical for each partition.
If the queries are not identical, perform the following steps for each partition, in the order shown:
   - Click the browse button in the SQL Query field.
   - Clear the query in the SQL Editor dialog box, or enter the same query for each partition.
   - Click OK.
9. On the Mapping tab, click the Partitions view.
10. If you want to add a partition point, click the Add Partition Point icon.
11. To redistribute the rows at a partition point that follows the Source Qualifier transformation, select the transformation
at that partition point and click Edit Partition Point. Then select Round Robin in the Partition Type list, and click OK.
12. Click OK.
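The effect of this procedure can be pictured as two stages: the source stage fills only the first partition, and the round-robin partition point then spreads the rows across all partitions. The following Python sketch models both stages with in-memory lists; `read_first_partition_only` and `round_robin` are hypothetical names. It deals out individual rows, whereas the Integration Service may distribute data in blocks, so treat it as an approximation of the distribution rather than of the exact row placement.

```python
from itertools import cycle


def read_first_partition_only(rows, num_partitions):
    """Source stage: every row lands in partition 0; other partitions stay empty."""
    partitions = [[] for _ in range(num_partitions)]
    partitions[0].extend(rows)
    return partitions


def round_robin(partitions):
    """Redistribute rows from all input partitions across the same number of partitions."""
    out = [[] for _ in partitions]
    targets = cycle(range(len(partitions)))
    for part in partitions:
        for row in part:
            out[next(targets)].append(row)
    return out


source = read_first_partition_only(range(7), 3)
print(source)               # [[0, 1, 2, 3, 4, 5, 6], [], []]
print(round_robin(source))  # [[0, 3, 6], [1, 4], [2, 5]]
```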

Setting Up Key-Range Partitioning
Use this procedure to set up key-range partitioning for relational data sources.

To set up key-range partitioning:

1. In the Task Developer, double-click the session to open the session properties.
2. On the Mapping tab, click the Partitions view.
3. Select the Source Qualifier transformation, and click Edit Partition Point.
The Edit Partition Point dialog box appears.
4. Click Add for each partition that you want to add.
5. Select Key Range in the Partition Type list.
6. In the Edit Partition Key dialog box, select one or more ports for the key, and click OK.
7. For each partition, enter values in the Start Range and End Range boxes.
8. Click OK.
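The Start Range and End Range values from step 7 decide which partition reads each row. The following Python sketch shows that assignment for a single numeric key port. The `assign_partition` function is hypothetical, and the range semantics are assumptions of the sketch: the start value is inclusive, the end value is exclusive, and a blank (None) value leaves that side of the range open. Verify the exact boundary behavior in the PowerCenter Advanced Workflow Guide before relying on it.

```python
from typing import Optional, Sequence, Tuple

# (start, end) for one partition; None means that side is open-ended.
Range = Tuple[Optional[int], Optional[int]]


def assign_partition(key: int, ranges: Sequence[Range]) -> Optional[int]:
    """Return the index of the partition whose range contains key, or None.

    ASSUMPTION: start is inclusive and end is exclusive.
    """
    for index, (start, end) in enumerate(ranges):
        if (start is None or key >= start) and (end is None or key < end):
            return index
    return None  # key falls outside every range, so no partition reads it


ranges = [(None, 1000), (1000, 5000), (5000, None)]
print([assign_partition(k, ranges) for k in (42, 1000, 4999, 5000, 9999)])
# [0, 1, 1, 2, 2]
```

Leaving the first start value and the last end value blank, as in this example, makes every key land in exactly one partition.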

Author
Jim Middleton
Principal Technical Writer
