
Running Workflows and Sessions on a Grid

By PenchalaRaju.Yanamala

This chapter includes the following topics:

Running Workflows and Sessions on a Grid Overview
Running Workflows on a Grid
Running Sessions on a Grid
Working with Partition Groups
Grid Connectivity and Recovery
Configuring a Workflow or Session to Run on a Grid

Working with Partition Groups

When you run a session on a grid, the Data Transformation Manager process
(DTM) forms groups of session threads called partition groups. A partition group
is a group of reader, writer, or transformation threads that run in a single DTM
process. A partition group might include one or more pipeline stages. A pipeline
stage is the section of a pipeline executed between any two partition points.
Some transformations are not partitionable across a grid. When a transformation
is not partitionable across a grid, the DTM creates a single partition group for the
transformation threads and runs those threads on a single node.

Forming Partition Groups Without Resource Requirements

If the session has more than one partition, the DTM forms partition groups based
on the partitioning configuration. For example, if a session is configured with
two partitions, the DTM creates partition groups for the threads in each
partition, and the Load Balancer distributes the groups to two nodes.

Rules and Guidelines

The Integration Service uses the following rules and guidelines to create partition
groups:

The Integration Service limits the number of partition groups to the number of
nodes in a grid.
When a transformation is partitionable locally, the DTM process forms one
partition group for the transformation threads, and runs that group in one DTM
process. The following transformations are partitioned locally:
- Custom transformation configured to partition locally
- External Procedure transformation
- Cached Lookup transformation
- Unsorted Joiner transformation
- SDK Reader or Writer transformation configured to partition locally
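
The grouping behavior described above can be pictured with a small model. The following Python sketch is illustrative only and is not PowerCenter code: the function name form_partition_groups and the round-robin placement are assumptions made for the example. It reflects only two statements from this section: partition groups follow the partitioning configuration, and the number of groups never exceeds the number of nodes in the grid.

```python
from collections import defaultdict


def form_partition_groups(num_partitions, nodes):
    """Return a mapping of node name to the partitions grouped on that node.

    Hypothetical model: the real DTM and Load Balancer logic is not shown in
    this chapter. The only rules taken from the text are that groups follow
    the partitioning configuration and that there are at most len(nodes) groups.
    """
    if num_partitions < 1 or not nodes:
        raise ValueError("need at least one partition and one node")
    groups = defaultdict(list)
    for partition in range(1, num_partitions + 1):
        # Round-robin placement is an assumption made for illustration.
        node = nodes[(partition - 1) % len(nodes)]
        groups[node].append(partition)
    return dict(groups)


# Two partitions on a two-node grid: one group per node.
two_by_two = form_partition_groups(2, ["node1", "node2"])

# Four partitions on a two-node grid: still only two groups.
four_by_two = form_partition_groups(4, ["node1", "node2"])
```

In this model, a session with four partitions on a two-node grid still produces two groups, because the group count is capped by the node count.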

Working with Caches

The Integration Service creates index and data caches for the Aggregator, Rank,
Joiner, Sorter, and Lookup transformations. When the session contains more
than one partition, the transformation threads may be distributed to more than
one node in the grid. To create a single data and index cache for these
transformation threads, verify that the root directory and cache directory point to
the same location for all nodes in the grid.

When the Integration Service creates a cache for a Lookup transformation in a
shared location, it builds a cache for the first partition group, and subsequent
partition groups use this cache. When you do not configure a shared location for
the Lookup transformation cache files, each service process on a separate node
fetches data from the database or source files to create a cache. If the source
data changes frequently, the caches created on separate nodes can be
inconsistent.
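
One way to check the requirement above is to compare the directory that each node would use for cache files. The following Python sketch assumes you have already collected the root directory and cache directory values for each node into a dictionary; the keys root_dir and cache_dir and both function names are hypothetical and are not PowerCenter identifiers. It compares path strings only, so it cannot confirm that the paths are mounted from the same shared storage.

```python
import posixpath


def cache_locations(node_settings):
    """Map each node to the normalized directory where it would write caches."""
    locations = {}
    for node, settings in node_settings.items():
        # If cache_dir is absolute, posixpath.join ignores root_dir.
        joined = posixpath.join(settings["root_dir"], settings["cache_dir"])
        locations[node] = posixpath.normpath(joined)
    return locations


def shares_one_cache(node_settings):
    """Return True when every node resolves to the same cache directory."""
    return len(set(cache_locations(node_settings).values())) == 1


# Hypothetical settings: both nodes resolve to /shared/infa/cache.
shared = {
    "node1": {"root_dir": "/shared/infa", "cache_dir": "cache"},
    "node2": {"root_dir": "/shared/infa/", "cache_dir": "./cache"},
}

# Hypothetical settings: node2 uses a local root directory.
local = {
    "node1": {"root_dir": "/shared/infa", "cache_dir": "cache"},
    "node2": {"root_dir": "/local/infa", "cache_dir": "cache"},
}

print(shares_one_cache(shared))  # True
print(shares_one_cache(local))   # False
```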

Grid Connectivity and Recovery

When you run a workflow or session on a grid, service processes and DTM
processes run on different nodes. Network failures can cause connectivity loss
between processes running on separate nodes. Services may shut down
unexpectedly, or you may disable the Integration Service or service processes
while a workflow or session is running. The Integration Service failover and
recovery behavior in these situations depends on the service process that is
disabled, shuts down, or loses connectivity. Recovery behavior also depends on
the following factors:

High availability option. When you have high availability, workflows fail over to
another node if the node or service shuts down. If you do not have high
availability, you can manually restart a workflow on another node to recover it.
Recovery strategy. You can configure a workflow to suspend on error. You
configure a recovery strategy for tasks within the workflow. When a workflow
suspends, the recovery behavior depends on the recovery strategy you
configure for each task in the workflow.
Shutdown mode. When you disable an Integration Service or service process,
you can specify that the service completes, aborts, or stops processes running
on the service. Behavior differs when you disable the Integration Service or you
disable a service process. Behavior also differs when you disable a master
service process or a worker service process. The Integration Service or service
process may also shut down unexpectedly. In this case, the failover and
recovery behavior depends on which service process shuts down and on the
configured recovery strategy.
Running mode. If the workflow runs on a grid, the Integration Service can
recover workflows and tasks on another node. If a session runs on a grid, you
cannot configure a resume recovery strategy.
Operating mode. If the Integration Service runs in safe mode, recovery is
disabled for sessions and workflows.
Note: You cannot configure an Integration Service to fail over in safe mode if it
runs on a grid.
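
The factors above can be summarized as a short decision sketch. The following Python function is illustrative only: the name recovery_notes and its parameters are hypothetical, and checking safe mode before high availability is an assumption made for the example. Each message restates a rule from the list above. The sketch does not model the shutdown mode or the recovery strategy configured for each task.

```python
def recovery_notes(high_availability, safe_mode, session_on_grid):
    """Summarize the recovery rules stated in this section for one setup."""
    notes = []
    if safe_mode:
        # Operating mode: safe mode disables recovery.
        notes.append("recovery is disabled for sessions and workflows")
    elif high_availability:
        # High availability option: automatic failover.
        notes.append("workflows fail over to another node")
    else:
        # No high availability: recovery is a manual step.
        notes.append("restart the workflow manually on another node")
    if session_on_grid:
        # Running mode: sessions on a grid cannot use resume.
        notes.append("a resume recovery strategy cannot be configured")
    return notes


for note in recovery_notes(high_availability=False, safe_mode=False,
                           session_on_grid=True):
    print(note)
```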

Configuring a Workflow or Session to Run on a Grid

Before you can run a session or workflow on a grid, the grid must be assigned to
multiple nodes, and the Integration Service must be configured to run on the grid.
You create the grid and assign the Integration Service in the PowerCenter
Administration Console. You may need to verify these settings with the domain
administrator.

To run a workflow or session on a grid, configure the following properties and
settings:

Workflow properties. On the General tab of the workflow properties, assign an
Integration Service to run the workflow. Verify that the Integration Service is
configured to run on a grid.
Session properties. To run a session on a grid, enable the session to run on a
grid in the Config Object tab of the session properties.
Resource requirements. You configure resource requirements on the General
tab of the Session, Command, and predefined Event-Wait tasks.
Related Topics:
Assigning Resources to Tasks

Rules and Guidelines

Use the following rules and guidelines when you configure a session or workflow
to run on a grid:

If you override a service process variable, ensure that the Integration Service
can access input files, caches, logs, storage and temporary directories, and
source and target file directories.
To ensure that a Session, Command, or predefined Event-Wait task runs on a
particular node, configure the Integration Service to check resources and
specify a resource requirement for the task.
To ensure that session threads for a mapping object run on a particular node,
configure the Integration Service to check resources and specify a resource
requirement for the object.
When you run a session that creates cache files, configure the root and cache
directory to use a shared location to ensure consistency between cache files.
Ensure the Integration Service builds the cache in a shared location when you
add a partition point at a Joiner transformation and the transformation is
configured for 1:n partitioning. The cache for the Detail pipeline must be shared.
Ensure the Integration Service builds the cache in a shared location when you
add a partition point at a Lookup transformation, and the partition type is not
hash auto-keys.
When you run a session that uses dynamic partitioning, and you want to
distribute session threads across all nodes in the grid, configure dynamic
partitioning for the session to use the “Based on number of nodes in the grid”
method.
You cannot run a debug session on a grid.
You cannot configure a resume recovery strategy for a session that you run on
a grid.
Configure the session to run on a grid when you work with sessions that take a
long time to run.
Configure the workflow to run on a grid when you have multiple concurrent
sessions.
You can run a persistent profile session on a grid, but you cannot run a
temporary profile session on a grid.
When you use a Sequence Generator transformation, increase the number of
cached values to reduce the communication required between the master and
worker DTM processes and the repository.
To allow the Log Viewer to order log events accurately when you run a
workflow or session on a grid, use time synchronization software to keep the
date and time synchronized on all nodes of the grid.
If the workflow uses an Email task in a Windows environment, configure the
same Microsoft Outlook profile on each node to ensure the Email task can run.
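
The restrictions in this list can be captured in a small pre-run check. The following Python sketch is illustrative only: the session is represented as a plain dictionary, and the keys debug, recovery_strategy, and profile are hypothetical names chosen for the example, not PowerCenter properties. It covers only the three rules above that state what you cannot run or configure on a grid.

```python
def grid_session_violations(session):
    """Return the grid restrictions from this section that a session breaks."""
    violations = []
    if session.get("debug"):
        violations.append("debug sessions cannot run on a grid")
    if session.get("recovery_strategy") == "resume":
        violations.append(
            "a resume recovery strategy cannot be configured on a grid")
    if session.get("profile") == "temporary":
        violations.append("temporary profile sessions cannot run on a grid")
    return violations


# Hypothetical session descriptions.
ok_session = {"debug": False, "recovery_strategy": "restart",
              "profile": "persistent"}
bad_session = {"debug": True, "recovery_strategy": "resume",
               "profile": "temporary"}

print(grid_session_violations(ok_session))        # []
print(len(grid_session_violations(bad_session)))  # 3
```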
