
INFORMATICA

PREPARED BY:

Ammar Hasan

CONTENTS
CHAPTER 1: TOOL KNOWLEDGE
1.1 Informatica PowerCenter
1.2 Product Overview
1.2.1 PowerCenter Domain
1.2.2 Administration Console
1.2.3 PowerCenter Repository
1.2.4 PowerCenter Client
1.2.5 Repository Service
1.2.6 INTEGRATION SERVICE
1.2.7 WEB SERVICES HUB
1.2.8 DATA ANALYZER
1.2.9 METADATA MANAGER

CHAPTER 2: REPOSITORY MANAGER
2.1 Adding a Repository to the Navigator
2.2 Configuring a Domain Connection
2.3 Connecting to a Repository
2.4 Viewing Object Dependencies
2.5 Validating Multiple Objects
2.6 Comparing Repository Objects
2.7 Truncating Workflow and Session Log Entries
2.8 Managing User Connections and Locks
2.9 Managing Users and Groups
2.10 Working with Folders

CHAPTER 3: DESIGNER

3.1 Source Analyzer


3.1.1 Working with Relational Sources
3.1.2 Working with Flat Files
3.2 Target Designer
3.3 Mappings
3.4 Transformations
3.4.1 Working with Ports
3.4.2 Using Default Values for Ports
3.4.3 User-Defined Default Values
3.5 Tracing Levels
3.6 Basic First Mapping
3.7 Expression Transformation
3.8 Filter Transformation
3.9 Router Transformation

3.10 Union Transformation
3.11 Sorter Transformation
3.12 Rank Transformation
3.13 Aggregator Transformation
3.14 Joiner Transformation
3.15 Source Qualifier
3.16 Lookup Transformation
3.16.1 Lookup Types
3.16.2 Lookup Transformation Components
3.16.3 Connected Lookup Transformation
3.16.4 Unconnected Lookup Transformation
3.16.5 Lookup Cache Types: Dynamic, Static, Persistent, Shared
3.17 Update Strategy
3.18 Dynamic Lookup Cache Use
3.19 Lookup Query
3.20 Lookup and Update Strategy Examples
Example to Insert and Update without a Primary Key
Example to Insert and Delete based on a condition
3.21 Stored Procedure Transformation
3.21.1 Connected Stored Procedure Transformation
3.21.2 Unconnected Stored Procedure Transformation
3.22 Sequence Generator Transformation
3.23 Mapplets: Mapplet Input and Mapplet Output Transformations
3.24 Normalizer Transformation
3.25 XML Sources Import and usage
3.26 Mapping Wizards
3.26.1 Getting Started
3.26.2 Slowly Changing Dimensions
3.27 Mapping Parameters and Variables
3.28 Parameter File
3.29 Indirect Flat File Loading

CHAPTER 4: WORKFLOW MANAGER

4.1 Informatica Architecture


4.1.1 Integration Service Process
4.1.2 Load Balancer
4.1.3 DTM Process
4.1.4 Processing Threads
4.1.5 Code Pages and Data Movement
4.1.6 Output Files and Caches
4.2 Working with Workflows
4.2.1 Assigning an Integration Service
4.2.2 Working with Links
4.2.3 Workflow Variables
4.2.4 Session Parameters

4.3 Working with Tasks


4.3.1 Session Task
4.3.2 Email Task
4.3.3 Command Task
4.3.4 Working with Event Tasks
4.3.5 Timer Task
4.3.6 Decision Task
4.3.7 Control Task
4.3.8 Assignment Task
4.4 Schedulers
4.5 Worklets
4.6 Partitioning
4.6.1 Partitioning Attributes
4.6.2 Partitioning Types
4.6.3 Some Points
4.7 Session Properties
4.8 Workflow Properties

Chapter 1

Informatica
PowerCenter

CHAPTER 1: TOOL KNOWLEDGE


1.1 INFORMATICA POWERCENTER
Informatica PowerCenter is a powerful ETL tool from Informatica Corporation.
Informatica Corporation products are:
Informatica PowerCenter
Informatica on demand
Informatica B2B Data Exchange
Informatica Data Quality
Informatica Data Explorer

Informatica PowerCenter is a single, unified enterprise data integration platform for accessing, discovering, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise at any speed.

Informatica PowerCenter Editions


Because every data integration project is different and involves many variables, such as data volumes, latency requirements, IT infrastructure, and methodologies, Informatica offers three PowerCenter Editions and a suite of PowerCenter Options to meet your project's and organization's specific needs.
Standard Edition
Real Time Edition
Advanced Edition

Informatica PowerCenter Standard Edition


PowerCenter Standard Edition is a single, unified enterprise data integration platform
for discovering, accessing, and integrating data from virtually any business system,
in any format, and delivering that data throughout the enterprise to improve
operational efficiency.
Key features include:
A high-performance data integration server
A global metadata infrastructure
Visual tools for development and centralized administration
Productivity tools to facilitate collaboration among architects, analysts, and
developers

Informatica PowerCenter Real Time Edition


Packaged for simplicity and flexibility, PowerCenter Real Time Edition extends PowerCenter Standard Edition with additional capabilities for integrating and provisioning transactional or operational data in real time. PowerCenter Real Time Edition provides the ideal platform for developing sophisticated data services and delivering timely information as a service, to support all business needs. It is the real-time data integration complement to service-oriented architectures and application integration approaches such as enterprise application integration (EAI), enterprise service buses (ESB), and business process management (BPM).
Key features include:
Change data capture for relational data sources
Integration with messaging systems
Built-in support for Web services
Dynamic partitioning with data smart parallelism
Process orchestration and human workflow capabilities

Informatica PowerCenter Advanced Edition


PowerCenter Advanced Edition addresses requirements for organizations that are
standardizing data integration at an enterprise level, across a number of projects and
departments. It combines all the capabilities of PowerCenter Standard Edition and
features additional capabilities that are ideal for data governance and Integration
Competency Centers.
Key features include:
Dynamic partitioning with data smart parallelism
Powerful metadata analysis capabilities
Web-based data profiling and reporting capabilities

Informatica PowerCenter Options


A range of options is available to extend PowerCenter's core data integration capabilities.

Data Cleanse and Match Option features powerful, integrated cleansing and
matching capabilities to correct and remove duplicate customer data.
Data Federation Option enables a combination of traditional physical and virtual
data integration in a single platform.
Data Masking Option protects sensitive, private information by masking it in
flight to produce realistic-looking data, reducing the risk of security and compliance
breaches.

Enterprise Grid Option enhances scalability and delivers optimal performance while reducing the administrative overhead of supporting grid computing environments.

High Availability Option minimizes service interruptions during hardware and/or software outages and reduces costs associated with data downtime.
Metadata Exchange Options coordinate technical and business metadata from data modeling tools, business intelligence tools, source and target database catalogs, and PowerCenter repositories.

Partitioning Option helps IT organizations maximize their technology investments by enabling hardware and software to jointly scale to handle large volumes of data and users.
Pushdown Optimization Option enables data transformation processing, where appropriate, to be pushed down into any relational database to make the best use of existing database assets.
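Conceptually, pushdown optimization translates transformation logic into SQL that the database executes itself. As a purely illustrative sketch (hypothetical table and column names), a mapping that filters rows and computes a derived column might be pushed down as a single statement:

    INSERT INTO TGT_EMP (EMPNO, TOTAL_SAL)
    SELECT EMPNO, SAL + COMM
    FROM EMP
    WHERE SAL > 2000;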

Team-Based Development Option facilitates collaboration among development, quality assurance, and production administration teams, and across geographically disparate teams.

Unstructured Data Option expands PowerCenter's data access capabilities to include unstructured data formats, providing virtually unlimited access to all enterprise data formats.

1.2 PRODUCT OVERVIEW


PowerCenter provides an environment that allows you to load data into a centralized
location, such as a data warehouse or operational data store (ODS). You can extract
data from multiple sources, transform the data according to business logic you build
in the client application, and load the transformed data into file and relational
targets.
PowerCenter also provides the ability to view and analyze business information and
browse and analyze metadata from disparate metadata repositories.
PowerCenter includes the following components:
PowerCenter domain
Administration Console
PowerCenter repository
PowerCenter Client
Repository Service
Integration Service
Web Services Hub
SAP BW Service
Data Analyzer
Metadata Manager
PowerCenter Repository Reports

1.2.1 POWERCENTER DOMAIN:


PowerCenter has a service-oriented architecture that provides the ability to scale
services and share resources across multiple machines. PowerCenter provides the
PowerCenter domain to support the administration of the PowerCenter services. A
domain is the primary unit for management and administration of services in
PowerCenter.
A domain contains the following components:
One or more nodes
Service Manager
Application services

One or more nodes: A node is the logical representation of a machine in a


domain. A domain may contain more than one node. The node that hosts the domain
is the master gateway for the domain. You can add other machines as nodes in the
domain and configure the nodes to run application services, such as the Integration
Service or Repository Service. All service requests from other nodes in the domain
go through the master gateway.

Service Manager: The Service Manager is built in to the domain to support the
domain and the application services. The Service Manager runs on each node in the
domain. The Service Manager starts and runs the application services on a machine.

The Service Manager performs the following functions:


Alerts: Provides notifications about domain and service events.
Authentication: Authenticates user requests.
Authorization: Authorizes user requests for services.
Domain configuration: Manages domain configuration metadata.
Node configuration: Manages node configuration metadata.
Licensing: Registers and verifies license information.
Logging: Provides accumulated log events from each service in the domain.

Application services: A group of services that represent PowerCenter server-based functionality.
Repository Service: Manages connections to the PowerCenter repository.
Integration Service: Runs sessions and workflows.
Web Services Hub: Exposes PowerCenter functionality to external clients
through web services.
SAP BW Service: Listens for RFC requests from SAP NetWeaver BW and
initiates workflows to extract from or load to SAP BW.

1.2.2 ADMINISTRATION CONSOLE


The Administration Console is a web application that we use to manage a PowerCenter domain. If we have a user login to the domain, we can access the Administration Console. Domain objects include services, nodes, and licenses.
Use the Administration Console to perform the following tasks in the domain:
Manage application services: Manage all application services in the domain, such
as the Integration Service and Repository Service.
Configure nodes: Configure node properties, such as the backup directory and
resources. We can also shut down and restart nodes.
Manage domain objects: Create and manage objects such as services, nodes,
licenses, and folders. Folders allow you to organize domain objects and to manage
security by setting permissions for domain objects.
View and edit domain object properties: You can view and edit properties for all
objects in the domain, including the domain object.
View log events: Use the Log Viewer to view domain, Integration Service, SAP BW
Service, Web Services Hub, and Repository Service log events.
Other domain management tasks include applying licenses, managing grids and
resources, and configuring security.

1.2.3 POWERCENTER REPOSITORY


The PowerCenter repository resides in a relational database. The repository database
tables contain the instructions required to extract, transform, and load data and
store administrative information such as user names, passwords, permissions, and
privileges. PowerCenter applications access the repository through the Repository
Service.
We administer the repository using the Repository Manager Client tool, the
PowerCenter Administration Console, and command line programs.

Global repository: The global repository is the hub of the repository domain. Use
the global repository to store common objects that multiple developers can use
through shortcuts. These objects may include operational or Application source
definitions, reusable transformations, mapplets, and mappings.
Local repositories: A local repository is any repository within the domain that is
not the global repository. Use local repositories for development. From a local
repository, you can create shortcuts to objects in shared folders in the global
repository. These objects include source definitions, common dimensions and
lookups, and enterprise standard transformations. You can also create copies of
objects in non-shared folders.
PowerCenter supports versioned repositories. A versioned repository can store
multiple versions of an object. PowerCenter version control allows you to efficiently
develop, test, and deploy metadata into production.

1.2.4 POWERCENTER CLIENT


The PowerCenter Client consists of the following applications that we use to manage
the repository, design mappings, mapplets, and create sessions to load the data:
Designer
Data Stencil
Repository Manager
Workflow Manager
Workflow Monitor

Designer:
Use the Designer to create mappings that contain transformation instructions for the
Integration Service.
The Designer has the following tools that you use to analyze sources, design target
schemas, and build source-to-target mappings:
Source Analyzer: Import or create source definitions.
Target Designer: Import or create target definitions.
Transformation Developer: Develop transformations to use in mappings.
You can also develop user-defined functions to use in expressions.
Mapplet Designer: Create sets of transformations to use in mappings.
Mapping Designer: Create mappings that the Integration Service uses to
extract, transform, and load data.

Data Stencil
Use the Data Stencil to create mapping templates that can be used to generate multiple mappings. Data Stencil uses the Microsoft Office Visio interface to create mapping templates. It is not usually used by a developer.

Repository Manager
Use the Repository Manager to administer repositories. You can navigate through
multiple folders and repositories, and complete the following tasks:

Manage users and groups: Create, edit, and delete repository users and
user groups. We can assign and revoke repository privileges and folder
permissions.

Perform folder functions: Create, edit, copy, and delete folders. Work
we perform in the Designer and Workflow Manager is stored in folders. If we
want to share metadata, you can configure a folder to be shared.

View metadata: Analyze sources, targets, mappings, and shortcut


dependencies, search by keyword, and view the properties of repository
objects.

We create repository objects using the Designer and Workflow Manager Client tools.
We can view the following objects in the Navigator window of the Repository
Manager:
Source definitions: Definitions of database objects (tables, views, synonyms) or
files that provide source data.
Target definitions: Definitions of database objects or files that contain the target
data.
Mappings: A set of source and target definitions along with transformations
containing business logic that you build into the transformation. These are the
instructions that the Integration Service uses to transform and move data.
Reusable transformations: Transformations that we use in multiple mappings.
Mapplets: A set of transformations that you use in multiple mappings.
Sessions and workflows: Sessions and workflows store information about how and
when the Integration Service moves data. A workflow is a set of instructions that
describes how and when to run tasks related to extracting, transforming, and loading
data. A session is a type of task that you can put in a workflow. Each session
corresponds to a single mapping.

Workflow Manager
Use the Workflow Manager to create, schedule, and run workflows. A workflow is a
set of instructions that describes how and when to run tasks related to extracting,
transforming, and loading data.
The Workflow Manager has the following tools to help us develop a workflow:
Task Developer: Create tasks we want to accomplish in the workflow.
Worklet Designer: Create a worklet in the Worklet Designer. A worklet is an object
that groups a set of tasks. A worklet is similar to a workflow, but without scheduling
information. We can nest worklets inside a workflow.
Workflow Designer: Create a workflow by connecting tasks with links in the
Workflow Designer. You can also create tasks in the Workflow Designer as you
develop the workflow.
When we create a workflow in the Workflow Designer, we add tasks to the workflow.
The Workflow Manager includes tasks, such as the Session task, the Command task,
and the Email task so you can design a workflow. The Session task is based on a
mapping we build in the Designer.
We then connect tasks with links to specify the order of execution for the tasks we
created. Use conditional links and workflow variables to create branches in the
workflow.

Workflow Monitor
Use the Workflow Monitor to monitor scheduled and running workflows for each
Integration Service.
We can view details about a workflow or task in Gantt Chart view or Task view. We
can run, stop, abort, and resume workflows from the Workflow Monitor. We can view
sessions and workflow log events in the Workflow Monitor Log Viewer.
The Workflow Monitor displays workflows that have run at least once. The Workflow
Monitor continuously receives information from the Integration Service and
Repository Service. It also fetches information from the repository to display historic
information.

1.2.5 REPOSITORY SERVICE


All repository client applications access the repository database tables through the
Repository Service. The Repository Service protects metadata in the repository by
managing repository connections and using object-locking to ensure object
consistency. The Repository Service also notifies us when another user modifies or
deletes repository objects we are using.
Each Repository Service manages a single repository database. We can configure a
Repository Service to run on multiple machines, or nodes, in the domain. Each
instance running on a node is called a Repository Service process. This process
accesses the database tables and performs most repository-related tasks.
The Repository Service uses native drivers to communicate with the repository
database.
A repository domain is a group of repositories that you can connect to
simultaneously in the PowerCenter Client. They share metadata through a
special type of repository called a global repository.
The Repository Service is a separate, multi-threaded process that retrieves, inserts,
and updates metadata in the repository database tables. The Repository Service
ensures the consistency of metadata in the repository. A Repository Service process
is an instance of the Repository Service that runs on a particular machine, or node.
The Repository Service accepts connection requests from the following applications:

PowerCenter Client: Use the Designer and Workflow Manager to create and
store mapping metadata and connection object information in the repository. Use the
Workflow Monitor to retrieve workflow run status information and session logs
written by the Integration Service. Use the Repository Manager to organize and
secure metadata by creating folders, users, and groups.

Command line programs pmrep and infacmd: Use pmrep to perform repository metadata administration tasks, such as listing repository objects or creating and editing users and groups. Use infacmd to perform service-related functions, such as creating or removing a Repository Service. (A short pmrep sketch follows this list.)

Integration Service (IS): When we start the IS, it connects to the repository to
schedule workflows. When we run a workflow, the IS retrieves workflow task and
mapping metadata from the repository. IS writes workflow status to the repository.
Web Services Hub: When we start the Web Services Hub, it connects to the
repository to access web-enabled workflows. The Web Services Hub retrieves
workflow task and mapping metadata from the repository and writes workflow status
to the repository.
SAP BW Service: Listens for RFC requests from SAP NetWeaver BW and initiates
workflows to extract from or load to SAP BW.
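For illustration, a minimal pmrep session might look like the sketch below: connect establishes the repository connection, listobjects lists the mappings in a folder, and cleanup closes the connection. The repository, domain, user, and folder names are hypothetical, and exact options can vary between PowerCenter versions, so check the Command Reference for your release.

    pmrep connect -r Dev_Repository -d Domain_Dev -n Administrator -x AdminPassword
    pmrep listobjects -o mapping -f DEV_FOLDER
    pmrep cleanup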

We install the Repository Service when we install PowerCenter Services. After we


install the PowerCenter Services, we can use the Administration Console to manage
the Repository Service.

Repository Connectivity:
PowerCenter applications such as the PowerCenter Client, the Integration Service,
pmrep, and infacmd connect to the repository through the Repository Service.

The following process describes how a repository client application connects to the
repository database:
1) The repository client application sends a repository connection request to the master gateway node, which is the entry point to the domain.
2) The Service Manager sends back the host name and port number of the node running the Repository Service. If you have the high availability option, you can configure the Repository Service to run on a backup node.
3) The repository client application establishes a link with the Repository Service process on that node. This communication occurs over TCP/IP.
4) The Repository Service process communicates with the repository database and performs repository metadata transactions for the client application.

Understanding Metadata
The repository stores metadata that describes how to extract, transform, and load
source and target data. PowerCenter metadata describes several different kinds of
repository objects. We use different PowerCenter Client tools to develop each kind of
object.
If we enable version control, we can store multiple versions of metadata objects in
the repository.
We can also extend the metadata stored in the repository by associating information
with repository objects. For example, when someone in our organization creates a
source definition, we may want to store the name of that person with the source
definition. We associate information with repository metadata using metadata
extensions.

Administering Repositories
We use the PowerCenter Administration Console, the Repository Manager, and the
pmrep and infacmd command line programs to administer repositories.

Back up repository to a binary file
Restore repository from a binary file
Copy repository database tables
Delete repository database tables
Create a Repository Service
Remove a Repository Service
Create folders to organize metadata
Add repository users and groups
Configure repository security
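Several of these tasks can be scripted. As a hedged sketch (the file and repository names are hypothetical, and options may differ by version), a repository backup with pmrep might look like:

    pmrep connect -r Dev_Repository -d Domain_Dev -n Administrator -x AdminPassword
    pmrep backup -o dev_repository_backup.rep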

1.2.6 INTEGRATION SERVICE


The Integration Service reads workflow information from the repository. The
Integration Service connects to the repository through the Repository Service to
fetch metadata from the repository.
A workflow is a set of instructions that describes how and when to run tasks related
to extracting, transforming, and loading data. The Integration Service runs workflow
tasks. A session is a type of workflow task. A session is a set of instructions that
describes how to move data from sources to targets using a mapping.
It extracts data from the mapping sources and stores the data in memory while it
applies the transformation rules that you configure in the mapping. The Integration
Service loads the transformed data into the mapping targets.
The Integration Service can combine data from different platforms and source types.
For example, you can join data from a flat file and an Oracle source. The Integration
Service can also load data to different platforms and target types.

1.2.7 WEB SERVICES HUB


The Web Services Hub is a web service gateway for external clients. It processes
SOAP requests from web service clients that want to access PowerCenter
functionality through web services. Web service clients access the Integration
Service and Repository Service through the Web Services Hub.
When we install PowerCenter Services, the PowerCenter installer installs the Web
Services Hub.
The Web Services Hub hosts the following web services:

Batch web services: Run and monitor web-enabled workflows.
Real-time web services: Create service workflows that allow you to read and write messages to a web service client through the Web Services Hub.

The Web Services Hub is not normally used by an Informatica developer and is not in the scope of our training.

1.2.8 DATA ANALYZER


PowerCenter Data Analyzer provides a framework to perform business analytics on
corporate data. With Data Analyzer, we can extract, filter, format, and analyze
corporate information from data stored in a data warehouse, operational data store,
or other data storage models. Data Analyzer uses a web browser interface to view
and analyze business information at any level.
Data Analyzer extracts, filters, and presents information in easy-to-understand
reports. We can use Data Analyzer to design, develop, and deploy reports and set up
dashboards and alerts to provide the latest information to users at the time and in
the manner most useful to them.
Data Analyzer has a repository that stores metadata to track information about
enterprise metrics, reports, and report delivery. Once an administrator installs Data
Analyzer, users can connect to it from any computer that has a web browser and
access to the Data Analyzer host.
This is a different tool and is out of scope for our training.

1.2.9 METADATA MANAGER


PowerCenter Metadata Manager is a metadata management tool that you can use to
browse and analyze metadata from disparate metadata repositories. Metadata
Manager helps us understand and manage how information and processes are
derived, the fundamental relationships between them, and how they are used.
Metadata Manager uses Data Analyzer functionality. We can use the embedded Data
Analyzer features to design, develop, and deploy metadata reports and dashboards.
Metadata Manager uses PowerCenter workflows to extract metadata from source
repositories and load it into a centralized metadata warehouse called the Metadata
Manager Warehouse.
This is a different tool and is out of scope for our training.

Chapter 2

Repository
Manager

CHAPTER 2:

REPOSITORY MANAGER

We can navigate through multiple folders and repositories and perform basic repository tasks with the Repository Manager. It is an administration tool, used mainly by the Informatica administrator.

Repository Manager Tasks:

Add domain connection information
Add and connect to a repository
Work with PowerCenter domain and repository connections
Search for repository objects or keywords
View object dependencies
Compare repository objects
Truncate session and workflow log entries
View user connections
Release locks
Exchange metadata with other business intelligence tools

Add a repository to the Navigator, and then configure the domain


connection information when we connect to the repository.

2.1 Adding a Repository to the Navigator


1. In any of the PowerCenter Client tools, click Repository > Add.

2. Enter the name of the repository and a valid repository user name.
3. Click OK.
Before we can connect to the repository for the first time, we must configure the
connection information for the domain that the repository belongs to.

2.2 Configuring a Domain Connection


1. In a PowerCenter Client tool, select the Repositories node in the Navigator.
2. Click Repository > Configure Domains to open the Configure Domains dialog
box.

3. Click the Add button. The Add Domain dialog box appears.
4. Enter the domain name, gateway host name, and gateway port number.
5. Click OK to add the domain connection.

2.3 Connecting to a Repository


1. Launch a PowerCenter Client tool.
2. Select the repository in the Navigator and click Repository > Connect, or
double-click the repository.
3. Enter a valid repository user name and password.
4. Click Connect.
Click the More button to add, change, or view domain information.

2.4 Viewing Object Dependencies


Before we change or delete repository objects, we can view dependencies to see the
impact on other objects. For example, before you remove a session, we can find out
which workflows use the session. We can view dependencies for repository objects in
the Repository Manager, Workflow Manager, and Designer tools.
Steps:
1. Connect to the repository.
2. Select the object in the Navigator.
3. Click Analyze and select the dependency we want to view.

2.5 Validating Multiple Objects


We can validate multiple objects in the repository without fetching them into the
workspace. We can save and optionally check in objects that change from invalid to
valid status as a result of the validation. We can validate sessions, mappings,
mapplets, workflows, and worklets.
Steps:
1. Select the objects you want to validate.
2. Click Analyze and select Validate.
3. Select validation options from the Validate Objects dialog box.
4. Click Validate.
5. Click a link to view the objects in the results group.

2.6 Comparing Repository Objects


We can compare two repository objects of the same type to identify differences
between the objects. For example, we can compare two sessions to check for
differences. When we compare two objects, the Repository Manager displays their
attributes.
Steps:
1. In the Repository Manager, connect to the repository.
2. In the Navigator, select the object you want to compare.
3. Click Edit > Compare Objects.
4. Click Compare in the dialog box displayed.

2.7 Truncating Workflow and Session Log Entries


When we configure a session or workflow to archive session logs or workflow logs,
the Integration Service saves those logs in local directories. The repository also
creates an entry for each saved workflow log and session log. If we move or delete a
session log or workflow log from the workflow log directory or session log directory,
we can remove the entries from the repository.
Steps:
1. In the Repository Manager, select the workflow in the Navigator window or in
the Main window.
2. Choose Edit > Truncate Log. The Truncate Workflow Log dialog box appears.
3. Choose to delete all workflow and session log entries or to delete all workflow
and session log entries with an end time before a particular date.
4. If you want to delete all entries older than a certain date, enter the date and
time.
5. Click OK.
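The same cleanup can also be scripted with the pmrep truncatelog command. A rough sketch follows; the folder and workflow names are hypothetical, and option letters may vary by PowerCenter version:

    pmrep connect -r Dev_Repository -d Domain_Dev -n Administrator -x AdminPassword
    pmrep truncatelog -t all -f DEV_FOLDER -w wf_basic_mapping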

2.8 Managing User Connections and Locks


In the Repository Manager, we can view and manage the following items:
Repository object locks: The repository locks repository objects and folders by
user. The repository creates different types of locks depending on the task. The
Repository Service locks and unlocks all objects in the repository.
User connections: Use the Repository Manager to monitor user connections to the
repository. We can end connections when necessary.
Types of locks created:
1. In-use lock: Placed on objects we want to view.
2. Write-intent lock: Placed on objects we want to modify.
3. Execute lock: Locks objects we want to run, such as workflows and sessions.

Steps:
1. Launch the Repository Manager and connect to the repository.
2. Click Edit > Show User Connections or Edit > Show Locks.
3. The locks or user connections are displayed in a window.
4. End connections or release locks as needed.

2.9 Managing Users and Groups


1. In the Repository Manager, connect to a repository.
2. Click Security > Manage Users and Privileges.
3. Click the Groups tab to create groups, or
4. Click the Users tab to create users.
5. Click the Privileges tab to give permissions to groups and users.
6. Select the options available to add, edit, and remove users and groups.

There are two default repository user groups:


Administrators: This group initially contains two users that are created by default.
The default users are Administrator and the database user that created the
repository. We cannot delete these users from the repository or remove them from
the Administrators group.
Public: The Repository Manager does not create any default users in the Public
group.

2.10 Working with Folders


We can create, edit, or delete folders as per our need.
1. In the Repository Manager, connect to a repository.
2. Click Folder > Create.
Enter the folder information, such as the folder name, description, owner, group, permissions, and whether the folder is shared.
3. Click OK.

Chapter 3

Designer

CHAPTER 3: DESIGNER
The Designer has tools to help us build mappings and mapplets so we can specify
how to move and transform data between sources and targets. The Designer helps
us create source definitions, target definitions, and transformations to build the
mappings.
The Designer lets us work with multiple tools at one time and in multiple folders and repositories at the same time. It also includes windows so we can view folders, repository objects, and tasks.

Designer Tools:

Source Analyzer: Use to import or create source definitions for flat file, XML,
COBOL, Application, and relational sources.
Target Designer: Use to import or create target definitions.
Transformation Developer: Use to create reusable transformations.
Mapplet Designer: Use to create mapplets.
Mapping Designer: Use to create mappings.

Designer Windows:

Navigator: Use to connect to and work in multiple repositories and folders.


Workspace: Use to view or edit sources, targets, mapplets, transformations,
and mappings.
Status bar: Displays the status of the operation we perform.
Output: Provides details when we perform certain tasks, such as saving work
or validating a mapping
Overview: An optional window to simplify viewing workbooks containing
large mappings or a large number of objects.
Instance Data: View transformation data while you run the Debugger to
debug a mapping.
Target Data: View target data while you run the Debugger to debug a
mapping.

Designer Tasks:

Add a repository.
Print the workspace.
View date and time an object was last saved.
Open and close a folder.
Create shortcuts.
Check out and in repository objects.
Search for repository objects.
Enter descriptions for repository objects.
View older versions of objects in the workspace.
Revert to a previously saved object version.
Copy objects.
Export and import repository objects.
Work with multiple objects, ports, or columns.
Rename ports.
Use shortcut keys.

3.1 SOURCE ANALYZER


In Source Analyzer, we define the source definitions that we will use in a mapping.
We can either import a source definition or manually create the definition.
We can import or create the following types of source definitions in the Source
Analyzer:

Relational tables, views, and synonyms


Fixed-width and delimited flat files that do not contain binary data.
COBOL files
XML files
Data models using certain data modeling tools through Metadata Exchange
for Data Models

3.1.1 Working with Relational Sources


Special Character Handling:
We can import, create, or edit source definitions with table and column names
containing special characters, such as the slash (/) character through the Designer.
When we use the Source Analyzer to import a source definition, the Designer retains
special characters in table and field names.
However, when we add a source definition with special characters to a mapping, the
Designer either retains or replaces the special character. Also, when we generate the
default SQL statement in a Source Qualifier transformation for a relational source,
the Designer uses quotation marks around some special characters. The Designer
handles special characters differently for relational and non-relational sources.

Importing a Relational Source Definition


1. Connect to repository.
2. Right click the folder where you want to import source definition and click
open. The folder which is connected gets bold. We can work in only one folder
at a time.
3. In the Source Analyzer, click Sources > Import from Database.
4. Select the ODBC data source used to connect to the source database. If you
need to create or modify an ODBC data source, click the Browse button to
open the ODBC Administrator. Create the data source, and click OK. Select
the new ODBC data source.

5. Enter a database user name and password to connect to the database.
6. Click Connect. Table names will appear.
7. Select the relational object or objects you want to import.
8. Click OK.
9. Click Repository > Save.

Updating a Relational Source Definition


We can update a source definition to add business names or to reflect new column
names, datatypes, or other changes. We can update a source definition in the
following ways:
Edit the definition: Manually edit the source definition if we need to configure
properties that we cannot import or if we want to make minor changes to the source
definition.
Reimport the definition: If the source changes are significant, we may need to
reimport the source definition. This overwrites or renames the existing source
definition. We can retain existing primary key-foreign key relationships and
descriptions in the source definition being replaced.

Editing Relational Source Definitions


1) Select Tools -> Source Analyzer
2) Drag the table you want to edit in workspace.
3) In the Source Analyzer, double-click the title bar of the source definition. Or
Right click the table and click edit.
4) In table tab, we can rename, add owner name, business description or edit
database type.
5) Click the Columns Tab. Edit column names, datatypes, and restrictions. Click
OK.

3.1.2 Working with Flat Files


To use flat files as sources, targets, and lookups in a mapping we must import or
create the definitions in the repository. We can import or create flat file source
definitions in the Source Analyzer.
We can import fixed-width and delimited flat file definitions that do not contain
binary data. When importing the definition, the file must be in a directory
local to the client machine. In addition, the Integration Service must be able to
access all source files during the session.

Special Character Handling:


When we import a flat file in the Designer, the Flat File Wizard uses the file name as
the name of the flat file definition by default. We can import a flat file with any valid
file name through the Flat File Wizard. However, the Designer does not recognize
some special characters in flat file source and target names.
When we import a flat file, the Flat File Wizard changes invalid characters and spaces
into underscores ( _ ). For example, you have the source file "sample
prices+items.dat". When we import this flat file in the Designer, the Flat File Wizard
names the file definition sample_prices_items by default.

To import a fixed-width flat file definition:


1. Open the Source Analyzer and click Sources > Import from File. The Open
Flat File dialog box appears.
2. Browse and Select the file you want to use.
3. Select a code page.
4. Click OK.
5. Edit the following settings:

6) Click Next. Follow the directions in the wizard to manipulate the column
breaks in the file preview window. Move existing column breaks by dragging
them. Double-click a column break to delete it.
7) Click next and Enter column information for each column in the file.
8) Click Finish.
9) Click Repository > Save.

To import a delimited flat file definition:


Delimited flat files are always character-oriented and line sequential. The column
precision is always measured in characters for string columns and in significant digits
for numeric columns. Each row ends with a newline character. We can import a
delimited file that does not contain binary data or multibyte character data greater
than two bytes per character.
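For instance, a small comma-delimited file with double quotes as the text qualifier might look like the following (hypothetical content, shown only to illustrate why a text qualifier matters when the data itself contains the delimiter):

    EMPNO,ENAME,JOB,SAL
    7369,"SMITH","CLERK",800
    7499,"ALLEN, JR","SALESMAN",1600

Here the comma inside "ALLEN, JR" is data, not a column separator, which the Double Quotes text qualifier makes unambiguous.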
Steps:
1) Repeat Steps 1-5 as in case of fixed width.
2) Click Next.
3) Enter the following settings:

Delimiters (Required): Character used to separate columns of data. Use the Other field to enter a different delimiter.
Treat Consecutive Delimiters as One (Optional): If selected, the Flat File Wizard reads one or more consecutive column delimiters as one.
Escape Character (Optional): Character immediately preceding a column delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string.
Remove Escape Character From Data (Optional): Clear this option to include the escape character in the output string.
Use Default Text Length (Optional): If selected, the Flat File Wizard uses the entered default text length for all string datatypes.
Text Qualifier (Required): Quote character that defines the boundaries of text strings. Choose No Quote, Single Quote, or Double Quotes.

4) Enter column information for each column in the file.


5) Click Finish.
6) Click Repository > Save.

Editing Flat File Definitions


1) Select Tools -> Source Analyzer
2) Drag the file you want to edit in workspace.
3) In the Source Analyzer, double-click the title bar of the source definition.
We can edit source or target flat file definitions using the following
definition tabs:
Table tab: Edit properties such as table name, business name, and
flat file properties.
Columns tab: Edit column information such as column names,
datatypes, precision, and formats.
Properties tab: View the default numeric and datetime format
properties in the Source Analyzer and the Target Designer. You can
edit these properties for each source and target instance in a mapping
in the Mapping Designer.
Metadata Extensions tab: Extend the metadata stored in the
repository by associating information with repository objects, such as
flat file definitions.
4) Click the Advanced button to edit the flat file properties. A different dialog box
appears for fixed-width and delimited files.
5) Do the changes as needed.
6) Click OK.
7) Click Repository > Save.

Target flat files are handled the same way as described in the above sections. Just make sure that instead of the Source Analyzer, you select Tools -> Target Designer. The rest is the same.

3.2 TARGET DESIGNER


Before we create a mapping, we must define targets in the repository. Use the
Target Designer to import and design target definitions. Target definitions include
properties such as column names and data types.

Types of target definitions:

Relational: Create a relational target for a particular database platform.


Flat file: Create fixed-width and delimited flat file target definitions.
XML file: Create an XML target definition to output data to an XML file.

Ways of creating target definitions:


1. Import the definition for an existing target: Import the target definition
from a relational target or a flat file. The Target Designer uses a Flat File
Wizard to import flat files.
2. Create a target definition based on a source definition: Drag a source definition into the Target Designer to make a target definition and edit it to make the necessary changes.
3. Create a target definition based on a transformation or mapplet: Drag
a transformation into the Target Designer to make a target definition.
4. Manually create a target definition: Create a target definition in the
Target Designer.
5. Design several related targets: Create several related target definitions at
the same time. You can create the overall relationship, called a schema, and
the target definitions, through wizards in the Designer.
After we create a relational target table definition, we need to create the table in the database as well.
Steps:
1. In the Target Designer, select the relational target definition you want to
create in the database. If you want to create multiple tables, select all
relevant table definitions.
2. Click Targets > Generate/Execute SQL.
3. Click Connect and select the database where the target table should be
created. Click OK to make the connection.
4. Click Generate SQL File if you want to create the SQL script, or Generate and
Execute if you want to create the file, and then immediately run it.
5. Click Close.
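As a rough sketch of what the generated script can look like (the column list is abbreviated and the datatypes are assumed for an Oracle target; the actual script depends on the target database and the definition), the SQL for a target definition such as the EMP_Tgt table used later in this chapter might resemble:

    CREATE TABLE EMP_Tgt (
        EMPNO    NUMBER(4) NOT NULL,
        ENAME    VARCHAR2(10),
        JOB      VARCHAR2(9),
        SAL      NUMBER(7,2),
        DEPTNO   NUMBER(2)
    );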

3.3 MAPPINGS
A mapping is a set of source and target definitions linked by transformation objects
that define the rules for data transformation. Mappings represent the data flow
between sources and targets. When the Integration Service runs a session, it uses
the instructions configured in the mapping to read, transform, and write data.

Mapping Components:

Source definition: Describes the characteristics of a source table or file.


Transformation: Modifies data before writing it to targets. Use different
transformation objects to perform different functions.
Target definition: Defines the target table or file.
Links: Connect sources, targets, and transformations so the Integration
Service can move the data as it transforms it.

The work of an Informatica developer is to build mappings as per client requirements. We drag source and target definitions into the workspace.
We create various transformations to modify the data as per the need.
We then run the mappings by creating sessions and workflows.
We also unit test the mappings.

Steps to create a mapping:


1. Open the Mapping Designer.
2. Click Mappings > Create, or drag a repository object into the workspace.
3. Enter a name for the new mapping and click OK.

3.4 TRANSFORMATIONS
A transformation is a repository object that generates, modifies, or passes data. You
configure logic in a transformation that the Integration Service uses to transform
data. The Designer provides a set of transformations that perform specific functions.
For example, an Aggregator transformation performs calculations on groups of data.
Transformations in a mapping represent the operations the Integration Service
performs on the data. Data passes through transformation ports that we link in a
mapping or mapplet.

Types of Transformations:
Active: An active transformation can change the number of rows that pass through
it, such as a Filter transformation that removes rows that do not meet the filter
condition.
Passive: A passive transformation does not change the number of rows that pass
through it, such as an Expression transformation that performs a calculation on data
and passes all rows through the transformation.

Connected: A connected transformation is connected to other transformations in


the mapping.

Unconnected: An unconnected transformation is not connected to other


transformations in the mapping. An unconnected transformation is called within
another transformation, and returns a value to that transformation.

Reusable: Reusable transformations can be used in multiple mappings. These are created in the Transformation Developer tool, or by promoting a non-reusable transformation from the Mapping Designer.
We can create most transformations as non-reusable or reusable.
The External Procedure transformation can be created as a reusable transformation only.
The Source Qualifier is not reusable.
Non-reusable: Non-reusable transformations exist within a single mapping. These are created in the Mapping Designer tool.

Single-Group Transformation: Transformations that have one input and one


output group.

Multi-Group Transformations: Transformations that have multiple input


groups, multiple output groups, or both. A group is the representation of a row of
data entering or leaving a transformation. Example: Union, Router, Joiner, HTTP etc.

3.4.1 Working with Ports


After we create a transformation, we need to add and configure ports using the Ports
tab. Ports are equivalent to columns in Informatica.
Creating Ports:
We can create a new port in the following ways:

Drag a port from another transformation. When we drag a port from another
transformation the Designer creates a port with the same properties, and it
links the two ports. Click Layout > Copy Columns to enable copying ports.
Click the Add button on the Ports tab. The Designer creates an empty port
you can configure.

3.4.2 Using Default Values for Ports


All transformations use default values that determine how the Integration Service
handles input null values and output transformation errors.

Input port: The system default value for null input ports is NULL. It displays as a blank in the transformation. If an input value is NULL, the Integration Service leaves it as NULL.
Output port: The system default value for output transformation errors is ERROR. The default value appears in the transformation as ERROR('transformation error'). If a transformation error occurs, the Integration Service skips the row and notes all input rows skipped by the ERROR function in the session log file.
Input/output port: The system default value for null input is the same as for input ports, NULL. The system default value appears as a blank in the transformation. The default value for output transformation errors is the same as for output ports.

Note: Variable ports do not support default values. The Integration Service initializes
variable ports according to the datatype.
Note: The Integration Service ignores user-defined default values for unconnected
transformations.

3.4.3 User-defined default values


Constant value: Use any constant (numeric or text), including NULL.
Example: 0, 9999, 'Unknown Value', NULL
Constant expression: We can include a transformation function with constant parameters. Example: 500 * 1.75, TO_DATE('January 1, 1998, 12:05 AM'), ERROR('Null not allowed')
ERROR: Generate a transformation error. Write the row and a message in the session log or row error log. The Integration Service writes the row to the session log or row error log based on the session configuration.
Use the ERROR function as the default value when we do not want null values to pass into a transformation. For example, we might want to skip a row when the input value of DEPT_NAME is NULL. You could use the following expression as the default value:
ERROR('Error. DEPT is NULL')
ABORT: Abort the session. The session aborts when the Integration Service encounters a null input value. The Integration Service does not increase the error count or write rows to the reject file.
Example: ABORT('DEPT is NULL')

3.5 TRACING LEVELS


When we configure a transformation, we can set the amount of detail the Integration
Service writes in the session log.

We set tracing level in Properties tab of a transformation.

Normal: The Integration Service logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. Summarizes session results, but not at the level of individual rows.

Terse: The Integration Service logs initialization information, error messages, and notification of rejected data.

Verbose Initialization: In addition to normal tracing, the Integration Service logs additional initialization details, names of index and data files used, and detailed transformation statistics.

Verbose Data: In addition to verbose initialization tracing, the Integration Service logs each row that passes into the mapping. Allows the Integration Service to write errors to both the session log and error log when you enable row error logging. The Integration Service writes row data for all rows in a block when it processes a transformation.

Change the tracing level to a Verbose setting only when we need to debug a
transformation that is not behaving as expected.

To add a slight performance boost, we can also set the tracing level to Terse.

3.6 BASIC FIRST MAPPING


First make sure that we have created a Shared folder and a developer folder, along with a user, as described in the Installation Guide.
We will transfer data from the EMP table in the source to the EMP_Tgt table in the target.
Also create ODBC connections for the source and target databases.

Importing Source Definition:


1. Select the Shared folder. Right click on it and select Open.
2. The Shared folder will become bold. It means we are now connected to it.
3. Click Tools -> Source Analyzer.
4. Now we will import the source table definitions in the Shared folder.
5. Click Sources -> Import from Database.
6. In the box displayed, give connection information for the source database.
7. Click Connect. Tables in the source database will be displayed.
8. Select the tables of use and click OK.
9. The table definition will be displayed. We can edit it as per need.

Note: We can edit the source definition by dragging the table in Source Analyzer
only.

Creating Target Table EMP_Tgt in Target database


1. Connect to the Shared folder. Tools-> Target Designer
2. Now drag the EMP table definition from left side pane to target designer.
3. We will see the EMP table definition in Target Designer.
4. Right click EMP -> Edit -> Click on rename and give name as EMP_Tgt
5. Apply -> Ok.
6. Now we will create this table in target database.
7. Click Target -> Select generate/ execute SQL.
8. Click on connect and give login information for target database.
9. Then select the options of table generation.
10. Click Generate/Execute button.
11. Repository -> Save
We are doing this for our practice only. In a project, all the source tables
and target tables are created by DBA. We just import the definition of
tables.

Now we have all the tables we need in shared folder.


We now need to create shortcut to these in our folder.
1. Right click on the Shared folder and select Disconnect.
2. Select the folder where we want to create the mapping.
3. Right click on the folder and click Open. The folder will become bold.
4. We will now create shortcuts to the tables of need in our work folder.
5. Click the + sign on the Shared folder, open the + sign on Sources, and select the EMP table.
6. Now click Edit -> Copy.
7. Now select the folder which is bold.
8. Click Edit -> Paste Shortcut.
9. Do the same for all source and target tables.
10. Also rename all the shortcuts and remove Shortcut_to_ from all.
11. Click Repository -> Save.

Shortcut use:

If we select the Paste option instead, a copy of the EMP table definition is created.
Suppose we are 10 developers: 5 use shortcuts and 5 copy the definition of EMP.
Now suppose the definition of EMP changes in the database.
We reimport the EMP definition, and the old definition is replaced.
Developers who were using shortcuts will see that the changes are reflected in their mappings automatically.
Developers using copies will have to reimport manually.
So, for maintenance and ease, we use shortcuts in our folder to source and target definitions, and to other reusable transformations and mapplets.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_basic_mapping
4. Drag EMP from source and EMP_Tgt from target in the mapping.
5. Link ports from SQ_EMP to EMP_Tgt.
6. Click Mapping -> Validate.
7. Click Repository -> Save.

Creating Session:
Now we will create a session in the Workflow Manager.
1. Open the Workflow Manager and connect to the repository.
2. Open the folder with the same name in which we created the mapping.
3. Make sure the folder is bold.
4. Now click Tools -> Task Developer.
5. Click Task -> Create -> Select Session task and give a name, e.g. s_m_basic_mapping.
6. Select the correct mapping from the list displayed.
7. Click Create and Done.
8. Now right click the session and click Edit.
9. Select the Mapping tab.
10. Go to SQ_EMP in source and give the correct relational connection for it.
11. Do the same for EMP_Tgt.
12. Also for the target table, give the Load Type option as Normal and also select the Truncate Target Table option.
13. Click Task -> Validate.

Creating Workflow:
1. Now click Tools -> Workflow Designer.
2. Click Workflow -> Create -> Give a name like wf_basic_mapping.
3. Click OK.
4. The START task will be displayed. It is the starting point for the Integration Service.
5. Drag the session to the workflow.
6. Click Task -> Link Task. Connect START to the session.
7. Click Workflow -> Validate.
8. Click Repository -> Save.

Now open the Workflow Monitor first.

1. Go back to the Workflow Manager and right click on the workflow wf_basic_mapping.
2. Select Start Workflow.
You can view the status in the Workflow Monitor.
Check the data in the target table.
Command: select * from table_name;
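For a quick unit-test check, we can compare row counts between source and target and then inspect the loaded rows (table names as in this exercise):

    select count(*) from EMP;
    select count(*) from EMP_Tgt;
    select * from EMP_Tgt;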

3.7 EXPRESSION TRANSFORMATION

Passive and connected transformation.

Use the Expression transformation to calculate values in a single row before we write
to the target. For example, we might need to adjust employee salaries, concatenate
first and last names, or convert strings to numbers.
Use the Expression transformation to perform any non-aggregate calculations.
Example: Addition, Subtraction, Multiplication, Division, Concat, Uppercase
conversion, lowercase conversion etc.
We can also use the Expression transformation to test conditional statements before
we output the results to target tables or other transformations, using functions
such as IIF and DECODE.

There are 3 types of ports in Expression Transformation:
- Input
- Output
- Variable: Used to store any temporary calculation.

Calculating Values
To use the Expression transformation to calculate values for a single row, we must
include the following ports:
- Input or input/output ports for each value used in the calculation. For
example: to calculate Total Salary, we need salary and commission.
- Output port for the expression: we enter one expression for each output
port. The return value for the output port needs to match the return value of
the expression.

We can enter multiple expressions in a single Expression transformation. We can
create any number of output ports in the transformation.
Example: Calculating Total Salary of an Employee
- Import the source table EMP in the Shared folder. If it is already there, then
don't import.
- In the Shared folder, create the target table Emp_Total_SAL. Keep all ports as in
the EMP table except Sal and Comm. Add a Total_SAL port to store the
calculation.
- Create the necessary shortcuts in the folder.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_totalsal
4. Drag EMP from source in the mapping.
5. Click Transformation -> Create -> Select Expression from list. Give name and
click Create. Now click Done.
6. Link ports from SQ_EMP to the Expression transformation.
7. Edit the Expression transformation. As we do not want Sal and Comm in the
target, remove the check from the output port for both columns.
8. Now create a new port out_Total_SAL. Make it an output port only.
9. Click the small button that appears in the Expression section of the dialog box
and enter the expression in the Expression Editor.
10. Enter expression SAL + COMM. You can select SAL and COMM from the Ports
tab in the Expression Editor.
11. Check the expression syntax by clicking Validate.
12. Click OK -> Click Apply -> Click OK.
13. Now connect the ports from Expression to the target table.
14. Click Mapping -> Validate
15. Repository -> Save

Create Session and Workflow as described earlier. Run the workflow and
see the data in target table.

As COMM is NULL for some employees, Total_SAL will be NULL for those rows. Now
open your mapping and the Expression transformation. Select the COMM port and
give 0 as its Default Value. Now apply the changes. Validate the mapping and save.
Refresh the session and validate the workflow again. Run the workflow and see
the result again.
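The NULL can also be handled inside the expression itself instead of through a
Default Value; a minimal sketch of such an expression for the out_Total_SAL port,
using the same SAL and COMM ports:

    SAL + IIF( ISNULL(COMM), 0, COMM )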
Now use ERROR in the Default Value of COMM to skip rows where COMM is NULL.
Syntax: ERROR('Any message here')
Similarly, we can use the ABORT function to abort the session if COMM is NULL.
Syntax: ABORT('Any message here')
Make sure to double click the session after making any changes in the mapping. It
will prompt that the mapping has changed. Click OK to refresh the mapping. Run
the workflow after validating and saving the workflow.

3.8 FILTER TRANSFORMATION

Active and connected transformation.

We can filter rows in a mapping with the Filter transformation. We pass all the rows
from a source transformation through the Filter transformation, and then enter a
filter condition for the transformation. All ports in a Filter transformation are
input/output and only rows that meet the condition pass through the Filter
transformation.

Example: to filter records where SAL>2000
- Import the source table EMP in the Shared folder. If it is already there, then
don't import.
- In the Shared folder, create the target table Filter_Example. Keep all fields as in
the EMP table.
- Create the necessary shortcuts in the folder.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_filter_example
4. Drag EMP from source in the mapping.
5. Click Transformation -> Create -> Select Filter from list. Give name and click
Create. Now click Done.
6. Pass ports from SQ_EMP to the Filter transformation.
7. Edit the Filter transformation. Go to the Properties tab.
8. Click the Value section of the filter condition, and then click the Open button.
9. The Expression Editor appears.
10. Enter the filter condition you want to apply. Ex: SAL > 2000
11. Click Validate to check the syntax of the conditions you entered.
12. Click OK -> Click Apply -> Click OK.
13. Now connect the ports from Filter to the target table.
14. Click Mapping -> Validate
15. Repository -> Save

Create Session and Workflow as described earlier. Run the workflow and
see the data in target table.

How to filter out rows with null values?

To filter out rows containing null values or spaces, use the ISNULL and IS_SPACES
functions to test the value of the port. For example, if we want to filter out rows that
contain NULLs in the FIRST_NAME port, use the following condition:
IIF(ISNULL(FIRST_NAME), FALSE, TRUE)
This condition states that if the FIRST_NAME port is NULL, the return value is FALSE
and the row should be discarded. Otherwise, the row passes through to the next
transformation.
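A similar condition handles rows that contain only spaces; a sketch assuming the
same FIRST_NAME port:

    IIF(IS_SPACES(FIRST_NAME), FALSE, TRUE)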

3.9 ROUTER TRANSFORMATION

Active and connected transformation.

A Router transformation is similar to a Filter transformation because both
transformations allow you to use a condition to test data. A Filter transformation
tests data for one condition and drops the rows of data that do not meet the
condition. However, a Router transformation tests data for one or more conditions
and gives you the option to route rows of data that do not meet any of the
conditions to a default output group.
Example: If we want to keep employees of France, India, US in 3 different tables,
then we can use 3 Filter transformations or 1 Router transformation.

Mapping A uses three Filter transformations while Mapping B produces the same
result with one Router transformation.
A Router transformation consists of input and output groups, input and output ports,
group filter conditions, and properties that we configure in the Designer.

Working with Groups

A Router transformation has the following types of groups:
- Input: the group that receives the input ports.
- Output: user-defined groups and the default group. We cannot modify or delete
output ports or their properties.

User-Defined Groups: We create a user-defined group to test a condition based
on incoming data. A user-defined group consists of output ports and a group filter
condition. We can create and edit user-defined groups on the Groups tab with the
Designer. Create one user-defined group for each condition that we want to specify.

The Default Group: The Designer creates the default group after we create one
new user-defined group. The Designer does not allow us to edit or delete the default
group. This group does not have a group filter condition associated with it. If all of
the conditions evaluate to FALSE, the IS passes the row to the default group.
Example: Filtering employees of Department 10 to EMP_10, Department 20
to EMP_20 and rest to EMP_REST

- Source is the EMP table.
- Create 3 target tables EMP_10, EMP_20 and EMP_REST in the Shared folder.
Structure should be the same as the EMP table.
- Create the shortcuts in your folder.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_router_example
4. Drag EMP from source in the mapping.
5. Click Transformation -> Create -> Select Router from list. Give name and
click Create. Now click Done.
6. Pass ports from SQ_EMP to the Router transformation.
7. Edit the Router transformation. Go to the Groups tab.
8. Click the Groups tab, and then click the Add button to create a user-defined
group. The default group is created automatically.
9. Click the Group Filter Condition field to open the Expression Editor.
10. Enter a group filter condition. Ex: DEPTNO=10
11. Click Validate to check the syntax of the conditions you entered.
12. Create another group for EMP_20. Condition: DEPTNO=20
13. The rest of the records not matching the above two conditions will be passed
to the DEFAULT group. See the sample mapping.
14. Click OK -> Click Apply -> Click OK.
15. Now connect the ports from the Router to the target tables.
16. Click Mapping -> Validate
17. Repository -> Save

Create Session and Workflow as described earlier. Run the workflow and see the
data in the target tables.
Make sure to give connection information for all 3 target tables.

Sample Mapping:

Difference between Router and Filter

1> We cannot pass rejected data forward in a Filter, but we can in a Router:
rejected data goes to the Default group of the Router.
2> A Filter has no default group.

3.10 UNION TRANSFORMATION

Active and Connected transformation.

The Union transformation is a multiple input group transformation that you can use
to merge data from multiple pipelines or pipeline branches into one pipeline branch.
It merges data from multiple sources similar to the UNION ALL SQL statement to
combine the results from two or more SQL statements.

Union Transformation Rules and Guidelines
- We can create multiple input groups, but only one output group.
- We can connect heterogeneous sources to a Union transformation.
- All input groups and the output group must have matching ports. The
precision, datatype, and scale must be identical across all groups.
- The Union transformation does not remove duplicate rows. To remove
duplicate rows, we must add another transformation such as a Router or Filter
transformation.
- We cannot use a Sequence Generator or Update Strategy transformation
upstream from a Union transformation.

Union Transformation Components

When we configure a Union transformation, define the following components:
- Transformation tab: We can rename the transformation and add a description.
- Properties tab: We can specify the tracing level.
- Groups tab: We can create and delete input groups. The Designer displays
groups we create on the Ports tab.
- Group Ports tab: We can create and delete ports for the input groups. The
Designer displays ports we create on the Ports tab.

We cannot modify the Ports, Initialization Properties, Metadata Extensions, or Port
Attribute Definitions tabs in a Union transformation.
Create input groups on the Groups tab, and create ports on the Group Ports tab.
We can create one or more input groups on the Groups tab. The Designer creates
one output group by default. We cannot edit or delete the default output group.
Example: to combine data of tables EMP_10, EMP_20 and EMP_REST
- Import tables EMP_10, EMP_20 and EMP_REST in the Shared folder in Sources.
- Create a target table EMP_UNION_EXAMPLE in the Target Designer. Structure
should be the same as the EMP table.
- Create the shortcuts in your folder.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_union_example
4. Drag EMP_10, EMP_20 and EMP_REST from source in the mapping.
5. Click Transformation -> Create -> Select Union from list. Give name and click
Create. Now click Done.
6. Pass ports from SQ_EMP_10 to the Union transformation.
7. Edit the Union transformation. Go to the Groups tab.
8. One group will already be there, as we dragged ports from SQ_EMP_10 to the
Union transformation.
9. As we have 3 source tables, we need 3 input groups. Click the Add button to
add 2 more groups. See the sample mapping.
10. We can also modify ports in the Ports tab.
11. Click Apply -> OK.
12. Drag the target table now.
13. Connect the output ports from Union to the target table.
14. Click Mapping -> Validate
15. Repository -> Save

Create Session and Workflow as described earlier. Run the workflow and see the
data in the target table.
Make sure to give connection information for all 3 source tables.

Sample mapping picture

3.11 SORTER TRANSFORMATION

Connected and Active Transformation


The Sorter transformation allows us to sort data.
We can sort data in ascending or descending order according to a specified
sort key.
We can also configure the Sorter transformation for case-sensitive sorting,
and specify whether the output rows should be distinct.

When we create a Sorter transformation in a mapping, we specify one or more ports
as a sort key and configure each sort key port to sort in ascending or descending
order. We also configure the sort criteria the PowerCenter Server applies to all sort
key ports and the system resources it allocates to perform the sort operation.
The Sorter transformation contains only input/output ports. All data passing through
the Sorter transformation is sorted according to a sort key. The sort key is one or
more ports that we want to use as the sort criteria.

Sorter Transformation Properties


1. Sorter Cache Size:
The PowerCenter Server uses the Sorter Cache Size property to determine the
maximum amount of memory it can allocate to perform the sort operation. The
PowerCenter Server passes all incoming data into the Sorter transformation
before it performs the sort operation.

- We can specify any amount between 1 MB and 4 GB for the Sorter cache size.
- If it cannot allocate enough memory, the PowerCenter Server fails the session.
- For best performance, configure the Sorter cache size with a value less than
or equal to the amount of available physical RAM on the PowerCenter Server
machine.
- Informatica recommends allocating at least 8 MB (8,388,608 bytes) of
physical memory to sort data using the Sorter transformation.

2. Case Sensitive:
The Case Sensitive property determines whether the PowerCenter Server
considers case when sorting data. When we enable the Case Sensitive property,
the PowerCenter Server sorts uppercase characters higher than lowercase
characters.

3. Work Directory
Directory PowerCenter Server uses to create temporary files while it sorts data.

4. Distinct:
Check this option if we want to remove duplicates. Sorter will sort data according
to all the ports when it is selected.

Example: Sorting data of EMP by ENAME
- Source is the EMP table.
- Create a target table EMP_SORTER_EXAMPLE in the Target Designer. Structure
same as the EMP table.
- Create the shortcuts in your folder.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_sorter_example
4. Drag EMP from source in the mapping.
5. Click Transformation -> Create -> Select Sorter from list. Give name and click
Create. Now click Done.
6. Pass ports from SQ_EMP to the Sorter transformation.
7. Edit the Sorter transformation. Go to the Ports tab.
8. Select ENAME as the sort key: put a check mark on KEY in front of ENAME.
9. Click the Properties tab and select properties as needed.
10. Click Apply -> OK.
11. Drag the target table now.
12. Connect the output ports from Sorter to the target table.
13. Click Mapping -> Validate
14. Repository -> Save

Create Session and Workflow as described earlier. Run the workflow and see the
data in the target table.
Make sure to give connection information for all tables.

Sample Sorter Mapping

3.12 RANK TRANSFORMATION

Active and connected transformation

The Rank transformation allows us to select only the top or bottom rank of data. It
allows us to select a group of top or bottom values, not just one value.
During the session, the PowerCenter Server caches input data until it can perform
the rank calculations.

Rank Transformation Properties
- Cache Directory where the cache will be made.
- Top/Bottom Rank as per need.
- Number of Ranks. Ex: 1, 2 or any number.
- Case Sensitive Comparison can be checked if needed.
- Rank Data Cache Size can be set.
- Rank Index Cache Size can be set.

Ports in a Rank Transformation
- Input (I), minimum 1: Port to receive data from another transformation.
- Output (O), minimum 1: Port we want to pass to another transformation.
- Variable (V), not required: Can be used to store values or calculations to use in
an expression.
- Rank (R), exactly 1: Rank port. Rank is calculated according to it. The Rank port
is an input/output port. We must link the Rank port to another transformation.
Example: Total Salary

Rank Index
The Designer automatically creates a RANKINDEX port for each Rank transformation.
The PowerCenter Server uses the Rank Index port to store the ranking position for
each row in a group.
For example, if we create a Rank transformation that ranks the top five salaried
employees, the rank index numbers the employees from 1 to 5.
The RANKINDEX is an output port only.
We can pass the rank index to another transformation in the mapping or
directly to a target.
We cannot delete or edit it.

Defining Groups
Rank transformation allows us to group information. For example: If we want to
select the top 3 salaried employees of each Department, we can define a group for
department.
By defining groups, we create one set of ranked rows for each group.
We define a group in Ports tab. Click the Group By for needed port.
We cannot Group By on port which is also Rank Port.
1> Example: Finding Top 5 Salaried Employees
- EMP will be the source table.
- Create a target table EMP_RANK_EXAMPLE in the Target Designer. Structure
should be the same as the EMP table. Just add one more port Rank_Index to
store the RANK INDEX.
- Create the shortcuts in your folder.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_rank_example
4. Drag EMP from source in the mapping.
5. Create an EXPRESSION transformation to calculate TOTAL_SAL (see the
sketch after these steps).
6. Click Transformation -> Create -> Select RANK from list. Give name and click
Create. Now click Done.
7. Pass ports from Expression to the Rank transformation.
8. Edit the Rank transformation. Go to the Ports tab.
9. Select TOTAL_SAL as the rank port: check R in front of TOTAL_SAL.
10. Click the Properties tab and select properties as needed:
11. Top in Top/Bottom and Number of Ranks as 5.
12. Click Apply -> OK.
13. Drag the target table now.
14. Connect the output ports from Rank to the target table.
15. Click Mapping -> Validate
16. Repository -> Save
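Step 5 assumes an Expression output port that computes TOTAL_SAL; a minimal
sketch of that expression, allowing for COMM being NULL (the port name
out_TOTAL_SAL is illustrative):

    out_TOTAL_SAL = SAL + IIF( ISNULL(COMM), 0, COMM )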

Create Session and Workflow as described earlier. Run the workflow and see the
data in the target table.
Make sure to give connection information for all tables.

2> Example: Finding Top 2 Salaried Employees for every DEPARTMENT
- Open the mapping made above. Edit the Rank transformation.
- Go to the Ports tab. Select Group By for DEPTNO.
- Go to the Properties tab. Set Number of Ranks as 2.
- Click Apply -> OK.
- Mapping -> Validate and Repository -> Save.
Refresh the session by double clicking. Save the changes and run the workflow to
see the new result.

Sample Rank Mapping

RANK CACHE
When the PowerCenter Server runs a session with a Rank transformation, it
compares an input row with rows in the data cache. If the input row out-ranks a
stored row, the PowerCenter Server replaces the stored row with the input row.
Example: PowerCenter caches the first 5 rows if we are finding the top 5 salaried
employees. When the 6th row is read, it compares it with the 5 rows in the cache
and places it in the cache if needed.

1> RANK INDEX CACHE:
The index cache holds group information from the group by ports. If we are
using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
All Group By columns are in the RANK INDEX CACHE. Ex: DEPTNO

2> RANK DATA CACHE:
It holds row data until the PowerCenter Server completes the ranking and is
generally larger than the index cache. To reduce the data cache size, connect
only the necessary input/output ports to subsequent transformations.
All variable ports if there, the Rank port, and all ports going out from the Rank
transformation are stored in the RANK DATA CACHE.
Example: All ports except DEPTNO in our mapping example.

3.13 AGGREGATOR TRANSFORMATION

Connected and Active Transformation


The Aggregator transformation allows us to perform aggregate calculations, such
as averages and sums.
Aggregator transformation allows us to perform calculations on groups.

Components of the Aggregator Transformation

1> Aggregate expression
2> Group by port
3> Sorted Input
4> Aggregate cache

1> Aggregate Expressions
- Entered in an output port.
- Can include non-aggregate expressions and conditional clauses.

The transformation language includes the following aggregate functions:
- AVG, COUNT, MAX, MIN, SUM
- FIRST, LAST
- MEDIAN, PERCENTILE, STDDEV, VARIANCE

Single Level Aggregate Function: MAX(SAL)
Nested Aggregate Function: MAX( COUNT( ITEM ))

Nested Aggregate Functions
An aggregate expression can include one aggregate function nested within another
aggregate function. In an Aggregator transformation, there can be multiple
single-level functions or multiple nested functions, but an Aggregator
transformation cannot have both types of functions together.
MAX( COUNT( ITEM )) is correct.
MIN( MAX( COUNT( ITEM ))) is not correct, as only one level of nesting is allowed.

Conditional Clauses
We can use conditional clauses in the aggregate expression to reduce the number of
rows used in the aggregation. The conditional clause can be any clause that
evaluates to TRUE or FALSE.
SUM( COMMISSION, COMMISSION > QUOTA )

Non-Aggregate Functions
We can also use non-aggregate functions in the aggregate expression.
IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0 )
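A conditional clause can also be written against the EMP ports used in our
examples; a small sketch (the threshold 1000 is chosen purely for illustration):

    SUM( SAL, SAL > 1000 )

This sums only the salaries greater than 1000 within each group.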

2> Group By Ports
- Indicates how to create groups.
- When grouping data, the Aggregator transformation outputs the last row of
each group unless otherwise specified.

The Aggregator transformation allows us to define groups for aggregations, rather
than performing the aggregation across all input data.
For example, we can find the Maximum Salary for every Department.
In the Aggregator transformation, open the Ports tab and select Group By as
needed.

3> Using Sorted Input
- Used to improve session performance.
- To use sorted input, we must pass data to the Aggregator transformation
sorted by the group by ports, in ascending or descending order.
- When we use this option, we tell the Aggregator that the data coming to it is
already sorted.
- We check the Sorted Input option in the Properties tab of the transformation.
- If the option is checked but we are not passing sorted data to the
transformation, then the session fails.

4> Aggregator Caches

The PowerCenter Server stores data in the aggregate cache until it completes
aggregate calculations.
It stores group values in an index cache and row data in the data cache. If
the PowerCenter Server requires more space, it stores overflow values in
cache files.

Note: The PowerCenter Server uses memory to process an Aggregator
transformation with sorted ports. It does not use cache memory. We do not
need to configure cache memory for Aggregator transformations that use
sorted ports.

1> Aggregator Index Cache:
The index cache holds group information from the group by ports. If we are
using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
All Group By columns are in the AGGREGATOR INDEX CACHE. Ex: DEPTNO

2> Aggregator Data Cache:
The DATA CACHE is generally larger than the AGGREGATOR INDEX CACHE.
Columns in the Data Cache:
- Variable ports, if any
- Non group by input/output ports
- Non group by input ports used in a non-aggregate output expression
- Ports containing aggregate functions
Example: All ports except DEPTNO in our mapping example.

1> Example: To calculate MAX, MIN, AVG and SUM of salary of EMP table.
- EMP will be the source table.
- Create a target table EMP_AGG_EXAMPLE in the Target Designer. The table
should contain DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL.
- Create the shortcuts in your folder.
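The target table listed above might be created with DDL along these lines (the
datatypes are assumptions; adjust them to your database):

    CREATE TABLE EMP_AGG_EXAMPLE (
      DEPTNO  NUMBER(2),
      MAX_SAL NUMBER(7,2),
      MIN_SAL NUMBER(7,2),
      AVG_SAL NUMBER(7,2),
      SUM_SAL NUMBER(9,2)
    );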

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give mapping name. Ex: m_agg_example
4. Drag EMP from source in the mapping.
5. Click Transformation -> Create -> Select AGGREGATOR from list. Give name
and click Create. Now click Done.
6. Pass SAL and DEPTNO only from SQ_EMP to the Aggregator transformation.
7. Edit the Aggregator transformation. Go to the Ports tab.
8. Create 4 output ports: OUT_MAX_SAL, OUT_MIN_SAL, OUT_AVG_SAL,
OUT_SUM_SAL
9. Open the Expression Editor one by one for all output ports and give the
calculations. Ex: MAX(SAL), MIN(SAL), AVG(SAL), SUM(SAL)
10. Click Apply -> OK.
11. Drag the target table now.
12. Connect the output ports from the Aggregator to the target table.
13. Click Mapping -> Validate
14. Repository -> Save

Create Session and Workflow as described earlier. Run the workflow and see the
data in the target table.
Make sure to give connection information for all tables.

2> Example: To calculate MAX, MIN, AVG and SUM of salary of EMP table for
every DEPARTMENT
- Open the mapping made above. Edit the Aggregator transformation.
- Go to the Ports tab. Select Group By for DEPTNO.
- Click Apply -> OK.
- Mapping -> Validate and Repository -> Save.
Refresh the session by double clicking. Save the changes and run the workflow to
see the new result.

Scene1: What will be the output of the picture below?
Here we are not doing any calculation or group by.
In this case, the DEPTNO and SAL of the last record of the EMP table will be passed
to the target.
Scene2: What will be the output of the above picture if Group By is done on
DEPTNO?
Here we are not doing any calculation but Group By is there on DEPTNO.
In this case, the last record of every DEPTNO from the EMP table will be passed to
the target.
Scene3: What will be the output of EXAMPLE 1?
In Example 1, we are calculating MAX, MIN, AVG and SUM but we are not doing any
Group By.
In this case the DEPTNO of the last record of the EMP table will be passed. The
calculations however will be correct.
Scene4: What will be the output of EXAMPLE 2?
In Example 2, we are calculating MAX, MIN, AVG and SUM for every DEPT.
In this case the DEPTNO and the correct calculations for every DEPTNO will be
passed to the target.
Scene5: Use SORTED INPUT in Properties Tab and Check output
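For intuition, the result of Example 2 matches what this SQL would return against
the source (a sketch; Informatica computes it in the Aggregator, not via this
query):

    SELECT DEPTNO, MAX(SAL), MIN(SAL), AVG(SAL), SUM(SAL)
    FROM EMP
    GROUP BY DEPTNO;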

3.14 JOINER TRANSFORMATION

Connected and Active Transformation


Used to join source data from two related heterogeneous sources residing in
different locations or file systems. Or, we can join data from the same source.
If we need to join 3 tables, then we need 2 Joiner Transformations.
The Joiner transformation joins two sources with at least one matching port.
The Joiner transformation uses a condition that matches one or more pairs of
ports between the two sources.

Example: To join EMP and DEPT tables.
- EMP and DEPT will be the source tables.
- Create a target table JOINER_EXAMPLE in the Target Designer. The table should
contain all ports of the EMP table plus DNAME and LOC as shown below.
- Create the shortcuts in your folder.

Creating Mapping:
1> Open the folder where we want to create the mapping.
2> Click Tools -> Mapping Designer.
3> Click Mapping -> Create -> Give mapping name. Ex: m_joiner_example
4> Drag EMP, DEPT and the target. Create a Joiner transformation. Link as shown
below.
5> Specify the join condition in the Condition tab. See steps on the next page.
6> Set Master in the Ports tab. See steps on the next page.
7> Mapping -> Validate
8> Repository -> Save

Create Session and Workflow as described earlier. Run the workflow and see the
data in the target table.
Make sure to give connection information for all tables.

JOIN CONDITION:
The join condition contains ports from both input sources that must match for the
PowerCenter Server to join two rows.
Example: DEPTNO=DEPTNO1 in the above.
1. Edit Joiner Transformation -> Condition tab
2. Add the condition
- We can add as many conditions as needed.
- Only the = operator is allowed.

If we join Char and Varchar datatypes, the PowerCenter Server counts any spaces
that pad Char values as part of the string. So if you try to join the following:
Char(40) = "abcd" and Varchar(40) = "abcd"
Then the Char value is "abcd" padded with 36 blank spaces, and the PowerCenter
Server does not join the two fields because the Char field contains trailing spaces.
Note: The Joiner transformation does not match null values.

MASTER and DETAIL TABLES
- In a Joiner, one table is called the MASTER and the other the DETAIL.
- The MASTER table is always cached. We can make any table the MASTER.
- Edit Joiner Transformation -> Ports tab -> Select M for the Master table.
- The table with fewer rows should be made the MASTER to improve performance.
Reason:
When the PowerCenter Server processes a Joiner transformation, it reads
rows from both sources concurrently and builds the index and data cache
based on the master rows. So the table with fewer rows will be read fast and
the cache can be built while the table with more rows is still being read.
The fewer unique rows in the master, the fewer iterations of the join
comparison occur, which speeds the join process.

JOINER TRANSFORMATION PROPERTIES TAB
- Case-Sensitive String Comparison: If selected, the PowerCenter Server uses
case-sensitive string comparisons when performing joins on string columns.
- Cache Directory: Specifies the directory used to cache master or detail rows
and the index to these rows.
- Join Type: Specifies the type of join: Normal, Master Outer, Detail Outer, or
Full Outer.
- Tracing Level
- Joiner Data Cache Size
- Joiner Index Cache Size
- Sorted Input

JOIN TYPES
In SQL, a join is a relational operator that combines data from multiple tables into a
single result set. The Joiner transformation acts in much the same manner, except
that tables can originate from different databases or flat files.
Types of Joins:
- Normal
- Master Outer
- Detail Outer
- Full Outer

Note: A normal or master outer join performs faster than a full outer or detail outer
join.
Example: In EMP, we have employees with DEPTNO 10, 20, 30 and 50. In DEPT, we
have DEPTNO 10, 20, 30 and 40. DEPT will be the MASTER table as it has fewer
rows.
Normal Join:
With a normal join, the PowerCenter Server discards all rows of data from the
master and detail source that do not match, based on the condition.
- All employees of 10, 20 and 30 will be there, as only they are matching.

Master Outer Join:
This join keeps all rows of data from the detail source and the matching rows from
the master source. It discards the unmatched rows from the master source.
- All data of employees of 10, 20 and 30 will be there.
- There will be employees of DEPTNO 50, and the corresponding DNAME and LOC
columns will be NULL.

Detail Outer Join:
This join keeps all rows of data from the master source and the matching rows from
the detail source. It discards the unmatched rows from the detail source.
- All employees of 10, 20 and 30 will be there.
- There will be one record for DEPTNO 40, and the corresponding data of the EMP
columns will be NULL.

Full Outer Join:
A full outer join keeps all rows of data from both the master and detail sources.
- All data of employees of 10, 20 and 30 will be there.
- There will be employees of DEPTNO 50, and the corresponding DNAME and LOC
columns will be NULL.
- There will be one record for DEPTNO 40, and the corresponding data of the EMP
columns will be NULL.

USING SORTED INPUT
- Used to improve session performance.
- To use sorted input, we must pass data to the Joiner transformation sorted by
the ports that are used in the join condition.
- We check the Sorted Input option in the Properties tab of the transformation.
- If the option is checked but we are not passing sorted data to the
transformation, then the session fails.
- We can use a SORTER to sort data, or the Source Qualifier in the case of
relational tables.

JOINER CACHES
Joiner always caches the MASTER table. We cannot disable caching. It builds Index
cache and Data Cache based on MASTER table.
1> Joiner Index Cache:
All Columns of MASTER table used in Join condition are in JOINER INDEX
CACHE.
Example: DEPTNO in our mapping.
2> Joiner Data Cache:
Master column not in join condition and used for output to other
transformation or target table are in Data Cache.
Example: DNAME and LOC in our mapping example.

JOINER TRANSFORMATION TIPS
- Perform joins in a database when possible.
- Join sorted data when possible.
- For a sorted Joiner transformation, designate as the master source the source
with fewer duplicate key values.

A Joiner can't be used in the following conditions:
1. Either input pipeline contains an Update Strategy transformation.
2. We connect a Sequence Generator transformation directly before the
Joiner transformation.

3.15 SOURCE QUALIFIER T/F

Active and Connected Transformation.
- The Source Qualifier transformation represents the rows that the PowerCenter
Server reads when it runs a session.
- It is the only transformation that is not reusable.
- It is the default transformation, except in the case of XML or COBOL files.

Tasks performed by Source Qualifier:

Join data originating from the same source database: We can join two
or more tables with primary key-foreign key relationships by linking the
sources to one Source Qualifier transformation.
Filter rows when the PowerCenter Server reads source data: If we
include a filter condition, the PowerCenter Server adds a WHERE clause to the
default query.
Specify an outer join rather than the default inner join: If we include a
user-defined join, the PowerCenter Server replaces the join information
specified by the metadata in the SQL query.
Specify sorted ports: If we specify a number for sorted ports, the
PowerCenter Server adds an ORDER BY clause to the default SQL query.
Select only distinct values from the source: If we choose Select Distinct,
the PowerCenter Server adds a SELECT DISTINCT statement to the default
SQL query.
Create a custom query to issue a special SELECT statement for the
PowerCenter Server to read source data: For example, you might use a
custom query to perform aggregate calculations.

All of the above are configured in the Properties tab of the Source Qualifier t/f.
SAMPLE MAPPING TO BE MADE:
- Source will be the EMP and DEPT tables.
- Create the target table as shown in the picture above.
- Create shortcuts in your folder as needed.

Creating Mapping:
1> Open the folder where we want to create the mapping.
2> Click Tools -> Mapping Designer.
3> Click Mapping -> Create -> Give mapping name. Ex: m_SQ_example
4> Drag EMP, DEPT and the target.
5> Right click SQ_EMP and select Delete from the mapping.
6> Right click SQ_DEPT and select Delete from the mapping.
7> Click Transformation -> Create -> Select Source Qualifier from list -> Give
name -> Click Create.
8> Select EMP and DEPT both. Click OK.
9> Link all as shown in the above picture.
10> Edit SQ -> Properties tab -> Open User Defined Join -> Give the join condition
EMP.DEPTNO=DEPT.DEPTNO. Click Apply -> OK.
(More details after 2 pages)
11> Mapping -> Validate
12> Repository -> Save

Create Session and Workflow as described earlier. Run the workflow and see the
data in the target table.
Make sure to give connection information for all tables.

SQ PROPERTIES TAB

1> SOURCE FILTER:
We can enter a source filter to reduce the number of rows the PowerCenter
Server queries.
Note: When we enter a source filter in the session properties, we override the
customized SQL query in the Source Qualifier transformation.
Steps:
1> In the Mapping Designer, open a Source Qualifier transformation.
2> Select the Properties tab.
3> Click the Open button in the Source Filter field.
4> In the SQL Editor dialog box, enter the filter. Example: EMP.SAL>2000
5> Click OK.
Validate the mapping. Save it. Now refresh the session and save the changes. Now
run the workflow and see the output.

2> NUMBER OF SORTED PORTS:
- When we use sorted ports, the PowerCenter Server adds the ports to the
ORDER BY clause in the default query.
- By default it is 0. If we change it to 1, then the data will be sorted by the
column that is at the top in the SQ. Example: DEPTNO in the above figure.
- If we want to sort by ENAME, move ENAME to the top.
- If we change it to 2, then the data will be sorted by the top two columns.
Steps:
1> In the Mapping Designer, open a Source Qualifier transformation.
2> Select the Properties tab.
3> Enter any number instead of zero for Number of Sorted Ports.
4> Click Apply -> Click OK.
Validate the mapping. Save it. Now refresh the session and save the changes. Now
run the workflow and see the output.
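With Number of Sorted Ports set to 1 and DEPTNO on top, the generated default
query takes roughly this shape (a sketch; the actual column list depends on which
ports are connected):

    SELECT EMP.DEPTNO, EMP.EMPNO, EMP.ENAME, EMP.SAL
    FROM EMP
    ORDER BY EMP.DEPTNO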

3> SELECT DISTINCT:


If we want the PowerCenter Server to select unique values from a source, we
can use the Select Distinct option.
Just check the option in Properties tab to enable it.

4> PRE and POST SQL Commands
- The PowerCenter Server runs pre-session SQL commands against the source
database before it reads the source.
- It runs post-session SQL commands against the source database after it
writes to the target.
- Use a semicolon (;) to separate multiple statements.
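As an illustration, a pre-session command could clear staging tables before the
read; the two statements below are separated by a semicolon (STG_EMP and
STG_DEPT are hypothetical tables, not part of this example):

    DELETE FROM STG_EMP; DELETE FROM STG_DEPT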

5> USER DEFINED JOINS
- Entering a user-defined join is similar to entering a custom SQL query.
However, we only enter the contents of the WHERE clause, not the entire
query.
- We can specify an equi join, left outer join and right outer join only. We
cannot specify a full outer join. To use a full outer join, we need to write an
SQL Query.

Steps:
1> Open the Source Qualifier transformation, and click the Properties tab.
2> Click the Open button in the User Defined Join field. The SQL Editor dialog
box appears.
3> Enter the syntax for the join.
4> Click OK -> Again OK.
Validate the mapping. Save it. Now refresh the session and save the changes. Now
run the workflow and see the output.
Join Type: Syntax
- Equi Join: DEPT.DEPTNO=EMP.DEPTNO
- Left Outer Join: {EMP LEFT OUTER JOIN DEPT ON DEPT.DEPTNO=EMP.DEPTNO}
- Right Outer Join: {EMP RIGHT OUTER JOIN DEPT ON DEPT.DEPTNO=EMP.DEPTNO}

Curly braces are needed in the syntax.
Try all the above and also FULL OUTER. The session fails for FULL OUTER:
{EMP FULL OUTER JOIN DEPT ON DEPT.DEPTNO=EMP.DEPTNO}

6> SQL QUERY


For relational sources, the PowerCenter Server generates a query for each Source
Qualifier transformation when it runs a session. The default query is a SELECT
statement for each source column used in the mapping. In other words, the
PowerCenter Server reads only the columns that are connected to another
transformation.

In mapping above, we are passing only SAL and DEPTNO from SQ_EMP to
Aggregator transformation. Default query generated will be:

SELECT EMP.SAL, EMP.DEPTNO FROM EMP

Viewing the Default Query
1. Open the Source Qualifier transformation, and click the Properties tab.
2. Open SQL Query. The SQL Editor displays.
3. Click Generate SQL.
4. The SQL Editor displays the default query the PowerCenter Server uses to
select source data.
5. Click Cancel to exit.

Note: If we do not cancel the SQL query, the PowerCenter Server overrides
the default query with the custom SQL query.
We can enter an SQL statement supported by our source database. Before entering
the query, connect all the input and output ports we want to use in the mapping.
Example: As in our case we can't use a full outer join in a user-defined join, we can
write an SQL query for the FULL OUTER JOIN:

SELECT DEPT.DEPTNO, DEPT.DNAME, DEPT.LOC, EMP.EMPNO, EMP.ENAME,
EMP.JOB, EMP.SAL, EMP.COMM, EMP.DEPTNO
FROM EMP FULL OUTER JOIN DEPT ON DEPT.DEPTNO=EMP.DEPTNO
WHERE SAL>2000

We also added a WHERE clause. We can enter more conditions and write
more complex SQL.

We can write any query. We can join as many tables in one query as required if all
are in the same database. It is very handy and used in most projects.

Important Points:
- When creating a custom SQL query, the SELECT statement must list the
port names in the order in which they appear in the transformation.
Example: DEPTNO is the top column and DNAME is second in our SQ mapping.
So when we write the SQL query, the SELECT statement must have DEPTNO
first, DNAME second and so on: SELECT DEPT.DEPTNO, DEPT.DNAME, ...
- Once we have written a custom query like the one above, this query will
always be used to fetch data from the database. In our example, we used
WHERE SAL>2000. Now if we use a Source Filter and give the condition
SAL>1000 or any other, it will not work. Informatica will always use the
custom query only.
- Make sure to test the query in the database first before using it in SQL
Query. If the query is not running in the database, then it won't work in
Informatica either.
- Also always connect to the database and validate the SQL in the SQL Query
editor.

3.16 LOOKUP TRANSFORMATION

Passive Transformation
Can be Connected or Unconnected. Dynamic lookup is connected.
Use a Lookup transformation in a mapping to look up data in a flat file or a
relational table, view, or synonym.
We can import a lookup definition from any flat file or relational database to
which both the PowerCenter Client and Server can connect.
We can use multiple Lookup transformations in a mapping.

The PowerCenter Server queries the lookup source based on the lookup ports in the
transformation. It compares Lookup transformation port values to lookup source
column values based on the lookup condition. Pass the result of the lookup to other
transformations and a target.
We can use the Lookup transformation to perform the following:
- Get a related value: EMP has DEPTNO but DNAME is not there. We use a
Lookup to get DNAME from the DEPT table based on the lookup condition.
- Perform a calculation: We want only those employees whose SAL >
AVG(SAL). We will write a Lookup Override query.
- Update slowly changing dimension tables: The most important use. We can
use a Lookup transformation to determine whether rows already exist in the
target.

3.16.1 LOOKUP TYPES


We can configure the Lookup transformation to perform the following types of
lookups:

Connected or Unconnected
Relational or Flat File
Cached or Uncached

Relational Lookup:
When we create a Lookup transformation using a relational table as a lookup source,
we can connect to the lookup source using ODBC and import the table definition as
the structure for the Lookup transformation.
We can override the default SQL statement if we want to add a WHERE clause
or query multiple tables.
We can use a dynamic lookup cache with relational lookups.
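The override replaces the default lookup query. A hypothetical override for the
SAL > AVG(SAL) case mentioned earlier (the column aliases must match the
lookup port names):

    SELECT EMP.EMPNO AS EMPNO, EMP.SAL AS SAL
    FROM EMP
    WHERE EMP.SAL > (SELECT AVG(SAL) FROM EMP)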

Flat File Lookup:
When we use a flat file for a lookup source, we can use any flat file definition in the
repository, or we can import it. When we import a flat file lookup source, the
Designer invokes the Flat File Wizard.

Cached or Uncached Lookup:
We can check the option in the Properties tab to cache the lookup or not. By
default, the lookup is cached.

Connected and Unconnected Lookup

1. Connected: Receives input values directly from the pipeline.
   Unconnected: Receives input values from the result of a :LKP expression in
   another transformation.
2. Connected: We can use a dynamic or static cache.
   Unconnected: We can use a static cache.
3. Connected: Cache includes all lookup columns used in the mapping.
   Unconnected: Cache includes all lookup/output ports in the lookup condition
   and the lookup/return port.
4. Connected: If there is no match for the lookup condition, the PowerCenter
   Server returns the default value for all output ports.
   Unconnected: If there is no match for the lookup condition, the PowerCenter
   Server returns NULL.
5. Connected: If there is a match for the lookup condition, the PowerCenter
   Server returns the result of the lookup condition for all lookup/output ports.
   Unconnected: If there is a match for the lookup condition, the PowerCenter
   Server returns the result of the lookup condition into the return port.
6. Connected: Passes multiple output values to another transformation.
   Unconnected: Passes one output value to another transformation.
7. Connected: Supports user-defined default values.
   Unconnected: Does not support user-defined default values.

3.16.2 LOOKUP T/F COMPONENTS

Define the following components when we configure a Lookup transformation in a
mapping:
- Lookup source
- Ports
- Properties
- Condition

1. Lookup Source:
We can use a flat file or a relational table for a lookup source. When we create a
Lookup t/f, we can import the lookup source from the following locations:
- Any relational source or target definition in the repository
- Any flat file source or target definition in the repository
- Any table or file that both the PowerCenter Server and Client machine can
connect to
The lookup table can be a single table, or we can join multiple tables in the same
database using a lookup SQL override in the Properties tab.

2. Ports:
- Input port (I). Lookup type: Connected, Unconnected. Number needed:
minimum 1. Input port to the Lookup. Usually ports used for the join condition
are input ports.
- Output port (O). Lookup type: Connected, Unconnected. Number needed:
minimum 1. Ports going to another transformation from the Lookup.
- Lookup port (L). Lookup type: Connected, Unconnected. Number needed:
minimum 1. The Designer automatically designates each column in the lookup
source as a lookup (L) and output (O) port.
- Return port (R). Lookup type: Unconnected. Number needed: 1 only. Used
only in an unconnected Lookup t/f.

3. Properties Tab
- Lookup SQL Override (Relational): Overrides the default SQL statement to
query the lookup table.
- Lookup Table Name (Relational): Specifies the name of the table from which
the transformation looks up and caches values.
- Lookup Caching Enabled (Flat File, Relational): Indicates whether the
PowerCenter Server caches lookup values during the session.
- Lookup Policy on Multiple Match (Flat File, Relational): Determines what
happens when the Lookup transformation finds multiple rows that match the
lookup condition. Options: Use First Value, Use Last Value, Use Any Value or
Report Error.
- Lookup Condition (Flat File, Relational): Displays the lookup condition you set
in the Condition tab.
- Connection Information (Relational): Specifies the database containing the
lookup table.
- Source Type (Flat File, Relational): Whether the lookup is from a database or a
flat file.
- Lookup Cache Directory Name (Flat File, Relational): Location where the cache
is built.
- Lookup Cache Persistent (Flat File, Relational): Whether to use a persistent
cache or not.
- Dynamic Lookup Cache (Flat File, Relational): Whether to use a dynamic cache
or not.
- Recache From Lookup Source (Flat File, Relational): To rebuild the cache if the
cache source changes and we are using a persistent cache.
- Insert Else Update (Relational): Use only with dynamic caching enabled.
Applies to rows entering the Lookup transformation with the row type of insert.
- Update Else Insert (Relational): Use only with dynamic caching enabled.
Applies to rows entering the Lookup transformation with the row type of update.
- Lookup Data Cache Size (Flat File, Relational): Data cache size.
- Lookup Index Cache Size (Flat File, Relational): Index cache size.
- Cache File Name Prefix (Flat File, Relational): Use only with a persistent lookup
cache. Specifies the file name prefix to use with persistent lookup cache files.

Some other properties for Flat Files are:

Datetime Format
Thousand Separator
Decimal Separator
Case-Sensitive String Comparison
Null Ordering
Sorted Input

4: Condition Tab
We enter the lookup condition here. The PowerCenter Server uses the lookup
condition to test incoming values. We compare transformation input values with
values in the lookup source or cache, represented by lookup ports.
- The datatypes in a condition must match.
- When we enter multiple conditions, the PowerCenter Server evaluates each
condition as an AND, not an OR.
- The PowerCenter Server matches null values.
- The input value must meet all conditions for the lookup to return a value.
- The =, >, <, >=, <=, != operators can be used.
Example: IN_DEPTNO = DEPTNO
         IN_DNAME = 'DELHI'

Tip: If we include more than one lookup condition, place the conditions with an
equal sign first to optimize lookup performance.
Note:
1. We can use only the = operator in the case of a Dynamic Cache.
2. The PowerCenter Server fails the session when it encounters multiple keys for
a Lookup transformation configured to use a dynamic cache.

3.16.3 Connected Lookup Transformation

Example: To create a connected Lookup transformation
- EMP will be the source table. DEPT will be the LOOKUP table.
- Create a target table CONN_Lookup_EXAMPLE in the Target Designer. The table
should contain all ports of the EMP table plus DNAME and LOC as shown below.
- Create the shortcuts in your folder.

Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_CONN_LOOKUP_EXAMPLE
4. Drag EMP and the target table.
5. Connect all fields from SQ_EMP to the target except DNAME and LOC.
6. Transformation -> Create -> Select LOOKUP from list. Give name and click
Create.
7. The following screen is displayed.
8. As DEPT is the source definition, click Source and then select DEPT.
9. Click OK.
10. Now pass DEPTNO from SQ_EMP to this Lookup. DEPTNO from SQ_EMP will
be named DEPTNO1. Edit the Lookup and rename it to IN_DEPTNO in the
Ports tab.
11. Now go to the Condition tab and add the condition:
DEPTNO = IN_DEPTNO, and click Apply and then OK.
Link the mapping as shown below:
12. As we are not passing IN_DEPTNO and DEPTNO to any other transformation
from the Lookup, we can edit the Lookup transformation and remove the
OUTPUT check from them.
13. Mapping -> Validate
14. Repository -> Save

Create Session and Workflow as described earlier. Run the workflow and see the
data in the target table.
Make sure to give connection information for all tables.
Make sure to give the connection for the LOOKUP table also.

We use Connected Lookup when we need to return more than one column from
Lookup table.
There is no use of Return Port in Connected Lookup.

SEE PROPERTY TAB FOR ADVANCED SETTINGS

3.16.4 Unconnected Lookup Transformation

An unconnected Lookup transformation is separate from the pipeline in the
mapping. We write an expression using the :LKP reference qualifier to call the
lookup within another transformation.
Steps to configure an Unconnected Lookup:
1> Add input ports.
2> Add the lookup condition.
3> Designate a return value.
4> Call the lookup from another transformation.
Example: To create an unconnected Lookup transformation
- EMP will be the source table. DEPT will be the LOOKUP table.
- Create a target table UNCONN_Lookup_EXAMPLE in the Target Designer. The
table should contain all ports of the EMP table plus DNAME as shown below.
- Create the shortcuts in your folder.

Creating Mapping: See the sample mapping picture below.
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_UNCONN_LOOKUP_EXAMPLE
4. Drag EMP and the target table.
5. Now Transformation -> Create -> Select EXPRESSION from list. Give name
and click Create. Then click Done.
6. Pass all ports from SQ_EMP to the Expression transformation.
7. Connect all fields from Expression to the target except DNAME.
8. Transformation -> Create -> Select LOOKUP from list. Give name and click
Create.
9. Follow the steps as in the connected example above to create the Lookup on
the DEPT table.
10. Click OK.
11. Now edit the Lookup transformation. Go to the Ports tab.
12. As DEPTNO is common in source and Lookup, create a port IN_DEPTNO in the
Ports tab. Make it an input port only and give it the same datatype as DEPTNO.
13. Designate DNAME as the Return Port. Check R to make it so.
14. Now add a condition in the Condition tab:
DEPTNO = IN_DEPTNO, and click Apply and then OK.
15. Now we need to call this Lookup from the Expression transformation.
16. Edit the Expression t/f and create a new output port out_DNAME with the
same datatype as DNAME. Open the Expression Editor and call the Lookup as
given below (see the sketch after these steps): we double click the lookup
name at the bottom of the Functions tab, and as we need only DEPTNO, we
pass only DEPTNO as input.
17. Validate the call in the Expression Editor and click OK.
18. Mapping -> Validate
19. Repository -> Save
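The call in step 16 uses the :LKP reference qualifier. A minimal sketch of the
out_DNAME expression, assuming the Lookup transformation was named lkp_DEPT
(the name is illustrative):

    :LKP.lkp_DEPT(DEPTNO)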

Create Session and Workflow as described earlier. Run the workflow and see the
data in the target table.
Make sure to give connection information for all tables.
Make sure to give the connection for the LOOKUP table also.

3.16.5 Lookup Caches


We can configure a Lookup transformation to cache the lookup table. The Integration
Service (IS) builds a cache in memory when it processes the first row of data in a
cached Lookup transformation.
The Integration Service also creates cache files by default in the $PMCacheDir. If the
data does not fit in the memory cache, the IS stores the overflow values in the cache
files. When session completes, IS releases cache memory and deletes the cache files.

If we use a flat file lookup, the IS always caches the lookup source.
We set the Cache type in Lookup Properties.

Lookup Cache Files


1. Lookup Index Cache:
Stores data for the columns used in the lookup condition.
2. Lookup Data Cache:
For a connected Lookup transformation, stores data for the connected output
ports, not including ports used in the lookup condition.
For an unconnected Lookup transformation, stores data from the return port.

Types of Lookup Caches:


1. Static Cache
By default, the IS creates a static cache. It caches the lookup file or table and
looks up values in the cache for each row that comes into the transformation.
The IS does not update the cache while it processes the Lookup transformation.

2. Dynamic Cache
To cache a target table or flat file source and insert new rows or update existing
rows in the cache, use a Lookup transformation with a dynamic cache.
The IS dynamically inserts or updates data in the lookup cache and passes data
to the target.
The target table is also our lookup table. This is not good for performance if the
table is huge.

3. Persistent Cache
If the lookup table does not change between sessions, we can configure the
Lookup transformation to use a persistent lookup cache.
The IS saves and reuses cache files from session to session, eliminating the time
required to read the lookup table.

4. Recache from Source
If the persistent cache is not synchronized with the lookup table, we can
configure the Lookup transformation to rebuild the lookup cache.
If the lookup table has changed, we can use this to rebuild the lookup cache.

5. Shared Cache
- Unnamed cache: When Lookup transformations in a mapping have
compatible caching structures, the IS shares the cache by default. You can
only share static unnamed caches.
- Named cache: Use a persistent named cache when we want to share a
cache file across mappings or share a dynamic and a static cache. The
caching structures must match or be compatible with a named cache. You
can share static and dynamic named caches.

Building Connected Lookup Caches
- We can configure the session to build caches sequentially or concurrently.
- When we build sequential caches, the IS creates caches as the source rows
enter the Lookup transformation.
- When we configure the session to build concurrent caches, the IS does not
wait for the first row to enter the Lookup transformation before it creates
caches. Instead, it builds multiple caches concurrently.

1. Building Lookup Caches Sequentially:

2. Building Lookup Caches Concurrently:
To configure the session to create concurrent caches:
Edit Session -> In the Config Object tab -> Additional Concurrent Pipelines for
Lookup Cache Creation -> Give a value here (Auto by default).

Note: The IS builds caches for unconnected Lookups sequentially only.

3.17 UPDATE STRATEGY TRANSFORMATION

Active and Connected Transformation

Till now, we have only inserted rows in our target tables. What if we want to
update, delete or reject rows coming from source based on some condition?
Example: If Address of a CUSTOMER changes, we can update the old address or
keep both old and new address. One row is for old and one for new. This way we
maintain the historical data.
Update Strategy is used with Lookup Transformation. In DWH, we create a Lookup
on target table to determine whether a row already exists or not. Then we insert,
update, delete or reject the source record as per business need.
In PowerCenter, we set the update strategy at two different levels:
1. Within a session
2. Within a Mapping

1. Update Strategy within a session:

When we configure a session, we can instruct the IS to either treat all rows in the
same way or use instructions coded into the session mapping to flag rows for
different database operations.
Session Configuration:
Edit Session -> Properties -> Treat Source Rows as: (Insert, Update, Delete,
and Data Driven). Insert is the default.

Specifying Operations for Individual Target Tables:
You can set the following update strategy options:
- Insert: Select this option to insert a row into a target table.
- Delete: Select this option to delete a row from a table.
- Update: We have the following options in this situation:
  o Update as Update: Update each row flagged for update if it exists in
    the target table.
  o Update as Insert: Insert each row flagged for update.
  o Update else Insert: Update the row if it exists. Otherwise, insert it.
- Truncate table: Select this option to truncate the target table before loading
data.

2. Flagging Rows within a Mapping

Within a mapping, we use the Update Strategy transformation to flag rows for
insert, delete, update, or reject.

Operation: Constant: Numeric Value
- INSERT: DD_INSERT: 0
- UPDATE: DD_UPDATE: 1
- DELETE: DD_DELETE: 2
- REJECT: DD_REJECT: 3

Update Strategy Expressions:

Frequently, the update strategy expression uses the IIF or DECODE function from
the transformation language to test each row to see if it meets a particular
condition.
IIF( ( ENTRY_DATE > APPLY_DATE), DD_REJECT, DD_UPDATE )
or, using the numeric values:
IIF( ( ENTRY_DATE > APPLY_DATE), 3, 1 )

The above expression is written in the Properties tab of the Update Strategy t/f.
DD means DATA DRIVEN.

Forwarding Rejected Rows:

We can configure the Update Strategy transformation to either pass rejected rows
to the next transformation or drop them.
See the Properties tab for the option.

Steps:
1. Create Update Strategy Transformation
2. Pass all ports needed to it.
3. Set the Expression in Properties Tab.
4. Connect to other transformations or target.
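A common pattern for step 3, sketched under the assumption that an upstream
Lookup on the target returns a NULL key when the row does not yet exist
(lkp_EMPNO is a hypothetical port name):

    IIF( ISNULL(lkp_EMPNO), DD_INSERT, DD_UPDATE )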

3.18 DYNAMIC CACHE WORKING


We can use a dynamic cache with a relational lookup or a flat file lookup. For
relational lookups, we might configure the transformation to use a dynamic cache
when the target table is also the lookup table. For flat file lookups, the dynamic
cache represents the data to update in the target table.

Dynamic cache is used to INSERT new rows in target or UPDATE changed rows only.
It is not good to use when data is huge.
The dynamic cache is UPDATED as the Integration Service reads rows from the source.
When the Integration Service reads a row from the source, it updates the lookup
cache by performing one of the following actions:
Inserts the row into the cache if the row is not in the cache.
Updates the row in the cache if the row has changed.
Makes no change to the cache if there is no change in the row.

To use Dynamic Cache, first Edit Lookup Transformation -> Properties Tab ->
Select Dynamic Cache Option
Also Select Insert Else Update or Update Else Insert Option

We can update a target table only when it has a Primary Key.
If there is no Primary Key, then we need to use the Update Override option in the
Properties Tab of the target table.

Dynamic Cache Properties


NewLookupRow:
The Designer adds this port to a Lookup transformation when we select Dynamic
Cache in the Properties Tab. We cannot edit or delete this port. The Integration
Service (IS) assigns a value to the port, depending on the action it performs on the
lookup cache.

NewLookupRow Value   Description
0                    IS does not update or insert the row in the cache.
1                    IS inserts the row into the cache.
2                    IS updates the row in the cache.

Associated Port:
Associate lookup ports with either an input/output port or a sequence ID. Each
lookup port is associated with a source port so that the IS can compare them for
changes. Alternatively, we can generate a sequence 1, 2, 3 and so on with it.
Sequence ID is available only when the datatype is Integer or Small Integer.

Ignore Null Inputs for Updates


We can set this property for every column. We just need to CHECK the port for
which we want to use this property.
Suppose, in target the COMM of an Employee is 500 but in Source the new COMM is
NULL, and we do not want the NULL to be updated in target. We use the above
property for it.

Ignore In Comparison:
When we do not want to compare any column in source with target, then we can use
this option. Ex: Hiredate will be always same so no need to compare.

In the picture above:
The topmost port is NewLookupRow. It is hidden.
All lookup table ports have PREV_ prefixed before them.
ENAME has been associated with PREV_ENAME, and so are the others.
The PREV_COMM port has been checked for Ignore Null Inputs for Updates.
PREV_HIREDATE has been checked for Ignore in Comparison.

Example: Working with Dynamic Cache using Update Strategy.

EMP will be source table.


Create a target table DYNAMIC_LOOKUP. Structure same as EMP. Make
EMPNO as Primary Key.
Create Shortcuts as necessary.

Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_DYNAMIC_LOOKUP_EXAMPLE
4. Drag EMP and target table.
5. Transformation -> Create -> Select LOOKUP from list. Give name and click Create.
6. Create a Lookup on DYNAMIC_LOOKUP table as created in connected one.
7. Drag all ports from SQ_EMP to Lookup Transformation.
8. Edit lookup transformation. Edit all ports and add PREV_ before them.
9. Also remove the 1 that gets appended to the names of the ports coming from SQ_EMP.
10. Now Go to Properties tab -> Select Dynamic Cache and Insert Else Update
Option.
11. Now Associate ports and Set Ignore Null Inputs or Ignore In Comparison
Option as shown in Picture above.
12. Transformation -> Create -> Select Filter -> Give name and Click Done.
13. Pass all ports as shown in mapping below to filter and give condition:
NewLookupRow != 0
14. Transformation -> Create -> Select Update Strategy -> Give name and Click
Done.
15. Pass all ports from Filter to Update Strategy and give Update Strategy
Expression: IIF(NewLookupRow = 1, DD_INSERT, DD_UPDATE)
16. Link all the needed ports from Update Strategy to target.
17. Mapping -> Validate and Repository Save.

Configuring Sessions with a Dynamic Lookup Cache


Edit Session -> Properties -> Treat Source Rows as: Data Driven as we used
Update Strategy.
We must also define the following update strategy target table options:
Select Insert
Select Update as Update
Do not select Delete

Create Session and Workflow as usual. First time all rows will be inserted.
Now Change the data of target table in Oracle and Run workflow again.
You can see how the data is updated as per the properties selected.

SESSION WILL FAIL FOR THIS. SEE 3.19 LOOKUP QUERY

We pass the data from the Lookup Cache, and not the source, to the Filter. This is
because the cache is updated regularly and so contains the most up-to-date data.

Example of cache:

Source:
EMPNO   Name          SAL    DEPTNO
9000    Amit Kumar    9000   10
9001    Rahul Singh   9500   20
9002    Sanjay        8000   30
9003    Sumit Singh   7000   20

Data already in target:
EMPNO   Name          SAL    DEPTNO
9000    Amit Kumar    8000   10
9001    Rahul Singh   9500   20

Initial Cache:
NewLookupRow   EMPNO   Name          SAL    DEPTNO
               9000    Amit Kumar    8000   10
               9001    Rahul Singh   9500   20

Updated Cache:
NewLookupRow   EMPNO   Name          SAL    DEPTNO
2              9000    Amit Kumar    9000   10
0              9001    Rahul Singh   9500   20
1              9002    Sanjay        8000   30
1              9003    Sumit Singh   7000   20

3.19 LOOKUP QUERY


The workflow for DYNAMIC CACHE will fail. The reason for this is LOOKUP Query.
We can see the default Lookup Query in Properties tab in Lookup Override.
Steps:
1. Edit-> Lookup Transformation-> Properties Tab
2. Lookup SQL Override -> Generate SQL
The default SQL for above example is:
SELECT
Dynamic_Lookup.PREV_ENAME as PREV_ENAME,
Dynamic_Lookup.PREV_JOB as PREV_JOB,
Dynamic_Lookup.PREV_MGR as PREV_MGR,
Dynamic_Lookup.PREV_HIREDATE as PREV_HIREDATE,
Dynamic_Lookup.PREV_SAL as PREV_SAL,
Dynamic_Lookup.PREV_COMM as PREV_COMM,
Dynamic_Lookup.PREV_DEPTNO as PREV_DEPTNO,
Dynamic_Lookup.PREV_EMPNO as PREV_EMPNO
FROM Dynamic_Lookup
This query has been generated because the column names have been prefixed with PREV_ in
the Ports tab. The Dynamic_Lookup table has no column named PREV_ENAME and so on,
so we need to write the correct query here.
SELECT
Dynamic_Lookup.ENAME as PREV_ENAME,
Dynamic_Lookup.JOB as PREV_JOB,
Dynamic_Lookup.MGR as PREV_MGR,
Dynamic_Lookup.HIREDATE as PREV_HIREDATE,
Dynamic_Lookup.SAL as PREV_SAL,
Dynamic_Lookup.COMM as PREV_COMM,
Dynamic_Lookup.DEPTNO as PREV_DEPTNO,
Dynamic_Lookup.EMPNO as PREV_EMPNO
FROM Dynamic_Lookup
Note the convention here: ENAME is the column name in the database table, but in
the Lookup the port name is PREV_ENAME. So we need to write the query in such a
way that the column name in the table is matched to the port name in the Lookup.

Also, if we do not write AS in the above query, the lookup will not work and error
TE_7001 is displayed. It is mandatory to write AS after each column in the lookup query.

So we always need to write the query as:

SELECT
COLUMN1 as COL_1, COLUMN2 as COL_2, COLUMN3 as COL_3 FROM ABC

Here COL_1, COL_2 and COL_3 are the LOOKUP PORT NAMES.

DEFAULT LOOKUP QUERY


The default lookup query contains the following statements:
SELECT:
The SELECT statement includes all the lookup ports in the mapping. You can view the
SELECT statement by generating SQL using the Lookup SQL Override property.
ORDER BY:
The ORDER BY clause orders the columns in the same order they appear in the
Lookup transformation. The Integration Service generates the ORDER BY clause.
We cannot view this clause when we generate the default SQL using the
Lookup SQL Override property.
We can see this after we run workflow and then view Session Log.
To increase performance, we can suppress the default ORDER BY clause and
enter an override ORDER BY with fewer columns.
Place two dashes '--' after the ORDER BY override to suppress the
generated ORDER BY clause.
Example: SELECT A as A, B as B, C as C FROM ABC ORDER BY A --

Writing a Lookup Query that involves more than 1 table:


First Create a Lookup and all the ports needed manually or create a lookup on table
having maximum number of columns in query.
Make sure the ports have been named correctly. Say there are 4 columns A, B, C & D.
Query:
SELECT A as A, B as B, C as C, D as D
FROM (select a, b, c, d from ABC, xyz, dsf, jhk where ...)
In the bracket, write the query involving any number of tables. Make sure that the
query works in the database before using it here, and also check the sequence in
which the columns are returned.
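
For instance, a sketch of such an override for a lookup built over EMP joined to
DEPT (the port names PREV_EMPNO, PREV_ENAME and PREV_DNAME are illustrative;
match them to the actual lookup ports):

SELECT EMPNO as PREV_EMPNO, ENAME as PREV_ENAME, DNAME as PREV_DNAME
FROM (select e.empno, e.ename, d.dname
      from EMP e, DEPT d
      where e.deptno = d.deptno)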

3.20 Lookup and Update Strategy Examples


Example1: To insert if record is not present in target
and Update if record has changed.

EMP will be source table.


Create a target table INS_UPD_NO_PK_EXAMPLE. Structure same as EMP.
Create Shortcuts as necessary.

Creating Mapping: (Picture next page)
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_UPD_INS_NO_PK_EXAMPLE
4. Drag EMP and target table.
5. Transformation -> Create Expression and pass all ports from SQ_EMP to it.
6. Transformation -> Create -> Select LOOKUP from list. Give name and click Create.
7. Create an unconnected lookup on INS_UPD_NO_PK_EXAMPLE. Keep only
EMPNO in the lookup and set it as the return port. Add an IN_EMPNO port and
add the condition in the Condition Tab.
8. Create an output port OUT_NEW_FLAG and call the Lookup.
IIF(ISNULL(:LKP.LKPTRANS(EMPNO)),0,1)
9. 0 means new record and 1 means old record that are to be updated.
10. Now create a router transformation and pass all ports from EXP to Router.
11. Create two groups in Router. NEW_RECORD and CHANGED_RECORD.
Condition for NEW_RECORD: OUT_NEW_FLAG = 0 and Condition for
CHANGED_RECORD: OUT_NEW_FLAG = 1
12. Transformation -> Create -> Select Update Strategy -> Give name
INS_NEW_RECORD and Click Done.
13. Transformation -> Create -> Select Update Strategy -> Give name
UPD_UPDATE_RECORD and Click Done.
14. Pass records from NEW_RECORD group to INS_NEW_RECORD and in Update
Strategy Expression just write DD_INSERT
15. Pass records from CHANGED_RECORD group to UPD_UPDATE_RECORD and in
Update Strategy Expression just write DD_UPDATE
16. Link all the needed ports from INS_NEW_RECORD to target.
17. Now Select Target table and Click Edit -> Copy.
18. Now Again Click Edit -> Paste. This will create a copy of target definition.
19. Link all the needed ports from UPD_UPDATE_RECORD to target copy.
20. Mapping -> Validate and Repository Save.

Make Session. Do not select the Truncate and Delete options for targets. Also use
NORMAL mode. Make Workflow.
Run and see the data. Now change the data in the target: update some fields and
delete 3-4 rows.
Run the workflow again and see the target data.
There is no update, because the target has no Primary Key.

As there is no Primary Key, we need to write an Update Override.


Steps:
1. Edit target table INS_UPD_NO_PK_EXAMPLE1. This is copy of table definition.
2. Properties Tab -> Update Override -> Generate SQL.
Default SQL is: UPDATE INS_UPD_NO_PK_EXAMPLE SET EMPNO = :TU.EMPNO,
ENAME = :TU.ENAME, JOB = :TU.JOB, MGR = :TU.MGR, HIREDATE =
:TU.HIREDATE, SAL = :TU.SAL, COMM = :TU.COMM, DEPTNO = :TU.DEPTNO
3. There is no Where clause in this SQL. We need to modify it.
UPDATE INS_UPD_NO_PK_EXAMPLE SET ENAME = :TU.ENAME, JOB = :TU.JOB, MGR
= :TU.MGR, HIREDATE = :TU.HIREDATE, SAL = :TU.SAL, COMM = :TU.COMM, DEPTNO
= :TU.DEPTNO WHERE EMPNO = :TU.EMPNO
4. Paste this modified SQL there and click OK.
5. Click Apply -> OK.
6. Mapping -> Validate
7. Repository Save.

Refresh Session by double clicking on it.


Save all the changes and run workflow again.
Now see data has been updated.

Example2: To Insert if record is not present in target


and Delete it if SAL in Source < Sal in Target.

EMP will be source table.


Create a target table INS_DELETE_EXAMPLE. Structure same as EMP. Make
EMPNO as Primary Key. (We cannot delete a record if there is no Primary Key.)
Create Shortcuts as necessary.

Creating Mapping: (Picture on next page)
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_INS_DELETE_EXAMPLE
4. Drag EMP and target table.
5. Transformation -> Create Expression and pass all ports from SQ_EMP to it.
6. Transformation -> Create -> Select LOOKUP from list. Give name and click Create.
7. Create a connected lookup and keep only EMPNO and SAL in the lookup. Add
an IN_EMPNO port and add the condition in the Condition Tab.
8. Pass EMPNO from SQ_EMP to the Lookup. Pass EMPNO and SAL from the Lookup
to the Expression and rename them PREV_EMPNO and PREV_SAL. Make them
input-only ports in the Expression.
9. Create 2 output ports in the Expression, OUT_NEW_FLAG and OUT_DELETE_FLAG,
of Integer datatype.
10. Expression for OUT_NEW_FLAG: IIF(ISNULL(PREV_EMPNO),1,0) -- checks for a
new record.
11. Expression for OUT_DELETE_FLAG: IIF( NOT ISNULL(PREV_EMPNO) AND SAL
< PREV_SAL,1,0) -- checks for the delete condition.
12. Now create a Router transformation and pass all ports from EXP to Router.
13. Create two groups in Router: NEW_RECORD and DELETE_RECORD. Condition
for NEW_RECORD: OUT_NEW_FLAG = 1 and Condition for DELETE_RECORD:
OUT_DELETE_FLAG = 1
14. Transformation -> Create -> Select Update Strategy -> Give name
INS_NEW_RECORD and Click Done.
15. Transformation -> Create -> Select Update Strategy -> Give name
UPD_DELETE_RECORD and Click Done.
16. Pass records from the NEW_RECORD group to INS_NEW_RECORD, except
OUT_NEW_FLAG and OUT_DELETE_FLAG, and in the Update Strategy Expression
just write DD_INSERT
17. Pass records from the DELETE_RECORD group to UPD_DELETE_RECORD,
except OUT_NEW_FLAG and OUT_DELETE_FLAG, and in the Update Strategy
Expression just write DD_DELETE
18. Link all the needed ports from INS_NEW_RECORD to target.
19. Now Select Target table and Click Edit -> Copy.
20. Now Again Click Edit -> Paste. This will create a copy of target definition.
21. Link all the needed ports from UPD_DELETE_RECORD to target copy.
22. Mapping -> Validate and Repository Save.

Make Session. Do not select the Truncate option, but do select the Delete option
for targets. Also use NORMAL mode. Make Workflow.
Run and see the data. Now change the data in the target: update some SAL fields
and delete 3-4 rows.
Run the workflow again and see the target data.

3.21 STORED PROCEDURE T/F

Passive Transformation
Connected and Unconnected Transformation
Stored procedures are stored and run within the database.

A Stored Procedure transformation is an important tool for populating and
maintaining databases. Database administrators create stored procedures to
automate tasks that are too complicated for standard SQL statements.

Use of Stored Procedure in mapping:
Check the status of a target database before loading data into it.
Determine if enough space exists in a database.
Perform a specialized calculation.
Drop and recreate indexes. (Mostly used for this in projects.)

Data Passes Between IS and Stored Procedure


One of the most useful features of stored procedures is the ability to send data to
the stored procedure, and receive data from the stored procedure. There are three
types of data that pass between the Integration Service and the stored procedure:
Input/output parameters: Parameters we give as input and the parameters
returned from Stored Procedure.
Return values: Value returned by Stored Procedure if any.
Status codes: Status codes provide error handling for the IS during a workflow. The
stored procedure issues a status code that notifies whether or not the stored
procedure completed successfully. We cannot see this value. The IS uses it to
determine whether to continue running the session or stop.

Specifying when the Stored Procedure Runs


Normal: The stored procedure runs where the transformation exists in the mapping
on a row-by-row basis. We pass some input to procedure and it returns some
calculated values. Connected stored procedures run only in normal mode.
Pre-load of the Source: Before the session retrieves data from the source, the
stored procedure runs. This is useful for verifying the existence of tables or
performing joins of data in a temporary table.
Post-load of the Source: After the session retrieves data from the source, the
stored procedure runs. This is useful for removing temporary tables.
Pre-load of the Target: Before the session sends data to the target, the stored
procedure runs. This is useful for dropping indexes or disabling constraints.
Post-load of the Target: After the session sends data to the target, the stored
procedure runs. This is useful for re-creating indexes on the database.

Using a Stored Procedure in a Mapping:
1. Create the stored procedure in the database.
2. Import or create the Stored Procedure transformation.
3. Determine whether to use the transformation as connected or unconnected.
4. If connected, map the appropriate input and output ports.
5. If unconnected, either configure the stored procedure to run pre- or post-session,
or configure it to run from an expression in another transformation.
6. Configure the session.

Stored Procedures:
Connect to Source database and create the stored procedures given below:
CREATE OR REPLACE procedure sp_agg (in_deptno in number, max_sal out number,
min_sal out number, avg_sal out number, sum_sal out number)
As
Begin
select max(Sal),min(sal),avg(sal),sum(sal) into max_sal,min_sal,avg_sal,sum_sal
from emp where deptno=in_deptno group by deptno;
End;
/

CREATE OR REPLACE procedure sp_unconn_1_value (in_deptno in number, max_sal out number)
As
Begin
Select max(Sal) into max_sal from EMP where deptno=in_deptno;
End;
/
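
Before wiring a procedure into a mapping, it can be sanity-checked from SQL*Plus.
A quick sketch for sp_agg (the bind-variable names here are arbitrary):

variable mx number
variable mn number
variable av number
variable sm number
exec sp_agg(10, :mx, :mn, :av, :sm)
print mx mn av sm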

3.21.1 Connected Stored Procedure T/F


Example: To give input as DEPTNO from DEPT table and find the MAX, MIN,
AVG and SUM of SAL from EMP table.

DEPT will be source table. Create a target table SP_CONN_EXAMPLE with


fields DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL & SUM_SAL.
Write Stored Procedure in Database first and Create shortcuts as needed.

Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_SP_CONN_EXAMPLE
4. Drag DEPT and target table.
5. Transformation -> Import Stored Procedure -> Give Database Connection ->
Connect -> Select the procedure sp_agg from the list.

6. Drag DEPTNO from SQ_DEPT to the stored procedure input port and also to
DEPTNO port of target.
7. Connect the ports from procedure to target as shown below:

8. Mapping -> Validate


9. Repository -> Save

Create Session and then workflow.


Give connection information for all tables.
Give connection information for Stored Procedure also.
Run workflow and see the result in table.

3.21.2 Unconnected Stored Procedure T/F


An unconnected Stored Procedure transformation is not directly connected to the
flow of data through the mapping. Instead, the stored procedure runs either:

From an expression: Called from an expression transformation.


Pre- or post-session: Runs before or after a session.

Method of returning the value of output parameters to a port:

Assign the output value to a local variable.


Assign the output value to the system variable PROC_RESULT. (See Later)

Example 1: DEPTNO as input and get MAX of Sal as output.

DEPT will be source table.


Create a target table with fields DEPTNO and MAX_SAL of decimal datatype.
Write Stored Procedure in Database first and Create shortcuts as needed.

Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_sp_unconn_1_value
4. Drag DEPT and target table.
5. Transformation -> Import Stored Procedure -> Give Database Connection ->
Connect -> Select the procedure sp_unconn_1_value from the list. Click OK.
6. Stored Procedure has been imported.
7. T/F -> Create Expression T/F. Pass DEPTNO from SQ_DEPT to Expression T/F.
8. Edit expression and create an output port OUT_MAX_SAL of decimal datatype.
9. Open the Expression editor and call the stored procedure as below:
:SP.SP_UNCONN_1_VALUE(DEPTNO, PROC_RESULT)

Click OK and connect the port from the expression to the target as in the mapping below:

10. Mapping -> Validate


11. Repository Save.

Create Session and then workflow.


Give connection information for all tables.
Give connection information for Stored Procedure also.
Run workflow and see the result in table.

PROC_RESULT use:

If the stored procedure returns a single output parameter or a return value,
we use the reserved variable PROC_RESULT as the output variable.
Example: DEPTNO as input and MAX Sal as output:
:SP.SP_UNCONN_1_VALUE(DEPTNO, PROC_RESULT)

If the stored procedure returns multiple output parameters, you must create
variables for each output parameter.
Example: DEPTNO as Input and MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL
as output then:
1. Create four variable ports in expression VAR_MAX_SAL,
VAR_MIN_SAL, VAR_AVG_SAL and VAR_SUM_SAL.
2. Create four output ports in expression OUT_MAX_SAL,
OUT_MIN_SAL, OUT_AVG_SAL and OUT_SUM_SAL.
3. Call the procedure in the last variable port, say VAR_SUM_SAL:
:SP.SP_AGG (DEPTNO, VAR_MAX_SAL,VAR_MIN_SAL, VAR_AVG_SAL,
PROC_RESULT)

Example 2:
DEPTNO as Input and MAX_SAL, MIN_SAL, AVG_SAL and SUM_SAL as O/P.
Stored Procedure to drop index in Pre Load of Target
Stored Procedure to create index in Post Load of Target

DEPT will be source table. Create a target table SP_UNCONN_EXAMPLE


with fields DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL & SUM_SAL.
Write Stored Procedure in Database first and Create shortcuts as needed.

Stored procedures are given below to drop and create index on target.
Make sure to create target table first.

Stored Procedures to be created in next example in Target Database:

Create or replace procedure CREATE_INDEX
As
Begin
Execute immediate 'create index unconn_dept on SP_UNCONN_EXAMPLE(DEPTNO)';
End;
/
Create or replace procedure DROP_INDEX
As
Begin
Execute immediate 'drop index unconn_dept';
End;
/

Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_SP_UNCONN_EXAMPLE
4. Drag DEPT and target table.
5. Transformation -> Import Stored Procedure -> Give Database Connection
-> Connect -> Select the procedure sp_agg from the list. Click OK.
6. Stored Procedure has been imported.
7. T/F -> Create Expression T/F. Pass DEPTNO from SQ_DEPT to Expression T/F.
8. Edit Expression and create 4 variable ports and 4 output ports as
shown below:

9. Call the procedure in the last variable port VAR_SUM_SAL.
10. :SP.SP_AGG(DEPTNO, VAR_MAX_SAL, VAR_MIN_SAL, VAR_AVG_SAL, PROC_RESULT)
11. Click Apply and OK.
12. Connect to target table as needed.
13. Transformation -> Import Stored Procedure -> Give Database Connection
for target -> Connect -> Select the procedure CREATE_INDEX and
DROP_INDEX from the list. Click OK.
14. Edit DROP_INDEX -> Properties Tab -> Select Target Pre Load as Stored
Procedure Type and in call text write drop_index. Click Apply -> Ok.
15. Edit CREATE_INDEX -> Properties Tab -> Select Target Post Load as
Stored Procedure Type and in call text write create_index. Click Apply ->
Ok.

16. Mapping -> Validate


17. Repository -> Save

Create Session and then workflow.


Give connection information for all tables.
Give connection information for Stored Procedures also.
Also make sure that you execute the procedure CREATE_INDEX on
database before using them in mapping. This is because, if there is no
INDEX on target table, DROP_INDEX will fail and Session will also fail.
Run workflow and see the result in table.

3.22 SEQUENCE GENERATOR T/F


Passive and Connected Transformation.
The Sequence Generator transformation generates numeric values.
Use the Sequence Generator to create unique primary key values, replace
missing primary keys, or cycle through a sequential range of numbers.
We use it mostly to generate Surrogate Keys in a DWH environment. When we want to
maintain history, we need a key other than the Primary Key to uniquely identify the
record. So we create a sequence 1, 2, 3, 4 and so on and use this sequence as the
key. Example: If EMPNO is the key, we can keep only one record in the target and
cannot maintain history. So we use the Surrogate Key as Primary Key and not EMPNO.

Sequence Generator Ports


The Sequence Generator transformation provides two output ports: NEXTVAL and
CURRVAL.
We cannot edit or delete these ports.
Likewise, we cannot add ports to the transformation.

NEXTVAL:
Use the NEXTVAL port to generate sequence numbers by connecting it to a
transformation or target.
For example, we might connect NEXTVAL to two target tables in a mapping to
generate unique primary key values.

Sequence in Table 1 will be generated first. When table 1 has been loaded, only then
sequence for table 2 will be generated.

CURRVAL:
CURRVAL is NEXTVAL plus the Increment By value.
We typically only connect the CURRVAL port when the NEXTVAL port is
already connected to a downstream transformation.
If we connect the CURRVAL port without connecting the NEXTVAL port,
the Integration Service passes a constant value for each row.
When we connect the CURRVAL port in a Sequence Generator
transformation, the Integration Service processes one row in each block.
We can optimize performance by connecting only the NEXTVAL port in a
mapping.

Example: To use Sequence Generator transformation

EMP will be source.


Create a target EMP_SEQ_GEN_EXAMPLE in shared folder. Structure same as
EMP. Add two more ports NEXT_VALUE and CURR_VALUE to the target table.
Create shortcuts as needed.

Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_seq_gen_example
4. Drag EMP and target table.
5. Connect all ports from SQ_EMP to target table.
6. Transformation -> Create -> Select Sequence Generator from list -> Create -> Done
7. Connect NEXTVAL and CURRVAL from the Sequence Generator to the NEXT_VALUE
and CURR_VALUE ports of the target.
8. Validate Mapping
9. Repository -> Save

Create Session and then workflow.


Give connection information for all tables.
Run workflow and see the result in table.

SEE Properties for more Settings on Next Page:

Sequence Generator Properties:

Start Value (Required): Start value of the generated sequence that we want the IS
to use if we use the Cycle option. Default is 0.
Increment By (Required): Difference between two consecutive values from the
NEXTVAL port.
End Value (Optional): Maximum value the Integration Service generates.
Current Value (Optional): First value in the sequence. If the Cycle option is used,
the value must be greater than or equal to the Start Value and less than the End Value.
Cycle (Optional): If selected, the Integration Service cycles through the sequence
range. Ex: Start Value 1, End Value 10: the sequence runs 1-10 and then starts
again from 1.
Reset (Optional): By default, the last value of the sequence during the session is
saved to the repository, and the next run starts the sequence from the value saved.
If selected, the Integration Service generates values based on the original Current
Value for each session.
Please try to use properties yourself to know more.

POINTS:
If Current Value is 1 and End Value is 10 with no Cycle option, and there are 17
records in the source, the session will fail.
If we connect just CURR_VAL only, the value will be the same for all records.
If Current Value is 1, End Value is 10, the Cycle option is set and Start Value is 0,
then with 17 source records the sequence is: 1, 2 ... 10, 0, 1, 2, 3 ...
To make the above sequence run 1-10, 1-10, give Start Value as 1. Start Value is
used along with the Cycle option only.
If Current Value is 1, End Value is 10, the Cycle option is set and Start Value is 1,
then with 17 source records the session generates 1-10, 1-7. 7 is saved to the
repository, so if we run the session again the sequence starts from 8.
Use the Reset option if you want to start the sequence from CURR_VAL every time.

3.23 MAPPLETS

A mapplet is a reusable object that we create in the Mapplet Designer.


It contains a set of transformations and lets us reuse that transformation logic
in multiple mappings.
Created in Mapplet Designer in Designer Tool.

Say we need to use the same set of 5 transformations in 10 mappings. Instead of
building those 5 transformations in each of the 10 mappings, we create a mapplet of
these 5 transformations and use it in all 10 mappings. Example: To create a
surrogate key in the target, we create a mapplet using a stored procedure to create
the primary key for the target table. We give the target table name and key column
name as input to the mapplet and get the surrogate key as output.
Mapplets help simplify mappings in the following ways:

Include source definitions: Use multiple source definitions and source


qualifiers to provide source data for a mapping.
Accept data from sources in a mapping
Include multiple transformations: As many transformations as we need.
Pass data to multiple transformations: We can create a mapplet to feed
data to multiple transformations. Each Output transformation in a mapplet
represents one output group in a mapplet.
Contain unused ports: We do not have to connect all mapplet input and
output ports in a mapping.

Mapplet Input:
Mapplet input can originate from a source definition and/or from an Input
transformation in the mapplet. We can create multiple pipelines in a mapplet.

We use Mapplet Input transformation to give input to mapplet.


Use of Mapplet Input transformation is optional.

Mapplet Output:
The output of a mapplet is not connected to any target table.
We must use Mapplet Output transformation to store mapplet output.
A mapplet must contain at least one Output transformation with at least one
connected port in the mapplet.

Example1: We will join EMP and DEPT table. Then calculate total salary. Give
the output to mapplet out transformation.

EMP and DEPT will be source tables.


Output will be given to transformation Mapplet_Out.

Steps:
1. Open folder where we want to create the mapplet.
2. Click Tools -> Mapplet Designer.
3. Click Mapplets -> Create -> Give name. Ex: mplt_example1
4. Drag EMP and DEPT tables.
5. Use a Joiner transformation as described earlier to join them.
6. Transformation -> Create -> Select Expression from list -> Create -> Done
7. Pass all ports from the Joiner to the Expression and then calculate total salary
as described in the Expression transformation.
8. Now Transformation -> Create -> Select Mapplet Out from list -> Create
-> Give name and then done.
9. Pass all ports from expression to Mapplet output.
10. Mapplet -> Validate
11. Repository -> Save

Use of mapplet in mapping:

We can use a mapplet in a mapping by just dragging the mapplet from the mapplet
folder in the left pane, as we drag source and target tables.
When we use the mapplet in a mapping, the mapplet object displays only the
ports from the Input and Output transformations. These are referred to as the
mapplet input and mapplet output ports.
Make sure to give correct connection information in the session.

Making a mapping: We will use mplt_example1, and then create a filter


transformation to filter records whose Total Salary is >= 1500.

mplt_example1 will be source.


Create target table same as Mapplet_out transformation as in picture above.

Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_mplt_example1
4. Drag mplt_example1 and target table.
5. Transformation -> Create -> Select Filter from list -> Create -> Done.
6. Drag all ports from mplt_example1 to the Filter and give the filter condition.
7. Connect all ports from the Filter to the target. We can add more transformations
after the Filter if needed.
8. Validate mapping and Save it.

Make session and workflow.


Give connection information for mapplet source tables.
Give connection information for target table.
Run workflow and see result.

Example2: We will join EMP and DEPT table. The ports of DEPT table will be
passed to mapplet in mapping. We will use MAPPLET_INPUT to pass ports of
DEPT to joiner. Then calculate total salary. Give the output to mapplet out
transformation.

EMP will be source table in mapplet designer.


DEPT ports will be created in MAPPLET INPUT and passed to joiner.
Output will be given to transformation Mapplet_Out.

Steps:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapplet Designer.
3. Click Mapplets -> Create -> Give name. Ex: mplt_example2
4. Drag EMP table.
5. Transformation -> Create -> Select Mapplet Input for list->Create -> Done
6. Edit Mapplet Input.
7. Go to ports tab and add 3 ports DEPTNO, DNAME and LOC.
8. Use Joiner transformation as described earlier to join them.
9. Transformation -> Create -> Select Expression for list -> Create -> Done
10. Pass all ports from joiner to expression and then calculate total salary as
described in expression transformation.
11. Now Transformation -> Create -> Select Mapplet Out from list -> Create
-> Give name and then done.
12. Pass all ports from expression to Mapplet output.
13. Mapplet -> Validate
14. Repository -> Save

Making a mapping: We will use mplt_example2, and then create a filter


transformation to filter records whose Total Salary is >= 1500.

mplt_example2 will be source.


Create target table same as Mapplet_out transformation as in picture above.

Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_mplt_example2
4. Drag DEPT, mplt_example2 and target table.
5. Pass all ports from DEPT to mplt_example2 as input ports.
6. Transformation -> Create -> Select Filter from list -> Create -> Done.
7. Drag all ports from mplt_example2 to the Filter and give the filter condition.
8. Connect all ports from the Filter to the target. We can add more transformations
after the Filter if needed.
9. Validate mapping and Save it.

Make session and workflow.


Give connection information for mapplet source tables.
Give connection information for target table.
Run workflow and see result.

3.24 NORMALIZER TRANSFORMATION

Active and Connected Transformation.


The Normalizer transformation normalizes records from COBOL and relational
sources, allowing us to organize the data.
Use a Normalizer transformation instead of the Source Qualifier
transformation when we normalize a COBOL source.
We can also use the Normalizer transformation with relational sources to
create multiple rows from a single row of data.

Example 1: To create 4 records of every employee in EMP table.

EMP will be source table.


Create target table Normalizer_Multiple_Records. Structure same as EMP and
datatype of HIREDATE as VARCHAR2.
Create shortcuts as necessary.

Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name. Ex: m_Normalizer_Multiple_Records
4. Drag EMP and target table.
5. Transformation -> Create -> Select Expression -> Give name, click Create, Done.
6. Pass all ports from SQ_EMP to the Expression transformation.
7. Transformation -> Create -> Select Normalizer -> Give name, Create & Done.
8. Try dragging ports from Expression to Normalizer. Not possible.
9. Edit the Normalizer, go to the Normalizer Tab and add columns: columns equal to
the columns in the EMP table, with the same datatypes.
10. The Normalizer doesn't have a DATETIME datatype, so convert HIREDATE to char
in the Expression T/F. Create an output port out_hdate and do the conversion.
11. Connect ports from Expression to Normalizer.
12. Edit Normalizer and Normalizer Tab. As EMPNO identifies source records
and we want 4 records of every employee, give OCCUR for EMPNO as 4.

13. Click Apply and then OK.


14. Add link as shown in mapping below:

15. Mapping -> Validate


16. Repository -> Save

Make session and workflow.


Give connection information for source and target table.
Run workflow and see result.

Example 2: To break columns into rows

Source:
Roll_Number   Name     ENG   HINDI   MATHS
100           Amit     78    67      90
101           Rahul    67    87      78
102           Jessie   56    89      97

Target:
Roll_Number   Name     Marks
100           Amit     78
100           Amit     67
100           Amit     90
101           Rahul    67
101           Rahul    87
101           Rahul    78
102           Jessie   56
102           Jessie   89
102           Jessie   97

Make source as a flat file. Import it and create target table.


Create Mapping as before. In Normalizer tab, create only 3 ports
Roll_Number, Name and Marks as there are 3 columns in target table.
Also as we have 3 marks in source, give Occurs as 3 for Marks in
Normalizer tab.
Connect accordingly and connect to target.
Validate and Save
Make Session and workflow and Run it. Give Source File Directory and Source
File name for source flat file in source properties in mapping tab of session.
See the result.

3.25 XML Sources Import and usage

This is to import XML sources and use it in our mapping.


In case of XML sources, XML Qualifier is used by default.
Active and Connected Transformation.

Steps:
1. Open Shared Folder -> Tools -> Source Analyzer
2. Sources -> Import XML Definition.
3. Browse for location where XML file is present. To import the definition, we
should have XML file in our local system on which we are working.
4. Select the file and click open.
5. A message 'Option for Override Infinite Length is not set. Do you want to set it?'
is displayed.
6. Click Yes.
7. Check Override all infinite lengths with value and give value as 2.
8. Do not modify other options and Click Ok.
9. Click NEXT and then click FINISH
10. Definition has been imported and can be used in mapping as we select other
sources.

SESSION PROPERTIES

Open the session for mapping where we used XML sources.


In mapping tab, select the XML source.
In properties, we do not give relational connection here.
We give Source File Directory and Source Filename information.

3.26 MAPPING WIZARDS


The Designer provides two mapping wizards to help us create mappings quickly and
easily. Both wizards are designed to create mappings for loading and maintaining
star schemas, a series of dimensions related to a central fact table.
Note: We do not use them in projects and instead make the mappings manually.
Two wizards are:
1. Getting Started Wizard
2. Slowly Changing Dimensions Wizard
Use the following sources with a mapping wizard:
Flat file
Relational
Application
Shortcut to a flat file, relational, or Application sources

3.26.1 Getting Started Wizard


It creates mappings to load static fact and dimension tables and slowly growing
dimension tables.
The Getting Started Wizard can create two types of mappings:
Simple Pass Through
Slowly Growing Target

1. SIMPLE PASS THROUGH

Loads a static fact or dimension table by inserting all rows.


Use this mapping when we want to drop all existing data from the table
before loading new data.
Use the truncate target table option in the session properties, or use a
pre-session shell command to drop or truncate the target before each session run.

Steps:
1. Open the folder where we want to create the mapping.
2. In the Mapping Designer, click Mappings > Wizards > Getting Started.
3. Enter a mapping name and select Simple Pass Through, and click next.
4. Select a source definition to use in the mapping.
5. Enter a name for the mapping target table and click Finish.
6. To save the mapping, click Repository > Save.

2. SLOWLY GROWING TARGET

Loads a slowly growing fact or dimension table by inserting new rows.


Use this mapping to load new data when existing data does not require
updates.
The Slowly Growing Target mapping filters source rows based on user-defined
comparisons, and then inserts only those found to be new to the target.

Handling Keys: When we use the Slowly Growing Target option, the Designer
creates an additional column in target, PM_PRIMARYKEY. In this column, the
Integration Service generates a primary key for each row written to the target,
incrementing new key values by 1.
Steps:
1. Open the folder where we want to create the mapping.
2. In the Mapping Designer, click Mappings > Wizards > Getting Started.
3. Enter a mapping name and select Slowly Growing Target, and click next.
4. Select a source definition to be used in the mapping.
5. Enter a name for the mapping target table. Click Next.
6. Select the column or columns from the Target Table Fields list that we want
the Integration Service to use to look up data in the target table. Click Add.
These columns are used to compare source and target.

We select EMPNO as it is key column in the source to compare with target.

7. Click Finish.
8. To save the mapping, click Repository > Save.
Note: The Fields to Compare for Changes field is disabled for the Slowly
Growing Targets mapping.

Slowly Growing target example

3.26.2 Slowly Changing Dimension Wizard


It creates mappings to load slowly changing dimension tables based on the amount
of historical dimension data we want to keep and the method we choose to handle
historical dimension data.
The SCD wizard can create following types of mappings:
SCD Type 1 Dimension mapping
SCD Type 2 Dimension/Version Data mapping
SCD Type 2 Dimension/Flag Current mapping
SCD Type 2 Dimension/Effective Date Range mapping
SCD Type 3 Dimension mapping

1. SCD TYPE 1 DIMENSION MAPPING

If row exists in source and not in target, then the row is inserted in target.
If row exists in source and target but there is some change, the row in target
table is updated.
Use this mapping when we do not want a history of previous dimension data.

Handling Keys: When we use the SCD Type 1 option, the Designer creates an
additional column in the target, PM_PRIMARYKEY, whose value is incremented by 1
for each new row.
Steps:
1. Open the folder where we want to create the mapping.
2. In the Mapping Designer, click Mappings > Wizards > Slowly Changing
Dimension.
3. Enter a mapping name and select Type 1 Dimension, and click Next.
4. Select a source definition to be used by the mapping.
5. Enter a name for the mapping target table. Click Next.
6. Select the column or columns we want to use as a lookup condition from the
Target Table Fields list and click add.
7. Select the column or columns we want the Integration Service to compare for
changes, and click add.

8. Click Finish.
9. To save the mapping, click Repository > Save.
Configuring Session: In the session properties, click the Target Properties settings
on the Mappings tab. To ensure the Integration Service loads rows to the target
properly, select Insert and Update as Update for each relational target.
Flow1: New record is inserted into target table.
Flow2: Changed record is updated into target table.

Note: In the Type 1 Dimension mapping, the Designer uses two instances of the
same target definition to enable inserting and updating data in the same target
table. Generate only one target table in the target database.

2. SCD TYPE 2 DIMENSION/VERSION DATA MAPPING

The Type 2 Dimension/Version Data mapping filters source rows based on


user-defined comparisons and inserts both new and changed records into the
target.
Changes are tracked in the target table by versioning the primary key and
creating a version number for each record in the table.
In the Type 2 Dimension/Version Data target, the latest record has the
highest version number and the highest incremented primary key.

When we use this option, the Designer creates two additional fields in the target:
1. PM_PRIMARYKEY: The Integration Service generates a primary key for
each row written to the target.
2. PM_VERSION_NUMBER: The IS generates a version number for each row
written to the target.
Steps:
1. Follow Steps 1-7 as we did in SCD Type1, except Select Type 2 Dimension in
Step 3.
2. Click Next. Select Keep the 'Version' Number in Separate Column.
3. Click Finish.
4. To save the mapping, click Repository > Save.
Note: Designer uses two instances of the same target definition to enable the two
separate data flows to write to the same target table. Generate only one target table
in the target database.
Configuring Session: In the session properties, click the Target Properties settings
on the Mappings tab. To ensure the Integration Service loads rows to the target
properly, select Insert for each relational target.
Flow1: New record is inserted into target table.
Flow2: Changed record is inserted into target table.

3. SCD TYPE 2 DIMENSION/FLAG CURRENT MAPPING

The Type 2 Dimension/Flag Current mapping filters source rows based on


user-defined comparisons and inserts both new and changed records into the
target.
In the Type 2 Dimension/Flag Current target, the latest record has a current
flag set to 1 and the highest incremented primary key.

When we use this option, the Designer creates two additional fields in the target:
1. PM_PRIMARYKEY: The Integration Service generates a primary key for
each row written to the target.
2. PM_CURRENT_FLAG: The Integration Service flags the current row "1" and
all previous versions "0".
Steps:
1. Follow Steps 1-7 as we did in SCD Type1, except Select Type 2 Dimension in
Step 3.
2. Click Next. Select Mark the 'Current' Dimension Record with a Flag.
3. Click Finish.
4. To save the mapping, click Repository > Save.
Note: In the Type 2 Dimension/Flag Current mapping, the Designer uses three
instances of the same target definition to enable the three separate data flows to
write to the same target table. Generate only one target table in the target database.
Configuring Session: In the session properties, click the Target Properties settings
on the Mappings tab. To ensure the Integration Service loads rows to the target
properly, select Insert and Update as Update for each relational target.
Flow1: New record is inserted into target table.
Flow2: Changed record is inserted into target table.
Flow3: Current Flag of the changed record is updated in the target table.

4. SCD TYPE 2 DIMENSION/EFFECTIVE DATE RANGE

The Type 2 Dimension/Effective Date Range mapping filters source rows


based on user-defined comparisons and inserts both new and changed
records into the target.
Changes are tracked in the target table by maintaining an effective date
range for each version of each record in the target.

When we use this option, the Designer creates 3 additional fields in the target:
1. PM_PRIMARYKEY: The Integration Service generates a primary key for
each row written to the target.
2. PM_BEGIN_DATE: For each new and changed record, it is populated with
SYSDATE. This Sysdate is the date on which ETL process runs.
3. PM_END_DATE: It is populated as NULL when record is inserted. A new
record is inserted when a record changes. However, PM_END_DATE of
changed record is updated with SYSDATE.
Steps:
1. Follow Steps 1-7 as we did in SCD Type1, except Select Type 2 Dimension in
Step 3.
2. Click Next. Select Mark the Dimension Records with their Effective Date
Range.
3. Click Finish.
4. To save the mapping, click Repository > Save.
Configuring Session: It is the same as in SCD Type 2 Flag Current.
Flow1: New record is inserted into target table with PM_BEGIN_DATE as SYSDATE.
Flow2: Changed record is inserted into target with PM_BEGIN_DATE as SYSDATE.
Flow3: PM_END_DATE of the changed record is updated in the target table.

5. SCD TYPE 3 DIMENSION MAPPING

Inserts new records.


Updates changed values in existing records. When updating an existing
dimension, the Integration Service saves existing data in different columns of
the same row and replaces the existing data with the updates.
Optionally uses the load date to track changes.
It maintains partial history. Only one previous value is changed.

When we use this option, the Designer creates the following additional fields in the target:
1. PM_PRIMARYKEY: The Integration Service generates a primary key for
each row written to the target.
2. PM_PREV_ColumnName: The Designer generates a previous column
corresponding to each column for which we want historical data. The IS keeps
the previous version of record data in these columns.
3. PM_EFFECT_DATE: An optional field. The IS uses the system date to
indicate when it creates or updates a dimension.
Steps:
1. Follow Steps 1-7 as we did in SCD Type1, except Select Type 3 Dimension in
Step 3.
2. Click Next. Select Effective Date if desired.
3. Click Finish.
4. To save the mapping, click Repository > Save.
Configuring Session: It is the same as in SCD Type 2 Flag Current.
Flow1: New record is inserted into target table.
Flow2: Changed record is updated in the target table.

3.27 MAPPING PARAMETERS & VARIABLES


Mapping parameters and variables represent values in mappings and mapplets.
When we use a mapping parameter or variable in a mapping, first we declare the
mapping parameter or variable for use in each mapplet or mapping. Then, we define
a value for the mapping parameter or variable before we run the session.

MAPPING PARAMETERS

A mapping parameter represents a constant value that we can define before


running a session.
A mapping parameter retains the same value throughout the entire session.

Example: When we want to extract records of a particular month during the ETL
process, we can create a Mapping Parameter of date/time datatype and use it in the
query to compare it with the timestamp field in the SQL override.

After we create a parameter, it appears in the Expression Editor.


We can then use the parameter in any expression in the mapplet or mapping.
We can also use parameters in a source qualifier filter, user-defined join, or
extract override, and in the Expression Editor of reusable transformations.
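
For instance, a sketch of a Source Qualifier SQL override that uses such a parameter
(the table, column, and parameter names here are illustrative, and the value would
come from a parameter file entry such as $$MonthParam=200408):

SELECT * FROM TRANSACTIONS
WHERE TO_CHAR(TXN_DATE, 'YYYYMM') = '$$MonthParam'

The Integration Service expands the parameter as literal text before running the
query, which is why it can sit inside the quotes.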

MAPPING VARIABLES

Unlike mapping parameters, mapping variables are values that can change
between sessions.
The Integration Service saves the latest value of a mapping variable to the
repository at the end of each successful session.
We can override a saved value with the parameter file.
We can also clear all saved values for the session in the Workflow Manager.

We might use a mapping variable to perform an incremental read of the source. For
example, we have a source table containing timestamped transactions and we want
to evaluate the transactions on a daily basis. Instead of manually entering a session
override to filter source data each time we run the session, we can create a mapping
variable, $$IncludeDateTime. In the source qualifier, create a filter to read only rows
whose transaction date equals $$IncludeDateTime, such as:
TIMESTAMP = $$IncludeDateTime
In the mapping, use a variable function to set the variable value to increment one
day each time the session runs. If we set the initial value of $$IncludeDateTime to
8/1/2004, the first time the Integration Service runs the session, it reads only rows
dated 8/1/2004. During the session, the Integration Service sets $$IncludeDateTime
to 8/2/2004. It saves 8/2/2004 to the repository at the end of the session. The next
time it runs the session, it reads only rows from August 2, 2004.
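
One way to implement the increment is with the SetVariable function (described
below) in an output or variable port of an Expression transformation, assuming
$$IncludeDateTime is declared as a Date/Time mapping variable:

SETVARIABLE($$IncludeDateTime, ADD_TO_DATE($$IncludeDateTime, 'DD', 1))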

Used in following transformations:

Expression
Filter
Router
Update Strategy

Initial and Default Value:


When we declare a mapping parameter or variable in a mapping or a mapplet, we
can enter an initial value.
When the Integration Service needs an initial value, and we did not declare an initial
value for the parameter or variable, the Integration Service uses a default value
based on the datatype of the parameter or variable.
Datatype    Default Value
Numeric     0
String      Empty String
Datetime    1/1/1

Variable Values: Start value and current value of a mapping variable

Start Value:
The start value is the value of the variable at the start of the session. The
Integration Service looks for the start value in the following order:
1. Value in parameter file
2. Value saved in the repository
3. Initial value
4. Default value

Current Value:
The current value is the value of the variable as the session progresses. When a
session starts, the current value of a variable is the same as the start value. The
final current value for a variable is saved to the repository at the end of a successful
session. When a session fails to complete, the Integration Service does not update
the value of the variable in the repository.
Note: If a variable function is not used to calculate the current value of a mapping
variable, the start value of the variable is saved to the repository.

Variable Datatype and Aggregation Type


When we declare a mapping variable in a mapping, we need to configure the
datatype and aggregation type for the variable. The IS uses the aggregate type of a
mapping variable to determine the final current value of the mapping variable.
Aggregation types are:
Count: Integer and small integer datatypes are valid only.
Max: All transformation datatypes except binary datatype are valid.
Min: All transformation datatypes except binary datatype are valid.

Variable Functions
Variable functions determine how the Integration Service calculates the current value
of a mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of values. It
ignores rows marked for update, delete, or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It
ignores rows marked for update, delete, or reject. Aggregation type set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the
variable value when a row is marked for insertion, and subtracts one when the row is
marked for deletion. It ignores rows marked for update or reject. Aggregation type
set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it
compares the final current value of the variable to the start value of the variable.
Based on the aggregate type of the variable, it saves a final value to the repository.

Creating Mapping Parameters and Variables


1. Open the folder where we want to create parameter or variable.
2. In the Mapping Designer, click Mappings > Parameters and Variables; or in the
Mapplet Designer, click Mapplet > Parameters and Variables.
3. Click the add button.

4. Enter name. Do not remove $$ from name.


5. Select Type and Datatype. Select Aggregation type for mapping variables.
6. Give Initial Value. Click ok.

Example: Use of Mapping of Mapping Parameters and Variables

EMP will be source table.


Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME,
DEPTNO, TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.
TOTAL_SAL = SAL + COMM + $$Bonus ($$Bonus is a mapping parameter that
changes every month)
SET_VAR: We will add one month to the HIREDATE of every employee.
Create shortcuts as necessary.

Creating Mapping
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give name. Ex: m_mp_mv_example
4. Drag EMP and target table.
5. Transformation -> Create -> Select Expression for list -> Create -> Done.
6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to Expression.
7. Create Parameter $$Bonus and Give initial value as 200.
8. Create variable $$var_max of MAX aggregation type and initial value 1500.
9. Create variable $$var_min of MIN aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0.
COUNT is visible when datatype is INT or SMALLINT.
11. Create variable $$var_set of MAX aggregation type.

13. Create 5 output ports out_ TOTAL_SAL, out_MAX_VAR, out_MIN_VAR,
out_COUNT_VAR and out_SET_VAR.
14. Open expression editor for TOTAL_SAL. Do the same as we did earlier for
SAL+ COMM. To add $$BONUS to it, select variable tab and select the
parameter from mapping parameter. SAL + COMM + $$Bonus
15. Open Expression editor for out_max_var.
16. Select the variable function SETMAXVARIABLE from left side pane. Select
$$var_max from variable tab and SAL from ports tab as shown below.
SETMAXVARIABLE($$var_max,SAL)

17. Open Expression editor for out_min_var and write the following expression:
SETMINVARIABLE($$var_min,SAL). Validate the expression.
18. Open Expression editor for out_count_var and write the following expression:
SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open Expression editor for out_set_var and write the following expression:
SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.
20. Click OK. Expression Transformation below:

21. Link all ports from expression to target and Validate Mapping and Save it.
22. See mapping picture on next page.

Make session and workflow.


Give connection information for source and target table.
Run workflow and see result.

3.28 PARAMETER FILE

A parameter file is a list of parameters and associated values for a workflow,


worklet, or session.
Parameter files provide flexibility to change these variables each time we run
a workflow or session.
We can create multiple parameter files and change the file we use for a
session or workflow. We can create a parameter file using a text editor such
as WordPad or Notepad.
Enter the parameter file name and directory in the workflow or session
properties.

A parameter file contains the following types of parameters and variables:

Workflow variable: References values and records information in a workflow.
Worklet variable: References values and records information in a worklet. Use predefined worklet variables in a parent workflow, but we cannot use workflow variables from the parent workflow in a worklet.
Session parameter: Defines a value that can change from session to session, such as a database connection or file name.
Mapping parameter and Mapping variable

USING A PARAMETER FILE


Parameter files contain several sections preceded by a heading. The heading
identifies the Integration Service, Integration Service process, workflow, worklet, or
session to which we want to assign parameters or variables.
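For example, common heading formats look like the following (a sketch; the folder, workflow, and session names are placeholders):

[Global]
[folder_name.WF:workflow_name]
[folder_name.WF:workflow_name.ST:session_name]

Values under [Global] apply to all objects, while the more specific headings scope values to a single workflow or session.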

Sample Parameter File for Our example:


In the parameter file, folder and session names are case sensitive.
Create a text file in notepad with name Para_File.txt
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0

CONFIGURING PARAMETER FILE


We can specify the parameter file name and directory in the workflow or session
properties.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.

To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
   Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
4. Click OK.

3.29 INDIRECT LOADING FOR FLAT FILES


Suppose we have 10 flat files of the same structure: all the flat files have the same number of columns and datatypes. We need to transfer all 10 files to the same target. The files are named EMP1, EMP2, and so on.
Solution1:
1. Import one flat file definition and make the mapping as per need.
2. Now in session give the Source File name and Source File Directory location of
one file.
3. Make workflow and run.
4. Now open session after workflow completes. Change the Filename and
Directory to give information of second file. Run workflow again.
5. Do the above for all 10 files.
Solution2:
1. Import one flat file definition and make the mapping as per need.
2. Now in the session give the Source Directory location of the files.
3. Now in the Source Filename field use $InputFileName. This is a session parameter. See 4.2.4 for session parameters.
4. Now make a parameter file and give the value of $InputFileName.
$InputFileName=EMP1.txt
5. Run the workflow
6. Now edit parameter file and give value of second file. Run workflow again.
7. Do same for remaining files.
Solution3:
1. Import one flat file definition and make the mapping as per need.
2. Now make a notepad file that contains the location and name of each 10 flat
files.
Sample:
D:\EMP1.txt
E:\EMP2.txt
E:\FILES\DWH\EMP3.txt and so on
3. Now make a session and in Source file name and Source File Directory
location fields, give the name and location of above created file.
4. In Source filetype field, select Indirect.
5. Click Apply.
6. Validate Session
7. Make Workflow. Save it to repository and run.

Sample Mapping to be made

Chapter 4

Workflow Manager

4.1 INTEGRATION SERVICE ARCHITECTURE

The Integration Service moves data from sources to targets based on workflow and mapping metadata stored in a repository.
When a workflow starts, the Integration Service retrieves mapping, workflow, and session metadata from the repository. It extracts data from the mapping sources and stores the data in memory while it applies the transformation rules configured in the mapping.
The Integration Service loads the transformed data into one or more targets.

To move data from sources to targets, the Integration Service uses the following
components:
Integration Service process
Load Balancer
Data Transformation Manager (DTM) process

4.1.1 INTEGRATION SERVICE PROCESS


The Integration Service starts an Integration Service process to run and monitor
workflows. The Integration Service process accepts requests from the PowerCenter
Client and from pmcmd. It performs the following tasks:

Manages workflow scheduling.
Locks and reads the workflow.
Reads the parameter file.
Creates the workflow log.
Runs workflow tasks and evaluates the conditional links connecting tasks.
Starts the DTM process or processes to run the session.
Writes historical run information to the repository.
Sends post-session email in the event of a DTM failure.
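Because the Integration Service process accepts requests from pmcmd, a workflow can also be started from the command line. A minimal sketch (the service, domain, user, folder, and workflow names here are assumptions):

pmcmd startworkflow -sv IS_Practice -d Domain_Practice -u Administrator -p password -f Practice wf_sample_email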

4.1.2 LOAD BALANCER


The Load Balancer is a component of the Integration Service that dispatches tasks to
achieve optimal performance and scalability. When we run a workflow, the Load
Balancer dispatches the Session, Command, and predefined Event-Wait tasks within
the workflow.
The Load Balancer dispatches tasks in the order it receives them. When the Load
Balancer needs to dispatch more Session and Command tasks than the Integration
Service can run, it places the tasks it cannot run in a queue. When nodes become
available, the Load Balancer dispatches tasks from the queue in the order
determined by the workflow service level.

4.1.3 DTM PROCESS


When the workflow reaches a session, the Integration Service process starts the DTM
process. The DTM is the process associated with the session task. The DTM process
performs the following tasks:

Retrieves and validates session information from the repository.


Performs pushdown optimization when the session is configured for pushdown
optimization.
Adds partitions to the session when the session is configured for dynamic
partitioning.
Expands the service process variables, session parameters, and mapping
variables and parameters.
Creates the session log.
Validates source and target code pages.
Verifies connection object permissions.
Runs pre-session shell commands, stored procedures, and SQL.
Sends a request to start worker DTM processes on other nodes when the
session is configured to run on a grid.
Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
Runs post-session stored procedures, SQL, and shell commands.
Sends post-session email.

4.1.4 PROCESSING THREADS


The DTM allocates process memory for the session and divides it into buffers. This is
also known as buffer memory. The default memory allocation is 12,000,000 bytes.
The DTM uses multiple threads to process data in a session. The main DTM thread is
called the master thread.
The master thread can create the following types of threads:
Mapping Threads: One mapping thread for each session.
Pre- and Post-Session Threads: One thread created to perform pre- and post-session operations.
Reader Threads: One thread for each partition.
Transformation Threads: One thread for each partition.
Writer Threads: One thread for each partition.
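As a simple illustration, assuming one pipeline with 3 partitions and a single transformation stage: the DTM creates 1 mapping thread, 3 reader threads, 3 transformation threads, and 3 writer threads, that is 10 threads in total, plus the pre- and post-session thread.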

4.1.5 CODE PAGES and DATA MOVEMENT


A code page contains the encoding to specify characters in a set of one or more
languages. An encoding is the assignment of a number to a character in the
character set.
The Integration Service can move data in either ASCII or Unicode data movement
mode. These modes determine how the Integration Service handles character data.
We choose the data movement mode in the Integration Service configuration
settings. If we want to move multibyte data, choose Unicode data movement mode.
ASCII Data Movement Mode: In ASCII mode, the Integration Service recognizes
7-bit ASCII and EBCDIC characters and stores each character in a single byte.
Unicode Data Movement Mode: Use Unicode data movement mode when sources
or targets use 8-bit or multibyte character sets and contain character data.

4.1.6 OUTPUT FILES and CACHES


The Integration Service creates the following output files:
Workflow log
Session log
Session details file
Performance details file
Reject files
Row error logs
Recovery tables and files
Control file
Post-session email
Output file
Cache files
Session Details: When we run a session, the Workflow Manager creates session
details that provide load statistics for each target in the mapping. We can monitor
session details during the session or after the session completes. Session details
include information such as table name, number of rows written or rejected, and
read and write throughput.
Control File: When we run a session that uses an external loader, the Integration
Service process creates a control file and a target flat file. The control file contains
information about the target flat file such as data format and loading instructions for
the external loader. The control file has an extension of .ctl. We can view the control
file and the target flat file in the target file directory.
Output File: If the session writes to a target file, the Integration Service process
creates the target file based on a file target definition.
Cache Files: When the Integration Service process creates a memory cache, it also creates cache files. The Integration Service process creates cache files for the Joiner, Rank, Lookup, Aggregator, and Sorter transformations and for XML targets.

4.2 WORKING WITH WORKFLOWS


A workflow is a set of instructions that tells the Integration Service how to run tasks
such as sessions, email notifications, and shell commands.

4.2.1 ASSIGNING AN INTEGRATION SERVICE


Before we can run a workflow, we must assign an Integration Service to run it.
Steps to assign IS from Workflow Properties:
1. In the Workflow Designer, open the Workflow.
2. Click Workflows > Edit.
3. On the General tab, click the Browse Integration Services button.
4. Select the Integration Service that you want to run the workflow.
5. Click OK twice to select the Integration Service for the workflow.

Steps to assign IS from Menu:


1. Close all folders in the repository.
2. Click Service > Assign Integration Service.
3. From the Choose Integration Service list, select the service we want to
assign.
4. From the Show Folder list, select the folder we want to view.
5. Click the Selected check box for each workflow you want the Integration
Service to run.
6. Click Assign.

4.2.2 WORKING WITH LINKS

Use links to connect each workflow task.


We can specify conditions with links to create branches in the workflow.
The Workflow Manager does not allow us to use links to create loops in the
workflow. Each link in the workflow can run only once.

Valid Workflow:

Example of loop:

Specifying Link Conditions:

Once we create links between tasks, we can specify conditions for each link to
determine the order of execution in the workflow.
If we do not specify conditions for each link, the Integration Service runs the
next task in the workflow by default.
Use predefined or user-defined workflow variables in the link condition.

Steps:
1. In the Workflow Designer workspace, double-click the link you want to
specify.
2. The Expression Editor appears.
3. In the Expression Editor, enter the link condition. The Expression Editor
provides predefined workflow variables, user-defined workflow variables,
variable functions, and Boolean and arithmetic operators.
4. Validate the expression using the Validate button.
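For example, a link condition such as the following (the session name is illustrative) runs the next task only when the session succeeds and writes at least one row:

$s_m_filter_example.Status = SUCCEEDED AND $s_m_filter_example.TgtSuccessRows > 0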

Using the Expression Editor:


The Workflow Manager provides an Expression Editor for any expressions in the
workflow. We can enter expressions using the Expression Editor for the following:
Link conditions
Decision task
Assignment task

4.2.3 WORKFLOW VARIABLES


We can create and use variables in a workflow to reference values and record
information.
Types of workflow variables:
Predefined workflow variables
User-defined workflow variables

Predefined workflow variables


The Workflow Manager provides predefined workflow variables for tasks within a
workflow. Types of Predefined workflow variables are:

System variables:
Use the SYSDATE and WORKFLOWSTARTTIME system variables within a workflow.

Task-specific variables:
The Workflow Manager provides a set of task-specific variables for each task in the
workflow. The Workflow Manager lists task-specific variables under the task name in
the Expression Editor.
Task-specific variable   Description                                      Task Type
Condition                Result of the decision condition expression.    Decision
                         NULL if the task fails.
EndTime                  Date and time when a task ended.                 All Tasks
ErrorCode                Last error code for the associated task. 0 if   All Tasks
                         there is no error.
ErrorMsg                 Last error message for the associated task.     All Tasks
                         Empty string if there is no error.
FirstErrorCode           Error code for the first error message in the   Session
                         session. 0 if there is no error.
FirstErrorMsg            First error message in the session. Empty       Session
                         string if there is no error.
PrevTaskStatus           Status of the previous task in the workflow     All Tasks
                         that the IS ran. Can be ABORTED, FAILED,
                         STOPPED, SUCCEEDED.
SrcFailedRows            Total number of rows the Integration Service    Session
                         failed to read from the source.
SrcSuccessRows           Total number of rows successfully read from     Session
                         the sources.
StartTime                Date and time when a task started.              All Tasks
Status                   Status of the previous task in the workflow.    All Tasks
                         Can be ABORTED, DISABLED, FAILED,
                         NOTSTARTED, STARTED, STOPPED, SUCCEEDED.
TgtFailedRows            Total number of rows the Integration Service    Session
                         failed to write to the target.
TgtSuccessRows           Total number of rows successfully written to    Session
                         the target.
TotalTransErrors         Total number of transformation errors.          Session

User-Defined Workflow Variables


We can create variables within a workflow. When we create a variable in a workflow,
it is valid only in that workflow. Use the variable in tasks within that workflow. We
can edit and delete user-defined workflow variables.
Integration Service holds two different values for a workflow variable during a
workflow run:
Start value of a workflow variable
Current value of a workflow variable
The Integration Service looks for the start value of a variable in the following order:
1. Value in parameter file
2. Value saved in the repository (if the variable is persistent)
3. User-specified default value
4. Datatype default value
Persistent means value is saved to the repository.
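For example, a workflow variable can be set from a parameter file under the workflow heading (the folder, workflow, and variable names here are illustrative):

[Practice.WF:wf_sample]
$$WF_RUN_COUNT=10

If the parameter file does not set the variable, a persistent variable starts from the value saved in the repository by the previous run.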
To create a workflow variable:
1. In the Workflow Designer, create a new workflow or edit an existing one.
2. Select the Variables tab.
3. Click Add and enter a name for the variable.
4. In the Datatype field, select the datatype for the new variable.
5. Enable the Persistent option if we want the value of the variable retained from one execution of the workflow to the next.

6. Enter the default value for the variable in the Default field.
7. To validate the default value of the new workflow variable, click the Validate
button.
8. Click Apply to save the new workflow variable.
9. Click OK to close the workflow properties.

4.2.4 SESSION PARAMETERS

Session parameters represent values we might want to change between sessions, such as a database connection or source file.
Use session parameters in the session properties, and then define the parameters in a parameter file.
The Workflow Manager provides one built-in session parameter, $PMSessionLogFile, used to change the name of the session log file.

Example: Suppose we want to read data from 10 different databases containing the same table and then transfer the data to the same database table.
Solution1: Open the session, give the connection for each database in turn (10 times), and run the workflow each time.
Solution2: Create a session parameter for the source database and give its value in a parameter file.
Session Parameter Type   Naming Convention
Database Connection      $DBConnectionName
Source File              $InputFileName
Target File              $OutputFileName
Lookup File              $LookupFileName
Reject File              $BadFileName

Source file, target file, lookup file, reject file parameters are used for Flat
Files.
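A minimal parameter file for Solution2 could look like this (the folder, session, connection object, and log file names are assumptions):

[Practice.ST:s_m_sample_session]
$DBConnectionSource=Oracle_Conn_DB1
$PMSessionLogFile=D:\Logs\s_m_sample_session.log

Changing the value of $DBConnectionSource between runs switches the session to another source database without editing the session.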

Steps to configure a database connection parameter:


1. Open any session and edit it.
2. In the session properties, click the Mapping tab and click Connections settings
for the sources or targets node.
3. Click the Open button in the Value field.
4. In the Relational Connection Browser, select Use Connection Variable.
5. Enter a name for the database connection parameter. Name the connection
parameter $DBConnectionName.
6. In the General Options settings of the Properties tab, enter a parameter file
and directory in the Parameter Filename field. Click OK.

Steps for Using a Source File Parameter:
1. Select a source under the Sources node on the Mapping tab.
2. Go to the Properties settings.
3. In the Source Filename field, enter the source file parameter name.
4. Name all source file parameters $InputFileName. If you want the parameter to represent both the source file name and location, clear the Source Directory field.
5. In the General Options settings of the Properties tab, enter a parameter file and directory in the Parameter Filename field. Click OK.
Similarly give the parameter for the target and reject files in the Target properties.
For a Lookup file parameter, select the Lookup file under the Transformations node and give the parameter there for the Lookup file name.

4.3 WORKING WITH TASKS


The Workflow Manager contains many types of tasks to help you build workflows and
worklets. We can create reusable tasks in the Task Developer.
Types of tasks:

Task Type     Tool where task can be created           Reusable or not
Session       Task Developer, Workflow Designer,       Yes
              Worklet Designer
Email         Task Developer, Workflow Designer,       Yes
              Worklet Designer
Command       Task Developer, Workflow Designer,       Yes
              Worklet Designer
Event-Raise   Workflow Designer, Worklet Designer      No
Event-Wait    Workflow Designer, Worklet Designer      No
Timer         Workflow Designer, Worklet Designer      No
Decision      Workflow Designer, Worklet Designer      No
Assignment    Workflow Designer, Worklet Designer      No
Control       Workflow Designer, Worklet Designer      No

4.3.1 SESSION TASK

A session is a set of instructions that tells the PowerCenter Server how and
when to move data from sources to targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the
Session tasks sequentially or concurrently, depending on our needs.
The PowerCenter Server creates several files and in-memory caches
depending on the transformations and options used in the session.

4.3.2 EMAIL TASK

The Workflow Manager provides an Email task that allows us to send email during a workflow.
Usually created by the Administrator; we just drag and use it in our workflow.

Steps:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select an Email task and enter a name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email address of the mail recipient in the Email User Name field.
7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.
8. Click the Open button in the Email Text field to open the Email Editor.
9. Click OK twice to save your changes.

Example: To send an email when a session completes:

Steps:
1. Create a workflow wf_sample_email.
2. Drag any session task to the workspace.
3. Edit the Session task and go to the Components tab.
4. See the On Success Email option there and configure it.
5. In Type, select Reusable or Non-reusable.
6. In Value, select the email task to be used.
7. Click Apply -> Ok.
8. Validate workflow and Repository -> Save.

We can also drag the email task and use as per need.
We can set the option to send email on success or failure in
components tab of a session task.

4.3.3 COMMAND TASK


The Command task allows us to specify one or more shell commands in UNIX or
DOS commands in Windows to run during the workflow.
For example, we can specify shell commands in the Command task to delete reject
files, copy a file, or archive target files.
Ways of using command task:
1. Standalone Command task: We can use a Command task anywhere in the
workflow or worklet to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as
the pre- or post-session shell command for a Session task. This is done in
COMPONENTS TAB of a session. We can run it in Pre-Session Command or
Post Session Success Command or Post Session Failure Command.
Select the Value and Type option as we did in Email task.

Example: to copy a file sample.txt from the D drive to E:

Command (Windows): COPY D:\sample.txt E:\
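On UNIX, the equivalent shell command would be something like the following (the paths are assumed):

cp /data/sample.txt /backup/sample.txt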

Steps for creating command task:


1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select Command Task for the task type.
3. Enter a name for the Command task. Click Create. Then click done.
4. Double-click the Command task. Go to commands tab.
5. In the Commands tab, click the Add button to add a command.
6. In the Name field, enter a name for the new command.
7. In the Command field, click the Edit button to open the Command Editor.
8. Enter only one command in the Command Editor.
9. Click OK to close the Command Editor.
10. Repeat steps 5-9 to add more commands in the task.
11. Click OK.

Steps to create the workflow using command task:


1. Create a task using the above steps to copy a file in Task Developer.

2. Open Workflow Designer. Workflow -> Create -> Give name and click ok.
3. Start is displayed. Drag session say s_m_Filter_example and the command task.
4. Link Start to Session task and Session to Command Task.
5. Double click the link between Session and Command and give the condition in the editor as: $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
6. Workflow -> Validate
7. Repository -> Save

4.3.4 WORKING WITH EVENT TASKS


We can define events in the workflow to specify the sequence of task execution.
Types of Events:
Pre-defined event: A pre-defined event is a file-watch event. This event
waits for a specified file to arrive at a given location.
User-defined event: A user-defined event is a sequence of tasks in the
workflow. We create events and then raise them as per need.
Steps for creating a User Defined Event:
1. Open any workflow where we want to create an event.
2. Click Workflow -> Edit -> Events tab.
3. Click the Add button to add events and give the names as per need.
4. Click Apply -> Ok. Validate the workflow and Save it.

Types of Events Tasks:


EVENT RAISE: Event-Raise task represents a user-defined event. We use
this task to raise a user defined event.
EVENT WAIT: Event-Wait task waits for a file watcher event or user defined
event to occur before executing the next session in the workflow.
Example1: Use an event wait task and make sure that session s_filter_example
runs when abc.txt file is present in D:\FILES folder.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click ok.
2. Task -> Create -> Select Event Wait. Give name. Click create and done.
3. Link Start to Event Wait task.
4. Drag s_filter_example to workspace and link it to the event wait task.
5. Right click on the event wait task and click EDIT -> EVENTS tab.
6. Select the Pre Defined option there. In the blank space, give the directory and filename to watch. Example: D:\FILES\abc.txt
7. Workflow validate and Repository Save.

Example 2: Raise a user defined event when session s_m_filter_example succeeds. Capture this event in an event wait task and run session S_M_TOTAL_SAL_EXAMPLE.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click ok.
2. Workflow -> Edit -> Events Tab and add event EVENT1 there.
3. Drag s_m_filter_example and link it to START task.
4. Click Tasks -> Create -> Select EVENT RAISE from list. Give name ER_Example. Click Create and then done.
5. Link ER_Example to s_m_filter_example.
6. Right click ER_Example -> EDIT -> Properties Tab -> Open Value for User
Defined Event and Select EVENT1 from the list displayed. Apply -> OK.
7. Click link between ER_Example and s_m_filter_example and give the
condition $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
8. Click Tasks -> Create -> Select EVENT WAIT from list. Give name EW_WAIT.
Click Create and then done.
9. Link EW_WAIT to START task.
10. Right click EW_WAIT -> EDIT-> EVENTS tab.
11. Select User Defined there. Select the Event1 by clicking Browse Events
button.
12. Apply -> OK.
13. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
14. Mapping -> Validate
15. Repository -> Save.
16. Run workflow and see.

4.3.5 TIMER TASK


The Timer task allows us to specify the period of time to wait before the PowerCenter
Server runs the next task in the workflow.
The Timer task has two types of settings:
Absolute time: We specify the exact date and time, or we can choose a user-defined workflow variable to specify the exact time. The next task in the workflow will run as per the date and time specified.
Relative time: We instruct the PowerCenter Server to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.
Example: Run session s_m_filter_example 1 minute after the Timer task starts (relative time).
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_timer_task_example -> Click ok.
2. Click Tasks -> Create -> Select TIMER from list. Give name TIMER_Example.
Click Create and then done.
3. Link TIMER_Example to START task.
4. Right click TIMER_Example-> EDIT -> TIMER tab.
5. Select Relative Time Option and Give 1 min and Select From start time of this
task Option.
6. Apply -> OK.
7. Drag s_m_filter_example and link it to TIMER_Example.
8. Workflow-> Validate and Repository -> Save.

4.3.6 DECISION TASK

The Decision task allows us to enter a condition that determines the execution
of the workflow, similar to a link condition.
The Decision task has a pre-defined variable called
$Decision_task_name.condition that represents the result of the decision
condition.
The PowerCenter Server evaluates the condition in the Decision task and sets
the pre-defined condition variable to True (1) or False (0).
We can specify one decision condition per Decision task.

Example: The Command Task should run only if either s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE succeeds. If either of them fails, then S_m_sample_mapping_EMP should run.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_decision_task_example -> Click ok.
2. Drag s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE to workspace and
link both of them to START task.
3. Click Tasks -> Create -> Select DECISION from list. Give name
DECISION_Example. Click Create and then done. Link DECISION_Example to
both s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE.
4. Right click DECISION_Example-> EDIT -> GENERAL tab.
5. Set Treat Input Links As to OR. Default is AND. Apply and click OK.
6. Now edit decision task again and go to PROPERTIES Tab. Open the Expression
editor by clicking the VALUE section of Decision Name attribute and enter the
following condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED OR
$S_M_TOTAL_SAL_EXAMPLE.Status = SUCCEEDED
7. Validate the condition -> Click Apply -> OK.
8. Drag command task and S_m_sample_mapping_EMP task to workspace and
link them to DECISION_Example task.
9. Double click link between S_m_sample_mapping_EMP & DECISION_Example
& give the condition: $DECISION_Example.Condition = 0. Validate & click OK.
10. Double click link between Command task and DECISION_Example and give
the condition: $DECISION_Example.Condition = 1. Validate and click OK.
11. Workflow Validate and repository Save.
12. Run workflow and see the result.

4.3.7 CONTROL TASK

We can use the Control task to stop, abort, or fail the top-level workflow or
the parent workflow based on an input link condition.
A parent workflow or worklet is the workflow or worklet that contains the
Control task.
We give the condition to the link connected to Control Task.

Control Option        Description
Fail Me               Fails the control task.
Fail Parent           Marks the status of the WF or worklet that contains
                      the Control task as failed.
Stop Parent           Stops the WF or worklet that contains the Control task.
Abort Parent          Aborts the WF or worklet that contains the Control task.
Fail Top-Level WF     Fails the workflow that is running.
Stop Top-Level WF     Stops the workflow that is running.
Abort Top-Level WF    Aborts the workflow that is running.

Example: Drag any 3 sessions and if anyone fails, then Abort the top level workflow.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_control_task_example -> Click ok.
2. Drag any 3 sessions to workspace and link all of them to START task.
3. Click Tasks -> Create -> Select CONTROL from list. Give name cntr_task.
Click Create and then done.
4. Link all sessions to the control task cntr_task.
5. Double click link between cntr_task and any session say s_m_filter_example
and give the condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED.
6. Repeat above step for remaining 2 sessions also.
7. Right click cntr_task-> EDIT -> GENERAL tab. Set Treat Input Links As to
OR. Default is AND.
8. Go to PROPERTIES tab of cntr_task and select the value Fail top level
Workflow for Control Option. Click Apply and OK.
9. Workflow Validate and repository Save.
10. Run workflow and see the result.

4.3.8 ASSIGNMENT TASK

The Assignment task allows us to assign a value to a user-defined workflow variable.
See the Workflow Variables topic to add user-defined variables.
To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables.
We cannot assign values to pre-defined workflow variables.

Steps to create Assignment Task:


1. Open any workflow where we want to use Assignment task.
2. Edit Workflow and add user defined variables.
3. Choose Tasks-Create. Select Assignment Task for the task type.
4. Enter a name for the Assignment task. Click Create. Then click Done.
5. Double-click the Assignment task to open the Edit Task dialog box.
6. On the Expressions tab, click Add to add an assignment.
7. Click the Open button in the User Defined Variables field.
8. Select the variable for which you want to assign a value. Click OK.
9. Click the Edit button in the Expression field to open the Expression Editor.
10. Enter the value or expression you want to assign.
11. Repeat steps 7-10 to add more variable assignments as necessary.
12. Click OK.

We can use the User Defined Variable in our link conditions as per the need
and also calculate or set the value of variable in Assignment Task.
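For example (the variable name is illustrative), to maintain a run counter, select $$WF_RUN_COUNT as the user-defined variable in the Assignment task and enter the expression:

$$WF_RUN_COUNT + 1

A later link condition such as $$WF_RUN_COUNT > 3 can then branch the workflow based on the counter.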

4.4 SCHEDULERS
We can schedule a workflow to run continuously, repeat at a given time or interval,
or we can manually start a workflow. The Integration Service runs a scheduled
workflow as configured.
By default, the workflow runs on demand. We can change the schedule settings
by editing the scheduler. If we change schedule settings, the Integration Service
reschedules the workflow according to the new settings.

A scheduler is a repository object that contains a set of schedule settings.


Scheduler can be non-reusable or reusable.
The Workflow Manager marks a workflow invalid if we delete the scheduler
associated with the workflow.
If we choose a different Integration Service for the workflow or restart the
Integration Service, it reschedules all workflows.
If we delete a folder, the Integration Service removes workflows from the
schedule.

The Integration Service does not run the workflow if:
The prior workflow run fails.
We remove the workflow from the schedule.
The Integration Service is running in safe mode.

Creating a Reusable Scheduler

For each folder, the Workflow Manager lets us create reusable schedulers so
we can reuse the same set of scheduling settings for workflows in the folder.
Use a reusable scheduler so we do not need to configure the same set of
scheduling settings in each workflow.
When we delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, we must edit them and replace the missing scheduler.

Steps:
1. Open the folder where we want to create the scheduler.
2. In the Workflow Designer, click Workflows > Schedulers.
3. Click Add to add a new scheduler.
4. In the General tab, enter a name for the scheduler.
5. Configure the scheduler settings in the Scheduler tab.
6. Click Apply and OK.

Configuring Scheduler Settings


Configure the Schedule tab of the scheduler to set run options, schedule options,
start options, and end options for the schedule.
There are 3 run options:
1. Run on Demand
2. Run Continuously
3. Run on Server initialization

1. Run on Demand:
Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously:
Integration Service runs the workflow as soon as the service initializes. The
Integration Service then starts the next run of the workflow as soon as it finishes
the previous run.
3. Run on Server initialization
Integration Service runs the workflow as soon as the service is initialized. The
Integration Service then starts the next run of the workflow according to settings
in Schedule Options.
Schedule options for Run on Server initialization:

Run Once: To run the workflow just once.
Run every: Run the workflow at regular intervals, as configured.
Customized Repeat: Integration Service runs the workflow on the dates and times specified in the Repeat dialog box.

Start options for Run on Server initialization:

Start Date
Start Time

End options for Run on Server initialization:
End on: IS stops scheduling the workflow on the selected date.
End After: IS stops scheduling the workflow after the set number of workflow runs.
Forever: IS schedules the workflow as long as the workflow does not fail.

Creating a Non-Reusable Scheduler


1. In the Workflow Designer, open the workflow.
2. Click Workflows > Edit.
3. In the Scheduler tab, choose Non-reusable. Select Reusable if we want to
select an existing reusable scheduler for the workflow.
Note: If we do not have a reusable scheduler in the folder, we must
create one before we choose Reusable.
4. Click the right side of the Scheduler field to edit scheduling settings for the non-reusable scheduler.
5. If we select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.
6. Click Ok.

Some Points:
To remove a workflow from its schedule, right-click the workflow in the
Navigator window and choose Unschedule Workflow.
To reschedule a workflow on its original schedule, right-click the workflow in
the Navigator window and choose Schedule Workflow.

4.5 WORKLETS

A worklet is an object that represents a set of tasks that we create in the Worklet Designer.
Create a worklet when we want to reuse a set of workflow logic in more than one workflow.
To run a worklet, include the worklet in a workflow.
A worklet is created in the same way as we create workflows. Tasks are also added in the same way as in workflows. We can link tasks and give link conditions in the same way.

Worklets can be:

Reusable Worklet: Created in the Worklet Designer.


1. In the Worklet Designer, click Worklet > Create.
2. Enter a name for the worklet.
3. Click OK.
4. Add tasks as needed. Give links and conditions.
5. Worklet -> Validate
6. Repository -> Save
Non-Reusable Worklet: Created in the Workflow Designer.
1. In the Workflow Designer, open a workflow.
2. Click Tasks > Create.
3. For the Task type, select Worklet.
4. Enter a name for the task.
5. Click Create.
6. Click Done.
To add tasks to a non-reusable worklet:
1. Create a non-reusable worklet in the Workflow Designer workspace.
2. Right-click the worklet and choose Open Worklet.
3. Add tasks in the worklet by using the Tasks toolbar or click Tasks > Create in the Worklet Designer.
4. Connect tasks with links.

Some Points:
We cannot run two instances of the same worklet concurrently in the same
workflow.
We cannot run two instances of the same worklet concurrently across two
different workflows.
Each worklet instance in the workflow can run once.

4.6 PARTITIONING

A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.
When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.

A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage.

4.6.1 PARTITIONING ATTRIBUTES


1. Partition points
By default, IS sets partition points at various transformations in the pipeline.
Partition points mark thread boundaries and divide the pipeline into stages.
A stage is a section of a pipeline between any two partition points.

2. Number of Partitions
We can define up to 64 partitions at any partition point in a pipeline.
When we increase or decrease the number of partitions at any partition point,
the Workflow Manager increases or decreases the number of partitions at all
partition points in the pipeline.
Increasing the number of partitions or partition points increases the number
of threads.
The number of partitions we create equals the number of connections to the
source or target. For one partition, one database connection will be used.

3. Partition types
The Integration Service creates a default partition type at each partition
point.
If we have the Partitioning option, we can change the partition type. This
option is purchased separately.
The partition type controls how the Integration Service distributes data
among partitions at partition points.

4.6.2 PARTITIONING TYPES


1. Round Robin Partition Type
In round-robin partitioning, the Integration Service distributes rows of data
evenly to all partitions.
Each partition processes approximately the same number of rows.
Use round-robin partitioning when we need to distribute rows evenly and do
not need to group data among partitions.
2. Pass-Through Partition Type
In pass-through partitioning, the Integration Service processes data without
redistributing rows among partitions.
All rows in a single partition stay in that partition after crossing a pass-through partition point.
Use pass-through partitioning when we want to increase data throughput, but
we do not want to increase the number of partitions.
3. Database Partitioning Partition Type
Use database partitioning for Oracle and IBM DB2 sources and IBM DB2
targets only.
Use any number of pipeline partitions and any number of database partitions.
We can improve performance when the number of pipeline partitions equals
the number of database partitions.
Database Partitioning with One Source
When we use database partitioning with a source qualifier with one source, the
Integration Service generates SQL queries for each database partition and
distributes the data from the database partitions among the session partitions
equally.
For example, when a session has three partitions and the database has five
partitions, 1st and 2nd session partitions will receive data from 2 database
partitions each. Thus four DB partitions used. 3rd Session partition will receive
data from the remaining 1 DB partition.
Partitioning a Source Qualifier with Multiple Source Tables
The Integration Service creates SQL queries for database partitions based on the
number of partitions in the database table with the most partitions.
If the session has three partitions and the database table has two partitions, one
of the session partitions receives no data.
4. Hash Auto-Keys Partition Type
The Integration Service uses all grouped or sorted ports as a compound
partition key.
Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and
unsorted Aggregator transformations to ensure that rows are grouped
properly before they enter these transformations.
5. Hash User-Keys Partition Type
The Integration Service uses a hash function to group rows of data among
partitions.
We define the number of ports to generate the partition key.
We choose the ports that define the partition key

6. Key range Partition Type


We specify one or more ports to form a compound partition key.
The Integration Service passes data to each partition depending on the
ranges we specify for each port.
Use key range partitioning where the sources or targets in the pipeline are
partitioned by key range.
Example: Customer 1-100 in one partition, 101-200 in another and so on. We
define the range for each partition.

4.6.3 Some Points


1. Only one partition is created for following transformations:
Custom Transformation
External Procedure Transformation
XML Target Instance
Joiner Transformation
2. If we partition a session with a flat file target, the Integration Service creates one target file for each partition. We can configure session properties to merge these target files into one.

4.7 SESSION PROPERTIES


1. GENERAL TAB
By default, the General tab appears when we edit a session task.
General Tab has following options:
Rename: Optional and can be used to rename a session.
Description: Optional and provides a description for session.
Mapping name: Required and represents name of the mapping associated
with the session task.
Fail Parent if this task fails: Optional and Fails the parent worklet or
workflow if this task fails.
Fail parent if this task does not run: Optional and Fails the parent worklet
or workflow if this task does not run.
Disable this task: Optional and Disables the task.
Treat the input links as AND or OR: Required and Runs the task when all or one of the input link conditions evaluate to True.
Last 4 options appear only in the Workflow Designer.

2. PROPERTIES TAB
Property                      Required/Optional   Description
Write Backward Compatible     Optional            Select to write session log to a file.
Session Log File
Session Log File Name         Optional            Name of the session log file.
Session Log File Directory    Required            Location where the session log is created.
Parameter File Name           Optional            Name and location of the parameter file.
Enable Test Load              Optional            To test a mapping.
Number of Rows to Test        Optional            Number of rows of source data to test.
$Source Connection Value      Optional            Enter the database connection we want to
                                                  use for the $Source variable.
$Target Connection Value      Optional            Enter the database connection we want to
                                                  use for the $Target variable.
Treat Source Rows As          Required            Indicates how the IS treats all source rows.
                                                  Can be Insert, Update, Delete or Data Driven.
Commit Type                   Required            Determines whether the Integration Service
                                                  uses a source-based, target-based, or
                                                  user-defined commit.
Commit Interval               Required            Indicates the number of rows after which a
                                                  commit is fired.
Recovery Strategy             Required            See the Recovery Strategy section below.

We can configure performance settings on the Properties tab. In Performance settings, we can increase memory size, collect performance details, and set configuration parameters.

RECOVERY STRATEGY
Workflow recovery allows us to continue processing the workflow and workflow tasks
from the point of interruption. We can recover a workflow if the Integration Service
can access the workflow state of operation.
The Integration Service recovers tasks in the workflow based on the recovery
strategy of the task.

By default, the recovery strategy for Session and Command tasks is to fail the
task and continue running the workflow.
We can configure the recovery strategy for Session and Command tasks.
The strategy for all other tasks is to restart the task.

Recovery Strategy Options for session task:


Resume from the last checkpoint.
Restart task.
Fail task and continue workflow.
Recovery Strategy Options for Command task:
Restart task.
Fail task and continue workflow.
Target Recovery Tables
When the Integration Service runs a session that has a resume recovery strategy, it
writes to recovery tables on the target database system. The following recovery
tables are used:
PM_RECOVERY
PM_TGT_RUN_ID
Recovery Options
Suspend Workflow on Error: Available in Workflow
Suspension Email: Available in Workflow
Enable HA Recovery: Available in Workflow
Automatically Recover Terminated Tasks: Available in Workflow
Maximum Automatic Recovery Attempts: Available in Workflow
Recovery Strategy: Available in Session and Command
Fail Task If Any Command Fails: Available in Command

3. CONFIG OBJECT TAB


We can configure the following settings in the Config Object tab:
Advanced. Advanced settings allow you to configure constraint-based
loading, lookup caches, and buffer sizes.
Log Options. Log options allow you to configure how you want to save the
session log.
Error Handling. Error Handling settings allow us to determine if the session
Stops or continues when it encounters pre-session command errors, stored
procedure errors, or a specified number of session errors.
Partitioning Options. Partitioning options allow the Integration Service to
determine the number of partitions to create at run time.
Constraint-based loading:
Enable this when the same mapping has two target tables related to each other by a primary key-foreign key relationship: one is the master table and the other is the child table. When enabled, Informatica writes to the master table first and then to the child table, maintaining referential integrity.
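For example (table names assumed): if target EMP has a foreign key DEPTNO referencing target DEPT, the rows for DEPT are written before the corresponding rows for EMP.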
Dynamic Partitioning:
Can configure dynamic partitioning using one of the following methods:
Disabled
Based on number of partitions
Based on number of nodes in grid
Based on source partitioning

4. MAPPING TAB (Transformations View)


We can configure the following:
Connections
Sources
Targets
Transformations

5. MAPPING TAB (Partitioning View)


We can configure the partitioning options here.

6. COMPONENTS TAB
In the Components tab, we can configure the following:


Pre-Session Command
Post-Session Success Command
Post-Session Failure Command
On Success Email
On Failure Email

4.8 WORKFLOW PROPERTIES


1. GENERAL TAB
Property              Required/Optional   Description
Name                  Required            Name of the workflow.
Comments              Optional            Comment that describes the workflow.
Integration Service   Required            Integration Service that runs the workflow
                                          by default.
Suspension Email      Optional            Email message that the Integration Service
                                          sends when a task fails and the Integration
                                          Service suspends the workflow.
Disabled              Optional            Disables the workflow from the schedule.
Suspend on Error      Optional            The Integration Service suspends the
                                          workflow when a task in the workflow fails.

2. PROPERTIES TAB
Properties tab has the following options:
Parameter File Name
Write Backward Compatible Workflow Log File: Select to write workflow
log to a file. It is Optional.
Workflow Log File Name
Workflow Log File Directory
Save Workflow Log By: Required and Options are By Run and By
Timestamp
Save Workflow Log For These Runs: Required. How many logs needs to
be saved for a workflow.
Enable HA Recovery: Not required.
Automatically recover terminated tasks: Not required.
Maximum automatic recovery attempts: Not required.

3. SCHEDULER TAB
The Scheduler Tab lets us schedule a workflow to run continuously, run at a
given interval, or manually start a workflow.

4. VARIABLE TAB
It is used to declare User defined workflow variables.

5. EVENTS TAB
Before using the Event-Raise task, declare a user-defined event on the Events
tab.