1. Aggregator Transformation
1. What is an Aggregator Transformation?
Answer:
An aggregator is an Active, Connected transformation which performs aggregate
calculations like AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV,
SUM and VARIANCE.
2. How an Expression Transformation differs from Aggregator Transformation?
Answer:
An Expression Transformation performs calculations on a row-by-row basis, whereas an Aggregator Transformation performs calculations on groups.
3. Does an Aggregator Transformation support only aggregate expressions?
Answer:
Apart from aggregate expressions, the Aggregator transformation also supports non-aggregate expressions and conditional clauses.
4. Give one example for each of Conditional Aggregation, Non-Aggregate expression
and Nested Aggregation.
Answer:
- Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
  SUM (SALARY, JOB = 'CLERK')
- Use non-aggregate expressions in group by ports to modify or replace groups.
  IIF (PRODUCT = 'Brown Bread', 'Bread', PRODUCT)
- A nested aggregation expression can include one aggregate function within another aggregate function.
  MAX (COUNT (PRODUCT))
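For readers who think in SQL terms, the conditional aggregate above is roughly equivalent to a filtered SUM. A minimal sketch, assuming an EMP table with SALARY and JOB columns (illustrative names, not from the original example):

SELECT SUM(CASE WHEN JOB = 'CLERK' THEN SALARY ELSE 0 END) AS CLERK_SALARY
FROM EMP;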
5. How does Aggregator Transformation handle NULL values?
Answer:

By default, the Aggregator transformation treats null values as NULL in aggregate functions. However, we can configure the Integration Service to treat null values in aggregate functions as NULL or as zero.
6. What are the performance considerations when working with Aggregator Transformation?
Answer:
- Filter the unnecessary data before aggregating it. Place a Filter transformation in the mapping before the Aggregator transformation to reduce unnecessary aggregation.
- Improve performance by connecting only the necessary input/output ports to subsequent transformations, thereby reducing the size of the data cache.
- Use Sorted Input, which reduces the amount of data cached and improves session performance.
Aggregator performance improves dramatically if records are sorted before being passed to the Aggregator and the "Sorted Input" option under the Aggregator properties is checked. The record set should be sorted on the columns that are used in the Group By operation.
It is often a good idea to sort the record set at the database level, e.g. inside a Source Qualifier transformation, unless there is a chance that the already sorted records from the Source Qualifier can become unsorted again before reaching the Aggregator.
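As a minimal sketch of sorting at the database level, assuming a relational source read through a Source Qualifier and an aggregation grouped on CUSTOMER_ID (the table and column names are illustrative assumptions), the SQL override could simply push the sort down:

SELECT CUSTOMER_ID, SALE_AMOUNT
FROM SALES
ORDER BY CUSTOMER_ID

The ORDER BY columns must match the Group By ports of the downstream Aggregator, in the same order, for the Sorted Input option to be safe.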
7. What are the uses of index and data cache?
Answer:
The group by data is stored in the index files, whereas the row data is stored in the data files.
8. What differs when we choose Sorted Input for Aggregator Transformation?
Answer:
The Integration Service creates the index and data cache files in memory to process the Aggregator transformation. If the Integration Service requires more space than allocated for the index and data cache sizes in the transformation properties, it stores the overflow values in cache files, i.e. it pages to disk.
One way to increase session performance is to increase the index and data cache sizes in the transformation properties.
However, when we check Sorted Input, the Integration Service uses memory to process the Aggregator transformation; it does not use cache files.

9. Under what conditions will selecting Sorted Input in the Aggregator still not boost session performance?
Answer:
- The Incremental Aggregation session option is enabled.
- The aggregate expression contains nested aggregate functions.
- The session property Treat Source Rows As is set to Data Driven.
10.Under what condition selecting Sorted Input in aggregator may fail the session?
Answer:
- If the input data is not sorted correctly, the session will fail.
- Even if the input data is properly sorted, the session may fail if the sort-order ports and the group by ports of the Aggregator are not in the same order.
11. Suppose we do not group by on any ports of the Aggregator. What will be the output?
Answer:
If we do not use any input port either in group by or in an aggregate expression, the Integration Service returns only the last row's value of each column for all the input rows.
For example, if 100 rows come from the source, the Aggregator will output only the last record (the 100th record).
12.What is the expected value if the column in an aggregator transformation is
neither a group by nor an aggregate expression?
Answer:
The Integration Service produces one row for each group based on the group by ports. Columns that are neither part of the group key nor used in an aggregate expression return the corresponding value of the last record of the group received.
However, if we specifically use the FIRST function, the Integration Service returns the value of the first row of the group. So the default behaviour corresponds to the LAST function.
13.What is Incremental Aggregation?

Answer:
We can enable the Incremental Aggregation session option for a session that includes an Aggregator Transformation. When the Integration Service performs incremental aggregation, it passes only the changed source data through the mapping and uses the historical cache data to perform the aggregate calculations incrementally.
14. Sorted input for an Aggregator transformation will improve the performance of a mapping. However, if sorted input is used with a nested aggregate expression or with incremental aggregation, the mapping may result in session failure. Explain why.
Answer:
In the case of a nested aggregation there are multiple levels of sorting, as each aggregation function requires one sorting pass. After the first level of aggregation, the sort order of the group by column may get jumbled, so before the second level of aggregation Informatica must internally sort the data again. However, if we have already indicated that the input is sorted, Informatica will not do this sorting, resulting in failure.
In incremental aggregation, the aggregate calculations are stored in a historical cache on the server, and the data in this historical cache may not be in sorted order. If we give sorted input, the records come presorted for that particular run, but the data in the historical cache may still not be in sorted order.
15. How can we delete duplicate records using the Informatica Aggregator?
Answer:
One way to handle duplicate records in a source batch run is to use an Aggregator Transformation and check the Group By option on the ports having the duplicate data. Here we have the flexibility to select either the last or the first of the duplicate records.
16.Scenario Implementation 1
Suppose in our Source Table we have data as given below:
Student Name | Subject Name     | Marks
Sam          | Maths            | 100
Tom          | Maths            | 80
Sam          | Physical Science | 80
John         | Maths            | 75
Sam          | Life Science     | 70
John         | Life Science     | 100
John         | Physical Science | 85
Tom          | Life Science     | 100
Tom          | Physical Science | 85

We want to load our Target Table as:
Student Name | Maths | Life Science | Physical Science
Sam          | 100   | 70           | 80
John         | 75    | 100          | 85
Tom          | 80    | 100          | 85
Describe your approach.
Answer:
Here our scenario is to convert many rows to one row, and the transformation which will help us to achieve this is the Aggregator.
We will sort the source data on STUDENT_NAME ascending followed by SUBJECT ascending.
Then, with STUDENT_NAME as the GROUP BY port in the Aggregator, the output subject columns are populated as:
- MATHS: MAX( MARKS, SUBJECT = 'Maths' )
- LIFE_SC: MAX( MARKS, SUBJECT = 'Life Science' )
- PHY_SC: MAX( MARKS, SUBJECT = 'Physical Science' )

17.Scenario Implementation 2
Source:
100 | XYZ | AAA
100 | XYZ | BBB
100 | XYZ | CCC
The expected output data is: 100 XYZ AAA BBB CCC. Which transformations are used for this?
Answer:
Use an Expression transformation with variable ports to accumulate the concatenated value, followed by an Aggregator transformation that groups on the key column and returns the final row of each group.
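A minimal sketch of that variable-port logic, assuming the rows arrive sorted by the key column and the three source fields are named COL1, COL2 and COL3 (names are illustrative):

V_CONCAT (variable)   = IIF( COL1 = V_PREV_KEY, V_CONCAT || ' ' || COL3, COL3 )
V_PREV_KEY (variable) = COL1
O_CONCAT (output)     = V_CONCAT

The Aggregator then groups by COL1 and, since it returns the last row of each group by default, outputs 100 XYZ AAA BBB CCC.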

2. Expression Transformation
1. What is an Expression Transform?
Answer:
Expression is a Passive, Connected transformation used to calculate values in a single row before writing to the target. We can use the Expression transformation to perform any non-aggregate calculations. We can also use the Expression transformation to test conditional statements before we output the results to target tables or other transformations.
For example, we might need to adjust employee salaries, concatenate first and last
names, or convert strings to numbers.
2. How many types of ports are there in Expression transform?
Answer:
There are three types of ports: INPUT, OUTPUT and VARIABLE.
3. What is the execution order of the ports in an expression?
Answer:
All ports are evaluated TOP TO BOTTOM in a serial physical ordering, but in the following groups:
- All input ports are pushed values first.
- Then all variable ports are evaluated (top to bottom in physical ordering within the expression).
- Last, all output expressions are evaluated to push values to the output ports.
You can use this to your advantage by placing lookups into variable ports, then using the variables "later" in the execution cycle.
4. Describe the approach for the requirement. Suppose the input is:
Col1 | Col2
10   | a
20   | b
30   | c
40   | (null)
50   | d
The desired output is:
Col1 | Col2
10   | a
20   | a,b
30   | a,b,c
40   | a,b,c
50   | a,b,c,d
Answer: Use an Expression transformation:

Port Name | Port Type | Expression
Col1      | I/O       |
Col2      | I         |
V_Seq     | V         | CUME(1)
V_Col2    | V         | IIF (V_Seq = 1, Col2, IIF (ISNULL (Col2), Prev_Col2, Prev_Col2 || ',' || Col2))
Prev_Col2 | V         | V_Col2
Out_Col2  | O         | Prev_Col2
Keep in mind the string length of the variable and output ports.
The CUME function is used to calculate a cumulative amount based on its argument. This means that if we call CUME with argument 1, e.g. CUME(1), then on the first call it returns 1, on the second call 2, on the third call 3 and so on. Since Informatica processes data row by row, when the first row is processed CUME(1) returns 1, for the next row it returns 2 and so on.
5. How can we implement an aggregation operation without using an Aggregator Transformation in Informatica?
Answer:
We will use a very basic property of the Expression Transformation: at any point we can access the previous row's data as well as the currently processed row in an expression transformation. A simple Sorter, Expression and Filter transformation is all we need to achieve aggregation at the Informatica level.
For detailed understanding visit Aggregation without Aggregator.
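As a minimal sketch of the core mechanism, a running SUM(SAL) per DEPTNO over data already sorted by DEPTNO (port names are illustrative assumptions):

V_SUM (variable)       = IIF( DEPTNO = V_PREV_DEPT, V_SUM + SAL, SAL )
V_PREV_DEPT (variable) = DEPTNO
O_RUNNING_SUM (output) = V_SUM

On the last row of each department O_RUNNING_SUM holds the department total; the downstream Filter is then used to keep only the row that closes each group, as detailed in the referenced article.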
6. Scenario Implementation 1
Source:

Col1 | Col2
A    | W
B    | R
C    | E
A    | R
B    | E

Target:

Col1 | READ | WRITE | EXECUTE
A    | 1    | 1     | 0
B    | 1    | 0     | 1
C    | 0    | 0     | 1
In this scenario the source values W, R and E in Col2 mean read, write and execute respectively.
Answer:
Take an Expression transformation followed by an Aggregator transformation.
In the Expression Transformation:

Port Name | Port Type | Expression
Col1      | I/O       |
Col2      | I/O       |
Read      | O         | IIF ( Col2 = 'R', 1, 0 )
Write     | O         | IIF ( Col2 = 'W', 1, 0 )
Execute   | O         | IIF ( Col2 = 'E', 1, 0 )

In the Aggregator Transformation:

Port Name | Port Type | Expression
Col1      | I/O       | GROUP BY
Read      | I/O       | MAX (Read)
Write     | I/O       | MAX (Write)
Execute   | I/O       | MAX (Execute)
7. Scenario Implementation 2
Source data is like below:

Id | name1 | name2
10 | A     | B
10 | C     | D
20 | E     | F

Desired target data is like below:

Id | name
10 | AB
10 | CD
20 | EF

Answer:
Use an Expression Transformation to concatenate both values: name = name1 || name2
8. Scenario Implementation 3
Suppose we have a field in source file named as DATA. We need to mark those records
having 9 characters such that the first 2 characters must be alphabets i.e.(A-Z)
and the rest 7 characters must be alphanumeric i.e.(A-Z) or (0-9) for the DATA
field as output. And the records which don’t match the condition should be marked
as “Invalid”. How do we implement this? E.g.
DATA        | OUTPUT
AB345GH6756 | AB345GH67
CD56789PJ   | CD56789PJ
56CHJK97889 | Invalid
DG//*67DF   | Invalid
Answer:
Use the below logic in an output port of an Expression Transformation in Informatica:

IIF( REG_MATCH( SUBSTR(DATA, 1, 2), '[[:alpha:]]{2}' ) = 1
     AND REG_MATCH( SUBSTR(DATA, 3, 7), '[[:alnum:]]{7}' ) = 1,
     SUBSTR(DATA, 1, 9), 'Invalid' )
9. Scenario Implementation 4
How do we convert a Date field coming as data type string from a flat file?
Answer:
Use the date conversion functions:

IIF( IS_DATE( Column1, 'YYYYMMDD' ) = 1, TO_DATE( Column1, 'YYYYMMDD' ), NULL )

In the above example we have assumed the format of the date field is 'YYYYMMDD'. If the format is something else (e.g. YYYY-MM-DD), we need to specify that format in both functions.
10.Scenario Implementation 5
Source:

Col1 | Col2
1    | B
2    | C
3    | D
4    | E

Target:

Col1 | Col2 | Col3 | Col4
1    | B    | 2    | C
3    | D    | 4    | E
Describe the approach to the above scenario, where the source's 1st record is loaded to target Col1, Col2, the 2nd record to Col3, Col4, the 3rd record again to Col1, Col2, and so on.
Answer:
Use an Expression transformation:
Port Name   | Port Type | Expression
Col1        | I         |
Col2        | I         |
V_ID        | V         | 1 - MOD (Col1, 2)
V_Prev_Col1 | V         | V_Col1
V_Prev_Col2 | V         | V_Col2
V_Col1      | V         | Col1
V_Col2      | V         | Col2
O_ID        | O         | V_ID
O_Col1      | O         | V_Prev_Col1
O_Col2      | O         | V_Prev_Col2
O_Col3      | O         | Col1
O_Col4      | O         | Col2

(The V_Prev_* variables are placed above V_Col1 and V_Col2 so that, when they are evaluated, they still hold the previous row's values, since all variable ports are evaluated before the output ports.)
Next use a Filter transformation with condition O_ID = 1
Next map O_Col1, O_Col2, O_Col3, O_Col4 to Col1, Col2, Col3, Col4 of the target
respectively.

3. Filter Transformation
1. What is a Filter Transformation and why it is an Active one?
Answer:
A Filter transformation is an Active and Connected transformation that can filter
rows in a mapping.
Only the rows that meet the Filter Condition pass through the Filter transformation to the next transformation in the pipeline. TRUE and FALSE are the implicit return values from any filter condition we set. If the filter condition evaluates to NULL, the row is treated as FALSE. The numeric equivalent of FALSE is zero (0) and any non-zero value is the equivalent of TRUE.
As an ACTIVE transformation, the Filter transformation may change the number of rows passed through it. A filter condition returns TRUE or FALSE for each row that passes through the transformation, depending on whether the row meets the specified condition. Only rows that return TRUE pass through this transformation. Discarded rows do not appear in the session log or reject files.
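As a small illustrative example (the column name is an assumption), a filter condition that explicitly converts NULL to FALSE instead of relying on the implicit behaviour could be written as:

IIF( ISNULL(SALARY), FALSE, SALARY > 30000 )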
2. What is the difference between Source Qualifier transformations Source filter
option and filter transformation?
Answer:
SQ Source Filter | Filter Transformation
Source Qualifier transformation filters rows when they are read from the source. | Filter transformation filters rows from within a mapping.
Source Qualifier transformation can only filter rows from relational sources. | Filter transformation filters rows coming from any type of source system, at the mapping level.
Source Qualifier limits the row set extracted from a source. | Filter transformation limits the row set sent to a target.
Source Qualifier reduces the number of rows used throughout the mapping and hence provides better performance. | To maximize session performance, include the Filter transformation as close to the sources in the mapping as possible, to filter out unwanted data early in the flow of data from sources to targets.
The filter condition in the Source Qualifier transformation only uses standard SQL, as it runs in the database. | The Filter transformation can define a condition using any statement or transformation function that returns either a TRUE or FALSE value.
4. Joiner Transformation
1. What is a Joiner Transformation and why it is an Active one?
Answer:
A Joiner is an Active and Connected transformation used to join two source data streams coming from the same or from heterogeneous databases or files.
The Joiner transformation joins sources with at least one matching column. The
Joiner transformation uses a condition that matches one or more pairs of columns
between the two sources.
In the Joiner transformation, we must configure the transformation properties
namely Join Condition, Join Type and optionally Sorted Input option to improve
Integration Service performance.
The join condition contains ports from both input sources that must match for the
Integration Service to join two rows. Depending on the join condition and the type
of join selected, the Integration Service either adds the row to the result set or
discards the row. Because of this reason, the number of rows in Joiner output may
not be equal to the number of rows in Joiner Input. This is why Joiner is
considered an Active transformation.
2. State the limitations where we cannot use Joiner in the mapping pipeline.
Answer:
The Joiner transformation accepts input from most transformations. However, there are the following limitations:
- A Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy transformation.
- A Joiner transformation cannot be used if a Sequence Generator transformation is connected directly before the Joiner transformation.
3. Out of the two input pipelines of a joiner, which one will we set as the master
pipeline?
Answer:
During a session run, the Integration Service compares each row of the master
source against the detail source. The master and detail sources need to be
configured for optimal performance.
When the Integration Service processes an unsorted Joiner transformation, it blocks the detail source while it caches rows from the master source. Once the Integration Service finishes reading and caching all master rows, it unblocks the detail source and reads the detail rows. This is why, if we designate the source containing fewer input rows as the master, the cache size will be smaller, thereby improving performance.

For a Sorted Joiner transformation, use the source with fewer duplicate key values as the master source for optimal performance and disk storage. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.
Blocking logic is possible only if the master and detail input to the Joiner transformation originate from different sources. Otherwise, the Integration Service does not use blocking logic; instead, it stores more rows in the cache.
4. What are the different types of Joins available in Joiner Transformation?
Answer:
In SQL, a join is a relational operator that combines data from multiple tables
into a single result set. The Joiner transformation is similar to an SQL join
except that data can originate from different types of sources.
The Joiner transformation supports the following types of joins:
- Normal
- Master Outer
- Detail Outer
- Full Outer
A normal or master outer join performs faster than a full outer or detail outer
join.

5. Define the various Join Types of Joiner Transformation.
Answer:
- In a normal join, the Integration Service discards all rows of data from the master and detail source that do not match, based on the join condition.
- A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.
- A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
- A full outer join keeps all rows of data from both the master and detail sources.
6. Describe the impact of number of join conditions and join order in a Joiner.
Answer:
We can define one or more conditions based on equality between the specified master
and detail sources. Both ports in a condition must have the same data type.
If we need to use two ports in the join condition with non-matching data types we
must convert the data types so that they match. The Designer validates data types
in a join condition.
Additional ports in the join condition increase the time necessary to join the two sources.
The order of the ports in the join condition can impact the performance of the Joiner transformation. If we use multiple ports in the join condition, the Integration Service compares the ports in the order we specify.
Only the equality operator is available in the Joiner join condition.
7. How does Joiner transformation treat NULL value matching?
Answer:
The Joiner transformation does not match null values.
For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the
Integration Service does not consider them a match and does not join the two rows.
To join rows with null values, replace null input with default values in the Ports
tab of the joiner, and then join on the default values.
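As an illustrative sketch (the port name and default value are assumptions), the NULL keys could be replaced in an Expression transformation placed before the Joiner in each pipeline:

OUT_EMP_ID = IIF( ISNULL(EMP_ID), -9999, EMP_ID )

Rows with missing keys on both sides then join on the default value -9999.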

If a result set includes fields that do not contain data in either of the sources, the Joiner transformation populates the empty fields with null values. If we know that a field will return a NULL and we do not want to insert NULLs in the target, set a default value on the Ports tab for the corresponding port.
8. When we configure the join condition, what guidelines do we need to follow to maintain the sort order?
Suppose we configure Sorter transformations in the master and detail pipelines with
the following sorted ports in order: ITEM_NO, ITEM_NAME and PRICE.
Answer:
If we have sorted both the master and detail pipelines in the order of the ports ITEM_NO, ITEM_NAME and PRICE, we must ensure that:
- We use ITEM_NO in the first join condition.
- If we add a second join condition, we must use ITEM_NAME.
- If we want to use PRICE as a join condition apart from ITEM_NO, we must also use ITEM_NAME in the second join condition.
- If we skip ITEM_NAME and join on ITEM_NO and PRICE, we will lose the input sort order and the Integration Service fails the session.
9. What are the transformations that cannot be placed between the sort origin and the Joiner transformation, so that we do not lose the input sort order?
Answer:
The best option is to place the Joiner transformation directly after the sort origin to maintain sorted data. Do not place any of the following transformations between the sort origin and the Joiner transformation:
- Custom
- Unsorted Aggregator
- Normalizer
- Rank
- Union transformation
- XML Parser transformation
- XML Generator transformation
- Mapplet (if it contains any of the above transformations)
10. What is the use of sorted input in joiner transformation?
Answer:
It is recommended to join sorted data when possible. We can improve session performance by configuring the Joiner transformation to use sorted input. When we configure the Joiner transformation to use sorted data, it improves performance by minimizing disk input and output. We see the greatest performance improvement when we work with large data sets.
For an unsorted Joiner transformation, designate the source with fewer rows as the master source. For optimal performance and disk storage, designate the master source as the source with the fewer rows. During a session, the Joiner transformation compares each row of the master source against the detail source. The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds up the join process.
11.Can we join two tables based on a join column having different data type?
For example table 1 EMPNO (string) and table 2 EMPNUM (number)
Answer:
Yes possible in this case. If we are using Joiner, we should be able to do this
explicit conversion in an expres-sion transformation before joining the tables.
12. Implementation Scenario 1 - A Joiner transformation is joining two tables s1 and s2. s1 has 10,000 rows and s2 has 1,000 rows. Which table will you set as the master for better performance of the Joiner transformation? Why?
Answer:
Set table s2 as the master table, because the Informatica server has to keep the master table in the cache; keeping 1,000 rows in the cache gives better performance than keeping 10,000 rows in the cache.

5. Lookup Transformation
1. What is a Lookup transform?
Answer:
The Lookup transformation is used to look up data in a flat file, relational table, view, or synonym. The Integration Service queries the lookup source based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup table column values based on the lookup condition. The result is passed to other transformations and the target.
Uses:
- Get a related value
- Perform a calculation
- Update slowly changing dimension tables
2. What are the differences between Connected and Unconnected Lookup?
Answer:
The differences are illustrated in the table below:

Connected Lookup | Unconnected Lookup
Connected lookup participates in the data flow and receives input directly from the pipeline. | Unconnected lookup receives input values from the result of a :LKP expression in another transformation.
Connected lookup can use both dynamic and static cache. | Unconnected lookup cache cannot be dynamic.
Connected lookup can return more than one column value (output port). | Unconnected lookup can return only one column value, i.e. a single return port.
Connected lookup caches all lookup columns. | Unconnected lookup caches only the lookup output ports used in the lookup conditions and the return port.
Supports user-defined default values (i.e. the value to return when the lookup condition is not satisfied). | Does not support user-defined default values.
3. What are the different lookup cache(s)?
Answer:
Informatica Lookups can be cached or un-cached (no cache), and a cached lookup can be either static or dynamic.

A static cache is one which does not modify the cache once it is built and the data
remains same during the session run.
On the other hand, a dynamic cache is refreshed during the session run by inserting
or updating the records in cache based on the incoming source data.
By default, Informatica cache is static cache.
A lookup cache can also be divided as persistent or non-persistent based on whether
Informatica retains the cache even after the completion of session run or deletes
it.
4. Is lookup an active or passive transformation?
Answer:
From Informatica 9x, Lookup transformation can be configured as an "Active"
transformation.
Find out How to configure lookup as active transformation.
However, in the earlier versions of Informatica, lookup is a passive
transformation.
5. What is the difference between Static and Dynamic Lookup Cache?
Answer:
We can configure a Lookup transformation to cache the underlying lookup table. In
case of static or read-only lookup cache the Integration Service caches the lookup
table at the beginning of the session and does not update the lookup cache while it
processes the Lookup transformation. Rows are not added dynamically in the cache.
In the case of a dynamic lookup cache, the Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache is synchronized with the target; it basically caches the rows as and when they are passed.
In case you are wondering why we need to make lookup cache dynamic, read this
article on dynamic lookup.
6. What are the uses of index and data caches?
Answer:
The lookup condition columns are stored in the index cache, and the remaining columns returned from the lookup are stored in the data cache.
7. What is Persistent Lookup Cache?
Answer:

If the cache generated for a Lookup needs to be preserved for subsequent use then
persistent cache is used. It will not delete the index and data files. It is useful
only if the lookup table remains constant.
Lookups are cached by default in Informatica. A lookup cache can be either non-persistent or persistent. The Integration Service saves or deletes the lookup cache files after a successful session run based on whether the Lookup cache is configured as persistent or not.
8. What type of join does Lookup support?
Answer:
A Lookup behaves like a SQL LEFT OUTER JOIN.
9. Explain how lookup transformation works like SQL Left Outer Join.
Answer:
A lookup means that if the source input column value matches the lookup table comparison column value, it returns the matching values from the lookup table, otherwise it returns NULL. Let us consider the EMP table as the source and the DEPT table as the lookup. We want to extract the location of each employee based on his or her department number. If the location details are not available in the DEPT table, we still want all the other information of the employee coming from the source EMP table, with NULL as the location, loaded into our target table. The equivalent SQL query looks like below:

SELECT EMP.*, DEPT.LOC
FROM EMP
LEFT OUTER JOIN DEPT ON EMP.DEPTNO = DEPT.DEPTNO

Hence the Lookup is associated with the source table as a Left Outer Join.
10.Where and why do we use Unconnected Lookup instead of Connected Lookup?
Answer:
The best part of an unconnected lookup is that we can call the lookup based on some condition, not for every row. That is, only when some condition is met do we invoke the unconnected lookup from an expression transformation. In this way we may optimize the performance of a flow.
We may consider an unconnected lookup as a function in a procedural language: it takes multiple parameters as input, returns one value, and can be used repeatedly. In the same way, an unconnected lookup can be used in any scenario where we need to use the lookup repeatedly, either in a single transformation or in multiple transformations. With the unconnected lookup we get the performance benefit of not caching the same data multiple times. It is also a good coding practice.
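As a small illustrative sketch (the lookup name and ports are assumptions), a conditional call from an Expression output port might look like:

IIF( ISNULL(CUST_NAME), :LKP.LKP_GET_CUST_NAME(CUST_ID), CUST_NAME )

Here the lookup is invoked only for rows where the name is missing, instead of for every row.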

11.How can we Identify Persistent Cache Files in Informatica Server?
Answer:
- Cache files are generated in the Cache directory of the Informatica server for transformations like Aggregator, Joiner, Lookup, Rank and Sorter.
- Two types of cache files are generated, i.e. data and index files, the exception being the Sorter transformation.
- Most importantly, Informatica automatically deletes all the generated .dat and .idx cache files after a session run is finished.
- So the files that remain in the Cache directory are basically the Persistent Cache files of Lookup transformations, the Aggregator cache files of Incremental Aggregation sessions, or cache files of session runs that did not complete successfully.
- Informatica-generated cache files are named: PMAGG*.idx, PMAGG*.dat, PMJNR*.idx, PMJNR*.dat, PMLKP*.idx, PMLKP*.dat.
- Often, while handling a big data cache, Informatica creates multiple index and data files due to paging and appends a number to the end of the files, e.g. PMAGG*.dat0, PMAGG*.idx0, PMAGG*.dat1, PMAGG*.idx1.
So if we have followed a particular naming convention for the Lookup Persistent Cache name, e.g. table_name_PC, or the table names have a convention like GDW_, then we can use shell commands accordingly to identify the cache files on the server. In this context you can revisit the Lookup Persistent Cache and Incremental Aggregation article.
12.How to configure a Lookup on a flat file with header?
Answer:
When we create a Lookup transformation, we have the option to select the location of the Lookup Table from any of Source, Target, Source Qualifier, Import from Relational Table or Import from Flat File.
After selecting the flat file as the lookup from the desired location, the Edit Transformation tab of the lookup will have the flat file information to choose between Delimited or Fixed Width, and advanced properties to modify such as Column Delimiters, Code Page and, importantly, Number of initial rows to skip. Set Number of initial rows to skip to 1. Set the Lookup condition as required.
Apart from that, go to the Mapping tab of the corresponding session and select the lookup transformation to configure the Lookup source file directory, the filename and the Lookup source file type, i.e. Direct or Indirect.
13.What is the difference between persistent cache and shared cache?
Answer:
Persistent cache is a type of Informatica lookup cache in which the cache file is stored on disk. We can configure the session to re-cache if necessary. It is used only if we are sure that the lookup table will not change between sessions, mostly when the mapping uses static tables as lookups.

If the persistent cache is shared across mappings, we call it a shared (named) cache. We provide a name for this cache file.
If the lookup table is used in more than one transformation/mapping then the cache
built for the first lookup can be used for the others. It can be used across
mappings.
For a shared cache we have to give the name in the Cache File Name Prefix property. Use the same name in the different lookups where we want to use the cache.
Unshared cache: Within the mapping if the lookup table is used in more than one
transformation then the cache built for the first lookup can be used for the
others. It cannot be used across mappings.
14.Describe how to return multiple port values from unconnected lookup in
Informatica.
Answer:
Informatica Unconnected Lookup by default supports only one return port. As an alternative, we can write a Lookup SQL override with the required port values concatenated into a single string as the return port value.
Call the Unconnected lookup from an Expression transformation and use various output ports to retrieve the lookup values based on the concatenated return value. Use the SUBSTR and INSTR functions to extract the column values from the concatenated return field.
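A minimal sketch of the idea, assuming a department lookup LKP_DEPT keyed on DEPTNO from which we want both DNAME and LOC back (the names and the '~' delimiter are illustrative):

Lookup SQL override return column: DNAME || '~' || LOC
Expression ports:
V_RET (variable)  = :LKP.LKP_DEPT(DEPTNO)
O_DNAME (output)  = SUBSTR( V_RET, 1, INSTR(V_RET, '~') - 1 )
O_LOC (output)    = SUBSTR( V_RET, INSTR(V_RET, '~') + 1 )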
15.How to make the persistent lookup cache in sync with lookup table?
Answer:
To make the persistent cache in sync with the lookup table, simply enable the Re-cache option of the Lookup transformation to rebuild the lookup cache from the lookup table again. While loading the target dimension table we can choose to make the lookup cache dynamic and re-cache persistent, so that once the dimension is loaded the persistent cache file is in sync and available during the fact table load.
16.If we use persistent cache for a dynamic lookup, will the cache file be updated
or inserted as required?
Answer:
Having a persistent cache does not prevent the dynamic cache from doing inserts and updates to the cache file. The only difference is that the cache file gets a proper name assigned using the persistent named cache, and it can be reused later.
17.Is there anything wrong in sharing a persistent cache between static and dynamic
lookup?
Answer:

Static & Dynamic lookup cannot share the same persistent cache.
18.What is the difference between the two update properties - update else insert,
insert else update in dynamic lookup cache?
Answer:
In a Dynamic Cache:
- Update else Insert: in this scenario, if the incoming record already exists in the lookup cache, the record is updated in the cache and in the target; otherwise it is inserted.
- Insert else Update: in this scenario, if the incoming record does not exist in the lookup cache, the record is inserted in the cache and in the target; otherwise it is updated.
These options play a role in performance. If we know the nature of the source data, we can set the update option accordingly. If most of the source data is destined for insert, we select Insert else Update, otherwise we go for Update else Insert. Likewise, if the number of duplicate records coming from the source is large, or there are only a few potential duplicates in the source, we go for Update Else Insert or Insert Else Update respectively, for better performance.
19.If the default value for the lookup return port is not set, what will be the
output when the lookup condition fails?
Answer:
NULL will be returned from lookup transformation on lookup condition failure.
20.How can we ensure data is not duplicated in the target when the source has
duplicate records, using lookup transformation?
Answer:
Using a Dynamic lookup cache we can ensure duplicate records are not inserted in the target. That is, using a Dynamic Lookup Cache on the target table, associating the input ports with the lookup ports and checking the Insert Else Update option will help to eliminate the duplicate records from the source and hence load unique records into the target.
For more details check Dynamic Lookup Cache.

6. Normalizer Transformation
1. What is a Normalizer transformation?
Answer:
The Normalizer transformation normalizes records from COBOL and relational sources, allowing you to organize the data according to your own needs. A Normalizer transformation can appear anywhere in a data flow when you normalize a relational source. Use a Normalizer transformation instead of the Source Qualifier transformation when you normalize a COBOL source. When you drag a COBOL source into the Mapping Designer workspace, the Normalizer transformation appears, creating input and output ports for every column in the source.
2. Scenario Implementation 1
Suppose in our Source Table we have data as given below:

Student Name | Math | Life Science | Physical Science
Sam          | 100  | 70           | 80
John         | 75   | 100          | 85
Tom          | 80   | 100          | 85

We want to load our Target Table as:

Student Name | Subject Name     | Marks
Sam          | Math             | 100
Sam          | Life Science     | 70
Sam          | Physical Science | 80
John         | Math             | 75
John         | Life Science     | 100
John         | Physical Science | 85
Tom          | Math             | 80
Tom          | Life Science     | 100
Tom          | Physical Science | 85

Describe your approach.
Answer:
Here, to convert the columns into rows, we have to use the Normalizer Transformation followed by an Expression Transformation to decode the column taken into consideration. For more details on how the mapping is performed please visit Working with Normalizer.
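A minimal sketch of the decode step, assuming the Normalizer is configured with Occurs = 3 on the marks column so that it emits GCID values 1, 2 and 3 for the three subject columns (port names are illustrative):

SUBJECT_NAME = DECODE( GCID_MARKS, 1, 'Math', 2, 'Life Science', 3, 'Physical Science' )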
3. What are levels in Normalizer transformation?

Answer:
The VSAM Normalizer transformation is the Source Qualifier for a COBOL source definition. A COBOL source can contain multiple-occurring data (a group of columns of the same type) and multiple types of records in the same file; levels are mostly used for that. The Normalizer tab defines the structure of the source data. A group of columns might define a record in a COBOL source, or it might define a group of multiple-occurring fields in the source.
The column level number identifies groups of columns in the data. Level numbers define a data hierarchy. Columns in a group have the same level number and display sequentially below a group-level column. A group-level column has a lower level number, and it contains no data.
4. What is the purpose of GCID and GK in a Normalizer transformation?
Answer:
Let’s take an example:
Source data is:
Name   | FOOD | HOUSERENT | TRANSPORT
Saurav | 1000 | 2000      | 500
Jenny  | 2000 | 2500      | 700
When we set the OCCURS property of the Normalizer to 3, the Normalizer creates 3 input ports to get data from the source. Say the 3 columns FOOD, HOUSERENT and TRANSPORT are connected to the 3 input ports of the Normalizer. Then GCID gets the 3 values 1, 2 and 3, corresponding to the connected input columns FOOD, HOUSERENT and TRANSPORT, and the Normalizer generates 3 output rows for each single source row, one per input column value. On the other hand, GK keeps a sequence value starting from 1 up to the number of source records; it holds the sequence number of the source record being processed.
The table below helps to visualize the output data from the Normalizer, with the GCID and GK fields:

Name   | EXPENSEHEAD | GCID_EXPENSEHEAD | EXPENSE | GK_EXPENSEHEAD
Saurav | FOOD        | 1                | 1000    | 1
Saurav | HOUSERENT   | 2                | 2000    | 1
Saurav | TRANSPORT   | 3                | 500     | 1
Jenny  | FOOD        | 1                | 2000    | 2
Jenny  | HOUSERENT   | 2                | 2500    | 2
Jenny  | TRANSPORT   | 3                | 700     | 2

7. Rank Transformation
1. What is a Rank Transform?
Answer:
Rank is an Active Connected transformation used to select a set of top or bottom
values of data. It basically filters the required number of records from the top or
from the bottom.
2. How does a Rank Transform differ from Aggregator Transform functions MAX and
MIN?
Answer:
Like the Aggregator transformation, the Rank transformation also groups
information. The Rank Transform allows us to select a group of top or bottom
values, not just one value as in case of Aggregator MAX, MIN functions.
3. How does a Rank Cache works?
Answer:
During a session, the Integration Service compares an input row with rows in the
data cache. If the input row out-ranks a cached row, the Integration Service
replaces the cached row with the input row. If we configure the Rank transformation
to rank based on different groups, the Integration Service ranks incrementally for
each group it finds. The Integration Service creates an index cache to store the group information and a data cache for the row data.
4. What is a RANK port and RANKINDEX?
Answer:
The Rank port is an input/output port used to specify the column on which we want to rank the source values. By default Informatica creates an output port RANKINDEX for each Rank transformation. It stores the ranking position for each row in a group.
5. How can you get ranks based on different groups?
Answer:
Rank transformation lets us group information. We can configure one of its
input/output ports as a group by port. For each unique value in the group port, the
transformation creates a group of rows falling within the rank definition (top or
bottom, and a particular number in each rank).

6. What happens if two rank values match?
Answer:
If two rank values match, they receive the same value in the rank index and the
transformation skips the next value.
7. What are the restrictions of Rank Transformation?
Answer:
- We can connect ports from only one transformation to the Rank transformation.
- We can select the top or bottom rank.
- We need to select the number of records in each rank.
- We can designate only one Rank port in a Rank transformation.
8. How does Rank transformation handle string values?
Answer:
Rank transformation can return the strings at the top or the bottom of a session
sort order. When the Integration Service runs in Unicode mode, it sorts character
data in the session using the selected sort order associated with the Code Page of
Integration Service which may be French, German, etc. When the Integration Service
runs in ASCII mode, it ignores this setting and uses a binary sort order to sort
character data.
9. What is Dense Rank and does Informatica supports Dense Rank?
Answer:
When multiple rows share the same rank the next rank in the sequence is not
consecutive. On the other hand DENSE RANK assigns consecutive ranks. Take the
following example: Let’s say we want to see the top 2 highest salary of each
department.
DEPTNO | SAL | RANK | DENSE_RANK
10     | 400 | 1    | 1
10     | 400 | 1    | 1
10     | 300 | 3    | 2
10     | 100 | 4    | 3
20     | 550 | 1    | 1
20     | 550 | 1    | 1
20     | 150 | 3    | 2
30     | 200 | 1    | 1
40     | 600 | 1    | 1
So the normal RANK can generate a result set with missing ranks (here RANK = 2 is missing for department 10) due to multiple records sharing the same rank. On the other hand, DENSE RANK generates all the consecutive ranks.
The Informatica RANK transformation performs a simple RANK, not a DENSE RANK. So using the Informatica RANK transformation we may miss consecutive ranks.
10.How do we achieve DENSE_RANK in Informatica?
Answer:
In order to achieve the DENSE RANK functionality in Informatica we will use a combination of Sorter, Expression and Filter transformations. Based on the previous example data set, let's say we want to get the top 2 highest salaries of each department as per DENSE RANK.
- Use a SORTER transformation: DEPTNO ASC, SAL DESC.
- After the sorter place an EXPRESSION transformation:

PORT_NAME   | TYPE | EXPRESSION
DEPT        | I/O  |
SAL         | I/O  |
V_COMP      | V    | IIF (DEPT <> V_DEPT_PREV, 1, IIF (SAL <> V_SAL_PREV, V_COMP + 1, V_COMP))
RANK        | O    | V_COMP
V_DEPT_PREV | V    | DEPT
V_SAL_PREV  | V    | SAL

- Next use a FILTER transformation. FILTER CONDITION: RANK < 3
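For comparison, the equivalent query at the database level, assuming an Oracle-style EMP table with DEPTNO and SAL columns, would be:

SELECT DEPTNO, SAL
FROM (
    SELECT DEPTNO, SAL,
           DENSE_RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS DR
    FROM EMP
)
WHERE DR <= 2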
11. A source table has 5 rows. The rank in the Rank transformation is set to 10. How many rows will the Rank transformation output?
Answer:
5 rows.
12. How will you load unique records into a target flat file when the source flat file has duplicate data?
Answer:
In the Rank transformation, group the records using the Group By port(s) and then set the number of ranks to 1. The Rank transformation returns one row from each group, and that row will be a unique one.

8. Router Transformation
1. What is the difference between Router and Filter?
Answer:
The following differences can be noted:

Router | Filter
Router transformation divides the incoming records into multiple groups based on some condition. Such groups can be mutually inclusive (different groups may contain the same record). | Filter transformation restricts or blocks the incoming record set based on one given condition.
Router transformation itself does not block any record. If a certain record does not match any of the routing conditions, the record is routed to the default group. | Filter transformation does not have a default group. If a record does not match the filter condition, the record is blocked.
Router acts like a CASE ... WHEN statement in SQL (or a switch() statement in C). | Filter acts like the WHERE condition in SQL.
In a Router, multiple conditions are placed and the rejected rows can be directed to the default group output. | In a Filter transformation, the records are filtered based on the condition and the rejected rows are discarded.
2. What is the minimum number of groups we can declare in a Router transformation?
Answer:
We can define a minimum of 1 group condition in a Router transformation, and it will automatically create another group called DEFAULT to pass those records that do not satisfy the group condition defined.
3. Scenario Implementation 1
Loading Multiple Target Tables Based on Conditions- Suppose we have some serial
numbers in a flat file source. We want to load the serial numbers in two target
files one containing the EVEN serial numbers and the other file having the ODD
ones.
Answer:
After the Source Qualifier place a Router Transformation. Create two groups, namely EVEN and ODD, with the filter conditions:
- MOD(SERIAL_NO, 2) = 0
- MOD(SERIAL_NO, 2) = 1
Then output the two groups into the two flat file targets.

4. Scenario Implementation 2
Suppose we have a source table and we want to load three target tables based on
source rows such that first row moves to first target table, second row in second
target table, third row in third target table, fourth row again in first target
table so on and so forth. Describe your approach.
Answer:
We can clearly understand that we need a Router transformation to route or filter
source data to the three target tables. Now the question is what will be the filter
conditions.
First of all we need an Expression Transformation where we have all the source table columns and, along with them, another port, say SEQ_NUM, which gets a sequence number for each source row from the NEXTVAL port of a Sequence Generator (start value 0, increment by 1).
Now the filter conditions for the three Router groups will be:
- MOD(SEQ_NUM, 3) = 1 connected to the 1st target table
- MOD(SEQ_NUM, 3) = 2 connected to the 2nd target table
- MOD(SEQ_NUM, 3) = 0 connected to the 3rd target table

5. Scenario Implementation 3
How can we distribute and load ‘n’ number of Source records equally into two target
tables, so that each have ‘n/2’ records?
Answer:
- After the Source Qualifier use an Expression transformation.
- In the Expression transformation create a counter: V_COUNTER = V_COUNTER + 1 (variable port) and O_COUNTER = V_COUNTER (output port). This counter variable gets incremented by 1 for every new record that comes in.
- Next use a Router Transformation with two groups: Group_ODD: MOD(O_COUNTER, 2) = 1 and Group_EVEN: MOD(O_COUNTER, 2) = 0.
- Half of the records (all odd-numbered records) will go to Group_ODD and the rest to Group_EVEN.
- Finally connect the two groups to the two target tables.

9. Sequence Generator Transformation
1. What is a Sequence Generator Transformation?
Answer:
A Sequence Generator is a Passive and Connected transformation that generates
numeric values.
It is used to create unique primary key values, replace missing primary keys, or
cycle through a sequential range of numbers.
This transformation by default contains two OUTPUT ports only, namely CURRVAL and NEXTVAL. We cannot edit or delete these ports, nor can we add ports to this unique transformation. We can create approximately two billion unique numeric values, with the widest range being from 1 to 2147483647.
2. Define the Properties available in Sequence Generator transformation in brief.
Answer:
Start Value: Start value of the generated sequence that we want the Integration Service to use if we use the Cycle option. If we select Cycle, the Integration Service cycles back to this value when it reaches the end value. Default is 0.
Increment By: Difference between two consecutive values from the NEXTVAL port. Default is 1.
End Value: Maximum value generated by the Sequence Generator. After reaching this value the session will fail if the Sequence Generator is not configured to cycle. Default is 2147483647.
Current Value: Current value of the sequence. Enter the value we want the Integration Service to use as the first value in the sequence. Default is 1.
Cycle: If selected, when the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.
Number of Cached Values: Number of sequential values the Integration Service caches at a time. Default for a standard Sequence Generator is 0; default for a reusable Sequence Generator is 1,000.
Reset: Restarts the sequence at the current value each time a session runs. This option is disabled for reusable Sequence Generator transformations.

3. Scenario Implementation 1
Suppose we have a source table populating two target tables. We connect the NEXTVAL port of the Sequence Generator to the surrogate keys of both the target tables. Will the surrogate keys in both the target tables be the same? If not, how can we flow the same sequence values into both of them?
Answer:
When we connect the NEXTVAL output port of the Sequence Generator directly to the surrogate key columns of the target tables, the sequence numbers will not be the same.
A block of sequence numbers is sent to one target table's surrogate key column. The second target receives a block of sequence numbers from the Sequence Generator transformation only after the first target table receives its block of sequence numbers.
Suppose we have 5 rows coming from the source; the targets will then have the sequence values as TGT1 (1,2,3,4,5) and TGT2 (6,7,8,9,10) [taking Start Value 0, Current Value 1 and Increment By 1].
Now suppose the requirement is that we need to have the same surrogate keys in both the targets.
Then the easiest way to handle the situation is to put an Expression transformation in between the Sequence Generator and the target tables. The Sequence Generator will pass unique values to the Expression transformation, and then the rows are routed from the Expression transformation to the targets.
4. Scenario Implementation 2
Suppose we have 100 records coming from the source. Now for a target column population we use a Sequence Generator.
Suppose the Current Value is 0 and the End Value of the Sequence Generator is set to 80. What will happen?
Answer:
End Value is the maximum value the Sequence Generator will generate. After it reaches the End Value the session fails with the following error message:

TT_11009 Sequence Generator Transformation: Overflow error.
The session failure can be avoided if the Sequence Generator is configured to Cycle through the sequence, i.e. whenever the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.
5. What are the changes we observe when we promote a non-reusable Sequence
Generator to a reusable one? And what happens if we set the Number of Cached Values
to 0 for a reusable transformation?
Answer:
When we convert a non-reusable Sequence Generator to a reusable one, we observe that the Number of Cached Values is set to 1000 by default and the Reset property is disabled.
When we try to set the Number of Cached Values property of a Reusable Sequence
Generator to 0 in the Transformation Developer we encounter the following error
message:
The number of cached values must be greater than zero for reusable sequence
transformation.
6. How Sequence Generator in the mapping is handled when we migrate the mapping
from one environment to another?
Answer:
While promoting Informatica objects using the Copy Folder Wizard, we have the option to choose to retain existing values or to replace them with values from the source folder. Generally we retain the current values for the Sequence Generator transformation in the destination folder, else we may end up having duplicate values for the sequence-generated column, which may result in session failure.
Below is an Informatica metadata query which lists the current value of the Sequence Generator transformations:
SELECT OPB_SUBJECT.SUBJ_NAME AS "FOLDER NAME",
       OPB_MAPPING.MAPPING_NAME AS "MAPPING NAME",
       REP_WIDGET_INST.INSTANCE_NAME AS "SEQ NAME",
       OPB_WIDGET_ATTR.ATTR_VALUE AS "CURRENT VALUE"
FROM REP_WIDGET_INST
INNER JOIN OPB_MAPPING
    ON (REP_WIDGET_INST.MAPPING_ID = OPB_MAPPING.MAPPING_ID)
INNER JOIN OPB_WIDGET_ATTR
    ON (REP_WIDGET_INST.WIDGET_TYPE = OPB_WIDGET_ATTR.WIDGET_TYPE
    AND REP_WIDGET_INST.WIDGET_ID = OPB_WIDGET_ATTR.WIDGET_ID)
INNER JOIN OPB_SUBJECT
    ON (OPB_MAPPING.SUBJECT_ID = OPB_SUBJECT.SUBJ_ID)
WHERE REP_WIDGET_INST.WIDGET_TYPE_NAME LIKE 'Sequence%'
  AND OPB_WIDGET_ATTR.ATTR_ID = 4 -- Current Value
ORDER BY OPB_MAPPING.MAPPING_NAME

7. Scenario Implementation 3
Consider we have two mappings that populate a single target table from two different source systems. Both the mappings have a Sequence Generator transformation to generate the surrogate key in the target table. How can we ensure that the surrogate keys generated are consistent and do not produce duplicate values when populating data from the two different mappings?
Answer:
We should use a Reusable Sequence Generator in both the mappings to generate the
target surrogate keys.
8. How do I get a Sequence Generator to "pick up" where another "left off"?
Answer:
Use an unconnected lookup on the sequence ID of the target table. Set the lookup policy on multiple match to use the last value; the input port is an ID and the condition is SEQ_ID >= input_ID. Then, in an Expression transformation, set up a variable port and connect a NEW self-resetting Sequence Generator to a new input port of the expression. The variable port's expression should read: IIF( v_seq = 0 OR ISNULL(v_seq), :LKP.lkp_sequence(1), v_seq). Then set up an output port and change its expression to read: v_seq + input_seq (from the resetting Sequence Generator). Thus you have just completed an "append" without a break in sequence numbers.
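A minimal sketch of the expression ports described above (the lookup name lkp_sequence and the port names are assumptions carried over from the answer):

input_seq (input port) : connected to NEXTVAL of the self-resetting Sequence Generator (1, 2, 3, ... each run)
v_seq (variable)       = IIF( v_seq = 0 OR ISNULL(v_seq), :LKP.lkp_sequence(1), v_seq )
o_new_id (output)      = v_seq + input_seq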
10. Stored Procedure Transformation
1. What is a Stored Procedure Transformation?
Answer:
Stored Procedure is a Passive transformation used to execute stored procedures pre-built in the database, through Informatica ETL. It can also be used to call functions that return calculated values.
2. How many types of Stored Procedure transformation are there?
Answer:
There are two types of Stored Procedure transformation based on how they are called: Connected and Unconnected. Based on the execution order they can be classified as Source Pre Load, Source Post Load, Normal, Target Pre Load and Target Post Load.
A Normal Stored Procedure transformation can be configured as both connected and unconnected, whereas the Pre/Post Load Stored Procedures are unconnected ones.
3. How do we call an Unconnected Stored Procedure transformation?
Answer:
The unconnected Stored Procedure transformation is called from an Expression transformation using the syntax :SP.<Stored_Procedure_Name>(Argument1, Argument2).
Conditional execution of a Stored Procedure is possible with an Unconnected Stored Procedure, unlike the connected one.
4. How do we set the Execution order of Pre-Post Load Stored Procedure?
Answer:
We set the execution order using the Stored Procedure Plan from the mapping
property.
5. How do we set the Call Text for Stored Procedure transformation?
Answer:
Once we specify the Stored Procedure Type other than Normal, the Call Text
Attribute in the Properties tab gets enabled. Here we have to specify how the
procedure has to be called along with arguments to be passed. E.g.
<Stored_Procedure_Name>(Argument1, Argument2).

6. How do we receive output/return parameters from Unconnected Stored Procedure?
Answer:
Configure the expression to send any input parameters and capture any output parameters or return value. You must know whether the parameters shown in the Expression Editor are input or output parameters. You insert variables or port names between the parentheses in the exact order that they appear in the stored procedure itself. The datatypes of the ports and variables must match those of the parameters passed to the stored procedure.
For example, when you click the stored procedure, something similar to the
following appears:
:SP.GET_NAME_FROM_ID()
This particular stored procedure requires an integer value as an input parameter
and returns a string value as an output parameter. How the output parameter or
return value is captured depends on the number of output parameters and whether the
return value needs to be captured.
If the stored procedure returns a single output parameter or a return value (but
not both), you should use the reserved variable PROC_RESULT as the output variable.
In the previous example, the expression would appear as:
:SP.GET_NAME_FROM_ID(inID, PROC_RESULT)
InID can be either an input port for the transformation or a variable in the
transformation. The value of PROC_RESULT is applied to the output port for the
expression.
If the stored procedure returns multiple output parameters, you must create variables for each output parameter. For example, if you created a port called varOUTPUT2 for the stored procedure expression and a variable called varOUTPUT1, the expression would appear as:
:SP.GET_NAME_FROM_ID (inID, varOUTPUT1, PROC_RESULT)
The value of the second output port is applied to the output port for the
expression, and the value of the first output port is applied to varOUTPUT1. The
output parameters are returned in the order they are declared in the stored procedure itself.
With all these expressions, the datatypes for the ports and variables must match the datatypes for the input/output variables and return value.
11. Sorter Transformation
1. What is a Sorter Transformation?
Answer:
Sorter is an Active Connected transformation used to sort data in ascending or
descending order according to specified sort keys. The Sorter transformation
contains only input/output ports.
2. Why is Sorter an Active Transformation?
Answer:
This is because we can select the "Distinct" option in the Sorter properties. When the Sorter transformation is configured to treat output rows as distinct, it assigns all ports as part of the sort key, and the Integration Service discards duplicate rows during the sort operation. The number of input rows can therefore differ from the number of output rows, and hence it is an Active transformation.
3. How does Sorter handle Case Sensitive sorting?
Answer:
The Case Sensitive property determines whether the Integration Service considers
case when sorting data. When we enable the Case Sensitive property, the Integration
Service sorts uppercase characters higher than lowercase characters.
4. How does Sorter handle NULL values?
Answer:
We can configure the way the Sorter transformation treats null values. Enable the property Null Treated Low if we want to treat null values as lower than any other value when it performs the sort operation. Disable this option if we want the Integration Service to treat null values as higher than any other value.
5. How does the Sorter Cache work?
Answer:
The Integration Service passes all incoming data into the Sorter cache before the Sorter transformation performs the sort operation.
The Integration Service uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. If it cannot allocate enough memory, the Integration Service fails the session. For best performance, configure the Sorter cache size with a value less than or equal to the amount of available physical RAM on the Integration Service machine.
If the amount of incoming data is greater than the Sorter cache size, the Integration Service temporarily stores data in the Sorter transformation work directory. The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory.
6. How to delete duplicate records or rather to select distinct rows for flat file
sources?
Answer:
Since the source system is a flat file, we will not be able to select the Distinct option in the Source Qualifier, as it is disabled for flat file sources. Hence the approach is to use a Sorter transformation and check its Distinct option. When we select the Distinct option, all the columns are selected as sort keys, in ascending order by default.
12. Union Transformation
1. What is a Union Transformation?
Answer:
Union is an Active, Connected non-blocking multiple input group transformation used
to merge data from multiple pipelines or sources into one pipeline branch. Similar
to the UNION ALL SQL statement, the Union transformation does not remove duplicate
rows.
2. What are the restrictions of Union Transformation?
Answer:
 All input groups and the output group must have matching ports. The precision,
data type, and scale must be identical across all groups.
 We can create multiple input groups, but only one default output group.
 The Union transformation does not remove duplicate rows.
 We cannot use a Sequence Generator or Update Strategy transformation upstream
from a Union transformation.
 The Union transformation does not generate transactions.
3. How come union transformation is active?
Answer:
Active transformations are those that may change the number or position of rows in the data stream. Any transformation that splits or combines data streams or reduces, expands or sorts data is an active transformation, because it cannot be guaranteed that when data passes through the transformation the number of rows and their position in the data stream are always unchanged.
Union is an active transformation because it combines two or more data streams into one. Though the total number of rows passing into the Union is the same as the total number of rows passing out of it, and the sequence of rows from any given input stream is preserved in the output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might not be row number 1 in the output stream. Union does not even guarantee that the output is repeatable.
Also, for the Union the number of input rows per group does not match the number of output rows. Consider two sources with 10 and 20 rows respectively: against each of these input groups we get 30 output rows. We could probably compare this to a Joiner on 10 and 20 rows doing a Full Outer Join with no matching columns, which gives all the rows as output.
It is a debatable topic as to why the Union transformation is Active. The Union transformation is derived from the Multigroup External transformation, and since the Multigroup External transformation is Active, the Union transformation is also termed Active.
13. Update Strategy Transformation
1. What is Update Strategy transform?
Answer:
The Update Strategy transformation flags source rows for insert, update, delete or reject treatment at the targets.
2. What are Update Strategy Constants?
Answer:
 DD_INSERT - 0
 DD_UPDATE - 1
 DD_DELETE - 2
 DD_REJECT - 3
3. How can we update a record in target table without using Update strategy?
Answer:
A target table can also be updated without using an Update Strategy. For this, we need to define the key of the target table at Informatica level and then connect the key and the field we want to update in the mapping target. At the session level, we should set the target property to "Update as Update" and enable the "Update" checkbox.
Let's assume we have a target table "Customer" with fields "Customer ID", "Customer Name" and "Customer Address". Suppose we want to update "Customer Address" without an Update Strategy. Then we have to define "Customer ID" as primary key at Informatica level and connect the Customer ID and Customer Address fields in the mapping. If the session properties are set correctly as described above, the mapping will update the customer address field for all matching customer IDs.
4. What is Data Driven?
Answer:
The Update Strategy flags rows for insert, update, delete or reject at the targets. Treat source rows as Data Driven is the default session property selected when an Update Strategy transformation is used in a mapping; the Integration Service follows the instructions coded in the mapping to flag the rows for insert, update, delete or reject.
5. What happens when DD_UPDATE is defined in update strategy and Treat source rows
as INSERT is selected in Session?
Answer:
If anything other than Data Driven is selected at the session level, the Update Strategy in the mapping is ignored.
6. What are the three areas where the rows can be flagged for particular treatment?
Answer:
 In Mapping – Update Strategy
 In Session - Treat Source Rows As
 In Session - Target Insert / Update / Delete Options.
7. By default, the operation code for any row in Informatica, if not altered, is INSERT. Then when do we need DD_INSERT?
Answer:
When we handle data insertion, updating, deletion and/or rejection in a single
mapping, we use Update Strategy transformation to flag the rows for Insert, Update,
Delete or Reject. We flag it by either providing the values 0, 1, 2, 3 respectively
or by DD_INSERT, DD_UPDATE, DD_DELETE or DD_REJECT in the Update Strategy
transformation. By default the transform has the value '0' and hence it performs
insertion.
Suppose we want to insert or update a target table in a single pipeline. Then we can write the below expression in the Update Strategy transformation to insert or update based on the incoming row:
IIF (LKP_EMPLOYEE_ID IS NULL, DD_INSERT, DD_UPDATE)
If we can use more than one pipeline then it is not a problem; for the insert part we do not even need an Update Strategy transformation explicitly (DD_INSERT), we can map it straight away.
8. What is the difference between update strategy and following update options in
target?
Update as Update / Update as Insert / Update else Insert. Even if we do not use an Update Strategy we can still update the target by setting, for example, Update as Update and treating source rows as Data Driven. So what's the difference here?
Answer:
The operations for the following options will be done in the Database Level.
 Update as Update
 Update as Insert
 Update else Insert
Informatica will issue a SELECT statement on the target table and compare it with the source; accordingly, if the record already exists it will do an update, else it will insert. On the other hand, with the Update Strategy the operations are decided at the Informatica level itself.
The Update Strategy also gives a conditional update option, wherein based on some condition you can update, insert or even reject the rows. Such conditional options are not available in target-based updates (where it will either "update" or perform "update else insert" based on the keys defined at Informatica level).
9. What is the use of Forward Reject rows in Mapping?
Answer:
If DD_REJECT is used in the Update Strategy, then we need to select the Forward Rejected Rows option for the rejected rows to be passed on and written to the reject/bad file.
10.Scenario Implementation 1
Suppose we have a source employee table and we want to load employees who belong to department 10 to Target 1, 20 to Target 2 and 30 to Target 3. Describe the approach without using FILTER or ROUTER transformations.
Answer:
We will use three separate Update Strategy transformations, one before each of the target tables (T1, T2, T3), and provide the below conditions in their expression editors:
UPD_T1: IIF (DEPTNO = 10, DD_INSERT, DD_REJECT)
UPD_T2: IIF (DEPTNO = 20, DD_INSERT, DD_REJECT)
UPD_T3: IIF (DEPTNO = 30, DD_INSERT, DD_REJECT)
14. Java Transformation
1. Scenario Implementation 1
Source:
Col1    Col2
A       3
B       2
C       2
Target:
Col1    Col2
A       3
A       3
A       3
B       2
B       2
C       2
C       2
Answer:
Using an active Java transformation in Informatica we can generate as many output records as required. Assuming input ports Col1, Col2 and output ports Out_Col1, Out_Col2, the code on the On Input Row tab of the Java transformation is along these lines:
// emit the incoming row Col2 times
for (int i = 0; i < Col2; i++) {
    Out_Col1 = Col1;
    Out_Col2 = Col2;
    generateRow();
}
2. Scenario Implementation 2
How can I replace characters e.g. A to Z in a particular string to its ASCII value?
E.g. Input String-AB123C1; Output string-6566123671
Answer:
If the INPUT string is fixed size of 9 characters, Use the below code as expression
in an Output port of an Informatica Expression transformation. Alternatively you
can use Informatica User-Defined Function with the INPUT string as an Argument:
IIF( IS_NUMBER( SUBSTR( INPUT, 1, 1 ) ) = 1, SUBSTR( INPUT, 1, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 1, 1 ) ) ) ) ||
IIF( IS_NUMBER( SUBSTR( INPUT, 2, 1 ) ) = 1, SUBSTR( INPUT, 2, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 2, 1 ) ) ) ) ||
IIF( IS_NUMBER( SUBSTR( INPUT, 3, 1 ) ) = 1, SUBSTR( INPUT, 3, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 3, 1 ) ) ) ) ||
IIF( IS_NUMBER( SUBSTR( INPUT, 4, 1 ) ) = 1, SUBSTR( INPUT, 4, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 4, 1 ) ) ) ) ||
IIF( IS_NUMBER( SUBSTR( INPUT, 5, 1 ) ) = 1, SUBSTR( INPUT, 5, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 5, 1 ) ) ) ) ||
IIF( IS_NUMBER( SUBSTR( INPUT, 6, 1 ) ) = 1, SUBSTR( INPUT, 6, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 6, 1 ) ) ) ) ||
IIF( IS_NUMBER( SUBSTR( INPUT, 7, 1 ) ) = 1, SUBSTR( INPUT, 7, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 7, 1 ) ) ) ) ||
IIF( IS_NUMBER( SUBSTR( INPUT, 8, 1 ) ) = 1, SUBSTR( INPUT, 8, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 8, 1 ) ) ) ) ||
IIF( IS_NUMBER( SUBSTR( INPUT, 9, 1 ) ) = 1, SUBSTR( INPUT, 9, 1 ), TO_CHAR( ASCII( SUBSTR( INPUT, 9, 1 ) ) ) )
As per the requirement we want to convert just the characters in an input string to their ASCII equivalents, not the digits.
If the requirement were to convert a single character to its ASCII equivalent, the built-in ASCII function of Informatica would have been enough, e.g. ASCII(inp_chr). But since this is a string, we need the ASCII equivalent of each character in the string, i.e. we have to parse each character, so the concept of a loop comes into the picture. Hence we use an Informatica Java transformation. Using a passive Java transformation with an input port named INPUT and an output port named OUTPUT, the below Java code goes on the Java Code tab:
String inp = INPUT;
String out = "";
for (int i = 0; i < inp.length(); i++) {
    char c = inp.charAt(i);
    if (!Character.isDigit(c)) {
        int j = (int) c;   // numeric (ASCII) code of the character
        out = out + j;
    } else {
        out = out + c;     // keep digits as they are
    }
}
OUTPUT = out;
15. Source Qualifier Transformation
1. What is a Source Qualifier? What are the tasks we can perform using a Source
Qualifier and why it is an ACTIVE transformation?
Answer:
A Source Qualifier is an Active and Connected transformation that reads the rows
from a relational database or flat file source.
 We can configure the SQ to join [Both INNER as well as OUTER JOIN] data
originating from the same source database.
 We can use a source filter to reduce the number of rows the Integration Service
queries.
 We can specify a number for sorted ports and the Integration Service adds an
ORDER BY clause to the default SQL query.
 We can choose the Select Distinct option for relational databases and the Integration Service adds a SELECT DISTINCT clause to the default SQL query.
 Also we can write a custom/user-defined SQL query which will override the default query in the Source Qualifier, by changing the default settings of the transformation properties for relational databases.
 Also we have the option to write Pre as well as Post SQL statements to be
executed before and after the Source Qualifier query in the source database.
Since the transformation provides the Select Distinct property, the Integration Service can add a SELECT DISTINCT clause to the default SQL query, which in turn affects the number of rows returned by the database to the Integration Service; hence it is an Active transformation.
2. What happens to a mapping if we alter the data types between the Source and its corresponding Source Qualifier?
Answer:
The Source Qualifier transformation displays the Informatica data types. The transformation data types determine how the source database binds data when the Integration Service reads it.
Now if we alter the data types in the Source Qualifier transformation, or the data types in the Source definition and Source Qualifier transformation do not match, the Designer marks the mapping as invalid when we save the mapping.
3. Suppose we have used the Select Distinct and the Number of Sorted Ports property
in the Source Qualifier and then we add Custom SQL Query. Explain what will happen.
Answer:
Whenever we add Custom SQL or SQL override query it overrides the User-Defined Join, Source Filter, Number of Sorted Ports, and Select Distinct settings in the Source Qualifier transformation. Hence only the user defined SQL Query will be fired in the database and all the other options will be ignored.
4. Describe the situations where we will use the Source Filter, Select Distinct and
Number of Sorted Ports properties of Source Qualifier transformation.
Answer:
Source Filter option is used basically to reduce the number of rows the Integration
Service queries, so as to improve performance.
Select Distinct option is used when we want the Integration Service to select
unique values from a source. Filtering out unnecessary data earlier in the data
flow, will improve performance.
Number Of Sorted Ports option is used when we want the source data to be in a sorted fashion, so as to use the same in some following transformations like Aggregator or Joiner, which when configured for sorted input will improve the performance.
5. What will happen if the SELECT list COLUMNS in the Custom override SQL Query and
the OUTPUT PORTS order in Source Qualifier transformation do not match?
Answer:
A mismatch or change in the order of the columns selected in the SQL query override of the Source Qualifier, relative to the connected transformation output ports, may result in unexpected values in the ports if the data types happen to match; otherwise it will lead to session failure.
6. What happens if in the Source Filter property of SQ transformation we include
keyword WHERE say, WHERE CUSTOMERS.CUSTOMER_ID > 1000.
Answer:
We use Source filter to reduce the number of source records. If we include the string WHERE in the source filter, the Integration Service fails the session. In the above case, the correct syntax will be CUSTOMERS.CUSTOMER_ID > 1000
7. Describe the scenarios where we go for Joiner transformation instead of Source
Qualifier transformation.
Answer:
While joining Source Data of heterogeneous sources as well as to join flat files we will use the Joiner transformation. Use the Joiner transformation when we need to join the following types of sources:
 Join data from different Relational Databases.
 Join data from different Flat Files.
 Join relational sources and flat files.
8. What is the maximum number we can use in Number of Sorted Ports for Sybase
source system?
Answer:
Sybase supports a maximum of 16 columns in an ORDER BY clause. So if the source is
Sybase, do not sort more than 16 columns.
9. What is use of Source Qualifier in Informatica? Can we create a mapping without
a source qualifier?
Answer:
The Source Qualifier is used to convert the data types of the heterogeneous source objects supported by Informatica to native Informatica data types, after which Informatica processes the following objects in the mapping with consistent Informatica data types.
Also, for relational tables the Source Qualifier helps to join multiple tables from the same database and also allows Pre or Post SQL operations.
We cannot create a mapping without a Source Qualifier; it is the first transformation in Informatica that is attached to the source table or source flat file instance.
10.Suppose we have two tables of the same database type, residing in different database instances. If a Database Link is available, how can we join the two tables using a Source Qualifier in Informatica, provided there are valid join columns?
Answer:
Source Qualifier override:
SELECT e.empno, e.ename, s.salary, s.comm
FROM emp e, sal@dblinkname s
WHERE e.empno = s.empno
It is advisable to create a Public Synonym at the database for the remote tables so that we can avoid using the syntax TableName@DBLinkName.
11. What is the meaning of “output is deterministic” property in source qualifier transformation?
Answer:
Output is deterministic means we are informing Informatica that the output does not
change (for the same input) across every session run. Why is this required?
Consider the source is relational and we have enabled the session for recovery. The
session fails and we resume the session. In this case, if we have set the source as deterministic, then the session would have created a cache (on disk) of the source during the normal run, to be used for recovery. This saves time during recovery because we need not issue the SQL command to the source database again. If this was not set, then the source data cache is not created during the normal run and the SQL will be reissued during recovery. In some cases, if this property is not set, you will not be able to enable recovery for the session.
12.Scenario Implementation 1
How to delete duplicate rows present in relational database using Informatica?
Suppose we have duplicate records in Source System and we want to load only the
unique records in the Target System eliminating the duplicate rows. What will be
the approach?
Answer:
Assuming that the source system is a Relational Database, to eliminate duplicate
records, we can check the Distinct option of the Source Qualifier of the source
table and load the target accordingly.
16. Miscellaneous
1. What are the new features of Informatica 9.x in developer level?
Answer:
From a developer's perspective, some of the new features in Informatica 9.x are as
follows:
 Now Lookup can be configured as an active transformation - it can return multiple rows on successful match.
 Now you can write SQL override on un-cached lookup also. Previously you could do
it only on cached lookup.
 You can control the size of your session log. In a real-time environment you can
control the session log file size or time.
 Database deadlock resilience feature - this will ensure that your session does not immediately fail if it encounters any database deadlock; it will now retry the operation again. You can configure the number of retry attempts.
 Cache can be updated based on a condition or expression.
 New interface for admin console, now onwards called Informatica Administrator.
(Create connection objects, grant permission on database connections, deploy or
configure deployment units from the Informatica Administrator)
 PowerCenter licensing now onwards based on the number of CPUs and repositories.
2. Name the transformations which convert one row to many rows, i.e. increase the I/P : O/P row count. Also, what is the name of the reverse transformation?
Answer:
Normalizer as well as Router transformations are two Active transformations which can increase the number of output rows relative to input rows.
The Aggregator transformation performs the reverse action of the Normalizer transformation.
3. How many ways we can filter records?
Answer:
 Source Qualifier
 Filter transformation
 Router transformation
 Update strategy
4. What are the transformations that use cache for performance?
Answer:
Aggregator, Sorter, Lookups, Joiner and Rank transformations use cache.
5. What is the formula for calculation of Lookup/Rank/Aggregator index & data
caches?
Answer:
 Index cache size = total no. of rows * size of the columns in the lookup condition (e.g. 50 * 4)
 Aggregator/Rank transformation data cache size = (total no. of rows * size of the columns in the lookup condition) + (total no. of rows * size of the connected output ports)
 Aggregator index cache = #Groups * ((Σ column size) + 7)
 Aggregator data cache = #Groups * ((Σ column size) + 7)
 Lookup index cache = #Rows in lookup table * ((Σ column size) + 16)
 Lookup data cache = #Rows in lookup table * ((Σ column size) + 8)
 Joiner index cache = #Master rows * ((Σ column size) + 16)
 Joiner data cache = #Master rows * ((Σ column size) + 8)
 Rank index cache = #Groups * ((Σ column size) + 7)
 Rank data cache = #Groups * ((#Ranks * (Σ column size + 10)) + 20)
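As a quick worked example (the row counts and byte sizes here are assumed purely for illustration): for an Aggregator with 10,000 groups and group-by columns totalling 20 bytes, the index cache comes to roughly 10,000 * (20 + 7) = 270,000 bytes (about 264 KB); for a Lookup on 50,000 rows with cached columns totalling 100 bytes, the data cache comes to roughly 50,000 * (100 + 8) = 5,400,000 bytes (about 5.2 MB).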
6. What is the difference between Informatica PowerCenter and Exchange and Mart?
Answer:
PowerCenter:
 PowerCenter can have many repositories.
 It supports the Global Repository and networked local repositories.
 PowerCenter can connect to all native legacy source systems such as Mainframe,
ERP, CRM, EAI (TIBCO, MSMQ, JMQ)
 High Availability and Load sharing on multiple servers in the grid.
 Informatica Session level Partitioning is available.
 Informatica Pushdown Optimizer is available.
PowerMart:
 PowerMart supports only one repository.
 PowerMart can connect to Relational and flat file sources.
PowerExchange:
 PowerExchange Client and PowerExchange ODBC are PowerExchange interfaces to
extract and load data for a variety of data types on a variety of platforms
relational, non-relational, and changed data in batch-mode or real-time using
PowerCenter.
 The PowerExchange Client for PowerCenter is installed with PowerCenter and integrates PowerExchange (separate license for the required source system; check Sources -> Import from PowerExchange) and PowerCenter to extract relational, non-relational, and changed data.
7. How do we handle delimiter character as a part of the data in a delimited source
file?
Answer:
For delimited files the delimiter is the separator that identifies the data values of the fields present in the file. So if the data file contains the delimiter character as part of the data in a field value, the field value either remains within double or single quotes, or an escape character precedes the delimiter that is actually to be treated as a normal character.
To handle such flat files in Informatica, use the following options as per the data file format while defining the file structure:
1. Set Optional Quotes to Double or Single Quote. The column delimiters within the quote characters are ignored.
2. Set the Escape Character used to escape the delimiter or quote character. An escape character preceding the delimiter character in an unquoted string, or the quote character in a quoted string, is treated as a regular character.
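For example, in an assumed comma-delimited customer file (shown only to illustrate the two options), both of the following records carry a comma inside the address value:
1001,"Smith John","12 High Street, Apt 4"
1002,Brown Mary,45 Park Road\, Flat 2
The first record relies on optional double quotes, the second on a backslash configured as the escape character.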
8. We have just received source files from UNIX. We want to stage that data to ETL
process. What are the points we need to look for?
Answer:
When a source flat file is loaded to a staging database table, generally we focus
on the below items:
 Define proper file-format for the input file (Delimited/Fixed-width), Code Page
etc.
 Header information having any Processing date to be checked with sysdate or some
other business logic.
 Check the detail records count in the file with the information in the Trailer
information if any.
 Sum of any measure fields of detail records matches with Header/Trailer
information if any.
 In case of Indirect Loading we can add the filename and record number in file as
part of columns in the staging table.
Basically everything depends on your/business requirement.
9. What is the difference between Joiner and Lookup. Performance wise which one is
better to use.
Answer:
Joiner:
 Only “=” operator can be used in join condition
 Supports normal, full outer, left/right outer join
 Active transformation
 No SQL override
 Connected transformation
 No dynamic cache
 Heterogeneous source
Lookup:
 =, <, <=, >, >=, != operators can be used in join condition
 Supports left outer join
 Earlier a Passive transformation, 9.x onwards an Active transformation (can return more than 1 record in case of multiple matches)
 Supports SQL override
 Connected/Unconnected
 Supports dynamic cache update
 Relational/FlatFile source/target
 Pipeline Lookup
Selection between these two transformations is completely dependent on the project requirement. It is a debatable topic to conclude which one of these two serves better in terms of performance.
10.What is the B2B in Informatica? How can we use it in Informatica?
Answer:
B2B allows us to parse and read unstructured data such as PDF, Excel, HTML etc. It has the capability to read binary data such as messages, EBCDIC files etc. and has a very large list of supported formats. B2B Data Transformation Studio is the developer tool in which the parsing (reading) of the unstructured data is defined. B2B mostly gives its output as an XML file. B2B Data Transformation is integrated with Informatica PowerCenter using the "Unstructured Data Transformation"; this transformation can receive the output of B2B Data Transformation Studio and load it into any target supported by PowerCenter.
11. What is CDC, SCD and MD5 in Informatica?
Answer:
 CDC - Changed Data Capture: how only the changed data is captured from the source system.
 SCD - Slowly Changing Dimension: how history data is maintained in the dimension tables.
 MD5 - MD5 checksum encoding: it generates a 32-character hex encoding that can be used to decide the insert/update strategy for target records (see the sketch below).
12.How can we implement an SCD Type2 mapping without using a lookup transformation?
Answer:
The entire implementation will be the same as that using a Lookup. The only thing is that we need to replace the Lookup transformation with a Joiner transformation. In the Joiner transformation the source table will be used as the Master and the target table as the Detail. The join condition will be the same as the lookup condition, with the join type being a Detail Outer Join.
13.How does Joiner and Lookup transformation treat NULL value matching?
Answer:
A NULL value is not equal to another NULL value in the Joiner, whereas the Lookup transformation matches null values.
14.Does Microsoft SQL Server support bulk loading? If yes, what happens when you specify bulk mode and data driven for a SQL Server target?
Answer:
Yes MS SQL Server supports Bulk Loading. But if we select Treat Source Rows as Data
Driven with the Target Load Type as Bulk then the session will fail. We have to
select Normal Load with Data Driven source records.
15.How can you utilize COM components in Informatica?
Answer:
By writing C++, VB or VC++ code in an External Procedure transformation.
16.What is SQL transformation in Informatica?
Answer:
An SQL transformation can process SQL queries midstream in an Informatica pipeline. It supports most DDL, DML, DCL and TCL statements.
For quick reference following are some important notes:-
 We can configure the SQL transformation in two modes, which makes it Active or Passive.
 In Query mode (Active), the transformation fires the SQL query against the database defined in the transformation.
 In Script mode (Passive), one can call external SQL scripts to be executed.
 Query mode can be configured to handle a Static SQL Query (i.e. the SQL query stays the same and uses bind variables) or a Dynamic SQL Query (i.e. different query statements for each input row).
 In case of a Dynamic Query, substituting the entire SQL query through the query port is called a Full Query, whereas substituting only a portion of the query statement is called a Partial Query.
 We can configure the SQL transformation to connect to a database with a Static Connection (i.e. selecting a particular connection object) or a Dynamic Connection (i.e. based on the logic it will dynamically select the connection object to connect to a database).
Also, we can pass the entire database connection information (i.e. username, password, connect string, code page), which is called a Full Database Connection.
17.What is a XML source qualifier?
Answer:
The XML source qualifier represents the data elements that the Informatica server reads when it runs a session with XML sources.
18.What is the “metadata extensions” tab in Informatica?
Answer:
PowerCenter allows end users and partners to extend the metadata stored in the repository by associating additional information with individual objects in the repository. That is why it is called a metadata extension.
For example, when we create a mapping, we can store information like the mapping functionality, business user information, CR information. Similarly for a session we can store schedule information, or the contact person for failed session information. We basically associate this information with repository metadata using metadata extensions.
When we create reusable metadata extensions for a repository object using the Repository Manager, the metadata extension becomes part of the properties of that type of object. For example, we can create a reusable metadata extension for source definitions called SourceCreator. When we create or edit any source definition in the Designer, the SourceCreator extension appears on the Metadata Extensions tab, and anyone who creates or edits a source can enter the name of the person that created the source into this field.
PowerCenter Client applications can contain the following types of metadata
extensions:-
 Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. We can view and change the values of vendor-defined metadata extensions, but we cannot create, delete, or redefine them.
 User-defined. We create user-defined metadata extensions using PowerCenter. We
can create, edit, delete, and view user-defined metadata extensions. We can also
change the values of user-defined extensions.
All metadata extensions exist within a domain. We see the domains when we create,
edit, or view metadata extensions. Vendor-defined metadata extensions exist within
a particular vendor domain. If we use third-
party applications or other Informatica products, we may see domains such as Ariba
or PowerExchange for Siebel. We cannot edit vendor-defined domains or change the
metadata extensions in them. User-defined metadata extensions exist within the User
Defined Metadata Domain. When we create metadata extensions for repository objects,
we add them to this domain.
Both vendor and user-defined metadata extensions can exist for the repository
objects- Source definitions, Target definitions, Transformations, Mappings,
Mapplets, Sessions, Tasks, Workflows, Worklets.
19.Describe some of the ETL Best Practices
Answer:
A lot of best practices may be applicable to one tool and pointless for another. At a very high level, and in a tool-independent way:
 Naming conventions for ETL objects
 Naming conventions for Database objects
 Parameterization of connections (so that things are easy when moving from one environment to another)
 Maintaining of ETL job log - ideally automated maintenance through logging of job
run
 Handling of rejected records (and logging)
 Data reconciliation
 Meta data management- e.g. - maintaining Meta data columns in tables (Use of
Audit columns e.g. load date/ load user/ batch id etc.)
 Error reporting
 ETL job Performance evaluation
 Following generic coding standards
 Documentation
 Decomposing complex logic into multiple ETL stages - load balancing (pushdown optimization wherever applicable) etc.
 Removal of unwanted ports from different transformations used in a mapping
 Using Shortcuts for source, target and lookups
 Using mapplet, worklet as and when required
 Write some comments for every transformation
 Use the DECODE function rather than nested "if then else" (IIF) logic
 Make sure that sorted data is moved into the Aggregator transformation
 If the target table has indexes, loading data into it will decrease performance; in such situations, use pre-SQL to drop the index before loading the data into the target table and, once the data is loaded, re-create the index using post-SQL (see the sketch after this list).
20.Is there a scope of cloud computing in Data warehousing technology?
Answer:
This is not only possible; in fact, this is the way to go for many of the providers
of the modern day BI tools. There are certain advantages and benefits of using
cloud computing for Business Intelligence applications and this is a big topic of
discussion today. I will quickly touch upon a few points that will substantiate the
need of Cloud BI and in the future I will try to make a comprehensive article post
in this website with more details. First, if you see the current state of BI -
there are these typical characteristics:
 High Infrastructure requirement, leading to high upfront investment
 High development cost (needs special talent) as well high maintenance cost
 Unpredictable workload (data volume), and skewed business growth pattern
All these lead to the issues of longer cycle time and limited adoption of BI
solutions. Now cloud platform, as opposed to typical in-house software platform, is
basically an alternative delivery method for the software service. When you deliver
the software or platform or infrastructure (as a service) through cloud, you can
instantly start to get the following benefits:
 Lower entry cost
 Lower maintenance cost (pay as you use)
 Faster deployment
 Reduced risk
 Lower TCO (total cost of ownership)
 Multiple deployment model etc. etc.
Moreover, small and medium enterprises (SMEs) can easily adapt to this model given the typical constraints of a small business. Companies like Pentaho etc. are already "in" with their products in the SaaS (software as a service) model of cloud computing. But cloud models like SaaS have some typical problems (e.g. no flexibility of design, security concerns etc.).
As opposed to the SaaS model, we have another cloud model called PaaS (Platform as a Service), which has the benefit of design flexibility. PaaS is very suitable for custom applications and even enterprise-level BI applications. This cloud service is being offered by almost everyone in the BI market: BusinessObjects, SAS, Microsoft Azure (check here: http://en.wikipedia.org/wiki/SQL_Azure), Vertica, Greenplum etc.
17. Mapping
1. Scenario Implementation 1
Suppose we have a source port called ename with data type varchar(20) and the
corresponding target port as ename with varchar(20). The data type is now altered
to varchar(50) in both source and target database. Describe the changes required to
modify the mapping.
Answer:
Reimport the source and target definitions. Next, open the mapping, right-click on the source port ename and use the "Propagate Attributes" option. This option allows us to change the properties of one port across multiple transformations without manually modifying the port in each and every transformation. We can choose the direction of propagation (forward / backward / both) and can also select the attributes to propagate, e.g. data type, scale, precision etc.
2. What are mapping parameters and variables?
Answer:
A mapping parameter is a user definable constant that takes up a value before
running a session. It can be used in SQ expressions, Expression transformation etc.
A mapping variable is defined similarly to a parameter, except that the value of the variable is subject to change. It picks up its value in the following order:
 From the Session parameter file
 As stored in the repository object in the previous run
 As defined in the initial values in the designer
 Data type Default values
3. Which type of variables or parameters can be declared in parameter file? $, $$,
$$$ - Can all be declared or not.
Answer:
There is a difference between variable and parameter.
 Variable, as the name suggests, is like a variable value which can change within
a session run.
 Parameters are fixed and their values don't change during session run.
 $ - for session level parameters which can be declared in parameter files.
 $$ - for mapping level parameters which can be declared in parameter files.
 $$$- Inbuilt Informatica system variables that cannot be declared in parameter
files
E.g. $$$SessStartTime; these are constant throughout the mapping and cannot be changed.
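A minimal parameter file sketch (the folder, workflow, session and parameter names below are hypothetical; the [Folder.WF:workflow.ST:session] heading style and the $/$$ prefixes follow the usual PowerCenter convention):

[DWH_FOLDER.WF:wf_load_customer.ST:s_m_load_customer]
$DBConnection_SRC=ORA_SRC_DEV
$DBConnection_TGT=ORA_DWH_DEV
$$LoadStartDate=2014-01-01
$PMSessionLogFile=s_m_load_customer.log

Here $DBConnection_* are session-level parameters, $$LoadStartDate is a mapping-level parameter, and $PMSessionLogFile is a built-in session parameter.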
Read this article to get a detailed understanding: http://www.dwbiconcepts.com/etl/14-etl-informatica/74-stop-hardcoding-follow-parameterization-technique.html
4. What are the default values for variables?
Answer:
 String = Null
 Number = 0
 Date = 1/1/1753
5. What does first column of bad file (rejected rows) indicates?
Answer:
 First Column - Row indicator (0, 1, 2, 3)
 Second Column – Column Indicator (D, O, N, T)
6. Out of 100,000 source rows some rows get discarded at the target. How will you trace them, and where do they get loaded?
Answer:
 Rejected records are loaded into bad files. It has record indicator and column
indicator.
 Record indicator identified by (0-insert,1-update,2-delete,3-reject) and
 Column indicator identified by (D-valid,O-overflow,N-null,T-truncated).
 Normally data may get rejected for different reasons, e.g. due to transformation logic.
7. What is Reject loading?
Answer:
During a session, the Informatica server creates a reject file for each target
instance in the mapping. If the writer or the target rejects data, the Informatica
server writes the rejected row into reject file. The reject file and session log
contain information that helps you determine the cause of the reject. You can
correct reject files and load them to relational targets using the Informatica reject load utility. The reject loader also creates another reject file for the data that the writer or target rejects during the reject loading.
Reject Loading
During a session, the server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the server writes the rejected rows into the reject file. You can correct the rejected data and reload it to relational targets using the reject loading utility. (You cannot load rejected data into a flat file target.) Each time you run a session, the server appends the rejected data to the reject file.
Locating the Bad Files
 $PMBadFileDir/Filename.bad
When you run a partitioned session, the server creates a separate reject file for
each partition.
Reading Rejected data
Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00
To help us in finding the reason for rejecting, there are two main things.
 Row indicator - The row indicator tells the writer what to do with the row of wrong data.
Row Indicator   Meaning   Rejected By
0               Insert    Writer or target
1               Update    Writer or target
2               Delete    Writer or target
3               Reject    Writer
If a row indicator is 3, the writer rejected the row because an update strategy
expression marked it for reject.
 Column indicator - Column indicators appear after every column of data and define the type of the data preceding them.
Column Indicator   Meaning      Writer Treats As
D                  Valid data   Good data. The target accepts it unless a database error occurs, such as finding a duplicate key.
O                  Overflow     Bad data.
N                  Null         Bad data.
T                  Truncated    Bad data.
NOTE: NULL columns appear in the reject file with commas marking their column.
Correcting Reject File
Use the reject file and the session log to determine the cause for rejected data.
Keep in mind that correcting the reject file does not necessarily correct the
source of the reject. Correct the mapping and target database to eliminate some of
the rejected data when you run the session again. Trying to correct target-rejected rows before correcting writer-rejected rows is not recommended, since they may contain misleading column indicators. For example, a series of "N" indicators might lead you to believe the target database does not accept NULL values, so you decide to change those NULL values to zero. However, if those rows also had a 3 in the row indicator column, the rows were rejected by the writer because of an update strategy expression, not because of a target database restriction. If you try to load the corrected file to the target, the writer will again reject those rows, and they will contain inaccurate 0 values in place of NULL values.
8. Why Informatica writer thread may reject a record?
Answer:
 Data overflowed column constraints
 An update strategy expression
9. Why target database can reject a record?
Answer:
 Data contains a NULL column
 Database errors, such as key violations
10.Describe various steps for loading reject file?
Answer:
 After correcting the rejected data, rename the rejected file to reject_file.in
 The reject loader uses the data movement mode configured for the server. It also uses the code page of the server/OS. Hence do not change these settings in the middle of reject loading.
 Use the reject loader utility:
pmrejldr pmserver.cfg [folder name] [session name]
11.Variable v1 has its value set as 5 in the Designer (default), 10 in the parameter file, and 15 in the repository. While running the session, which value will Informatica read?
Answer:
Informatica will read the value 10 from the parameter file, since the parameter file takes precedence over the value saved in the repository (see the order of precedence above).
12.What are shortcuts? Where it can be used? What are the advantages?
Answer:
There are two types of shortcuts: local and global. A local shortcut is used within a local repository, and a global shortcut references an object in the global repository. The advantage is reusing an object without creating multiple copies of it. For example, if a source definition needs to be used in 10 mappings in 10 different folders, instead of creating 10 copies of the source we create 10 shortcuts.
13.Can we have an Informatica mapping with two pipelines, where one flow is having
a Transaction Control transformation and another not. Explain why?
Answer:
No it is not possible. Whenever we have a Transaction Control transformation in a
mapping, the session commit type is ‘User Defined’. Whereas for a pipeline without
the Transaction Control transform, the session expects the commit type to be either
Source based or Target based.
Hence we cannot have both the pipelines in a single mapping; rather we have to develop separate mappings for each of the pipelines.
14.How can we implement Reverse Pivoting using Informatica transformations?
Answer:
Pivoting can be done using a Normalizer transformation. For reverse pivoting we will need to use an Aggregator transformation, like below:
From:
Col1    Col2
A       10
B       20
To:
A       B
10      20
This can be done using one Expression transformation and one Aggregator transformation. In the Expression transformation, create two output ports, o_col_a and o_col_b:
o_col_a = IIF (Col1 = 'A', Col2, 0)
o_col_b = IIF (Col1 = 'B', Col2, 0)
Next, in the Aggregator transformation, take the MAX() of o_col_a and o_col_b and map them to the target A and B columns. (We may need to take SUM() instead of MAX() if we have multiple A or B rows.)
15.Is it possible to update a Target table without any key column in target?
Answer:
Yes, it is possible to update the target table, either by defining keys at Informatica level in the Warehouse Designer or by using a target Update Override (a sketch follows below).
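As a minimal sketch of the Update Override approach (table and column names are hypothetical; the :TU. port reference syntax is the standard one for target update overrides), the Update Override attribute of the target could be set to:

UPDATE T_CUSTOMER
SET    CUSTOMER_ADDRESS = :TU.CUSTOMER_ADDRESS
WHERE  CUSTOMER_ID = :TU.CUSTOMER_ID

Here the WHERE clause takes over the role of the missing key column definition, and the session should still be configured to perform updates (e.g. Treat source rows as Update).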
18. Mapplet
1. What is a Mapplet?
Answer:
Mapplets are reusable objects that represent collection of transformations.
2. What is the difference between Reusable transformation and Mapplet?
Answer:
Any Informatica transformation created in the Transformation Developer, or a non-reusable transformation promoted to reusable from the Mapping Designer, which can be used in multiple mappings, is known as a Reusable Transformation. When we add a reusable transformation to a mapping, we actually add an instance of the transformation. Since the instance of a reusable transformation is a pointer to that transformation, when we change the transformation in the Transformation Developer, its instances reflect these changes.
A Mapplet is a reusable object created in the Mapplet Designer which contains a set
of transformations and lets us reuse the transformation logic in multiple mappings.
A Mapplet can contain as many transformations as we need. Like a reusable
transformation when we use a mapplet in a mapping, we use an instance of the
mapplet and any change made to the mapplet in Mapplet Designer, is inherited by all
instances of the mapplet.
3. What are the transformations that are not supported in Mapplet?
Answer:
 Normalizer
 Cobol sources
 XML sources
 XML Source Qualifier
 Target definitions
 Pre- and Post- session Stored Procedures
 Other Mapplet
4. Is it possible to convert reusable transformation to a non-reusable one?
Answer:
Reusable transformations are created in the Transformation Developer. Another way
is to promote a non-reusable transformation in a Mapping/Mapplet to reusable one.
**Converting a non-reusable transformation into a reusable transformation is not reversible. But we can use the reusable transformation as a non-reusable one in any mapping or mapplet by dragging the selected reusable transformation from the Repository Navigator and pressing the Ctrl key just before dropping the object in the Mapplet/Mapping Designer. The same applies to creating a non-reusable session from a reusable one in the Worklet/Workflow Designer.
5. What is the use of Mapplet & Worklet in project?
Answer:
Mapplets and Worklets allow you to create reusable objects and thus make your Informatica code reusable. Just like a procedure or function in a procedural language, we can build a mapplet or worklet to incorporate a piece of business logic, which can be used again and again in different mappings and workflows. A mapplet is created in the PowerCenter Designer and reused in mappings; a worklet is created in the Workflow Manager and reused in workflows.
6. Is it possible to have a mapplet within a mapplet and worklet within a worklet?
Answer:
Informatica does not support mapplet within a mapplet transformation but it
supports worklet within a worklet.
19. Session
1. What is Session and Batches?
Answer:
SESSION - A Session is a set of instructions that tells the Informatica Server / Integration Service how and when to move data from sources to targets. After creating the session, we can use either the Server Manager or the command line program pmcmd to start or stop the session.
BATCHES - Batches provide a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
 SEQUENTIAL - runs the sessions one after the other.
 CONCURRENT - runs the sessions at the same time.
2. What are various session tracing levels?
Answer:
Normal (default) - Logs initialization and status information, errors encountered and skipped rows due to transformation errors; summarizes session results, but not at the row level.
Terse - Logs initialization information, error messages and notification of rejected data.
Verbose Initialization - In addition to normal tracing levels, it also logs
additional initialization information, names of index and data files used and
detailed transformation statistics.
Verbose Data - In addition to verbose initialization, it records row level logs.
3. Can we copy a session to new folder or new repository?
Answer:
Yes we can copy session to new folder or repository, provided the corresponding
Mapping is already in the folder or repository.
4. Is it possible to store all the Informatica session log information in a
database table? Normally the session log is stored as a binary compression .bin
file in SessLogs directory. Can we store the same information in database tables
for future analysis?
Answer:
It is not possible to store all the session log information in a table. Along with error-related information, we may get some other session-related information from metadata repository views like REP_SESS_LOG.
To capture error data, we can configure the session as below. Go to Session -> Config Object -> Error Handling section and give the following settings:
Error Log Type: Relational Database.
Error Log DB Connection: the database connection where we want to store the error tables.
Error Log Table Name Prefix: prefix for the error tables. By default Informatica creates 4 different error tables; if we provide a prefix here, the error tables will be created with that prefix in the database.
Log Row Data: this option is used to log the data at the point where the error happened.
Log Source Row Data: captures the source data for the error record.
Data Column Delimiter: the error data is stored in a single column of the database table; we can specify the delimiter for the source data here.
List of error tables created by Informatica:
PMERR_DATA - stores data and metadata about a transformation row error and its corresponding source row.
PMERR_MSG - stores metadata about an error and the error message.
PMERR_SESS - stores metadata about the session.
PMERR_TRANS - stores metadata about the source and transformation ports, such as name and data type, when a transformation error occurs.
The above tables are specifically used to store information about exception (error) records, e.g. records in the reject file, and we can use this as a base for an error handling strategy. But they do not contain all the information that is present in the session log, like performance details (thread busy percentage), details of the transformations invoked in the session etc. We can also check the contents of the REP_SESS_LOG view under the Informatica repository schema; however, that too does not contain all the information.
5. Can we call a shell script from session properties?
Answer:
The Integration Service can execute shell commands at the beginning or at the end of the session. The Workflow Manager provides the following types of shell commands for each Session task:
 Pre-session command
 Post-session success command
 Post-session failure command
Use any valid UNIX command or shell script for UNIX nodes, or any valid DOS or
batch file for Windows nodes. Configure the session to run the pre- or post-session
shell commands.
6. Can we change the Source and Target table names in Session level?
Answer:
Yes, we can change the source and target table names at the session level. Go to the session and navigate to the Mapping tab, then select the source/target to be changed: for the target, mention the new table name in "Target Table Name"; for the source, set "Source Table Name".
A more suitable method would be to parameterize the source and target table names. We can then run the same mapping concurrently using different parameter files, after enabling concurrent run mode at the workflow level (see the parameterization notes earlier in this document).
7. How to write flat file column names in target?
Answer:
There are two options available in the session properties to take care of this requirement. Go to the Mapping tab, Target properties, and choose the Header Options value as either Output Field Names or Use header command output.
Option 1 will create the output file with a header record whose column heading names are the same as the target transformation port names.
With option 2, we can create our own command to generate the header record text. We can use an 'echo' command here, for example:
echo '"Employee ID"|"Department ID"'
It is recommended using the second option as it gives more flexibility for writing
the column names.
8. What are the ERROR tables present in Informatica?
Answer:
 PMERR_DATA- Stores data and metadata about a transformation row error and its
corresponding source row.
 PMERR_MSG- Stores metadata about an error and the error message.
 PMERR_SESS- Stores metadata about the session.
 PMERR_TRANS- Stores metadata about the source and transformation ports, such as
name and data type, when a transformation error occurs.
9. What are the alternate ways to stop a session without using “STOP ON ERRORS”
option set to 1 in session properties?
Answer:
We can also use the ABORT() or ERROR() functions in an Expression transformation to stop the session or fail rows based on some user-defined conditions (a small example follows).
10. Suppose a session fails after loading of 10,000 records in the target. How can
we load the records from 10,001 when we run the session next time?
Answer:
If we configure the session for Normal load rather than Bulk load, set a Recovery Strategy in the session properties and select the option "Resume from last checkpoint", then we can run the session from the last commit interval.
In this case, if we specify the commit interval as 10,000 and the Integration Service issues a commit after loading 10,000 records, then we can load the records from 10,001 onwards.
If only 9,999 rows were loaded when the session failed, the Integration Service did not issue any commit (as the commit interval in this case is 10,000), so we cannot perform recovery. In that case, truncate the target table and restart the session.
11. Define the types of Commit intervals apart from user defined?
Answer:
The different commit intervals are:
Target-based commit. The Informatica Server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval.
Source-based commit. The Informatica Server commits data based on the number of
source rows. The commit point is the commit interval you configure in the session
properties.
12.Suppose session is configured with commit interval of 10,000 rows and source has
50,000 rows explain the commit points for source based commit & target based
commit. Assume appropriate value wherever required?
Answer:
 Target-based commit (first time the buffer fills at 7,500 rows, next time at 15,000):
commits at 15,000, 22,500, 30,000, 40,000, 50,000
 Source-based commit (does not depend on the rows held in the buffer):
commits at 10,000, 20,000, 30,000, 40,000, 50,000
13.How to capture performance statistics of individual transformation in the
mapping and explain some important statistics that can be captured?
Answer:
Use tracing level Verbose data.
14.How can we parameterize success or failure email list?
Answer:
We can parameterize the email user list and modify the values in the parameter file using $PMSuccessEmailUser and $PMFailureEmailUser. We can also use the pmrep command to update the email task:
updateemailaddr -d <folder_name> -s <session_name> -u <success_email_address> -f <failure_email_address>
15.Is it possible that a session failed but still the workflow status is showing
success?
Answer:
If the workflow completes, it shows a status of Succeeded irrespective of whether any session within the workflow failed. The workflow status has nothing to do with a session failure unless we enable the session General option "Fail parent if this task fails" in the Workflow Designer; only then will the workflow status display as Failed on session failure.
16.What is Busy Percentage?
Answer:
It is the duration of time the thread was occupied compared to the total run time of the mapping.
For example, say we have one writer thread, which is internally responsible for writing data to the target table/file. If the mapping runs for 100 seconds but the time taken to write the data to the target is only 20 seconds (the rest of the time being spent reading and transforming the data), then the busy percentage of the writer thread is 20%.
17.Can we write a PL/SQL block in pre and post session or in target query override?
Answer:
Yes, we can. Remember to always put a backslash (\) before any semicolon (;) used inside the PL/SQL block.
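For example, a minimal pre-session SQL sketch (the table name is illustrative); the escaped semicolons keep the block from being split into separate statements:
BEGIN UPDATE STG_CUSTOMER SET LOAD_FLAG = 'N'\; COMMIT\; END\;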
18.Whenever a session runs does the data gets overwritten in a flat file target? Is
it possible to keep the existing data and add the new data to the target file?
Answer:
Normally, with every session run the target file data is overwritten, unless we select the "Append if Exists" option (available from 8.x onwards) in the target session properties, which appends the new data to the existing data in the flat file target.
19.Can we use the same session to load a target table in different databases having
same target definition?
Answer:
Yes, we can use the same session to load the same target definition in different databases with the help of parameterization, i.e. using different parameter files with different values for the parameterized target connection object $DBConnection_TGT and for the owner/schema via the Table Name Prefix attribute parameterized as $Param_Tgt_Tablename. To run the single workflow with the session and load two different database target tables, we can consider using concurrent workflow instances with different parameter files.
We can even load two instances of the same target connected in the same pipeline; at the session level, use different relational connection objects created for the different databases.
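A parameter file sketch for such a session could look like the following (folder, workflow, session, connection and schema names are illustrative):
[MyFolder.WF:wf_load_customer.ST:s_m_load_customer]
$DBConnection_TGT=Oracle_DWH_Dev
$Param_Tgt_Tablename=DWH_OWNER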
20.How do you remove the cache files after the transformation?
Answer:
After the session completes, the DTM releases the cache memory and deletes the cache files. If we use a persistent cache or incremental aggregation, the cache files are saved.
21.Why doesn't a running session QUIT when Oracle or Sybase return fatal errors?
Answer:
The session will only QUIT when its threshold "Stop on errors" is set to 1. Otherwise the session will continue to run.
22.If we have written a source override query in the Source Qualifier at mapping level but have modified the query in the session-level SQL override, how does the Integration Service behave?
Answer:
The Integration Service treats the session-level query as final during the session run. If the two queries are different, the Integration Service executes the session-level query and ignores the mapping-level query.
20. Workflow
1. What is the difference between STOP and ABORT options in Workflow?
Answer:
When we issue the STOP command on the executing session task, the Integration
Service stops reading data from source. It continues processing, writing and
committing the data to targets. If the Integration Service cannot finish processing
and committing data, we can issue the abort command.
In contrast, the ABORT command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.
We can stop or abort tasks, worklets within a workflow from the Workflow Monitor or
Control task in the workflow or from command task by using pmcmd stop or abort
command. We can also call the ABORT function from mapping level.
When we stop or abort a task, the Integration Service stops processing the task and
any other tasks in the path of the stopped or aborted task. The Integration Service
however continues processing concurrent tasks in the workflow. If the Integration
Service cannot stop the task, we can abort the task.
The Integration Service aborts any workflow if the Repository Service process shuts
down.
2. Running Informatica Workflow continuously – How to run a workflow continuously
until a certain condition is met?
Answer:
We can schedule a workflow to run continuously. A continuous workflow starts as soon as the Integration Service initializes. If we schedule a real-time session to run as a continuous workflow, the Integration Service starts the next run of the workflow as soon as it finishes the first. When the workflow stops, it restarts immediately.
Alternatively, for a normal batch scenario we can create a conditional-continuous workflow as below.
Suppose wf_Bus contains the business session that we want to run continuously until a certain condition is met before it stops, for example the presence of a file or a particular value of a workflow variable.
So modify the workflow as Start-Task followed by Decision Task which evaluates a
condition to be TRUE or FALSE. Based on this condition the workflow will run or
stop.
Next use the Link Task to link the business session for $Decision.Condition=TRUE.
For the other part use a Command Task for $Decision.Condition=FALSE.
In the command task create a command to call a dummy workflow using pmcmd
functionality. e.g. "C:\Informatica\PowerCenter8.6.0\server\bin\pmcmd.exe"
startworkflow -sv IS_info_repo8x -d Domain_hp -u info_repo8x -p info_repo8x -f
WorkFolder wf_dummy
Next create the dummy workflow name it as wf_dummy. Place a Command Task after the
Start Task.
Within the command task put the pmcmd command as
"C:\Informatica\PowerCenter8.6.0\server\bin\pmcmd.exe" startworkflow -sv
IS_info_repo8x -d Domain_sauravhp -u info_repo8x -p info_repo8x -f WorkFolder
wf_Bus
In this way we can manage to run a workflow continuously. So the basic concept is
to use two workflows and make them call each other.
3. How do we send emails from Informatica after the successful completion of one
session? The email will contain the job name/ session start time and session end
time in the mes-sage body.
Answer:
The first thing is to have the "mail" utility configured on the Informatica server (UNIX/Windows).
After that, we can use the Informatica Email task. We create an email task and call it at the session level under "On Success Email". Here we can use the built-in email variables, such as mapping name (%m) and session start time (%b).
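A minimal "On Success Email" body sketch using these variables (assuming %c is the session completion time variable in your PowerCenter version):
Session for mapping %m started at %b and completed at %c.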
4. Scenario Implementation 1
How can we pass a value calculated in a mapping variable to the email message? The email will be sent in HTML format with a predefined message in which one value will be populated from a mapping variable. Suppose the predefined message is:
<html> <body>
The last transaction service ID is: <informatica_variable>
</body> </html>
In the place of <informatica_variable>, the value of the mapping variable at the end of the session will go.
Answer:
We cannot use a mapping variable in Workflow or Session level. It is local to a
mapping. Instead, we have to use a Workflow variable for this purpose. But, we
cannot pass the value of the Mapping Variable to the Workflow variable directly
from your mapping.
1) Write the calculated value to a flat file from the mapping, say "value.txt".
2) Create a shell script, say "mail.sh", to send the mail. Read the value from "value.txt" into a variable in "mail.sh" and use this variable in the body of the mail.
3) Create a Command task at the workflow level and call "mail.sh" from it.
4) Place this Command task downstream of the actual session and link it on the session's success.
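A minimal sketch of such a script (paths, subject and recipient are illustrative), assuming the mapping writes a single value into value.txt:
#!/bin/sh
VAL=`cat /infa_shared/TgtFiles/value.txt`   # value written by the mapping
BODY="<html><body>The last transaction service ID is: ${VAL}</body></html>"
echo "${BODY}" | mailx -s "Last transaction service ID" dw.support@example.com
Depending on the mail utility, additional MIME headers may be needed for the message to render as HTML.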
5. How can we send two separate emails after a successful session run?
Answer:
The problem is that we cannot call two email tasks from one session, i.e. from the session-level "On Success Email". So for the second email we can place another Email task after the session in the workflow and connect them with a link whose condition is status=SUCCEEDED.
6. What is Cold Start in Informatica?
Answer:
In general terms, "Cold Start" means to start a program from the very beginning, without being able to continue the processing that was occurring previously when the system was interrupted.
With respect to Informatica, we can resume a stopped or failed real-time session. To resume a session, we must restart or recover the session. The Integration Service can recover a session automatically if we enabled the session for automatic task recovery. When we restart a session, the Integration Service resumes the session based on the real-time source. Depending on the real-time source, it restarts the session with or without recovery.
We can also restart a task or workflow in cold start mode. When we restart a task or workflow in cold start mode, the Integration Service discards the recovery information and restarts the task or workflow. For example, if a workflow failed in between and we do not want to recover data because we manually cleaned up the data in the impacted target tables, and workflow recovery is enabled, we can opt for a cold start, which skips the recovery step. A cold start removes any recovery data stored when the session failed.
 When we restart a stopped or failed task or workflow that has recovery enabled in
cold start mode, the Integration Service discards the recovery information and
restarts the task or workflow.
 Cold Start Task, Cold Start Workflow or Cold Start Workflow from Task commands
can be executed from the Workflow Manager, Workflow Monitor, or pmcmd command line
programs.
 If we restart a session in cold start mode, targets may receive duplicate rows.
 So avoid cold start and restart the session with recovery to prevent data
duplication.
 So if recovery is not enabled in a session, then there is no difference between
cold start and restart.
7. Scenario Implementation 2
Email - We have a list of 10 people to be emailed after session failure. Can we edit the email list dynamically, i.e. add or delete email IDs without touching the mapping?
Answer:
We can parameterize the email user list and modify the values in the parameter file using $PMSuccessEmailUser and $PMFailureEmailUser. We can also use the pmrep command to update the email task:
updateemailaddr -d <folder_name> -s <session_name> -u <success_email_address> -f <failure_email_address>
Alternatively, we can create a distribution list (DL) and use that DL in the session failure email. Whatever email addresses are listed in the DL will receive the mail, and we can later add or remove addresses in the DL depending on the requirement.
8. We know there are 3 options for Session recovery strategy - Restart task, Fail
task and continue running the workflow, Resume from last checkpoint whenever a
session fails. How do we restart a workflow automatically without any manual
intervention in the event of session failure?
Answer:
Select the “Automatically recover terminated tasks” option in the workflow properties. We can also specify the maximum number of automatic attempts in the workflow property “Maximum automatic recovery attempts”.
9. What is the difference Real-time and continuous workflows?
Answer:
A real-time workflow is triggered by a real-time source, such as an XML message, whereas a continuous workflow is any workflow that runs continuously, for example by using two workflows and command-line (pmcmd) calls to start each other.
11.Scenario Implementation 3
Suppose we have two workflows workflow 1 (wf1) having two sessions (s1, s2) and
workflow 2 (wf2) having three sessions (s3, s4, s5) in the same folder, like below
wf1: s1, s2 wf2: s3, s4, s5
How can we run s1 first then s3 after that s2 next s4 and s5 without using pmcmd
command or unix script?
Answer:
Use a Command task or post-session command to create a touch file and use an Event Wait task to wait for that file (Filewatch Name). The combination of Command task and Event Wait task will solve the problem.
WF1 -----> S1 ------> CMD1 -----> EW2 ------> S2 -------> CMD3
WF2 -----> EW1 ---> S3 ---------> CMD2 -----> EW3 ----> S4 ------> S5
Run both workflows. Session s1 starts and, after successful execution, calls command task cmd1, which generates a touch file, say s3.txt. Execution then passes to event wait ew2. At the same time, event wait ew1 in wf2 is released once the file s3.txt is present, and session s3 starts. After the success of session s3, control passes to command task cmd2, which in turn generates a touch file, say s2.txt, and passes control to event wait task ew3. Event wait ew2 is released on receiving the wait file s2.txt and passes control to session s2. After completion of session s2, it triggers command task cmd3, which generates a wait file s4.txt, and workflow wf1 ends. On the other side, event wait ew3 is released once the wait file s4.txt is in place and calls session s4, which in turn, after success, triggers the last session s5, and workflow wf2 completes.
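The command inside each Command task is simply a touch of the agreed file, for example (the path is illustrative):
touch /infa_shared/FileWatch/s3.txt
The corresponding Event Wait task's Filewatch Name then points to the same path.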
12.How do we send a session failure mail with the workflow or session log as
attachment?
Answer:
Design an Informatica Email task to send the communication in the event of session failure and use the email variable %g to attach the corresponding session log.
Email variables: %g attaches the session log; %a<filename> attaches any file, where the absolute path must be given inside the angle brackets.
13.Explain deadlock in Informatica and how do we resolve it?
Answer:
At the database level, a deadlock normally occurs when two concurrent user sessions try to apply a DML command to the same row in a table. For example, the query below is executed by user1 in session1:
update emp set deptno=20 where deptno=10;
Before user1 commits the transaction, if user2 from session2 executes the query below, it can cause a deadlock error:
update emp set deptno=30 where deptno=10;
In Informatica, a deadlock normally occurs when two sessions are updating or deleting records from the same table in parallel (parallel inserts are not a problem). One option to avoid deadlock is to identify those sessions and make them sequential. Another option is to use the session-level deadlock handling properties such as the deadlock retry limit and deadlock recovery options.
14.Scenario Implementation 4
Busy Percentage is given by (run time - idle time) * 100 / run time. A thread with 0 idle time has a higher Busy Percentage. So do we need to tune that thread component, and why? In other words, should we tune the thread whose busy percentage (BP) is higher, or the one having more idle time?
Answer:
Three persons are asked to run 1 mile each, and each is allotted 20 minutes. The first person completes the mile in 5 minutes and stands idle for the other 15 minutes of his allotted time. The second completes it in 10 minutes and sits idle for the remaining 10 minutes. The last one takes all 20 minutes and is idle for 0 minutes. Who is the worst performer? Isn't it the last person, who had no idle time? It is the same for a thread with 0 idle time: the thread with the highest busy percentage is the one to tune.
15.How can we pass a value from one workflow to another?
Answer:
In the first workflow, develop a mapping that generates a parameter file containing the desired value as a workflow variable; the next workflow reads this parameter file, and the value is passed from the workflow variable to the session in the pre-session variable assignment and from there to a mapping parameter.
Alternatively, develop the mapping to store the value in a flat file or database table. In the next workflow, another mapping reads that value and, if required, pushes it up to the workflow level through the post-session variable assignment.
21. Administration
1. What is Load Manager?
Answer:
The Load Manager performs the following tasks:
 Manages session and batch scheduling.
 Locks the session and reads session properties.
 Reads the parameter file.
 Expands the server and session variables and parameters.
 Verifies permissions and privileges.
 Validates source and target code pages.
 Creates the session log file.
 Creates the Data Transformation Manager (DTM), which executes the session.
2. What is DTM process? How many threads it creates to process data, explain each
thread in brief?
Answer:
After the Load Manager performs validations for the session, it creates the DTM process. The DTM process is the second process associated with the session run. The primary purpose of the DTM process is to create and manage the threads that carry out the session tasks. The DTM allocates process memory for the session and divides it into buffers; this is also known as buffer memory. It creates the main thread, called the master thread, which creates and manages all other threads. If we partition a session, the DTM creates a set of threads for each partition to allow concurrent processing. When the Informatica server writes messages to the session log, it includes the thread type and thread ID. Following are the types of threads that the DTM creates:
 MASTER THREAD - Main thread of the DTM process. Creates and manages all other
threads.
 MAPPING THREAD - One Thread to Each Session. Fetches Session and Mapping
Information.
 Pre and Post Session Thread - One Thread Each To Perform Pre and Post Session
Operations.
 READER THREAD - One thread for each partition for each source pipeline.
 WRITER THREAD - One thread for each partition, if a target exists in the source pipeline, to write to the target.
 TRANSFORMATION THREAD - One or more transformation threads for each partition.
3. Can you create a folder within designer?
Answer:
Not possible
4. How do you take care of security using a repository manager?
Answer:
 Using repository privileges, folder permissions and locking.
 Repository privileges (Session operator, Use designer, Browse repository, Create sessions and batches, Administer repository, Administer server, Super user)
 Folder permissions (owner, group, users)
 Locking (read, write, execute, fetch, save)
5. What are the different uses of a repository manager?
Answer:
The Repository Manager is used to create the repository, which contains the metadata Informatica uses to transform data from source to target. It is also used to create Informatica users and folders, and to copy, back up and restore the repository.
6. What are 2 modes of data movement in Informatica Server?
Answer:
The data movement mode depends on whether Informatica Server should process single
byte or multi-byte character data. This mode selection can affect the enforcement
of code page relationships and code page validation in the Informatica Client and
Server.
 Unicode – the IS allows 2 bytes for each character and uses an additional byte for each non-ASCII character (such as Japanese characters)
 ASCII – the IS holds all data in a single byte
The IS data movement mode can be changed in the Informatica Server configuration
parameters. This comes into effect once you restart the Informatica Server.
7. What is Code Page used for?
Answer:
A code page contains the encoding to specify characters in a set of one or more
languages. An encoding is the assignment of a number to a character in the
character set. Code Page is used to identify characters that might be in different
languages. If we are importing Japanese data into a mapping, then we must select the Japanese code page for the source data.
8. What is Code Page Compatibility?
Answer:
Compatibility between code pages is used for accurate data movement when the
Informatica Sever runs in the Unicode data movement mode. If the code pages are
identical, then there will not be any data loss. One code page can be a subset or
superset of another. For accurate data movement, the target code page must be a
superset of the source code page.
Superset - A code page is a superset of another code page when it contains the
character encoded in the other code page. It also contains additional characters
not contained in the other code page.
Subset - A code page is a subset of another code page when all characters in the
code page are encoded in the other code page.
9. What is default block buffer size?
Answer: 64K
10.What is default LM shared memory size?
Answer: 2MB
11.Define Server Concepts with respect to memory buffers
Answer:
The Informatica server uses three system resources: CPU, shared memory and buffer memory. It uses shared memory, buffer memory and cache memory for session information and to move data between session threads.
LM Shared Memory - The Load Manager uses both process and shared memory. The LM keeps the list of sessions and batches, and the schedule queue, in process memory. Once a session starts, the LM uses shared memory to store session details for the duration of the session run or session schedule. This shared memory appears as the configurable parameter (LMSharedMemory) and the server allots 2,000,000 bytes by default. This allows you to schedule or run approximately 10 sessions at one time.
DTM Buffer Memory - The DTM process allocates buffer memory to the session based on the DTM buffer pool size setting in the session properties. By default, it allocates 12,000,000 bytes of memory to the session. The DTM divides the memory into buffer blocks as configured in the buffer block size setting (default: 64,000 bytes per block).
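With these defaults, that works out to roughly 12,000,000 / 64,000 ≈ 187 buffer blocks available to the session.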
12.What are the two programs that communicate with the Informatica Server?
Answer:
Informatica provides Server Manager and pmcmd programs to communicate with the
Informatica Server:
Server Manager - A client application used to create and manage sessions and
batches, and to monitor and stop the Informatica Server. You can use information
provided through the Server Manager to troubleshoot sessions and improve session
performance.
pmcmd - A command-line program that allows you to start and stop sessions and
batches, stop the Informatica Server, and verify if the Informatica Server is
running.
22. Command Line Arguments
1. What is pmcmd commands?
Answer:
pmcmd is a command-line program used to communicate with the Informatica server. It does not replace the Server Manager, since there are many tasks that can be performed only with the Server Manager.
Some operations that can be done using pmcmd are starting, stopping and aborting sessions.
2. What is pmrep commands?
Answer:
You can use pmrep to create or delete repository users and groups. You can also use pmrep to modify repository privileges assigned to users and groups.
3. How do we start & stop session from pmcmd command line?
Answer:
Use the following syntax to ping the Informatica Server on a UNIX system:
pmcmd ping [{user_name | %user_env_var} {password | %password_env_var}]
[hostname:]portno
Use the following syntax to start a session or batch on a UNIX system:
pmcmd start {user_name | %user_env_var} {password | %password_env_var}
[hostname:]portno [folder_name:]{session_name | batch_name} [:pf=param_file]
session_flag wait_flag
Use the following syntax to stop a session or batch on a UNIX system:
pmcmd stop {user_name | %user_env_var} {password | %password_env_var}
[hostname:]portno[folder_name:]{session_name | batch_name} session_flag
Use the following syntax to stop the Informatica Server on a UNIX system:
pmcmd stopserver {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno
23. Metadata Repository
1. Is there any metadata query to find the list of Informatica folder name,
workflow names which are migrated in a particular Quarter?
Answer:
The below SQL will give you the list of folders, workflows and their last saved date:
SELECT W.SUBJECT_AREA FOLDER_NAME, W.WORKFLOW_NAME, W.WORKFLOW_LAST_SAVED
FROM REP_WORKFLOWS W
ORDER BY TO_DATE(W.WORKFLOW_LAST_SAVED, 'MM/DD/YYYY HH24:MI:SS') DESC
2. How can I run metadata queries in Informatica PowerCenter?
Answer:
Informatica metadata is stored in a database repository. This can be the same database where we have our source/staging/target tables or a completely different database (which is the general case). We can execute user-defined metadata queries only against this repository database. We may need to ask the Informatica administrator for the database login credentials, since we need a read-access username/password for that database. After that we can connect to the database and run the metadata queries.
3. Write a metadata query to identify the sessions having truncate option enabled
Answer:
select task_name, 'Truncate Target Table' ATTR, decode(attr_value,1,'Yes','No') Value
from OPB_EXTN_ATTR OEA, REP_ALL_TASKS RAT
where OEA.SESSION_ID = rat.TASK_ID and attr_id = 9
4. Where can I find a history / metrics of the load sessions that have occurred in
Informatica?
Answer:
The tables which house this information are OPB_LOAD_SESSION, OPB_SESSION_LOG, and OPB_SESS_TARG_LOG. OPB_LOAD_SESSION contains the single session entries, OPB_SESSION_LOG contains a historical log of all session runs that have taken place, and OPB_SESS_TARG_LOG keeps track of the errors and the target tables which have been loaded. Keep in mind these tables are tied together by Session_ID. If a session is deleted from OPB_LOAD_SESSION, its history is not necessarily deleted from OPB_SESSION_LOG or OPB_SESS_TARG_LOG. Unfortunately this leaves unidentified session IDs in these tables. However, when you join them together, you can get the start and complete times of each session.
5. How to extract the workflow monitor record information from the Informatica metadata repository?
Answer:
SELECT DISTINCT
FOLDER_NAME, WORKFLOW_NAME, SESSION_NAME,
START_DATE, START_TIME, END_DATE, END_TIME, DURATION "DURATION IN DD:HH:MI:SS",
SOURCE_ROWS, TARGET_ROWS, REJECTED_ROWS, REJECTED_STATUS, STATUS, FAILED_REASON
FROM
( SELECT
t.SUBJECT_AREA FOLDER_NAME, t.WORKFLOW_NAME, t.SESSION_NAME,
DECODE(t.RUN_STATUS_CODE, 2,NULL, TO_CHAR(t.ACTUAL_START,'DD-MON-YYYY'))
START_DATE,
DECODE(t.RUN_STATUS_CODE, 2,NULL, TO_CHAR(t.ACTUAL_START,'HH24:MI:SS AM'))
START_TIME,
DECODE(t.RUN_STATUS_CODE, 2,NULL, TO_CHAR(t.SESSION_TIMESTAMP,'DD-MON-YYYY'))
END_DATE,
DECODE(t.RUN_STATUS_CODE, 2,NULL, TO_CHAR(t.SESSION_TIMESTAMP,'HH24:MI:SS PM'))
END_TIME,
DECODE(t.RUN_STATUS_CODE, 2,NULL, TRUNC((((86400*(SESSION_TIMESTAMP-
ACTUAL_START))/60)/60)/24)||':'
|| (TRUNC(((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)/60)
-24*(TRUNC((((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)/60)/24)))||':'
|| (TRUNC((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)
-60*(TRUNC(((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)/60))) ||':'
|| (TRUNC(86400*(SESSION_TIMESTAMP-ACTUAL_START))
-60*(TRUNC((86400*(SESSION_TIMESTAMP-ACTUAL_START))/60)))) DURATION ,
DECODE(t.RUN_STATUS_CODE, 2,NULL, t.SUCCESSFUL_SOURCE_ROWS) SOURCE_ROWS ,
DECODE(t.RUN_STATUS_CODE, 2,NULL, t.SUCCESSFUL_ROWS) TARGET_ROWS,
DECODE(t.RUN_STATUS_CODE, 2,NULL, t.FAILED_ROWS) REJECTED_ROWS,
DECODE(t.RUN_STATUS_CODE, 2,NULL,CASE WHEN t.SUCCESSFUL_SOURCE_ROWS <>
t.SUCCESSFUL_ROWS THEN 'VALIDATE THE MISMATCH' END) REJECTED_STATUS,
DECODE(t.RUN_STATUS_CODE, 1,'Succeeded', 2,'Disabled', 3,'Failed', 4,'Stopped',
5,'Aborted', 6,'Running', 7,'Suspending', 8,'Suspended', 9,'Stopping',
10,'Aborting', 11,'Waiting', 15,'Terminated') AS STATUS,
REPLACE(REPLACE(t.FIRST_ERROR_MSG,CHR(10),' '),'No errors encountered.','') AS FAILED_REASON,
RANK() OVER (PARTITION BY session_name ORDER BY t.SESSION_TIMESTAMP DESC) rnk
FROM REP_SESS_LOG t WHERE t.SUBJECT_AREA='<<informatica_folder_name>>'
) sess_run
WHERE sess_run.rnk = 1
ORDER BY START_DATE, START_TIME
Don't forget to put the Informatica folder name in the SUBJECT_AREA filter above. We might also need to make some other small adjustments to better suit the purpose and the Informatica version.
24. Repository Manager
1. Describe the steps for export and import?
Answer:
 Open the folder which contains the mapping.
 Check Out the mapping to be exported.
 Click Repository-->Export Objects and save it in your local drive.
 Open the folder in which you want to export the mapping.
 Click Repository-->Import Objects and select mapping xml file and Click import.
 Once the mapping is imported to the new folder just save it and Check In.
2. What are the various methods of code migration or which is the best way of
deployment?
Answer:
The best way is, arguably, the XML export and import, as it is very easy. But again it all depends upon the requirement; if we want to migrate some workflows with dependent objects in one shot, then the suggested way is XML export and import.
If we need to migrate only a few small objects (say some Designer or Workflow Manager objects), then we can copy them through the Repository Manager, or through the Designer (for Designer objects) or the Workflow Manager (for Workflow Manager objects) itself. For this we have to be connected to both repositories while copying. Sometimes we may need to migrate an entire project and want a complete log of the deployment; in that case we can create a Deployment Group using the Deployment Wizard.
We might use pmrep to automate exporting objects on a daily or weekly basis. To use this command, we must create a control file with all the specifications that the Copy Wizard requires. The control file is an XML file defined by the depcntl.dtd file. A deployment control file is an XML file used with the DeployFolder and DeployDeploymentGroup pmrep commands to deploy a folder or deployment group. We can create a deployment control file manually to provide parameters for deployment, or we can create one with the Copy Wizard. If we create the deployment control file manually, it must conform to the depcntl.dtd file that is installed with the PowerCenter Client, and we include the location of the depcntl.dtd file in the deployment control file.
One good thing is that we can roll back a deployment to purge the deployed versions from the target repository or folder. When we roll back a deployment, we roll back all the objects in a deployment group that were deployed at a specific date and time; we cannot roll back part of a deployment.
In the PowerCenter Client, we can export repository objects to an XML file and then import repository objects from the XML file. Use the following client applications to export and import repository objects:
 Repository Manager: You can export and import both Designer and Workflow Manager
Objects.
 Designer: You can export and import Designer objects.
 Workflow Manager: You can export and import Workflow Manager objects.
 pmrep: You can export and import both Designer and Workflow Manager objects. You
might use pmrep to automate exporting objects on a daily or weekly basis.
3. What are the various options for ETL code migration?
Answer:
There are a couple of options available for code migration. If we have a versioned repository, the first step is to check in all the workflows and dependent objects. Then we have a few different ways to achieve the migration:
Option 1: Export the workflows from the Repository Manager using the Export Objects option (as XML) and then import them into QA using the Repository Manager Import Objects option.
Option 2: If Dev and QA are in the same repository, we can simply drag and drop: open both the Dev and QA folders in the Repository Manager and drag the objects from Dev to QA.
Option 3: Create a Deployment Group using the Repository Manager, attach all the workflows that need to be migrated to the deployment group, and migrate the deployment group.
Option 4: We also have the option to migrate the entire folder.
When to use these options:
Option 1: When the number of workflows to migrate is small. If we do not have a versioned repository, the exported XML files can also be used to keep versions.
Option 2: When we have a small number of workflows to migrate within the same repository.
Option 3: When a large number of objects are migrated together. It keeps the list of objects migrated as a group, and a rollback, if required, is easy with this approach.
Option 4: Mostly used when we migrate a project to QA for the first time with a large number of workflows.
4. What is labeling in Informatica?
Answer:
We can see the label concept in many places, for example in a mailbox, where we sometimes group mails at different levels, such as marking some mails as personal.
In Informatica, a Label is a global object that we can associate with any versioned object or group of versioned objects in a repository. We may want to apply labels to versioned objects to achieve the following results:
- Track versioned objects during development.
- Improve query results.
- Associate groups of objects for deployment.
- Associate groups of objects for import and export.
For example, we might apply a label to the sources, targets, mappings, and sessions associated with a workflow so that we can deploy the workflow to another repository without breaking any dependency. We can apply the label to multiple versions of an object, or specify that the label can be applied to only one version of the object. We can create and modify labels in the Label Browser; from the Repository Manager, click Versioning > Labels to browse for a label.
Informatica version control is a team-based development methodology in which versions of the objects are kept to track modifications, using the check-in and check-out options.
5. Suppose Informatica version control is in place. Can we revert an object back to the state of a version two versions earlier?
Answer:
 From the Version History of the Object, open the required version of the Object
in Workspace.
 Next export the xml metadata of the Object.
 Next Check out the Object.
 Then import the metadata exported earlier.
 Save and Check In the Object.
6. What do we mean by Team based development in Informatica?
Answer:
Team-based development is nothing but version control for the metadata objects. If we have the team-based development option, we can enable version control for the repository. A versioned repository stores multiple versions of an object; each version is a separate object with unique properties. The PowerCenter version control feature allows us to efficiently develop, test, and deploy metadata into production. During development, we can perform the following change management tasks to create and manage multiple versions of objects in the repository:
 Check out and check in versioned objects.
 Compare objects.
 Track changes to an object.
 Delete or purge a version.
 Use global objects such as queries, deployment groups, and labels to group
versioned objects.
25. Scenario Questions
1. Suppose we have ten source flat files of same structure. How can we load all the
files in target database in a single batch run using a single mapping?
Answer:
After we create a mapping to load data into the target database from the source flat file definition, we move on to the session properties of the source instance.
To load a set of source files, we create a file, say final.txt, containing the source flat file names (ten files in our case) and set the Source filetype attribute to Indirect. We then point the session to this flat file final.txt through the Source file directory and Source filename attributes.
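For example (directory and file names are illustrative), the source attributes in the session could be:
Source filetype = Indirect, Source file directory = /data/inbound/, Source filename = final.txt
where final.txt simply lists the ten data files, one name per line (sales_file01.txt, sales_file02.txt, and so on).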
2. Suppose we have two Source Qualifier transformations SQ1 and SQ2 connected to
Target tables TGT1 and TGT2 respectively. How do you ensure TGT2 is loaded after
TGT1?
Answer:
If we have multiple Source Qualifier transformations connected to multiple targets, we can designate the order in which the Integration Service loads data into the targets.
In the Mapping Designer, we need to configure the Target Load Plan based on the Source Qualifier transformations in a mapping to specify the required loading order.
It defines the order in which the Informatica server loads the data into the targets. This is to avoid integrity constraint violations.
3. Suppose we have a Source Qualifier transformation that populates two target
tables. How do you ensure TGT2 is loaded after TGT1?
Answer:
In the Workflow Manager, we can configure Constraint based load ordering for a session. The Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to the foreign key table.
Hence, if we have one Source Qualifier transformation that provides data for multiple target tables having primary and foreign key relationships, we will go for Constraint based load ordering.
4. Suppose we have the EMP table as our source. In the target we want to view those employees whose salary is greater than or equal to the average salary for their departments. Describe your mapping approach.
Answer:
The mapping flow will look like this: Source Qualifier -> Sorter -> (branch 1: sorted Aggregator, branch 2: sorted original data) -> Joiner -> Filter -> Target.
To start with the mapping we need the following transformations:
After the Source qualifier of the EMP table place a Sorter transformation. Sort
based on DEPTNO port.
Next we place a Sorted Aggregator Transformation. Here we will find out the AVERAGE
SALARY for each (GROUP BY) DEPTNO.
When we perform this aggregation, we lose the data for individual employees.
To maintain employee data, we must pass a branch of the pipeline to the Aggregator
Transformation and pass a branch with the same sorted source data to the Joiner
transformation to maintain the original data.
When we join both branches of the pipeline, we join the aggregated data with the
original data.
So next we need a sorted Joiner transformation to join the sorted aggregated data with the original data, based on DEPTNO. Here we will take the aggregated pipeline as the Master and the original data flow as the Detail pipeline.
After that we need a Filter Transformation to filter out the employees having
salary less than average salary for their department.
Filter Condition: SAL >= AVG_SAL
Finally we place the Target table instance.
5. How can we perform changed data capture based on a load sequence number (integer) column present in the source table?
Create a mapping variable of integer data type with Aggregation type MAX. Set the value of this mapping variable in any of these transformations: Expression, Filter, Router or Update Strategy. Use the SETMAXVARIABLE($$Variable, load_seq_column) function. This function assigns the MAX sequence number of that particular load to the variable $$Variable. The function executes only if a row is marked as insert; SETMAXVARIABLE ignores all other row types and the current value remains unchanged. The function sets the current value of the mapping variable to the higher of two values, the current value of the variable or the value from the source column, for each record. At the end of a successful session, the Integration Service saves the final current value to the repository.
When used with a session that contains multiple partitions, the Integration Service generates different current values for each partition. At the end of the session, it saves the highest current value across all partitions to the repository. Unless overridden, it uses the saved value as the initial value of the variable for the next session run.
Since the max sequence number of the previous load is captured in this mapping variable and saved in the repository, we can use this variable as a filter in the Source Qualifier query. The next time the workflow runs, it will extract only those records having a load sequence number greater than this saved value.
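For example, the Source Qualifier source filter could then be written as below (column and variable names are illustrative):
LOAD_SEQ_NO > $$MAX_LOAD_SEQ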
6. Scenario Implementation 1
In my mapping I have 3 tables that we are joining. In the source query we want to
filter the data based off a value that is stored in one of our target tables. Is
there a way of pulling that one particular value from that target table and be able
to use it in the filter in the source qualifier? Basically the value is a load
sequence number that gets incremented with each session run. So when the session
runs again we only pull records that are greater than that load sequence number.
Answer:
There are different options to solve the problem.
Option 1: Assumption- Source and target tables cannot be accessed using a single DB
Connection and "load Sequence Number" is modified by the current process. In this
case you can use a mapping variable in the mapping and set the value of the mapping
variable to the highest/current value using the SETMAXVARIABLE function. This value
will be stored in Informatica reposito-ry and the same value can be used in Source
Qualifier Filter for the next session run. If incase the workflow fails, the value
of the mapping variable will not get incremented. Steps
 Define mapping Variable with Aggregation type as MAX.
 Use SETMAXVARIABLE($$variable, “Current load Sequence Number") function to store
the value into repository.
 Use the variable $$Variable in Source Qualifier filter.
We can provide a default value for the variable and change the value during your
code migration to set the starting value Option 2: Assumption- Source and target
tables cannot be accessed using a single DB Connection and "load Sequence Number"
is modified by different process. In this case you can create a mapping parameter
and need to pass the value as a parameter. Steps
 Create a workflow to get the latest "load Sequence Number" and create a parameter
file. This workflow will write a flat file which will contain the parameter value.
E.g. [wf_DAILY_INCR_LOAD] $$Variable=100
 In the actual mapping Define a mapping parameter $$Variable and use $$Variable in
the Source Qualifier
Each time you need to run the workflow which creates the parameter file before your
actual workflow is run Option 3: Assumption- Source and Target table can be
accessed using a single DB connection. If both your source and target tables are
connected using a single DB Connection, we can write the filter to get the latest
data in the Source Qualifier itself joining all the tables.
7. How can we load ‘x’ records (user defined record numbers) out of ‘N’ records
from source dynamically, without using filter and sequence generator
transformation?
Answer:
 Take a mapping parameter, say $$CNT, to pass the number of records we want to load; it can be changed in the parameter file before each session run.
 Next after the Source Qualifier use an Expression transformation and create one
output port say CNTR with value CUME (1).
 Next use an Update Strategy with condition IIF ($$CNT >= CNTR, DD_INSERT,
DD_REJECT).
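The corresponding parameter file entry might look like this (folder, workflow and session names are illustrative):
[MyFolder.WF:wf_load_sample.ST:s_m_load_sample]
$$CNT=5000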
8. Suppose we have ‘n’ number of rows in the Source and we have two target tables.
How can we load ‘n/2’ i.e. first half the source data into one target and the
remaining half into the next target?
Answer:
Use an Expression transformation with an output port ROWNUM with the expression CUME(1).
Next use a Router with 2 groups having below conditions:
MOD( ROWNUM, 2 ) = 0
MOD( ROWNUM, 2 ) = 1
Connect to the corresponding target instances.
Alternatively,
Below are the implementation steps in Informatica.
 First place the Source table and its corresponding Source Qualifier in the
mapping.
 Next split the data into two flows; One going to the Expression Transformation
with all the ports and the other flow with any one column to an Aggregator
Transformation.
 In the Aggregator add a numeric output port say CNT with expression as COUNT (1)
and do not group by on any other input port.
 Propagate this output column CNT to an Expression transformation. In this Expression transformation create another numeric output port JN with expression value 1.
 Now let’s go back to the first expression transformation having all the source
columns. Introduce a Sequence Generator transformation with RESET attribute
property enabled and propagate the NEXTVAL port to the expression transformation.
Next also add one more numeric output port JN with expression value 1
 Now take a Joiner Transformation and check the property Sorted Input.
 Now bring in all the columns from the Expression transformation next to the Source Qualifier. Another flow to the Joiner is from the expression with the two columns CNT and JN. The join condition is based on the JN ports.
 Next after the joiner place a Router Transformation. Create one group say FST
with condition as NEXTVAL < (CNT/2).
 Next introduce the two target tables, first and second. Propagate the columns of the FST group of the Router to the first target, and the columns of the Default group of the Router transformation to the second target.
9. Suppose we have a flat file which has a header record with a 'file creation date', followed by detailed data records. Describe the approach to load the 'file creation date' column along with each and every detailed record.
Answer:
 We can use the below shell command as a pre-session command to write the header record to another flat file: head -1 Source_File.dat > header.txt
 Next use this flat file header.txt as a Lookup in the mapping.
 Create an output port in an Expression transformation with value 'H', or whatever tag in the source data file identifies the header record.
 Use this as the lookup condition, return the file creation date field, and populate it in the target table.
10.Scenario Implementation 2
Suppose we have the below two tables. What will be the output if we select Table 1
as Source and use Joiner and Lookup transformation on Table 2 based on column ID?
Table 1:
ID
10
Table 2:
ID | Name
10 | A
10 | B
10 | C
Answer:
When we use a Joiner Transformation as Inner Join on column id, we will get 3 rows
as output.
When we use a passive Lookup transformation we will get 1 row as output. In case of multiple lookup matches, the lookup returns either the first or the last matching row, as configured in the "Lookup policy on multiple match" property of the transformation.
When we use an active Lookup transformation we will get 3 rows as output, as an active lookup returns all the matching rows on multiple lookup matches.
11.Suppose we have a flat file which contains just a numeric value. We need to
populate this value in one column of the target table for every source record. How
can we achieve this?
Answer:
 Use an Expression and create a decimal Output port say ‘DUMMY’ with a very high
number along with other I/O ports from the source table. Say, DUMMY = 99999999999
[Note- Use such a number value that can never appear in the lookup flat file.]
 Now use a Lookup transformation based on the source file. Say, the column name in
the lookup is ‘VALUE’
 Map DUMMY from Expression to Lookup and use the lookup condition as DUMMY !=
VALUE
 Next use the VALUE column of the Lookup to populate the target column.
12.How will you load a source flat file into a staging table when the file name is
not fixed? The file name is like sales_2013_02_22.txt, i.e. date is appended at the
end of the file as a part of file name.
Answer:
The generic file name is like- sales_YYYY_MM_DD.txt
One option is to rename the file in the pre session load task. We will use OS level
command to rename the file to a fixed name. We will next set the Informatica source
filename to this fixed name and load the file. E.g. in Unix: $> mv sales_*.txt
sales.txt
Another option is to use indirect loading with a fixed file-list name. The file list will contain the actual file name to be processed.
E.g. in Unix: $> ls sales_*.txt > sales.txt
13.Solve the below scenario using Informatica and Database SQL.
Source:
PRODUCT_ID | PRODUCT_NAME | PRODUCT_PRICE
10 | Lux | 100
10 | Dove | 200
20 | Cinthol | 400
20 | Dettol | 500
30 | Fiama | 600
Target:
PRODUCT_ID | PRODUCT_NAME | PRODUCT_PRICE | SUM_PRODUCT_PRICE
10 | Lux | 100 | 300
10 | Dove | 200 | 300
20 | Cinthol | 400 | 900
20 | Dettol | 500 | 900
30 | Fiama | 600 | 600
Answer:
Using Informatica: In one pipeline, calculate SUM(PRODUCT_PRICE) GROUP BY PRODUCT_ID using an Aggregator transformation. In the other flow bring all the data as it is, then join the first flow with the second using a Joiner transformation on the join column PRODUCT_ID with an inner join.
Using SQL:
SELECT M.*, N.SUM_PRODUCT_PRICE
FROM SOURCE M,
     (SELECT SUM(PRODUCT_PRICE) SUM_PRODUCT_PRICE, PRODUCT_ID FROM SOURCE GROUP BY PRODUCT_ID) N
WHERE M.PRODUCT_ID = N.PRODUCT_ID
14.Suppose we have a source with values as below:
EMPNO | ENAME | SAL
1 | Tom | 100
2 | Jack | 200
3 | Peter | 150
4 | Donald | 230
999 | TEST | 999
6 | Eric | 300
If we encounter EMPNO = 999, then the whole record set should not be loaded into the target table. Describe the approach.
Answer:
From the Source create two flows:
Flow 1: Source -> Expression -> Sorter
Flow 2: Source -> Filter -> Expression -> Sorter
1.1 In the Expression of flow 1, create output field dummy_M as 'X'.
1.2 Sort on the dummy field.
2.1 In the Filter of flow 2, set the filter condition as EMPNO = 999.
2.2 In the Expression of flow 2, create output field dummy_D as 'X'.
2.3 Sort on the dummy field.
3. Next use a Joiner transformation: set the first flow as Master and the second flow as Detail, set the join condition as dummy_M = dummy_D, set the join type as Detail Outer Join and use Sorted Input.
4. Next use a Filter transformation with the condition dummy_D IS NULL, and finally connect to the target.
15.Can we pass the value of a mapping variable between 2 pipelines under the same mapping? If not, how can we achieve this?
Answer:
We cannot pass the value of a mapping variable between 2 pipelines in the same mapping. Mapping variables are values that can change between sessions. The Integration Service saves the latest value of a mapping variable to the repository only at the end of each successful session run. If we have two pipelines under the same mapping, the mapping has a single session, and the value of the mapping variable is saved to the repository only when this session succeeds, that is, when both pipelines have completed.
The alternative method to solve this scenario is as below:
1. Split the pipelines into two different mappings, say "map1" and "map2".
2. Create a mapping variable, say "var1", in "map1" and set its value using the SETVARIABLE() function. Our goal is to pass the value of "var1" at the end of the successful session run to "map2".
3. Create a mapping variable, say "var2", in "map2" and use it wherever the value of "var1" from the first mapping is required.
4. Create the workflow with a workflow variable, say "wfvar".
5. Create two non-reusable sessions, say "ses1" and "ses2", for "map1" and "map2" respectively.
6. In the post-session success variable assignment of "ses1", assign the value of mapping variable "var1" to workflow variable "wfvar".
7. In the pre-session variable assignment of "ses2", assign the value of workflow variable "wfvar" to the mapping variable "var2".
With this approach, we will be able to pass the value from the first session to the second session.
16.Scenario Implementation 3
Suppose we have a huge (multi-GB) flat file as source. The flat file contains 22 columns, of which 4 are considered "key" columns: CUST_SRC_ID, PRODUCT_ID, FF_ID, SNM_ID.
There is one more column in the flat file relevant to the discussion that is
DATE_ID which stores date in YYYY-MM-DD format.
The flat file contains duplicate records based on the above 4 columns (that is -
the records are not entirely duplicated, may be some values are different in some
other columns).
Now the requirement is to choose all the unique records from the flat file based on
the uniqueness of the above mentioned “keys”. If there is any duplicate record
then, we must select the record for which DATE_ID column contains the latest value.
So suppose we get the following records in the flat file:
CUST_SRC_ID | PRODUCT_ID | FF_ID | SNM_ID | DATE_ID | OTHER COLUMNS
123 | P1 | F1 | S1 | 2013-01-02 | X, Y, Z
123 | P1 | F1 | S1 | 2013-01-06 | P, Q, R
123 | P1 | F1 | S1 | 2013-01-02 | S, T, U
In the above case we want the following row in the target:
CUST_SRC_ID | PRODUCT_ID | FF_ID | SNM_ID | DATE_ID | OTHER COLUMNS
123 | P1 | F1 | S1 | 2013-01-06 | P, Q, R
How can we achieve this in a single mapping?
Answer:
Use a Sorter transformation after Source Qualifier. Sorting key will be in below
order:
 CUST_SRC_ID Ascending order
 PRODUCT_ID Ascending order
 FF_ID Ascending order
 SNM_ID Ascending order
 DATE_ID Descending order
Next use an Expression transformation and create 3 variable ports in the below
order:
 V_Keys = CUST_SRC_ID || PRODUCT_ID || FF_ID || SNM_ID
 V_FLAG = IIF (V_Keys != V_Keys_PREV, 1, 0)
 V_Keys_PREV = V_Keys
 O_FLAG = V_FLAG (output port)
Now use a filter transformation with filter condition as below:
 O_FLAG=1
After sorting the data, for every group based on the unique keys the first record will have the latest date, because we have sorted on DATE_ID descending. Using this expression logic, for every group the first record (with the latest date) will have O_FLAG value 1 and the rest will have 0. We then filter out the unwanted duplicate records using the Filter transformation.
17.Scenario Implementation 4
We have a flat file with just one column, containing the rows C1, L1, C2, L2, C3, L3, where a value starting with C denotes a company name and a value starting with L denotes the location of that company. We have to load this data into the target table (using Informatica) as pairs: (C1, L1), (C2, L2), (C3, L3).
Answer:
This is one way to achieve the requirement.
1. After the Source Qualifier, use an Expression transformation (this is the tricky part; use variable port logic) to generate a sequence number for each group, a record number for each record within the group, and a duplicate of the input column. After the Expression the output will be as below:
Col1, Col2, Col3, Col4
1, 1, C1, C1
1, 2, L1, L1
2, 1, C2, C2
2, 2, L2, L2
3, 1, C3, C3
3, 2, L3, L3
2. Add an Aggregator with GROUP BY on the first column and two aggregate expressions:
MAX(Col3, Col2 = 1) for the company name
MAX(Col3, Col2 = 2) for the location
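A possible set of Expression ports for step 1 (a sketch; port names are illustrative, assuming the rows arrive strictly in company, location order):
 V_CNT (variable port) = V_CNT + 1
 O_COL1 (output, group number) = TRUNC((V_CNT + 1) / 2), which yields 1, 1, 2, 2, 3, 3
 O_COL2 (output, record number within the group) = 2 - MOD(V_CNT, 2), which yields 1, 2, 1, 2, 1, 2
 O_COL3 (output) = the input column
 O_COL4 (output) = the input column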
18.Implement slowly changing dimension of Type 2 which will load current record in
Current table and old data in Log table.
Answer:
 Use a Joiner transformation to join the Source and the Current table with a Full Outer Join.
 Next use an Expression transformation to mark each row as new or old and correspondingly assign a value such as 0 or 1 to a new output port.
 Pass all the columns to a Router transformation and route based on the new port.
 For value 0, use an Update Strategy with DD_INSERT to insert into the Current table.
 For value 1, use an Update Strategy with DD_UPDATE to update the Current table.
 Also, for value 1, populate the data from the Current table into the Log table.
26. Performance Tuning
1. Which one is faster Connected or Unconnected Lookup?
Answer:
There can be some very specific situation where unconnected lookup may add some
performance benefit on total execution. If you are calling the “Unconnected lookup”
based on some condition (e.g. calling it from an expression transformation only
when some specific condition is met - as opposed to a connected lookup which will
be called anyway) then you might save some “calls” to the unconnected lookup,
thereby marginally improving the performance. The improvement will be more apparent
if your data volume is really huge. Keep the “Pre-build Lookup Cache” option set to
“Always disallowed” for the lookup, so that you can ensure that the lookup is not
even cached if it is not being called, although this technique has other
disadvantages, check http://www.dwbiconcepts.com/etl/14-etl-informatica/46-tuning-
informatica-lookup.html , especially the points under following subheadings: -
Effect of choosing connected OR Unconnected Lookup, and - WHEN TO set Pre-build
Lookup Cache OPTION (AND WHEN NOT TO)
2. How we can improve performance of Informatica Normalization Transformation.
Answer:
As such, there is no way to improve the performance of a session by using the Normalizer. The Normalizer is a transformation used to pivot or normalize data sets and has nothing to do with performance. In fact, the Normalizer does not impact performance much (apart from taking a little more memory).
3. How to improve the Session performance?
Answer:
 Run concurrent sessions
 Partition session (Power center)
 Tune parameters - DTM buffer pool size, buffer block size, index cache size, data cache size, commit interval, tracing level (Normal, Terse, Verbose Initialization, Verbose Data).
 The session has memory to hold 83 sources and targets. If it is more, then DTM
can be increased.
 The Informatica server uses the index and data caches for the Aggregator, Rank, Lookup and Joiner transformations. The server stores the transformed data from these transformations in the data cache before returning it to the data flow, and stores group information for those transformations in the index cache. If the allocated data or index cache is not large enough to store the data, the server stores it in a temporary disk file as it processes the session data. Each time the server pages to disk, performance slows. This can be seen from the counters. Since the data cache is generally larger than the index cache, it has to be sized larger than the index cache.
 Remove Staging area
 Turn off session recovery
 Reduce error tracing
4. How do you identify the bottlenecks in Mappings?
Answer:
Bottlenecks can occur in
 Targets - The most common performance bottleneck occurs when the Informatica server writes to a target database. You can identify a target bottleneck by configuring the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck.
Solution:
 Drop or disable indexes or constraints (a sketch of disabling and rebuilding an index around the load is shown below)
 Perform bulk load (ignores the database log)
 Increase the commit interval (recovery is compromised)
 Tune the database for RBS, dynamic extension etc.
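A sketch of the first point, using Oracle syntax with an assumed index name (idx_sales_fact_dt):
-- make the index unusable before the bulk load, then rebuild it afterwards
ALTER INDEX idx_sales_fact_dt UNUSABLE;
-- ... run the session with bulk load ...
ALTER INDEX idx_sales_fact_dt REBUILD;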
 Sources - Add a Filter transformation after each Source Qualifier with a condition that lets no records through. If the time taken stays the same, there is a source bottleneck. You can also identify a source problem with a read test session: copy the mapping with only the sources and Source Qualifiers, remove all other transformations, and connect to a flat file target. If the performance is the same, there is a source bottleneck.
Using a database query - copy the read query directly from the session log and execute it against the source database with a query tool. If the time it takes to execute the query and the time to fetch the first row are significantly different, the query can be modified using optimizer hints.
Solution:
 Optimize queries using hints (a hinted query sketch is shown below)
 Use indexes wherever possible
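As an illustration of the hinting approach (Oracle hint syntax; the table, alias and index names are assumed):
-- force an index access path on the read query copied from the session log
SELECT /*+ INDEX(o idx_orders_order_dt) */
       o.order_id, o.customer_id, o.order_dt
FROM orders o
WHERE o.order_dt >= DATE '2024-01-01';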
 Mapping - If both the source and the target are OK, the problem could be in the mapping. Add a Filter transformation before the target with a condition that lets no records through; if the time is the same, there is a mapping bottleneck. Alternatively, look at the performance monitor in the session properties and view the counters.
Solutions:
 High error-row counts and a high rows-in-lookup-cache counter indicate a mapping bottleneck.
 Optimize single pass reading.
 Optimize Lookup transformations:
o Caching the lookup table: when caching is enabled, the Informatica server caches the lookup table and queries the cache during the session. When this option is not enabled, the server queries the lookup table on a row-by-row basis. The cache can be Static, Dynamic, Shared, Unshared or Persistent.
o Optimizing the lookup condition: when multiple conditions are placed, the condition with the equality sign should be placed first.
o Indexing the lookup table: for a cached lookup, index the lookup table on the ORDER BY columns (the session log contains the ORDER BY statement). For an uncached lookup, since the server issues a SELECT statement for each row passing into the Lookup transformation, it is better to index the lookup table on the columns used in the lookup condition. A sample index is sketched below.
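For illustration only (the table and column names are assumed):
-- index the lookup table on the columns used in the lookup condition / ORDER BY clause
CREATE INDEX idx_customer_dim_lkp ON customer_dim (customer_id, effective_date);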
 Optimize Filter transformations: you can improve efficiency by filtering early in the data flow instead of using a Filter transformation halfway through the mapping to remove a sizable amount of data.
 Use a Source Qualifier filter to remove those same rows at the source. If it is not possible to move the filter into the Source Qualifier, move the Filter transformation as close to the Source Qualifier as possible to remove unnecessary data early in the data flow.
 Optimize Aggregator transformations:
o Group by simpler columns, preferably numeric columns.
o Use sorted input. Sorted input decreases the use of aggregate caches; the server assumes all input data is sorted and performs aggregate calculations as it reads.
o Use incremental aggregation in the session properties.
 Optimize Sequence Generator transformations:
o Try creating a reusable Sequence Generator transformation and use it in multiple mappings.
o The Number of Cached Values property determines the number of values the Informatica server caches at one time.
 Optimize Expression transformations (see the sketch after this list):
o Factor out common logic.
o Minimize aggregate function calls.
o Replace common sub-expressions with local variables.
o Use operators instead of functions.
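A small sketch of the last two points, with assumed port names (v_FULL_NAME is a variable port, OUT_FULL_NAME an output port):
-- instead of repeating CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME) in several output ports,
-- compute it once with the || operator in a variable port and reuse it
v_FULL_NAME: FIRST_NAME || ' ' || LAST_NAME
OUT_FULL_NAME: v_FULL_NAME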
 Sessions: If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. You can identify a session bottleneck by using the performance details. The Informatica server creates performance details when you enable Collect Performance Data on the General tab of the session properties. Performance details display information about each Source Qualifier, target definition, and individual transformation. All transformations have some basic counters that indicate the number of input rows, output rows, and error rows. Any value other than zero in the readfromdisk and writetodisk counters for Aggregator, Joiner, or Rank transformations indicates a session bottleneck. Low BufferInput_efficiency and BufferOutput_efficiency counters also indicate a session bottleneck. Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks.
 System (Networks)
5. How do you handle performance issues in Informatica? Where can you monitor the performance?
Answer:
There are several aspects to performance handling. Some of them are:
 Source tuning
 Target tuning
 Repository tuning
 Session performance tuning
 Incremental change identification on the source side
 Software, hardware (use multiple servers) and network tuning
 Bulk loading
 Use the appropriate transformations
To monitor this:
 Set performance detail criteria
 Enable performance monitoring
 Monitor the session at runtime and/or check the performance monitor file
6. What are performance counters?
Answer:
The performance details provide counters that help you understand the session and mapping efficiency. Each Source Qualifier, target definition, and individual transformation appears in the performance details, along with counters that display performance information about each transformation.
Understanding Performance Counters
All transformations have some basic counters that indicate the number of input rows, output rows, and error rows. Source Qualifiers, Normalizers, and targets have additional counters that indicate the efficiency of data moving into and out of buffers. You can use these counters to locate performance bottlenecks. Some transformations have counters specific to their functionality; for example, each Lookup transformation has a counter that indicates the number of rows stored in the lookup cache. When you read performance details, the first column displays the transformation name as it appears in the mapping, the second column contains the counter name, and the third column holds the resulting number or efficiency percentage. When you partition a source, the Informatica Server generates one set of counters for each partition. The following performance counters illustrate two partitions of an Expression transformation:
Transformation Counter Value
 EXPTRANS [1]
o Expression_input rows 8
o Expression_output rows 8
 EXPTRANS [2]
o Expression_input rows 16
o Expression_output rows 16
Note: When you partition a session, the number of aggregate or rank input rows may
be different from the number of output rows from the previous transformation.
7. How can we increase Session Performance?
Answer:
 Minimum logging (Terse tracing level)
 Partitioning the source data
 Performing ETL for each partition in parallel (multiple CPUs are needed for this)
 Adding indexes
 Changing the commit interval
 Using Filter transformations to remove unwanted data movement
 Increasing buffer memory when there is a large volume of data
 Multiple lookups can reduce performance; verify the largest lookup table and tune the expressions
 At session level, the causes are small cache size, low buffer memory and small commit interval
At system level:
 Windows NT/2000 - use the Task Manager
 UNIX - use vmstat and iostat
Hierarchy of optimization
 Target
 Source
 Mapping
 Session
 System
Optimizing Target Databases:
 Drop indexes /constraints
 Increase checkpoint intervals
 Use bulk loading /external loading
 Turn off recovery
 Increase database network packet size
Source level
 Optimize the query (using group by)
 Use conditional filters
 Connect to RDBMS using IPC protocol
Mapping
 Optimize data type conversions
 Eliminate transformation errors
 Optimize transformations/ expressions
Session
 Concurrent batches
 Partition sessions
 Reduce error tracing
 Tune session parameters
System
 Improve network speed
 Use multiple PowerCenter servers on separate systems
 Reduce paging
8. Scenario Implementation 1
What would be the best approach to update a huge table (more than 200 million records) using Informatica? The table does not contain any primary key, but a few indexes are defined on it, and the target table is partitioned. On the other hand, the source contains only a few records (less than a thousand) that will go to the target and update it. Is there any better approach than just doing it with an Update Strategy transformation?
Answer:
If the target busy percentage is close to 99.99%, it is very clear that the bottleneck is on the target, so we need to tweak the target. There are a couple of options:
1. Assuming the target table is partitioned on a column such as time_id, you need to include that column in the WHERE clause of the SQL fired by Informatica. For that, you can define the time_id column as a primary key in the target definition. With this, your update query will have time_id in the WHERE clause (a rough sketch of the resulting update is shown at the end of this answer).
2. With an Update Strategy transformation, Informatica fires an UPDATE statement for every row marked for update. To avoid these multiple update statements, you can INSERT all the records that are meant to be UPDATEs into a temporary table, and then use a correlated SQL to update the records in the actual table (the 200M-row table). This query can be fired as a post-session SQL. Please see the sample SQL:
UPDATE TGT_TABLE U
SET (U.COLUMNS_LIST /* column list to be updated */) =
    (SELECT I.COLUMNS_LIST /* column list to be updated */
     FROM UPD_TABLE I
     WHERE I.KEYS = U.KEYS
     AND I.TIME_ID = U.TIME_ID)
WHERE EXISTS
    (SELECT 1
     FROM UPD_TABLE I
     WHERE I.KEYS = U.KEYS
     AND I.TIME_ID = U.TIME_ID);
TGT_TABLE - actual table with 200M records
UPD_TABLE - table with the records meant for UPDATE (around 1K records)
We need to make sure that the indexes are up to date and statistics are collected. Since this is more of a database performance exercise, you may also need the help of a DBA to check the database throughput, SQL cost etc.
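For completeness, under option 1 the row-by-row update fired through the Update Strategy would look roughly like the sketch below; the column names col_a, col_b and keys are placeholders, not taken from the scenario:
-- generated update with time_id as part of the target key;
-- having time_id in the WHERE clause lets the database prune partitions
UPDATE TGT_TABLE
SET col_a = ?, col_b = ?
WHERE keys = ? AND time_id = ?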