
2nd June 2012

Informatica Question Answer

Deleting duplicate row using Informatica


Q1. Suppose we have Duplicate records in Source System and we want to load only the unique records in the
Target System eliminating the duplicate rows. What will be the approach?
Ans.


Let us assume that the source system is a Relational Database. The source table has duplicate rows. Now to eliminate duplicate records, we can check the Distinct option of the Source Qualifier of the source table and load the target accordingly.
Source Qualifier Transformation DISTINCT clause
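For reference, here is a rough sketch of the default SQL the Source Qualifier would generate once the Select Distinct option is checked. The table and column names (CUSTOMERS, CUSTOMER_ID, CUSTOMER_NAME, CITY) are only assumptions for illustration and are not part of the original example:

SELECT DISTINCT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME, CUSTOMERS.CITY
FROM CUSTOMERS;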

Deleting duplicate row for FLAT FILE sources


Now suppose the source system is a Flat File. Here in the Source Qualifier you will not be able to select the Distinct option, as it is disabled for a flat file source. Hence the next approach is to use a Sorter Transformation and check its Distinct option. When we select the Distinct option, all the columns are selected as keys, in ascending order by default.

Sorter Transformation DISTINCT clause

Deleting Duplicate Record Using Informatica Aggregator


Another way to handle duplicate records in a source batch run is to use an Aggregator Transformation, checking the Group By option on the ports that carry the duplicated data. Here we have the flexibility to select either the first or the last of the duplicate records. Apart from that, using a Dynamic Lookup Cache of the target table, associating the input ports with the lookup ports and checking the Insert Else Update option will also help to eliminate the duplicate records from the source and hence load unique records into the target.

Loading Multiple Target Tables Based on Conditions




Q2. Suppose we have some serial numbers in a flat file source. We want to load the serial numbers into two target files, one containing the EVEN serial numbers and the other having the ODD ones.
Ans.
After the Source Qualifier place a Router Transformation . Create two Groups namely EVEN and ODD, with filter
conditions as MOD(SERIAL_NO,2)=0 and MOD(SERIAL_NO,2)=1 respectively. Then output the two groups into two
flat file targets.

Router Transformation Groups Tab
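Purely as a hedged illustration, the same even/odd split expressed directly in SQL against a hypothetical SERIALS table (the table name is not part of the original question) would look like this:

-- EVEN group
SELECT SERIAL_NO FROM SERIALS WHERE MOD(SERIAL_NO, 2) = 0;
-- ODD group
SELECT SERIAL_NO FROM SERIALS WHERE MOD(SERIAL_NO, 2) = 1;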

Normalizer Related Questions


Q3. Suppose in our Source Table we have data as given below:
Student Name | Maths | Life Science | Physical Science
Sam          | 100   | 70           | 80
John         | 75    | 100          | 85
Tom          | 80    | 100          | 85

We want to load our Target Table as:


Student Name | Subject Name     | Marks
Sam          | Maths            | 100
Sam          | Life Science     | 70
Sam          | Physical Science | 80
John         | Maths            | 75
John         | Life Science     | 100
John         | Physical Science | 85
Tom          | Maths            | 80
Tom          | Life Science     | 100
Tom          | Physical Science | 85

Describe your approach.


Ans.
Here to convert the Columns to Rows we have to use the Normalizer Transformation, followed by an Expression Transformation to Decode the column taken into consideration. For more details on how the mapping is performed please visit Working with Normalizer [http://www.dwbiconcepts.com/basic concept/3etl/23usinginformaticanormalizertransformation.html]
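For comparison, assuming the source were a relational table, say STUDENT_MARKS with columns STUDENT_NAME, MATHS, LIFE_SCIENCE and PHYSICAL_SCIENCE (these names are assumed here, not given in the question), the same column-to-row transpose could be sketched in SQL as:

SELECT STUDENT_NAME, 'Maths' AS SUBJECT_NAME, MATHS AS MARKS FROM STUDENT_MARKS
UNION ALL
SELECT STUDENT_NAME, 'Life Science', LIFE_SCIENCE FROM STUDENT_MARKS
UNION ALL
SELECT STUDENT_NAME, 'Physical Science', PHYSICAL_SCIENCE FROM STUDENT_MARKS;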

Q4. Name the transformations which convert one row to many rows, i.e. increase the i/p:o/p row count. Also, what is the name of the reverse transformation?
Ans.
Normalizer as well as Router Transformations are Active transformations which can increase the number of output rows compared to input rows.
Aggregator Transformation is the Active transformation that performs the reverse action.
Q5. Suppose we have a source table and we want to load three target tables based on source rows such that the first row moves to the first target table, the second row to the second target table, the third row to the third target table, the fourth row again to the first target table, and so on and so forth. Describe your approach.
Ans.
We can clearly understand that we need a Router transformation to route or filter source data to the three target tables.
Now the question is what will be the filter conditions. First of all we need an Expression Transformation where we have all the source table columns, and along with that we have another i/o port, say SEQ_NUM, which gets a sequence number for each source row from the NEXTVAL port of a Sequence Generator with Start Value 0 and Increment By 1. Now the filter condition for the three router groups will be:
MOD(SEQ_NUM,3)=1 connected to 1st target table, MOD(SEQ_NUM,3)=2 connected to 2nd target table,
MOD(SEQ_NUM,3)=0 connected to 3rd target table.

Router Transformation Groups Tab
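A rough SQL equivalent of this round-robin split, assuming a hypothetical source table SRC with a key column SRC_ID (neither name comes from the original answer), would be:

SELECT *
FROM (SELECT s.*, ROW_NUMBER() OVER (ORDER BY s.SRC_ID) AS SEQ_NUM
      FROM SRC s)
WHERE MOD(SEQ_NUM, 3) = 1;   -- 1st target; use = 2 for the 2nd and = 0 for the 3rd target

Here ROW_NUMBER() simply plays the role of the Sequence Generator in the mapping.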

Loading Multiple Flat Files using one mapping


Q6. Suppose we have ten source flat files of the same structure. How can we load all the files into the target database in a single batch run using a single mapping?
Ans.
After we create a mapping to load data into the target database from flat files, next we move on to the session properties of the Source Qualifier. To load a set of source files we need to create a file, say final.txt, containing the source flat file names, ten files in our case, and set the Source filetype option as Indirect. Next point to this flat file final.txt, fully qualified, through the Source file directory and Source filename properties.
Image: Session Property Flat File
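As a small sketch of what the indirect file might contain (the individual file names below are purely hypothetical), final.txt would simply list one source file per line:

EMP_FILE_01.txt
EMP_FILE_02.txt
EMP_FILE_03.txt

and so on up to the tenth file; with the Indirect file type, the session reads every file listed inside final.txt.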
Q7. How can we implement an Aggregation operation without using an Aggregator Transformation in Informatica?
Ans.
We will use the very basic property of the Expression Transformation that, at any point of time, we can access the previous row's data as well as the currently processed row's data. What we need is simply a Sorter, an Expression and a Filter transformation to achieve aggregation at the Informatica level.
For detailed understanding visit Aggregation without Aggregator [http://www.dwbiconcepts.com/basic concept/3etl/10
aggregationwithoutinformaticaaggregator .html]

Q8. Suppose in our Source Table we have data as given below:


Student Name | Subject Name     | Marks
Sam          | Maths            | 100
Tom          | Maths            | 80
Sam          | Physical Science | 80
John         | Maths            | 75
Sam          | Life Science     | 70
John         | Life Science     | 100
John         | Physical Science | 85
Tom          | Life Science     | 100
Tom          | Physical Science | 85



We want to load our Target Table as:
Student Name | Maths | Life Science | Physical Science
Sam          | 100   | 70           | 80
John         | 75    | 100          | 85
Tom          | 80    | 100          | 85

Describe your approach.


Ans.
Here our scenario is to convert many rows to one row, and the transformation which will help us to achieve this is the Aggregator. Our Mapping will look like this:

Mapping using sorter and Aggregator

We will sort the source data based on STUDENT_NAME ascending followed by SUBJECT ascending.



Sorter Transformation

Now based on STUDENT_NAME in GROUP BY clause the following output subject columns are populated as
MATHS: MAX(MARKS, SUBJECT='Maths')
LIFE_SC: MAX(MARKS, SUBJECT='Life Science')
PHY_SC: MAX(MARKS, SUBJECT='Physical Science')

Aggregator Transformation
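The same pivot, written as plain SQL for comparison: note that MAX(MARKS, SUBJECT='...') above is Informatica's conditional aggregate syntax, while the sketch below uses standard CASE expressions against an assumed STUDENT_MARKS table (the table name is an assumption):

SELECT STUDENT_NAME,
       MAX(CASE WHEN SUBJECT = 'Maths'            THEN MARKS END) AS MATHS,
       MAX(CASE WHEN SUBJECT = 'Life Science'     THEN MARKS END) AS LIFE_SC,
       MAX(CASE WHEN SUBJECT = 'Physical Science' THEN MARKS END) AS PHY_SC
FROM STUDENT_MARKS
GROUP BY STUDENT_NAME;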

Revisiting Source Qualifier Transformation


Q9. What is a Source Qualifier? What are the tasks we can perform using a SQ and why it is an ACTIVE
transformation?
Ans.
A Source Qualifier is an Active and Connected Informatica transformation that reads the rows from a relational
database or flat file source.
We can configure the SQ to join [Both INNER as well as OUTER JOIN] data originating from the same source
database.
We can use a source filter to reduce the number of rows the Integration Service queries.
We can specify a number for sorted ports and the Integration Service adds an ORDER BY clause to the default SQL
query.
We can choose Select Distinct option for relational databases and the Integration Service adds a SELECT DISTINCT
clause to the default SQL query.
Also we can write a Custom/User Defined SQL query which will override the default query in the SQ, by changing the default settings of the transformation properties.
Also we have the option to write Pre as well as Post SQL statements, to be executed before and after the SQ query runs in the source database.
Since the transformation provides us with the property Select Distinct, the Integration Service adds a SELECT DISTINCT clause to the default SQL query, which in turn affects the number of rows returned by the Database to the Integration Service, and hence it is an Active transformation.
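As an illustration of the Sorted Ports behaviour described above (the CUSTOMERS table and its columns are assumed, not taken from the original answer), with Number Of Sorted Ports set to 2 the generated default query would roughly be:

SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME, CUSTOMERS.CITY
FROM CUSTOMERS
ORDER BY CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME;   -- ORDER BY on the first 2 connected ports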
Q10. What happens to a mapping if we alter the datatypes between Source and its corresponding Source Qualifier?
Ans.
The Source Qualifier transformation displays the transformation datatypes. The transformation datatypes determine
how the source database binds data when the Integration Service reads it.
Now if we alter the datatypes in the Source Qualifier transformation or the datatypes in the source definition and
Source Qualifier transformation do not match, the Designer marks the mapping as invalid when we save it.
Q11. Suppose we have used the Select Distinct and the Number Of Sorted Ports property in the SQ and then we add
Custom SQL Query. Explain what will happen.
Ans.
Whenever we add a Custom SQL or SQL override query it overrides the User-Defined Join, Source Filter, Number of
Sorted Ports, and Select Distinct settings in the Source Qualifier transformation. Hence only the user defined SQL
Query will be fired in the database and all the other options will be ignored .
Q12. Describe the situations where we will use the Source Filter, Select Distinct and Number Of Sorted Ports
properties of Source Qualifier transformation.
Ans.
Source Filter option is used basically to reduce the number of rows the Integration Service queries so as to improve
performance.
Select Distinct option is used when we want the Integration Service to select unique values from a source, filtering out
unnecessary data earlier in the data flow, which might improve performance.
Number Of Sorted Ports option is used when we want the source data to be in a sorted fashion so as to use the same
in some following transformations like Aggregator or Joiner, those when configured for sorted input will improve the
performance.
Q13. What will happen if the SELECT list COLUMNS in the Custom override SQL Query and the OUTPUT PORTS
order in SQ transformation do not match?
Ans.
A mismatch or a change in the order of the list of selected columns relative to the connected transformation output ports may result in session failure.
Q14. What happens if in the Source Filter property of SQ transformation we include keyword WHERE say, WHERE
CUSTOMERS.CUSTOMER_ID > 1000.
Ans.
We use source filter to reduce the number of source records. If we include the string WHERE in the source filter, the
Integration Service fails the session .
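To make this concrete, a hedged sketch of the correct usage follows; CUSTOMER_ID comes from the question itself, while the other column name is assumed. We enter only the condition in the Source Filter property, and the Integration Service builds the WHERE clause:

-- Source Filter property (correct usage, no WHERE keyword):
--   CUSTOMERS.CUSTOMER_ID > 1000
-- The Integration Service then generates roughly:
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.CUSTOMER_NAME
FROM CUSTOMERS
WHERE CUSTOMERS.CUSTOMER_ID > 1000;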
Q15. Describe the scenarios where we go for Joiner transformation instead of Source Qualifier transformation.
Ans.
While joining Source Data of heterogeneous sources as well as to join flat files we will use the Joiner
transformation.
Use the Joiner transformation when we need to join the following types of sources:
Join data from different Relational Databases.
Join data from different Flat Files.
Join relational sources and flat files.
Q16. What is the maximum number we can use in Number Of Sorted Ports for Sybase source system.
Ans.
Sybase supports a maximum of 16 columns in an ORDER BY clause. So if the source is Sybase, do not sort more than
16 columns.
Q17. Suppose we have two Source Qualifier transformations SQ1 and SQ2 connected to Target tables TGT1 and
TGT2 respectively. How do you ensure TGT2 is loaded after TGT1?
Ans.
If we have multiple Source Qualifier transformations connected to multiple targets, we can designate the order in which
the Integration Service loads data into the targets.
In the Mapping Designer, We need to configure the Target Load Plan based on the Source Qualifier transformations in a
mapping to specify the required loading order.
Image: Target Load Plan

Target Load Plan Ordering

Q18. Suppose we have a Source Qualifier transformation that populates two target tables. How do you ensure TGT2 is loaded after TGT1?
Ans.
In the Workflow Manager, we can configure Constraint based load ordering for a session. The Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to the foreign key table.
Hence if we have one Source Qualifier transformation that provides data for multiple target tables having primary and
foreign key relationships, we will go for Constraint based load ordering.
Image: Constraint based loading

Revisiting Filter Transformation


Q19. What is a Filter Transformation and why it is an Active one?
Ans.
A Filter transformation is an Active and Connected transformation that can filter rows in a mapping.
Only the rows that meet the Filter Condition pass through the Filter transformation to the next transformation in the
pipeline. TRUE and FALSE are the implicit return values from any filter condition we set. If the filter condition evaluates
to NULL, the row is assumed to be FALSE.
The numeric equivalent of FALSE is zero (0) and any nonzero value is the equivalent of TRUE.
As an ACTIVE transformation, the Filter transformation may change the number of rows passed through it. A filter
condition returns TRUE or FALSE for each row that passes through the transformation, depending on whether a row
meets the specified condition. Only rows that return TRUE pass through this transformation. Discarded rows do not
appear in the session log or reject files.
Q20. What is the difference between Source Qualifier transformations Source Filter to Filter transformation?
Ans.
SQ Source Filter | Filter Transformation
Source Qualifier transformation filters rows when read from a source. | Filter transformation filters rows from within a mapping.
Source Qualifier transformation can only filter rows from Relational Sources. | Filter transformation filters rows coming from any type of source system in the mapping level.
Source Qualifier limits the row set extracted from a source. | Filter transformation limits the row set sent to a target.
Source Qualifier reduces the number of rows used throughout the mapping and hence it provides better performance. | To maximize session performance, include the Filter transformation as close to the sources in the mapping as possible to filter out unwanted data early in the flow of data from sources to targets.
The filter condition in the Source Qualifier transformation only uses standard SQL as it runs in the database. | Filter Transformation can define a condition using any statement or transformation function that returns either a TRUE or FALSE value.
Revisiting Joiner Transformation


Q21. What is a Joiner Transformation and why it is an Active one?
Ans.
A Joiner is an Active and Connected transformation used to join source data from the same source system or from two
related heterogeneous sources residing in different locations or file systems.
The Joiner transformation joins sources with at least one matching column. The Joiner transformation uses a condition
that matches one or more pairs of columns between the two sources.
The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch. The master
pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.
In the Joiner transformation, we must configure the transformation properties namely Join Condition, Join Type and
Sorted Input option to improve Integration Service performance.
The join condition contains ports from both input sources that must match for the Integration Service to join two rows.
Depending on the type of join selected, the Integration Service either adds the row to the result set or discards the row .


The Joiner transformation produces result sets based on the join type, condition, and input data sources. Hence it is
an Active transformation.
Q22. State the limitations where we cannot use Joiner in the mapping pipeline.
Ans.
The Joiner transformation accepts input from most transformations. However, following are the limitations:
Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy transformation.
Joiner transformation cannot be used if we connect a Sequence Generator transformation directly before the Joiner transformation.
Q23. Out of the two input pipelines of a joiner, which one will you set as the master pipeline?
Ans.
During a session run, the Integration Service compares each row of the master source against the detail source. The
master and detail sources need to be configured for optimal performance .
To improve performance for an Unsorted Joiner transformation, use the source with fewer rows as the master source.
The fewer unique rows in the master, the fewer iterations of the join comparison occur, which speeds the join process.
When the Integration Service processes an unsorted Joiner transformation, it reads all master rows before it reads the
detail rows. The Integration Service blocks the detail source while it caches rows from the master source . Once the
Integration Service reads and caches all master rows, it unblocks the detail source and reads the detail rows.
To improve performance for a Sorted Joiner transformation, use the source with fewer duplicate key values as the
master source.
When the Integration Service processes a sorted Joiner transformation, it blocks data based on the mapping
configuration and it stores fewer rows in the cache, increasing performance. Blocking logic is possible if master and
detail input to the Joiner transformation originate from different sources . Otherwise, it does not use blocking logic.
Instead, it stores more rows in the cache.
Q24. What are the different types of Joins available in Joiner Transformation?
Ans.
In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner
transformation is similar to an SQL join except that data can originate from different types of sources.
The Joiner transformation supports the following types of joins :
Normal
Master Outer
Detail Outer
Full Outer

Join Type property of Joiner Transformation


Note: A normal or master outer join performs faster than a full outer or detail outer join.
Q25. Define the various Join Types of Joiner Transformation.
Ans.
In a normal join , the Integration Service discards all rows of data from the master and detail source that do not
match, based on the join condition.
A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It
discards the unmatched rows from the master source.
A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It
discards the unmatched rows from the detail source.
A full outer join keeps all rows of data from both the master and detail sources.
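For reference only, and assuming two hypothetical tables MASTER and DETAIL joined on a column ID (these names are not taken from the original text), the rough SQL equivalents of these Informatica join types would be:

-- Normal join: only the matching rows from both sources
SELECT * FROM DETAIL d INNER JOIN MASTER m ON d.ID = m.ID;

-- Master outer join: all detail rows plus matching master rows
SELECT * FROM DETAIL d LEFT OUTER JOIN MASTER m ON d.ID = m.ID;

-- Detail outer join: all master rows plus matching detail rows
SELECT * FROM DETAIL d RIGHT OUTER JOIN MASTER m ON d.ID = m.ID;

-- Full outer join: all rows from both sources
SELECT * FROM DETAIL d FULL OUTER JOIN MASTER m ON d.ID = m.ID;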
Q26. Describe the impact of number of join conditions and join order in a Joiner Transformation.
Ans.
We can define one or more conditions based on equality between the specified master and detail sources. Both
ports in a condition must have the same datatype . If we need to use two ports in the join condition with non
matching datatypes we must convert the datatypes so that they match. The Designer validates datatypes in a join
condition.
Additional ports in the join condition increases the time necessary to join two sources.
The order of the ports in the join condition can impact the performance of the Joiner transformation. If we use multiple
ports in the join condition, the Integration Service compares the ports in the order we specified.
NOTE: Only equality operator is available in joiner join condition.
Q27. How does Joiner transformation treat NULL value matching.
Ans.
The Joiner transformation does not match null values .
For example, if both EMP_ID1 and EMP_ID2 contain a row with a null value, the Integration Service does not consider
them a match and does not join the two rows.
To join rows with null values, replace null input with default values in the Ports tab of the joiner, and then join on the
default values.
Note: If a result set includes fields that do not contain data in either of the sources, the Joiner transformation
populates the empty fields with null values. If we know that a field will return a NULL and we do not want to insert
NULLs in the target, set a default value on the Ports tab for the corresponding port.
Q28. Suppose we configure Sorter transformations in the master and detail pipelines with the following sorted ports in
order: ITEM_NO, ITEM_NAME, PRICE.
When we configure the join condition, what are the guidelines we need to follow to maintain the sort order?
Ans.
If we have sorted both the master and detail pipelines in order of the ports say ITEM_NO, ITEM_NAME and PRICE we
must ensure that:
Use ITEM_NO in the First Join Condition.
If we add a Second Join Condition, we must use ITEM_NAME.
If we want to use PRICE as a Join Condition apart from ITEM_NO, we must also use ITEM_NAME in the Second Join
Condition.
If we skip ITEM_NAME and join on ITEM_NO and PRICE, we will lose the input sort order and the Integration
Service fails the session .
Q29. What are the transformations that cannot be placed between the sort origin and the Joiner transformation so that
we do not lose the input sort order.
Ans.
The best option is to place the Joiner transformation directly after the sort origin to maintain sorted data. However
do not place any of the following transformations between the sort origin and the Joiner transformation:
Custom
Unsorted Aggregator
Normalizer
Rank
Union transformation
XML Parser transformation
XML Generator transformation
Mapplet [if it contains any one of the above mentioned transformations]
Q30. Suppose we have the EMP table as our source. In the target we want to view those employees whose salary is
greater than or equal to the average salary for their departments.
Describe your mapping approach.
Ans.
Our Mapping will look like this:
Image: Mapping using Joiner
To start with the mapping we need the following transformations:


After the Source qualifier of the EMP table place a Sorter Transformation . Sort based on DEPTNO port.

Sorter Ports Tab

Next we place a Sorted Aggregator Transformation . Here we will find out the AVERAGE SALARY for each
(GROUP BY) DEPTNO .
When we perform this aggregation, we lose the data for individual employees. To maintain employee data, we must
pass a branch of the pipeline to the Aggregator Transformation and pass a branch with the same sorted source data to
the Joiner transformation to maintain the original data. When we join both branches of the pipeline, we join the
aggregated data with the original data.

Aggregator Ports Tab

Aggregator Properties Tab

So next we need Sorted Joiner Transformation to join the sorted aggregated data with the original data, based on
DEPTNO .
Here we will be taking the aggregated pipeline as the Master and original dataflow as Detail Pipeline.

Joiner Condition Tab

Joiner Properties Tab

After that we need a Filter Transformation to filter out the employees having salary less than average salary for their
department.
Filter Condition: SAL>=AVG_SAL

Filter Properties Tab

Lastly we have the Target table instance.
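For reference, the same requirement can be expressed as a single SQL query; this is only a sketch using the standard EMP columns DEPTNO and SAL mentioned above:

SELECT e.*
FROM EMP e
JOIN (SELECT DEPTNO, AVG(SAL) AS AVG_SAL
      FROM EMP
      GROUP BY DEPTNO) a
  ON e.DEPTNO = a.DEPTNO
WHERE e.SAL >= a.AVG_SAL;

The inline view plays the same role as the aggregated branch of the pipeline, and the join brings the average back to each employee row before the filter.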

Revisiting Sequence Generator Transformation


Q31. What is a Sequence Generator Transformation?


Ans.
A Sequence Generator transformation is a Passive and Connected transformation that generates numeric values. It is
used to create unique primary key values, replace missing primary keys, or cycle through a sequential range of
numbers.
This transformation by default contains ONLY two OUTPUT ports, namely CURRVAL and NEXTVAL. We cannot edit or delete these ports, neither can we add ports to this unique transformation.
We can create approximately two billion unique numeric values, with the widest range from 1 to 2147483647.

Q32. Define the Properties available in Sequence Generator transformation in brief.


Ans.
Sequence Generator Property | Description
Start Value | Start value of the generated sequence that we want the Integration Service to use if we use the Cycle option. If we select Cycle, the Integration Service cycles back to this value when it reaches the end value. Default is 0.
Increment By | Difference between two consecutive values from the NEXTVAL port. Default is 1.
End Value | Maximum value generated by SeqGen. After reaching this value the session will fail if the sequence generator is not configured to cycle. Default is 2147483647.
Current Value | Current value of the sequence. Enter the value we want the Integration Service to use as the first value in the sequence. Default is 1.
Cycle | If selected, when the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.
Number of Cached Values | Number of sequential values the Integration Service caches at a time. Default value for a standard Sequence Generator is 0. Default value for a reusable Sequence Generator is 1,000.
Reset | Restarts the sequence at the current value each time a session runs. This option is disabled for reusable Sequence Generator transformations.

Q33. Suppose we have a source table populating two target tables. We connect the NEXTVAL port of the Sequence
Generator to the surrogate keys of both the target tables.
Will the Surrogate keys in both the target tables be same? If not how can we flow the same sequence values in both of
them.
Ans.
When we connect the NEXTVAL output port of the Sequence Generator directly to the surrogate key columns of the target tables, the sequence numbers will not be the same.
A block of sequence numbers is sent to one target table's surrogate key column. The second target receives a block of sequence numbers from the Sequence Generator transformation only after the first target table receives its block of sequence numbers.
Suppose we have 5 rows coming from the source, so the targets will have the sequence values as TGT1 (1,2,3,4,5) and TGT2 (6,7,8,9,10). [Taking into consideration Start Value 0, Current Value 1 and Increment By 1.]
Now suppose the requirement is like that we need to have the same surrogate keys in both the targets.
Then the easiest way to handle the situation is to put an Expression Transformation in between the Sequence
Generator and the Target tables. The SeqGen will pass unique values to the expression transformation, and then the
rows are routed from the expression transformation to the targets.

Sequence Generator

Q34. Suppose we have 100 records coming from the source. Now for a target column population we used a Sequence
generator.
Suppose the Current Value is 0 and End Value of Sequence generator is set to 80. What will happen?
Ans.
End Value is the maximum value the Sequence Generator will generate. After it reaches the End value the session
fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.
Failing of session can be handled if the Sequence Generator is configured to Cycle through the sequence, i.e.
whenever the Integration Service reaches the configured end value for the sequence, it wraps around and starts the
cycle again, beginning with the configured Start Value.
Q35. What are the changes we observe when we promote a non-reusable Sequence Generator to a reusable one? And what happens if we set the Number of Cached Values to 0 for a reusable transformation?
Ans.
When we convert a non-reusable Sequence Generator to a reusable one, we observe that the Number of Cached Values is set to 1000 by default, and the Reset property is disabled.
When we try to set the Number of Cached Values property of a reusable Sequence Generator to 0 in the Transformation Developer, we encounter the following error message:
The number of cached values must be greater than zero for reusable sequence transformation.

Which is the fastest? Informatica or Oracle?


In our previous article, we tested the performance of the ORDER BY operation in Informatica and Oracle
[http://www.dwbiconcepts.com/advance/7general/36informaticaoraclesortperformancetest.html] and found that, in our test condition, Oracle performs sorting 14% faster than Informatica. This time we will look into the JOIN operation, not only because JOIN is the single most important data set operation but also because the performance of JOIN can give crucial data to a developer in order to develop proper push down optimization manually.
Informatica is one of the leading data integration tools in today's world. More than 4,000 enterprises worldwide rely on Informatica to access, integrate and trust their information assets with it. On the other hand, Oracle database is arguably the most successful and powerful RDBMS system, trusted since the 1980s in all sorts of business domains and across all major platforms. Both of these systems are best in the technologies that they support. But when it comes to application development, developers often face the challenge of striking the right balance of operational load sharing between these systems. This article will help them take an informed decision.

Which JOINs data faster? Oracle or Informatica?


As an application developer, you have the choice of either using joining syntaxes in database level to join your data or
using JOINER TRANSFORMATION in Informatica to achieve the same outcome. The question is which system
performs this faster?

Test Preparation
We will perform the same test with 4 different data points (data volumes) and log the results. We will start with 1 million
data in detail table and 0.1 million in master table. Subsequently we will test with 2 million, 4 million and 6 million detail
table data volumes and 0.2 million, 0.4 million and 0.6 million master table data volumes. Here are the details of the
setup we will use,
1. Oracle 10g database as relational source and target
2. Informatica PowerCentre 8.5 as ETL tool
3. Database and Informatica setup on different physical servers using HP UNIX
4. Source database table has no constraint, no index, no database statistics and no partition

5. Source database table is not available in Oracle shared pool before the same is read
6. There is no session level partition in Informatica PowerCentre
7. There is no parallel hint provided in extraction SQL query
8. Informatica JOINER has enough cache size
We have used two sets of Informatica PowerCentre mappings created in Informatica PowerCentre designer. The first mapping m_db_side_join will use an INNER JOIN clause in the source qualifier to join data at the database level. The second mapping m_Infa_side_join will use an Informatica JOINER to join data at the Informatica level. We have executed these mappings with different data points and logged the results.
Further to the above test we will execute m_db_side_join mapping once again, this time with proper database side
indexes and statistics and log the results.
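The database-side join of the first mapping can be pictured as a Source Qualifier SQL override of roughly the following shape; the table and key names below are assumptions, since the article does not list the actual column names:

SELECT d.*, m.*
FROM DETAIL_TABLE d
INNER JOIN MASTER_TABLE m
   ON d.MASTER_KEY = m.MASTER_KEY;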

Result
The following graph shows the performance of Informatica and the Database in terms of the time taken by each system to join data. The average time is plotted along the vertical axis and data points are plotted along the horizontal axis.
Data Points | Master Table Record Count | Detail Table Record Count
1 | 0.1 M | 1 M
2 | 0.2 M | 2 M
3 | 0.4 M | 4 M
4 | 0.6 M | 6 M

Verdict
In our test environment, Oracle 10g performs the JOIN operation 24% faster than the Informatica Joiner Transformation without an Index, and 42% faster with a Database Index.
Assumption
1. Average server load remains same during all the experiments
2. Average network speed remains same during all the experiments

Note
1. This data can only be used for performance comparison but cannot be used for performance benchmarking.
2. This data is only indicative and may vary in different testing conditions.

Which is the fastest? Informatica or Oracle?


Informatica is one of the leading data integration tools in today's world. More than 4,000 enterprises worldwide rely on Informatica to access, integrate and trust their information assets with it. On the other hand, Oracle database is arguably the most successful and powerful RDBMS system, trusted since the 1980s in all sorts of business domains and across all major platforms. Both of these systems are best in the technologies that they support. But when it comes to application development, developers often face the challenge of striking the right balance of operational load sharing between these systems.
Think about a typical ETL operation often used in enterprise level data integration. A lot of data processing can be
either redirected to the database or to the ETL tool. In general, both the database and the ETL tool are reasonably
capable of doing such operations with almost same efficiency and capability. But in order to achieve the optimized
performance, a developer must carefully consider and decide which system s/he should trust for each individual processing task.
In this article, we will take a basic database operation, Sorting, and we will put these two systems to the test in order to determine which does it faster than the other, if at all.

Which sorts data faster? Oracle or Informatica?


As an application developer, you have the choice of either using ORDER BY in database level to sort your data or
using SORTER TRANSFORMATION in Informatica to achieve the same outcome. The question is which system
performs this faster?

Test Preparation
We will perform the same test with different data points (data volumes) and log the results. We will start with 1 million
records and we will be doubling the volume for each next data points. Here are the details of the setup we will use,
1. Oracle 10g database as relational source and target
2. Informatica PowerCentre 8.5 as ETL tool
3. Database and Informatica setup on different physical servers using HP UNIX
4. Source database table has no constraint, no index, no database statistics and no partition
5. Source database table is not available in Oracle shared pool before the same is read
6. There is no session level partition in Informatica PowerCentre
7. There is no parallel hint provided in extraction SQL query
8. The source table has 10 columns and first 8 columns will be used for sorting
9. Informatica sorter has enough cache size
We have used two sets of Informatica PowerCentre mappings created in Informatica PowerCentre designer. The first mapping m_db_side_sort will use an ORDER BY clause in the source qualifier to sort data at the database level. The second mapping m_Infa_side_sort will use an Informatica sorter to sort data at the Informatica level. We have executed these mappings with different data points and logged the results.

Result
The following graph shows the performance of Informatica and Database in terms of time taken by each system to sort
data. The time is plotted along vertical axis and data volume is plotted along horizontal axis.

Verdict
The above experiment demonstrates that Oracle database is faster in SORT operation than Informatica by an average factor of 14%.
Assumption
1. Average server load remains same during all the experiments
2. Average network speed remains same during all the experiments

Note
This data can only be used for performance comparison but cannot be used for performance benchmarking.

Informatica Reject File: How to Identify the rejection reason


[http://www.dwbiconcepts.com/basicconcept/3etl/32informaticarejectorbadfiles.html]
Saurav Mitra

When we run a session, the integration service may create a reject file for each target instance in the mapping to store
the target reject record. With the help of the Session Log and Reject File we can identify the cause of data rejection in
the session. Eliminating the cause of rejection will lead to rejection free loads in the subsequent session runs. If the
Informatica Writer or the Target Database rejects data due to any valid reason the integration service logs the rejected
records into the reject file. Every time we run the session the integration service appends the rejected records to the
reject file.

Working with Informatica Bad Files or Reject Files


By default the Integration service creates the reject files or bad files in the $PMBadFileDir process variable directory. It
writes the entire reject record row in the bad file although the problem may be in any one of the Columns. The reject files
have a default naming convention like [target_instance_name].bad. If we open the reject file in an editor we will see comma separated values having some tags/indicators and some data values. We will see two types of indicators in the reject file. One is the Row Indicator and the other is the Column Indicator.
For reading the bad file the best method is to copy the contents of the bad file and save the same as a CSV (Comma Separated Value) file. Opening the CSV file will give an Excel-sheet type look and feel. The first column in the reject file is the Row Indicator, which determines whether the row was destined for insert, update, delete or reject. It is basically a flag that determines the Update Strategy for the data row. When the Commit Type of the session is configured as User-defined, the row indicator indicates whether the transaction was rolled back due to a non-fatal error, or if the committed transaction was in a failed target connection group.

List of Values of Row Indicators:

Row Indicator | Indicator Significance | Rejected By
0 | Insert | Writer or target
1 | Update | Writer or target
2 | Delete | Writer or target
3 | Reject | Writer
4 | Rolled-back insert | Writer
5 | Rolled-back update | Writer
6 | Rolled-back delete | Writer
7 | Committed insert | Writer
8 | Committed update | Writer
9 | Committed delete | Writer

Now comes the Column Data values followed by their Column Indicators, that determines the data quality of the
corresponding Column.


List of Values of Column Indicators:
Column Indicator | Type of data | Writer Treats As
D | Valid data or Good Data | Writer passes it to the target database. The target accepts it unless a database error occurs, such as finding a duplicate key while inserting.
O | Overflowed Numeric Data | Numeric data exceeded the specified precision or scale for the column. Bad data, if you configured the mapping target to reject overflow or truncated data.
N | Null Value | The column contains a null value. Good data. Writer passes it to the target, which rejects it if the target database does not accept null values.
T | Truncated String Data | String data exceeded a specified precision for the column, so the Integration Service truncated it. Bad data, if you configured the mapping target to reject overflow or truncated data.

Also note that the second column contains the column indicator flag value 'D', which signifies that the Row Indicator is valid.
Now let us see how Data in a Bad File looks like:
0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T

Implementing Informatica Incremental Aggregation


[http://www.dwbiconcepts.com/advance/4etl/26implementinginformaticasincrementalaggregation.html]
Using incremental aggregation, we apply captured changes in the source data (CDC part) to aggregate calculations in a
session. If the source changes incrementally and we can capture the changes, then we can configure the session to
process those changes. This allows the Integration Service to update the target incrementally, rather than forcing it to
delete previous loads data, process the entire source data and recalculate the same data each time you run the session.

Incremental Aggregation
When the session runs with incremental aggregation enabled for the first time, say in the 1st week of Jan, we will use the entire source. This allows the Integration Service to read and store the necessary aggregate data information. In the 2nd week of Jan, when we run the session again, we will filter out the CDC records from the source, i.e. the records loaded after the initial load. The Integration Service then processes this new data and updates the target accordingly.
Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with the entire source data, recalculating the same aggregation formula.
Incremental aggregation may be helpful in cases when we need to load data into monthly facts on a weekly basis.
Let us see a sample mapping to implement incremental aggregation:
Image: Incremental Aggregation Sample Mapping
Look at the Source Qualifier query to fetch the CDC part using a BATCH_LOAD_CONTROL table that saves the
last successful load date for the particular mapping.
Image: Incremental Aggregation Source Qualifier
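A hedged sketch of such a CDC extraction query is given below; the source table name and the BATCH_LOAD_CONTROL column names are assumptions made only for illustration:

SELECT s.*
FROM SALES_SOURCE s                         -- assumed source table name
WHERE s.LOAD_DATE > (SELECT b.LAST_LOAD_DATE
                     FROM BATCH_LOAD_CONTROL b
                     WHERE b.MAPPING_NAME = 'm_incremental_agg');   -- assumed control-table columns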
Look at the ports tab of Expression transformation.


Look at the ports tab of Aggregator Transformation.

Now the most important session properties configuration to implement incremental Aggregation:

If we want to reinitialize the aggregate cache, say during the first week of every month, we will configure another session same as the previous session, the only change being that the Reinitialize aggregate cache property is checked.

Now have a look at the source table data:

CUSTOMER_KEY | INVOICE_KEY | AMOUNT | LOAD_DATE
1111 | 5001 | 100 | 01/01/2010
2222 | 5002 | 250 | 01/01/2010
3333 | 5003 | 300 | 01/01/2010
1111 | 6007 | 200 | 07/01/2010
1111 | 6008 | 150 | 07/01/2010
2222 | 6009 | 250 | 07/01/2010
4444 | 1234 | 350 | 07/01/2010
5555 | 6157 | 500 | 07/01/2010

After the first Load on 1st week of Jan 2010, the data in the target is as follows:
CUSTOMER_KEY | INVOICE_KEY | MON_KEY | AMOUNT
1111 | 5001 | 201001 | 100
2222 | 5002 | 201001 | 250
3333 | 5003 | 201001 | 300

Now during the 2nd week's load it will process only the incremental data in the source, i.e. those records having a load date greater than the last session run date. After the 2nd week's load, the incremental aggregation of the incremental source data with the aggregate cache file data will update the target table with the following dataset:
CUSTOMER_KEY | INVOICE_KEY | MON_KEY | AMOUNT | Remarks/Operation
1111 | 6008 | 201001 | 450 | The cache file updated after aggregation
2222 | 6009 | 201001 | 500 | The cache file updated after aggregation
3333 | 5003 | 201001 | 300 | The cache file remains the same as before
4444 | 1234 | 201001 | 350 | New group row inserted in cache file
5555 | 6157 | 201001 | 500 | New group row inserted in cache file

The first time we run an incremental aggregation session, the Integration Service processes the entire source. At the end
of the session, the Integration Service stores aggregate data for that session run in two files, the index file and the data
file. The Integration Service creates the files in the cache directory specified in the Aggregator transformation
properties. Each subsequent time we run the session with incremental aggregation, we use the incremental source
changes in the session. For each input record, the Integration Service checks historical information in the index file for a
corresponding group. If it finds a corresponding group, the Integration Service performs the aggregate operation
incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a
corresponding group, the Integration Service creates a new group and saves the record data.
When writing to the target, the Integration Service applies the changes to the existing target. It saves modified
aggregate data in the index and data files to be used as historical data the next time you run the session.
Each subsequent time we run a session with incremental aggregation, the Integration Service creates a backup of the
incremental aggregation files. The cache directory for the Aggregator transformation must contain enough disk space for
two sets of the files.
The Integration Service creates new aggregate data, instead of using historical data, when we configure the session to
reinitialize the aggregate cache, Delete cache files etc.
When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.

Using Informatica Normalizer Transformation


[http://www.dwbiconcepts.com/basicconcept/3etl/23usinginformaticanormalizertransformation.html]


Saurav Mitra
Normalizer, a native transformation in Informatica, can ease many complex data transformation requirements. Learn how to effectively use the Normalizer here.

Using Normalizer Transformation


A Normalizer is an Active transformation that returns multiple rows from a source row; it returns duplicate data for single-occurring source columns. The Normalizer transformation parses multiple-occurring columns from COBOL sources, relational tables, or other sources. Normalizer can be used to transpose the data in columns to rows. Normalizer effectively does the opposite of what Aggregator does!

Example of Data Transpose using Normalizer


Think of a relational table that stores four quarters of sales by store and we need to create a row for each sales
occurrence. We can configure a Normalizer transformation to return a separate row for each quarter like below..
The following source rows contain four quarters of sales by store:
Source Table

Store  | Quarter1 | Quarter2 | Quarter3 | Quarter4
Store1 | 100      | 300      | 500      | 700
Store2 | 250      | 450      | 650      | 850

The Normalizer returns a row for each store and sales combination. It also returns an index (GCID) that identifies the quarter number:

Target Table

Store   | Sales | Quarter
Store 1 | 100   | 1
Store 1 | 300   | 2
Store 1 | 500   | 3
Store 1 | 700   | 4
Store 2 | 250   | 1
Store 2 | 450   | 2
Store 2 | 650   | 3
Store 2 | 850   | 4

How Informatica Normalizer Works


Suppose we have the following data in source:
Name | Month | Transportation | House Rent | Food
Sam  | Jan   | 200            | 1500       | 500
John | Jan   | 300            | 1200       | 300
Tom  | Jan   | 300            | 1350       | 350
Sam  | Feb   | 300            | 1550       | 450
John | Feb   | 350            | 1200       | 290
Tom  | Feb   | 350            | 1400       | 350



and we need to transform the source data and populate this as below in the target table:
Name | Month | Expense Type | Expense
Sam  | Jan   | Transport    | 200
Sam  | Jan   | House rent   | 1500
Sam  | Jan   | Food         | 500
John | Jan   | Transport    | 300
John | Jan   | House rent   | 1200
John | Jan   | Food         | 300
Tom  | Jan   | Transport    | 300
Tom  | Jan   | House rent   | 1350
Tom  | Jan   | Food         | 350

.. like this.
Now below is the screenshot of a complete mapping which shows how to achieve this result using Informatica PowerCenter Designer.
Image: Normalization Mapping Example 1

I will explain the mapping further below.

Setting Up Normalizer Transformation Property


First we need to set the number of occurrences property of the Expense head as 3 in the Normalizer tab of the Normalizer transformation, since we have Food, House Rent and Transportation.
This in turn will create the corresponding 3 input ports in the Ports tab, along with the fields Individual and Month.

In the Ports tab of the Normalizer the ports will be created automatically as configured in the Normalizer tab.
Interestingly we will observe two new columns, namely
GK_EXPENSEHEAD
GCID_EXPENSEHEAD
The GK field generates a sequence number starting from the value defined in the Sequence field, while GCID holds the value of the occurrence field, i.e. the column number of the input Expense head.


Here 1 is for FOOD, 2 is for HOUSERENT and 3 is for TRANSPORTATION.

Now the GCID will give which expense corresponds to which field while converting columns to rows.
Below is the screenshot of the expression to handle this GCID efficiently:
Image: Expression to handle GCID

Informatica Dynamic Lookup Cache [http://www.dwbiconcepts.com/basicconcept/3etl/22dynamiclookupcache.html]
A LookUp cache does not change once built. But what if the underlying lookup table changes the data after the lookup cache is created? Is there a way so that the cache always remains up to date even if the underlying table changes?
Dynamic Lookup Cache

Let's think about this scenario. You are loading your target table through a mapping. Inside the mapping you
have a Lookup and in the Lookup, you are actually looking up the same target table you are loading. You may
ask me, "So? What's the big deal? We all do it quite often...". And yes you are right. There is no "big deal"
because Informatica (generally) caches the lookup table in the very beginning of the mapping, so whatever
record getting inserted to the target table through the mapping, will have no effect on the Lookup cache. The lookup
will still hold the previously cached data, even if the underlying target table is changing.
But what if you want your Lookup cache to get updated as and when the target table is changing? What if you want your lookup cache to always show the exact snapshot of the data in your target table at that point in time? Clearly this requirement will not be fulfilled in case you use a static cache. You will need a dynamic cache to handle this.

But why would anyone need a dynamic cache?


To understand this, let's first understand a static cache scenario.

Static Cache Scenario
Let's suppose you run a retail business and maintain all your customer information in a customer master table (RDBMS table). Every night, all the customers from your customer master table are loaded into a Customer Dimension table in your data warehouse. Your source customer table is a transaction system table, probably in 3rd normal form, and does
not store history. Meaning, if a customer changes his address, the old address is updated with the new address. But
your data warehouse table stores the history (may be in the form of SCD TypeII). There is a map that loads your data
warehouse table from the source table. Typically you do a Lookup on target (static cache) and check with your every
incoming customer record to determine if the customer is already existing in target or not. If the customer is not already
existing in target, you conclude the customer is new and INSERT the record whereas if the customer is already existing,
you may want to update the target record with this new record (if the record is updated). This is illustrated below. You don't need a dynamic Lookup cache for this:
Image: A static Lookup Cache to determine if a source record is new or updatable
Dynamic Lookup Cache Scenario
Notice in the previous example I mentioned that your source table is an RDBMS table. This ensures that your source
table does not have any duplicate record.
But, What if you had a flat file as source with many duplicate records?
Would the scenario be same? No, see the below illustration.


Image: A Scenario illustrating the use of dynamic lookup cache


Here are some more examples of when you may consider using a dynamic lookup:
Updating a master customer table with both new and updated customer information coming together, as shown above.
Loading data into a slowly changing dimension table and a fact table at the same time. Remember, you typically look up the dimension while loading the fact, so you normally load the dimension table before the fact table. Using a dynamic lookup, you can load both simultaneously.
Loading data from a file with many duplicate records, eliminating duplicates in the target by updating the duplicate row, i.e. keeping either the most recent or the initial row.
Loading the same data from multiple sources using a single mapping. Just consider the previous retail business example: if you have more than one shop and Linda has visited two of your shops for the first time, her customer record will come twice during the same load.

So, how does a dynamic lookup work?


When the Integration Service reads a row from the source, it updates the lookup cache by performing one of the following actions:
Inserts the row into the cache: if the incoming row is not in the cache, the Integration Service inserts the row into the cache based on the input ports or a generated sequence ID, and flags the row as insert.
Updates the row in the cache: if the row exists in the cache, the Integration Service updates the row in the cache based on the input ports, and flags the row as update.
Makes no change to the cache: this happens when the row exists in the cache but the lookup is configured to insert new rows only; or the row is not in the cache and the lookup is configured to update existing rows only; or the row is in the cache but, based on the lookup condition, nothing changes. The Integration Service flags the row as unchanged.
Notice that the Integration Service actually flags the rows based on the above three conditions.
And that's a great thing, because if you know the flag you can reroute the row to achieve different logic. This flag port is called
NewLookupRow
Using the value of this port, the rows can be routed for insert, for update, or to do nothing. You just need a Router or Filter transformation followed by an Update Strategy.
Oh, forgot to tell you: the actual values that you can expect in the NewLookupRow port are:
0 = Integration Service does not update or insert the row in the cache.
1 = Integration Service inserts the row into the cache.
2 = Integration Service updates the row in the cache.
When the Integration Service reads a row, it changes the lookup cache depending on the results of the lookup query
and the Lookup transformation properties you define. It assigns the value 0, 1, or 2 to the NewLookupRow port to
indicate if it inserts or updates the row in the cache, or makes no change.
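As a rough sketch (the group names are illustrative, not from the original), the Router placed after such a dynamic Lookup could be configured as follows, with each group feeding its own Update Strategy:
Group NEW_ROWS, filter condition NewLookupRow = 1, followed by an Update Strategy with the expression DD_INSERT
Group CHANGED_ROWS, filter condition NewLookupRow = 2, followed by an Update Strategy with the expression DD_UPDATE
Rows with NewLookupRow = 0 need no change and are typically not routed to any target.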


Posted 2nd June 2012 by Shankar Prasad


2nd June 2012

Datawarehouse and Informatica Interview Question

Datawarehouse and Informatica Interview Question


*******************Shankar Prasad*******************************

1. Can 2 Fact tables share the same Dimension tables? How many Dimension tables are associated with one Fact table in your project?
Ans: Yes.
2. What is ROLAP, MOLAP, and DOLAP?
Ans: ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), and DOLAP (Desktop OLAP). In these three OLAP architectures, the interface to the analytic layer is typically the same; what is quite different is how the data is physically stored.
In MOLAP, the premise is that online analytical processing is best implemented by storing the data multidimensionally; that is, data must be stored multidimensionally in order to be viewed in a multidimensional manner.
In ROLAP, architects believe in storing the data in the relational model; OLAP capabilities are best provided directly against the relational database.
DOLAP is a variation that exists to provide portability for the OLAP user. It creates multidimensional datasets that can be transferred from server to desktop, requiring only the DOLAP software to exist on the target system. This provides significant advantages to portable computer users, such as salespeople who are frequently on the road and do not have direct access to their office server.
3. What is an MDDB? And what is the difference between MDDBs and RDBMSs?
Ans: MDDB stands for Multidimensional Database. There are two primary technologies used for storing the data used in OLAP applications: multidimensional databases (MDDB) and relational databases (RDBMS). The major difference between MDDBs and RDBMSs is in how they store data. Relational databases store their data in a series of tables and columns. Multidimensional databases, on the other hand, store their data in large multidimensional arrays.
For example, in an MDDB world, you might refer to a sales figure as Sales with Date, Product, and Location coordinates of 12-1-2001, Car, and South, respectively.
Advantages of MDDB:
Retrieval is very fast because
The data corresponding to any combination of dimension members can be retrieved with a single I/O.
Data is clustered compactly in a multidimensional array.
Values are calculated ahead of time.
The index is small and can therefore usually reside completely in memory.
Storage is very efficient because
The blocks contain only data.
A single index locates the block corresponding to a combination of sparse dimension numbers.

4. What is MDB modeling and RDB Modeling?


Ans:
5. What is Mapplet and how do u create Mapplet?
Ans: A mapplet is a reusable object that represents a set of transformations. It allows you to reuse
transformation logic and can
contain as many transformations as you need.
Create a mapplet when you want to use a standardized set of transformation logic in several mappings.
For example, if you have several fact tables that require a series of dimension keys, you can create a mapplet containing
a series of Lookup
transformations to find each dimension key. You can then use the mapplet in each fact table mapping,
rather than recreate the
same lookup logic in each mapping.
To create a new mapplet:
1. In the Mapplet Designer, choose Mapplets-Create Mapplet.
2. Enter a descriptive mapplet name. The recommended naming convention for mapplets is mplt_MappletName.
3. Click OK. The Designer creates a new mapplet in the Mapplet Designer.
4. Choose Repository-Save.
6. What are transformations used for?
Ans: Transformations are the manipulation of data from how it appears in the source system(s) into another form in the data warehouse or mart in a way that enhances or simplifies its meaning. In short, you transform data into information.
This includes data merging, cleansing, and aggregation:
Data merging: the process of standardizing data types and fields. Suppose one source system calls integer-type data smallint whereas another calls similar data decimal; the data from the two source systems needs to be rationalized when moved into the Oracle data format called number.
Cleansing: this involves identifying and correcting inconsistencies or inaccuracies:
Eliminating inconsistencies in the data from multiple sources.
Converting data from different systems into a single consistent data set suitable for analysis.
Meeting a standard for establishing data elements, codes, domains, formats and naming conventions.
Correcting data errors and filling in missing data values.
Aggregation: the process whereby multiple detailed values are combined into a single summary value, typically summation numbers representing dollars spent or units sold. Generate summarized data for use in aggregate fact and dimension tables (see the sketch below).
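For illustration only (the table and column names are assumptions, not from the original), a simple aggregation of detail rows into a summary table could look like:

-- Summarize detail sales into a daily aggregate table
INSERT INTO agg_sales_daily (sale_date, product_id, total_amount, total_units)
SELECT sale_date,
       product_id,
       SUM(amount) AS total_amount,
       SUM(units)  AS total_units
  FROM sales_detail
 GROUP BY sale_date, product_id;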
Data transformation is an interesting concept in that some transformation can occur during the extract, some during the transformation step, or even, in limited cases, during the load portion of the ETL process. The type of transformation function you need will most often determine where it should be performed. Some transformation functions could even be performed in more than one place, because many of the transformations you will want to perform already exist in some form or another in more than one of the three environments (source database or application, ETL tool, or the target db).
7. What is the difference between OLTP & OLAP?
Ans: OLTP stands for Online Transaction Processing. This is a standard, normalized database structure. OLTP is designed for transactions, which means that inserts, updates, and deletes must be fast. Imagine a call center that takes orders. Call takers are continually taking calls and entering orders that may contain numerous items. Each order and each item must be inserted into a database. Since the performance of the database is critical, we want to maximize the speed of inserts (and updates and deletes). To maximize performance, we typically try to hold as few records in the database as possible.
OLAP stands for Online Analytical Processing. OLAP is a term that means many things to many people. Here, we will use the terms OLAP and Star Schema pretty much interchangeably. We will assume that a star schema database is an OLAP system. (This is not the same thing that Microsoft calls OLAP; they extend OLAP to mean the cube structures built using their product, OLAP Services.) Here, we will assume that any system of read-only, historical, aggregated data is an OLAP system.
A data warehouse (or mart) is a way of storing data for later retrieval. This retrieval is almost always used to support decision-making in the organization. That is why many data warehouses are considered to be DSS (Decision Support Systems).
Both a data warehouse and a data mart are storage mechanisms for read-only, historical, aggregated data. By read-only, we mean that the person looking at the data won't be changing it. If a user wants the sales from yesterday for a certain product, they should not have the ability to change that number.
The historical part may be just a few minutes old, but usually it is at least a day old. A data warehouse usually holds data that goes back a certain period in time, such as five years. In contrast, standard OLTP systems usually only hold data as long as it is current or active. An order table, for example, may move orders to an archive table once they have been completed, shipped, and received by the customer.
When we say that data warehouses and data marts hold aggregated data, we need to stress that there are many levels of aggregation in a typical data warehouse.
8. If the data source is in the form of an Excel spreadsheet, then how do you use it?



Ans: PowerMart and PowerCenter treat a Microsoft Excel source as a relational database, not a flat file. Like relational sources, the Designer uses ODBC to import a Microsoft Excel source. You do not need database permissions to import Microsoft Excel sources.
To import an Excel source definition, you need to complete the following tasks:
Install the Microsoft Excel ODBC driver on your system.
Create a Microsoft Excel ODBC data source for each source file in the ODBC 32-bit Administrator.
Prepare Microsoft Excel spreadsheets by defining ranges and formatting columns of numeric data.
Import the source definitions in the Designer.
Once you define ranges and format cells, you can import the ranges in the Designer. Ranges display as source definitions when you import the source.
9. Which db is an RDBMS and which is an MDDB? Can you name them?
Ans: MDDB e.g. Oracle Express Server (OES), Essbase by Hyperion Software, PowerPlay by Cognos; RDBMS e.g. Oracle, SQL Server, etc.
10. What are the modules/tools in Business Objects? Explain their purpose briefly.
Ans: BO Designer, Business Query for Excel, BO Reporter, InfoView, Explorer, WebIntelligence (WEBI), BO Publisher, Broadcast Agent, and BO ZABO.
InfoView: IT portal entry into WebIntelligence & Business Objects. Base module required for all options to view and refresh reports.
Reporter: Upgrade to create/modify reports on the LAN or Web.
Explorer: Upgrade to perform OLAP processing on the LAN or Web.
Designer: Creates the semantic layer between the user and the database.
Supervisor: Administers and controls access for groups of users.
WebIntelligence: Integrated query, reporting, and OLAP analysis over the Web.
Broadcast Agent: Used to schedule, run, publish, push, and broadcast prebuilt reports and spreadsheets, including event notification and response capabilities, event filtering, and calendar-based notification, over the LAN, e-mail, pager, fax, Personal Digital Assistant (PDA), Short Messaging Service (SMS), etc.
Set Analyzer: Applies set-based analysis to perform functions such as exclusions, intersections, unions, and overlaps visually.
Developer Suite: Build packaged, analytical, or customized apps.
11. What are Ad hoc queries and Canned queries/reports? And how do you create them?
(Plz check this page: C:\BObjects\Quries\Data Warehouse About Queries.htm)
Ans: The data warehouse will contain two types of query. There will be fixed queries that are clearly defined and well understood, such as regular reports, canned queries (standard reports) and common aggregations. There will also be ad hoc queries that are unpredictable, both in quantity and frequency.
Ad hoc query: Ad hoc queries are the starting point for any analysis into a database. Any business analyst wants to know what is inside the database. He then proceeds by calculating totals, averages, maximum and minimum values for most attributes within the database. These are the unpredictable element of a data warehouse. It is exactly that ability to run any query when desired and expect a reasonable response that makes the data warehouse worthwhile, and makes the design such a significant challenge.
The end-user access tools are capable of automatically generating the database query that answers any question posed by the user. The user will typically pose questions in terms that they are familiar with (for example, sales by store last week); this is converted into the database query by the access tool, which is aware of the structure of information within the data warehouse.
Canned queries: Canned queries are predefined queries. In most instances, canned queries contain prompts that allow you to customize the query for your specific needs. For example, a prompt may ask you for a school, department, term, or section ID. In this instance you would enter the name of the school, department or term, and the query will retrieve the specified data from the warehouse. You can measure the resource requirements of these queries, and the results can be used for capacity planning and for database design.
The main reason for using a canned query or report rather than creating your own is that your chances of misinterpreting data or getting the wrong answer are reduced. You are assured of getting the right data and the right answer.
12. How many Fact tables and how many Dimension tables did you use? Which table precedes which?
Ans: http://www.ciobriefings.com/whitepapers/StarSchema.asp
13. What is the difference between STAR SCHEMA & SNOW FLAKE SCHEMA?
Ans: http://www.ciobriefings.com/whitepapers/StarSchema.asp
14. Why did you choose STAR SCHEMA only? What are the benefits of STAR SCHEMA?
Ans: Because of its denormalized structure, i.e., the Dimension tables are denormalized. Why denormalize? The first (and often only) answer is: speed. The OLTP structure is designed for data inserts, updates, and deletes, but not data retrieval.

Therefore, we can often squeeze some speed out of it by denormalizing some of the tables and having queries go against fewer tables. These queries are faster because they perform fewer joins to retrieve the same recordset. Joins are also confusing to many end users. By denormalizing, we can present the user with a view of the data that is far easier for them to understand.

Benefits of STAR SCHEMA:


Far fewer Tables.
Designed for analysis across time.
Simplifies joins.
Less database space.
Supports drilling in reports.
Flexibility to meet business and technical needs.
15. How do you load the data using Informatica?
Ans: Using a Session; a session is created for a mapping and, when run, it moves the data from source to target.
16. (i) What is FTP? (ii) How do you connect to a remote machine? (iii) Is there another way to use FTP without a special utility?
Ans: (i): The FTP (File Transfer Protocol) utility program is commonly used for copying files to and from other computers. These computers may be at the same site or at different sites thousands of miles apart. FTP is a general protocol that works on UNIX systems as well as other non-UNIX systems.
(ii): Remote connect command: ftp machinename, e.g. ftp 129.82.45.181 or ftp iesg.
If the remote machine has been reached successfully, FTP responds by asking for a login name and password. When you enter your own login name and password for the remote machine, it returns a prompt like below:
ftp>
and permits you access to your own home directory on the remote machine. You should be able to move around in your own directory and to copy files to and from your local machine using the FTP interface commands.
Note: You can set the mode of file transfer to ASCII (the default; it transmits seven bits per character). Use ASCII mode with any of the following:
Raw data (e.g. *.dat or *.txt, codebooks, or other plain text documents)
SPSS portable files
HTML files
The binary mode transmits all eight bits per byte, thus provides less chance of a transmission error, and must be used to transmit files other than ASCII files. For example, use binary mode for the following types of files:
SPSS system files
SAS datasets
Graphic files (e.g. *.gif, *.jpg, *.bmp, etc.)
Microsoft Office documents (*.doc, *.xls, etc.)
Microsoft Office documents (*.doc, *.xls, etc.)
(iii): Yes. If you are using Windows, you can access a text-based FTP utility from a DOS prompt.
To do this, perform the following steps:
1. From the Start menu, choose Programs > MS-DOS Prompt.
2. Enter ftp ftp.geocities.com. A prompt will appear.
(or)
Enter ftp to get the ftp prompt, then ftp> open hostname, e.g. ftp> open ftp.geocities.com (it connects to the specified host).
3. Enter your Yahoo! GeoCities member name.
4. Enter your Yahoo! GeoCities password.
You can now use standard FTP commands to manage the files in your Yahoo! GeoCities directory.
17. What command is used to transfer multiple files at a time using FTP?
Ans: mget ==> copies multiple files from the remote machine to the local machine. You will be prompted for a y/n answer before transferring each file. mget * copies all files in the current remote directory to your current local directory, using the same file names.
mput ==> copies multiple files from the local machine to the remote machine.
18. What is an Filter Transformation? or what options u have in Filter Transformation?
Ans: The Filter transformation provides the means for filtering records in a mapping. You pass all the rows from a source transformation through the Filter transformation, then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only records that meet the condition pass through the Filter transformation.
Note: Discarded rows do not appear in the session log or reject files.
To maximize session performance, include the Filter transformation as close to the sources in the mapping as possible. Rather than passing records you plan to discard through the mapping, you filter out unwanted data early in the flow of data from sources to targets.
You cannot concatenate ports from more than one transformation into the Filter transformation; the input ports for the filter must come from a single transformation. Filter transformations exist within the flow of the mapping and cannot be unconnected. The Filter transformation does not allow setting output default values.
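As a simple illustration (the port name is an assumption, not from the original), a filter condition entered on the transformation might be:

SALES_AMOUNT > 0

Only rows for which the condition evaluates to TRUE pass through; the remaining rows are silently dropped.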
19. What are the default sources supported by Informatica PowerMart?
Ans:
Relational tables, views, and synonyms.
Fixed-width and delimited flat files that do not contain binary data.
COBOL files.
20. When do you create the Source Definition? Can I use this Source Definition in any transformation?
Ans: When working with a file that contains fixed-width binary data, you must create the source definition.
The Designer displays the source definition as a table, consisting of names, datatypes, and constraints. To use a source definition in a mapping, connect the source definition to a Source Qualifier or Normalizer transformation. The Informatica Server uses these transformations to read the source data.
21. What are Active & Passive Transformations?
Ans: Active and Passive Transformations
Transformations can be active or passive. An active transformation can change the number of records passed through it. A passive transformation never changes the record count. For example, the Filter transformation removes rows that do not meet the filter condition defined in the transformation.
Active transformations that might change the record count include the following:
Advanced External Procedure
Aggregator
Filter
Joiner
Normalizer
Rank
Source Qualifier
Note: If you use PowerConnect to access ERP sources, the ERP Source Qualifier is also an active
transformation.
/*
You can connect only one of these active transformations to the same transformation or target, since the
Informatica
Server cannot determine how to concatenate data from different sets of records with different numbers of
rows.
*/
Passive transformations that never change the record count include the following:
Lookup
Expression
External Procedure
Sequence Generator
Stored Procedure
Update Strategy
You can connect any number of these passive transformations, or connect one active transformation with any number of passive transformations, to the same transformation or target.
22. What is staging Area and Work Area?
Ans: Staging Area :
Holding Tables on DW Server.
Loaded from Extract Process
Input for Integration/Transformation
May function as Work Areas
Output to a work area or Fact Table

Work Area:

Temporary Tables
Memory

23. What is Metadata? (plz refer to the DATA WAREHOUSING IN THE REAL WORLD book, page # 125)
Ans: Defn: Data about data.
Metadata contains descriptive data for end users. In a data warehouse the term metadata is used in a number of different situations.
Metadata is used for:

Data transformation and load

Data management

Query management
Data transformation and load:
Metadata may be used during data transformation and load to describe the source data and any changes that need to be made. The advantage of storing metadata about the data being transformed is that as source data changes, the changes can be captured in the metadata, and transformation programs automatically regenerated.
For each source data field the following information is required:
Source Field:
Unique identifier (to avoid any confusion occurring between 2 fields of the same name from different sources).
Name (local field name).
Type (storage type of data, like character, integer, floating point and so on).
Location:
  system (the system it comes from, e.g. the Accounting system).
  object (the object that contains it, e.g. the Account table).
The destination field needs to be described in a similar way to the source:
Destination:
Unique identifier
Name
Type (database data type, such as Char, Varchar, Number and so on).
Tablename (name of the table the field will be part of).

The other information that needs to be stored is the transformation or transformations that need to be applied to turn the source data into the destination data:
Transformation:
Transformation(s): Name, Language, module name, syntax.
The Name is the unique identifier that differentiates this from any other similar transformations. The Language attribute contains the name of the language that the transformation is written in.
The other attributes are module name and syntax. Generally these will be mutually exclusive, with only one being defined. For simple transformations such as simple SQL functions the syntax will be stored. For complex transformations the name of the module that contains the code is stored instead.
Data management:
Metadata is required to describe the data as it resides in the data warehouse. This is needed by the warehouse manager to allow it to track and control all data movements. Every object in the database needs to be described.
Metadata is needed for all the following:
Tables: columns (name, type)
Indexes: columns (name, type)
Views: columns (name, type)
Constraints: name, type, table, columns
Aggregations and partition information also need to be stored in the metadata (for details refer to page # 30).
Query Generation:
Metadata is also required by the query manager to enable it to generate queries. The same metadata that is used by the warehouse manager to describe the data in the data warehouse is also required by the query manager. The query manager will also generate metadata about the queries it has run. This metadata can be used to build a history of all queries run and generate a query profile for each user, group of users and the data warehouse as a whole.
The metadata that is required for each query is:
query
tables accessed
columns accessed (name, reference identifier)
restrictions applied (column name, table name, reference identifier, restriction)
join criteria applied
aggregate functions used
group by criteria
sort criteria
syntax
execution plan
resources

24. Which Unix flavours are you experienced with?
Ans: Solaris 2.5 / SunOS 5.5, Solaris 2.6 / SunOS 5.6, Solaris 2.8 / SunOS 5.8 (operating systems), and AIX 4.0.3.
SunOS release / Solaris version / Release date / Supported platforms:
5.5.1 / 2.5.1 / May 96 / sun4c, sun4m, sun4d, sun4u, x86, ppc
5.6 / 2.6 / Aug. 97 / sun4c, sun4m, sun4d, sun4u, x86
5.7 / 7 / Oct. 98 / sun4c, sun4m, sun4d, sun4u, x86
5.8 / 8 / 2000 / sun4m, sun4d, sun4u, x86

25. What are the tasks that are done by Informatica Server?
Ans: The Informatica Server performs the following tasks:
Manages the scheduling and execution of sessions and batches
Executes sessions and batches
Verifies permissions and privileges
Interacts with the Server Manager and pmcmd.



The Informatica Server moves data from sources to targets based on metadata stored in a repository. For
instructions on how to move and transform data, the Informatica Server reads a mapping (a type of
metadata that includes transformations and source and target definitions). Each mapping uses a session to
define additional information and to optionally override mappinglevel options. You can group multiple
sessions to run as a single unit, known as a batch.
26. What are the two programs that communicate with the Informatica Server?
Ans: Informatica provides Server Manager and pmcmd programs to communicate with the Informatica Server:
Server Manager. A client application used to create and manage sessions and batches, and to monitor and
stop the Informatica Server. You can use information provided through the Server Manager to troubleshoot
sessions and improve session performance.
pmcmd. A commandline program that allows you to start and stop sessions and batches, stop the Informatica
Server, and verify if the Informatica Server is running.
27. When do you reinitialize the Aggregate Cache?
Ans: Reinitializing the aggregate cache overwrites historical aggregate data with new aggregate data. When you reinitialize the aggregate cache, instead of using the captured changes in the source tables, you typically need to use the entire source table.
For example, you can reinitialize the aggregate cache if the source for a session changes incrementally every day and completely changes once a month. When you receive the new monthly source, you might configure the session to reinitialize the aggregate cache, truncate the existing target, and use the new source table during the session.
/? Note: To be clarified how the Server Manager works for the following ?/
To reinitialize the aggregate cache:
1. In the Server Manager, open the session property sheet.
2. Click the Transformations tab.
3. Check Reinitialize Aggregate Cache.
4. Click OK three times to save your changes.
5. Run the session.
The Informatica Server creates a new aggregate cache, overwriting the existing aggregate cache.
/? To be checked for step 6 & step 7 after a successful run of the session ?/
6. After running the session, open the property sheet again.
7. Click the Data tab.
8. Clear Reinitialize Aggregate Cache.
9. Click OK.
28. (i) What is Target Load Order in Designer?

Ans: Target Load Order: In the Designer, you can set the order in which the Informatica Server sends
records to various target
definitions in a mapping. This feature is crucial if you want to maintain referential integrity when
inserting, deleting, or updating
records in tables that have the primary key and foreign key constraints applied to them. The
Informatica Server writes data to
all the targets connected to the same Source Qualifier or Normalizer simultaneously, to
maximize performance.
28. (ii) What is the minimum condition that you need to have in order to use the Target Load Order option in the Designer?
Ans: You need to have multiple Source Qualifier transformations.
To specify the order in which the Informatica Server sends data to targets, create one Source Qualifier or
Normalizer
transformation for each target within a mapping. To set the target load order, you then determine the
order in which each
Source Qualifier sends data to connected targets in the mapping.
When a mapping includes a Joiner transformation, the Informatica Server sends all records to
targets connected to that
Joiner at the same time, regardless of the target load order.
28(iii). How do u set the Target load order?
Ans: To set the target load order:
1. Create a mapping that contains multiple Source Qualifier transformations.
2. After you complete the mapping, choose Mappings-Target Load Plan.
A dialog box lists all Source Qualifier transformations in the mapping, as well as the targets that
receive data from each
Source Qualifier.
3. Select a Source Qualifier from the list.



4. Click the Up and Down buttons to move the Source Qualifier within the load order.
5. Repeat steps 3 and 4 for any other Source Qualifiers you wish to reorder.
6. Click OK and choose Repository-Save.
29. What u can do with Repository Manager?
Ans: We can do following tasks using Repository Manager :
To create usernames, you must have one of the following sets of privileges:
Administer Repository privilege
Super User privilege
To create a user group, you must have one of the following privileges :
Administer Repository privilege
Super User privilege
To assign or revoke privileges, you must have one of the following privileges:
Administer Repository privilege
Super User privilege
Note: You cannot change the privileges of the default user groups or the default repository users.
30. What u can do with Designer ?
Ans: The Designer client application provides five tools to help you create mappings:
Source Analyzer. Use to import or create source definitions for flat file, Cobol, ERP, and relational
sources.
Warehouse Designer. Use to import or create target definitions.
Transformation Developer. Use to create reusable transformations.
Mapplet Designer. Use to create mapplets.
Mapping Designer. Use to create mappings.
Note:The Designer allows you to work with multiple tools at one time. You can also work in multiple folders
and repositories
31. What are different types of Tracing Levels u hv in Transformations?
Ans: Tracing Levels in Transformations (Level: Description):
Terse: Indicates when the Informatica Server initializes the session and its components. Summarizes session results, but not at the level of individual records.
Normal: Includes initialization information as well as error messages and notification of rejected data.
Verbose initialization: Includes all information provided with the Normal setting plus more extensive information about initializing transformations in the session.
Verbose data: Includes all information provided with the Verbose initialization setting.

Note: By default, the tracing level for every transformation is Normal.


To add a slight performance boost, you can also set the tracing level to Terse, writing the minimum of detail
to the session log
when running a session containing the transformation.
31(i). What the difference is between a database, a data warehouse and a data mart?
Ans: A database is an organized collection of information.
A data warehouse is a very large database with special sets of tools to extract and cleanse data from
operational systems
and to analyze data.
A data mart is a focused subset of a data warehouse that deals with a single area of data and is
organized for quick
analysis.
32. What is Data Mart, Data WareHouse and Decision Support System explain briefly?
Ans: Data Mart:
A data mart is a repository of data gathered from operational data and other sources that is designed to serve
a particular
community of knowledge workers. In scope, the data may derive from an enterprise wide database or data
warehouse or be more specialized. The emphasis of a data mart is on meeting the specific demands of a
particular group of knowledge users in terms of analysis, content, presentation, and easeofuse. Users of a
data mart can expect to have data presented in terms that are familiar.
In practice, the terms data mart and data warehouse each tend to imply the presence of the other in some form. However, most writers using the term seem to agree that the design of a data mart tends to start from an analysis of user needs, and that a data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. A data warehouse is a central aggregation of data (which can be distributed physically); a data mart is a data repository that may or may not derive from a data warehouse and that emphasizes ease of access and usability for a particular designed purpose. In general, a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need.
Data Warehouse:
A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect. The term was coined by W. H. Inmon. IBM sometimes uses the term
"information warehouse."
Typically, a data warehouse is housed on an enterprise mainframe server. Data from various online transaction
processing (OLTP) applications and other sources is selectively extracted and organized on the data
warehouse database for use by analytical applications and user queries. Data warehousing emphasizes the
capture of data from diverse sources for useful analysis and access, but does not generally start from the
pointofview of the end user or knowledge worker who may need access to specialized, sometimes local
databases. The latter idea is known as the data mart.
data mining, Web mining, and a decision support system (DSS) are three kinds of applications that can make
use of a data warehouse.
Decision Support System:
A decision support system (DSS) is a computer program application that analyzes business data and presents
it so that users can make business decisions more easily. It is an "informational application" (in distinction to
an "operational application" that collects the data in the course of normal business operation).
Typical information that a decision support application might gather and present would be:
Comparative sales figures between one week and the next Projected
revenue figures based on new product sales assumptions
The consequences of different decision alternatives, given past experience in a context that is described
A decision support system may present information graphically and may include an expert system or artificial
intelligence (AI). It may be aimed at business executives or some other group of knowledge workers.

33. What are the differences between Heterogeneous and Homogeneous sources?
Ans:
Heterogeneous: stored in different schemas; stored in different file or database types; spread across several countries; different platform and hardware configurations.
Homogeneous: common structure; same database type; same data center; same platform and hardware configuration.

34. How do you use DDL commands in a PL/SQL block, e.g. accept a table name from the user and drop it if available, else display a message?
Ans: To invoke DDL commands in PL/SQL blocks we have to use Dynamic SQL; the package used is DBMS_SQL.
35. What are the steps to work with Dynamic SQL?
Ans: Open a dynamic cursor, parse the SQL statement, bind input variables (if any), execute the SQL statement of the dynamic cursor, and close the cursor.
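A minimal sketch of the example in question 34 using DBMS_SQL (the substitution variable and messages are illustrative); note that for DDL the statement is actually executed during the PARSE call:

DECLARE
   v_tab    VARCHAR2(30) := '&table_name';  -- table name accepted from the user (SQL*Plus substitution)
   v_cursor INTEGER;
BEGIN
   v_cursor := DBMS_SQL.OPEN_CURSOR;                                  -- 1. open a dynamic cursor
   DBMS_SQL.PARSE(v_cursor, 'DROP TABLE ' || v_tab, DBMS_SQL.NATIVE); -- 2. parse; DDL runs here
   DBMS_SQL.CLOSE_CURSOR(v_cursor);                                   -- 3. close the cursor
   DBMS_OUTPUT.PUT_LINE('Table ' || v_tab || ' dropped.');
EXCEPTION
   WHEN OTHERS THEN
      IF v_cursor IS NOT NULL AND DBMS_SQL.IS_OPEN(v_cursor) THEN
         DBMS_SQL.CLOSE_CURSOR(v_cursor);
      END IF;
      DBMS_OUTPUT.PUT_LINE('Table ' || v_tab || ' is not available.');
END;
/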
36. Which package/procedure is used to find/check the free space available for db objects like tables/procedures/views/synonyms etc.?
Ans: The package is DBMS_SPACE, the procedure is UNUSED_SPACE, and the table is DBA_OBJECTS.
Note: See the script to find free space @ c:\informatica\tbl_free_space
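A minimal sketch of calling DBMS_SPACE.UNUSED_SPACE (the owner and segment name are illustrative assumptions):

DECLARE
   v_total_blocks  NUMBER;
   v_total_bytes   NUMBER;
   v_unused_blocks NUMBER;
   v_unused_bytes  NUMBER;
   v_file_id       NUMBER;
   v_block_id      NUMBER;
   v_last_block    NUMBER;
BEGIN
   DBMS_SPACE.UNUSED_SPACE(
      segment_owner             => 'SCOTT',   -- illustrative owner
      segment_name              => 'EMP',     -- illustrative segment
      segment_type              => 'TABLE',
      total_blocks              => v_total_blocks,
      total_bytes               => v_total_bytes,
      unused_blocks             => v_unused_blocks,
      unused_bytes              => v_unused_bytes,
      last_used_extent_file_id  => v_file_id,
      last_used_extent_block_id => v_block_id,
      last_used_block           => v_last_block);
   DBMS_OUTPUT.PUT_LINE('Unused bytes: ' || v_unused_bytes);
END;
/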


37. Does Informatica allow loading if EmpId is the primary key in the target table and the source data has 2 rows with the same EmpId? If you use a lookup for the same situation, does it allow loading 2 rows or only 1?
Ans: => No, it will not; it generates a primary key constraint violation (it loads 1 row).
=> Even then no, if EmpId is the primary key.
38. If Ename is varchar2(40) from one source (Siebel) and Ename is char(100) from another source (Oracle), and the target has Name varchar2(50), then how does Informatica handle this situation? How does Informatica handle string and number datatypes from sources?
39. How do u debug mappings? I mean where do u attack?
40. How do u qry the Metadata tables for Informatica?
41(i). When do u use connected lookup n when do u use unconnected lookup?
Ans:
Connected Lookups :
A connected Lookup transformation is part of the mapping data flow. With connected lookups, you can have
multiple return values. That is, you can pass multiple values from the same row in the lookup table out of
the Lookup transformation.
Common uses for connected lookups include:
=> Finding a name based on a number ex. Finding a Dname based on deptno
=> Finding a value based on a range of dates
=> Finding a value based on multiple conditions



Unconnected Lookups :
An unconnected Lookup transformation exists separate from the data flow in the mapping. You write an
expression using
the :LKP reference qualifier to call the lookup within another transformation.
Some common uses for unconnected lookups include: =>
Testing the results of a lookup in an expression
=> Filtering records based on the lookup results
=> Marking records for update based on the result of a lookup (for example, updating slowly changing
dimension tables)
=> Calling the same lookup multiple times in one mapping
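For illustration (the lookup and port names are assumptions, not from the original), an unconnected lookup is typically invoked from an expression using the :LKP reference qualifier, for example:

IIF(ISNULL(CUST_KEY), :LKP.lkp_CUSTOMER(CUST_ID), CUST_KEY)

Here :LKP.lkp_CUSTOMER(CUST_ID) calls the Lookup transformation lkp_CUSTOMER, passing CUST_ID into the lookup condition and receiving the single value designated by the Return port.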

41(ii). What r the differences between Connected lookups and Unconnected lookups?
Ans: Although both types of lookups perform the same basic task, there are some important differences:

Connected Lookup:
Part of the mapping data flow.
Can return multiple values from the same row. You link the lookup/output ports to another transformation.
Supports default values. If there is no match for the lookup condition, the server returns the default value for all output ports.
More visible: shows the data passing in and out of the lookup.
Cache includes all lookup columns used in the mapping (that is, lookup table columns included in the lookup condition and lookup table columns linked as output ports to other transformations).

Unconnected Lookup:
Separate from the mapping data flow.
Returns one value from each row. You designate the return value with the Return port (R).
Does not support default values. If there is no match for the lookup condition, the server returns NULL.
Less visible: you write an expression using :LKP to tell the server when to perform the lookup.
Cache includes only the lookup/output ports in the lookup condition and the lookup/return port.

42. What do you need to concentrate on after getting the explain plan?
Ans: The 3 most significant columns in the plan table are named OPERATION, OPTIONS, and OBJECT_NAME. For each step, these tell you which operation is going to be performed and which object is the target of that operation.
Ex:
**************************
TO USE EXPLAIN PLAN FOR A QUERY:
**************************
SQL> EXPLAIN PLAN
  2  SET STATEMENT_ID = 'PKAR02'
  3  FOR
  4  SELECT JOB, MAX(SAL)
  5  FROM EMP
  6  GROUP BY JOB
  7  HAVING MAX(SAL) >= 5000;
Explained.
**************************
TO QUERY THE PLAN TABLE:
**************************
SQL> SELECT RTRIM(ID)||' '||
  2         LPAD(' ', 2*(LEVEL-1))||OPERATION
  3         ||' '||OPTIONS
  4         ||' '||OBJECT_NAME STEP_DESCRIPTION
  5  FROM PLAN_TABLE
  6  START WITH ID = 0 AND STATEMENT_ID = 'PKAR02'
  7  CONNECT BY PRIOR ID = PARENT_ID
  8         AND STATEMENT_ID = 'PKAR02'
  9  ORDER BY ID;
STEP_DESCRIPTION
--------------------------------------
0 SELECT STATEMENT
1   FILTER
2     SORT GROUP BY
3       TABLE ACCESS FULL EMP

43. How components are interfaced in Psoft?


Ans:
44. How do u do the analysis of an ETL?
Ans:
==============================================================
45. What is Standard, Reusable Transformation

and Mapplet?

Ans: Mappings contain two types of transformations, standard and reusable. Standard transformations exist
within a single
mapping. You cannot reuse a standard transformation you created in another mapping, nor can you
create a shortcut to that transformation. However, often you want to create transformations that perform
common tasks, such as calculating the average salary in a department. Since a standard transformation
cannot be used by more than one mapping, you have to set up the same transformation each time you want
to calculate the average salary in a department.
Mapplet: A mapplet is a reusable object that represents a set of transformations. It allows you to reuse
transformation logic
and can contain as many transformations as you need. A mapplet can contain transformations,
reusable transformations, and
shortcuts to transformations.
46. How do u copy Mapping, Repository, Sessions?
Ans: To copy an object (such as a mapping or reusable transformation) from a shared folder, press the Ctrl key
and drag and drop
the mapping into the destination folder.
To copy a mapping from a nonshared folder, drag and drop the mapping into the destination folder. In
both cases, the destination folder must be open with the related tool active.
For example, to copy a mapping, the Mapping Designer must be active. To copy a Source Definition, the
Source Analyzer must be active.
Copying a Mapping:
To copy the mapping, open a workbook.
In the Navigator, click and drag the mapping slightly to the right, not dragging it to the workbook.
When asked if you want to make a copy, click Yes, then enter a new name and click OK.
Choose Repository-Save.
Repository Copying: You can copy a repository from one database to another. You use this feature before
upgrading, to
preserve the original repository. Copying repositories provides a quick way to copy all metadata you want
to use as a basis for
a new repository.
If the database into which you plan to copy the repository contains an existing repository, the Repository
Manager deletes the existing repository. If you want to preserve the old repository, cancel the copy. Then back
up the existing repository before copying the new repository.
To copy a repository, you must have one of the following privileges:

Administer Repository privilege

Super User privilege

To copy a repository:
1. In the Repository Manager, choose Repository-Copy Repository.
2. Select a repository you wish to copy, then enter the following information:
Copy Repository fields (all required):
Repository: Name for the repository copy. Each repository name must be unique within the domain and should be easily distinguished from all other repositories.
Database Username: Username required to connect to the database. This login must have the appropriate database permissions to create the repository.
Database Password: Password associated with the database username. Must be in US-ASCII.
ODBC Data Source: Data source used to connect to the database.
Native Connect String: Connect string identifying the location of the database.
Code Page: Character set associated with the repository. Must be a superset of the code page of the repository you want to copy.
If you are not connected to the repository you want to copy, the Repository Manager asks you to log in.
3. Click OK.
5. If asked whether you want to delete existing repository data in the second repository, click OK to delete it. Click Cancel to preserve the existing repository.
Copying Sessions:
In the Server Manager, you can copy standalone sessions within a folder, or copy sessions in and out of
batches.
To copy a session, you must have one of the following:
Create Sessions and Batches privilege with read and write permission

Super User privilege


To copy a session:
1. In the Server Manager, select the session you wish to copy.
2. Click the Copy Session button or choose Operations-Copy Session.
The Server Manager makes a copy of the session. The Informatica Server names the copy after the original
session, appending a number, such as session_name1.
47. What are shortcuts, and what is advantage?
Ans: Shortcuts allow you to use metadata across folders without making copies, ensuring uniform metadata. A
shortcut inherits all
properties of the object to which it points. Once you create a shortcut, you can configure the shortcut
name and description.
When the object the shortcut references changes, the shortcut inherits those changes. By using a
shortcut instead of a copy,
you ensure each use of the shortcut exactly matches the original object. For example, if you have a
shortcut to a target
definition, and you add a column to the definition, the shortcut automatically inherits the additional
column.
Shortcuts allow you to reuse an object without creating multiple objects in the repository. For example, you
use a source
definition in ten mappings in ten different folders. Instead of creating 10 copies of the same source
definition, one in each
folder, you can create 10 shortcuts to the original source definition.
You can create shortcuts to objects in shared folders. If you try to create a shortcut to a nonshared folder,
the Designer
creates a copy of the object instead.
You can create shortcuts to the following repository objects:
Source definitions
Reusable transformations
Mapplets
Mappings
Target definitions
Business components
You can create two types of shortcuts:
Local shortcut. A shortcut created in the same repository as the original object.
Global shortcut. A shortcut created in a local repository that references an object in a global
repository.
Advantages: One of the primary advantages of using a shortcut is maintenance. If you need to change all
instances of an
object, you can edit the original repository object. All shortcuts accessing the object automatically inherit
the changes.
Shortcuts have the following advantages over copied repository objects:

You can maintain a common repository object in a single location. If you need to edit the object, all
shortcuts immediately inherit the changes you make.

You can restrict repository users to a set of predefined metadata by asking users to incorporate the
shortcuts into their work instead of developing repository objects independently.

You can develop complex mappings, mapplets, or reusable transformations, then reuse them
easily in other folders.

You can save space in your repository by keeping a single repository object and using shortcuts to
that object, instead of creating copies of the object in multiple folders or multiple repositories.



48. What are Presession and Postsession Options?
(Plzz refer Help Using Shell Commands n PostSession Commands and Email)
Ans: The Informatica Server can perform one or more shell commands before or after the session runs. Shell
commands are
operating system commands. You can use pre or post session shell commands, for example, to delete a
reject file or
session log, or to archive target files before the session begins.
The status of the shell command, whether it completed successfully or failed, appears in the session log
file.
To call a pre or postsession shell command you must:
1. Use any valid UNIX command or shell script for UNIX servers, or any valid DOS or batch file for
Windows NT servers.
2.

Configure the session to execute the pre or postsession shell commands.

You can configure a session to stop if the Informatica Server encounters an error while executing pre-session shell commands.
For example, you might use a shell command to copy a file from one directory to another. For a Windows NT server you would use the following shell command to copy the SALES_ADJ file from the target directory, L, to the source, H:
copy L:\sales\sales_adj H:\marketing\
For a UNIX server, you would use the following command line to perform a similar operation:
cp sales/sales_adj marketing/
Tip: Each shell command runs in the same environment (UNIX or Windows NT) as the Informatica Server.
Environment settings in one shell command script do not carry over to other scripts. To run all shell
commands in the same environment, call a single shell script that in turn invokes other scripts.
49. What are Folder Versions?
Ans: In the Repository Manager, you can create different versions within a folder to help you archive work in
development. You can copy versions to other folders as well. When you save a version, you save all metadata
at a particular point in development. Later versions contain new or modified metadata, reflecting work that
you have completed since the last version.
Maintaining different versions lets you revert to earlier work when needed. By archiving the contents of a
folder into a version each time you reach a development landmark, you can access those versions if later edits
prove unsuccessful.
You create a folder version after completing a version of a difficult mapping, then continue working on the
mapping. If you are unhappy with the results of subsequent work, you can revert to the previous version, then
create a new version to continue development. Thus you keep the landmark version intact, but available for
regression.
Note: You can only work within one version of a folder at a time.
50. How do you automate/schedule sessions/batches, and did you use any tool for automating sessions/batches?
Ans: We scheduled our sessions/batches using the Server Manager.
You can either schedule a session to run at a given time or interval, or you can manually start the session.
You need to have the Create Sessions and Batches privilege with Read and Execute permissions, or the Super User privilege.
If you configure a batch to run only on demand, you cannot schedule it.
Note: We did not use any tool for the automation process.
51. What are the differences between the 4.7 and 5.1 versions?
Ans: New transformations were added, like the XML Transformation and MQ Series Transformation, and PowerMart and PowerCenter are the same from version 5.1.
52. What is the procedure that you need to follow before moving mappings/sessions from Testing/Development to Production?
Ans:
53. How many values does it (the Informatica Server) return when it passes through a Connected Lookup and an Unconnected Lookup?
Ans: A Connected Lookup can return multiple values, whereas an Unconnected Lookup will return only one value, that is, the Return value.
54. What is the difference between PowerMart and PowerCenter in 4.7.2?
Ans: If You Are Using PowerCenter
PowerCenter allows you to register and run multiple Informatica Servers against the same repository.
Because you can run
these servers at the same time, you can distribute the repository session load across available servers to improve overall performance.
With PowerCenter, you receive all product functionality, including distributed metadata, the ability to organize
repositories into
a data mart domain and share metadata across repositories.
A PowerCenter license lets you create a single repository that you can configure as a global repository, the
core component
of a data warehouse.
If You Are Using PowerMart
This version of PowerMart includes all features except distributed metadata and multiple registered
servers. Also, the various
options available with PowerCenter (such as PowerCenter Integration Server for BW, PowerConnect for
IBM DB2,
PowerConnect for SAP R/3, and PowerConnect for PeopleSoft) are not available with PowerMart.

55. What kind of modifications can you perform with each transformation?


Ans: Using transformations, you can modify data in the following ways (Task: Transformation):
Calculate a value: Expression
Perform aggregate calculations: Aggregator
Modify text: Expression
Filter records: Filter, Source Qualifier
Order records queried by the Informatica Server: Source Qualifier
Call a stored procedure: Stored Procedure
Call a procedure in a shared library or in the COM layer of Windows NT: External Procedure
Generate primary keys: Sequence Generator
Limit records to a top or bottom range: Rank
Normalize records, including those read from COBOL sources: Normalizer
Look up values: Lookup
Determine whether to insert, delete, update, or reject records: Update Strategy
Join records from different databases or flat file systems: Joiner
56. Expressions in Transformations: explain briefly how you use them.
Ans: Expressions in Transformations
To transform data passing through a transformation, you can write an expression. The most obvious
examples of these are the
Expression and Aggregator transformations, which perform calculations on either single values or an
entire range of values
within a port. Transformations that use expressions include the following:
Transformation: How It Uses Expressions
Expression: Calculates the result of an expression for each row passing through the transformation, using values from one or more ports.
Aggregator: Calculates the result of an aggregate expression, such as a sum or average, based on all data passing through a port or on groups within that data.
Filter: Filters records based on a condition you enter using an expression.
Rank: Filters the top or bottom range of records, based on a condition you enter using an expression.
Update Strategy: Assigns a numeric code to each record based on an expression, indicating whether the Informatica Server should use the information in the record to insert, delete, or update the target.

In each transformation, you use the Expression Editor to enter the expression. The Expression Editor
supports the transformation language for building expressions. The transformation language uses SQLlike
functions, operators, and other components to build the expression. For example, as in SQL, the
transformation language includes the functions COUNT and SUM. However, the PowerMart/PowerCenter
transformation language includes additional functions not found in SQL.
When you enter the expression, you can use values available through ports. For example, if the
transformation has two input ports representing a price and sales tax rate, you can calculate the final sales
tax using these two values. The ports used in the expression can appear in the same transformation, or you
can use output ports in other transformations.
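For example (the port names are assumptions, not from the original), the sales tax scenario above could be implemented in an output port of an Expression transformation as:

TOTAL_PRICE = PRICE * (1 + SALES_TAX_RATE)

where PRICE and SALES_TAX_RATE are input ports and TOTAL_PRICE is the output port holding the expression.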
57. In case a flat file (which comes through FTP as source) has not arrived, then what happens? Where do you set this option?
Ans: You get a fatal error which causes the server to fail/stop the session.



You can set the Event-Based Scheduling option in the Session Properties under the General tab > Advanced options:
Indicator File to Wait For (Optional): Required to use event-based scheduling. Enter the indicator file (or directory and file) whose arrival schedules the session. If you do not enter a directory, the Informatica Server assumes the file appears in the server variable directory $PMRootDir.
58. What is the Test Load Option and when do you use it in the Server Manager?
Ans: When testing sessions in development, you may not need to process the entire source. If this is true, use the Test Load Option (Session Properties > General tab > Target Options: choose Target Load option as Normal (option button), with Test Load checked (check box) and No. of rows to test, e.g. 2000 (text box)). You can also click the Start button.

59. SCD Type 2 and SGT difference?


60. Differences between 4.7 and 5.1?
61. Tuning Informatica Server for improving performance? Performance Issues?
Ans: See /* C:\pkar\Informatica\Performance Issues.doc */
62. What is Override Option? Which is better?
63. What will happen if you increase the buffer size?
64. What will happen if you increase the commit interval? And if you decrease the commit interval?
65. What kind of complex mappings did you design? And what sort of problems did you face?
66. If you have 10 mappings designed and you need to implement some changes (maybe in an existing mapping, or a new mapping needs to be designed), how much time does it take, from easier to complex?
67. Can you refresh the Repository in 4.7 and 5.1? And can you refresh pieces (partially) of the repository in 4.7 and 5.1?
68. What is BI?
Ans: http://www.visionnet.com/bi/index.shtml [http://www.visionnet.com/bi/index.shtml]
69. Benefits of BI?
Ans: http://www.visionnet.com/bi/bibenefits.shtml [http://www.visionnet.com/bi/bibenefits.shtml]
70. BI Faq
Ans: http://www.visionnet.com/bi/bifaq.shtml [http://www.visionnet.com/bi/bifaq.shtml]
71. What is difference between data scrubbing and data cleansing?
Ans: Scrubbing data is the process of cleaning up the junk in legacy data and making it accurate and useful for
the next generations
of automated systems. This is perhaps the most difficult of all conversion activities. Very often, this is
made more difficult when
the customer wants to make good data out of bad data. This is the dog work. It is also the most
important and can not be done
without the active participation of the user.
DATA CLEANING: a two-step process consisting of DETECTION and then CORRECTION of errors in a data set.
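As a minimal SQL sketch of that detect-then-correct idea (the staging table STG_CUSTOMER, its STATE column and the reference table REF_STATE are hypothetical names used only for illustration):

    -- DETECTION: find rows whose state code is not in the reference list
    SELECT cust_id, state
    FROM   stg_customer
    WHERE  TRIM(UPPER(state)) NOT IN (SELECT state_code FROM ref_state);

    -- CORRECTION: standardise the formatting before loading any further
    UPDATE stg_customer
    SET    state = TRIM(UPPER(state));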
72. What is Metadata and Repository?
Ans:
Metadata. Data about data .
It contains descriptive data for end users.
Contains data that controls the ETL processing.
Contains data about the current state of the data warehouse.
ETL updates metadata, to provide the most current state.
Repository. The place where you store the metadata is called a repository. The more sophisticated your
repository, the more
complex and detailed metadata you can store in it. PowerMart and PowerCenter use a relational
database as the



repository.

73. SQL * LOADER?


Ans: http://downloadwest.oracle.com/otndoc/oracle9i/901_doc/server.901/a90192/ch03.htm#1004678 [http://download
west.oracle.com/otndoc/oracle9i/901_doc/server.901/a90192/ch03.htm#1004678]

74. Debugger in Mapping?


75. Parameter passing in version 5.1: what is your exposure?
76. What is the filename which you need to configure in Unix while installing Informatica?
77. How do you select duplicate rows using Informatica, i.e., how do you use Max(Rowid)/Min(Rowid) in Informatica?
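One common approach (a sketch only, assuming an Oracle source and a hypothetical EMP table keyed on EMP_ID) is to put the ROWID logic in the Source Qualifier SQL override:

    -- rows that are extra copies of a key (everything except the MAX(ROWID) row per key)
    SELECT *
    FROM   emp e
    WHERE  e.ROWID <> (SELECT MAX(d.ROWID)
                       FROM   emp d
                       WHERE  d.emp_id = e.emp_id);

Replacing <> with = keeps exactly one copy per key instead, which is the usual de-duplication variant.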
**********************************Shankar
Prasad*************************************************

Posted 2nd June 2012 by Shankar Prasad



2nd June 2012


Datawarehouse BASIC DEFINITIONS Informatica


Datawarehouse BASIC DEFINITIONS (by Shankar Prasad)

DWH : is a repository of integrated information, specifically structured for queries and analysis. Data and information
are extracted from heterogeneous sources as they are generated. This makes it much easier and more efficient to
run queries over data that originally came from different sources.
Data Mart : is a collection of subject areas organized for decision support based on the needs of a given department, e.g. sales, marketing, etc. The data mart is designed to suit the needs of a department. A data mart is much less granular than the warehouse data.
Data Warehouse : is used on an enterprise level, while data marts is used on a business division / department level.
Data warehouses are arranged around the corporate subject areas found in the corporate data model. Data warehouses
contain more detail information while most data marts contain more summarized or aggregated data.

OLTP : Online Transaction Processing. This is standard, normalized database structure. OLTP is
designed for Transactions, which means that inserts, updates and deletes must be fast.
OLAP : Online Analytical Processing. Readonly, historical, aggregated data.
Fact Table : contain the quantitative measures about the business
Dimension Table : descriptive data about the facts (business)
Conformed dimensions : dimension table shared by fact tables.. these tables connect separate star
schemas into an enterprise star schema.
Star Schema : is a set of tables comprised of a single, central fact table surrounded by denormalized dimensions. A star schema implements dimensional data structures with denormalized dimensions.
Snowflake Schema : is a set of tables comprised of a single, central fact table surrounded by normalized dimension hierarchies. A snowflake schema implements dimensional data structures with fully normalized dimensions.
Staging Area : it is the work place where raw data is brought in, cleaned, combined, archived and
exported to one or more data marts. The purpose of data staging area is to get data ready for loading into
a presentation layer.
Queries : The DWH contains 2 types of queries. There will be fixed queries that are clearly defined and
well understood, such as regular reports, canned queries and common aggregations.
There will also be ad hoc queries that are unpredictable, both in quantity and frequency.
Ad Hoc Query : the starting point for any analysis into a database. It is the ability to run any query when desired and expect a reasonable response that makes the data warehouse worthwhile and makes the design such a significant challenge.
The enduser access tools are capable of automatically generating the database query that answers


any question posted by the user.
Canned Queries : are predefined queries. Canned queries contain prompts that allow you to customize
the query for your specific needs
Kimball (Bottom up) vs Inmon (Top down) approaches :
According to Ralph Kimball, when you plan to design analytical solutions for an enterprise, try building data marts. When you have 3 or 4 such data marts, you would have an enterprise-wide data warehouse built up automatically, without time and effort exclusively spent on building the EDWH, because the time required for building a data mart is less than that for an EDWH.
According to Inmon, try to build an enterprise-wide data warehouse first, and all the data marts will be subsets of the EDWH. In his view, independent data marts cannot make up an enterprise data warehouse under any circumstance; they will remain isolated pieces of information stovepipes.
************************************************************************************************************************

Dimensional Data Model :


Dimensional data model is most often used in data warehousing systems. This is different from the 3rd normal form, commonly
used for transactional (OLTP) type systems. As you can imagine, the same data would then be stored differently in a
dimensional model than in a 3rd normal form model.
To understand dimensional data modeling, let's define some of the terms commonly used in this type of modeling:
Dimension: A category of information. For example, the time dimension.
Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.
Hierarchy: The specification of levels that represents relationship between different attributes within a dimension. For
example, one possible hierarchy in the Time dimension is Year > Quarter > Month > Day.
Fact Table [http://www.1keydata.com/datawarehousing/fact tablegranularity.html] : A fact table is a table that contains the measures
of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate
granularity. For example, it can be sales amount by store by day. In this case, the fact table would contain three columns: A
date column, a store column, and a sales amount column.
Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the
Quarter attribute would include a list of all of the quarters available in the data warehouse. Each row (each quarter) may have
several fields, one for the unique ID that identifies the quarter, and one or more additional fields that specifies how that
particular quarter is represented on a report (for example, first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").
A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup tables, but fact tables
do not have direct relationships to one another. Dimensions and hierarchies are represented by lookup tables. Attributes are
the nonkey columns in the lookup tables.
In designing data models for data warehouses / data marts, the most commonly used schema types are Star Schema and
Snowflake Schema.
Star Schema: In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other
surrounding objects (dimension lookup tables) like a star. A star schema can be simple or complex. A simple star consists of
one fact table; a complex star can have more than one fact table.
Snowflake Schema: The snowflake schema is an extension of the star schema, where each point of the star explodes into
more points. The main advantage of the snowflake schema is the improvement in query performance due to minimized disk
storage requirements and joining smaller lookup tables. The main disadvantage of the snowflake schema is the additional
maintenance efforts needed due to the increase number of lookup tables.
Whether one uses a star or a snowflake largely depends on personal preference and business needs. Personally, I am partial to
snowflakes, when there is a business case to analyze the information at that particular level.
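To make the star layout concrete, here is a sketch of a typical query against it (the fact table SALES_FACT and the dimension lookup tables DATE_DIM and STORE_DIM are illustrative names, not from a real schema):

    SELECT d.year, s.store_name, SUM(f.sales_amount) AS total_sales
    FROM   sales_fact f
    JOIN   date_dim  d ON f.date_key  = d.date_key
    JOIN   store_dim s ON f.store_key = s.store_key
    GROUP BY d.year, s.store_name;

In a snowflake, STORE_DIM would itself join out to further normalized tables (region, city, and so on).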
Slowly Changing Dimensions:
The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a nutshell, this applies to cases
where the attribute for a record varies over time. We give an example below:
Christina is a customer with ABC Inc. She first lived in Chicago, Illinois. So, the original entry in the customer lookup table has
the following record:
Customer Key | Name      | State
1001         | Christina | Illinois

At a later date, she moved to Los Angeles, California on January, 2003. How should ABC Inc. now modify its customer table to
reflect this change? This is the "Slowly Changing Dimension" problem.
There are in general three ways to solve this type of problem, and they are categorized as follows:
Type 1 [http://www.1keydata.com/datawarehousing/scd type1.html] : The new record replaces the original record. No trace of the
old record exists.
Type 2 [http://www.1keydata.com/datawarehousing/scd type2.html] : A new record is added into the customer dimension table.
Therefore, the customer is treated essentially as two people.
Type 3 [http://www.1keydata.com/datawarehousing/scd type3.html] : The original record is modified to reflect the change.
We next take a look at each of the scenarios and how the data model and the data looks like for each of them. Finally, we
compare and contrast among the three alternatives.
Type 1 Slowly Changing Dimension:
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no
history is kept.
In our example, recall we originally have the following table:
Customer Key | Name      | State
1001         | Christina | Illinois

After Christina moved from Illinois to California, the new information replaces the original record, and we have the following


table:
Customer Key | Name      | State
1001         | Christina | California

Advantages:
This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old
information.
Disadvantages:
All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the
company would not be able to know that Christina lived in Illinois before.
Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical
changes.
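In SQL terms, a Type 1 change is a plain overwrite of the dimension row (the table name CUSTOMER_DIM is illustrative; the key and values come from the example above):

    -- Type 1: overwrite in place, no history kept
    UPDATE customer_dim
    SET    state = 'California'
    WHERE  customer_key = 1001;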
Type 2 Slowly Changing Dimension:
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both
the original and the new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:
Customer Key | Name      | State
1001         | Christina | Illinois

After Christina moved from Illinois to California, we add the new information as a new row into the table:
Customer Key | Name      | State
1001         | Christina | Illinois
1005         | Christina | California

Advantages:
This allows us to accurately keep all historical information.
Disadvantages:
This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with,
storage and performance can become a concern.
This necessarily complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.
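In SQL terms, a Type 2 change leaves the old row untouched and adds a new row under a new surrogate key (again using the illustrative CUSTOMER_DIM table and the keys from the example):

    -- Type 2: the existing row (1001, Christina, Illinois) stays; a new row is inserted
    INSERT INTO customer_dim (customer_key, name, state)
    VALUES (1005, 'Christina', 'California');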
Type 3 Slowly Changing Dimension :
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating
the original value, and one indicating the current value. There will also be a column that indicates when the current value
becomes active.
In our example, recall we originally have the following table:
Customer Key | Name      | State
1001         | Christina | Illinois

To accommodate a Type 3 Slowly Changing Dimension, we will now have the following columns: Customer Key, Name, Original State, Current State, Effective Date.
After Christina moved from Illinois to California, the original information gets updated, and we have the following table
(assuming the effective date of change is January 15, 2003):
Customer Key | Name      | Original State | Current State | Effective Date
1001         | Christina | Illinois       | California    | 15-JAN-2003

Advantages:
This does not increase the size of the table, since new information is updated.
This allows us to keep some part of history.
Disadvantages:
Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later
moves to Texas on December 15, 2003, the California information will be lost.
Usage:
Type 3 is rarely used in actual practice.
When to use Type 3:
Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.
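In SQL terms, a Type 3 change updates the same row, shifting the new value into the "current" column (column names follow the illustrative table above):

    -- Type 3: keep only the original and the latest value on one row
    UPDATE customer_dim
    SET    current_state  = 'California',
           effective_date = DATE '2003-01-15'
    WHERE  customer_key = 1001;
    -- original_state still holds 'Illinois'; a later move would overwrite
    -- current_state and lose 'California'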
Surrogate key :
A surrogate key is frequently a sequential number but doesn't have to be. Having the key independent of all other columns
insulates the database relationships from changes in data values or database design and guarantees uniqueness.
Some database designers use surrogate keys religiously regardless of the suitability of other candidate keys. However, if a
good key already exists, the addition of a surrogate key will merely slow down access, particularly if it is indexed.
The concept of a surrogate key is important in a data warehouse. Surrogate means deputy or substitute; a surrogate key is a small integer (say 4 bytes) that can uniquely identify a record in the dimension table, but it has no business meaning. Data warehouse experts suggest that the production keys used in the source databases should not be used as primary keys in the dimension tables; in their place, surrogate keys, which are generated automatically, should be used.
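A minimal sketch of generating such a key with a database sequence (Oracle syntax assumed; the sequence and table names are illustrative; inside Informatica, a Sequence Generator transformation plays the same role):

    CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;

    INSERT INTO customer_dim (customer_key, name, state)
    VALUES (customer_dim_seq.NEXTVAL, 'Christina', 'Illinois');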

Conceptual, Logical, And Physical Data Models:


There are three levels of data modeling. They are conceptual, logical, and physical. This section will explain the difference
among the three, the order with which each one is created, and how to go from one level to the other.
Conceptual Data Model
Features of conceptual data model include:
Includes the important entities and the relationships among them.
No attribute is specified.
No primary key is specified.
At this level, the data modeler attempts to identify the highestlevel relationships among the different entities.
Logical Data Model
Features of logical data model include:
Includes all entities and relationships among them.
All attributes for each entity are specified.
The primary key for each entity is specified.
Foreign keys (keys identifying the relationship between different entities) are specified.
Normalization occurs at this level.
At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how they will be
physically implemented in the database.
In data warehousing, it is common for the conceptual data model and the logical data model to be combined into a single step
(deliverable).
The steps for designing the logical data model are as follows:

1. Identify all entities.


2. Specify primary keys for all entities.
3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve manytomany relationships.
6. Normalization.
Physical Data Model
Features of physical data model include:
Specification of all tables and columns.
Foreign keys are used to identify relationships between tables.
Denormalization may occur based on user requirements.
Physical considerations may cause the physical data model to be quite different from the logical data model.
At this level, the data modeler will specify how the logical data model will be realized in the database schema.
The steps for physical data model design are as follows:

1. Convert entities into tables.


2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical constraints / requirements.
What Is OLAP :
OLAP stands for OnLine Analytical Processing. The first attempt to provide a definition to OLAP was by Dr. Codd, who
proposed 12 rules for OLAP. Later, it was discovered that this particular white paper was sponsored by one of the OLAP tool
vendors, thus causing it to lose objectivity. The OLAP Report has proposed the FASMI test, Fast Analysis of Shared
Multidimensional Information. For a more detailed description of both Dr. Codd's rules and the FASMI test, please visit The
OLAP Report [http://www.olapreport.com/fasmi.htm] .
For people on the business side, the key feature out of the above list is "Multidimensional." In other words, the ability to
analyze metrics in different dimensions such as time, geography, gender, product, etc. For example, sales for the company is
up. What region is most responsible for this increase? Which store in this region is most responsible for the increase? What
particular product category or categories contributed the most to the increase? Answering these types of questions in order
means that you are performing an OLAP analysis.
Depending on the underlying technology used, OLAP can be broadly divided into two different camps: MOLAP and ROLAP. A
discussion of the different OLAP types can be found in the MOLAP, ROLAP, and HOLAP
[http://www.1keydata.com/datawarehousing/molap rolap.html] section.
In the OLAP world, there are mainly two different types: Multidimensional OLAP (MOLAP) and Relational OLAP (ROLAP).
Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and ROLAP.
MOLAP
This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in
the relational database, but in proprietary formats.
Advantages:
Excellent performance: MOLAP cubes are built for fast data retrieval, and is optimal for slicing and dicing operations.
Can perform complex calculations: All calculations have been pregenerated when the cube is created. Hence, complex
calculations are not only doable, but they return quickly.
Disadvantages:
Limited in the amount of data it can handle: Because all calculations are performed when the cube is built, it is not
possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived
from a large amount of data. Indeed, this is possible. But in this case, only summarylevel information will be included in
the cube itself.



Requires additional investment: Cube technologies are often proprietary and do not already exist in the organization. Therefore, to adopt MOLAP technology, chances are that additional investments in human and capital resources are needed.
ROLAP
This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional
OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in
the SQL statement.
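For example, "slicing" to one region and one year is, in ROLAP terms, just extra WHERE conditions on the star-schema query sketched earlier (table and column names remain illustrative):

    SELECT s.store_name, SUM(f.sales_amount) AS total_sales
    FROM   sales_fact f
    JOIN   store_dim s ON f.store_key = s.store_key
    JOIN   date_dim  d ON f.date_key  = d.date_key
    WHERE  s.region = 'West'
    AND    d.year   = 2001
    GROUP BY s.store_name;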
Advantages:
Can handle large amounts of data: The data size limitation of ROLAP technology is the limitation on data size of the
underlying relational database. In other words, ROLAP itself places no limitation on data amount.
Can leverage functionalities inherent in the relational database: Often, relational database already comes with a host of
functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these
functionalities.
Disadvantages:
Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the
relational database, the query time can be long if the underlying data size is large.
Limited by SQL functionalities: Because ROLAP technology mainly relies on generating SQL statements to query the
relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations
using SQL), ROLAP technologies are therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this
risk by building into the tool outofthebox complex functions as well as the ability to allow users to define their own
functions.
HOLAP
HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summarytype information, HOLAP
leverages cube technology for faster performance. When detail information is needed, HOLAP can "drill through" from the
cube into the underlying relational data.
Bill Inmon vs. Ralph Kimball:
In the data warehousing field, we often hear about discussions on where a person / organization's philosophy falls into Bill
Inmon's camp or into Ralph Kimball's camp. We describe below the difference between the two.
Bill Inmon's paradigm: Data warehouse is one part of the overall business intelligence system. An enterprise has one data
warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in
3rd normal form.
Ralph Kimball's paradigm: Data warehouse is the conglomerate of all data marts within the enterprise. Information is always
stored in the dimensional model.
There is no right or wrong between these two ideas, as they represent different data warehousing philosophies. In reality, the data warehouses in most enterprises are closer to Ralph Kimball's idea. This is because most data warehouses started out as a
departmental effort, and hence they originated as a data mart. Only when more data marts are built later do they evolve into
a data warehouse.
********************************************************************************
******************** Shankar Prasad ****************************************
********************************************************************************

Posted 2nd June 2012 by Shankar Prasad



2nd June 2012


Informatica Interview Question Answer

Informatica Interview Question


Answer: by Shankar Prasad

Q.What are Target Types on the Server?


A. Target Types are File, Relational and ERP.


Q.How do you identify existing rows of data in the target table using lookup transformation?
A.There are two ways to lookup the target table to verify a row exists or not :
1.Use connect dynamic cache lookup and then check the values of NewLookuprow Output
port to decide whether the incoming record already exists in the table / cache or not.
2.Use Unconnected lookup and call it from an expression transformation and check the
Lookup condition port value (Null/ Not Null) to decide whether the incoming record
already exists in the table or not.
Q.What are Aggregate transformations?
A. The Aggregator transform is much like the GROUP BY clause in traditional SQL.
This particular transform is a connected/active transform which can take the incoming data from the mapping pipeline and group it based on the group-by ports specified, and can calculate aggregate functions like (avg, sum, count, stddev, ...etc) for each of those groups. From a performance perspective, if your mapping has an AGGREGATOR transform, use filters and sorters very early in the pipeline if there is any need for them.
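As a rough SQL equivalent of an Aggregator with DEPT_ID as the group-by port and a few aggregate expressions (the EMP table and its columns are illustrative):

    SELECT dept_id, AVG(salary) AS avg_sal, SUM(salary) AS total_sal, COUNT(*) AS emp_count
    FROM   emp
    GROUP BY dept_id;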
Q.What are various types of Aggregation?
A.Various types of aggregation are SUM, AVG, COUNT, MAX, MIN, FIRST, LAST, MEDIAN, PERCENTILE, STDDEV, and VARIANCE.
Q.What are Dimensions and various types of Dimension?

A. Dimensions are classified into 3 types.


1.SCD TYPE 1(Slowly Changing Dimension): this contains current data.
2.SCD TYPE 2(Slowly Changing Dimension): this contains current data + complete historical data.
3.SCD TYPE 3(Slowly Changing Dimension): this contains current data.+partially historical data
Q. What are the 2 modes of data movement in the Informatica Server?


A. The data movement mode depends on whether Informatica Server should process single byte or multibyte
character data. This mode selection can affect the enforcement of code page relationships and code page
validation in the Informatica Client and Server.

a)Unicode IS allows 2 bytes for each character and uses additional byte for each nonascii
character (such as Japanese characters)
b)ASCII IS holds all data in a single byte
The IS data movement mode can be changed in the Informatica Server configuration parameters.
This comes into effect once you restart the Informatica Server.

Q.What is Code Page Compatibility?


A. Compatibility between code pages is used for accurate data movement when the Informatica Sever
runs in the Unicode data movement mode. If the code pages are identical, then there will not be any data
loss. One code page can be a subset or superset of another. For accurate data movement, the target code
page must be a superset of the source code page.

Superset: A code page is a superset of another code page when it contains all the characters encoded in the other code page and also contains additional characters not contained in the other code page.
Subset: A code page is a subset of another code page when all characters in the code page are encoded in the other code page.
Q. What is a Code Page used for?
A. A code page is used to identify characters that might be in different languages. If you are importing Japanese data into a mapping, you must select the Japanese code page for the source data.
Q.What is Router transformation?
A. It is different from filter transformation in that we can specify multiple conditions and route the
data to multiple targets depending on the condition.

Q.What is Load Manager?

A. While running a Workflow, the PowerCenter Server uses the Load Manager process and the Data Transformation Manager (DTM) process to run the workflow and



carry out workflow tasks. When the PowerCenter Server runs a workflow, the Load Manager
performs the following tasks:
1.Locks the workflow and reads workflow properties.
2.Reads the parameter file and expands workflow variables.
3.Creates the workflow log file.
4.Runs workflow tasks.
5.Distributes sessions to worker servers.
6.Starts the DTM to run sessions.
7.Runs sessions from master servers.
8.Sends postsession email if the DTM terminates abnormally.
When the PowerCenter Server runs a session, the DTM performs the following tasks:
1.Fetches session and mapping metadata from the repository.
2.Creates and expands session variables.
3.Creates the session log file.
4.Validates session code pages if data code page validation is
enabled. Checks query conversions if data code page validation is
disabled.
5.Verifies connection object permissions.
6.Runs presession shell commands.
7.Runs presession stored procedures and SQL.
8.Creates and runs mappings, reader, writer, and transformation threads to extract,
transform, and load data.
9.Runs postsession stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.

Q.What is Data Transformation Manager?

A. After the Load Manager performs validations for the session, it creates the DTM process. The DTM process is the second
process associated with the session run. The primary purpose of the DTM process is to create
and manage threads that carry out the session tasks.
The DTM allocates process memory for the session and divide it into buffers. This is also
known as buffer memory. It creates the main thread, which is called the master thread.
The master thread creates and manages all other threads.
If we partition a session, the DTM creates a set of threads for each partition to allow
concurrent processing.. When Informatica server writes messages to the session log it
includes thread type and thread ID.
Following are the types of threads that DTM creates:
Master Thread: Main thread of the DTM process. Creates and manages all other threads.
Mapping Thread: One thread for each session. Fetches session and mapping information.
Pre- and Post-Session Thread: One thread each to perform pre- and post-session operations.
Reader Thread: One thread for each partition for each source pipeline.
Writer Thread: One thread for each partition if a target exists in the source pipeline, to write to the target.
Transformation Thread: One or more transformation threads for each partition.

Q.What is Session and Batches?


A. Session: a session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. After creating the session, we can use either the Server Manager or the command line program pmcmd to start or stop the session.
Batches: a batch provides a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
1. Sequential: runs sessions one after the other.
2. Concurrent: runs sessions at the same time.

Q.What is a source qualifier?

A. It represents all data queried from the source.


Q. Why do we use Lookup transformations?
A. Lookup Transformations can access data from relational tables that are not sources in the mapping. With a Lookup transformation, we can accomplish the
following tasks:

Get a related value: get the Employee Name from the Employee table based on the Employee ID.
Perform a calculation.
Update slowly changing dimension tables: we can use an unconnected Lookup transformation to determine whether the records already exist in the target or not.
Q. While importing the relational source definition from a database, what metadata of the source do you import?
Source name
Database location
Column names
Data types
Key constraints
Q.How many ways you can update a relational source definition and what are they?
A. Two ways
1.Edit the definition
2.Reimport the definition
Q.Where should you place the flat file to import the flat file definition to the designer?
A. Place it in a local folder.

Q. Which transformation do you need while using COBOL sources as source definitions?
A. The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.

Q. How can you create or import a flat file definition into the Warehouse Designer?
A. You can create a flat file definition in the Warehouse Designer. In the Warehouse Designer, you can create a new target: select the type as flat file. Save it and you can enter the various columns for that created target by editing its properties. Once the target is created, save it. You can import it from the Mapping Designer.
Q.What is a mapplet?

A. A mapplet should have a mapplet Input transformation which receives input values, and an Output transformation which passes the final modified data back to the mapping. It is a set of transformations where the logic can be reusable. When the mapplet is displayed within the mapping, only the input & output ports are displayed, so the internal logic is hidden from the end user's point of view.
Q.What is a transformation?
A.It is a repository object that generates, modifies or passes data.
Q.What are the designer tools for creating transformations?
A.Mapping designer
Transformation developer
Mapplet designer
Q.What are connected and unconnected transformations?
A. Connected Transformation: a transformation which participates in the mapping data flow. A connected transformation can receive multiple inputs and provide multiple outputs.
Unconnected: an unconnected transformation does not participate in the mapping data flow. It can receive multiple inputs and provides a single output.

Q.In how many ways can you create ports?


A.Two ways
1.Drag the port from another transformation
2.Click the add button on the ports tab.



Q.What are reusable transformations?
A. A transformation that can be reused is called a reusable transformation. They can be created using two methods:
1. Using the Transformation Developer
2. Create a normal one and promote it to reusable
Q.What are mapping parameters and mapping variables?
A. A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet. Then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of a session run and uses that value the next time you run the session.
Q. Can you use the mapping parameters or variables created in one mapping in another mapping?
A. NO.
We can use mapping parameters or variables in any transformation of the same mapping or mapplet in which you have created the mapping parameters or variables.
Q. How can you improve session performance in an Aggregator transformation?
A. 1. Use sorted input. Use a Sorter before the Aggregator.
2. Do not forget to check the option on the Aggregator that tells the Aggregator that the input is sorted on the same keys as the group by. The key order is also very important.
Q. Is there an aggregate cache in the Aggregator transformation?
A. The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.
Q. What are the differences between the Joiner transformation and the Source Qualifier transformation?
A. You can join heterogeneous data sources in a Joiner transformation, which you cannot achieve in a Source Qualifier transformation.
You need matching keys to join two relational sources in a Source Qualifier transformation, whereas you do not need matching keys to join two sources in a Joiner.
Two relational sources should come from the same data source in a Source Qualifier; with a Joiner you can also join relational sources which come from different sources.
Q.In which conditions can we not use joiner transformations?
A. You cannot use a Joiner transformation in the following situations (according to Informatica 7.1):
Either input pipeline contains an Update Strategy transformation.
You connect a Sequence Generator transformation directly before the Joiner transformation.
Q.What r the settings that u use to configure the joiner transformation?
A. Master and detail source
Type of join
Condition of the join
Q. What are the join types in the Joiner transformation?
A. Normal (default): only matching rows from both master and detail.
Master outer: all detail rows and only matching rows from master.
Detail outer: all master rows and only matching rows from detail.
Full outer: all rows from both master and detail (matching or non-matching).
Q.What are the joiner caches?
A.When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the
master source and builds index and data caches based on the master rows.
After building the caches, the Joiner transformation reads records from the detail source and performs joins.
Q.Why use the lookup transformation?
A. To perform the following tasks.
Get a related value. For example, if your source table includes employee ID, but you want to include the
employee name in your target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per
invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records
already exist in the target.
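As a sketch, the "get a related value" case corresponds to a query like the following being answered from the lookup cache for every row (the table and column names are illustrative):

    SELECT employee_name
    FROM   employee
    WHERE  employee_id = :input_employee_id;  -- :input_employee_id comes from the pipeline row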


Differences Between Connected and Unconnected Lookups


Connected Lookup:
Receives input values directly from the pipeline.
You can use a dynamic or static cache.
The cache includes all lookup columns used in the mapping (that is, lookup source columns included in the lookup condition and lookup source columns linked as output ports to other transformations).
Can return multiple columns from the same row or insert into the dynamic lookup cache.
If there is no match for the lookup condition, the PowerCenter Server returns the default value for all output ports. If you configure dynamic caching, the PowerCenter Server inserts rows into the cache or leaves it unchanged.
If there is a match for the lookup condition, the PowerCenter Server returns the result of the lookup condition for all lookup/output ports. If you configure dynamic caching, the PowerCenter Server either updates the row in the cache or leaves the row unchanged.
Passes multiple output values to another transformation: link lookup/output ports to another transformation.
Supports user-defined default values.

Unconnected Lookup:
Receives input values from the result of a :LKP expression in another transformation.
You can use a static cache only.
The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
Designate one return port (R); returns one column from each row.
If there is no match for the lookup condition, the PowerCenter Server returns NULL.
If there is a match for the lookup condition, the PowerCenter Server returns the result of the lookup condition into the return port.
Passes one output value to another transformation: the lookup/output/return port passes the value to the transformation calling the :LKP expression.
Does not support user-defined default values.

Q. What is meant by lookup caches?


A. The Informatica server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica server stores condition values in the index cache and output values in the data cache.
Q. What are the types of lookup caches?
A. Persistent cache: you can save the lookup cache files and reuse them the next time the Informatica server processes a lookup transformation configured to use the cache.
Recache from database: if the persistent cache is not synchronized with the lookup table, you can configure the lookup transformation to rebuild the lookup cache.
Static cache: you can configure a static or read-only cache for any lookup table. By default the Informatica server creates a static cache. It caches the lookup table and lookup values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica server does not update the cache while it processes the lookup transformation.
Dynamic cache: if you want to cache the target table and insert new rows into the cache and the target, you can create a Lookup transformation to use a dynamic cache. The Informatica server dynamically inserts data into the target table.
Shared cache: you can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.

Q: What do you know about Informatica and ETL?


A: Informatica is a very useful GUI based ETL tool.
Q: FULL and DELTA files. Historical and Ongoing load.
A: FULL file contains complete data as of today including history data, DELTA file contains only the
changes since last extract.
Q: Power Center/ Power Mart which products have you worked with?
A: Power Center will have Global and Local repository, whereas Power Mart will have only Local
repository.
Q: Explain what are the tools you have used in Power Center and/or Power Mart?
A: Designer, Server Manager, and Repository Manager.

Q: What is a Mapping?
A: Mapping Represent the data flow between source and target
Q: What are the components must contain in Mapping?
A: Source definition, Transformation, Target Definition and Connectors
Q: What is Transformation?
A: Transformation is a repository object that generates, modifies, or passes data. Transformation
performs specific function. They are two types of transformations:
1. Active
Can change the number of rows that pass through it. Eg: Aggregator, Filter, Joiner, Normalizer, Rank, Router, Source Qualifier, Update Strategy, ERP Source Qualifier, Advanced External Procedure.
2. Passive

Does not change the number of rows that pass through it. Eg: Expression, External
Procedure, Input, Lookup, Stored Procedure, Output, Sequence Generator, XML Source
Qualifier.
Q: Which transformation can be overridden at the Server?
A: Source Qualifier and Lookup Transformations
Q: What is connected and unconnected Transformation and give Examples?
Q: What are Options/Type to run a Stored Procedure?
A:
Normal: During a session, the stored procedure runs where the transformation exists in the
mapping on a rowbyrow basis. This is useful for calling the stored procedure for each row of data
that passes through the mapping, such as running a calculation against an input port. Connected
stored procedures run only in normal mode.
Preload of the Source. Before the session retrieves data from the source, the stored procedure
runs. This is useful for verifying the existence of tables or performing joins of data in a
temporary table.
Postload of the Source. After the session retrieves data from the source, the stored procedure
runs. This is useful for removing temporary tables.
Preload of the Target. Before the session sends data to the target, the stored procedure runs.
This is useful for verifying target tables or disk space on the target system.

Postload of the Target. After the session sends data to the target, the stored procedure runs. This
is useful for recreating indexes on the database.


It must contain at least one Input and one Output port.
Q: What kinds of sources and of targets can be used in Informatica?

A:
Sources may be Flat file, relational db or XML.

Target may be relational tables, XML or flat files.


Q:Transformations: What are the different transformations you have worked with?
A:
Source Qualifier (XML, ERP, MQ)
Joiner
Expression
Lookup
Filter
Router
Sequence Generator
Aggregator
Update Strategy
Stored Proc
External Proc
Advanced External Proc
Rank
Normalizer
Q: What are active/passive transformations?
A: Passive transformations do not change the number of rows passing through them, whereas active transformations can change the number of rows passing through them.
Active: Filter, Aggregator, Rank, Joiner, Source Qualifier
Passive: Expression, Lookup, Stored Proc, Seq. Generator
Q: What are connected/unconnected transformations?
A:
Connected transformations are part of the mapping pipeline. The input and output ports
are connected to other transformations.
Unconnected transformations are not part of the mapping pipeline. They are not linked in the map
with any input or output ports. Eg. In Unconnected Lookup you can pass multiple values to
unconnected transformation but only one column of data will be returned from the transformation.
Unconnected: Lookup, Stored Proc.
Q:In target load ordering, what do you order Targets or Source Qualifiers?
A: Source Qualifiers. If there are multiple targets in the mapping, which are populated from multiple
sources, then we can use Target Load ordering.
Q: Have you used constraintbased load ordering? Where do you set this?
A: Constraint based loading can be used when you have multiple targets in the mapping and the target
tables have a PKFK relationship in the database. It can be set in the session properties. You have to set
the Source Treat Rows as: INSERT and check the box Constraint based load ordering in Advanced
Tab.
Q: If you have a FULL file that you have to match and load into a corresponding table, how will you go
about it? Will you use Joiner transformation?
A: Use Joiner and join the file and Source Qualifier.
Q: If you have 2 files to join, which file will you use as the master file?
A: Use the file with lesser nos. of records as master file.
Q: If a sequence generator (with increment of 1) is connected to (say) 3 targets and each target uses the NEXTVAL port, what value will each target get?
A: Each target will get values in multiples of 3 (for example, the first target gets 1, 4, 7, ..., the second gets 2, 5, 8, ..., and the third gets 3, 6, 9, ...).
Q: Have you used the Abort, Decode functions?
A: Abort can be used to Abort / stop the session on an error condition.
If the primary key column contains NULL, and you need to stop the session from continuing then you
may use ABORT function in the default value for the port. It can be used with IIF and DECODE function
to Abort the session.
Q: Have you used SQL Override?
A: It is used to override the default SQL generated in the Source Qualifier / Lookup transformation.
Q: If you make a local transformation reusable by mistake, can you undo the reusable action?
A: No
Q: What is the difference between filter and router transformations?
A: A Filter can filter records based on ONE condition only, whereas a Router can be used to route records based on multiple conditions.
Q: Lookup transformations: Cached/uncached
A: When the Lookup Transformation is cached the Informatica Server caches the data and index. This is
done at the beginning of



the session before reading the first record from the source. If the Lookup is uncached then the
Informatica reads the data from the database for every record coming from the Source Qualifier.
Q: Connected/unconnected if there is no match for the lookup, what is returned?
A: Unconnected Lookup returns NULL if there is no matching record found in the Lookup transformation.
Q: What is persistent cache?
A: When the Lookup is configured to be a persistent cache Informatica server does not delete the cache
files after completion of the session. In the next run Informatica server uses the cache file from the
previous session.
Q: What is dynamic lookup strategy?
A: The Informatica server compares the data in the lookup table and the cache, if there is no matching
record found in the cache file then it modifies the cache files by inserting the record. You may use only
(=) equality in the lookup condition.
If multiple matches are found in the lookup then Informatica fails the session. By default the
Informatica server creates a static cache.
Q: Mapplets: What are the 2 transformations used only in mapplets?
A: Mapplet Input / Source Qualifier, Mapplet Output
Q: Have you used Shortcuts?
A: Shortcuts may used to refer to another mapping. Informatica refers to the original mapping. If any
changes are made to the mapping / mapplet, it is immediately reflected in the mapping where it is
used.
Q: If you used a database when importing sources/targets that was dropped later on, will your mappings
still be valid?
A: No
Q: In expression transformation, how can you store a value from the previous row?
A: By creating a variable in the transformation.
Q: How does Informatica do variable initialization? Number/String/Date
A: Number 0, String blank, Date 1/1/1753
Q: Have you used the Informatica debugger?
A: Debugger is used to test the mapping during development. You can give breakpoints in the mappings
and analyze the data.
Q: What do you know about the Informatica server architecture? Load Manager, DTM, Reader, Writer,
Transformer.
A:
Load Manager is the first process started when the session runs. It checks for validity of mappings,
locks sessions and other objects.
DTM process is started once the Load Manager has completed its job. It starts a thread for each
pipeline.
Reader scans data from the specified sources.
Writer manages the target/output data.
Transformer performs the task specified in the mapping.
Q:Have you used partitioning in sessions? (not available with Powermart)
A: It is available in PowerCenter. It can be configured in the session properties.
Q: Have you used External loader? What is the difference between normal and bulk loading?
A: External loader will perform direct data load to the table/data files, bypass the SQL layer and will not
log the data. During normal data load, data passes through SQL layer, data is logged in to the archive log
file and as a result it is slow.
Q: Do you enable/disable decimal arithmetic in session properties?
A: Disabling Decimal Arithmetic will improve the session performance but it converts numeric values
to double, thus leading to reduced accuracy.
Q: When would use multiple update strategy in a mapping?
A: When you would like to insert and update the records in a Type 2 Dimension table.
Q: When would you truncate the target before running the session?
A: When we want to load the entire data set, including history, in one shot. The update strategy does not use dd_update or dd_delete; it does only dd_insert.
Q: How do you use stored proc transformation in the mapping?
A: Inside a mapping we can use a Stored Procedure transformation, pass input parameters and get back the output parameters. When handled through the session, it can be invoked in either pre-session or post-session scripts.
Q: What did you do in the stored procedure? Why did you use stored proc instead of using expression?
A:
Q: When would you use SQ, Joiner and Lookup?
A:
If we are using multiples source tables and they are related at the database, then we can use a
single SQ.

If we need to Lookup values in a table or Update Slowly Changing Dimension tables then we can use
Lookup transformation.
Joiner is used to join heterogeneous sources, e.g. Flat file and relational tables.
Q:How do you create a batch load? What are the different types of batches?
A: Batch is created in the Server Manager. It contains multiple sessions. First create sessions and then
create a batch. Drag the


sessions into the batch from the session list window.
Batches may be sequential or concurrent. Sequential batch runs the sessions sequentially. Concurrent
sessions run parallel thus optimizing the server resources.
Q: How did you handle reject data? What file does Informatica create for bad data?
A: Informatica saves the rejected data in a .bad file. Informatica adds a row identifier for each record
rejected indicating whether the row was rejected because of Writer or Target. Additionally for every
column there is an indicator for each column specifying whether the data was rejected due to overflow,
null, truncation, etc.
Q: How did you handle runtime errors? If the session stops abnormally how were you managing the
reload process?
Q: Have you used pmcmd command? What can you do using this command?
A:pmcmd is a command line program. Using this command
You can start sessions
Stop sessions
Recover session
Q: What are the two default repository user groups
A: Administrators and Public
Q: What are the Privileges of Default Repository and Extended Repository user?
A:
Default Repository Privileges:
o Use Designer
o Browse Repository
o Create Session and Batches
Extended Repository Privileges:
o Session Operator
o Administer Repository
o Administer Server
o Super User
Q: How many different locks are available for repository objects?
A: There are five kinds of locks available on repository objects:
Read lock. Created when you open a repository object in a folder for which you do not have write
permission. Also created when you open an object with an existing write lock.
Write lock. Created when you create or edit a repository object in a folder for which you have write
permission.
Execute lock. Created when you start a session or batch, or when the Informatica Server starts a
scheduled session or batch.
Fetch lock. Created when the repository reads information about repository objects from the
database.
Save lock. Created when you save information to the repository.
Q: What is Session Process?
A: The Load Manager process. It starts the session, creates the DTM process, and sends post-session
email when the session completes.
Q: What is DTM process?
A: The DTM process creates threads to initialize the session, read, write, transform data, and
handle pre and postsession operations.
Q: When the Informatica Server runs a session, what are the tasks handled?
A:
Load Manager (LM):
o LM locks the session and reads session properties.
o LM reads the parameter file.
o LM expands the server and session variables and parameters.
o LM verifies permissions and privileges.
o LM validates source and target code pages.
o LM creates the session log file.
o LM creates the DTM (Data Transformation Manager) process.
Data Transformation Manager (DTM):

o DTM process allocates DTM process memory.


o DTM initializes the session and fetches the mapping.
o DTM executes pre-session commands and procedures.
o DTM creates reader, transformation, and writer threads for each source pipeline. If the
pipeline is partitioned, it creates a set of threads for each partition.
o DTM executes post-session commands and procedures.
o DTM writes historical incremental aggregation and lookup data to disk, and it writes
persisted sequence values and mapping variables to the repository.
o Load Manager sends post-session email.
Q: What is Code Page?
A: A code page contains the encoding to specify characters in a set of one or more languages.
Q: How to handle the performance in the server side?
A: Informatica tool has no role to play here. The server administrator will take up the issue.


Q: What are the DTM (Data Transformation Manager) Parameters?
A:
DTM memory parameters: default buffer block size, data and index cache sizes.
Reader parameter: line sequential buffer length for flat files.
General parameters: commit interval (source and target), enabling lookup cache and others.
Event-based scheduling: indicator file to wait for.
1. Explain about your projects
Architecture
Dimension and Fact tables
Sources and Targets
Transformations used
Frequency of populating data
Database size
2. What is dimension modeling?
Unlike the ER model, the dimensional model is very asymmetric, with one large central table called the
fact table connected to multiple dimension tables. It is also called a star schema.
3. What are mapplets?
Mapplets are reusable objects that represents collection of transformations
Transformations and objects not to be included in mapplets are:
COBOL source definitions
Joiner transformations
Normalizer transformations
Non-reusable Sequence Generator transformations
Pre- or post-session stored procedures
Target definitions
XML source definitions
IBM MQ source definitions
PowerMart 3.5-style LOOKUP functions
4. What are the transformations that use cache for performance?
Aggregator, Lookup, Joiner and Rank
5. What are the active and passive transformations?
An active transformation changes the number of rows that pass through the mapping:
1. Source Qualifier
2. Filter
3. Router
4. Rank
5. Update Strategy
6. Aggregator
7. Advanced External Procedure
8. Normalizer
9. Joiner
Passive transformations do not change the number of rows that pass through the mapping:
1. Expression
2. Lookup
3. Stored Procedure
4. External Procedure
5. Sequence Generator
6. XML Source Qualifier

6. What is a lookup transformation?


Used to look up data in a relational table, view, or synonym. The Informatica server queries the
lookup table based on the lookup ports in the transformation. It compares lookup transformation port
values to lookup table column values based on the lookup condition. The result is passed to other
transformations and the target.
Used to:
Get a related value
Perform a calculation
Update Slowly Changing Dimension tables.
Diff between connected and unconnected lookups. Which is better?
Connected:
Receives input values directly from the pipeline.
Can use a dynamic or static cache.
Cache includes all lookup columns used in the mapping.
Can return multiple columns from the same row.
If there is no match, can return default values.
Default values can be specified.

Unconnected:
Receives input values from the result of a :LKP expression in another transformation.
Only a static cache can be used.
Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
Can return only one column from each row.
If there is no match, it returns NULL.
Default values cannot be specified.
Explain various caches:
Static:
Caches the lookup table before executing the transformation. Rows are not added dynamically.
Dynamic:
Caches the rows as and when they are passed.
Unshared:
Within the mapping, if the lookup table is used in more than one transformation, the cache built
for the first lookup can be used for the others. It cannot be used across mappings.
Shared:
If the lookup table is used in more than one transformation/mapping, the cache built for the first
lookup can be used for the others. It can be used across mappings.
Persistent:
If the cache generated for a lookup needs to be preserved for subsequent use, a persistent cache
is used. It does not delete the index and data files. It is useful only if the lookup table remains
constant.
What are the uses of index and data caches?
The conditions are stored in index cache and records from the lookup are stored in data cache
7. Explain aggregate transformation?
The Aggregator transformation allows you to perform aggregate calculations, such as averages, sums,
max, min, etc. The Aggregator transformation is unlike the Expression transformation in that you can
use the Aggregator transformation to perform calculations on groups. The Expression transformation
permits you to perform calculations on a row-by-row basis only.
Performance issues?
The Informatica server performs calculations as it reads, and stores the necessary group and row
data in an aggregate cache. Use sorted input ports and pass the input records to the Aggregator
already sorted by the group-by ports.
Incremental aggregation?
In the Session property tag there is an option for performing incremental aggregation. When the
Informatica server performs incremental aggregation , it passes new source data through the mapping
and uses historical cache (index and data cache) data to perform new aggregation calculations
incrementally.
What are the uses of index and data cache?
The group data is stored in index files and Row data stored in data files.
8. Explain update strategy?
Update strategy defines the sources to be flagged for insert, update, delete, and reject at the targets.
What are update strategy constants?
DD_INSERT,0 DD_UPDATE,1 DD_DELETE,2
DD_REJECT,3
If DD_UPDATE is defined in update strategy and Treat source rows as INSERT in Session . What
happens?
Hints: If in the session anything other than DATA DRIVEN is mentioned, then the Update Strategy in the mapping
is ignored.
What are the three areas where the rows can be flagged for particular treatment?
In mapping, In Session treat Source Rows and In Session Target Options.
What is the use of Forward/Reject rows in Mapping?
9. Explain the expression transformation ?
Expression transformation is used to calculate values in a single row before
writing to the target.
What are the default values for variables?
Hints: String = NULL, Number = 0, Date = 1/1/1753
10.Difference between Router and filter transformation?
In filter transformation the records are filtered based on the condition and rejected rows are
discarded. In Router the multiple conditions are placed and the rejected rows can be assigned to a
port.
How many ways can you filter the records?
1. Source Qualifier
2. Filter transformation
3. Router transformation
4. Rank
5. Update Strategy
11. How do you call stored procedure and external procedure transformation ?
An External Procedure can be called in the pre-session and post-session tags in the
session property sheet. Stored procedures are called in the Mapping Designer
by three methods:
1. Select the icon and add a Stored Procedure transformation
2. Select Transformation > Import Stored Procedure
3. Select Transformation > Create and then select Stored Procedure

12.Explain Joiner transformation and where it is used?


While a Source qualifier transformation can join data originating from a common source database, the
joiner transformation
joins two related heterogeneous sources residing in different locations or file systems, for example:
Two relational tables existing in separate databases
Two flat files in different file systems
Two different ODBC sources
In one transformation how many sources can be coupled?
Two sources can be coupled. If more than two are to be coupled, add another Joiner in the hierarchy.
What are the join options?
Normal (default)
Master Outer
Detail Outer
Full Outer

13. Explain Normalizer transformation?


The Normalizer transformation normalizes records from COBOL and relational sources, allowing you to
organize the data according to your own needs. A Normalizer transformation can appear anywhere in
a data flow when you normalize a relational source. Use a Normalizer transformation instead of the
Source Qualifier transformation when you normalize a COBOL source. When you drag a COBOL source
into the Mapping Designer workspace, the Normalizer transformation appears, creating input and
output ports for every column in the source.
14. What is Source qualifier transformation?
When you add relational or flat file source definition to a mapping , you need to connect to a source
Qualifier transformation. The source qualifier represents the records that the informatica server
reads when it runs a session.
Join data originating from the same source database.
Filter records when the Informatica server reads the source data.
Specify an outer join rather than the default inner join.
Specify sorted ports.
Select only distinct values from the source.
Create a custom query to issue a special SELECT statement for the Informatica server to read the source data.
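As a rough illustration of such a custom query, the sketch below assumes a hypothetical CUSTOMERS source table; the filter, sort and DISTINCT shown correspond to the Source Qualifier options listed above:

SELECT DISTINCT c.customer_id,
       c.customer_name,
       c.country
FROM   customers c
WHERE  c.status = 'ACTIVE'      -- source-side filter
ORDER BY c.customer_id          -- sorted ports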

15. What is the Rank transformation?


Filters the required number of records from the top or from the bottom.
16. What is target load option?
It defines the order in which informatica server loads the data
into the targets. This is to avoid integrity constraint violations

17.How do you identify the bottlenecks in Mappings?


Bottlenecks can occur in:
1. Targets
The most common performance bottleneck occurs when the Informatica server
writes to a target database. You can identify a target bottleneck by configuring
the session to write to a flat file target. If the session performance increases
significantly when you write to a flat file, you have a target bottleneck.
Solution:
Drop or disable indexes or constraints.
Perform bulk load (ignores the database log).
Increase the commit interval (recovery is compromised).
Tune the database (RBS, dynamic extension, etc.).

2. Sources
Set a filter transformation after each SQ so that no records pass through. If the time taken is the
same, then there is a source problem.
You can also identify a source problem by:
Read Test Session: copy the mapping with the sources and SQ, remove all transformations and connect
to a file target. If the performance is the same, then there is a source bottleneck.
Using a database query: copy the read query directly from the log and execute it against the source
database with a query tool. If the time it takes to execute the query and the time to fetch the first
row are significantly different, then the query can be modified using optimizer hints (see the sketch
below).
Solutions:
Optimize queries using hints.
Use indexes wherever possible.
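A minimal sketch of a hinted read query, assuming a hypothetical ORDERS source table and an existing index named ORD_CUST_IDX (both names are made up for the example):

SELECT /*+ INDEX(o ord_cust_idx) */
       o.order_id, o.customer_id, o.order_amount
FROM   orders o
WHERE  o.order_date >= DATE '2015-01-01'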
3. Mapping
If both Source and target are OK then problem could be in mapping.
Add a filter transformation before target and if the time is the same then
there is a problem. (OR) Look for the performance monitor in the Sessions
property sheet and view the counters.

Solutions:
High error rows and rows-in-lookup-cache counters indicate a mapping bottleneck.
Optimize single-pass reading.
Optimize the Lookup transformation:
1. Caching the lookup table:
When caching is enabled, the Informatica server caches the lookup table and queries the cache
during the session. When this option is not enabled, the server queries the lookup table on a
row-by-row basis.
The cache types are Static, Dynamic, Shared, Unshared and Persistent.
2. Optimizing the lookup condition:
Whenever multiple conditions are placed, the condition with the equality sign should take precedence.
3. Indexing the lookup table:
The cached lookup table should be indexed on the ORDER BY columns. The session log contains the
ORDER BY statement.
For an uncached lookup, since the server issues a SELECT statement for each row passing into the
Lookup transformation, it is better to index the lookup table on the columns in the condition.
Optimize the Filter transformation:
You can improve efficiency by filtering early in the data flow. Instead of using a Filter
transformation halfway through the mapping to remove a sizable amount of data:
Use a Source Qualifier filter to remove those same rows at the source.
If it is not possible to move the filter into the SQ, move the Filter transformation as close to the
Source Qualifier as possible to remove unnecessary data early in the data flow.
Optimize the Aggregator transformation:
1.Group by simpler columns. Preferably numeric columns.
2. Use Sorted input. The sorted input decreases the use of aggregate
caches. The server assumes all input data are sorted and as it reads
it performs aggregate calculations.
3. Use incremental aggregation in session property sheet.
Optimize Seq. Generator transformation:
1. Try creating a reusable Seq. Generator transformation and use it in multiple mappings
2.The number of cached value property determines the number of
values the informatica server caches at one time.
Optimize Expression transformation:
1.Factoring out common logic
2.Minimize aggregate function calls.
3.Replace common subexpressions with local variables.
4. Use operators instead of functions.
4. Sessions
If you do not have a source, target, or mapping bottleneck, you may have a
session bottleneck. You can identify a session bottleneck by using the
performance details. The informatica server creates performance details
when you enable Collect Performance Data on the General Tab of the
session properties.
Performance details display information about each Source Qualifier,
target definitions, and individual transformation. All transformations have
some basic counters that indicate the Number of input rows, output rows,
and error rows.
Any value other than zero in the readfromdisk and writetodisk counters for
Aggregate, Joiner, or Rank transformations indicate a session bottleneck.
Low bufferInput_efficiency and BufferOutput_efficiency counter also
indicate a session bottleneck.
Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks.
5.System (Networks)
18.How to improve the Session performance?
1 Run concurrent sessions
2 Partition session (Power center)
3.Tune Parameter DTM buffer pool, Buffer block size, Index cache size, data cache size,
Commit Interval, Tracing level (Normal, Terse, Verbose Init, Verbose Data)
The session has memory to hold 83 sources and targets. If it is more, then DTM
can be increased. The informatica server uses the index and data caches for
Aggregate, Rank, Lookup and Joiner transformation. The server stores the
transformed data from the above transformation in the data cache before
returning it to the data flow. It stores group information for those
transformations in index cache.
If the allocated data or index cache is not large enough to store the date, the
server stores the data in a temporary disk file as it processes the session data.
Each time the server pages to the disk the performance slows. This can be seen
from the counters .
Since generally data cache is larger than the index cache, it has to be more than the index.
4. Remove staging areas
5. Turn off session recovery
6. Reduce error tracing
19.What are tracing levels?
Normal (default):
Logs initialization and status information, errors encountered, and skipped rows due to transformation
errors; summarizes session results but not at the row level.
Terse:
Logs initialization, error messages, and notification of rejected data.
Verbose Init:
In addition to the normal tracing level, it also logs additional initialization information, names of index
and data files used, and detailed transformation statistics.
Verbose Data:
In addition to Verbose Init, it records row-level logs.

20. What is Slowly changing dimensions?
Slowly changing dimensions are dimension tables that have slowly increasing data as well as
updates to existing data.
21. What are mapping parameters and variables?
A mapping parameter is a user definable constant that takes up a value before running a
session. It can be used in SQ expressions, Expression transformation etc.
Steps:
Define the parameter in the Mapping Designer under Parameters & Variables.
Use the parameter in the expressions.
Define the value for the parameter in the parameter file.
A mapping variable is also defined similar to the parameter except that the value of the
variable is subjected to change. It picks up the value in the following order.
1.From the Session parameter file
2.As stored in the repository object in the previous run.
3.As defined in the initial values in the designer.
4.Default values
Q. What are the output files that the Informatica server creates during the session running?
Informatica server log: Informatica server (on UNIX) creates a log for all status and error messages
(default name: pm.server.log). It also creates an error log for error messages. These files will be created
in Informatica home directory
Session log file: Informatica server creates session log file for each session. It writes information about
session into log files such as initialization process, creation of sql commands for reader and writer
threads, errors encountered and load summary. The amount of detail in session log file depends on the
tracing level that you set.
Session detail file: This file contains load statistics for each target in the mapping. Session detail includes
information such as table name and the number of rows written or rejected. You can view this file by double
clicking on the session in the Monitor window.
Performance detail file: This file contains information known as session performance details, which helps you
see where performance can be improved. To generate this file, select the performance detail option in the
session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: Informatica server creates a control file and a target file when you run a session that uses the
external loader. The control file contains the information about the target flat file such as data format
and loading instructions for the external loader.
Post-session email: Post-session email allows you to automatically communicate information about a session
run to designated recipients. You can create two different messages: one if the session completed
successfully, the other if the session fails.
Indicator file: If you use the flat file as a target, you can configure the Informatica server to create
indicator file. For each target row, the indicator file contains a number to indicate whether the row was
marked for insert, update, delete or reject.
Output file: If session writes to a target file, the Informatica server creates the target file based on file
properties entered in the session property sheet.
Cache files: When the Informatica server creates memory cache it also creates cache files.
For the following circumstances Informatica server creates index and data cache files:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
Q.What is the difference between joiner transformation and source qualifier transformation?
A.You can join heterogeneous data sources in joiner transformation which we cannot do in source qualifier
transformation.
Q.What is meant by lookup caches?
A. The Informatica server builds a cache in memory when it processes the first row of a data in a cached
look up transformation. It allocates memory for the cache based on the amount you configure in the
transformation or session properties. The Informatica server stores condition values in the index cache
and output values in the data cache.
Q.What is meant by parameters and variables in Informatica and how it is used?
A. Parameter: A mapping parameter represents a constant value that you can define before
running a session. A mapping parameter retains the same value throughout the entire session.
Variable: A mapping variable represents a value that can change through the session. Informatica Server
saves the value of a mapping variable to the repository at the end of each successful session run and
uses that value the next time you run the session
Q.What is target load order?
You specify the target load order based on source qualifiers in a mapping. If you have multiple
source qualifiers connected to multiple targets, you can define the order in which Informatica server
loads data into the targets
Informatica is a leading data integration software vendor. The company's products support various enterprise-wide data
integration and data quality solutions including data warehousing, data migration, data consolidation, data
synchronization, data governance, master data management, and cross-enterprise data integration.
The important Informatica components are:
Power Exchange
Power Center
Power Center Connect
Power Channel
Metadata Exchange
Power Analyzer
Super Glue
This section contains some useful tips and tricks for optimizing Informatica performance. This includes some
real-time problems or errors and ways to troubleshoot them, best practices, etc.
Q1: Introduce yourself.
What is incremental aggregation and how is it done?
When using incremental aggregation, you apply captured changes in the source to the aggregate
calculations in a session. If the source changes only incrementally and you can capture those changes,
you can configure the session to process only the changes. This allows the Informatica Server to update
the target incrementally, rather than forcing it to process the entire source and recalculate the same
calculations each time you run the session.

Q2: What is datawarehousing?


A data warehouse is a collection of data [http://www.webopedia.com/TERM/D/data.html] designed to support management decision making. Data
warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time.
Development of a data warehouse includes development of systems to extract data from operating systems plus
installation of a warehouse database system [http://www.webopedia.com/TERM/D/database_management_system_DBMS.html] that
provides managers flexible access to the data.
The term data warehousing generally refers to the combination of many different databases across an entire enterprise.
Contrast with data mart [http://www.webopedia.com/TERM/D/data_mart.html] .
Q3: What is the need of datawarehousing?

Q4: Diff b/w OLTP & OlAP

OLTP
Current data
Short database transactions
Online update/insert/delete
Normalization is promoted
High volume transactions
Transaction recovery is necessary

OLAP
Current and historical data
Long database transactions
Batch update/insert/delete
Denormalization is promoted
Low volume transactions
Transaction recovery is not necessary

Q5: Why do we use OLTP & OLAP


Q6: How to handle decimal in informatica while using flatfies?
While importing the flat file definition, just specify the scale for the numeric data type. In the mapping, the flat file source supports
only the number data type (no decimal or integer). The SQ associated with that source will have a data type of decimal for that
number port of the source.

Q7: Why do we use the Update Strategy?

Using the session property Treat Source Rows As we can set INSERT, UPDATE, DELETE or REJECT,
but we can handle only a single flow that way.
SCDs need insert and update at the same time, which is possible only with the Update Strategy
transformation.
Using the Update Strategy transformation we can create SCD mappings easily.
It is important to use an Update Strategy transformation in SCDs, as SCDs maintain historical data,
especially Type 2 dimensions. In this case we may need to flag rows from the same target for different
database operations, so we have no choice but to use an Update Strategy, as this is not possible at the
session level.

Q8: Can we use update strategy in flatfiles?


Data in flat file cannot be updated

Q9: If yes why? If not why?


Q10: What is junk dimension?
A junk dimension is a collection of random transactional codes or text attributes that are unrelated to any particular dimension.
The junk dimension is simply a structure that provides a convenient place to store the junk attributes.

Q11 Diff between iif and decode?


You can use nested IIF statements to test multiple conditions. The following example tests for various
conditions and returns 0 if SALES is zero or negative:
IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2, IIF( SALES < 200, SALARY3, BONUS))), 0 )
You can use DECODE [http://www.geekinterview.com/question_details/20702] instead of IIF in many cases. DECODE may improve
readability. The following shows how you can use DECODE instead of IIF:
DECODE( TRUE,
  SALES > 0 AND SALES < 50, SALARY1,
  SALES > 49 AND SALES < 100, SALARY2,
  SALES > 99 AND SALES < 200, SALARY3 )

Q12: Difference between correlated subquery and nested subquery

A correlated subquery runs once for each row selected by the outer query. It contains a reference to a value from the
row selected by the outer query.
A nested subquery runs only once for the entire nesting (outer) query. It does not contain any reference to the outer query
row. For example:
Correlated subquery:
select e1.empname, e1.basicsal, e1.deptno from emp e1 where e1.basicsal = (select max(basicsal) from emp e2 where
e2.deptno = e1.deptno)
Nested subquery:
select empname, basicsal, deptno from emp where (deptno, basicsal) in (select deptno, max(basicsal) from emp group by deptno)

Q13: What is Union?


The Union transformation is a multiple input group transformation that you use to merge data from multiple pipelines or
pipeline branches into one pipeline branch. It merges data from multiple sources similar to the UNION ALL SQL statement
to combine the results from two or more SQL statements. Similar to the UNION ALL statement the
Union transformation does not remove duplicate rows.
The Integration Service processes all input groups in parallel. The Integration Service concurrently reads sources
connected to the Union transformation and pushes blocks of data into the input groups of the transformation. The
Union transformation processes the blocks of data based on the order it receives the blocks from the Integration Service.
You can connect heterogeneous sources to a Union transformation. The Union transformation merges sources with matching
ports and outputs the data from one output group with the same ports as the input groups.

Q14: How to use union?


What is the difference between a Star Schema and a Snowflake Schema?
Star Schema: A star schema is a relational database schema for representing multidimensional data. It is the simplest form of
data warehouse schema that contains one or more dimensions and fact tables. It is called a star schema because the
entity-relationship diagram between the dimensions and the fact table resembles a star, where one fact table is connected to multiple
dimensions. The center of the star schema consists of a large fact table that points towards the dimension tables. The
advantages of a star schema are easier slicing of data, a performance increase, and easy understanding of the data.
Snowflake Schema: A snowflake schema is a term that describes a star schema structure normalized through the use
of outrigger tables, i.e. dimension table hierarchies are broken into simpler tables.

In a star schema every dimension will have a primary key.
In a star schema a dimension table will not have any parent table,
whereas in a snowflake schema a dimension table will have one or more parent tables.
Hierarchies for the dimensions are stored in the dimension table itself in a star schema,
whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill down the data from
the topmost hierarchy to the lowermost hierarchy. A pair of example queries is sketched below.
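As a rough illustration (the fact and dimension names are assumed purely for the example), a star schema query joins the fact table directly to a denormalized dimension, while the snowflake version needs an extra join to the normalized parent table:

-- Star schema: the product hierarchy is kept inside PRODUCT_DIM
SELECT d.category_name, SUM(f.sales_amount)
FROM   sales_fact f
JOIN   product_dim d ON d.product_key = f.product_key
GROUP BY d.category_name;

-- Snowflake schema: the category level is broken out into its own table
SELECT c.category_name, SUM(f.sales_amount)
FROM   sales_fact f
JOIN   product_dim  d ON d.product_key  = f.product_key
JOIN   category_dim c ON c.category_key = d.category_key
GROUP BY c.category_name;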

Q15: How many data sources are available?


Q16: What is SCD?
SCD stands for slowly changing dimension.
It is the capturing of slowly changing data, i.e. data that changes very slowly with respect to time. For
example, the address of a customer may change in rare cases; it never changes frequently.
There are 3 types of SCD:
Type 1: only the most recently changed data is stored.
Type 2: the recent data as well as all past (historical) data is stored.
Type 3: partial history and recent data are stored, i.e. the most recent update and the most recent history.
As a data warehouse stores historical data, Type 2 is the most useful for it.
Q17: Types of scd
Q18: How can we improve the session performance?
How does the Informatica server increase session performance through partitioning the source?
For relational sources the Informatica server creates multiple connections, one for each partition of a
single source, and extracts a separate range of data for each connection. The Informatica server reads
multiple partitions of a single source concurrently. Similarly, for loading, the Informatica server creates
multiple connections to the target and loads partitions of data concurrently.
For XML and file sources, the Informatica server reads multiple files concurrently. For loading the data,
the Informatica server creates a separate file for each partition of a source file. You can choose to
merge the targets.

Q19:What do you mean by informatica?

Q20: Diff b/w dimensions and fact table


Dimension Table features
1. It provides the context /descriptive information for a fact table measurements.

2. Provides entry points to data.
3. Structure of a dimension: a surrogate key, one or more fields that compose the natural key (NK), and a set of attributes.
4. Size of Dimension Table is smaller than Fact Table.
5. In a schema more number of dimensions are presented than Fact Table.
6. Surrogate Key is used to prevent the primary key (pk) violation(store historical data).
7. Values of fields are in numeric and text representation.
Fact Table features
1. It provides measurement of an enterprise.
2. Measurement is the amount determined by observation.
3. Structure of a fact table: foreign keys (FK), degenerate dimensions and measurements.
4. Size of Fact Table is larger than Dimension Table.
5. In a schema less number of Fact Tables observed compared to Dimension Tables.
6. Compose of Degenerate Dimension fields act as Primary Key.
7. Values of the fields always in numeric or integer form.

Performance tuning in Informatica?


The goal of performance tuning is to optimize session performance so that sessions run during the available load window for
the Informatica Server. You can increase session performance as follows:
The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB
per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session
performance, so avoid unnecessary network connections.
Flat files: If your flat files are stored on a machine other than the Informatica server, move those files to the machine that hosts the
Informatica server.
Relational data sources: Minimize the connections to sources, targets and the Informatica server to
improve session performance. Moving the target database onto the server system may improve session
performance.
Staging areas: If you use staging areas, you force the Informatica server to perform multiple data passes.
Removing staging areas may improve session performance.
You can run multiple Informatica servers against the same repository. Distributing the session load across multiple Informatica servers may
improve session performance.
Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a
character value in one byte, whereas Unicode mode takes 2 bytes to store a character.
If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select
statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes (a sketch follows below).
We can improve session performance by configuring the network packet size, which determines how much data
crosses the network at one time. To do this, go to the Server Manager and choose Server Configure Database Connections.
If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the
constraints and indexes before you run the session and rebuild them after the session completes.
Running parallel sessions by using concurrent batches also reduces the time needed to load the
data, so concurrent batches may also increase session performance.
Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in
parallel pipelines.
In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.
Avoid transformation errors to improve session performance.
If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
If your session contains a Filter transformation, create that Filter transformation nearer to the sources,
or use a filter condition in the Source Qualifier.
Aggregator, Rank and Joiner transformations often decrease session performance because they must group data before
processing it. To improve session performance in this case, use the sorted input option.
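As a minimal illustration of the indexing point above (the table, column and index names are assumed for the example only), an index on the join or GROUP BY column can be created like this:

CREATE INDEX ord_customer_idx
    ON orders (customer_id);   -- supports joins and GROUP BY on customer_id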

You can also perform the following tasks to optimize the mapping:
1. Configure single-pass reading.
2. Optimize datatype conversions.
3. Eliminate transformation errors.
4. Optimize transformations.
5. Optimize expressions.

Why did you use a stored procedure in your ETL Application?
Usage of a stored procedure has the following advantages:
1. Checks the status of the target database
2. Drops and recreates indexes
3. Determines if enough space exists in the database
4. Performs a specialized calculation
=======================================
Stored procedures in Informatica are useful to impose complex business rules.
=======================================
Static cache:
1. A static cache remains the same during the session run.
2. A static cache can be used with relational and flat file lookup types.
3. A static cache can be used with both unconnected and connected Lookup transformations.
4. We can handle multiple matches with a static cache.
5. We can use operators other than equality, such as <, >, <=, >= and =.

Dynamic cache:
1. A dynamic cache changes during the session run.
2. A dynamic cache can be used only with relational lookup types.
3. A dynamic cache can be used only with connected lookups.
4. We cannot handle multiple matches with a dynamic cache.
5. We can use only the = operator with a dynamic cache.

Q.What is the difference between $ & $$ in mapping or parameter file? In which cases they are
generally used?

A. $ prefixes are used to denote session Parameter and variables and $$ prefixes are
used to denote mapping parameters and variables.
how to connect two or more table with single source qualifier?

Create an Oracle source definition with however many columns you want and
write the join query in the SQL Query override. The column order and data types must be the same
as in the SQL query. A minimal sketch follows below.
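For illustration only, assuming hypothetical EMP and DEPT tables and a Source Qualifier whose ports are EMPNO, ENAME, SAL, DEPTNO and DNAME in that order, the SQL override could look like:

SELECT e.empno,
       e.ename,
       e.sal,
       d.deptno,
       d.dname
FROM   emp e,
       dept d
WHERE  e.deptno = d.deptno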

A set of workflow tasks is called a worklet.
Workflow tasks include:
1) Timer  2) Decision  3) Command  4) Event Wait  5) Event Raise  6) Email, etc.
We use them in different situations.
=======================================
A worklet is a set of tasks. If a certain set of tasks has to be reused in many workflows, then we use
worklets. To execute a worklet, it has to be placed inside a workflow.

The use of a worklet in a workflow is similar to the use of a mapplet in a mapping.

A worklet is a reusable workflow. It might contain more than one task. We can use these worklets in
other workflows.

Which will perform better, IIF or DECODE?

DECODE performs better than IIF; DECODE can be used instead of multiple nested IIF cases.
The DECODE function is available in SQL, but the IIF function is not. DECODE also gives clearer
readability, so others can understand the logic more easily.

What is source qualifier transformation?


SQ is an active transformation. It performs one of the following tasks: join data from the same source database, filter
rows when PowerCenter reads source data, perform an outer join, or select only distinct values from the
source.
In the Source Qualifier transformation a user can define join conditions, filter the data and eliminate duplicates. The
default Source Qualifier query can be overwritten with the above options; this is known as SQL Override.
The Source Qualifier represents the records that the Informatica server reads when it runs a session.
When we add a relational or a flat file source definition to a mapping, we need to connect it to a Source Qualifier
transformation. The Source Qualifier transformation represents the records that the Informatica server reads when it
runs a session.

How many dimension tables did you have in your project, and can you name some dimensions (columns)?
Product Dimension : Product Key, Product id, Product Type, Product name, Batch Number.
Distributor Dimension: Distributor key, Distributor Id, Distributor Location,
Customer Dimension : Customer Key, Customer Id, CName, Age, status, Address, Contact
Account Dimension : Account Key, Acct id, acct type, Location, Balance,

What is meant by clustering?


A cluster stores rows from two (or more) related tables together in the same blocks (a single buffer area), so the joined data can be retrieved more efficiently. A sketch is given below.
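As a hedged illustration (Oracle syntax; the cluster, table and column names are assumptions for the example), a table cluster could be defined like this:

CREATE CLUSTER emp_dept_cluster (deptno NUMBER(2));

CREATE INDEX emp_dept_cluster_idx ON CLUSTER emp_dept_cluster;

CREATE TABLE dept (
    deptno NUMBER(2) PRIMARY KEY,
    dname  VARCHAR2(30)
) CLUSTER emp_dept_cluster (deptno);   -- dept rows stored in the cluster blocks

CREATE TABLE emp (
    empno  NUMBER(4) PRIMARY KEY,
    ename  VARCHAR2(30),
    deptno NUMBER(2)
) CLUSTER emp_dept_cluster (deptno);   -- emp rows co-located with their dept rows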

What are the rank caches?


The Informatica server stores group information in an index cache and row data in a data cache.
When the server runs a session with a Rank transformation, it compares an input row with the rows in the data cache. If the
input row outranks a stored row, the Informatica server replaces the stored row with the input row.
During the session, the Informatica server compares an input row with the rows in the data cache. If the input row outranks a stored
row, the Informatica server replaces the stored row with the input row. The Informatica server stores group information in an index
cache and row data in a data cache.
Q. What type of repositories can be created using Informatica Repository Manager?
A. Informatica PowerCenter includes the following types of repositories:
Standalone Repository : A repository that functions individually and this is unrelated to any other repositories.
Global Repository : This is a centralized repository in a domain. This repository can contain shared objects across the repositories in a
domain. The objects are shared through global shortcuts.
Local Repository : Local repository is within a domain and its not a global repository. Local repository can connect to a global
repository using global shortcuts and can use objects in its shared folders.
Versioned Repository : This can either be local or global repository but it allows version control for the repository. A versioned
repository can store multiple copies, or versions of an object. This features allows to efficiently develop, test and deploy metadata in the
production environment.
Q. What is a code page?
A. A code page contains encoding to specify characters in a set of one or more languages. The code page is selected based on source of the
data. For example if source contains Japanese text then the code page should be selected to support Japanese text.
When a code page is chosen, the program or application for which the code page is set, refers to a specific set of data that describes the
characters the application recognizes. This influences the way that application stores, receives, and sends character data.
Q. Which databases can PowerCenter Server on Windows connect to?
A. PowerCenter Server on Windows can connect to following databases:
IBM DB2
Informix
Microsoft Access
Microsoft Excel
Microsoft SQL Server
Oracle
Sybase
Teradata
Q. Which databases can PowerCenter Server on UNIX connect to?
A. PowerCenter Server on UNIX can connect to following databases:
IBM DB2
Informix
Oracle
Sybase

Teradata

Informatica Mapping Designer


Q. How to execute PL/SQL script from Informatica mapping?
A. Stored Procedure (SP) transformation can be used to execute PL/SQL Scripts. In SP Transformation PL/SQL procedure name can be
specified. Whenever the session is executed, the session will call the pl/sql procedure.
Q. How can you define a transformation? What are different types of transformations available in Informatica?
A. A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that
perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. Below are the various
transformations available in Informatica:
Aggregator
Application Source Qualifier
Custom
Expression
External Procedure
Filter
Input
Joiner
Lookup
Normalizer
Output
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier
Q. What is a source qualifier? What is meant by Query Override?
A. Source Qualifier represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session. When a
relational or a flat file source definition is added to a mapping, it is connected to a Source Qualifier transformation.
PowerCenter Server generates a query for each Source Qualifier transformation whenever it runs the session. The default query is a SELECT
statement containing all the source columns. Source Qualifier has capability to override this default query by changing the default settings of
the transformation properties. The list of selected ports or the order they appear in the default query should not be changed in overridden query.
Q. What is aggregator transformation?
A. The Aggregator transformation allows performing aggregate calculations, such as averages and sums. Unlike Expression Transformation,
the Aggregator transformation can only be used to perform calculations on groups. The Expression transformation permits calculations on a
row-by-row basis only.
Aggregator Transformation contains group by ports that indicate how to group the data. While grouping the data, the aggregator
transformation outputs the last row of each group unless otherwise specified in the transformation properties.
Various group by functions available in Informatica are : AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV,
SUM, VARIANCE.
Q. What is Incremental Aggregation?
A. Whenever a session is created for a mapping Aggregate Transformation, the session option for Incremental Aggregation can be enabled.
When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform
new aggregation calculations incrementally.
Q. How is the Union transformation used?
A. The union transformation is a multiple input group transformation that can be used to merge data from various sources (or pipelines).
This transformation works just like UNION ALL statement in SQL, that is used to combine result set of two SELECT statements.
Q. Can two flat files be joined with Joiner Transformation?
A. Yes, joiner transformation can be used to join data from two flat file sources.
Q. What is a look up transformation?
A. This transformation is used to lookup data in a flat file or a relational table, view or synonym. It compares lookup transformation ports
(input ports) to the source column values based on the lookup condition. Later returned values can be passed to other transformations.
Q. Can a lookup be done on Flat Files?
A. Yes.
Q. What is the difference between a connected look up and unconnected look up?
A. Connected lookup takes input values directly from other transformations in the pipeline.
Unconnected lookup doesn't take inputs directly from any other transformation, but it can be used in any transformation (like expression) and
can be invoked as a function using the :LKP expression. So, an unconnected lookup can be called multiple times in a mapping.
Q. What is a mapplet?
A. A mapplet is a reusable object that is created using mapplet designer. The mapplet contains set of transformations and it allows us to
reuse that transformation logic in multiple mappings.
Q. What does reusable transformation mean?
A. Reusable transformations can be used multiple times in a mapping. The reusable transformation is stored as a metadata separate from any
other mapping that uses the transformation. Whenever any changes to a reusable transformation are made, all the mappings where the
transformation is used will be invalidated.
Q. What is update strategy and what are the options for update strategy?
A. Informatica processes the source data row-by-row. By default every row is marked to be inserted in the target table. If the row has to be
http://shaninformatica.blogspot.com/

68/71

12/23/2015

Informatica Question Answer


updated/inserted based on some logic Update Strategy transformation is used. The condition can be specified in Update Strategy to mark the
processed row for update or insert.
Following options are available for update strategy :
DD_INSERT : If this is used the Update Strategy flags the row for insertion. Equivalent numeric value of DD_INSERT is 0.
DD_UPDATE : If this is used the Update Strategy flags the row for update. Equivalent numeric value of DD_UPDATE is 1.
DD_DELETE : If this is used the Update Strategy flags the row for deletion. Equivalent numeric value of DD_DELETE is 2.
DD_REJECT : If this is used the Update Strategy flags the row for rejection. Equivalent numeric value of DD_REJECT is 3.

What are anti-joins?

Anti-joins:
Anti-joins are written using the NOT EXISTS or NOT IN constructs. An anti-join between two tables returns
rows from the first table for which there are no corresponding rows in the second table. In other words, it
returns rows that fail to match the subquery on the right side.

Suppose you want a list of departments with no employees. You could write a query like this:

SELECT d.department_name
FROM   departments d
MINUS
SELECT d.department_name
FROM   departments d, employees e
WHERE  d.department_id = e.department_id
ORDER BY department_name;

The above query will give the desired results, but it might be clearer to write the query using an anti-join:

SELECT d.department_name
FROM   departments d
WHERE  NOT EXISTS (SELECT NULL
                   FROM   employees e
                   WHERE  e.department_id = d.department_id)
ORDER BY d.department_name;

Without using any transformations, how can you load the data into the target?

If I were the candidate I would simply say: if there are no transformations to be done, I will simply run an
insert script, provided the source and target can talk to each other, or simply map source > Source
Qualifier > target. If the interviewer says the SQ is a transformation, then say "then I don't know; I have
always used Informatica when there is some kind of transformation involved, because that is what
Informatica is mainly used for".
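For illustration only, such an insert script could be as simple as the following sketch, assuming hypothetical SOURCE_CUSTOMERS and TARGET_CUSTOMERS tables that live in (or are linked to) the same database:

INSERT INTO target_customers (customer_id, customer_name, city)
SELECT customer_id, customer_name, city
FROM   source_customers;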
What is a source qualifier?
What is a surrogate key?

What is the difference between a Mapplet and a reusable transformation?
What is a DTM session?
What is a Mapplet?
What is a lookup function? What is the default transformation for the lookup function?
What is the difference between a connected lookup and an unconnected lookup?
What is update strategy and what are the options for update strategy?
What is a subject area?
What is the difference between truncate and delete statements?
What kinds of update strategies are normally used (Type 1, 2 & 3) and what are the differences?
What is the exact syntax of an update strategy?
What are bitmap indexes and how and why are they used?
What is bulk bind? How does it improve performance?
What are the different ways to filter rows using Informatica transformations?
What is a referential integrity error? How do you rectify it?
What is the DTM process?
What is target load order?
What exactly is a shortcut and how do you use it?
What is a shared folder?
What are the different transformations where you can use a SQL override?
What is the difference between Bulk and Normal mode and where exactly is it defined?
What is the difference between a Local and a Global repository?
What are data driven sessions?
What are the common errors while running an Informatica session?
What are worklets and what is their use?
What is change data capture?
What exactly is tracing level?
What is the difference between constraint-based load ordering and a target load plan?
What is a deployment group and what is its use?
When and how is a partition defined using Informatica?
How do you improve performance in an Update Strategy?
How do you validate all the mappings in the repository at once?
How can you join two or more tables without using the Source Qualifier override SQL or a Joiner transformation?
How can you define a transformation? What are the different types of transformations in Informatica?
How many repositories can be created in Informatica?
How many minimum groups can be defined in a Router transformation?
How do you define partitions in Informatica?
How can you improve performance in an Aggregator transformation?
How does Informatica know that the input is sorted?
How many worklets can be defined within a workflow?
How do you define a parameter file? Give an example of its use.
If you join two or more tables, pull about two columns from each table into the Source Qualifier, pull one column from the Source Qualifier into an Expression transformation, and then do a Generate SQL in the Source Qualifier, how many columns will show up in the generated SQL?
In a Type 1 mapping with one source and one target table, what is the minimum number of Update Strategy transformations to be used?
At what levels can you define parameter files and what is the order?
In a session log file, where can you find the reader and the writer details?
For joining three heterogeneous tables, how many Joiner transformations are required?
Can you look up a flat file using Informatica?
While running a session, what default files are created?
Describe the use of materialized views and how they are different from a normal view.
Contributed by Mukherjee, Saibal (ETL Consultant)
Many readers are asking "Where's the answer?" Well, it will take some time before I get time to write it, but there is no reason to get
upset: the Informatica help files should have all of these answers!

Loading & testing fact/transactional/balances (data), which is valid between dates!


[http://etlguru.com/blog/2006/07/25/loadingtestingfacttransactionalbalancesdatawhichisvalidbetweendates/]
Tuesday, July 25th, 2006

This is going to be a very interesting topic for ETL & Data modelers who design processes/tables to load fact or transactional data
which keeps on changing between dates. ex: prices of shares, Company ratings, etc.


The table above shows an entity in the source system that contains time-variant values, but they don't change daily. The values are valid over a period of time and then they change.

[http://web.archive.org/web/20070523031241/http:/etlguru.com/blog/wp content/uploads/2006/08/variable_bond_interest_fct1.JPG]

1. What table structure should be used in the data warehouse?
Maybe Ralph Kimball or Bill Inmon can come up with a better data model!
But for ETL developers or ETL leads the decision is already made, so let's look for a solution.
2. What should be the ETL design to load such a structure?
Design A
There is a one-to-one relationship between the source row and the target row.
There is a CURRENT_FLAG attribute, which means every time the ETL process gets a new value it has to add a new row with the current flag
and go back to the previous row and retire it. This is a very costly ETL step and it will slow down the ETL process.
From the report writer's point of view this model is a major challenge to use, because what if the report wants a rate which is not current?
Imagine the complex query.
Design B
In this design a snapshot of the source table is taken every day.
The ETL is very easy, but can you imagine the size of the fact table when the source has more than 1 million rows
(1 million x 365 days = ? rows per year)? And what if the values change in hours or minutes?
But you have a very happy user who can write SQL reports very easily.
Design C
Can there be a compromise? How about using from date (time) to date (time)! The report writer can simply provide a date (time)
and a straight SQL query can return the value/row that was valid at that moment (see the sketch below).
However the ETL is indeed as complex as model A, because while the current row runs from the current date to infinity, the previous
row has to be retired from its from date to today's date minus 1.
This kind of ETL coding also creates lots of testing issues, as you want to make sure that for any given date and time only one
instance of the row exists (for the primary key).
Which design is better? I have used all of them, depending on the situation.
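A minimal sketch of the Design C report query, assuming a hypothetical SHARE_PRICE_FCT table with FROM_DATE and TO_DATE columns (all names are made up for the example):

SELECT share_id, price
FROM   share_price_fct
WHERE  :as_of_date BETWEEN from_date AND to_date;   -- one row per share for any given date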
3. What should the unit test plan be?
There are various cases where the ETL can miss, and when planning your test cases the plan should be to precisely test those. Here are some
examples of test plans:
a. There should be only one value for a given date/date time (a test query is sketched below).
b. During the initial load, when data is available for multiple days, the process should go sequentially and create the snapshots/ranges correctly.
c. At any given time there should be only one current row.
d. etc.
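As an illustration of test (a), a query along the following lines (again using the assumed SHARE_PRICE_FCT names) should return no rows if the date ranges for the same key never overlap:

SELECT a.share_id, a.from_date, b.from_date
FROM   share_price_fct a
JOIN   share_price_fct b
       ON  a.share_id  = b.share_id
       AND a.from_date < b.from_date      -- a later version of the same key
       AND b.from_date <= a.to_date;      -- that starts before the earlier version ends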

Posted 2nd June 2012 by Shankar Prasad

