Let us assume that the source system is a relational database and that the source table has duplicate rows. To eliminate the duplicate records, we can check the Distinct option in the Source Qualifier of the source table and load the target accordingly.
Source Qualifier Transformation DISTINCT clause
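With the Distinct option checked, the query the Source Qualifier issues behaves roughly like the SQL below. This is only an illustrative sketch; the table EMP_SRC and its columns are hypothetical, not taken from the mapping above.

-- Distinct option unchecked: the Source Qualifier issues a plain SELECT
SELECT EMPNO, ENAME, SAL FROM emp_src;

-- Distinct option checked: duplicate rows are removed at the database
SELECT DISTINCT EMPNO, ENAME, SAL FROM emp_src;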
Student Name    Maths    Life Science    Physical Science
Sam             100      70              80
John            75       100             85
Tom             80       100             85
Student Name    Subject Name        Marks
Sam             Maths               100
Sam             Life Science        70
Sam             Physical Science    80
John            Maths               75
John            Life Science        100
John            Physical Science    85
Tom             Maths               80
Tom             Life Science        100
Tom             Physical Science    85
Q4. Name the transformations that convert one row to many rows, i.e. increase the input-to-output row count. Also, what is the name of the reverse transformation?
Ans.
Normalizer and Router are Active transformations that can increase the number of output rows relative to the input rows.
Aggregator is the Active transformation that performs the reverse action.
Q5. Suppose we have a source table and we want to load three target tables based on source rows such that the first row moves to the first target table, the second row to the second target table, the third row to the third target table, the fourth row again to the first target table, and so on. Describe your approach.
Ans.
We clearly need a Router transformation to route or filter the source data to the three target tables. The question is what the filter conditions will be. First we need an Expression transformation that carries all the source table columns and, along with them, another input/output port, say SEQ_NUM, which gets a sequence number for each source row from the NEXTVAL port of a Sequence Generator (Start Value 0, Increment By 1). The filter conditions for the three Router groups will then be:
MOD(SEQ_NUM,3)=1 connected to the 1st target table, MOD(SEQ_NUM,3)=2 connected to the 2nd target table,
MOD(SEQ_NUM,3)=0 connected to the 3rd target table.
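The same round-robin split can be sketched in SQL, with ROWNUM standing in for the Sequence Generator NEXTVAL port. This is only an illustrative sketch; SRC_TABLE and its columns are hypothetical.

WITH numbered AS (
  SELECT s.*, ROWNUM AS seq_num    -- stands in for the Sequence Generator NEXTVAL port
  FROM   src_table s
)
SELECT * FROM numbered WHERE MOD(seq_num, 3) = 1;   -- rows for the 1st target
-- MOD(seq_num, 3) = 2 would feed the 2nd target and MOD(seq_num, 3) = 0 the 3rd target.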
Student Name    Subject Name        Marks
Sam             Maths               100
Tom             Maths               80
Sam             Physical Science    80
John            Maths               75
Sam             Life Science        70
John            Life Science        100
John            Physical Science    85
Tom             Life Science        100
Tom             Physical Science    85
Student Name    Maths    Life Science    Physical Science
Sam             100      70              80
John            75       100             85
Tom             80       100             85
We will sort the source data based on STUDENT_NAME ascending followed by SUBJECT ascending.
Now, with STUDENT_NAME in the GROUP BY clause, the output subject columns are populated as:
MATHS: MAX(MARKS, SUBJECT='Maths')
LIFE_SC: MAX(MARKS, SUBJECT='Life Science')
PHY_SC: MAX(MARKS, SUBJECT='Physical Science')
Aggregator Transformation
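The same rows-to-columns pivot can be sketched in plain SQL with conditional aggregation. This is only an illustrative sketch; the table name STUDENT_MARKS is hypothetical.

SELECT student_name,
       MAX(CASE WHEN subject = 'Maths'            THEN marks END) AS maths,
       MAX(CASE WHEN subject = 'Life Science'     THEN marks END) AS life_sc,
       MAX(CASE WHEN subject = 'Physical Science' THEN marks END) AS phy_sc
FROM   student_marks
GROUP  BY student_name;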
Q18. Suppose we have a Source Qualifier transformation that populates two target tables. How do you ensure TGT2
Filter Transformation
The Filter transformation filters rows from within a mapping.
Next we place a sorted Aggregator transformation. Here we will find the AVERAGE SALARY for each DEPTNO (GROUP BY DEPTNO).
When we perform this aggregation, we lose the data for individual employees. To maintain employee data, we must pass one branch of the pipeline to the Aggregator transformation and pass another branch with the same sorted source data to the Joiner transformation to maintain the original data. When we join both branches of the pipeline, we join the aggregated data with the original data.
Next we need a sorted Joiner transformation to join the sorted aggregated data with the original data, based on DEPTNO.
Here we take the aggregated pipeline as the Master and the original data flow as the Detail pipeline.
After that we need a Filter transformation to filter out the employees having salary less than the average salary for their department.
Filter Condition: SAL >= AVG_SAL
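The whole Aggregator-Joiner-Filter pipeline corresponds, roughly, to the SQL below against the classic EMP table. This is only an illustrative sketch of the logic, not how the Integration Service actually executes it.

-- Aggregator: average salary per department
-- Joiner:     join the aggregate back to the detail rows on DEPTNO
-- Filter:     keep employees earning at least their department average
SELECT e.empno, e.ename, e.deptno, e.sal, d.avg_sal
FROM   emp e
JOIN  (SELECT deptno, AVG(sal) AS avg_sal
       FROM   emp
       GROUP  BY deptno) d
       ON e.deptno = d.deptno
WHERE  e.sal >= d.avg_sal;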
Property                   Description
Start Value                Start value of the generated sequence; used when the transformation cycles.
Increment By               Difference between two consecutive values in the generated sequence.
End Value                  Maximum value the Integration Service generates.
Current Value              Current value of the sequence; the first value used in the next session run.
Cycle                      If enabled, when the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.
Number of Cached Values    Number of sequential values the Integration Service caches at a time. Default value for a standard Sequence Generator is 0. Default value for a reusable Sequence Generator is 1,000.
Reset                      Restarts the sequence at the current value each time a session runs. This option is disabled for reusable Sequence Generator transformations.
Q33. Suppose we have a source table populating two target tables. We connect the NEXTVAL port of the Sequence Generator to the surrogate keys of both target tables.
Will the surrogate keys in both target tables be the same? If not, how can we pass the same sequence values to both of them?
Ans.
When we connect the NEXTVAL output port of the Sequence Generator directly to the surrogate key columns of the target tables, the sequence numbers will not be the same.
A block of sequence numbers is sent to one target table's surrogate key column. The second target receives a block of sequence numbers from the Sequence Generator transformation only after the first target table receives its block of sequence numbers.
Suppose we have 5 rows coming from the source; then the targets will have the sequence values TGT1 (1,2,3,4,5) and TGT2 (6,7,8,9,10), assuming Start Value 0, Current Value 1 and Increment By 1.
Now suppose the requirement is that we need the same surrogate keys in both targets.
The easiest way to handle this is to put an Expression transformation between the Sequence Generator and the target tables. The Sequence Generator passes unique values to the Expression transformation, and the rows are then routed from the Expression transformation to both targets.
Sequence Generator
Q34. Suppose we have 100 records coming from the source, and for a target column population we use a Sequence Generator. Suppose the Current Value is 0 and the End Value of the Sequence Generator is set to 80. What will happen?
Ans.
End Value is the maximum value the Sequence Generator will generate. After it reaches the End Value, the session fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.
The session failure can be avoided if the Sequence Generator is configured to Cycle through the sequence, i.e. whenever the Integration Service reaches the configured end value for the sequence, it wraps around and starts the cycle again, beginning with the configured Start Value.
Q35. What changes do we observe when we promote a non-reusable Sequence Generator to a reusable one? And what happens if we set the Number of Cached Values to 0 for a reusable transformation?
Ans.
When we convert a non-reusable Sequence Generator to a reusable one, we observe that the Number of Cached Values is set to 1,000 by default and the Reset property is disabled.
When we try to set the Number of Cached Values property of a reusable Sequence Generator to 0 in the Transformation Developer, we encounter the following error message:
The number of cached values must be greater than zero for reusable sequence transformation.
Test Preparation
We will perform the same test with 4 different data points (data volumes) and log the results. We will start with 1 million rows in the detail table and 0.1 million in the master table. Subsequently we will test with 2 million, 4 million and 6 million detail table rows and 0.2 million, 0.4 million and 0.6 million master table rows. Here are the details of the setup we will use:
1. Oracle 10g database as relational source and target
2. Informatica PowerCenter 8.5 as ETL tool
3. Database and Informatica set up on different physical servers running HP UNIX
4. Source database table has no constraints, no indexes, no database statistics and no partitions
Result
The following graph shows the performance of Informatica and the database in terms of the time taken by each system to perform the join. The average time is plotted along the vertical axis and the data points along the horizontal axis.
Data Point    Master Table Record Count    Detail Table Record Count
1             0.1 M                        1 M
2             0.2 M                        2 M
3             0.4 M                        4 M
4             0.6 M                        6 M
Verdict
In our test environment, Oracle 10g performs the JOIN operation 24% faster than the Informatica Joiner transformation without a database index, and 42% faster with a database index.
Assumption
1. Average server load remains same during all the experiments
2. Average network speed remains same during all the experiments
Note
1. This data can only be used for performance comparison but cannot be used for performance benchmarking.
2. This data is only indicative and may vary in different testing conditions.
Test Preparation
We will perform the same test with different data points (data volumes) and log the results. We will start with 1 million records and double the volume for each subsequent data point. Here are the details of the setup we will use:
1. Oracle 10g database as relational source and target
2. Informatica PowerCenter 8.5 as ETL tool
3. Database and Informatica set up on different physical servers running HP UNIX
4. Source database table has no constraints, no indexes, no database statistics and no partitions
5. Source database table is not available in the Oracle shared pool before it is read
6. There is no session-level partitioning in Informatica PowerCenter
7. There is no parallel hint provided in the extraction SQL query
8. The source table has 10 columns and the first 8 columns will be used for sorting
9. The Informatica Sorter has enough cache size
We have used two Informatica PowerCenter mappings created in PowerCenter Designer. The first mapping, m_db_side_sort, uses an ORDER BY clause in the Source Qualifier to sort the data at database level. The second mapping, m_Infa_side_sort, uses an Informatica Sorter to sort the data at Informatica level. We executed these mappings with the different data points and logged the results.
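For the database-side variant, the Source Qualifier SQL override would look roughly like the sketch below. The table and column names are hypothetical placeholders, since the actual 10-column test table is not shown here.

-- Source Qualifier SQL override for m_db_side_sort:
-- the database does the sorting, so no Sorter transformation is needed downstream
SELECT col1, col2, col3, col4, col5, col6, col7, col8, col9, col10
FROM   sort_test_table
ORDER  BY col1, col2, col3, col4, col5, col6, col7, col8;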
Result
The following graph shows the performance of Informatica and Database in terms of time taken by each system to sort
data. The time is plotted along vertical axis and data volume is plotted along horizontal axis.
Verdict
The above experiment demonstrates that the Oracle database is faster in the SORT operation than Informatica by an average factor of 14%.
Assumption
1. Average server load remains same during all the experiments
2. Average network speed remains same during all the experiments
Note
This data can only be used for performance comparison but cannot be used for performance benchmarking.
When we run a session, the Integration Service may create a reject file for each target instance in the mapping to store the rejected target records. With the help of the session log and the reject file we can identify the cause of data rejection in the session. Eliminating the cause of rejection will lead to rejection-free loads in subsequent session runs. If the Informatica Writer or the target database rejects data for any valid reason, the Integration Service logs the rejected records into the reject file. Every time we run the session, the Integration Service appends the rejected records to the reject file.
Row Indicator    Indicator Significance    Rejected By
0                Insert                    Writer or target
1                Update                    Writer or target
2                Delete                    Writer or target
3                Reject                    Writer
4                Rolled-back insert        Writer
5                Rolled-back update        Writer
6                Rolled-back delete        Writer
7                Committed insert          Writer
8                Committed update          Writer
9                Committed delete          Writer
Then come the column data values, each followed by its column indicator, which determines the data quality of the corresponding column.
Column Indicator    Type of data
D                   Valid data or good data
O                   Overflowed numeric data
N                   Null value
T                   Truncated string data
Note also that the second column contains the column indicator flag value 'D', which signifies that the row indicator is valid.
Now let us see what data in a bad file looks like:
0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T
Incremental Aggregation
When the session runs with incremental aggregation enabled for the first time, say in the 1st week of Jan, we use the entire source. This allows the Integration Service to read and store the necessary aggregate data information. In the 2nd week of Jan, when we run the session again, we filter out only the CDC records from the source, i.e. the records loaded after the initial load. The Integration Service then processes this new data and updates the target accordingly.
Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the target table, recreate it from the entire source data and recalculate the same aggregations.
Incremental aggregation may be helpful, for example, when we need to load data into monthly facts on a weekly basis.
Let us see a sample mapping to implement incremental aggregation:
Image: Incremental Aggregation Sample Mapping
Look at the Source Qualifier query that fetches the CDC part using a BATCH_LOAD_CONTROL table that saves the last successful load date for the particular mapping.
Image: Incremental Aggregation Source Qualifier
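A Source Qualifier override of that kind could look roughly like the sketch below. The table and column names (SALES_SRC, BATCH_LOAD_CONTROL, MAPPING_NAME, LAST_LOAD_DATE, the mapping name) are hypothetical placeholders for whatever the actual mapping uses.

-- Fetch only the CDC rows: records loaded after the last successful run of this mapping
SELECT s.customer_key, s.invoice_key, s.amount, s.load_date
FROM   sales_src s
WHERE  s.load_date > (SELECT b.last_load_date
                      FROM   batch_load_control b
                      WHERE  b.mapping_name = 'm_incremental_agg');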
Look at the ports tab of Expression transformation.
Now the most important part: the session properties configuration to implement incremental aggregation.
If we want to reinitialize the aggregate cache, say during the first week of every month, we can configure another session identical to the previous one, the only change being that the Reinitialize aggregate cache property is checked in the session properties.
CUSTOMER_KEY    INVOICE_KEY    AMOUNT    LOAD_DATE
1111            5001           100       01/01/2010
2222            5002           250       01/01/2010
3333            5003           300       01/01/2010
1111            6007           200       07/01/2010
1111            6008           150       07/01/2010
2222            6009           250       07/01/2010
4444            1234           350       07/01/2010
5555            6157           500       07/01/2010
After the first Load on 1st week of Jan 2010, the data in the target is as follows:
CUSTOMER_KEY    INVOICE_KEY    MON_KEY    AMOUNT
1111            5001           201001     100
2222            5002           201001     250
3333            5003           201001     300
Now, during the 2nd week's load, it will process only the incremental data in the source, i.e. those records having a load date greater than the last session run date. After the 2nd week's load, incremental aggregation of the incremental source data with the aggregate cache file data will update the target table with the following dataset:
CUSTOMER_KEY    INVOICE_KEY    MON_KEY    AMOUNT    Remarks/Operation
1111            6008           201001     450       Updated (100 + 200 + 150)
2222            6009           201001     500       Updated (250 + 250)
3333            5003           201001     300       Unchanged
4444            1234           201001     350       Inserted
5555            6157           201001     500       Inserted
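The aggregate-cache behaviour in this example is roughly equivalent to the MERGE sketched below: aggregate only the CDC rows, then add them to whatever is already in the target. This is only an illustrative sketch; table and column names follow the example above but are otherwise hypothetical.

MERGE INTO monthly_fact t
USING (SELECT customer_key,
              MAX(invoice_key)                         AS invoice_key,
              TO_NUMBER(TO_CHAR(load_date, 'YYYYMM'))  AS mon_key,
              SUM(amount)                              AS amount
       FROM   sales_src
       WHERE  load_date > (SELECT last_load_date           -- CDC filter, as in the Source Qualifier above
                           FROM   batch_load_control
                           WHERE  mapping_name = 'm_incremental_agg')
       GROUP  BY customer_key, TO_NUMBER(TO_CHAR(load_date, 'YYYYMM'))) s
ON (t.customer_key = s.customer_key AND t.mon_key = s.mon_key)
WHEN MATCHED THEN
  UPDATE SET t.amount      = t.amount + s.amount,           -- incremental update of an existing group
             t.invoice_key = s.invoice_key
WHEN NOT MATCHED THEN
  INSERT (customer_key, invoice_key, mon_key, amount)
  VALUES (s.customer_key, s.invoice_key, s.mon_key, s.amount);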
The first time we run an incremental aggregation session, the Integration Service processes the entire source. At the end of the session, the Integration Service stores the aggregate data for that session run in two files, the index file and the data file, which it creates in the cache directory specified in the Aggregator transformation properties. Each subsequent time we run the session with incremental aggregation, we use only the incremental source changes in the session. For each input record, the Integration Service checks the historical information in the index file for a corresponding group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration Service creates a new group and saves the record data.
When writing to the target, the Integration Service applies the changes to the existing target. It saves the modified aggregate data in the index and data files to be used as historical data the next time we run the session.
Each subsequent time we run a session with incremental aggregation, the Integration Service creates a backup of the incremental aggregation files. The cache directory for the Aggregator transformation must therefore contain enough disk space for two sets of the files.
The Integration Service creates new aggregate data, instead of using historical data, when we configure the session to reinitialize the aggregate cache, when we delete the cache files, etc.
When the Integration Service rebuilds the incremental aggregation files, the data in the previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.
Normalizer, a native transformation in Informatica, can ease many complex data transformation requirements. Let us see how to use the Normalizer effectively.
Store     Quarter1    Quarter2    Quarter3    Quarter4
Store1    100         300         500         700
Store2    250         450         650         850
The Normalizer returns a row for each store and sales combination. It also returns an index (GCID) that identifies the quarter number:
Target Table
Store      Sales    Quarter
Store 1    100      1
Store 1    300      2
Store 1    500      3
Store 1    700      4
Store 2    250      1
Store 2    450      2
Store 2    650      3
Store 2    850      4
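The columns-to-rows behaviour of the Normalizer can be sketched in SQL as a UNION ALL unpivot. This is only an illustrative sketch; the table name STORE_SALES is hypothetical.

-- Each branch emits one quarter, so every store produces four rows;
-- the literal quarter number plays the role of the GCID
SELECT store, quarter1 AS sales, 1 AS quarter FROM store_sales
UNION ALL
SELECT store, quarter2, 2 FROM store_sales
UNION ALL
SELECT store, quarter3, 3 FROM store_sales
UNION ALL
SELECT store, quarter4, 4 FROM store_sales;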
Name    Month    Transportation    House Rent    Food
Sam     Jan      200               1500          500
John    Jan      300               1200          300
Tom     Jan      300               1350          350
Sam     Feb      300               1550          450
John    Feb      350               1200          290
Tom     Feb      350               1400          350
Name    Month    Expense Type    Expense
Sam     Jan      Transport       200
Sam     Jan      House rent      1500
Sam     Jan      Food            500
John    Jan      Transport       300
John    Jan      House rent      1200
John    Jan      Food            300
Tom     Jan      Transport       300
Tom     Jan      House rent      1350
Tom     Jan      Food            350
... and so on for the February rows.
Below is the screenshot of a complete mapping that shows how to achieve this result using Informatica PowerCenter Designer.
Image: Normalization Mapping Example 1
In the Ports tab of the Normalizer, the ports are created automatically as configured in the Normalizer tab. Interestingly, we will observe two new columns, namely:
GK_EXPENSEHEAD
GCID_EXPENSEHEAD
The GK field generates a sequence number starting from the value defined in the Sequence field, while GCID holds the value of the occurrence field, i.e. the column number of the input expense head.
Now the GCID tells us which expense corresponds to which field while converting columns to rows. Below is the screenshot of the expression used to handle this GCID:
Image: Expression to handle GCID
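In the Expression transformation, the GCID can be decoded back into a readable expense type with a port expression roughly like the one below. This is only an illustrative sketch; the port names are assumptions, not taken from the screenshot.

-- EXPENSE_TYPE output port: map the generated column id back to its label
DECODE(GCID_EXPENSEHEAD,
       1, 'Transport',
       2, 'House rent',
       3, 'Food',
       'Unknown')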
A lookup cache does not change once it is built. But what if the underlying lookup table changes after the lookup cache is created? Is there a way for the cache to always remain up to date even if the underlying table changes?
Dynamic Lookup Cache
Let's think about this scenario. You are loading your target table through a mapping. Inside the mapping you have a Lookup, and in the Lookup you are actually looking up the same target table you are loading. You may ask, "So? What's the big deal? We all do it quite often...". And yes, you are right. There is no "big deal" because Informatica (generally) caches the lookup table at the very beginning of the mapping, so whatever records get inserted into the target table through the mapping will have no effect on the lookup cache. The lookup will still hold the previously cached data, even if the underlying target table is changing.
But what if you want your lookup cache to get updated as and when the target table changes? What if you want your lookup cache to always show an exact snapshot of the data in your target table at that point in time? Clearly this requirement will not be fulfilled if you use a static cache. You will need a dynamic cache to handle this.
1. Can 2 fact tables share the same dimension tables? How many dimension tables are associated with one fact table in your project?
Ans: Yes.
2. What is ROLAP, MOLAP, and DOLAP?
Ans: ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), and DOLAP (Desktop OLAP). In these three OLAP architectures, the interface to the analytic layer is typically the same; what is quite different is how the data is physically stored.
In MOLAP, the premise is that online analytical processing is best implemented by storing the data multidimensionally; that is, data must be stored multidimensionally in order to be viewed in a multidimensional manner.
In ROLAP, architects believe in storing the data in the relational model; OLAP capabilities are best provided against the relational database.
DOLAP is a variation that exists to provide portability for the OLAP user. It creates multidimensional datasets that can be transferred from server to desktop, requiring only the DOLAP software to exist on the target system. This provides significant advantages to portable computer users, such as salespeople who are frequently on the road and do not have direct access to their office server.
3. What is an MDDB? And what is the difference between MDDBs and RDBMSs?
Ans: Multidimensional Database. There are two primary technologies used for storing the data used in OLAP applications: multidimensional databases (MDDB) and relational databases (RDBMS). The major difference between MDDBs and RDBMSs is in how they store data. Relational databases store their data in a series of tables and columns. Multidimensional databases, on the other hand, store their data in large multidimensional arrays.
For example, in an MDDB world, you might refer to a sales figure as Sales with Date, Product, and Location coordinates of 1212001, Car, and South, respectively.
Advantages of MDDB:
Retrieval is very fast because
the data corresponding to any combination of dimension members can be retrieved with a single I/O, and
the index is small and can therefore usually reside completely in memory.
Storage is very efficient because
a single index locates the block corresponding to a combination of sparse dimension numbers.
4. ... enter your Yahoo! GeoCities password.
You can now use standard FTP commands to manage the files in your Yahoo! GeoCities directory.
17. What command is used to transfer multiple files at a time using FTP?
Ans: mget ==> copies multiple files from the remote machine to the local machine (into the current local directory, using the same file names). You will be prompted for a y/n answer before each file is transferred.
mput ==> copies multiple files from the local machine to the remote machine.
18. What is a Filter transformation? What options do you have in a Filter transformation?
Ans: The Filter transformation provides the means for filtering records in a mapping. You pass all the rows from a source transformation through the Filter transformation, then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through it.
Note: Discarded rows do not appear in the session log or reject files.
To maximize session performance, include the Filter transformation as close to the sources in the mapping as possible. Rather than passing records you plan to discard through the mapping, you filter out unwanted data early in the flow of data from sources to targets.
You cannot concatenate ports from more than one transformation into the Filter transformation; the input ports for the filter must come from a single transformation. Filter transformations exist within the flow of the mapping and cannot be unconnected. The Filter transformation does not allow setting output default values.
19. What are the default sources supported by Informatica PowerMart?
Ans:
Relational tables, views, and synonyms.
Fixed-width and delimited flat files that do not contain binary data.
COBOL files.
20. When do you create the Source Definition? Can you use this source definition in any transformation?
Ans: When working with a file that contains fixed-width binary data, you must create the source definition. The Designer displays the source definition as a table, consisting of names, datatypes, and constraints. To use a source definition in a mapping, connect the source definition to a Source Qualifier or Normalizer transformation. The Informatica Server uses these transformations to read the source data.
21. What are Active and Passive transformations?
Ans: Transformations can be active or passive. An active transformation can change the number of records passed through it; a passive transformation never changes the record count. For example, the Filter transformation removes rows that do not meet the filter condition defined in the transformation.
Active transformations that might change the record count include the following:
Advanced External Procedure
Aggregator
Filter
Joiner
Normalizer
Rank
Source Qualifier
Note: If you use PowerConnect to access ERP sources, the ERP Source Qualifier is also an active transformation.
/*
You can connect only one of these active transformations to the same transformation or target, since the Informatica Server cannot determine how to concatenate data from different sets of records with different numbers of rows.
*/
Passive transformations that never change the record count include the following:
Lookup
Expression
External Procedure
Sequence Generator
Stored Procedure
Update Strategy
You can connect any number of these passive transformations, or connect one active transformation with any number of passive transformations, to the same transformation or target.
Work Area:
Temporary Tables
Memory
23. What is Metadata? (please refer to the Data Warehousing in the Real World book, page 125)
Ans: Definition: data about data.
Metadata contains descriptive data for end users. In a data warehouse the term metadata is used in a number of different situations.
Metadata is used for:
Data management
Query management
Data transformation and load:
Metadata may be used during data transformation and load to describe the source data and any changes that need to be made. The advantage of storing metadata about the data being transformed is that as source data changes, the changes can be captured in the metadata, and transformation programs automatically regenerated.
For each source data field the following information is required:
Source Field:
Unique identifier (to avoid any confusion occurring between two fields of the same name from different sources).
Location
system (the system it comes from, e.g. the Accounting system).
object (the object that contains it, e.g. the Account table).
The destination field needs to be described in a similar way to the source:
Destination:
Unique identifier
Name
Type (database data type, such as Char, Varchar, Number and so on).
The other information that needs to be stored is the transformation or transformations that need to be
applied to turn the source data into the destination data:
Transformation:
Transformation (s)
Name
Tables
Columns
name
type
Indexes
http://shaninformatica.blogspot.com/
32/71
12/23/2015
Views
Columns
name
type
Constraints
name
type
table
columns
Aggregation and partition information also needs to be stored in the metadata (for details refer to page 30).
Query Generation:
Metadata is also required by the query manager to enable it to generate queries. The same metadata used by the warehouse manager to describe the data in the data warehouse is also required by the query manager.
The query manager will also generate metadata about the queries it has run. This metadata can be used to build a history of all queries run and to generate a query profile for each user, group of users, and the data warehouse as a whole.
The metadata required for each query is:
query
  tables accessed
  columns accessed
    name
    reference identifier
  restrictions applied
    column name
    table name
    reference identifier
    restriction
  join criteria applied
  group by criteria
  sort criteria
  syntax
  execution plan
  resources
25. What are the tasks that are done by Informatica Server?
Ans:The Informatica Server performs the following tasks:
Manages the scheduling and execution of sessions and batches
Executes sessions and batches
Verifies permissions and privileges
Interacts with the Server Manager and pmcmd.
Ans: Target Load Order: In the Designer, you can set the order in which the Informatica Server sends records to the various target definitions in a mapping. This feature is crucial if you want to maintain referential integrity when inserting, deleting, or updating records in tables that have primary key and foreign key constraints applied to them. The Informatica Server writes data to all the targets connected to the same Source Qualifier or Normalizer simultaneously, to maximize performance.
28 (ii). What are the minimum conditions you need in order to use the Target Load Order option in the Designer?
Ans: You need to have multiple Source Qualifier transformations.
To specify the order in which the Informatica Server sends data to targets, create one Source Qualifier or Normalizer transformation for each target within a mapping. To set the target load order, you then determine the order in which each Source Qualifier sends data to connected targets in the mapping.
When a mapping includes a Joiner transformation, the Informatica Server sends all records to targets connected to that Joiner at the same time, regardless of the target load order.
28 (iii). How do you set the target load order?
Ans: To set the target load order:
1. Create a mapping that contains multiple Source Qualifier transformations.
2. After you complete the mapping, choose Mappings > Target Load Plan.
A dialog box lists all Source Qualifier transformations in the mapping, as well as the targets that receive data from each Source Qualifier.
3. Select a Source Qualifier from the list.
Tracing Level             Description
Terse                     Indicates when the Informatica Server initializes the session and its components. Summarizes session results, but not at the level of individual records.
Normal                    Includes initialization information as well as error messages and notification of rejected data.
Verbose initialization    Includes all information provided with the Normal setting plus more extensive information about initializing transformations in the session.
Verbose data              Includes all information provided with the Verbose initialization setting.
Homogeneous
Common structure
34. How do you use DDL commands in a PL/SQL block, e.g. accept a table name from the user and drop it if it exists, else display a message?
Ans: To invoke DDL commands in PL/SQL blocks we have to use dynamic SQL; the package used is DBMS_SQL.
35. What are the steps to work with dynamic SQL?
Ans: Open a dynamic cursor, parse the SQL statement, bind input variables (if any), execute the SQL statement of the dynamic cursor and close the cursor.
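A minimal PL/SQL sketch of those steps, dropping a table whose name is supplied at run time; the table name used here is a placeholder, and the error handling is only illustrative.

DECLARE
  cur      INTEGER;
  ignore   INTEGER;
  tab_name VARCHAR2(30) := 'TEMP_STAGE';                       -- placeholder table name
BEGIN
  cur := DBMS_SQL.OPEN_CURSOR;                                  -- open a dynamic cursor
  DBMS_SQL.PARSE(cur, 'DROP TABLE ' || tab_name, DBMS_SQL.NATIVE);  -- parse the DDL statement
  ignore := DBMS_SQL.EXECUTE(cur);                              -- execute it (no bind variables for DDL)
  DBMS_SQL.CLOSE_CURSOR(cur);                                   -- close the cursor
  DBMS_OUTPUT.PUT_LINE('Dropped ' || tab_name);
EXCEPTION
  WHEN OTHERS THEN
    IF DBMS_SQL.IS_OPEN(cur) THEN
      DBMS_SQL.CLOSE_CURSOR(cur);
    END IF;
    DBMS_OUTPUT.PUT_LINE(tab_name || ' does not exist or cannot be dropped.');
END;
/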
36. Which package and procedure are used to find/check the free space available for database objects like tables, procedures, views, synonyms etc.?
Ans: The package is DBMS_SPACE, the procedure is UNUSED_SPACE, and the table is DBA_OBJECTS.
41 (ii). What are the differences between connected lookups and unconnected lookups?
Ans: Although both types of lookups perform the same basic task, there are some important differences:
Connected Lookup
Unconnected Lookup
and Mapplet?
Ans: Mappings contain two types of transformations, standard and reusable. Standard transformations exist within a single mapping. You cannot reuse a standard transformation you created in another mapping, nor can you create a shortcut to that transformation. However, you often want to create transformations that perform common tasks, such as calculating the average salary in a department. Since a standard transformation cannot be used by more than one mapping, you have to set up the same transformation each time you want to calculate the average salary in a department.
Mapplet: A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can contain as many transformations as you need. A mapplet can contain transformations, reusable transformations, and shortcuts to transformations.
46. How do you copy a Mapping, Repository, Session?
Ans: To copy an object (such as a mapping or reusable transformation) from a shared folder, press the Ctrl key and drag and drop the mapping into the destination folder.
To copy a mapping from a non-shared folder, drag and drop the mapping into the destination folder. In both cases, the destination folder must be open with the related tool active. For example, to copy a mapping, the Mapping Designer must be active; to copy a source definition, the Source Analyzer must be active.
Copying a Mapping:
To copy the mapping, open a workbook.
In the Navigator, click and drag the mapping slightly to the right, not dragging it to the workbook.
When asked if you want to make a copy, click Yes, then enter a new name and click OK.
Choose Repository > Save.
Copying a Repository: You can copy a repository from one database to another. You use this feature before upgrading, to preserve the original repository. Copying repositories provides a quick way to copy all metadata you want to use as a basis for a new repository.
If the database into which you plan to copy the repository contains an existing repository, the Repository Manager deletes the existing repository. If you want to preserve the old repository, cancel the copy. Then back up the existing repository before copying the new repository.
To copy a repository, you must have one of the following privileges:
To copy a repository:
1. In the Repository Manager, choose Repository > Copy Repository.
2. Select a repository you wish to copy, then enter the following information:
Copy Repository Field    Required/Optional    Description
Repository               Required
Username                 Required             Must be in US-ASCII.
ODBC Data Source         Required             Data source used to connect to the database.
Code Page                Required
Click OK.
5. If asked whether you want to delete the existing repository data in the second repository, click OK to delete it. Click Cancel to preserve the existing repository.
Copying Sessions:
In the Server Manager, you can copy standalone sessions within a folder, or copy sessions into and out of batches.
To copy a session, you must have one of the following:
Create Sessions and Batches privilege with read and write permission
You can maintain a common repository object in a single location. If you need to edit the object, all
shortcuts immediately inherit the changes you make.
You can restrict repository users to a set of predefined metadata by asking users to incorporate the
shortcuts into their work instead of developing repository objects independently.
You can develop complex mappings, mapplets, or reusable transformations, then reuse them
easily in other folders.
You can save space in your repository by keeping a single repository object and using shortcuts to
that object, instead of creating copies of the object in multiple folders or multiple repositories.
You can configure a session to stop if the Informatica Server encounters an error while executing pre-session shell commands.
For example, you might use a shell command to copy a file from one directory to another. For a Windows NT server you would use the following shell command to copy the SALES_ADJ file from the target directory, L, to the source, H:
copy L:\sales\sales_adj H:\marketing\
For a UNIX server, you would use the following command line to perform a similar operation:
cp sales/sales_adj marketing/
Tip: Each shell command runs in the same environment (UNIX or Windows NT) as the Informatica Server. Environment settings in one shell command script do not carry over to other scripts. To run all shell commands in the same environment, call a single shell script that in turn invokes other scripts.
49. What are Folder Versions?
Ans: In the Repository Manager, you can create different versions within a folder to help you archive work in
development. You can copy versions to other folders as well. When you save a version, you save all metadata
at a particular point in development. Later versions contain new or modified metadata, reflecting work that
you have completed since the last version.
Maintaining different versions lets you revert to earlier work when needed. By archiving the contents of a
folder into a version each time you reach a development landmark, you can access those versions if later edits
prove unsuccessful.
You create a folder version after completing a version of a difficult mapping, then continue working on the
mapping. If you are unhappy with the results of subsequent work, you can revert to the previous version, then
create a new version to continue development. Thus you keep the landmark version intact, but available for
regression.
Note: You can only work within one version of a folder at a time.
50. How do you automate/schedule sessions/batches, and did you use any tool for automating sessions/batches?
Ans: We scheduled our sessions/batches using the Server Manager.
You can either schedule a session to run at a given time or interval, or you can manually start the session.
You need to have created the sessions and batches with Read and Execute permissions, or have the Super User privilege.
If you configure a batch to run only on demand, you cannot schedule it.
Note: We did not use any tool for the automation process.
51. What are the differences between versions 4.7 and 5.1?
Ans: New transformations were added, like the XML transformation and the MQ Series transformation, and PowerMart and PowerCenter are the same from version 5.1.
52. What are the procedures that you need to follow before moving mappings/sessions from Testing/Development to Production?
Ans:
53. How many values does the Informatica Server return when it passes through a Connected Lookup and an Unconnected Lookup?
Ans: A Connected Lookup can return multiple values, whereas an Unconnected Lookup returns only one value, the Return Value.
54. What is the difference between PowerMart and PowerCenter in 4.7.2?
Ans: If you are using PowerCenter:
PowerCenter allows you to register and run multiple Informatica Servers against the same repository. Because you can run these servers at the same time, you can distribute the repository session load across available servers.
Task                                                                       Transformation
Calculate a value                                                          Expression
Perform aggregate calculations                                             Aggregator
Modify text                                                                Expression
Filter records                                                             Filter, Source Qualifier
Order records queried by the Informatica Server                            Source Qualifier
Call a stored procedure                                                    Stored Procedure
Call a procedure in a shared library or in the COM layer of Windows NT    External Procedure
Generate primary keys                                                      Sequence Generator
Limit records to a top or bottom range                                     Rank
Normalize records, including those read from COBOL sources                 Normalizer
Look up values                                                             Lookup
Determine whether to insert, delete, update, or reject records             Update Strategy
Join records from different databases or flat file systems                 Joiner
56. Expressions in Transformations: explain briefly how you use them.
Ans: To transform data passing through a transformation, you can write an expression. The most obvious examples are the Expression and Aggregator transformations, which perform calculations on either single values or an entire range of values within a port. Transformations that use expressions include the following: Expression, Aggregator, Filter, Rank, and Update Strategy.
In each transformation, you use the Expression Editor to enter the expression. The Expression Editor supports the transformation language for building expressions. The transformation language uses SQL-like functions, operators, and other components to build the expression. For example, as in SQL, the transformation language includes the functions COUNT and SUM. However, the PowerMart/PowerCenter transformation language includes additional functions not found in SQL.
When you enter the expression, you can use values available through ports. For example, if the transformation has two input ports representing a price and a sales tax rate, you can calculate the final sales tax using these two values. The ports used in the expression can appear in the same transformation, or you can use output ports from other transformations.
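For instance, an output port computing the tax from those two input ports could use an expression like the sketch below; the port names PRICE and TAX_RATE are assumptions, not taken from any specific mapping.

-- SALES_TAX output port in an Expression transformation
ROUND(PRICE * TAX_RATE, 2)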
57. In the case of flat files (which come through FTP as source), what happens if the file has not arrived? Where do you set this option?
Ans: You get a fatal error which causes the server to fail/stop the session.
DWH: a repository of integrated information, specifically structured for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. This makes it much easier and more efficient to run queries over data that originally came from different sources.
Data Mart: a collection of subject areas organized for decision support based on the needs of a given department, e.g. sales, marketing etc. The data mart is designed to suit the needs of a department. Data mart data is much less granular than the warehouse data.
Data Warehouse: used on an enterprise level, while data marts are used on a business division/department level. Data warehouses are arranged around the corporate subject areas found in the corporate data model. Data warehouses contain more detailed information while most data marts contain more summarized or aggregated data.
OLTP: Online Transaction Processing. This is a standard, normalized database structure. OLTP is designed for transactions, which means that inserts, updates and deletes must be fast.
OLAP: Online Analytical Processing. Read-only, historical, aggregated data.
Fact Table: contains the quantitative measures about the business.
Dimension Table: descriptive data about the facts (business).
Conformed dimensions: dimension tables shared by fact tables; these tables connect separate star schemas into an enterprise star schema.
Star Schema: a set of tables comprised of a single, central fact table surrounded by denormalized dimensions. Star schemas implement dimensional data structures with denormalized dimensions.
Snowflake: a set of tables comprised of a single, central fact table surrounded by normalized dimension hierarchies. Snowflake schemas implement dimensional data structures with fully normalized dimensions.
Staging Area: the work place where raw data is brought in, cleaned, combined, archived and exported to one or more data marts. The purpose of the data staging area is to get data ready for loading into a presentation layer.
Queries: The DWH contains 2 types of queries. There will be fixed queries that are clearly defined and well understood, such as regular reports, canned queries and common aggregations. There will also be ad hoc queries that are unpredictable, both in quantity and frequency.
Ad Hoc Query: ad hoc queries are the starting point for any analysis into a database. It is the ability to run any query when desired and expect a reasonable response that makes the data warehouse worthwhile and makes the design such a significant challenge.
The enduser access tools are capable of automatically generating the database query that answers
Customer Key    Name         State
1001            Christina    Illinois
At a later date, she moved to Los Angeles, California on January, 2003. How should ABC Inc. now modify its customer table to
reflect this change? This is the "Slowly Changing Dimension" problem.
There are in general three ways to solve this type of problem, and they are categorized as follows:
Type 1 [http://www.1keydata.com/datawarehousing/scd type1.html] : The new record replaces the original record. No trace of the
old record exists.
Type 2 [http://www.1keydata.com/datawarehousing/scd type2.html] : A new record is added into the customer dimension table.
Therefore, the customer is treated essentially as two people.
Type 3 [http://www.1keydata.com/datawarehousing/scd type3.html] : The original record is modified to reflect the change.
We next take a look at each of the scenarios and how the data model and the data looks like for each of them. Finally, we
compare and contrast among the three alternatives.
Type 1 Slowly Changing Dimension:
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no
history is kept.
In our example, recall we originally have the following table:
Customer Key    Name         State
1001            Christina    Illinois
After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:
Customer Key    Name         State
1001            Christina    California
Advantages:
This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old
information.
Disadvantages:
All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the
company would not be able to know that Christina lived in Illinois before.
Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical
changes.
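A Type 1 change is just an in-place update of the dimension row, roughly as sketched below. The table name CUSTOMER_DIM is a hypothetical placeholder; the values follow the example above.

-- Type 1: overwrite the attribute, keeping no history
UPDATE customer_dim
SET    state = 'California'
WHERE  customer_key = 1001;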
Type 2 Slowly Changing Dimension:
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:
Customer Key    Name         State
1001            Christina    Illinois
After Christina moved from Illinois to California, we add the new information as a new row into the table:
Customer Key    Name         State
1001            Christina    Illinois
1005            Christina    California
Advantages:
This allows us to accurately keep all historical information.
Disadvantages:
This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with,
storage and performance can become a concern.
This necessarily complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.
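A Type 2 change adds a new row with a fresh surrogate key instead of updating the old one, roughly as sketched below. CUSTOMER_DIM and the hard-coded key 1005 are illustrative only.

-- Type 2: keep the old row, insert a new row with a new surrogate key
INSERT INTO customer_dim (customer_key, name, state)
VALUES (1005, 'Christina', 'California');
-- The row with customer_key = 1001 ('Illinois') is left untouched, preserving history.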
Type 3 Slowly Changing Dimension :
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating
the original value, and one indicating the current value. There will also be a column that indicates when the current value
becomes active.
In our example, recall we originally have the following table:
Customer Key    Name         State
1001            Christina    Illinois
To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns: Customer Key, Name, Original State, Current State, Effective Date.
After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):
Customer Key    Name         Original State    Current State    Effective Date
1001            Christina    Illinois          California       15-JAN-2003
Advantages:
This does not increase the size of the table, since new information is updated.
This allows us to keep some part of history.
Disadvantages:
Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later
moves to Texas on December 15, 2003, the California information will be lost.
Usage:
Type 3 is rarely used in actual practice.
When to use Type 3:
Type III slowly changing dimension should only be used when it is necessary for the data warehouse to track historical
changes, and when such changes will only occur for a finite number of time.
Surrogate key:
A surrogate key is frequently a sequential number but doesn't have to be. Having the key independent of all other columns insulates the database relationships from changes in data values or database design and guarantees uniqueness.
Some database designers use surrogate keys religiously regardless of the suitability of other candidate keys. However, if a good key already exists, the addition of a surrogate key will merely slow down access, particularly if it is indexed.
The concept of the surrogate key is important in a data warehouse; surrogate means deputy or substitute. A surrogate key is a small integer (say 4 bytes) that can uniquely identify a record in the dimension table; however, it has no meaning. Data warehouse experts suggest that the production keys used in the source databases should not be used as primary keys in the dimension tables; instead, surrogate keys should be used.
Q. How do you identify existing rows of data in the target table using a Lookup transformation?
A. There are two ways to look up the target table to verify whether a row exists or not:
1. Use a connected dynamic cache Lookup and then check the value of the NewLookupRow output port to decide whether the incoming record already exists in the table/cache or not.
2. Use an unconnected Lookup, call it from an Expression transformation, and check the lookup condition port value (Null/Not Null) to decide whether the incoming record already exists in the table or not.
Q. What are Aggregator transformations?
A. The Aggregator transformation is much like the GROUP BY clause in traditional SQL. It is a connected, active transformation which takes the incoming data from the mapping pipeline, groups it based on the group-by ports specified, and can calculate aggregate functions like AVG, SUM, COUNT, STDDEV etc. for each of those groups. From a performance perspective, if your mapping has an Aggregator transformation, use filters and sorters very early in the pipeline if there is any need for them.
Q.What are various types of Aggregation?
A.Various types of aggregation are SUM, AVG, COUNT, MAX, MIN, FIRST, LAST, MEDIAN, PERCENTILE, STDDEV, and VARIANCE.
Q. What are Dimensions and various types of Dimension?
a) Unicode: the IS allows 2 bytes for each character and uses an additional byte for each non-ASCII character (such as Japanese characters).
b) ASCII: the IS holds all data in a single byte.
The IS data movement mode can be changed in the Informatica Server configuration parameters. This comes into effect once you restart the Informatica Server.
Superset: A code page is a superset of another code page when it contains all the characters encoded in the other code page and also contains additional characters not contained in the other code page.
Subset: A code page is a subset of another code page when all characters in the code page are encoded in the other code page.
What is a Code Page used for?
A Code Page is used to identify characters that might be in different languages. If you are importing Japanese data into a mapping, you must select the Japanese code page for the source data.
Q.What is Router transformation?
A. It is different from filter transformation in that we can specify multiple conditions and route the
data to multiple targets depending on the condition.
A. While running a workflow, the Informatica Server uses the Load Manager process and the Data Transformation Manager (DTM) process to run the workflow and its sessions. Once the Load Manager has performed validations for the session, it creates the DTM process. The DTM process is the second process associated with the session run. The primary purpose of the DTM process is to create and manage the threads that carry out the session tasks.
The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. It creates the main thread, which is called the master thread. The master thread creates and manages all other threads.
If we partition a session, the DTM creates a set of threads for each partition to allow concurrent processing. When the Informatica Server writes messages to the session log it includes the thread type and thread ID.
Following are the types of threads that the DTM creates:
Master Thread: main thread of the DTM process; creates and manages all other threads.
Mapping Thread: one thread for each session; fetches session and mapping information.
Pre- and Post-Session Threads: one thread each to perform pre- and post-session operations.
Reader Thread: one thread for each partition for each source pipeline.
Writer Thread: one thread for each partition if a target exists in the source pipeline, to write to the target.
Transformation Thread: one or more transformation threads for each partition.
... from relational tables that are not sources in the mapping. With the Lookup transformation, we can accomplish the following tasks:
Get a related value: get the Employee Name from the Employee table based on the Employee ID.
Perform a calculation.
Update slowly changing dimension tables: we can use an unconnected Lookup transformation to determine whether a record already exists in the target or not.
Q. While importing the relational source definition from the database, what metadata of the source do you import?
Source name
Database location
Column names
Data types
Key constraints
Q. In how many ways can you update a relational source definition, and what are they?
A. Two ways:
1. Edit the definition
2. Reimport the definition
Q. Where should you place the flat file to import the flat file definition into the Designer?
A. Place it in a local folder.
Q. Which transformation do you need when using COBOL sources as source definitions?
A. The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.
Q. How can you create or import a flat file definition into the Warehouse Designer?
A. You can create a flat file definition in the Warehouse Designer: create a new target, select the type as flat file, save it, and then enter the columns for the created target by editing its properties. Once the target is created and saved, you can import it from the Mapping Designer.
Q.What is a mapplet?
A. A mapplet is a reusable set of transformations. It should have a Mapplet Input transformation, which receives input values, and a Mapplet Output transformation, which passes the final modified data back to the mapping. When the mapplet is displayed within a mapping, only the input and output ports are shown, so the internal logic is hidden from the end user's point of view.
Q.What is a transformation?
A.It is a repository object that generates, modifies or passes data.
Q.What are the designer tools for creating transformations?
A.Mapping designer
Transformation developer
Mapplet designer
Q.What are connected and unconnected transformations?
A. Connected Transformation: a transformation that participates in the mapping data flow; it can receive input from and provide output to other transformations through connected ports.
Unconnected Transformation: a transformation that is not connected to the pipeline; it is called from within another transformation (for example, an unconnected Lookup called through a :LKP expression) and returns a value to the calling transformation.
Q. What are the join types in a Joiner transformation?
A. Normal (default): only matching rows from both master and detail.
Master outer: all detail rows and only matching rows from master.
Detail outer: all master rows and only matching rows from detail.
Full outer: all rows from both master and detail (matching or non-matching).
Q.What are the joiner caches?
A.When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the
master source and builds index and data caches based on the master rows.
After building the caches, the Joiner transformation reads records from the detail source and performs joins.
Q.Why use the lookup transformation?
A. To perform the following tasks.
Get a related value. For example, if your source table includes employee ID, but you want to include the
employee name in your target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per
invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records
already exist in the target.
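As an illustration (not Informatica syntax), the "get a related value" and "perform a calculation" tasks above are conceptually equivalent to the SQL outer join below; this is only a sketch, and the SALES and EMPLOYEES tables and their columns are hypothetical.
-- Look up EMP_NAME by EMPLOYEE_ID and compute a value alongside it:
SELECT s.invoice_id,
       s.employee_id,
       e.emp_name,                                  -- the "related value" returned by the lookup
       s.gross_sales - s.sales_tax AS net_sales     -- the "calculation"
FROM   sales s
       LEFT OUTER JOIN employees e
       ON e.employee_id = s.employee_id;            -- lookup condition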
Q. What are the differences between connected and unconnected Lookup transformations?
A.
Receiving input: a connected Lookup receives input values directly from the pipeline; an unconnected Lookup receives input values from the result of a :LKP expression in another transformation.
Cache: a connected Lookup cache includes all lookup columns used in the mapping (that is, lookup source columns included in the lookup condition and lookup source columns linked as output ports to other transformations); an unconnected Lookup cache includes all lookup/output ports in the lookup condition and the lookup/return port.
Return values: a connected Lookup can return multiple columns from the same row or insert into the dynamic lookup cache; an unconnected Lookup designates one return port (R) and returns one column from each row.
No match for the lookup condition: a connected Lookup returns the default value for all output ports (if you configure dynamic caching, the PowerCenter Server inserts the row into the cache or leaves it unchanged); an unconnected Lookup returns NULL.
Match for the lookup condition: a connected Lookup returns the result of the lookup condition for all lookup/output ports (if you configure dynamic caching, the PowerCenter Server either updates the row in the cache or leaves it unchanged); an unconnected Lookup returns the result of the lookup condition into the return port.
Passing values on: a connected Lookup passes multiple output values to another transformation (link lookup/output ports to another transformation); an unconnected Lookup passes one output value to another transformation (the lookup/output/return port passes the value to the transformation calling the :LKP expression).
Default values: a connected Lookup supports user-defined default values; an unconnected Lookup does not support user-defined default values.
Recache from database: If the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache from the database.
Static cache: You can configure a static or read-only cache for any lookup table. By default the Informatica Server creates a static cache. It caches the lookup table and looks up values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica Server does not update the cache while it processes the Lookup transformation.
Dynamic cache: If you want to cache the target table and insert new rows into the cache and the target, you can create a Lookup transformation that uses a dynamic cache. The Informatica Server dynamically inserts data into the target table.
Shared cache: You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.
Q: What is a Mapping?
A: Mapping Represent the data flow between source and target
Q: What components must a mapping contain?
A: Source definition, Transformation, Target Definition and Connectors
Q: What is Transformation?
A: A transformation is a repository object that generates, modifies, or passes data. Each transformation
performs a specific function. There are two types of transformations:
1. Active
Can change the number of rows that pass through it. E.g.: Aggregator, Filter, Joiner, Normalizer, Rank, Router, Source Qualifier, Update Strategy, ERP Source Qualifier, Advanced External Procedure.
2. Passive
Does not change the number of rows that pass through it. E.g.: Expression, External Procedure, Input, Lookup, Stored Procedure, Output, Sequence Generator, XML Source Qualifier.
Q: Which transformation can be overridden at the Server?
A: Source Qualifier and Lookup Transformations
Q: What are connected and unconnected transformations? Give examples.
Q: What are the options/types for running a Stored Procedure?
A:
Normal: During a session, the stored procedure runs where the transformation exists in the mapping, on a row-by-row basis. This is useful for calling the stored procedure for each row of data that passes through the mapping, such as running a calculation against an input port. Connected stored procedures run only in normal mode.
Pre-load of the Source: Before the session retrieves data from the source, the stored procedure runs. This is useful for verifying the existence of tables or performing joins of data in a temporary table.
Post-load of the Source: After the session retrieves data from the source, the stored procedure runs. This is useful for removing temporary tables.
Pre-load of the Target: Before the session sends data to the target, the stored procedure runs. This is useful for verifying target tables or disk space on the target system.
Post-load of the Target: After the session sends data to the target, the stored procedure runs. This is useful for recreating indexes on the database.
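For instance, a post-load-of-the-target stored procedure might rebuild an index after the load. This is only an illustrative Oracle sketch; the procedure name and the TGT_SALES_IDX index are hypothetical.
-- Hypothetical Oracle procedure suited to the "Post-load of the Target" option:
CREATE OR REPLACE PROCEDURE rebuild_target_indexes AS
BEGIN
  -- Rebuild the (hypothetical) target index after the session has loaded the data
  EXECUTE IMMEDIATE 'ALTER INDEX tgt_sales_idx REBUILD';
END rebuild_target_indexes;
/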
Q: What types of sources can be used in a mapping?
A: Sources may be flat files, relational databases, or XML.
If we need to look up values in a table or update slowly changing dimension tables, we can use a Lookup transformation.
A Joiner is used to join heterogeneous sources, e.g. a flat file and a relational table.
Q: How do you create a batch load? What are the different types of batches?
A: A batch is created in the Server Manager. It contains multiple sessions. First create the sessions and then create a batch, dragging the sessions into it. Batches can be sequential (sessions run one after another) or concurrent (sessions run at the same time).
Default Repository Privileges
o Browse Repository
o Create Sessions and Batches
Extended Repository Privileges
o Session Operator
o Administer Repository
o Administer Server
o Super User
Q: How many different locks are available for repository objects?
A:There are five kinds of locks available on repository objects:
Read lock. Created when you open a repository object in a folder for which you do not have write
permission. Also created when you open an object with an existing write lock.
Write lock. Created when you create or edit a repository object in a folder for which you have write
permission.
Execute lock. Created when you start a session or batch, or when the Informatica Server starts a
scheduled session or batch.
Fetch lock. Created when the repository reads information about repository objects from the
database.
Save lock. Created when you save information to the repository.
Q: What is Session Process?
A: The Load Manager process starts the session, creates the DTM process, and sends post-session
email when the session completes.
Q: What is DTM process?
A: The DTM process creates threads to initialize the session, read, write, and transform data, and
handle pre- and post-session operations.
Q: When the Informatica Server runs a session, what are the tasks handled?
A:
Load Manager (LM):
o LM locks the session and reads session properties.
o LM reads the parameter file.
o LM expands the server and session variables and parameters.
o LM verifies permissions and privileges.
o LM validates source and target code pages.
o LM creates the session log file.
o LM creates the DTM (Data Transformation Manager) process.
Data Transformation Manager (DTM):
o DTM creates threads to initialize the session, to read, write, and transform data, and to handle pre- and post-session operations (see the thread types listed above).
Unconnected Lookup: receives input values from the result of a :LKP expression in another transformation. Only a static cache can be used.
1. Targets
If the target is the bottleneck, increasing the commit interval can help (recovery is affected, since a failed session restarts from the last commit point).
2. Sources
Set a Filter transformation after each Source Qualifier with a condition that lets no records through. If the time taken is the same, there is a source bottleneck.
You can also identify a source problem with a Read Test Session: copy the mapping with the sources and Source Qualifiers, remove all other transformations, and connect to a flat file target. If the performance is the same, there is a source bottleneck.
Using a database query: copy the read query directly from the log and execute it against the source database with a query tool. If the time it takes to execute the query and the time to fetch the first row are significantly different, the query can be modified using optimizer hints.
Solutions:
Optimize queries using hints (see the sketch below).
Use indexes wherever possible.
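As a sketch of what such an optimizer hint can look like (the ORDERS table, its columns, and the index name are hypothetical, not taken from any particular session log):
-- Hypothetical source read query tuned with an Oracle index hint:
SELECT /*+ INDEX(o orders_order_date_idx) */
       o.order_id, o.customer_id, o.order_date, o.amount
FROM   orders o
WHERE  o.order_date >= DATE '2015-01-01';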
3. Mapping
If both source and target are OK, the problem could be in the mapping.
Add a Filter transformation before the target with a condition that lets no records through; if the time is the same, there is a mapping bottleneck. Alternatively, look at the performance monitor in the session property sheet and view the counters.
Solutions:
High error rows and rows in the lookup cache indicate a mapping bottleneck.
Optimize single-pass reading.
Optimize the Lookup transformation:
1. Cache the lookup table.
Optimize the Filter transformation:
Use a Source Qualifier filter to remove the same rows at the source. If it is not possible to move the filter into the Source Qualifier, move the Filter transformation as close to the Source Qualifier as possible, to remove unnecessary data early in the data flow.
Optimize the Aggregator transformation:
1. Group by simpler columns, preferably numeric columns.
2. Use sorted input. Sorted input decreases the use of aggregate caches; the server assumes all input data are sorted and performs aggregate calculations as it reads.
3. Use incremental aggregation in the session property sheet.
Optimize Seq. Generator transformation:
1. Try creating a reusable Seq. Generator transformation and use it in multiple mappings
2. The Number of Cached Values property determines the number of values the Informatica Server caches at one time.
Optimize Expression transformation:
1. Factor out common logic.
2. Minimize aggregate function calls.
3. Replace common sub-expressions with local variables.
4. Use operators instead of functions (see the SQL sketch below).
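To illustrate point 4 with SQL (the EMPLOYEES column names are assumed; the same idea applies to Informatica expression logic, which also supports the || operator):
-- Instead of nested function calls:
SELECT CONCAT(CONCAT(first_name, ' '), last_name) AS full_name FROM employees;
-- prefer the concatenation operator:
SELECT first_name || ' ' || last_name AS full_name FROM employees;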
4. Sessions
If you do not have a source, target, or mapping bottleneck, you may have a
session bottleneck. You can identify a session bottleneck by using the
performance details. The informatica server creates performance details
when you enable Collect Performance Data on the General Tab of the
session properties.
Performance details display information about each Source Qualifier, target definition, and individual transformation. All transformations have some basic counters that indicate the number of input rows, output rows, and error rows.
Any value other than zero in the readfromdisk and writetodisk counters for Aggregator, Joiner, or Rank transformations indicates a session bottleneck. Low BufferInput_efficiency and BufferOutput_efficiency counters also indicate a session bottleneck.
Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks.
5.System (Networks)
18. How do you improve session performance?
1. Run concurrent sessions.
2. Partition the session (PowerCenter).
3. Tune parameters: DTM buffer pool, buffer block size, index cache size, data cache size, commit interval, tracing level (Normal, Terse, Verbose Init, Verbose Data).
The session has memory to hold 83 sources and targets; if there are more, the DTM buffer pool can be increased. The Informatica Server uses the index and data caches for the Aggregator, Rank, Lookup, and Joiner transformations. The server stores the transformed data from these transformations in the data cache before returning it to the data flow, and stores group information for them in the index cache.
If the allocated data or index cache is not large enough to store the data, the server stores the data in a temporary disk file as it processes the session data. Each time the server pages to disk, performance slows. This can be seen from the counters.
Since the data cache is generally larger than the index cache, its setting should be larger than the index cache setting.
4. Remove the staging area.
5. Turn off session recovery.
6. Reduce error tracing.
19. What are the tracing levels?
Normal (default): logs initialization and status information, errors encountered, and skipped rows due to transformation errors; summarizes session results, but not at the row level.
Terse: logs initialization, error messages, and notification of rejected data.
Verbose Init: in addition to Normal tracing, also logs additional initialization information, names of index and data files used, and detailed transformation statistics.
Verbose Data: in addition to Verbose Init, records row-level logs.
When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. This allows the Informatica Server to update the target incrementally, rather than forcing it to process the entire source and recalculate the same data each time the session runs.
OLTP
Current data
Short database transactions
Online update/insert/delete
Normalization is promoted
High volume transactions
Transaction recovery is necessary
OLAP
Current and historical data
Long database transactions
Batch update/insert/delete
Denormalization is promoted
Low volume transactions
Transaction recovery is not necessary
In a star schema, a dimension table will not have any parent table, whereas in a snowflake schema a dimension table will have one or more parent tables.
Hierarchies for the dimensions are stored in the dimension table itself in a star schema, whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill down the data from the topmost level to the lowermost level.
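As a rough sketch (the PRODUCT tables and columns here are hypothetical, not from the project described elsewhere in this post), the same product dimension might look like this in each schema:
-- Star schema: the hierarchy is kept inside the dimension table itself.
CREATE TABLE product_dim (
  product_key   NUMBER PRIMARY KEY,
  product_name  VARCHAR2(100),
  category_name VARCHAR2(100),   -- hierarchy level stored in the same table
  brand_name    VARCHAR2(100)
);

-- Snowflake schema: the hierarchy is broken out into parent tables.
CREATE TABLE category_dim (
  category_key  NUMBER PRIMARY KEY,
  category_name VARCHAR2(100)
);
CREATE TABLE product_dim_sf (
  product_key   NUMBER PRIMARY KEY,
  product_name  VARCHAR2(100),
  category_key  NUMBER REFERENCES category_dim (category_key)  -- link to parent table
);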
http://shaninformatica.blogspot.com/
64/71
12/23/2015
You can also perform the following tasks to optimize the mapping:
1. Configure single-pass reading.
2. Optimize datatype conversions.
3. Eliminate transformation errors.
4. Optimize transformations.
5. Optimize expressions.
http://shaninformatica.blogspot.com/
65/71
12/23/2015
Q.What is the difference between $ & $$ in mapping or parameter file? In which cases they are
generally used?
A. A $ prefix denotes session parameters and variables, and a $$ prefix denotes mapping parameters and variables.
Q. How do you connect two or more tables with a single Source Qualifier?
A. Create an Oracle source with as many columns as you want and write the join query in the SQL Query override. The column order and data types must be the same as in the SQL query.
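A minimal sketch of such a SQL override, assuming hypothetical EMP and DEPT source tables whose columns match the Source Qualifier ports in the same order:
-- Joins two tables through one Source Qualifier via the SQL Query override;
-- the selected columns must match the Source Qualifier ports in order and datatype.
SELECT e.empno,
       e.ename,
       e.sal,
       d.deptno,
       d.dname
FROM   emp e,
       dept d
WHERE  e.deptno = d.deptno;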
66/71
12/23/2015
The DECODE function is also available in SQL, but the IIF function is not. The DECODE function also gives clearer readability, making the logic easier for others to understand.
How many dimension tables did you have in your project? Name some dimensions and their columns.
Product Dimension : Product Key, Product id, Product Type, Product name, Batch Number.
Distributor Dimension: Distributor key, Distributor Id, Distributor Location,
Customer Dimension : Customer Key, Customer Id, CName, Age, status, Address, Contact
Account Dimension : Account Key, Acct id, acct type, Location, Balance,
Antijoins:
Antijoins are written using the NOT EXISTS or NOT IN
constructs. An antijoin between two tables returns rows from
the first table for which there are no corresponding rows in
the second table. In other words, it returns rows that fail to
match the subquery on the right side.
SELECT d.department_name
FROM departments d
MINUS
SELECT d.department_name
FROM departments d, employees e
WHERE d.department_id = e.department_id
ORDER BY d.department_name;
The above query will give the desired results, but it might be
clearer to write the query using an antijoin:
SELECT d.department_name
FROM departments d
WHERE NOT EXISTS (SELECT NULL
                  FROM employees e
                  WHERE e.department_id = d.department_id)
ORDER BY d.department_name;
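Since the text above also mentions NOT IN, the same antijoin could be sketched that way; note the IS NOT NULL guard, because NOT IN returns no rows if the subquery produces a NULL:
SELECT d.department_name
FROM   departments d
WHERE  d.department_id NOT IN (SELECT e.department_id
                               FROM   employees e
                               WHERE  e.department_id IS NOT NULL)
ORDER  BY d.department_name;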
Re: How can you load data into a target without using any transformations?
What is an update strategy and what are the options for an update strategy?
What is a subject area?
What is the difference between TRUNCATE and DELETE statements?
What kinds of update strategies are normally used (Type 1, 2 & 3) and what are the differences?
What is the exact syntax of an update strategy?
What are bitmap indexes and how and why are they used?
What is bulk bind? How does it improve performance?
What is the difference between constraint-based load ordering and a target load plan?
What is a deployment group and what is its use?
When and how is a partition defined using Informatica?
How do you improve performance in an Update Strategy?
The Informatica help files should have all of these answers!
This is going to be a very interesting topic for ETL and data modelers who design processes/tables to load fact or transactional data
which keeps changing between dates, e.g. prices of shares, company ratings, etc.
http://shaninformatica.blogspot.com/
70/71
12/23/2015
The table above (see the linked image) shows an entity in the source system that contains time-variant values, but they don't change daily; the values are valid over a period of time.
[http://web.archive.org/web/20070523031241/http:/etlguru.com/blog/wp content/uploads/2006/08/variable_bond_interest_fct1.JPG]
Maybe Ralph Kimball or Bill Inmon could come up with a better data model!
But for ETL developers or ETL leads the decision is already made, so let's look for a solution.
2. What should be the ETL design to load such a structure?
Design A
There is a one-to-one relationship between the source row and the target row.
There is a CURRENT_FLAG attribute, which means that every time the ETL process gets a new value it has to add a new row with the current flag set and then go to the previous row and retire it. This is a very costly ETL step and it will slow down the ETL process.
From the report writer's perspective this model is a major challenge to use, because what if the report wants a rate that is not current? Imagine the complex query.
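A minimal SQL sketch of that "retire and insert" step, assuming a hypothetical RATE_FCT table with INSTRUMENT_ID, RATE, and CURRENT_FLAG columns:
-- Retire the previously current row for this instrument...
UPDATE rate_fct
SET    current_flag = 'N'
WHERE  instrument_id = 101
AND    current_flag  = 'Y';

-- ...then insert the new value as the current row.
INSERT INTO rate_fct (instrument_id, rate, current_flag)
VALUES (101, 4.75, 'Y');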
Design B
In this design a snapshot of the source table is taken every day.
The ETL is very easy. But can you imagine the size of the fact table when the source table has more than 1 million rows? (1 million x 365 days = 365 million rows per year.) And what if the values change every hour or minute?
But you have a very happy user who can write SQL reports very easily.
Design C
Can there be a compromise? How about using a from date (time) and a to date (time)! The report writer can simply provide a date (time) and straight SQL can return the value/row that was valid at that moment.
However, the ETL is as complex as in design A: while the current row runs from the current date to infinity, the previous row has to be retired so that its range runs from its from date to today's date minus 1.
This kind of ETL coding also creates lots of testing issues, as you want to make sure that for any given date and time only one instance of the row exists (for the primary key).
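A sketch of the "straight SQL" an analyst could write against such a from/to-dated table (again assuming the hypothetical RATE_FCT table, now with FROM_DATE and TO_DATE columns):
-- Return the rate that was valid for instrument 101 at a given moment.
SELECT f.instrument_id, f.rate
FROM   rate_fct f
WHERE  f.instrument_id = 101
AND    DATE '2015-06-30' BETWEEN f.from_date AND f.to_date;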
Which design is better? I have used all of them, depending on the situation.
3. What should be the unit test plan?
There are various cases where the ETL can miss; when planning test cases, your plan should be to test precisely those. Here are some examples of test plans:
a. There should be only one value for a given date/date-time.
b. During the initial load, when data is available for multiple days, the process should go sequentially and create the snapshots/ranges correctly.
c. At any given time there should be only one current row.
d. etc.