
377.Informatica - What are the main issues while working with flat files as sources and as targets?

We need to specify the correct path in the session and mention whether the file is 'direct' or 'indirect', and keep the file in the exact path specified in the session.
=======================================
1. We cannot use SQL override; we have to use transformations for all our requirements.
2. Testing flat files is a very tedious job.
3. The file format (source/target definition) should match the format of the data file exactly. Most of the time, erroneous results come when the data file layout is not in sync with the actual file:
(i) the data file is fixed-width but the definition is delimited --> truncated data;
(ii) the data file and the definition are both delimited but the wrong delimiter is specified, either (a) a delimiter other than the one present in the actual file or (b) a delimiter that also appears as a character in some field of the file --> wrong data again;
(iii) not specifying the NULL character properly may result in wrong data;
(iv) there are other settings/attributes while creating the file definition about which one should be very careful.
4. If you miss the link to any column of the target, all the data will be placed in the wrong fields, and the missed column will not exist in the target data file.

332.Informatica - Explain how the Informatica server process works in relation to mapping variables.
Informatica primarily uses the Load Manager and the Data Transformation Manager (DTM) to perform extraction, transformation and loading. The Load Manager reads parameters and variables related to the session, mapping and server, and passes the mapping parameter and variable information to the DTM. The DTM uses this information to perform the data movement from source to target.
=======================================
The PowerCenter Server holds two different values for a mapping variable during a session run:
- Start value of a mapping variable
- Current value of a mapping variable
Start Value: The start value is the value of the variable at the start of the session. It could be a value defined in the parameter file for the variable, a value saved in the repository from the previous run of the session, a user-defined initial value for the variable, or the default value based on the variable datatype. The PowerCenter Server looks for the start value in the following order:
1. Value in the parameter file
2. Value saved in the repository
3. Initial value
4. Default value
Current Value: The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. As the session progresses, the PowerCenter Server calculates the current value using a variable function that you set for the variable. Unlike the start value, the current value can change as the PowerCenter Server evaluates it for each row that passes through the mapping.
=======================================
First the Load Manager starts the session; it performs verifications and validations of variables and manages post-session tasks such as mail. It then creates the DTM process. The DTM in turn creates a master thread, which creates the remaining threads: reader thread, writer thread, transformation thread, pre- and post-session threads, etc. Finally the DTM hands control back to the Load Manager after writing to the target.

331.Informatica - Write a query to retrieve the latest records from a table sorted by version (SCD).
You can write a query with an inline view; compare the previous version to the new highest version and you get your result.
=======================================
Hi Sunil, can you please explain your answer in a little more detail?
=======================================
Hi, assume you put a surrogate key in the target (dept table) such as p_key, along with version, dno and loc fields. Then:
select a.p_key, a.dno, a.loc, a.version
from t_dept a
where a.version = (select max(b.version) from t_dept b where a.dno = b.dno)
If you write this query in a lookup, it retrieves the latest (max) version from the target, which also improves performance.
=======================================
select *
from (select acct.*, rank() over (partition by ch_key_id order by version desc) as rnk from acct)
where rnk = 1
=======================================
select business_key, max(version) from tablename group by business_key
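A hedged consolidation of the answers above, using ROW_NUMBER so that ties on version cannot return more than one row per key; t_dept and its columns are illustrative names carried over from the example, not a fixed schema.
-- Latest version per dno using ROW_NUMBER (illustrative table/column names).
SELECT p_key, dno, loc, version
FROM (
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY dno ORDER BY version DESC) AS rn
    FROM   t_dept d
) latest
WHERE rn = 1;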

329.Informatica - How do you handle two sessions in Informatica?
You can handle two sessions by using a link condition (such as $PrevTaskStatus = SUCCEEDED) or by placing a Decision task between them. Since only one session depends on the other, I feel a link condition is enough.
=======================================
By giving a link condition like $PrevTaskStatus = SUCCEEDED.
=======================================
Where exactly do we need to use this link condition?
=======================================
You can drag and drop more than one session into a workflow, and the linking can be sequential or concurrent. With sequential linking you can run whichever session you require, or the workflow runs all the sessions one after the other. With concurrent linking you cannot pick and run just any single session you want.

319.Informatica - Which one is better performance-wise, Joiner or Lookup?
Are you looking up a flat file or a database table? Generally a sorted Joiner is more effective on flat files than a Lookup, because the sorted Joiner uses a merge join and caches fewer rows, while a Lookup always caches the whole file. If the file is not sorted, the two can be comparable. Lookups into a database table can be effective if the database can return sorted data fast and the amount of data is small, because the lookup can build the whole cache in memory. If the database responds slowly or a big amount of data is processed, lookup cache initialization can be really slow (the lookup waits for the database and stores cached data on disk). In that case it can be better to use a sorted Joiner, which sends rows to the output as it reads them on the input.

318.Informatica - How to partition the session? (Interview)
- Round-robin: the PowerCenter server distributes rows of data evenly to all partitions (e.g. at a Filter transformation).
- Hash keys: distributes rows to the partitions by group (e.g. at Rank, Sorter, Joiner and unsorted Aggregator transformations).
- Key range: distributes rows based on a port or set of ports that you specify as the partition key (e.g. at sources and targets).
- Pass-through: processes data without redistributing rows among partitions (valid at any partition point).
When you create or edit a session, you can change the partitioning information for each pipeline in a mapping. If the mapping contains multiple pipelines, you can specify multiple partitions in some pipelines and single partitions in others. You update partitioning information using the Partitions view on the Mapping tab in the session properties.

You can configure the following information in the Partitions view on the Mapping tab:
- Add and delete partition points.
- Enter a description for each partition.
- Specify the partition type at each partition point.
- Add a partition key and key ranges for certain partition types.
=======================================
By default, when we create a session, pass-through partition points are created at Source Qualifier transformations and target instances.

312.Informatica - How many types of sessions are there in Informatica? Please explain them.
Reusable and non-reusable sessions.
=======================================
There are 10 types of workflow tasks:
1. Session: for mapping execution
2. Email: to send emails
3. Command: to execute OS commands
4. Control: fail/stop/abort
5. Event-Wait: for pre-defined or user-defined events
6. Event-Raise: to raise a user-defined event
7. Decision: a condition to be evaluated for controlling the flow or process
8. Timer: to halt the process for a specific time
9. Worklet: a reusable task
10. Assignment: to assign values to worklet or workflow variables
=======================================
A session is a type of workflow task and a set of instructions that describe how to move data from sources to targets using a mapping. There are two ways of running sessions in Informatica:
1. Sequential: data moves from source to target one session after another.
2. Concurrent: the sessions run simultaneously.
309.Informatica - Explain pipeline partitioning with a real-time example.
A pipeline specifies the flow of data from source to target. Pipeline partitioning means partitioning the data based on some key values and loading it to the target concurrently, which improves session performance, i.e. the data loading time is reduced. In real time there may be thousands of records to load into the targets every day, so pipeline partitioning definitely reduces the data loading time.

305.Informatica - How can we remove/optimize source bottlenecks using "query hints"?
Create indexes on the source table columns.
=======================================
First you must have proper indexes, and the table must be analyzed to gather statistics so the CBO can be used (you can get free documentation from the Oracle Technology Network). Use hints after that; they are powerful, so be careful with them.
=======================================
306.Informatica - How can we eliminate a source bottleneck using a query hint?
You can identify source bottlenecks by executing the read query directly against the source database. Copy the read query directly from the session log and execute it against the source database with a query tool such as isql. On Windows you can load the result of the query into a file; on UNIX systems you can send the result of the query to /dev/null. Measure the query execution time and the time it takes for the query to return the first row. If there is a long delay between the two measurements, you can use an optimizer hint to eliminate the source bottleneck.
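As a hedged illustration of such a hint (Oracle hint syntax; the EMP table and EMP_DEPT_IDX index are made-up names, not from any particular source system):
-- Force use of a specific index on the source read query.
SELECT /*+ INDEX(e EMP_DEPT_IDX) */ e.empno, e.ename, e.deptno
FROM   emp e
WHERE  e.deptno = 10;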

297.Informatica - What is test load?
Test load is nothing but checking whether the data is moving correctly to the target or not.
=======================================
Test load is a property we can set at the session level by which Informatica performs all pre- and post-session tasks but does not save target data (for an RDBMS target table it writes the data to check the constraints but then rolls it back). If the target is a flat file, it does not write anything to the file. We can specify the number of source rows to test-load the mapping. This is another way of debugging the mapping without loading the target.
296.Informatica - How can we delete duplicate rows from flat files?
=======================================
We can delete duplicate rows from flat files by using a Sorter transformation.
=======================================
A Sorter transformation puts the records in sorted order (for better performance); I am asking how we can delete duplicate rows.
=======================================
Use a lookup on the primary key.
=======================================
In the mapping, read the flat file through a source definition and Source Qualifier, then apply a Sorter transformation and select Distinct in the Properties tab. The output will be sorted, distinct data, so you get rid of the duplicates. You can also use an Aggregator transformation and group by the primary key, which gives the same result.
=======================================
Use a Sorter transformation and check the Distinct option. It will remove the duplicates.

283.Informatica - How can we validate all the mappings in the repository at once?
You cannot validate all the mappings in one go, but you can validate all the mappings in a folder in one go and repeat the process for all the folders. To do this, log on to the Repository Manager, open the folder and then the Mappings node, select all or some of the mappings (by pressing Shift or Ctrl; Ctrl+A does not work), then right-click and choose Validate.
=======================================
Yes, we can validate all mappings using the Repository Manager.

276.Informatica - How many types of tasks do we have in the Workflow Manager? What are they?
Workflow tasks: 1) Session 2) Command 3) Email 4) Control 5) Pre-session 6) Post-session 7) Assignment.
=======================================
1) Session 2) Command 3) Email 4) Event-Wait 5) Event-Raise 6) Assignment 7) Control 8) Decision 9) Timer 10) Worklet. Numbers 3, 8 and 9 are self-explanatory. 1) runs mappings; 2) runs OS commands/scripts; 4 and 5) raise user-defined or pre-defined events and wait for the event to be raised; 6) assigns values to workflow variables; 10) runs worklets.
=======================================
The following tasks are available in the Workflow Manager: Assignment, Control, Command, Decision, E-mail, Session, Event-Wait, Event-Raise and Timer. Tasks developed in the Task Developer are reusable; tasks developed inside a workflow or worklet are non-reusable. Among these, only Session, Command and E-mail can be reusable; the remaining tasks are non-reusable.
274.Informatica - What is target load order?
In a mapping, if there is more than one target table, we need to give the order in which the target tables should be loaded. Example: suppose our mapping has two target tables, 1. Customer and 2. Audit. The Customer table should be populated first and then the Audit table; for that we use target load order.

269.Informatica - How did you handle errors? (ETL row errors)
If an error occurs, the row is stored in the target_table.bad file. Errors are of two types: 1. row-based errors and 2. column-based errors. Column-based errors are identified by indicators: D - good data, N - null data, O - overflow data, R - rejected data. The data stored in the .bad file looks like: D1232234O877NDDDN23
268.Informatica - What is dynamic insert?
When we select a dynamic cache in a Lookup transformation, the Informatica server creates a new lookup row port whose numeric value indicates whether the Informatica server inserts, updates or makes no changes to the lookup cache. If you associate a sequence ID, the Informatica server creates a sequence ID for newly inserted records.
266.Informatica - What is event-based scheduling?
In time-based scheduling, jobs run at the specified time. In some situations we have to run a job based on some event, for example only when a file arrives, whatever the time is. In such cases event-based scheduling is used.
=======================================
Event-based scheduling uses an indicator file. When you do not know when the source data will arrive, you use a shell command, script or batch file to send an indicator file to the local directory of the Informatica server, and the server waits for the indicator file before running the session.

262.Informatica - In which particular situation do we use an unconnected Lookup transformation?
Hi, both unconnected and connected lookups provide a single output. If it is a case where we can use either one, I prefer unconnected, because an unconnected lookup does not participate in the data flow, so the Informatica server creates a separate cache for it and processing takes place in parallel, which increases performance. We can use the unconnected Lookup transformation when we need to return only one port; we could also use a connected lookup to return one port, but an unconnected Lookup transformation is not connected to other transformations and is not part of the data flow, which is why performance increases.
=======================================
The major advantage of an unconnected lookup is its reusability. We can call an unconnected lookup multiple times in the mapping, unlike a connected lookup.
=======================================
We can use the unconnected Lookup transformation when we need to return the output from a single port. If we want output from multiple ports, we have to use a connected Lookup transformation.
=======================================
The use of connected vs. unconnected lookup is completely based on the logic we need. However, note that we can get multiple values from an unconnected lookup as well: just concatenate all the values you want, get the result from the return port of the unconnected lookup, and then split it in an Expression transformation. Using an unconnected lookup does take more time, though, as it breaks the flow and goes out to the lookup to fetch the results.

247.Informatica - What is a shortcut? What is the use of it?
A shortcut is a facility provided by Informatica to share metadata objects across folders without copying the objects into every folder. We can create shortcuts for source definitions, reusable transformations, mapplets, mappings, target definitions and business components. There are two different types of shortcuts: 1. local shortcuts and 2. global shortcuts.
=======================================
248.Informatica - What is the use of a factless fact table?
A factless fact table is a fact table that does not have any measures. For example, you want to store the attendance information of students. This table tells you, date-wise, whether a student attended class or not, but there are no measures because fees paid etc. are not recorded daily.
=======================================
A transaction can occur without a measure, for example a victim ID.
236.Informatica - What are bulk and normal load? Where do we use bulk and where normal?
When we load data in bulk mode there is no entry in the database log files, so it is tough to recover data if the session fails at some point. In normal mode, every record is logged in the database log file and in the Informatica repository, so if the session fails it is easy to restart from the last committed point. Bulk mode is very fast compared with normal mode. We use bulk mode to load data into databases; it does not apply when using text files as targets, whereas normal mode works fine with all types of targets.
=======================================
In bulk mode a DML statement is created and executed for a group of records, whereas in normal mode a DML statement is created and executed for every record. Selecting bulk increases performance.
=======================================
Bulk mode is used for Oracle/SQL Server/Sybase. This mode improves performance by not writing to the database log; as a result, recovery is unavailable when using it. Further, this mode does not work when an Update Strategy transformation is used, and there should not be any indexes or constraints on the table. Of course, one can use pre-session and post-session SQL to drop and rebuild indexes/constraints.
234.Informatica - Explain in detail key range and round-robin partitioning with an example.
Key range: the Informatica server distributes the rows of data based on the set of ports that you specify as the partition key.
Round-robin: the Informatica server distributes an equal number of rows to each partition.
233.Informatica - What is the difference between a local and a global repository?
You can develop global and local repositories to share metadata.
Global repository: the global repository is the hub of the domain. Use the global repository to store common objects that multiple developers can use through shortcuts. These objects may include operational or application source definitions, reusable transformations, mapplets and mappings.
Local repositories: a local repository is any repository within the domain that is not the global repository. Use local repositories for development. From a local repository you can create shortcuts to objects in shared folders in the global repository; these objects typically include source definitions, common dimensions and lookups, and enterprise-standard transformations. You can also create copies of objects in non-shared folders.
230.Informatica - What is CDC?
Changed Data Capture (CDC) helps identify the data in the source system that has changed since the last extraction.
With CDC, data extraction takes place at the same time the insert, update or delete operations occur in the source tables, and the change data is stored inside the database in change tables. The change data thus captured is then made available to the target systems in a controlled manner.
=======================================
CDC means Changed Data Capture: the name itself says that if any data is changed, we capture the changed values. For this we have Type 1, Type 2 and Type 3 approaches; depending on our requirement we can follow one of them.
=======================================

Whenever any source data is changed we need to capture it in the target system as well. This can basically be done in three ways:
- the target record is completely replaced with the new record (Type 1);
- the complete history of changes is captured as different records stored in the target table (Type 2);
- only the last change and the present data are captured (Type 3).
CDC is generally done by using a timestamp or a version key.
228.Informatica - What is the Repository Agent?
The Repository Agent is a multi-threaded process that fetches, inserts and updates metadata in the repository database tables. It uses object locking to ensure the consistency of metadata in the repository.
=======================================
The Repository Server uses a process called the Repository Agent to access the tables of the repository database. The Repository Server uses multiple Repository Agent processes to manage multiple repositories on different machines on the network, using native drivers.
=======================================
The name says it: the agent is the mediator between the Repository Server and the repository database tables. Simply put, the Repository Agent is the component that talks to the repository.

224.Informatica - What are the transformations that restrict the partitioning of sessions?
- Advanced External Procedure and External Procedure transformations: these transformations contain a check box on the Properties tab to allow partitioning.
- Aggregator transformation: if you use sorted ports, you cannot partition the associated source.
- Joiner transformation: you cannot partition the master source for a Joiner transformation.
- Normalizer transformation.
- XML targets.
=======================================
1) Source definition 2) Sequence Generator 3) unconnected transformations 4) XML target definition, in addition to the transformations listed in the previous answer.

222.Informatica - What about rapidly changing dimensions? Can you analyze with an example?

216.Informatica - What is the architecture of a data warehousing project? What is the flow?

213.Informatica - How do you create a single Lookup transformation using multiple tables?
Write an override SQL query and adjust the ports as per the SQL query.
=======================================
No, it is not possible to create a single lookup on multiple tables, because a lookup is created on a single (target) table.
=======================================
For a connected Lookup transformation: 1) create the Lookup transformation; 2) choose Skip when asked for the table; 3) manually enter the names of the ports you want to look up; 4) connect the input ports from the source table; 5) give the condition; 6) generate the SQL, then modify it according to your requirement and validate it. It will work.
=======================================
We can also create a view over the two tables and then use that view as the lookup table.
=======================================
If you want the same lookup values to be used for multiple target tables, that can be done: use an unconnected lookup and collect the values from the source table into any target table depending on the business rule.
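A hedged sketch of what such a lookup SQL override might look like, joining two made-up tables (CUSTOMER and CUSTOMER_ADDRESS); the column aliases are assumed to match the lookup port names, and none of these names come from a real schema.
-- Lookup SQL override joining two tables; aliases should match the lookup ports.
SELECT c.customer_id   AS customer_id,
       c.customer_name AS customer_name,
       a.city          AS city,
       a.postal_code   AS postal_code
FROM   customer c
JOIN   customer_address a ON a.customer_id = c.customer_id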

212.Informatica - Why did you use Update Strategy in your application?
Update Strategy is used to drive the data to be inserted, updated or deleted depending on some condition. You can do this at the session level too, but there you cannot define any condition. For example, if you want to do both update and insert in one mapping, you create two flows and make one insert and one update depending on some condition. Refer to Update Strategy in the Transformation Guide for more information.
=======================================
Update Strategy is one of the most important Informatica transformations. The basic thing to understand is that it is the essential transformation for performing DML operations on already populated targets (i.e. targets that contain some records before this mapping loads data). It is used to perform DML operations: insertion, updating, deletion and rejection. When records come to this transformation, depending on our requirement we can decide whether to insert, update or reject the rows flowing through the mapping. For example, take an input row: if it already exists in the target (we find this out with a Lookup transformation), update it; otherwise insert it. We can also specify conditions from which we derive which update strategy to use, e.g. IIF(condition, DD_INSERT, DD_UPDATE): if the condition is satisfied, do DD_INSERT, otherwise do DD_UPDATE. DD_INSERT, DD_UPDATE, DD_DELETE and DD_REJECT are the flagging constants that drive the respective DML operations for insertion, updating, deletion and rejection; their numeric equivalents 0, 1, 2 and 3 can also be used, for example inside a DECODE expression.


205.Informatica - How do we do unit testing in Informatica? How do we load data in Informatica?
Unit testing is of two types: 1. quantitative testing and 2. qualitative testing.
Steps: 1. First validate the mapping. 2. Create a session on the mapping and then run the workflow. Once the session has succeeded, right-click the session and go to the statistics tab. There you can see how many source rows were applied, how many rows were loaded into the targets and how many rows were rejected. This is quantitative testing.
Once the rows are successfully loaded, we go for qualitative testing. Steps: 1. Take the DATM (the document where all business rules are mapped to the corresponding source columns) and check whether the data has been loaded into the target table according to the DATM. If any data is not loaded according to the DATM, go back to the code and rectify it. This is qualitative testing.
This is what a developer does in unit testing.
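A hedged example of the kind of SQL checks often run during such unit testing, assuming a relational source and target with comparable columns (Oracle syntax; src_customer and tgt_customer are made-up names):
-- Quantitative check: compare source and target row counts.
SELECT (SELECT COUNT(*) FROM src_customer) AS source_rows,
       (SELECT COUNT(*) FROM tgt_customer) AS target_rows
FROM dual;
-- Qualitative check: rows present in the source but missing from the target.
SELECT customer_id, customer_name FROM src_customer
MINUS
SELECT customer_id, customer_name FROM tgt_customer;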

197.Informatica - How can we store previous session logs?
Just run the session in timestamp mode; then the session log will not overwrite the current session log.
We can also do it this way, using $PMSessionLogCount (which specifies the number of session log runs to save): go to the session, right-click and select Edit Task, go to the Config Object tab, then set the property Save Session Log By to "Session Runs" and Save Session Log for These Runs to the number of historical session logs you want.

184.Informatica - What is the difference between constraint-based load ordering and target load plan?
Constraint-based load ordering example: Table 1 is the master and Table 2 is the detail. Since the detail rows depend on the master rows, the master table should be loaded first. In such cases, to control the load order of the tables we need conditional loading, which is nothing but constraint-based loading. In Informatica this feature is implemented by just one check box at the session level.
Target load order is set in the Designer: click the Mappings menu and then Target Load Plan. It shows all the target load groups in the particular mapping; you specify the order there and the server loads the targets accordingly. A target load group is a set of source, source qualifier, transformations and target. Constraint-based loading, in contrast, is a session property. Here the multiple targets must be generated from one source qualifier, and the target tables must possess primary/foreign key relationships, so that the server loads according to the key relationships irrespective of the target load plan.
=======================================
If you have only one source and it loads into multiple targets, you use constraint-based loading, but the target tables should have key relationships between them. If you have multiple source qualifiers loading into multiple targets, you use target load order.

Constraint-based loading: if your mapping contains a single pipeline (flow) with more than one target, and the target tables have a master-child relationship, you need to use constraint-based loading at the session level.
Target load plan: if your mapping contains multiple pipelines (flows), you specify the execution order one by one (for example, pipeline 1 executes first, then pipeline 2, then pipeline 3). This is purely based on pipeline dependency.
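A hedged illustration of the key relationship that constraint-based loading relies on, using made-up DEPT (master) and EMP (detail) tables:
-- Master table: loaded first under constraint-based loading.
CREATE TABLE dept (
    dept_id   NUMBER PRIMARY KEY,
    dept_name VARCHAR2(50)
);
-- Detail table: its foreign key means it must be loaded after DEPT.
CREATE TABLE emp (
    emp_id   NUMBER PRIMARY KEY,
    emp_name VARCHAR2(50),
    dept_id  NUMBER REFERENCES dept (dept_id)
);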

179.Informatica - How can we eliminate duplicate rows from a flat file?
Keep an Aggregator between the Source Qualifier and the target and group by the key field; it will eliminate the duplicate records.
=======================================
Hi, before loading to the target, use an Aggregator transformation and make use of the group-by function to eliminate the duplicates on the columns.
=======================================
Use a Sorter transformation. When you configure the Sorter transformation to treat output rows as distinct, it configures all ports as part of the sort key and therefore discards duplicate rows compared during the sort operation.
=======================================
Use a Sorter transformation and select the Distinct option; duplicate rows will be eliminated.
=======================================
If you want to delete the duplicate rows in flat files, you can also go for a Rank transformation (or an Oracle external procedure transformation): group by all the ports that define a duplicate, rank on one field, and keep only the top-ranked row.
=======================================
Using a Sorter transformation we can eliminate the duplicate rows from the flat file.
=======================================
To eliminate duplicates in flat files we have the Distinct property in the Sorter transformation. If we enable that property, it automatically removes the duplicate rows.

178.Informatica - What is partitioning? Where can we use partitioning? What are the advantages? Is it necessary?
The Partitioning option increases PowerCenter's performance through parallel data processing. It provides a thread-based architecture and automatic data partitioning that optimizes parallel processing on multiprocessor and grid-based hardware environments.
=======================================
Partitions are used to optimize session performance. We can select the partition type in the session properties: pass-through (the default), key range, round-robin and hash partitioning.
=======================================
In Informatica we can tune performance at five different levels: source, target, mapping, session and network. To tune performance at the session level we go for partitioning, and again we have four types of partitioning: pass-through, hash, round-robin and key range. Pass-through is the default. Hash partitioning comes in two flavours, user-defined keys and auto keys. Round-robin cannot be applied at the source level; it can be used at some transformation levels. Key range can be applied at both source and target levels.

158.Informatica - Difference between Rank and Dense Rank?
Rank: 1, 2, 2, 4, 5 - the same rank is assigned to equal values, and the following rank skips the tied positions (here position 3 is skipped). Golf scoring usually ranks this way. (See the SQL sketch at the end of this block, after question 149.)
Dense Rank: 1, 2, 2, 3, 4 - equal values share a rank, but no positions are skipped.
--------------------------------------------------------------------
151.Informatica - How do you configure a mapping in Informatica?
You should configure the mapping with the least number of transformations and expressions to do the most amount of work possible, and you should minimize the amount of data moved by deleting unnecessary links between transformations. For transformations that use a data cache (such as Aggregator, Joiner, Rank and Lookup transformations), limit the connected input/output or output ports; this reduces the amount of data the transformations store in the data cache. You can also perform the following tasks to optimize the mapping:
- Configure single-pass reading.
- Optimize datatype conversions.
- Eliminate transformation errors.
- Optimize transformations.
- Optimize expressions.
149.Informatica - What are mapping parameters and variables, and in which situations can we use them?
Mapping parameters have a constant value throughout the session, whereas the value of a mapping variable can change; the Informatica server saves the variable's value in the repository and uses it the next time you run the session.
=======================================
If we need to change certain attributes of a mapping every time the session is run, it would be very difficult to edit the mapping and change the attribute each time. So we use mapping parameters and variables and define the values in a parameter file; then we only edit the parameter file to change the attribute values, which makes the process simple. Mapping parameter values remain constant; if we need to change a parameter value we edit the parameter file. The value of a mapping variable, however, can be changed with a variable function. If we need to increment an attribute value by 1 after every session run, we can use a mapping variable; with a mapping parameter we would have to manually edit the value in the parameter file after every session run.
=======================================
How can you edit the parameter file? Once you set up a mapping variable, how can you define it in a parameter file?
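A hedged SQL illustration of the RANK vs DENSE_RANK difference described in question 158 above, using a made-up emp_salary table:
-- RANK() skips positions after ties; DENSE_RANK() does not.
SELECT emp_name,
       salary,
       RANK()       OVER (ORDER BY salary DESC) AS rnk,        -- with a tie for 2nd: 1, 2, 2, 4, 5
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk   -- with a tie for 2nd: 1, 2, 2, 3, 4
FROM emp_salary;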

How to measure performance of an ETL load process
Dear All, I am new to ETL testing. I have a testing requirement to capture the performance of an ETL system's load process for various loads, in order to size the product components and perform capacity planning. The measurements to be captured include: 1. amount of data processed per second; 2. number of records processed per second. Is there any tool which would help me achieve this? Any pointers/guidance in this direction would be greatly appreciated. Many thanks in advance.
1) The metadata of your warehouse: query it to find the things you asked about, such as throughput and the time taken for an ETL feed to complete, compare them with the values assumed prior to development, and measure the lag, if any. If you are using a third-party scheduler, measure the hold time of the feeds (feeds waiting for a prior process/feed to finish). If your metadata is non-relational, you need to use a metadata reporting utility of the ETL tool. You need not do anything at run time if your metadata and logs hold the appropriate information.
2) Ask for the logs: for a day's complete run, take the logs and find the statistics of your feeds. These logs give the time taken by the tool for initialization, extraction, transformation and load. As these files have a standard format, create a script to find the standard lines in the log, such as "session started ... time" and "session completed".
3) Prepare a sheet listing the values of the parameters for the feeds you are testing; some of these parameters are mentioned in the two points above.
4) Sum up the figures from the sheet to find the problem area, if any.
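For PowerCenter specifically, a hedged sketch of the kind of repository metadata query point 1 refers to, assuming a repository reporting view such as REP_SESS_LOG with row-count and timing columns (the view and column names vary by version, so treat them as assumptions to verify against your repository):
-- Rows processed and run times per session run (illustrative column names).
SELECT subject_area,
       session_name,
       successful_rows,
       failed_rows,
       actual_start,
       session_timestamp AS actual_end
FROM rep_sess_log
ORDER BY actual_start DESC;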

139.Informatica - What are cost-based and rule-based approaches, and what is the difference?
Cost-based and rule-based approaches are optimization techniques used in databases when we need to optimize a SQL query. Basically Oracle provides two types of optimizers (indeed three, but we use only these two because the third has disadvantages). Whenever you process a SQL query in Oracle, the Oracle engine internally reads the query and decides the best possible way to execute it; in this process Oracle follows one of these optimization techniques.
1. Cost-based optimizer (CBO): if a SQL query can be executed in two different ways (say path 1 and path 2 for the same query), the CBO calculates the cost of each path, analyses which path has the lower execution cost, and executes that path so that it can optimize the query execution.
2. Rule-based optimizer (RBO): this follows a fixed set of rules for executing a query, and the optimizer runs the query depending on the rules that apply.
If the table you are querying has already been analyzed, Oracle goes with the CBO; if the table is not analyzed, Oracle follows the RBO. For the first time, if the table is not analyzed, Oracle will do a full table scan.
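A hedged Oracle example of the two pieces mentioned above, gathering statistics so the CBO can be used and hinting the RBO (the EMP table is a made-up example):
-- Gather statistics so the cost-based optimizer has data to work with
-- (on recent Oracle versions DBMS_STATS.GATHER_TABLE_STATS is preferred).
ANALYZE TABLE emp COMPUTE STATISTICS;
-- The RULE hint asks for the rule-based optimizer instead.
SELECT /*+ RULE */ empno, ename
FROM emp
WHERE deptno = 10;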

138.Informatica - What are partition points? Partition points mark the thread boundaries in a source pipeline and divide the pipeline into stages.

129.Informatica - How do you handle decimal places while importing a flat file into Informatica?
While importing data from a flat file, the wizard asks for the precision; just enter it there.
=======================================
While importing the flat file, the flat file wizard helps in configuring the properties of the file. Select the numeric column and enter the precision value and the scale. Precision includes the scale; for example, if the number is 98888.654, enter precision 8, scale 3 and width 10 for a fixed-width flat file.
=======================================
You can also handle it in the Source Analyzer window: go to the ports of the flat file definition and change the precision and scale.
=======================================
While importing the flat file definition, just specify the scale for a numeric datatype. In the mapping, the flat file source supports only the number datatype (no decimal or integer). The Source Qualifier associated with that source will have a decimal datatype for that number port: source (number datatype port) -> SQ (decimal datatype). Integer is not supported; hence decimal takes care of it.

126.Informatica - Can we use an aggregator/active transformation after an Update Strategy transformation?
We can use it, but the update flag will not remain; we can, however, use a passive transformation.
=======================================
I guess not; as per my knowledge, an Update Strategy can only be placed just before the target.
=======================================
You can use an Aggregator after an Update Strategy. The problem is that once you perform the update strategy, say you have flagged some rows to be deleted and you then perform an Aggregator transformation over all rows (say using the SUM function), the deleted rows will be subtracted in that Aggregator transformation.
122.Informatica - What is the procedure to load the fact table? Give it in detail.
Based on the requirements for your fact table, choose the sources and data and transform them based on your business needs. For the fact table you need a primary key, so use a Sequence Generator transformation to generate a unique key and pipe it to the target (fact) table along with the foreign keys from the source tables.
=======================================
We use the two wizards (the Getting Started wizard and the Slowly Changing Dimension wizard) to load the fact and dimension tables. With these two wizards we can create different types of mappings according to the business requirements and load the star schema (fact and dimension tables).
=======================================
First the dimension tables need to be loaded; then, according to the specifications, the fact tables should be loaded. Do not think that fact tables are different in terms of loading; it is a general mapping, as we do for other tables. The specifications play an important role in loading the fact.
=======================================
Usually source records are looked up against the records in the dimension tables. Dimension tables are called lookup or reference tables; all the possible values are stored there, e.g. for product, all the existing prod_ids are in the DIM table. When data from the source is looked up against the dimension table, the corresponding keys are sent to the fact table. This is not a fixed rule; it may vary as per your requirements and methods. Sometimes only an existence check is done and the prod_id itself is sent to the fact. (See the SQL sketch at the end of this block, after question 97.)
121.Informatica - How to look up data on multiple tables?
By using SQL override we can look up the data on multiple tables; see the Properties tab.
=======================================
Thanks for your response, but my question is: I have two source or target tables and I want to look up those two tables. How can I? Is it possible with SQL override?
=======================================
Just check with the Import option.
=======================================
How to look up data on multiple tables?
=======================================

If you want to look up data on multiple tables at a time, you can do one thing: join the tables you want and then look up that joined table. Informatica provides lookups on joined tables - hats off to Informatica.
=======================================
You can do it. When you create the Lookup transformation, Informatica asks for the table name, and you can choose Source, Target, Import or Skip. Click Skip and then use the SQL override property in the Properties tab to join the two tables for the lookup. Alternatively, join the two sources using a Joiner transformation and then apply a lookup on the resulting pipeline.
=======================================
What my friends have answered earlier is correct. To be more specific, if the two tables are relational, you can use the lookup SQL override option to join the two tables in the lookup properties; you cannot join a flat file and a relational table. For example, the default lookup query is "select <lookup table columns> from <lookup_table>"; you can extend this query by adding the column names of the second table with a qualifier and a where clause. If you want to use an ORDER BY, put "--" at the end to comment out the generated ORDER BY clause.
120.Informatica - How to retrieve the records from a rejected file? Explain with syntax or an example.
There is a utility called the Reject Loader with which we can find the rejected records, refine them and reload them.
=======================================
Yes; every time you run the session, a reject file is created and all the rejected rows are written to it. You can modify and correct the records and load them to the target directly from the reject file using the Reject Loader.
=======================================
Can you explain how to load rejected rows through Informatica?
=======================================
During the execution of the workflow, all the rejected rows are stored in bad files (under the Informatica server installation, e.g. C:\Program Files\Informatica PowerCenter 7.1\Server). These bad files can be imported as flat file sources, and then through a direct mapping we can load them in the desired format.
98.Informatica - Can we modify the data in a flat file?
=======================================
Let's not discuss manually modifying the data of the flat file. Let's assume that the target is a flat file and I want to update the data in the flat file target based on the input source rows, the way we use Update Strategy / target properties for updates on relational targets. Do we have any option in the session or mapping to perform a similar task for a flat file target? I have heard about the append option in INFA 8.x, which may be helpful for incremental loads into a flat file, but that is not a workaround for updating rows.
=======================================
You can modify the flat file using shell scripting in UNIX (awk, grep, sed).
97.Informatica - How to get the first 100 rows from the flat file into the target?
Check this: task -----> (link) -----> session in the Workflow Manager. Double-click the link and put a condition on the source success rows session variable (= 100); it should automatically stop the session.
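Referring back to question 122 (loading the fact table), a hedged SQL sketch of the surrogate-key lookup pattern described there; the staging and dimension table names are made up for illustration.
-- Resolve natural keys from staging to dimension surrogate keys, then load the fact.
INSERT INTO sales_fact (product_key, customer_key, sale_date, amount)
SELECT p.product_key,
       c.customer_key,
       s.sale_date,
       s.amount
FROM stg_sales s
JOIN product_dim  p ON p.product_id  = s.product_id
JOIN customer_dim c ON c.customer_id = s.customer_id;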

82.Informatica - If I make modifications to my table in the back end, do they reflect in the Informatica warehouse or mapping?
Informatica is not directly concerned with the back-end database; it displays the information that is stored in the repository. If you want back-end changes reflected in Informatica, you have to import the definition from the back end again over a valid connection and replace the existing definition with the imported one.
=======================================
Yes, it will be reflected once you re-import the definition and refresh the mapping.
=======================================

It does matter if you have a SQL override, say in the Source Qualifier or in a Lookup where you override the default SQL. If you make a change to the underlying table in the database that makes the override SQL incorrect for the modified table, the session will fail. If you change a table (say, rename a column that is used in the SQL override statement), the session will fail. But if you added a column to the underlying table after the last column, the SQL statement in the override will still be valid. If you change the size of columns, the SQL will still be valid, although you may get truncation of data if the database column has a larger size (more characters) than the Source Qualifier or a subsequent transformation.
17.Informatica - What are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you can define before running a session; it retains the same value throughout the entire session. When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define its value in a parameter file for the session. Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.
21.Informatica - What is the aggregate cache in the Aggregator transformation?
The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.
26.Informatica - What are the Joiner caches?
When a Joiner transformation occurs in a session, the Informatica server reads all the records from the master source and builds index and data caches based on the master rows. After building the caches, the Joiner transformation reads records from the detail source and performs the joins.
30.Informatica - Differences between connected and unconnected lookup?
32.Informatica - What are the types of lookup caches?
Persistent cache: you can save the lookup cache files and reuse them the next time the Informatica server processes a Lookup transformation configured to use the cache.
Recache from database: if the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.
Static cache: you can configure a static, or read-only, cache for any lookup table. By default the Informatica server creates a static cache; it caches the lookup table and lookup values for each row that comes into the transformation. When the lookup condition is true, the Informatica server does not update the cache while it processes the Lookup transformation.
Dynamic cache: if you want to cache the target table and insert new rows into the cache and the target, you can create a Lookup transformation that uses a dynamic cache; the Informatica server dynamically inserts data into the target table.
Shared cache: you can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.
36.Informatica - What is the RANKINDEX in the Rank transformation?
The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica server uses the Rank Index port to store the ranking position for each record in a group.
For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.

38.Informatica - What are the types of groups in the Router transformation?
Input group and output groups. The Designer copies property information from the input ports of the input group to create a set of output ports for each output group. There are two types of output groups: user-defined groups and the default group. You cannot modify or delete the default group.
39.Informatica - Why do we use the Stored Procedure transformation?
- Check the status of a target database before loading data into it.
- Determine if enough space exists in a database.
- Perform a specialized calculation.
- Drop and recreate indexes.
41.Informatica - What are the tasks that the Source Qualifier performs?
- Join data originating from the same source database. You can join two or more tables with primary-foreign key relationships by linking the sources to one Source Qualifier.
- Filter records when the Informatica server reads source data. If you include a filter condition, the Informatica server adds a WHERE clause to the default query.
- Specify an outer join rather than the default inner join. If you include a user-defined join, the Informatica server replaces the join information specified by the metadata in the SQL query.
- Specify sorted ports. If you specify a number for sorted ports, the Informatica server adds an ORDER BY clause to the default SQL query.
- Select only distinct values from the source. If you choose Select Distinct, the Informatica server adds a SELECT DISTINCT statement to the default SQL query.
- Create a custom query to issue a special SELECT statement for the Informatica server to read source data. For example, you might use a custom query to perform aggregate calculations or execute a stored procedure. Cheers, Sithu. (A hedged sketch of the generated SQL appears after question 46 below.)
42.Informatica - What is the target load order?
You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica server loads data into the targets.
45.Informatica - What is the Update Strategy transformation?
This transformation is used to maintain history data, or just the most recent changes, in the target table. The Update Strategy transformation is used for flagging records for insert, update, delete and reject. The model you choose constitutes your update strategy: how to handle changes to existing rows. In PowerCenter and PowerMart you set your update strategy at two different levels:
- Within a session. When you configure a session, you can instruct the Informatica server either to treat all rows in the same way (for example, treat all rows as inserts) or to use instructions coded into the session mapping to flag rows for different database operations.
- Within a mapping. Within a mapping, you use the Update Strategy transformation to flag rows for insert, delete, update or reject.
46.Informatica - What is the default source option for the Update Strategy transformation?
Data driven.
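Following up on question 41, a hedged sketch of the shape of SQL the Source Qualifier generates when a filter condition, sorted ports and Select Distinct are used; the EMP table and columns are illustrative, and the actual generated SQL depends on the connected ports and the database.
-- Default query shape with Select Distinct, a filter condition and two sorted ports.
SELECT DISTINCT emp.empno, emp.ename, emp.deptno, emp.sal
FROM emp
WHERE emp.sal > 1000          -- added by the filter condition
ORDER BY emp.empno, emp.ename -- added by "Number of Sorted Ports" = 2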

47.Informatica - What is data driven?
The Informatica server follows the instructions coded into the Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete or reject. If you do not choose the Data Driven option, the Informatica server ignores all Update Strategy transformations in the mapping. When Data Driven is selected in the session properties, the code considers the update strategy constants (DD_UPDATE, DD_INSERT, DD_DELETE, DD_REJECT) used in the mapping and not the options selected in the session properties.
48.Informatica - What are the options in the target session properties for the Update Strategy transformation?
Insert, Delete, Update (Update as Update, Update as Insert, Update else Insert) and Truncate table.
Update as Insert: all the update records from the source are flagged as inserts in the target; in other words, instead of updating the records in the target, they are inserted as new records.
Update else Insert: this option enables Informatica to flag records for update if they are old, or for insert if they are new records from the source.
49.Informatica - What are the types of mapping wizards provided in Informatica?
The Designer provides two mapping wizards to help you create mappings quickly and easily. Both wizards are designed to create mappings for loading and maintaining star schemas, a series of dimensions related to a central fact table.
Getting Started Wizard: creates mappings to load static fact and dimension tables, as well as slowly growing dimension tables.
Slowly Changing Dimensions Wizard: creates mappings to load slowly changing dimension tables based on the amount of historical dimension data you want to keep and the method you choose to handle historical dimension data.
50.Informatica - What are the types of mapping in the Getting Started Wizard?
Simple pass-through mapping: loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all existing data from your table before loading new data.
Slowly growing target: loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load new data when existing data does not require updates.
52.Informatica - What are the different types of Type 2 dimension mappings?
Type 2 Dimension/Version Data mapping: in this mapping, an updated dimension in the source gets inserted into the target along with a new version number, and a newly added dimension in the source is inserted into the target with a primary key.
Type 2 Dimension/Flag Current mapping: this mapping is also used for slowly changing dimensions; in addition, it creates a flag value for changed or new dimensions.

The flag indicates whether the dimension is new or newly updated. Recent dimension rows are saved with the current flag value 1, and updated dimensions are saved with the value 0.
Type 2 Dimension/Effective Date Range mapping: this is another flavour of Type 2 mapping for slowly changing dimensions. It also inserts both new and changed dimensions into the target, and changes are tracked by an effective date range for each version of each dimension (see the SQL sketch after question 61 below).
58.Informatica - Why do we use partitioning of the session in Informatica?
Partitioning improves session performance by reducing the time needed to read the source and load the data into the target. Performance can be improved by processing data in parallel in a single session by creating multiple partitions of the pipeline. The Informatica server can achieve high performance by partitioning the pipeline and performing the extract, transformation and load for each partition in parallel.
59.Informatica - How does the Informatica server increase session performance through partitioning the source?
For relational sources, the Informatica server creates multiple connections, one for each partition of a single source, and extracts a separate range of data for each connection; it reads multiple partitions of a single source concurrently. Similarly, for loading, the Informatica server creates multiple connections to the target and loads partitions of data concurrently. For XML and file sources, the Informatica server reads multiple files concurrently. For loading the data, the Informatica server creates a separate file for each partition of a source file; you can choose to merge the targets.
60.Informatica - What are the tasks that the Load Manager process performs?
Manages session and batch scheduling: when you start the Informatica server, the Load Manager launches and queries the repository for a list of sessions configured to run on the Informatica server. When you configure a session, the Load Manager maintains the list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform validations and verifications prior to starting the DTM process.
Locking and reading the session: when the Informatica server starts a session, the Load Manager locks the session in the repository; locking prevents you from starting the session again and again.
Reading the parameter file: if the session uses a parameter file, the Load Manager reads it and verifies that the session-level parameters are declared in the file.
Verifying permissions and privileges: when the session starts, the Load Manager checks whether the user has the privileges to run the session.
Creating log files: the Load Manager creates a log file containing the status of the session.
61.Informatica - What are the different threads in the DTM process?
Master thread: creates and manages all other threads.
Mapping thread: one mapping thread is created for each session; it fetches session and mapping information.
Pre- and post-session threads: created to perform pre- and post-session operations.
Reader thread: one thread is created for each partition of a source; it reads data from the source.
Writer thread: created to load data to the target.
Transformation thread: created to transform data.
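A hedged SQL illustration of the Type 2 effective-date-range idea from question 52 above, using a made-up customer_dim table:
-- Current version of each customer in a Type 2 effective-date-range dimension.
SELECT customer_key, customer_id, customer_name, eff_start_date, eff_end_date
FROM customer_dim
WHERE eff_end_date IS NULL;  -- an open-ended range marks the current row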

63.Informatica - What is a batch, and describe the types of batches? 63 A grouping of sessions is known as a batch. Batches are of two types. Sequential: runs sessions one after the other. Concurrent: runs sessions at the same time. If you have sessions with source-target dependencies, use a sequential batch to start the sessions one after another. If you have several independent sessions, you can use a concurrent batch, which runs all the sessions at the same time. 65.Informatica - When does the Informatica server mark a batch as failed? 65 If one of the sessions is configured to "run if previous completes" and that previous session fails. A batch fails when the sessions in the workflow are checked with the property "Fail if parent fails" and any session in the sequential batch fails. 66.Informatica - What are the different options used to configure sequential batches? 66 Two options: run the session only if the previous session completes successfully, or always run the session. 67.Informatica - In a sequential batch, can you run a session if the previous session fails? 67 Yes, by setting the option to always run the session. 71.Informatica - What are the session parameters? 71 Session parameters, like mapping parameters, represent values you might want to change between sessions, such as database connections or source files. The Server Manager also allows you to create user-defined session parameters. The following are user-defined session parameters: database connections; source file name (use this parameter when you want to change the name or location of the session source file between session runs); target file name (use this parameter when you want to change the name or location of the session target file between session runs); reject file name (use this parameter when you want to change the name or location of the session reject file between session runs). 72.Informatica - What is a parameter file? 72 A parameter file defines the values for parameters and variables used in a session. A parameter file is created with a text editor such as WordPad or Notepad. You can define the following values in a parameter file: mapping parameters, mapping variables and session parameters.

73.Informatica - What is the difference between partitioning of relational targets and partitioning of file targets? 73 If you partition a session with a relational target, the Informatica server creates multiple connections to the target database to write target data concurrently. If you partition a session with a file target, the Informatica server creates one target file for each partition. You can configure the session properties to merge these target files. 74.Informatica - Performance tuning in Informatica? 74 The goal of performance tuning is to optimize session performance so that sessions run during the available load window for the Informatica Server. Increase session performance as follows. The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster; network connections therefore often affect session performance, so avoid unnecessary network hops. Flat files: if your flat files are stored on a machine other than the Informatica server, move those files to the machine that hosts the Informatica server. Relational data sources: minimize the connections to sources, targets and the Informatica server to improve session performance; moving the target database onto the server system may improve session performance. Staging areas: if you use staging areas, you force the Informatica server to perform multiple data passes; removing staging areas may improve session performance. You can run multiple Informatica servers against the same repository; distributing the session load across multiple Informatica servers may improve session performance. Running the Informatica server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store a character. If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes. You can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time; to do this, go to the Server Manager and choose Server Configure Database Connections. If your target has key constraints and indexes, they slow the loading of data; to improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes. Running parallel sessions by using concurrent batches also reduces the time needed to load the data, so concurrent batches may also increase session performance. Partitioning the session improves performance by creating multiple connections to sources and targets and loading data in parallel pipelines. If a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance. Avoid transformation errors to improve session performance. If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache. If your session contains a Filter transformation, create that Filter transformation nearer to the sources, or use a filter condition in the Source Qualifier. Aggregator, Rank and Joiner transformations often decrease session performance because they must group data before processing it; to improve session performance in this case, use the sorted ports option.

76.Informatica - Define the Informatica repository? 76 The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and Client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets. The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version. Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in the specific format the Informatica server and client tools use. 78.Informatica - What is the PowerCenter repository? 78 The PowerCenter repository allows you to share metadata across repositories to create a data mart domain. In a data mart domain, you can create a single global repository to store metadata used across an enterprise, and a number of local repositories to share the global metadata as needed. Standalone repository: a repository that functions individually, unrelated and unconnected to other repositories. Global repository (PowerCenter only): the centralized repository in a domain, a group of connected repositories. Each domain can contain one global repository. The global repository can contain common objects to be shared throughout the domain through global shortcuts. Local repository (PowerCenter only): a repository within a domain that is not the global repository. Each local repository in the domain can connect to the global repository and use objects in its shared folders. 41.What are Stored Procedure transformations? What is the purpose of the SP transformation, and how did you use it in your project? Stored Procedure transformations can be connected or unconnected. An unconnected stored procedure is used for database-level activities such as pre- and post-load work. A connected stored procedure is used at the Informatica level, for example passing one parameter as input and capturing the return value from the stored procedure. Normal - row-wise check. Pre-Load Source - capture source incremental data for incremental aggregation. Post-Load Source - delete temporary tables. Pre-Load Target - check the disk space available. Post-Load Target - drop and recreate indexes. 60.The Update Strategy is set to DD_UPDATE, but the session level has Insert. What will happen? The insert takes place, because the session-level option overrides the mapping-level option. 101.Variable v1 has the value 5 in the Designer (initial/default), 10 in the parameter file and 15 in the repository. While running the session, which value will Informatica read? Informatica reads the value 10 from the parameter file, since the precedence order is parameter file, then repository, then the initial value. 108.What does the first column of the bad file (rejected rows) indicate? First column - row indicator (0, 1, 2, 3). Second column - column indicator (D, O, N, T).

Incremental Aggregation Using this, you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture those changes, you can configure the session to process only the changes. This allows the server to update the target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session. Steps: The first time you run a session with incremental aggregation enabled, the server processes the entire source. At the end of the session, the server stores the aggregate data from that session run in two files, the index file and the data file, which it creates in a local directory. The second time you run the session, use only the changes in the source as source data for the session. The server then performs the following actions: for each input record, the session checks the historical information in the index file for a corresponding group. If it finds a corresponding group, the server performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental changes. Otherwise, the server creates a new group and saves the record data. Each subsequent time you run the session with incremental aggregation, you use only the incremental source changes in the session. If the source changes significantly, and you want the server to continue saving aggregate data for future incremental changes, configure the server to overwrite the existing aggregate data with new aggregate data. When writing to the target, the server applies the changes to the existing target: it updates modified aggregate groups in the target, inserts new aggregate data, deletes removed aggregate data, ignores unchanged aggregate data, and saves the modified aggregate data in the index/data files to be used as historical data the next time you run the session.
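The effect of incremental aggregation can be pictured with plain SQL. The sketch below is only an analogy, not what the server executes internally; the table names sales_agg (the aggregate target) and sales_delta (the captured incremental changes) are assumed for illustration.

    MERGE INTO sales_agg t
    USING (
        SELECT product_id,
               SUM(amount) AS delta_amount,
               COUNT(*)    AS delta_cnt
        FROM   sales_delta          -- only the rows captured since the last run
        GROUP  BY product_id
    ) d
    ON (t.product_id = d.product_id)
    WHEN MATCHED THEN
        UPDATE SET t.total_amount = t.total_amount + d.delta_amount,
                   t.row_cnt      = t.row_cnt + d.delta_cnt
    WHEN NOT MATCHED THEN
        INSERT (product_id, total_amount, row_cnt)
        VALUES (d.product_id, d.delta_amount, d.delta_cnt);

Existing groups are adjusted in place and new groups are inserted, which mirrors the find-group / create-group logic described above.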

Use incremental aggregation only if: the mapping includes an aggregate function; the source changes only incrementally; and you can capture the incremental changes, for example by filtering source data by timestamp.

SESSION LOGS Information that resides in a session log: allocation of system shared memory; execution of pre-session and post-session commands; session initialization; creation of SQL commands for reader/writer threads; start/end timings for target loading; errors encountered during the session; load summary of reader/writer/DTM statistics.

Other Information By default, the server generates log files based on the server code page.

Thread Identifier Ex: CMN_1039. Reader and writer thread codes have 3 digits and transformation codes have 4 digits. The number following a thread name indicates the following: (a) target load order group number (b) source pipeline number (c) partition number (d) aggregate/rank boundary number.

Log File Codes
Error code prefix - Description
BR - Related to the reader process, including ERP, relational and flat file sources
CMN - Related to database and memory allocation
DBGR - Related to the debugger
EP - External Procedure
LM - Load Manager
TM - DTM
REP - Repository
WRT - Writer

Load Summary: (a) Inserted (b) Updated (c) Deleted (d) Rejected

Statistics details

(a) Requested rows shows the number of rows the writer actually received for the specified operation. (b) Applied rows shows the number of rows the writer successfully applied to the target (without error). (c) Rejected rows shows the number of rows the writer could not apply to the target. (d) Affected rows shows the number of rows affected by the specified operation.
Detailed transformation statistics The server reports the following details for each transformation in the mapping: (a) name of the transformation (b) number of input rows and name of the input source (c) number of output rows and name of the output target (d) number of rows dropped.
Tracing Levels
Normal - Initialization and status information, errors encountered, transformation errors, rows skipped, summarized session details (not at the level of individual rows).
Terse - Initialization information as well as error messages, and notification of rejected data.
Verbose Init - In addition to normal tracing, names of the index and data files used and detailed transformation statistics.
Verbose Data - In addition to Verbose Init, each row that passes into the mapping and detailed transformation statistics.
NOTE When you enter a tracing level in the session property sheet, you override the tracing levels configured for transformations in the mapping.

Session Failures and Recovering Sessions Two types of errors occur in the server: non-fatal and fatal.

(a) Non-Fatal Errors An error that does not force the session to stop on its first occurrence. Establish the error threshold in the session property sheet with the Stop On option. When you enable this option, the server counts non-fatal errors that occur in the reader, writer and transformations. Reader errors can include alignment errors while running a session in Unicode mode. Writer errors can include key constraint violations, loading NULL into a NOT NULL field, and database errors. Transformation errors can include conversion errors and any condition set up as an ERROR, such as a NULL input. (b) Fatal Errors These occur when the server cannot access the source, target or repository. This can include loss of connection or target database errors, such as lack of database space to load data. If the session uses Normalizer or Sequence Generator transformations and the server cannot update the sequence values in the repository, a fatal error occurs. Others Usage of the ABORT function in mapping logic, to abort a session when the server encounters a transformation error. Stopping the server using pmcmd or the Server Manager.
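As an illustration of the ABORT function mentioned above, an expression port in a mapping might guard a mandatory field like this (the port name order_id is assumed for the example):

    IIF(ISNULL(order_id), ABORT('order_id is NULL - stopping the session'), order_id)

When the condition fires, the session stops and the message appears in the session log, instead of silently loading bad rows.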

Performing Recovery When the server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The server then reads all sources again and starts processing from the next row ID. By default, Perform Recovery is disabled in the setup, so the server does not make entries in the OPB_SRVR_RECOVERY table. The recovery session moves through the states of a normal session: scheduled, waiting to run, initializing, running, completed and failed. If the initial recovery fails, you can run recovery as many times as needed. The normal reject loading process can also be done as part of session recovery. The performance of recovery might be low if the mapping contains mapping variables or the commit interval is high.

Unrecoverable Sessions Under certain circumstances, when a session does not complete, you need to truncate the target and run the session from the beginning.

Commit Intervals A commit interval is the interval at which the server commits data to relational targets during a session. (a) Target-based commit The server commits data based on the number of target rows and the key constraints on the target table. The commit point also depends on the buffer block size and the commit interval. During a session, the server continues to fill the writer buffer after it reaches the commit interval. When the buffer block is full, the Informatica server issues a commit command. As a result, the amount of data committed at the commit point generally exceeds the commit interval. The server commits data to each target based on primary key - foreign key constraints.
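A small worked example of the behaviour above, with assumed numbers: suppose the commit interval is 10,000 rows and a writer buffer block holds 7,500 rows.

    rows written:  7,500  -> still below the interval, no commit yet
    rows written: 15,000  -> the interval (10,000) was crossed while the second
                             block was filling, so the commit is issued only
                             when that block is full
    actual commit point = 15,000 rows, i.e. larger than the configured 10,000

This is why the amount of data committed at a target-based commit point generally exceeds the configured commit interval.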

(b) Source-based commit The server commits data based on the number of source rows. The commit point is the commit interval you configure in the session properties. During a session, the server commits data to the target based on the number of rows from an active source in a single pipeline; these rows are referred to as source rows. A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier. Although the Filter, Router and Update Strategy transformations are active transformations, the server does not use them as active sources in a source-based commit session. When the server runs a session, it identifies the active source for each pipeline in the mapping. The server generates a commit row from the active source at every commit interval. When each target in the pipeline receives the commit row, the server performs the commit.

Reject Loading During a session, the server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the server writes the rejected row into the reject file. You can correct the rejected data and reload it into relational targets using the reject loading utility. (You cannot load rejected data into a flat file target.) Each time you run a session, the server appends the rejected data to the reject file. Locating the bad files: $PMBadFileDir/filename.bad. When you run a partitioned session, the server creates a separate reject file for each partition. Reading rejected data, for example: 3,D,1,D,D,0,D,1094345609,D,0,0.00

To help us find the reason for a rejection, there are two main things. (a) Row indicator The row indicator tells the writer what to do with the row of data.
Row indicator 0 - Insert - rejected by writer or target
Row indicator 1 - Update - rejected by writer or target
Row indicator 2 - Delete - rejected by writer or target
Row indicator 3 - Reject - rejected by writer

If the row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject. (b) Column indicator A column indicator is followed by the first column of data, and another column indicator; they appear after every column of data and define the type of the data preceding them.
D - Valid data - the writer treats it as good data; the target accepts it unless a database error occurs, such as finding a duplicate key
O - Overflow - bad data
N - Null - bad data
T - Truncated - bad data

NOTE NULL columns appear in the reject file with commas marking their place.

Correcting the Reject File Use the reject file and the session log to determine the cause of the rejected data. Keep in mind that correcting the reject file does not necessarily correct the source of the reject; correct the mapping and the target database to eliminate some of the rejected data when you run the session again. Trying to correct target-rejected rows before correcting writer-rejected rows is not recommended, since they may contain misleading column indicators.

For example, a series of N indicators might lead you to believe the target database does not accept NULL values, so you decide to change those NULL values to zero. However, if those rows also had a 3 in the row indicator column, the rows were rejected by the writer because of an update strategy expression, not because of a target database restriction. If you try to load the corrected file to the target, the writer will again reject those rows, and they will contain inaccurate 0 values in place of NULL values.

Why can the writer reject? Data overflowed column constraints, or an update strategy expression marked the row for reject.

Why can the target database reject? The data contains a NULL column, or there are database errors, such as key violations.

Steps for loading the reject file: After correcting the rejected data, rename the rejected file to reject_file.in. The reject loader uses the data movement mode configured for the server and the code page of the server/OS, so do not change these in the middle of reject loading. Use the reject loader utility: pmrejldr pmserver.cfg [folder name] [session name]. Other points: the server does not perform the following options when using the reject loader: (a) source-based commit (b) constraint-based loading (c) truncate target table (d) FTP targets (e) external loading.

Multiple reject loaders You can run the session several times and correct rejected data from the several sessions at once. You can correct and load all of the reject files at once, or work on one or two reject files, load them, and work on the others at a later time.

External Loading You can configure a session to use the Sybase IQ, Teradata and Oracle external loaders to load session target files into the respective databases. The External Loader option can increase session performance, since these databases can load information directly from files faster than they can process SQL commands that insert the same data into the database. Method:

When a session uses an external loader, the session creates a control file and a target flat file. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader. The control file has an extension of .ctl and you can view it in $PMTargetFileDir. To use an external loader, the following must be done: configure an external loader connection in the Server Manager; configure the session to write to a target flat file local to the server; choose an external loader connection for each target file in the session property sheet.

Issues with the External Loader: disable constraints; performance issues - increase commit intervals and turn off database logging.

Code page requirements. The server can use multiple external loaders within one session (for example, a session with two target files, one using the Oracle external loader and another using the Sybase external loader).

Other Information: External loader performance depends on the platform of the server. The server loads data at different stages of the session. The server writes external loader initialization and completion messages in the session log; however, details about external loader performance are written to the external loader log, which is stored in the same target directory. If the session contains errors, the server continues the external loading process. If the session fails, the server loads partial target data using the external loader. The external loader creates a reject file for data rejected by the database; the reject file has an extension of .ldrreject and is saved in the target file directory. You can load corrected data from that file using the database's own reject loader, not the Informatica reject load utility (this applies to the external loader reject file only).

Configuring the external loader in a session: in the Server Manager, open the session property sheet, select the file target, and then click Flat File Options.

Caches The server creates index and data caches in memory for the Aggregator, Rank, Joiner and Lookup transformations in a mapping. The server stores key values in the index cache and output values in the data cache; if the server requires more memory, it stores overflow values in cache files. When the session completes, the server releases cache memory and, in most circumstances, deletes the cache files.

Cache contents by transformation:
Aggregator - index cache stores group values as configured in the group-by ports; data cache stores calculations based on the group-by ports.
Rank - index cache stores group values as configured in the group-by ports; data cache stores ranking information based on the group-by ports.
Joiner - index cache stores index values for the master source table as configured in the join condition; data cache stores master source rows.
Lookup - index cache stores lookup condition information; data cache stores lookup data that is not stored in the index cache.
Determining cache requirements To calculate the cache size, you need to consider column and row requirements as well as processing overhead. The server requires processing overhead to cache data and index information. Column overhead includes a null indicator, and row overhead can include row-to-key information. Steps: first, add the total column size in the cache to the row overhead. Multiply the result by the number of groups (or rows) in the cache; this gives the minimum cache requirement. For the maximum requirement, multiply the minimum requirement by 2.

Location: by default, the server stores the index and data files in the directory $PMCacheDir. The server names the index files PMAGG*.idx and the data files PMAGG*.dat. If the size exceeds 2 GB, you may find multiple index and data files in the directory; the server appends a number to the end of the filename (PMAGG*.idx1, PMAGG*.idx2, etc.). Aggregator Caches When the server runs a session with an Aggregator transformation, it stores data in memory until it completes the aggregation.

When you partition a source, the server creates one memory cache and one disk cache for each partition. It routes data from one partition to another based on the group key values of the transformation.

The server uses memory to process an Aggregator transformation with sorted ports; it does not use cache memory. You do not need to configure cache memory for Aggregator transformations that use sorted ports.

Aggregator index cache: #Groups x ((Σ column size) + 7). Aggregator data cache: #Groups x ((Σ column size) + 7). Rank Cache When the server runs a session with a Rank transformation, it compares an input row with the rows in the data cache. If the input row out-ranks a stored row, the Informatica server replaces the stored row with the input row. If the Rank transformation is configured to rank across multiple groups, the server ranks incrementally for each group it finds.

Rank index cache: #Groups x ((Σ column size) + 7). Rank data cache: #Groups x [(#Ranks x (Σ column size + 10)) + 20]. Joiner Cache When the server runs a session with a Joiner transformation, it reads all rows from the master source and builds memory caches based on the master rows. After building these caches, the server reads rows from the detail source and performs the joins. The server creates the index cache as it reads the master source into the data cache, and uses the index cache to test the join condition. When it finds a match, it retrieves row values from the data cache. To improve Joiner performance, the server aligns all data for the Joiner cache on an eight-byte boundary.

Joiner index cache: #Master rows x ((Σ column size) + 16). Joiner data cache: #Master rows x ((Σ column size) + 8). Lookup Cache When the server runs a Lookup transformation, it builds a cache in memory when it processes the first row of data in the transformation. The server builds the cache and queries it for each row that enters the transformation. If you partition the source pipeline, the server allocates the configured amount of memory for each partition. If two Lookup transformations share the cache, the server does not allocate additional memory for the second Lookup transformation. The server creates the index and data cache files in the lookup cache directory and uses the server code page to create the files.

Lookup index cache: #Rows in lookup table x ((Σ column size) + 16). Lookup data cache: #Rows in lookup table x ((Σ column size) + 8).
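A quick worked example of the lookup sizing formulas above, with assumed numbers: a lookup table of 500,000 rows whose condition and output columns total 40 bytes per row.

    index cache = 500,000 x (40 + 16) = 28,000,000 bytes  (about 28 MB)
    data cache  = 500,000 x (40 + 8)  = 24,000,000 bytes  (about 24 MB)

Figures like these are only a starting point; the processing overhead described earlier still applies.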

Mapplets When the server runs a session using a mapplet, it expands the mapplet. The server then runs the session as it would any other session, passing data through each transformation in the mapplet. If you use a reusable transformation in a mapplet, changes to it can invalidate the mapplet and every mapping using the mapplet. You can create a non-reusable instance of a reusable transformation. Mapplet objects: (a) Input transformation (b) Source qualifier (c) Transformations, as you need (d) Output transformation

Mapplets do not support: Joiner, Normalizer, pre/post-session stored procedures, target definitions, XML source definitions.

Types of Mapplets: (a) Active mapplets - contain one or more active transformations. (b) Passive mapplets - contain only passive transformations.

Copied mapplets are not an instance of the original mapplet; if you make changes to the original, the copy does not inherit your changes. You can use a single mapplet more than once in a mapping. Ports - default values: input port - NULL; output port - ERROR; variables - do not support default values. Session Parameters These parameters represent values you might want to change between sessions, such as a DB connection or source file. We can use a session parameter in a session property sheet, then define the parameter in a session parameter file. The user-defined session parameters are: (a) DB connection (b) source file directory (c) target file directory (d) reject file directory

Description: Use session parameter to make sessions more flexible. For example, you have the same type of transactional data written to two different databases, and you use the database connections TransDB1 and TransDB2 to connect to the databases. You want to use the same mapping for both tables.

Instead of creating two sessions for the same mapping, you can create a database connection parameter, such as $DBConnectionSource, and use it as the source database connection for the session. When you create a parameter file for the session, you set $DBConnectionSource to TransDB1 and run the session. After it completes, set the value to TransDB2 and run the session again. NOTE: You can use several parameters together to make session management easier. Session parameters do not have default values; when the server cannot find a value for a session parameter, it fails to initialize the session. Session Parameter File A parameter file is created with a text editor. In it, we specify the folder and session name, then list the parameters and variables used in the session and assign each a value. Save the parameter file in any directory and point the server to it. We can define the following values in a parameter file: mapping parameters, mapping variables and session parameters. A small sample is sketched below.
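A minimal parameter file sketch following the format described above; the folder, session and parameter names (Sales_Folder, s_load_customers, $DBConnectionSource, $InputFile_customers, $$LastRunDate) are assumed for illustration only:

    [Sales_Folder.s_load_customers]
    $DBConnectionSource=TransDB1
    $InputFile_customers=/ftp_data/webrep/cust_list.txt
    $$LastRunDate=01/01/2005

One file can hold several such bracketed sections, one per session, and the values shown (connection name, path, date format) are placeholders.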

You can include parameter and variable information for more than one session in a single parameter file by creating separate sections for each session within the parameter file. You can override the parameter file for sessions contained in a batch by using a batch parameter file. A batch parameter file has the same format as a session parameter file.

Locale The Informatica server can transform character data in two modes: (a) ASCII - the default mode; passes 7-bit US-ASCII character data (one byte per character). (b) UNICODE - passes 8-bit, multibyte character data; the server uses 2 bytes for each character to move data and performs additional checks at the session level to ensure data integrity. Code pages contain the encoding to specify characters in a set of one or more languages. We can select a code page based on the type of character data in the mappings. Compatibility between code pages is essential for accurate data movement. The various code page components are: operating system locale settings, operating system code page, Informatica server data movement mode, Informatica server code page and Informatica repository code page. Locale settings include (a) the system locale (default settings for date, time and display), (b) the user locale and (c) the input locale.

Mapping Parameters and Variables

These represent values in mappings and mapplets. If we declare mapping parameters and variables in a mapping, we can reuse the mapping by altering the parameter and variable values in the session. This can reduce the overhead of creating multiple mappings when only certain attributes of a mapping need to be changed. Use a mapping parameter when you want to use the same value each time you run the session. Unlike a mapping parameter, a mapping variable represents a value that can change through the session. The server saves the value of a mapping variable to the repository at the end of each successful run and uses that value the next time you run the session. Mapping objects: Source, Target, Transformation, Cubes, Dimension. Debugger We can run the Debugger in two situations: (a) before the session - after saving the mapping, we can run some initial tests; (b) after the session - the real debugging process.

Metadata Reporter: a web-based application that allows you to run reports against repository metadata, including executed sessions, lookup table dependencies, mappings and source/target schemas. Repository Types of repository: (a) Global Repository - this is the hub of the domain; use the global repository to store common objects that multiple developers can use through shortcuts. These may include operational or application source definitions, reusable transformations, mapplets and mappings. (b) Local Repository - a local repository is within a domain and is not the global repository; use the local repository for development.

(c) Standalone Repository - a repository that functions individually, unrelated and unconnected to other repositories.

NOTE: Once you create a global repository, you cannot change it to a local repository; however, you can promote a local repository to a global repository. Batches Batches provide a way to group sessions for either serial or parallel execution by the server: sequential batches run sessions one after another, and concurrent batches run sessions at the same time.

Nesting Batches Each batch can contain any number of sessions or batches. We can nest batches several levels deep, defining batches within batches. Nested batches are useful when you want to control a complex series of sessions that must run sequentially or concurrently. Scheduling When you place sessions in a batch, the batch schedule overrides the session schedules by default. However, we can configure a batched session to run on its own schedule by selecting the Use Absolute Time session option. Server Behavior A server configured to run a batch overrides the server configuration to run sessions within the batch. If you have multiple servers, all sessions within a batch run on the Informatica server that runs the batch. The server marks a batch as failed if one of its sessions is configured to run if "Previous completes" and that previous session fails. Sequential Batch If you have sessions with dependent source/target relationships, you can place them in a sequential batch so that the Informatica server runs them in consecutive order. There are two ways of running sessions under this category: (a) run the session only if the previous one completes successfully; (b) always run the session (this is the default). Concurrent Batch In this mode, the server starts all of the sessions within the batch at the same time. Concurrent batches take advantage of the resources of the Informatica server, reducing the time it takes to run the sessions separately or in a sequential batch. Concurrent batch in a sequential batch If you have concurrent batches with source-target dependencies that benefit from running those batches in a particular order, place them, just like sessions, into a sequential batch.

Stopping and aborting a session If the session you want to stop is part of a batch, you must stop the batch. If the batch is part of a nested batch, stop the outermost batch. When you issue the stop command, the server stops reading data; it continues processing, writing and committing data to targets. If the server cannot finish processing and committing data, you can issue the ABORT command. It is similar to the stop command, except that it has a 60-second timeout: if the server cannot finish processing and committing data within 60 seconds, it kills the DTM process and terminates the session.

Recovery: After a session is stopped or aborted, the session results can be recovered. When recovery is performed, the session continues from the point at which it stopped. If you do not recover the session, the server runs the entire session the next time. Hence, after stopping or aborting, you may need to manually delete targets before the session runs again.

NOTE: The ABORT command and the ABORT function are different. When can a session fail? The server cannot allocate enough system resources; the session exceeds the maximum number of sessions the server can run concurrently; the server cannot obtain an execute lock for the session (the session is already locked); the server is unable to execute post-session shell commands or post-load stored procedures; the server encounters database errors; the server encounters transformation row errors (for example, a NULL value in a non-null field); network related errors.

When are pre/post shell commands useful? To delete a reject file, or to archive target files before the session begins. Session Performance Use minimum logging (Terse). Partition source data and perform the ETL for each partition in parallel (multiple CPUs are needed for this). Add indexes. Change the commit level. Use a Filter transformation to remove unwanted data movement. Increase buffer memory when handling large volumes of data. Multiple lookups can reduce performance; verify the largest lookup table and tune the expressions. At session level, the usual causes are a small cache size, low buffer memory and a small commit interval. At system level: on Windows NT/2000, use the Task Manager; on UNIX, use vmstat and iostat.

Hierarchy of optimization: Target, Source, Mapping, Session, System.

Optimizing target databases: drop indexes and constraints; increase checkpoint intervals; use bulk loading or external loading; turn off recovery; increase the database network packet size. A SQL sketch of the drop-and-rebuild step follows.
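The sketch below shows the drop-and-rebuild idea in SQL, assuming an Oracle target table SALES with an index SALES_IDX and a foreign key constraint SALES_FK; the object names are placeholders to adapt to your own schema.

    -- before the load
    ALTER TABLE sales DISABLE CONSTRAINT sales_fk;
    DROP INDEX sales_idx;

    -- ... run the Informatica session ...

    -- after the load
    CREATE INDEX sales_idx ON sales (sale_date, product_id);
    ALTER TABLE sales ENABLE CONSTRAINT sales_fk;

Statements like these are typically placed in pre- and post-load stored procedures, as mentioned elsewhere in these notes.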

Source level: optimize the query (for example, when it uses GROUP BY or ORDER BY); use conditional filters; connect to the RDBMS using the IPC protocol. Mapping level: optimize data type conversions; eliminate transformation errors; optimize transformations and expressions. Session level: use concurrent batches; partition sessions; reduce error tracing; remove staging areas; tune session parameters. System level: improve network speed; use multiple PM servers on separate systems; reduce paging.


Session Process

The Informatica server uses both process memory and system shared memory to perform the ETL process. It runs as a daemon on UNIX and as a service on Windows NT. The following processes are used to run a session: (a) the Load Manager process, which starts the session and creates the DTM process; (b) the DTM process, which creates the session, creates threads to initialize the session and to read, write and transform data, and handles pre/post-session operations.

Load Manager process: manages session/batch scheduling; locks the session; reads the parameter file; expands server/session variables and parameters; verifies permissions and privileges; creates the session log file.

DTM process: The primary purpose of the DTM is to create and manage threads that carry out the session tasks.

The DTM allocates process memory for the session and divides it into buffers; this is known as buffer memory. The default memory allocation is 12,000,000 bytes. It creates the main thread, called the master thread, which manages all other threads. The various threads and their functions:
Master thread - handles stop and abort requests from the Load Manager.
Mapping thread - one thread for each session; fetches session and mapping information, compiles the mapping and cleans up after execution.
Reader thread - one thread for each partition; relational sources use relational threads and flat files use file threads.
Writer thread - one thread for each partition; writes to the target.
Transformation thread - one or more transformation threads for each partition.
Note: when you run a session, the threads for a partitioned source execute concurrently. The threads use buffers to move and transform data.

What is the use of Forward/Reject rows in Mapping?

Q. What are the advantages of having an index? Or What is an index? The purpose of an index is to provide pointers to the rows in a table that contain a given key value. In a regular index, this is achieved by storing a list of rowids for each key corresponding to the rows with that key value. Oracle stores each key value repeatedly with each stored rowid. Q. What are the different types of indexes supported by Oracle? The different types of indexes are: a. B-tree indexes b. B-tree cluster indexes c. Hash cluster indexes d. Reverse key indexes e. Bitmap indexes Q. Can we have function based indexes? Yes, we can create indexes on functions and expressions that involve one or more columns in the table being indexed. A function-based index precomputes the value of the function or expression and stores it in the index. You can create a function-based index as either a B-tree or a bitmap index. Q. What are the restrictions on function based indexes? The function used for building the index can be an arithmetic expression or an expression that contains a PL/SQL function, package function, C callout, or SQL function. The expression cannot contain any aggregate functions, and it must be DETERMINISTIC. For building an index on a column containing an object type, the function can be a method of that object, such as a map method. However, you cannot build a function-based index on a LOB column, REF, or nested table column, nor can you build a function-based index if the object type contains a LOB, REF, or nested table.
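As a concrete illustration of the function-based index discussed above, a short SQL sketch against an assumed EMP table (the table, column and index names are placeholders):

    -- B-tree function-based index supporting case-insensitive searches on ENAME
    CREATE INDEX emp_upper_ename_idx ON emp (UPPER(ename));

    -- a query that can use the index
    SELECT empno, ename FROM emp WHERE UPPER(ename) = 'SMITH';

The optimizer can only use the index when the query references the same expression that was indexed.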

Q. What are the advantages of having a B-tree index? The major advantages of having a B-tree index are: 1. B-trees provide excellent retrieval performance for a wide range of queries, including exact match and range searches. 2. Inserts, updates, and deletes are efficient, maintaining key order for fast retrieval. 3. B-tree performance is good for both small and large tables, and does not degrade as the size of a table grows. Q. What is a bitmap index? (KPIT Infotech, Pune) The purpose of an index is to provide pointers to the rows in a table that contain a given key value. In a regular index, this is achieved by storing a list of rowids for each key corresponding to the rows with that key value. Oracle stores each key value repeatedly with each stored rowid. In a bitmap index, a bitmap for each key value is used instead of a list of rowids. Each bit in the bitmap corresponds to a possible rowid. If the bit is set, then it means that the row with the corresponding rowid contains the key value. A mapping function converts the bit position to an actual rowid, so the bitmap index provides the same functionality as a regular index even though it uses a different representation internally. If the number of different key values is small, then bitmap indexes are very space efficient. Bitmap indexing efficiently merges indexes that correspond to several conditions in a WHERE clause. Rows that satisfy some, but not all, conditions are filtered out before the table itself is accessed. This improves response time, often dramatically. Q. What are the advantages of having bitmap index for data warehousing applications? (KPIT Infotech, Pune) Bitmap indexing benefits data warehousing applications which have large amounts of data and ad hoc queries but a low level of concurrent transactions. For such applications, bitmap indexing provides: 1. Reduced response time for large classes of ad hoc queries 2. A substantial reduction of space usage compared to other indexing techniques 3. Dramatic performance gains even on very low end hardware 4. Very efficient parallel DML and loads Q. What is the advantage of bitmap index over B-tree index? Fully indexing a large table with a traditional B-tree index can be prohibitively expensive in terms of space since the index can be several times larger than the data in the table. Bitmap indexes are typically only a fraction of the size of the indexed data in the table. Q. What is the limitation/drawback of a bitmap index? Bitmap indexes are not suitable for OLTP applications with large numbers of concurrent transactions modifying the data. These indexes are primarily intended for decision support in data warehousing applications where users typically query the data rather than update it. Bitmap indexes are not suitable for high-cardinality data. Q. How do you choose between B-tree index and bitmap index? The advantages of using bitmap indexes are greatest for low cardinality columns: that is, columns in which the number of distinct values is small compared to the number of rows in the table. If the values in a column are repeated more than a hundred times, then the column is a candidate for a bitmap index. Even columns with a lower number of repetitions and thus higher cardinality, can be candidates if they tend to be involved in complex conditions in the WHERE clauses of queries. For example, on a table with one million rows, a column with 10,000 distinct values is a candidate for a bitmap index. 
A bitmap index on this column can out-perform a B-tree index, particularly when this column is often queried in conjunction with other columns. B-tree indexes are most effective for high-cardinality data: that is, data with many possible values, such as CUSTOMER_NAME or PHONE_NUMBER. A regular B-tree index can be several times larger than the indexed data. Used appropriately, bitmap indexes can be significantly smaller than a corresponding B-tree index.
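A short SQL sketch of the bitmap-versus-B-tree choice described above, using an assumed CUSTOMERS table where GENDER is low-cardinality and PHONE_NUMBER is high-cardinality:

    -- low-cardinality column: a bitmap index is typically much smaller
    CREATE BITMAP INDEX customers_gender_bix ON customers (gender);

    -- high-cardinality column: a regular B-tree index is the usual choice
    CREATE INDEX customers_phone_idx ON customers (phone_number);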

Q. What are clusters? Clusters are an optional method of storing table data. A cluster is a group of tables that share the same data blocks because they share common columns and are often used together. For example, the EMP and DEPT table share the DEPTNO column. When you cluster the EMP and DEPT tables, Oracle physically stores all rows for each department from both the EMP and DEPT tables in the same data blocks. Q. What is partitioning? (KPIT Infotech, Pune) Partitioning addresses the key problem of supporting very large tables and indexes by allowing you to decompose them into smaller and more manageable pieces called partitions. Once partitions are defined, SQL statements can access and manipulate the partitions rather than entire tables or indexes. Partitions are especially useful in data warehouse applications, which commonly store and analyze large amounts of historical data. Q. What are the different partitioning methods? Two primary methods of partitioning are available: 1. range partitioning, which partitions the data in a table or index according to a range of values, and 2. hash partitioning, which partitions the data according to a hash function. Another method, composite partitioning, partitions the data by range and further subdivides the data into sub partitions using a hash function. Q. What is the necessity to have table partitions? The need to partition large tables is driven by: Data Warehouse and Business Intelligence demands for ad hoc analysis on great quantities of historical data Cheaper disk storage Application performance failure due to use of traditional techniques Q. What are the advantages of storing each partition in a separate tablespace? The major advantages are: 1. You can contain the impact of data corruption. 2. You can back up and recover each partition or subpartition independently. 3. You can map partitions or subpartitions to disk drives to balance the I/O load. Q. What are the advantages of partitioning? Partitioning is useful for: 1. Very Large Databases (VLDBs) 2. Reducing Downtime for Scheduled Maintenance 3. Reducing Downtime Due to Data Failures 4. DSS Performance 5. I/O Performance 6. Disk Striping: Performance versus Availability 7. Partition Transparency Q. What is Range Partitioning? (KPIT Infotech, Pune) Range partitioning maps rows to partitions based on ranges of column values. Range partitioning is defined by the partitioning specification for a table or index: PARTITION BY RANGE ( column_list ) and by the partitioning specifications for each individual partition: VALUES LESS THAN ( value_list ) Q. What is Hash Partitioning? Hash partitioning uses a hash function on the partitioning columns to stripe data into partitions. Hash partitioning allows data that does not lend itself to range partitioning to be easily partitioned for performance reasons such as parallel DML, partition pruning, and partition-wise joins. Q. What are the advantages of Hash partitioning over Range Partitioning? Hash partitioning is a better choice than range partitioning when: a) You do not know beforehand how much data will map into a given range b) Sizes of range partitions would differ quite substantially c) Partition pruning and partition-wise joins on a partitioning key are important
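A sketch of the PARTITION BY RANGE / VALUES LESS THAN syntax mentioned above, with an assumed SALES table partitioned by year (names and boundaries are placeholders):

    CREATE TABLE sales (
        sale_id   NUMBER,
        sale_date DATE,
        amount    NUMBER(12,2)
    )
    PARTITION BY RANGE (sale_date) (
        PARTITION sales_2003 VALUES LESS THAN (TO_DATE('01-01-2004', 'DD-MM-YYYY')),
        PARTITION sales_2004 VALUES LESS THAN (TO_DATE('01-01-2005', 'DD-MM-YYYY')),
        PARTITION sales_max  VALUES LESS THAN (MAXVALUE)
    );

Queries that filter on SALE_DATE can then be pruned to only the relevant partitions.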

Q. What are the rules for partitioning a table? A table can be partitioned if: It is not part of a cluster It does not contain LONG or LONG RAW datatypes Q. What is a global partitioned index? In a global partitioned index, the keys in a particular index partition may refer to rows stored in more than one underlying table partition or subpartition. A global index can only be range-partitioned, but it can be defined on any type of partitioned table. Q. What are the different types of locks? There are five kinds of locks on repository objects: Read lock. Created when you open a repository object in a folder for which you do not have write permission. Also created when you open an object with an existing write lock. Write lock. Created when you create or edit a repository object in a folder for which you have write permission. Execute lock. Created when you start a session or batch, or when the Informatica Server starts a scheduled session or batch. Fetch lock. Created when the repository reads information about repository objects from the database. Save lock. Created when you save information to the repository.

Q. What is Event-Based Scheduling? When you use event-based scheduling, the Informatica Server starts a session when it locates the specified indicator file. To use event-based scheduling, you need a shell command, script, or batch file to create an indicator file when all sources are available. The file must be created or sent to a directory local to the Informatica Server. The file can be of any format recognized by the Informatica Server operating system. The Informatica Server deletes the indicator file once the session starts. Q: Why doesn't constraint based load order work with a mapplet? (08 May 2000) If your mapplet has a sequence generator (reusable) that's mapped with data straight to an "OUTPUT" designation, and then the map splits the output to two tables, parent/child, and your session is marked with "Constraint Based Load Ordering", you may have experienced a load problem where the constraints do not appear to be met. The problem is in the perception of what an "OUTPUT" designation is. The OUTPUT component is NOT an "object" that collects a "row" as a row before pushing it downstream. An OUTPUT component is merely a pass-through structural object - as indicated, there are no data types on the INPUT or OUTPUT components of a mapplet - thus indicating merely structure. To make the constraint based load order work properly, move all the ports through a single expression, then through the OUTPUT component; this will force a single row to be "put together" and passed along to the receiving mapplet. Otherwise the sequence generator generates one new sequence ID for each split target on the other side of the OUTPUT component. Q: How do I handle duplicate rows coming in from a flat file? If you don't care about "reporting" duplicates, use an aggregator. Set the Group By ports to group by the primary key in the parent target table. Keep in mind that using an aggregator causes the following: the last duplicate row in the file is pushed through as the one and only row, you lose the ability to detect which rows are duplicates, and the data is cached before processing in the map continues. If you wish to report duplicates, then follow the suggestions in the presentation slides (available on this web site) to institute a staging table. See the pros and cons of staging tables, and what they can do for you.

Q: What happens in a database when a cached LOOKUP object is created (during a session)? The session generates a SELECT statement with an ORDER BY clause. Any time this is issued, databases like Oracle and Sybase will select (read) all the data from the table into the temporary database/space. Then the data will be sorted and read in chunks back to the Informatica server. This means that hot-spot contention for a cached lookup will NOT be on the table it just read from; it will be on the TEMP area in the database, particularly if the TEMP area is being utilized for other things. Also, once the cache is created, it is not re-read until the next running session re-creates it. Q: Can you explain how "constraint based load ordering" works? (27 Jan 2000) Constraint based load ordering in PowerMart / PowerCenter works like this: it controls the order in which the target tables are committed to a relational database. It is of no use when sending information to a flat file. To construct the proper constraint order, links between the TARGET tables in Informatica need to be constructed. Simply turning on "constraint based load ordering" has no effect on the operation itself; Informatica does NOT read constraints from the database when this switch is turned on. Again, to take advantage of this switch, you must construct primary / foreign key relationships in the TARGET TABLES in the Informatica Designer. Creating primary / foreign key relationships is difficult - you are only allowed to link a single port (field) to a single table as a primary / foreign key. What is the method of loading 5 flat files having the same structure into a single target, and which transformations will you use? This can be handled by using a file list in Informatica. If we have 5 files in different locations on the server and we need to load them into a single target table, then in the session properties we need to change the file type to Indirect. (Choose Direct if the source file contains the source data. Choose Indirect if the source file contains a list of files; when you select Indirect, the PowerCenter Server finds the file list and then reads each listed file when it executes the session.) I take a notepad file, list the following paths and file names in it, and save it as emp_source.txt in the directory /ftp_data/webrep/ /ftp_data/webrep/SrcFiles/abc.txt /ftp_data/webrep/bcd.txt /ftp_data/webrep/srcfilesforsessions/xyz.txt /ftp_data/webrep/SrcFiles/uvw.txt /ftp_data/webrep/pqr.txt In the session properties I give /ftp_data/webrep/ as the directory path, the file name as emp_source.txt, and the file type as Indirect. Other methods to Improve Performance Optimizing the Target Database If your session writes to a flat file target, you can optimize session performance by writing to a flat file target that is local to the Informatica Server. If your session writes to a relational target, consider performing the following tasks to increase performance: Drop indexes and key constraints. Increase checkpoint intervals. Use bulk loading. Use external loading. Turn off recovery. Increase database network packet size. Optimize Oracle target databases.

Dropping Indexes and Key Constraints When you define key constraints or indexes in target tables, you slow the loading of data to those tables. To improve performance, drop indexes and key constraints before running your session. You can rebuild those indexes and key constraints after the session completes. If you decide to drop and rebuild indexes and key constraints on a regular basis, you can create pre- and postload stored procedures to perform these operations each time you run the session. Note: To optimize performance, use constraint-based loading only if necessary. Increasing Checkpoint Intervals The Informatica Server performance slows each time it waits for the database to perform a checkpoint. To increase performance, consider increasing the database checkpoint interval. When you increase the database checkpoint interval, you increase the likelihood that the database performs checkpoints as necessary, when the size of the database log file reaches its limit. Bulk Loading on Sybase and Microsoft SQL Server You can use bulk loading to improve the performance of a session that inserts a large amount of data to a Sybase or Microsoft SQL Server database. Configure bulk loading on the Targets dialog box in the session properties. When bulk loading, the Informatica Server bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, the Informatica Server cannot perform recovery of the session. Therefore, you must weigh the importance of improved session performance against the ability to recover an incomplete session. If you have indexes or key constraints on your target tables and you want to enable bulk loading, you must drop the indexes and constraints before running the session. After the session completes, you can rebuild them. If you decide to use bulk loading with the session on a regular basis, you can create pre- and post-load stored procedures to drop and rebuild indexes and key constraints. For other databases, even if you configure the bulk loading option, Informatica Server ignores the commit interval mentioned and commits as needed. External Loading on Teradata, Oracle, and Sybase IQ You can use the External Loader session option to integrate external loading with a session. If you have a Teradata target database, you can use the Teradata external loader utility to bulk load target files. If your target database runs on Oracle, you can use the Oracle SQL*Loader utility to bulk load target files. When you load data to an Oracle database using a partitioned session, you can increase performance if you create the Oracle target table with the same number of partitions you use for the session. If your target database runs on Sybase IQ, you can use the Sybase IQ external loader utility to bulk load target files. If your Sybase IQ database is local to the Informatica Server on your UNIX system, you can increase performance by loading data to target tables directly from named pipes. Use pmconfig to enable the SybaseIQLocaltoPMServer option. When you enable this option, the Informatica Server loads data directly from named pipes rather than writing to a flat file for the Sybase IQ external loader. Increasing Database Network Packet Size You can increase the network packet size in the Informatica Server Manager to reduce target bottleneck. For Sybase and Microsoft SQL Server, increase the network packet size to 8K - 16K. 
For Oracle, increase the network packet size in tnsnames.ora and listener.ora. If you increase the network packet size in the Informatica Server configuration, you also need to configure the database server network memory to accept larger packet sizes.

Optimizing Oracle Target Databases
If your target database is Oracle, you can optimize the target database by checking the storage clause, space allocation, and rollback segments. When you write to an Oracle database, check the storage clause for database objects. Make sure that tables are using large initial and next values. The database should also store table and index data in separate tablespaces, preferably on different disks.
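The paragraphs above suggest dropping indexes and key constraints before the load and rebuilding them afterward, and giving Oracle target tables large initial and next extents with table and index data in separate tablespaces. A minimal Oracle SQL sketch of both ideas, using hypothetical table, index, and tablespace names:

-- Target table with an explicit storage clause: large INITIAL/NEXT extents,
-- with data and index in separate tablespaces (ts_sales_dat / ts_sales_idx are hypothetical)
CREATE TABLE t_sales (
  sales_id NUMBER,
  cust_key NUMBER,
  amount   NUMBER(12,2)
) TABLESPACE ts_sales_dat
  STORAGE (INITIAL 100M NEXT 100M);

CREATE INDEX idx_sales_custkey ON t_sales (cust_key) TABLESPACE ts_sales_idx;

-- Pre-load step: drop the index so the bulk insert does not maintain it row by row
DROP INDEX idx_sales_custkey;

-- Post-load step: rebuild the index after the session completes
CREATE INDEX idx_sales_custkey ON t_sales (cust_key) TABLESPACE ts_sales_idx;

In practice the DROP/CREATE pair would typically be wrapped in the pre- and post-load stored procedures mentioned earlier, so the index maintenance runs automatically with each session.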

When you write to Oracle target databases, the database uses rollback segments during loads. Make sure that the database stores rollback segments in appropriate tablespaces, preferably on different disks. The rollback segments should also have appropriate storage clauses. You can optimize the Oracle target database by tuning the Oracle redo log. The Oracle database uses the redo log to log loading operations. Make sure that redo log size and buffer size are optimal. You can view redo log properties in the init.ora file. If your Oracle instance is local to the Informatica Server, you can optimize performance by using IPC protocol to connect to the Oracle database. You can set up the Oracle database connection in listener.ora and tnsnames.ora.

Improving Performance at Mapping Level
Optimizing Datatype Conversions
Forcing the Informatica Server to make unnecessary datatype conversions slows performance. For example, if your mapping moves data from an Integer column to a Decimal column, then back to an Integer column, the unnecessary datatype conversion slows performance. Where possible, eliminate unnecessary datatype conversions from mappings. Some datatype conversions can improve system performance. Use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations. For example, many databases store U.S. zip code information as a Char or Varchar datatype. If you convert your zip code data to an Integer datatype, the lookup database stores the zip code 94303-1234 as 943031234. This helps increase the speed of the lookup comparisons based on zip code.

Optimizing Lookup Transformations
If a mapping contains a Lookup transformation, you can optimize the lookup. Some of the things you can do to increase performance include caching the lookup table, optimizing the lookup condition, or indexing the lookup table.

Caching Lookups
If a mapping contains Lookup transformations, you might want to enable lookup caching. In general, you want to cache lookup tables that need less than 300MB. When you enable caching, the Informatica Server caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the Informatica Server queries the lookup table on a row-by-row basis. You can increase performance using a shared or persistent cache:
Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping. You can share a named cache between transformations in the same or different mappings.
Persistent cache. If you want to save and reuse the cache files, you can configure the transformation to use a persistent cache. Use this feature when you know the lookup table does not change between session runs. Using a persistent cache can improve performance because the Informatica Server builds the memory cache from the cache files instead of from the database.

Reducing the Number of Cached Rows
Use the Lookup SQL Override option to add a WHERE clause to the default SQL statement. This allows you to reduce the number of rows included in the cache.

Optimizing the Lookup Condition
If you include more than one lookup condition, place the conditions with an equal sign first to optimize lookup performance.

Indexing the Lookup Table
The Informatica Server needs to query, sort, and compare values in the lookup condition columns. The index needs to include every column used in a lookup condition.
You can improve performance for both cached and uncached lookups: Cached lookups. You can improve performance by indexing the columns in the lookup ORDER BY. The session log contains the ORDER BY statement.

Uncached lookups. Because the Informatica Server issues a SELECT statement for each row passing into the Lookup transformation, you can improve performance by indexing the columns in the lookup condition.

Improving Performance at Repository Level
Tuning Repository Performance
The PowerMart and PowerCenter repository has more than 80 tables, and almost all tables use one or more indexes to speed up queries. Most databases keep and use column distribution statistics to determine which index to use to execute SQL queries optimally. Database servers do not update these statistics continuously. In frequently used repositories, these statistics can become outdated very quickly and SQL query optimizers may choose a less than optimal query plan. In large repositories, the impact of choosing a sub-optimal query plan can affect performance drastically. Over time, the repository becomes slower and slower. To optimize SQL queries, you should update these statistics regularly. The frequency of updating statistics depends on how heavily the repository is used. Updating statistics is done table by table. The database administrator can create scripts to automate the task. You can use the following information to generate scripts to update distribution statistics. Note: All PowerMart/PowerCenter repository table and index names begin with OPB_.

Oracle Database
You can generate scripts to update distribution statistics for an Oracle repository. To generate scripts for an Oracle repository:
1. Run the following queries:
select 'analyze table ', table_name, ' compute statistics;' from user_tables where table_name like 'OPB_%'
select 'analyze index ', INDEX_NAME, ' compute statistics;' from user_indexes where INDEX_NAME like 'OPB_%'
This produces output like the following:
'ANALYZETABLE'  TABLE_NAME        'COMPUTESTATISTICS;'
--------------  ----------------  --------------------
analyze table   OPB_ANALYZE_DEP   compute statistics;
analyze table   OPB_ATTR          compute statistics;
analyze table   OPB_BATCH_OBJECT  compute statistics;
2. Save the output to a file.
3. Edit the file and remove all the headers. Headers look like the following:
'ANALYZETABLE'  TABLE_NAME        'COMPUTESTATISTICS;'
--------------  ----------------  --------------------
4. Run the file as an SQL script. This updates repository table statistics.

Microsoft SQL Server
You can generate scripts to update distribution statistics for a Microsoft SQL Server repository. To generate scripts for a Microsoft SQL Server repository:
1. Run the following query:
select 'update statistics ', name from sysobjects where name like 'OPB_%'
This produces output like the following:
------------------  ------------------
update statistics   OPB_ANALYZE_DEP
update statistics   OPB_ATTR
update statistics   OPB_BATCH_OBJECT
2. Save the output to a file.
3. Edit the file and remove the header information. Headers look like the following:
name
------------------  ------------------
4. Add a go at the end of the file.
5. Run the file as a SQL script. This updates repository table statistics.
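For the Oracle repository script above, note that on more recent Oracle releases the same statistics are usually gathered with DBMS_STATS rather than ANALYZE. A minimal PL/SQL sketch, assuming the repository tables are owned by the connected user:

BEGIN
  FOR t IN (SELECT table_name
              FROM user_tables
             WHERE table_name LIKE 'OPB_%') LOOP
    -- cascade => TRUE also gathers statistics on each table's indexes
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER,
                                  tabname => t.table_name,
                                  cascade => TRUE);
  END LOOP;
END;
/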

Improving Performance at Session Level
Optimizing the Session
Once you optimize your source database, target database, and mapping, you can focus on optimizing the session. You can perform the following tasks to improve overall performance:
Run concurrent batches.
Partition sessions.
Reduce error tracing.
Remove staging areas.
Tune session parameters.
Table 19-1 lists the settings and values you can use to improve session performance:
Table 19-1. Session Tuning Parameters
Setting               Default Value             Minimum Suggested Value  Maximum Suggested Value
DTM Buffer Pool Size  12,000,000 bytes [12 MB]  6,000,000 bytes          128,000,000 bytes
Buffer block size     64,000 bytes [64 KB]      4,000 bytes              128,000 bytes
Index cache size      1,000,000 bytes           1,000,000 bytes          12,000,000 bytes
Data cache size       2,000,000 bytes           2,000,000 bytes          24,000,000 bytes
Commit interval       10,000 rows               N/A                      N/A
Decimal arithmetic    Disabled                  N/A                      N/A
Tracing Level         Normal                    Terse                    N/A

How do you correct and load the rejected files when the session completes?

During a session, the Informatica Server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the Informatica Server writes the rejected row into the reject file. By default, the Informatica Server creates reject files in the $PMBadFileDir server variable directory. The reject file and session log contain information that helps you determine the cause of the reject. You can correct reject files and load them to relational targets using the Informatica reject loader utility. The reject loader also creates another reject file for the data that the writer or target rejects during reject loading. Complete the following tasks to load reject data into the target:
Locate the reject file.
Correct bad data.
Run the reject loader utility.

NOTE: You cannot load rejected data into a flat file target.

After you locate a reject file, you can read it using a text editor that supports the reject file code page. Reject files contain rows of data rejected by the writer or the target database. Though the Informatica Server writes the entire row in the reject file, the problem generally centers on one column within the row. To help you determine which column caused the row to be rejected, the Informatica Server adds row and column indicators to give you more information about each column: Row indicator. The first column in each row of the reject file is the row indicator. The numeric indicator tells whether the row was marked for insert, update, delete, or reject. Column indicator. Column indicators appear after every column of data. The alphabetical character indicators tell whether the data was valid, overflow, null, or truncated.

The following sample reject file shows the row and column indicators:
3,D,1,D,,D,0,D,1094945255,D,0.00,D,-0.00,D
0,D,1,D,April,D,1997,D,1,D,-1364.22,D,-1364.22,D
0,D,1,D,April,D,2000,D,1,D,2560974.96,D,2560974.96,D
3,D,1,D,April,D,2000,D,0,D,0.00,D,0.00,D
0,D,1,D,August,D,1997,D,2,D,2283.76,D,4567.53,D
0,D,3,D,December,D,1999,D,1,D,273825.03,D,273825.03,D
0,D,1,D,September,D,1997,D,1,D,0.00,D,0.00,D

Row Indicators
The first column in the reject file is the row indicator. The number listed as the row indicator tells the writer what to do with the row of data. Table 15-1 describes the row indicators in a reject file:
Table 15-1. Row Indicators in Reject File
Row Indicator  Meaning  Rejected By
0              Insert   Writer or target
1              Update   Writer or target
2              Delete   Writer or target
3              Reject   Writer

If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject. If a row indicator is 0, 1, or 2, either the writer or the target database rejected the row. To narrow down the reason why rows marked 0, 1, or 2 were rejected, review the column indicators and consult the session log.

Column Indicators
After the row indicator is a column indicator, followed by the first column of data, and another column indicator. Column indicators appear after every column of data and define the type of the data preceding it.

Table 15-2 describes the column indicators in a reject file:
Table 15-2. Column Indicators in Reject File
D - Valid data. The writer treats it as good data and passes it to the target database. The target accepts it unless a database error occurs, such as finding a duplicate key.
O - Overflow. Numeric data exceeded the specified precision or scale for the column. Treated as bad data if you configured the mapping target to reject overflow or truncated data.
N - Null. The column contains a null value. Treated as good data; the writer passes it to the target, which rejects it if the target database does not accept null values.
T - Truncated. String data exceeded a specified precision for the column, so the Informatica Server truncated it. Treated as bad data if you configured the mapping target to reject overflow or truncated data.

After you correct the target data in each of the reject files, append .in to each reject file you want to load into the target database. For example, after you correct the reject file, t_AvgSales_1.bad, you can rename it t_AvgSales_1.bad.in. After you correct the reject file and rename it to reject_file.in, you can use the reject loader to send those files through the writer to the target database.

Use the reject loader utility from the command line to load rejected files into target tables. The syntax for reject loading differs on UNIX and Windows NT/2000 platforms. Use the following syntax for UNIX: pmrejldr pmserver.cfg [folder_name:]session_name Use the following syntax for Windows NT/2000: pmrejldr [folder_name:]session_name

Recovering Sessions

If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of the mapping, session, and Informatica Server configuration. Use one of the following methods to complete the session:
Run the session again if the Informatica Server has not issued a commit.
Truncate the target tables and run the session again if the session is not recoverable.
Consider performing recovery if the Informatica Server has issued at least one commit.
When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The Informatica Server then reads all sources again and starts processing from the next row ID. For example, if the Informatica Server commits 10,000 rows before the session fails, when you run recovery, the Informatica Server bypasses the rows up to 10,000 and starts loading with row 10,001. The commit point may be different for source- and target-based commits. By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in the Informatica Server setup before you run a session so the Informatica Server can create and/or write entries in the OPB_SRVR_RECOVERY table.

Causes for Session Failure
Reader errors. Errors encountered by the Informatica Server while reading the source database or source files. Reader threshold errors can include alignment errors while running a session in Unicode mode.
Writer errors. Errors encountered by the Informatica Server while writing to the target database or target files. Writer threshold errors can include key constraint violations, loading nulls into a not null field, and database trigger responses.
Transformation errors. Errors encountered by the Informatica Server while transforming data. Transformation threshold errors can include conversion errors, and any condition set up as an ERROR, such as null input.

Fatal Error
A fatal error occurs when the Informatica Server cannot access the source, target, or repository. This can include loss of connection or target database errors, such as lack of database space to load data. If the session uses a Normalizer or Sequence Generator transformation, the Informatica Server cannot update the sequence values in the repository, and a fatal error occurs.

What is target load order?
You specify the target load order based on source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica Server loads data into the targets.

Can we use an aggregator/active transformation after an Update Strategy transformation?
You can use an Aggregator after an Update Strategy. The problem is that once you perform the update strategy - say you have flagged some rows to be deleted - and you then perform an Aggregator transformation on all rows using, say, the SUM function, the rows flagged for delete will be subtracted from the aggregate result.

How can we join tables if they have no primary and foreign key relationship and no matching port to join on?
Without a common column or common data type we can join two sources using dummy ports (see the SQL sketch after this set of questions):
1. Add one dummy port to each source pipeline.
2. In an Expression transformation, assign the constant '1' to each dummy port.
3. Use a Joiner transformation to join the sources on the dummy ports (the join condition compares the two dummy ports).

In which circumstances does the Informatica server create reject files?
When it encounters DD_Reject in an Update Strategy transformation.
When a row violates a database constraint.
When a field in a row is truncated or overflowed.

When do we use a dynamic cache and when do we use a static cache in connected and unconnected Lookup transformations?
We use a dynamic cache only for a connected lookup. We use a dynamic cache to check whether a record already exists in the target table or not, and depending on that we insert, update, or delete the records using an Update Strategy. Static cache is the default cache in both connected and unconnected lookups. If you select a static cache for the lookup table, the cache is not updated during the session and the rows in the cache remain constant. We use this to check results and also to update slowly changing records.

How do you get two targets, T1 containing distinct values and T2 containing duplicate values, from one source S1?
Load T1 through a transformation that removes duplicates (for example, an Aggregator grouping on the key or a Sorter with the Distinct option), and load T2 directly from the source.

How do you delete duplicate rows from a flat file source? Is there an option in Informatica?
Use a Sorter transformation; it has a "distinct" option - make use of it.

Why did you use an Update Strategy in your application?
An Update Strategy is used to drive the data to be inserted, updated, or deleted depending on some condition. You can do this at the session level too, but there you cannot define any condition. For example, if you want to do both an update and an insert in one mapping, you create two flows and mark one as insert and one as update depending on some condition. Refer to "Update Strategy" in the Transformation Guide for more information.

What are the options in the target session for the Update Strategy transformation?
Update as Insert: This option specifies that all the update records from the source are flagged as inserts in the target. In other words, instead of updating the records in the target, they are inserted as new records.
Update else Insert: This option enables Informatica to flag the records either for update if they already exist, or for insert if they are new records from the source.

What are the different types of Type 2 dimension mapping?
Type 2: 1. Version number 2. Flag 3. Date

What are the basic needs to join two sources in a Source Qualifier?
The two sources should have a primary and foreign key relationship.
The two sources should have matching data types.
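The dummy-port join above is effectively a cross join on a constant. A minimal SQL sketch of the equivalent join, with hypothetical table names, only to illustrate what the Joiner is doing:

-- Each source gets a constant "dummy" column with the value 1, and the join
-- condition simply matches the two constants, so every row of src_a is
-- paired with every row of src_b.
SELECT a.*, b.*
  FROM (SELECT s1.*, 1 AS dummy_key FROM src_a s1) a
  JOIN (SELECT s2.*, 1 AS dummy_key FROM src_b s2) b
    ON a.dummy_key = b.dummy_key;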

What are the different options used to configure sequential batches?
Two options:
Run the session only if the previous session completes successfully.
Always run the session.

What are conformed dimensions?
A data warehouse must provide consistent information for queries requesting similar information. One method to maintain consistency is to create dimension tables that are shared (and therefore conformed) and used by all applications and data marts (dimensional models) in the data warehouse. Candidates for shared or conformed dimensions include customers, time, products, and geographical dimensions, such as the store dimension.

What are conformed facts?
Fact conformation means that if two facts exist in two separate locations, then they must have the same name and definition. As examples, revenue and profit are each facts that must be conformed. By conforming a fact, all business processes agree on one common definition for the revenue and profit measures. Then revenue and profit, even when taken from separate fact tables, can be mathematically combined.

Establishing conformity
Developing a set of shared, conformed dimensions is a significant challenge. Any dimension that is common across the business processes must represent the dimension information in the same way; that is, it must be conformed. Each business process will typically have its own schema that contains a fact table, several conforming dimension tables, and dimension tables unique to the specific business function. The same is true for facts.

Degenerate dimensions
Before we discuss degenerate dimensions in detail, it is important to understand that a fact table may consist of the following data:
Foreign keys to dimension tables.
Facts, which may be: additive, semi-additive, non-additive, pseudo facts (such as 1 and 0 in the case of attendance tracking), textual facts (rarely the case), derived facts, or year-to-date facts.
Degenerate dimensions (one or more).

What is a degenerate dimension?
A degenerate dimension sounds a bit strange, but it is a dimension without attributes. It is a transaction-based number which resides in the fact table. There may be more than one degenerate dimension inside a fact table.

Identifying garbage dimensions
A garbage dimension is a dimension that consists of low-cardinality columns such as codes, indicators, and status flags. The garbage dimension is also referred to as a junk dimension. The attributes in a garbage dimension are not related to any hierarchy.

Non-additive facts
Non-additive facts are facts which cannot be added meaningfully across any dimensions.

Examples of non-additive facts include the following:
Textual facts: Adding textual facts does not result in any number. However, counting textual facts may result in a sensible number.
Per-unit prices: Adding unit prices does not produce any meaningful number.
Percentages and ratios.
Measures of intensity, such as the room temperature.
Averages.

Semi-additive facts
Semi-additive facts are facts which can be summarized across some dimensions but not others. Examples of semi-additive facts include account balances and quantity-on-hand. Adding the monthly balances across the different days for the month of January results in an incorrect balance figure. However, if we average the account balance to find the daily average balance during the month, it is valid, as sketched in SQL below.

Event-based fact tables
Event fact tables are tables that record events. For example, event fact tables are used to record events such as Web page clicks and employee or student attendance. Events, such as a Web user clicking on a Web page of a Web site, do not always result in facts. In other words, millions of such Web page click events do not always result in sales. If we are interested in handling such event-based scenarios where there are no facts, we use event fact tables which consist of either pseudo facts or no facts at all (factless fact tables). From a conceptual perspective, the event-based fact tables capture the many-to-many relationships between the dimension tables.

Q. What type of repositories can be created using Informatica Repository Manager?
A. Informatica PowerCenter includes the following types of repositories:
Standalone Repository: A repository that functions individually and is unrelated to any other repositories.
Global Repository: This is a centralized repository in a domain. This repository can contain shared objects across the repositories in a domain. The objects are shared through global shortcuts.
Local Repository: A local repository is within a domain and is not the global repository. A local repository can connect to a global repository using global shortcuts and can use objects in its shared folders.
Versioned Repository: This can be either a local or global repository, but it allows version control for the repository. A versioned repository can store multiple copies, or versions, of an object. This feature allows you to efficiently develop, test, and deploy metadata in the production environment.
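To illustrate the semi-additive account balance point above, here is a minimal SQL sketch with a hypothetical daily_balance table: summing balances across the days of a month counts the same money once per day, while averaging across days is a valid monthly summary.

-- Invalid for a semi-additive fact: summing January's daily balances
SELECT account_id, SUM(balance) AS wrong_january_total
  FROM daily_balance
 WHERE balance_month = '2000-01'
 GROUP BY account_id;

-- Valid: the average daily balance for January
SELECT account_id, AVG(balance) AS avg_daily_balance_jan
  FROM daily_balance
 WHERE balance_month = '2000-01'
 GROUP BY account_id;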

How do you improve the performance of a lookup?
We can improve lookup performance by using the following methods:
Optimizing the lookup condition: If you include more than one lookup condition, place the conditions with an equal sign first to optimize lookup performance.
Indexing the lookup table: Create an index on the lookup table. The index needs to include every column used in a lookup condition.
Reducing the number of cached rows: Use the Lookup SQL Override option to add a WHERE clause to the default SQL statement. This allows you to reduce the number of rows included in the cache.
Using a persistent cache: If the lookup source does not change between sessions, configure the Lookup transformation to use a persistent lookup cache. The PowerCenter Server then saves and reuses cache files from session to session, eliminating the time required to read the lookup source.
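A minimal SQL sketch of two of these ideas, with hypothetical table and column names. The first statement is the kind of Lookup SQL Override you might use so the cache holds only current customers; the second indexes the lookup condition column (for a cached lookup, index the columns in the generated ORDER BY shown in the session log; for an uncached lookup, index the lookup condition columns, as described earlier):

-- Lookup SQL Override: cache only the rows the mapping can actually match,
-- instead of the whole customer dimension
SELECT cust_key, cust_id, cust_name
  FROM dim_customer
 WHERE active_flag = 'Y';

-- Index the lookup condition column so the database can resolve the lookup
-- query (and its ORDER BY) without a full scan and full sort
CREATE INDEX idx_dim_customer_cust_id ON dim_customer (cust_id);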

When using a dynamic lookup with a WHERE clause in the SQL override, make sure that you add a filter before the lookup. The filter should remove the rows which do not satisfy the WHERE clause. Reason: during dynamic lookups, when records are inserted into the cache the WHERE clause is not evaluated; only the join condition is evaluated. So the lookup cache and the table can go out of sync, because records satisfying only the join condition are inserted into the lookup cache. It is better to put a filter before the lookup that applies the WHERE clause logic, so that the cache contains only records satisfying both the join condition and the WHERE clause.

Difference between Filter and Router?
Filter
You can filter rows in a mapping with the Filter transformation. You pass all the rows from a source transformation through the Filter transformation, and then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through the Filter transformation.

As an active transformation, the Filter transformation may change the number of rows passed through it. A filter condition returns TRUE or FALSE for each row that passes through the transformation, depending on whether the row meets the specified condition. In a Filter transformation we can have only one condition.

Router
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group. As an active transformation, the Router transformation may change the number of rows passed through it. In a Router transformation we can have multiple conditions.

What are the types of metadata stored in the repository?
Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.
Target definitions. Definitions of database objects or files that contain the target data.
Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.
Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.
Reusable transformations. Transformations that you can use in multiple mappings.
Mapplets. A set of transformations that you can use in multiple mappings.
Sessions and workflows. Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.

What are the session parameters?
Session parameters are like mapping parameters; they represent values you might want to change between sessions, such as database connections or source files. The Server Manager also allows you to create user-defined session parameters. The following are user-defined session parameters:
Database connections.
Source file names: use this parameter when you want to change the name or location of a session source file between session runs.
Target file name: use this parameter when you want to change the name or location of a session target file between session runs.
Reject file name: use this parameter when you want to change the name or location of session reject files between session runs.

What are Sessions and Batches?
Session - A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. After creating the session, we can use either the Server Manager or the command line program pmcmd to start or stop the session.
Batches - A batch provides a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
Sequential - runs sessions one after the other.
Concurrent - runs sessions at the same time.

If a session fails after loading 10,000 records into the target, how can you load the records starting from the 10,001st record when you run the session the next time in Informatica 6.1?
Running the session in recovery mode will work, but the target load type should be normal. If it is bulk, then recovery will not work as expected.

What are the different threads in the DTM process?
Master thread: Creates and manages all other threads.
Mapping thread: One mapping thread is created for each session. It fetches session and mapping information.
Pre- and post-session threads: These are created to perform pre- and post-session operations.
Reader thread: One thread is created for each partition of a source. It reads data from the source.
Writer thread: Created to load data to the target.
Transformation thread: Created to transform data.

Explain about recovering sessions?
If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of the mapping, session, and Informatica Server configuration. Use one of the following methods to complete the session:
Run the session again if the Informatica Server has not issued a commit.
Truncate the target tables and run the session again if the session is not recoverable.
Consider performing recovery if the Informatica Server has issued at least one commit.

How can you recover the session in sequential batches?
If you configure a session in a sequential batch to stop on failure, you can run recovery starting with the failed session. The Informatica Server completes the session and then runs the rest of the batch. Use the Perform Recovery session property. To recover sessions in sequential batches configured to stop on failure:
1. In the Server Manager, open the session property sheet.
2. On the Log Files tab, select Perform Recovery, and click OK.
3. Run the session.
4. After the batch completes, open the session property sheet.
5. Clear Perform Recovery, and click OK.
If you do not clear Perform Recovery, the next time you run the session, the Informatica Server attempts to recover the previous session. If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in the batch complete, recover the failed session as a standalone session.

Why is the usage of a dynamic cache said to be not possible with a flat file lookup in Informatica?
Nothing gets updated in a dynamic cached lookup other than the cache itself; what happens in the file is a matter of what your mapping does to it, not the cache. A lookup (dynamic or otherwise) is loaded from a source, and the source can be anything you have defined in your environment - flat file, table, whatever. The difference between a dynamic and a static cache is that in a dynamic cache one of the columns in the source must be identified as the primary key (separate from the lookup key) and it must be numeric. The lookup uses the values in that column to figure out what the new key should be when it inserts a new row into the cache. If your flat file does not have such a column, you cannot use it in a dynamic lookup.

Enable Test Load
You can configure the Integration Service to perform a test load. With a test load, the Integration Service reads and transforms data without writing to targets. The Integration Service generates all session files and performs all pre- and post-session functions, as if running the full session. The Integration Service writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the Integration Service does not write data to the targets. Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions that use XML sources. Note: You can perform a test load when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.
