
http://dw-informatica-resources.blogspot.in/

Informatica Scenarios for Interviews

Scenario 1: A source table contains emp_name and salary columns. Develop an Informatica mapping to load all records with the 5th highest salary into the target table.

Solution: The mapping will contain the following transformations after the Source Qualifier transformation:
1. Sorter: It will contain 2 ports - emp_name and salary. The property 'Direction' will be set to 'Descending' on the key 'salary'.
2. Expression transformation: It will have 6 ports, as follows:
a> emp_name: an I/O port connected directly from the preceding Sorter transformation.
b> salary_prev: a variable port. Put a variable name, e.g. val, in its Expression column.
c> salary: an I/O port connected directly from the preceding transformation.
d> val: a variable port. The Expression column of this port will contain 'salary'.
e> rank: a variable port. The Expression column will contain decode(salary, salary_prev, rank, rank + 1).
f> rank_o: an output port containing the value of 'rank'.
3. Filter transformation: It will have 2 I/O ports, emp_name and salary, with the filter condition rank_o = 5.
The ports emp_name and salary from the Filter transformation will be connected to the target. You can achieve the same result by using a Sequence Generator transformation after the Sorter, keeping the rest of the mapping the same. (For reference, a SQL equivalent is sketched after Scenario 2.)

Scenario 2: We have a source table containing 3 columns: Col1, Col2 and Col3. There is only 1 row in the table, as follows:

Col1 Col2 Col3
---- ---- ----
a    b    c

There is a target table containing only 1 column, Col. Design a mapping so that the target table contains 3 rows, as follows:

Col
---
a
b
c

Solution (without using a Normalizer transformation): Create 3 Expression transformations exp_1, exp_2 and exp_3 with 1 port each. Connect Col1 from the Source Qualifier to the port in exp_1, Col2 to the port in exp_2 and Col3 to the port in exp_3. Make 3 instances of the target. Connect the port from exp_1 to target_1, the port from exp_2 to target_2 and the port from exp_3 to target_3.
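For reference, the Scenario 1 result can also be expressed directly in SQL when the source is relational. This is only a sketch, assuming Oracle analytic functions and a hypothetical table name emp:

SELECT emp_name, salary
FROM (
      SELECT emp_name,
             salary,
             DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk   -- ties get the same rank, like the decode logic above
      FROM   emp
     )
WHERE rnk = 5;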

Scenario 3: There is a source table that contains duplicate rows. Design a mapping to load all the unique rows into one target and all the duplicate rows (only 1 occurrence of each) into another target.

Solution: Bring all the columns from the Source Qualifier into an Aggregator transformation. Check group by on the key column. Create a new output port count_col in the Aggregator transformation and write the expression count(key_column). Add a Router transformation with 2 groups: Dup and Non-Dup. Set the router condition count_col > 1 in the Dup group and count_col = 1 in the Non-Dup group. Load these 2 groups into different targets.

Scenario 4: There is a source table containing 2 columns, Col1 and Col2, with data as follows:

Col1 Col2
a    l
b    p
a    m
a    n
b    q
x    y

Design a mapping to load a target table with the following values from the above source:

Col1 Col2
a    l,m,n
b    p,q
x    y

Solution: Use a Sorter transformation after the Source Qualifier to sort the values with Col1 as the key. Build an Expression transformation with the following ports (the order of the ports must also be the same):
1. Col1_prev: a variable port. The expression should contain a variable, e.g. val.
2. Col1: an Input/Output port from the Sorter transformation.
3. Col2: an input port from the Sorter transformation.
4. val: a variable port. The expression should contain Col1.
5. Concatenated_value: a variable port. The expression should be decode(Col1, Col1_prev, Concatenated_value || ',' || Col2, Col2).
6. Concatenated_Final: an output port containing the value of Concatenated_value.
After the Expression, build an Aggregator transformation. Bring the ports Col1 and Concatenated_Final into the Aggregator and group by Col1. Do not give any expression; this effectively returns the last row from each group. Connect the ports Col1 and Concatenated_Final from the Aggregator to the target table.

Scenario 5: Design an Informatica mapping to load the first half of the records into one target and the other half into a separate target.

Solution: You will have to assign a row number to each record. To achieve this, either use Oracle's pseudo-column rownum in the Source Qualifier query or use the NEXTVAL port of a Sequence Generator. Let's name this column rownumber. From the Source Qualifier, create 2 pipelines:

First pipeline: Carry the first port Col1 from the SQ transformation into an Aggregator transformation. Create a new output port "tot_rec" and give it the expression COUNT(Col1). Do not group by any port. This gives us the total number of records in the source table. Carry the port tot_rec to an Expression transformation. Add another port DUMMY in the Expression transformation with a default value of 1.

Second pipeline: From the SQ transformation, carry all the ports (including the additional port rownumber generated by rownum or the Sequence Generator) to an Expression transformation. Add another port DUMMY in the Expression transformation with a default value of 1.

Join these 2 pipelines with a Joiner transformation on the common port DUMMY. Carry all the source table ports and the 2 additional ports tot_rec and rownumber to a Router transformation. Add 2 groups in the Router: FIRST_HALF and SECOND_HALF. Give the condition rownumber <= tot_rec/2 in FIRST_HALF and rownumber > tot_rec/2 in SECOND_HALF. Connect the 2 groups to 2 different targets. (A SQL sketch of the split logic follows.)

Posted by chand at 04:51
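For reference, the FIRST_HALF/SECOND_HALF split can be sketched in SQL as well (Oracle syntax assumed; src is a hypothetical source table):

SELECT *
FROM (
      SELECT s.*,
             ROWNUM           AS rownumber,   -- row number, as produced by rownum or the Sequence Generator
             COUNT(*) OVER () AS tot_rec      -- total record count, as produced by the Aggregator pipeline
      FROM   src s
     )
WHERE rownumber <= tot_rec / 2;   -- use > instead of <= for the SECOND_HALF target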

Thursday, 14 April 2011

Informatica Mapping Design Guidelines


Mapping Design

Challenge

Use the PowerCenter tool suite to create an efficient execution environment.

Description

Although PowerCenter environments vary widely, most sessions and/or mappings can benefit from the implementation of common objects and optimization procedures. Follow these procedures and rules of thumb when creating mappings to help ensure optimization. Use mapplets to leverage the work of critical developers and minimize mistakes when performing similar functions.

General Suggestions for Optimizing

1. Reduce the number of transformations.
2. There is always overhead involved in moving data between transformations.
3. Consider more shared memory for a large number of transformations. Session shared memory between 12MB and 40MB should suffice.
4. Calculate once, use many times.
5. Avoid calculating or testing the same value over and over.
6. Calculate it once in an expression, and set a True/False flag.
7. Within an expression, use variables to calculate a value used several times.
8. Only connect what is used.
9. Delete unnecessary links between transformations to minimize the amount of data moved, particularly in the Source Qualifier.
10. This is also helpful for maintenance if you exchange transformations (e.g., a Source Qualifier).
11. Watch the data types.
- The engine automatically converts compatible types.
- Sometimes conversion is excessive, and happens on every transformation.
- Minimize data type changes between transformations by planning data flow prior to developing the mapping.
12. Facilitate reuse.
- Plan for reusable transformations upfront.
- Use variables.
- Use mapplets to encapsulate multiple reusable transformations.
13. Only manipulate data that needs to be moved and transformed.
- Delete unused ports, particularly in Source Qualifiers and Lookups. Reducing the number of records used throughout the mapping provides better performance.
- Use active transformations that reduce the number of records as early in the mapping as possible (i.e., place filters and aggregators as close to the source as possible).
- Select an appropriate driving/master table when using joins. The table with the lesser number of rows should be the driving/master table.
14. When DTM bottlenecks are identified and session optimization has not helped, use tracing levels to identify which transformation is causing the bottleneck (use the Test Load option in session properties).
15. Utilize single-pass reads.
- Single-pass reading is the server's ability to use one Source Qualifier to populate multiple targets.
- For any additional Source Qualifier, the server reads the source again. If you have different Source Qualifiers for the same source (e.g., one for delete and one for update/insert), the server reads the source for each Source Qualifier.
- Remove or reduce field-level stored procedures.
- If you use field-level stored procedures, PowerMart has to make a call to that stored procedure for every row, so performance will be slow.
16. Lookup Transformation optimizing tips:
- When your source is large, cache lookup table columns for those lookup tables of 500,000 rows or less. This typically improves performance by 10-20%.
- The rule of thumb is not to cache any table over 500,000 rows. This is only true if the standard row byte count is 1,024 or less. If the row byte count is more than 1,024, then the 500K rows will have to be adjusted down as the number of bytes increases (i.e., a 2,048-byte row can drop the cache row count to 250K-300K, so the lookup table will not be cached in this case).
- When using a Lookup transformation, improve lookup performance by placing all conditions that use the equality operator (=) first in the list of conditions under the Condition tab.
- Only cache lookup tables if the number of lookup calls is more than 10-20% of the lookup table rows. For a smaller number of lookup calls, do not cache if the number of lookup table rows is big. For small lookup tables (less than 5,000 rows), cache for more than 5-10 lookup calls.
- Replace a lookup with decode or IIF (for small sets of values).
- If lookups are cached and performance is poor, consider replacing them with an unconnected, uncached lookup.
- For overly large lookup tables, use dynamic caching along with a persistent cache. Cache the entire table to a persistent file on the first run, enable "update else insert" on the dynamic cache, and the engine will never have to go back to the database to read data from this table. It would then also be possible to partition this persistent cache at run time for further performance gains.
- Review complex expressions.
17. Examine mappings via Repository Reporting and Dependency Reporting within the mapping.
18. Minimize aggregate function calls.
19. Replace an Aggregator transformation with an Expression transformation and an Update Strategy transformation for certain types of aggregations.
20. Operations and expression optimizing tips:
- Numeric operations are faster than string operations.
- Optimize char-varchar comparisons (i.e., trim spaces before comparing).
- Operators are faster than functions (i.e., || vs. CONCAT).
- Optimize IIF expressions.
- Avoid date comparisons in lookups; replace them with string comparisons.
- Test expression timing by replacing the expression with a constant.
21. Use flat files.
- Flat files located on the server machine load faster than a database located on the server machine.
- Fixed-width files are faster to load than delimited files because delimited files require extra parsing.
- If processing intricate transformations, consider loading the source flat file into a relational database first, which allows the PowerCenter mappings to access the data in an optimized fashion by using filters and custom SQL SELECTs where appropriate.
22. If working with data sources that cannot return sorted data (e.g., web logs), consider using the Sorter Advanced External Procedure.
23. Use a Router transformation to separate data flows instead of multiple Filter transformations.
24. Use a Sorter transformation or hash auto-keys partitioning before an Aggregator transformation to optimize the aggregation. With a Sorter transformation, the Sorted Ports option can be used even if the original source cannot be ordered.
25. Use a Normalizer transformation to pivot rows rather than multiple instances of the same target.
26. Rejected rows from an Update Strategy are logged to the bad file. Consider filtering if retaining these rows is not critical, because logging causes extra overhead on the engine.
27. When using a Joiner transformation, be sure to make the source with the smallest amount of data the master source.
28. If an update override is necessary in a load, consider using a Lookup transformation just in front of the target to retrieve the primary key. The primary key update will be much faster than the non-indexed lookup override (a sketch of a target update override follows this list).
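As a minimal sketch of point 28, a target update override keyed on the primary key could look like the following; the table and column names are hypothetical, and :TU references the ports connected to the target definition:

UPDATE t_customer
SET    cust_name   = :TU.cust_name,
       cust_status = :TU.cust_status
WHERE  cust_id     = :TU.cust_id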
Suggestions for Using Mapplets

A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can contain as many transformations as necessary. Use the Mapplet Designer to create mapplets.
1. Create a mapplet when you want to use a standardized set of transformation logic in several mappings. For example, if you have several fact tables that require a series of dimension keys, you can create a mapplet containing a series of Lookup transformations to find each dimension key. You can then use the mapplet in each fact table mapping, rather than recreate the same lookup logic in each mapping.
2. To create a mapplet, add, connect, and configure transformations to complete the desired transformation logic. After you save a mapplet, you can use it in a mapping to represent the transformations within the mapplet. When you use a mapplet in a mapping, you use an instance of the mapplet. All uses of a mapplet are tied to the parent mapplet, so all changes made to the parent mapplet logic are inherited by every child instance of the mapplet. When the server runs a session using a mapplet, it expands the mapplet. The server then runs the session as it would any other session, passing data through each transformation in the mapplet as designed.
3. A mapplet can be active or passive depending on the transformations in the mapplet. Active mapplets contain at least one active transformation; passive mapplets contain only passive transformations. Being aware of this property when using mapplets can save time when debugging invalid mappings.
4. There are several unsupported transformations that should not be used in a mapplet. These include: COBOL source definitions, Joiner, Normalizer, non-reusable Sequence Generator, pre- or post-session stored procedures, target definitions, and PowerMart 3.5-style lookup functions.
5. Do not reuse mapplets if you only need one or two transformations of the mapplet while all other calculated ports and transformations are obsolete.
6. Source data for a mapplet can originate from one of two places:
- Sources within the mapplet. Use one or more source definitions connected to a Source Qualifier or ERP Source Qualifier transformation. When you use the mapplet in a mapping, the mapplet provides source data for the mapping and is the first object in the mapping data flow.
- Sources outside the mapplet. Use a mapplet Input transformation to define input ports. When you use the mapplet in a mapping, data passes through the mapplet as part of the mapping data flow.
7. To pass data out of a mapplet, create mapplet output ports. Each port in an Output transformation connected to another transformation in the mapplet becomes a mapplet output port.
- Active mapplets with more than one Output transformation: you need one target in the mapping for each Output transformation in the mapplet. You cannot use only one data flow of the mapplet in a mapping.
- Passive mapplets with more than one Output transformation: reduce to one Output transformation; otherwise you need one target in the mapping for each Output transformation in the mapplet, which means you cannot use only one data flow of the mapplet in a mapping.

Posted by chand at 22:53

Informatica Error Reference



Posted by chand at 05:38

Informatica PowerCenter Development Best Practices


Content overview
- Lookup - performance considerations
- Workflow performance - basic considerations
- Pre/Post-Session commands - uses
- Sequence Generator - design considerations
- FTP Connection object - platform independence

1. Lookup - Performance considerations

What is a Lookup transformation? It is not just another transformation that fetches data to look up against the source data. A Lookup is an important and useful transformation when used effectively; used improperly, it can severely impair the performance of your mapping. Let us look at the different scenarios where you can face problems with a Lookup and how to tackle them.

1.1. Unwanted columns

By default, when you create a lookup on a table, PowerCenter gives you all the columns in the table. If not all the columns are required for the lookup condition or return, delete the unwanted columns from the transformation; leaving them in only increases the cache size.

1.2. Size of the source versus size of the lookup

Let us say you have 10 rows in the source and one of the columns has to be checked against a big table (1 million rows). PowerCenter builds the cache for the lookup table and then checks the 10 source rows against the cache. It takes more time to build the cache of 1 million rows than to go to the database 10 times and look up against the table directly. Use an uncached lookup instead of building the static cache, as the number of source rows is much smaller than that of the lookup.

1.3. JOIN instead of Lookup

In the same context as above, if the Lookup transformation is after the Source Qualifier and there is no active transformation in between, you can instead use the SQL override of the Source Qualifier and join to the lookup table with a traditional database join, provided both tables are in the same database and schema (a sketch appears at the end of this section).

1.4. Conditional call of lookup

Instead of using connected lookups with filters for a conditional lookup call, use an unconnected lookup. Is the single-column return a problem here? Change the SQL override to concatenate the required columns into one big column, then break them back into individual columns on the calling side (a sketch appears at the end of this section).

1.5. SQL query

Find the execution plan of the lookup SQL and see if you can add indexes or hints to the query to make it fetch data faster. You may have to take the help of a database developer to accomplish this if you are not comfortable with SQL yourself.

1.6. Increase cache

If none of the above options provides a performance enhancement, the problem may lie with the cache. The cache that you assigned for the lookup is not sufficient to hold the data or index of the lookup. Whatever data does not fit into the cache is spilled to the cache files designated in $PMCacheDir. When PowerCenter does not find the data you are looking for in the cache, it swaps data from the file into the cache and keeps doing this until the data is found. This is quite expensive because this type of operation is very I/O intensive. To stop this from occurring, increase the size of the cache so the entire data set resides in memory. When increasing the cache you also have to be aware of system constraints: if your cache size is greater than the resources available, the session will fail due to lack of resources.

1.7. Cache-file file system

In many cases, if the cache directory is on a different file system than that of the hosting server, piling up the cache files may take time and result in latency. With the help of your system administrator, look into this aspect as well.

1.8. Useful cache utilities

If the same lookup SQL is being used by another lookup, then a shared cache or a reusable lookup should be used. Also, if you have a table whose data does not change often, you can use the persistent cache option to build the cache once and reuse it across consecutive runs.
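As a minimal sketch of section 1.3, the Source Qualifier SQL override could join directly to the lookup table instead of using a Lookup transformation; table and column names here are hypothetical:

SELECT o.order_id,
       o.cust_id,
       o.order_amt,
       c.cust_name          -- the column that would otherwise come from the Lookup
FROM   orders o
LEFT OUTER JOIN customers c
       ON c.cust_id = o.cust_id

A LEFT OUTER JOIN mirrors the behaviour of a lookup that returns NULL when no match is found.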
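And for section 1.4, one way to work around the single return port of an unconnected lookup is to concatenate the required columns in the lookup SQL override and split them again in the calling expression; a sketch with hypothetical names, assuming '~' never appears in the data:

SELECT cust_id,
       cust_name || '~' || cust_status || '~' || cust_region AS lkp_return
FROM   customers

The calling expression would then use SUBSTR/INSTR on the returned string to recover the individual columns.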

2. Workflow performance - basic considerations

Though performance tuning is often the most feared part of development, it is the easiest if the intricacies are known. With newer versions of PowerCenter, there is added flexibility for the developer to build better-performing workflows. The major blocks for performance are the design of the mapping and SQL tuning, if databases are involved. Regarding the design of the mapping, I have a few basic considerations. Please note that these are not rules of thumb, but they will help you act sensibly in different scenarios.
1. I would always suggest you think twice before using an Update Strategy, though it adds a certain level of flexibility to the mapping. If you have a straight-through mapping that takes data from the source and directly inserts all the records into the target, you don't need an Update Strategy.
2. Use a pre-SQL delete statement if you wish to delete specific rows from the target before loading. Use the truncate option in the session properties if you wish to clean the table before loading. I would avoid a separate pipeline in the mapping that runs before the load with an Update Strategy transformation (a sketch of session-level pre/post-SQL follows this section).
3. Say you have 3 sources and 3 targets with one-to-one mappings. If the loads are independent according to the business requirement, I would create 3 different mappings and 3 different session instances and run them all in parallel in my workflow after the Start task. I've observed that the workflow runtime comes down to between 30% and 60% of the serial processing time.
4. PowerCenter is built to work on high volumes of data, so keep the server completely busy. Introduce parallelism as far as possible into the mapping/workflow.
5. If using a transformation like a Joiner or Aggregator, sort the data on the join keys or group-by columns prior to these transformations to decrease the processing time.
6. Filtering should be done at the database level rather than within the mapping. The database engine is much more efficient at filtering than PowerCenter.
The above examples are just some things to consider when tuning a mapping.

2.1. SQL tuning

SQL queries/actions occur in PowerCenter in one of the following ways:
- Relational Source Qualifier
- Lookup SQL override
- Stored procedures
- Relational target
Using the execution plan to tune a query is the best way to gain an understanding of how the database will process the data. Some things to keep in mind when reading the execution plan: "full table scans are not evil", "indexes are not always fast", and "indexes can be slow too". Analyse the table data to see whether picking up 20 records out of 20 million is better served by an index or a table scan, and whether fetching 10 records out of 15 is faster with an index or easier with a full table scan. Many times the relational target indexes create performance problems when loading records into the relational target. If the indexes are needed for other purposes, it is suggested to drop the indexes at the time of loading and then rebuild them in post-SQL. When dropping indexes on a target you should consider integrity constraints and the time it takes to rebuild the index post-load versus the actual load time.
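As a minimal sketch of points 2 and 2.1 above, the session's pre- and post-SQL properties can carry the cleanup and the index maintenance; the statements below are only an illustration with hypothetical table and index names (Oracle-flavoured syntax):

-- Session pre-SQL: remove the rows about to be reloaded and drop the index that slows the load
DELETE FROM sales_fact WHERE load_date = TRUNC(SYSDATE);
DROP INDEX idx_sales_fact_cust;

-- Session post-SQL: rebuild the index once the load has finished
CREATE INDEX idx_sales_fact_cust ON sales_fact (cust_id);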

3. Pre/Post-Session commands - uses

It is a very good practice to email the success or failure status of a task once it is done. In the same way, when a business requirement calls for it, make use of the post-session success and failure emails for proper communication. The built-in feature offers flexibility such as session logs as attachments and other run-time data like the workflow run-instance ID. Any archiving activities around the source and target flat files can be easily managed within the session using the session properties for flat-file command support, which is new in PowerCenter v8.6. For example, after writing the flat-file target, you can set up a command to zip the file to save space. If you need any editing of data in the target flat files that your mapping couldn't accommodate, write a shell/batch command or script and call it in the post-session command task. I prefer weighing the trade-offs between PowerCenter capabilities and OS capabilities in these scenarios.

4. Sequence Generator design considerations

In most cases, I would advise you to avoid the use of a Sequence Generator transformation for populating an ID column in the relational target table. I suggest you instead create a sequence on the target database and enable a trigger on that table to fetch the value from the database sequence. There are many advantages to using a database sequence generator:
- Fewer PowerCenter objects are present in the mapping, which reduces development time and also maintenance effort.
- ID generation is PowerCenter-independent if a different application is used in future to populate the target.
- Migration between environments is simplified because there is no additional overhead of considering the persistent values of the Sequence Generator from the repository database.
In all of the above cases, a sequence created in the target database makes life a lot easier for table data maintenance and also for PowerCenter development. In fact, databases have specific mechanisms to deal with sequences, so you effectively implement a manual push-down optimization in your PowerCenter mapping design. DBAs will always complain about triggers on the databases, but I would still insist on using the sequence-trigger combination, even for huge volumes of data (a sketch appears at the end of this post).

5. FTP Connection object platform independence

If you have files to be read as a source from a Windows server while your PowerCenter server is hosted on UNIX/Linux, make use of FTP users on the Windows server and use a file reader with an FTP Connection object. This connection object can be added like any other connection string. This gives the flexibility of platform independence and further reduces the overhead of having SAMBA mounts on the Informatica boxes.
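A minimal sketch of the sequence-plus-trigger approach from section 4, assuming an Oracle target and hypothetical object names (adjust the syntax for your database):

CREATE SEQUENCE seq_customer_id START WITH 1 INCREMENT BY 1 CACHE 1000;

CREATE OR REPLACE TRIGGER trg_customer_id
BEFORE INSERT ON customer
FOR EACH ROW
BEGIN
  -- populate the ID from the database sequence instead of a PowerCenter Sequence Generator
  SELECT seq_customer_id.NEXTVAL INTO :NEW.customer_id FROM dual;
END;
/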
Posted by chand at 05:19
Processing Multiple XML Files through Informatica



Problem Statement: The data to be processed in Informatica consisted of XML files. The number of XML files to be processed was dynamic, and the XML file name from which data was being processed also had to be captured.

Resolution:
Option 1 - Using a file list as part of indirect file sources in the session
Option 2 - Using a parameter file and workflow variables

Implementation details for Option 1: Using a file list

The XML file names to be processed were read using a batch script and a file list was created containing the XML file names. This file list name was set under the source properties at the session level. The XML files were read sequentially and the data pertaining to each XML file was processed. Since the number of XML files to be processed was dynamic, the need of the hour was to achieve looping in Informatica (a sample file list is sketched at the end of this post).

Challenge in using a file list

A file list is created in a session to run multiple source files for one source instance in the mapping. When a file list is used in a mapping as multiple source files for one source instance, the properties of all files must match the source definition. File lists are configured in the session properties by entering the file name of the file list in the Source Filename field and the location of the file list in the Source File Directory field. When the session starts, the Integration Service reads the file list, then locates and reads the first source file in the list. After the Integration Service reads the first file, it locates and reads the next file in the list. The issue with using XML file names in a file list was further compounded by Informatica grouping records pertaining to similar XML nodes together. This led to difficulty in identifying which record belonged to which XML file.

Batch script

Batch scripts controlled the overall looping in Informatica by encompassing the tasks mentioned below:
- Reading XML file names from the staging location and creating a file list containing the XML file names.
- Moving XML files from the staging location to the archive location.
- Verifying whether there are any more XML files to be processed and, depending on the outcome, either looping the process by invoking the first workflow or ending the process.
- Invoking the appropriate workflows using pmcmd commands.

Workflow details

There were two Informatica workflows designed to achieve the looping.

The first workflow creates the indirect file to be used as the source in the session properties and triggers the second workflow. Details of the workflow are:
- A command task executes a DOS batch script which creates the indirect file after reading the XML file names from a pre-defined location on the server.
- A command task executes the second workflow to process the data within the XML files.

The second workflow reads and processes the XML files and populates the staging tables. Details of the workflow are:
- A session reads the XML files using the indirect file and loads them into the staging tables.
- A command task moves the XML file just processed into an archive folder using a batch script.
- A command task executes a batch script which checks whether there are any more XML files to be processed. If yes, it triggers the first workflow again; this ensures all XML files are processed and loaded into the staging tables. If no, the process completes.
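For illustration, the indirect file (file list) referenced in the Source Filename field is just a plain text file with one source file path per line; the paths below are hypothetical:

/infa/srcfiles/staging/orders_20110401.xml
/infa/srcfiles/staging/orders_20110402.xml
/infa/srcfiles/staging/orders_20110403.xml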

Posted by chand at 04:42

Different Output Files in Informatica PowerCenter


Output Files in Informatica

The Integration Service process generates output files when we run workflows and sessions. By default, the Integration Service logs status and error messages to log event files. Log event files are binary files that the Log Manager uses to display log events. When we run each session, the Integration Service also creates a reject file. Depending on transformation cache settings and target types, the Integration Service may create additional files as well. The Integration Service creates the following output files:

Output Files

Session details/logs: When we run a session, the Integration Service creates a session log file with the load statistics, table names, error information, threads created, etc., based on the tracing level set in the session properties. We can monitor session details in the session run properties while the session is running, failed, or succeeded.

Workflow log: The workflow log is available in the Workflow Monitor. The Integration Service process creates a workflow log for each workflow it runs. It writes information to the workflow log such as:

- Initialization of processes,
- Workflow task run information,
- Errors encountered, and
- Workflow run summary.
The Integration Service can also be configured to suppress writing messages to the workflow log file. As with Integration Service logs and session logs, the Integration Service process enters a code number into the workflow log file message along with the message text.

Performance detail file: The Integration Service process generates performance details for session runs. Through the performance details file we can determine where session performance can be improved. Performance details provide transformation-by-transformation information on the flow of data through the session.

Reject files: By default, the Integration Service process creates a reject file for each target in the session. The reject file contains rows of data that the writer does not write to targets. The writer may reject a row in the following circumstances:
- It is flagged for reject by an Update Strategy or Custom transformation.
- It violates a database constraint such as a primary key constraint.
- A field in the row was truncated or overflowed, and the target database is configured to reject truncated or overflowed data.
Note: By default, the Integration Service process saves the reject file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager, and names the reject file target_table_name.bad. We can view this file name at the session level: open the session, select any of the targets and view the options Reject File Directory and Reject File Name. If you enable row error logging, the Integration Service process does not create a reject file.

Row error logs: When we configure a session, we can choose to log row errors in a central location. When a row error occurs, the Integration Service process logs error information that allows us to determine the cause and source of the error. The Integration Service process logs information such as source name, row ID, current row data, transformation, timestamp, error code, error message, repository name, folder name, session name, and mapping information. If we enable flat file logging, by default the Integration Service process saves the file in the directory entered for the service process variable $PMBadFileDir in the Workflow Manager.

Recovery table files: The Integration Service process creates recovery tables on the target database system when it runs a session enabled for recovery. When you run a session in recovery mode, the Integration Service process uses information in the recovery tables to complete the session. When the Integration Service process performs recovery, it restores the state of operations to recover the workflow from the point of interruption. The workflow state of operations includes information such as active service requests, completed and running status, workflow variable values, running workflows and sessions, and workflow schedules.

Control file: When we run a session that uses an external loader, the Integration Service process creates a control file and a target flat file. The control file contains information about the target flat file such as the data format and loading instructions for the external loader. The control file has an extension of .ctl. The Integration Service process creates the control file and the target flat file in the Integration Service variable directory, $PMTargetFileDir, by default.
Email: We can compose and send email messages by creating an Email task in the Workflow Designer or Task Developer. The Email task can be placed in a workflow or associated with a session. The Email task allows us to automatically communicate information about a workflow or session run to designated recipients. Email tasks in the workflow send email depending on the conditional links connected to the task. For post-session email, we can create two different messages: one to be sent if the session completes successfully, the other if the session fails. We can also use variables to include information about the session name, status, and total rows loaded.

Indicator file: If we use a flat file as a target, we can configure the Integration Service to create an indicator file for target row type information. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject. The Integration Service process names this file target_name.ind and stores it in the Integration Service variable directory, $PMTargetFileDir, by default.

Target or output file: If the session writes to a target file, the Integration Service process creates the target file based on the file target definition. By default, the Integration Service process names the target file based on the target definition name. If a mapping contains multiple instances of the same target, the Integration Service process names the target files based on the target instance names. The Integration Service process creates this file in the Integration Service variable directory, $PMTargetFileDir, by default.

Cache files: When the Integration Service process creates a memory cache, it also creates cache files. The Integration Service process creates cache files for the following mapping objects:
- Aggregator transformation
- Joiner transformation
- Rank transformation
- Lookup transformation
- Sorter transformation
- XML target
By default, the DTM creates the index and data files for Aggregator, Rank, Joiner, and Lookup transformations and XML targets in the directory configured for the $PMCacheDir service process variable.

Posted by chand at 04:25

How to load session statistics into a Database Table


The solution below will help you load session statistics into a database table, which can be used for audit purposes. In real life, developers generally don't have access to the metadata tables, so this approach provides the session statistics for auditing.

Solution: Create a database table to store the session statistics. Note: the following syntax is for Sybase; change it according to your database.

create table Infa_Audit
(
  workflow_name varchar(50),
  start_time    datetime,
  end_time      datetime,
  success_rows  numeric,
  failed_rows   numeric
)

Create two sessions, e.g. Session1 and Session2.

Session1: This will be your actual session, for which you want to load the statistics.

Session2: This will be used to load the statistics of Session1 into the database table. For this, create a mapping and define mapping variables as shown below.
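For illustration, a set of mapping variables along these lines would work (the names here are examples, one per column of the audit table):

$$workflow_name   (string)
$$start_time      (date/time)
$$end_time        (date/time)
$$success_rows    (integer)
$$failed_rows     (integer)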

The flow of the mapping should look like this:

Here, the source will be a dummy source, and inside the Expression transformation all the mapping variables are assigned to output ports. The audit table will be the target.
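As a sketch, the output ports of that Expression transformation simply carry the mapping variables through to the audit target (port names are examples):

o_workflow_name = $$workflow_name
o_start_time    = $$start_time
o_end_time      = $$end_time
o_success_rows  = $$success_rows
o_failed_rows   = $$failed_rows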

The workflow will look as shown below.
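Assuming the sessions and Assignment task are named as in this example, the workflow layout would be roughly:

Start --> s_Session1 --> asgn_set_stats (Assignment task) --> s_Session2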

Create workflow variables as shown below
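For illustration, workflow variables mirroring the mapping variables could be defined like this (names are examples):

$$wf_workflow_name   (nstring)
$$wf_start_time      (date/time)
$$wf_end_time        (date/time)
$$wf_success_rows    (integer)
$$wf_failed_rows     (integer)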

Assign the values to workflow variables in Assignment Task as shown below
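A sketch of the expressions in the Assignment task, using the built-in task-specific variables of Session1 (the session name s_Session1 and the literal workflow name are examples):

$$wf_workflow_name = 'wf_load_audit_example'
$$wf_start_time    = $s_Session1.StartTime
$$wf_end_time      = $s_Session1.EndTime
$$wf_success_rows  = $s_Session1.TgtSuccessRows
$$wf_failed_rows   = $s_Session1.TgtFailedRows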

In the Pre-session variable assignment tab of Session2, assign the mapping variables to the workflow variables as shown below:
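Continuing the example names, the pre-session variable assignment of Session2 pairs each mapping variable with the corresponding workflow variable:

$$workflow_name = $$wf_workflow_name
$$start_time    = $$wf_start_time
$$end_time      = $$wf_end_time
$$success_rows  = $$wf_success_rows
$$failed_rows   = $$wf_failed_rows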

Execute the workflow
