
DATA WAREHOUSING

WHY Data Warehousing?
Data warehousing is done mainly for reporting purposes. All the historical data is put into a data warehouse, which can be thought of as a very large database. Later on, reports are generated out of this data warehouse to analyze the business.

What is the difference between an Enterprise Data Warehouse (EDW) and a Data Mart?
An EDW consists of all the information associated with the entire organization. For example, it will contain information about all the departments (say Finance, Human Resource, Marketing, Sales etc). A Data Mart, by contrast, contains ONLY the data that is specific to one department (say only Finance).

Data Warehousing Tools

ETL Tools
ETL means Extraction, Transformation and Loading. Tools that extract data from different data sources (SQL Server, Oracle, Flat Files, Sybase etc) into a data warehouse are known as ETL tools. Some popular ETL tools in the market are Informatica, Ab Initio and DataStage.

Reporting Tools
Reporting tools are used to generate reports out of the information (data) stored in the data warehouse. Some popular reporting tools in the market are Business Objects, Cognos, MicroStrategy etc.

Data Modeling
A data warehouse is based on Fact and Dimension tables. Establishing the relationships between the various Fact table(s) and Dimension tables is called "Data Modeling". A Fact table contains the numeric data that is needed in reports, e.g. revenue, sales etc. A Fact table also contains information about every Dimension table that it is related to. This means the FACT table has all the Dimension keys as foreign keys. Data Modeling is of two types:
1. Star Schema Design: Dimension tables surround the Fact table. Data is in de-normalized form.
2. Snowflake Schema Design: Dimension tables surround the Fact table. Data is in normalized form. A Dimension table may be further split into sub-dimension tables.
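The star schema described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (dim_department, dim_month, fact_sales, revenue) are hypothetical and do not come from any real warehouse. It shows a Fact table that carries the Dimension keys as foreign keys, and a report query that joins the fact to one dimension.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one Fact table surrounded by two Dimension tables.
conn.executescript("""
CREATE TABLE dim_department (dept_key INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE dim_month (month_key INTEGER PRIMARY KEY, month_name TEXT);
CREATE TABLE fact_sales (
    dept_key  INTEGER REFERENCES dim_department(dept_key),  -- dimension key as foreign key
    month_key INTEGER REFERENCES dim_month(month_key),      -- dimension key as foreign key
    revenue   REAL                                          -- numeric measure used in reports
);
INSERT INTO dim_department VALUES (1, 'Finance'), (2, 'Sales');
INSERT INTO dim_month VALUES (1, 'Jan'), (2, 'Feb');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 1, 300.0);
""")

# A typical report: total revenue per department.
report = conn.execute("""
    SELECT d.dept_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_department d ON d.dept_key = f.dept_key
    GROUP BY d.dept_name
    ORDER BY d.dept_name
""").fetchall()

print(report)
```

In a snowflake design, dim_department might itself reference a further sub-dimension table (for example a hypothetical dim_division), at the cost of one more join per report.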

Informatica Tool Installation
1. Install Oracle.
2. Install Informatica Client Tools.
3. Install Informatica Server. While installing the Informatica Server: give keys for all databases, give a name for the Repository (and a user name and password), give the TCP/IP port number (4001), and choose the Oracle version. The ODBC driver for Oracle is 'Merant-32 bit for Oracle'.

INFORMATICA

About PowerCenter and PowerMart
Welcome to PowerMart and PowerCenter, Informatica's integrated suite of software products that deliver an open, scalable solution addressing the complete life cycle for data warehouse and analytic application development. Both PowerMart and PowerCenter combine the latest technology enhancements for reliably managing data repositories and delivering information resources in a timely, usable manner. The metadata repository coordinates and drives a variety of core functions including extraction, transformation, loading, and management. The Informatica Server can extract large volumes of data from multiple platforms, handle complex transformations, and support high-speed loads. PowerMart and PowerCenter can simplify and accelerate the process of moving data warehouses from development to test to full production.

Software features that differ between PowerMart and PowerCenter:

If You Are Using PowerCenter
With PowerCenter, you receive all product functionality, including the ability to register multiple servers, share metadata across repositories, and partition data. A PowerCenter license lets you create a single repository that you can configure as a global repository, the core component of a data warehouse. When this guide mentions a PowerCenter Server, it is referring to an Informatica Server with a PowerCenter license.

If You Are Using PowerMart
This version of PowerMart includes all features except distributed metadata, multiple registered servers, and data partitioning. Also, the various options available with PowerCenter (such as PowerCenter Integration Server for BW, PowerConnect for IBM DB2, PowerConnect for IBM MQSeries, PowerConnect for SAP R/3, PowerConnect for Siebel, and PowerConnect for PeopleSoft) are not available with PowerMart. When this guide mentions a PowerMart Server, it is referring to an Informatica Server with a PowerMart license.

Informatica Client Tools:
a) Designer
b) Server Manager
c) Repository Manager

Informatica Server Tools:
1. Informatica Server

Load Manager Process and Data Transformation Manager Process
The Load Manager is the primary Informatica Server process. It performs the following tasks:
a) Manages session and batch scheduling.
b) Locks the session and reads session properties.
c) Reads the parameter file.
d) Expands the server and session variables and parameters.
e) Verifies permissions and privileges.
f) Validates source and target code pages.
g) Creates the session log file.
h) Creates the Data Transformation Manager (DTM) process, which executes the session.

The Data Transformation Manager (DTM) process executes the session.

DESIGNER
The Designer has five tools to help you build mappings and mapplets so you can specify how to move and transform data between sources and targets. The Designer helps you create source definitions, target definitions, and transformations to build your mappings. The Designer allows you to work with multiple tools at one time and to work in multiple folders and repositories at the same time. It also includes windows so you can view folders, repository objects, and tasks.

Designer Tools
The Designer provides the following five tools:
a) Source Analyzer. Used to import or create source definitions for flat file (fixed-width and delimited flat files), XML, COBOL, ERP, and relational sources (tables, views, and synonyms).
b) Warehouse Designer. Used to import or create target definitions.
c) Transformation Developer. Used to create reusable transformations.
d) Mapplet Designer. Used to create mapplets.
e) Mapping Designer. Used to create mappings.

What is a Transformation?
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. For example, an Aggregator transformation performs calculations on groups of data. Transformations in a mapping represent the operations the Informatica Server performs on the data. Data passes into and out of transformations through ports that you connect in a mapping or mapplet.

Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the configured filter condition. A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.

Transformations can be connected to the data flow, or they can be unconnected. An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.

Table 8-1 provides a brief description of each transformation:

Table 8-1. Transformation Descriptions

Transformation | Type | Description
Advanced External Procedure | Active/Connected | Calls a procedure in a shared library or in the COM layer of Windows NT.
Aggregator | Active/Connected | Performs aggregate calculations.
ERP Source Qualifier | Active/Connected | Represents the rows that the Informatica Server reads from an ERP source when it runs a session.
Expression | Passive/Connected | Calculates a value.
External Procedure | Passive/Connected or Unconnected | Calls a procedure in a shared library or in the COM layer of Windows NT.
Filter | Active/Connected | Filters records.
Input | Passive/Connected | Defines mapplet input rows. Available only in the Mapplet Designer.
Joiner | Active/Connected | Joins records from different databases or flat file systems.
Lookup | Passive/Connected or Unconnected | Looks up values.
Normalizer | Active/Connected | Normalizes records, including those read from COBOL sources.
Output | Passive/Connected | Defines mapplet output rows. Available only in the Mapplet Designer.
Rank | Active/Connected | Limits records to a top or bottom range.
Sequence Generator | Passive/Connected | Generates primary keys.
Source Qualifier | Active/Connected | Represents the rows that the Informatica Server reads from a relational or flat file source when it runs a session.
Router | Active/Connected | Routes data into multiple transformations based on a group expression.
Stored Procedure | Passive/Connected or Unconnected | Calls a stored procedure.
Update Strategy | Active/Connected | Determines whether to insert, delete, update, or reject records.
XML Source Qualifier | Passive/Connected | Represents the rows that the Informatica Server reads from an XML source when it runs a session.

Overview of Transformations

1. Aggregator
The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. The Aggregator transformation is unlike the Expression transformation, in that you can use the Aggregator transformation to perform calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only. When using the transformation language to create aggregate expressions, you can use conditional clauses to filter records, providing more flexibility than the SQL language. The Informatica Server performs aggregate calculations as it reads, and stores the necessary group and row data in an aggregate cache. After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Informatica Server performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.

2. Filter
The Filter transformation provides the means for filtering rows in a mapping. You pass all the rows from a Source Qualifier transformation through the Filter transformation, and then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through the Filter transformation. In some cases, you need to filter data based on one or more conditions before writing it to targets. For example, if you have a human resources data warehouse containing information about current employees, you might want to filter out employees who are part-time and hourly. The mapping in Figure 18-1 passes the rows from a human resources table that contains employee data through a Filter transformation. The filter only allows rows through for employees that make salaries of $30,000 or higher.

3. Joiner
While a Source Qualifier transformation can join data originating from a common source database, the Joiner transformation joins two related heterogeneous sources residing in different locations or file systems. The combination of sources can be varied. You can use the following sources:
a) Two relational tables existing in separate databases
b) Two flat files in potentially different file systems
c) Two different ODBC sources
d) Two instances of the same XML source
e) A relational table and a flat file source
f) A relational table and an XML source

You use the Joiner transformation to join two sources with at least one matching port. The Joiner transformation uses a condition that matches one or more pairs of ports between the two sources. For example, you might want to join a flat file with in-house customer IDs and a relational database table that contains user-defined customer IDs. You could import the flat file into a temporary database table, and then perform the join in the database. However, if you use the Joiner transformation, there is no need to import or create temporary tables. If two relational sources contain keys, then a Source Qualifier transformation can easily join the sources on those keys. Joiner transformations typically combine information from two different sources that do not have matching keys, such as flat file sources. The Joiner transformation allows you to join sources that contain binary data.

The Joiner transformation supports the following join types, which you set in the Properties tab:
1. Normal (Default)
2. Master Outer
3. Detail Outer
4. Full Outer
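The four join types can be illustrated with a small Python sketch. It is not Informatica code, and the function name join_sources and the sample rows are hypothetical. It assumes the usual reading of the join types, which the text above does not spell out: Normal keeps only matching rows, Master Outer keeps all detail rows plus matching master rows, Detail Outer keeps all master rows plus matching detail rows, and Full Outer keeps all rows from both sources.

```python
def join_sources(master, detail, key, join_type="normal"):
    """Join two lists of dicts on `key` (illustrative semantics, see lead-in)."""
    # Cache the master rows by key, then stream the detail rows past the cache.
    master_by_key = {}
    for m in master:
        master_by_key.setdefault(m[key], []).append(m)

    out = []
    matched_master_keys = set()
    for d in detail:
        matches = master_by_key.get(d[key], [])
        if matches:
            matched_master_keys.add(d[key])
            for m in matches:
                out.append({**m, **d})
        elif join_type in ("master_outer", "full_outer"):
            out.append(dict(d))  # unmatched detail row is kept

    if join_type in ("detail_outer", "full_outer"):
        for m in master:
            if m[key] not in matched_master_keys:
                out.append(dict(m))  # unmatched master row is kept
    return out


# Hypothetical sample data: a small master source and a detail source.
master = [{"cust_id": 1, "region": "East"}, {"cust_id": 3, "region": "West"}]
detail = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]

normal = join_sources(master, detail, "cust_id", "normal")
master_outer = join_sources(master, detail, "cust_id", "master_outer")
detail_outer = join_sources(master, detail, "cust_id", "detail_outer")
full_outer = join_sources(master, detail, "cust_id", "full_outer")
```

Because the sketch holds the master rows in memory and streams the detail rows, a smaller master source means a smaller cache.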

4. Source Qualifier
When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier represents the records that the Informatica Server reads when it runs a session. You can use the Source Qualifier to perform the following tasks:
a) Join data originating from the same source database. You can join two or more tables with primary-foreign key relationships by linking the sources to one Source Qualifier.
b) Filter records when the Informatica Server reads source data. If you include a filter condition, the Informatica Server adds a WHERE clause to the default query.
c) Specify an outer join rather than the default inner join. If you include a user-defined join, the Informatica Server replaces the join information specified by the metadata in the SQL query.
d) Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL query.
e) Select only distinct values from the source. If you choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement to the default SQL query.
f) Create a custom query to issue a special SELECT statement for the Informatica Server to read source data. For example, you might use a custom query to perform aggregate calculations or execute a stored procedure.
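How options b), d) and e) change the default query can be sketched in Python. The helper build_default_query is hypothetical and only mimics the behaviour described above; it is not how the Informatica Server builds SQL internally. It assumes that a sorted-ports value of N orders by the first N ports.

```python
def build_default_query(table, ports, source_filter=None,
                        sorted_ports=0, select_distinct=False):
    """Assemble a SELECT statement from Source Qualifier-style options."""
    select = "SELECT DISTINCT" if select_distinct else "SELECT"
    query = f"{select} {', '.join(ports)} FROM {table}"
    if source_filter:
        # A filter condition becomes a WHERE clause.
        query += f" WHERE {source_filter}"
    if sorted_ports > 0:
        # Assumption: the first N ports form the ORDER BY clause.
        query += " ORDER BY " + ", ".join(ports[:sorted_ports])
    return query


query = build_default_query(
    "EMPLOYEES",
    ["EMP_ID", "DEPT_ID", "SALARY"],
    source_filter="SALARY >= 30000",
    sorted_ports=2,
    select_distinct=True,
)
print(query)
```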

5. Stored Procedure
A Stored Procedure transformation is an important tool for populating and maintaining databases. Database administrators create stored procedures to automate time-consuming tasks that are too complicated for standard SQL statements. A stored procedure is a precompiled collection of procedural SQL statements (Transact-SQL, for example) and optional flow control statements, similar to an executable script. Stored procedures are stored and run within the database. You can run a stored procedure with the EXECUTE SQL statement in a database client tool, just as you can run SQL statements. Unlike standard SQL, however, stored procedures allow user-defined variables, conditional statements, and other powerful programming features. Not all databases support stored procedures, and database implementations vary widely in their syntax. You might use stored procedures to:
a) Drop and recreate indexes.
b) Check the status of a target database before moving records into it.
c) Determine if enough space exists in a database.
d) Perform a specialized calculation.

Database developers and programmers use stored procedures for various tasks within databases, since stored procedures allow greater flexibility than SQL statements. Stored procedures also provide the error handling and logging necessary for mission-critical tasks. Developers create stored procedures in the database using the client tools provided with the database. The stored procedure must exist in the database before you create a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the Informatica Server. You might use a stored procedure to perform a query or calculation that you would otherwise make part of a mapping. For example, if you already have a well-tested stored procedure for calculating sales tax, you can perform that calculation through the stored procedure instead of recreating the same calculation in an Expression transformation.

6. Sequence Generator
The Sequence Generator transformation generates numeric values. You can use the Sequence Generator to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers. The Sequence Generator transformation is a connected transformation. It contains two output ports that you can connect to one or more transformations. The Informatica Server generates a value each time a row enters a connected transformation, even if that value is not used. When NEXTVAL is connected to the input port of another transformation, the Informatica Server generates a sequence of numbers. When CURRVAL is connected to the input port of another transformation, the Informatica Server generates the NEXTVAL value plus one. You can make a Sequence Generator reusable, and use it in multiple mappings. You might reuse a Sequence Generator when you perform multiple loads to a single target. For example, if you have a large input file that you separate into three sessions running in parallel, you can use a Sequence Generator to generate primary key values. If you use different Sequence Generators, the Informatica Server might accidentally generate duplicate key values. Instead, you can use the same reusable Sequence Generator for all three sessions to provide a unique value for each target row.

7. Rank
The Rank transformation allows you to select only the top or bottom rank of data. You can use a Rank transformation to return the largest or smallest numeric value in a port or group. You can also use a Rank transformation to return the strings at the top or the bottom of a session sort order. During the session, the Informatica Server caches input data until it can perform the rank calculations. The Rank transformation differs from the transformation functions MAX and MIN, in that it allows you to select a group of top or bottom values, not just one value. For example, you can use Rank to select the top 10 salespersons in a given territory. Or, to generate a financial report, you might also use a Rank transformation to identify the three departments with the lowest expenses in salaries and overhead. While the SQL language provides many functions designed to handle groups of data, identifying top or bottom strata within a set of rows is not possible using the standard SQL aggregate functions alone. You connect all ports representing the same row set to the transformation. Only the rows that fall within that rank, based on some measure you set when you configure the transformation, pass through the Rank transformation. You can also write expressions to transform data or perform calculations. Figure 22-1 shows a mapping that passes employee data from a human resources table through a Rank transformation. The Rank only passes the rows for the top 10 highest paid employees to the next transformation.

8. Look Up
Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym. Import a lookup definition from any relational database to which both the Informatica Client and Server can connect. You can use multiple Lookup transformations in a mapping.
The Informatica Server queries the lookup table based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup table column values based on the lookup condition. Use the result of the lookup to pass to other transformations and the target. You can use the Lookup transformation to perform many tasks, including:
a) Get a related value. For example, your source table includes the employee ID, but you want to include the employee name in your target table to make your summary data easier to read.
b) Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
c) Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the target.

You can configure the Lookup transformation to perform different types of lookups. You can configure the transformation to be connected or unconnected, cached or uncached:
a) Connected or unconnected. Connected and unconnected transformations receive input and send output in different ways.
b) Cached or uncached. Sometimes you can improve session performance by caching the lookup table. If you cache the lookup table, you can choose to use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session. With a dynamic cache, the Informatica Server inserts rows into the cache during the session. Informatica recommends that you cache the target table as the lookup. This enables you to look up values in the target and insert them if they do not exist.
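The difference a dynamic cache makes can be sketched in Python. This is a simplified model under stated assumptions, not the Informatica Server's cache: the function load_with_dynamic_cache and the sample rows are hypothetical. The cache is built once from the target, and every row that is not found is inserted into the cache as well, so a key that appears twice in the source is flagged for insert only once.

```python
def load_with_dynamic_cache(source_rows, target_rows, key):
    """Split source rows into inserts and updates using a lookup cache on the target."""
    cache = {row[key]: row for row in target_rows}  # built once, before rows are read
    inserts, updates = [], []
    for row in source_rows:
        if row[key] in cache:
            updates.append(row)       # the key already exists in the target (or cache)
        else:
            inserts.append(row)
            cache[row[key]] = row     # dynamic cache: remember the newly inserted row
    return inserts, updates


# Hypothetical data: id 2 appears twice in the source and is absent from the target.
target = [{"id": 1, "name": "old"}]
source = [
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"},
    {"id": 2, "name": "B2"},
]
inserts, updates = load_with_dynamic_cache(source, target, "id")
```

With a static cache (the same function without the line that writes to the cache), both rows for id 2 would be flagged as inserts.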

9. Expression
You can use the Expression transformation to calculate values in a single row before you write to the target. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can use the Expression transformation to perform any non-aggregate calculations. You can also use the Expression transformation to test conditional statements before you output the results to target tables or other transformations.
Note: To perform calculations involving multiple rows, such as sums or averages, use the Aggregator transformation. Unlike the Expression transformation, the Aggregator allows you to group and sort data. For details, see Aggregator Transformation.

10. Router
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group. If you need to test the same input data based on multiple conditions, use a Router transformation in a mapping instead of creating multiple Filter transformations to perform the same task. The Router transformation is more efficient when you design a mapping and when you run a session. For example, to test data based on three conditions, you only need one Router transformation instead of three Filter transformations to perform this task. Likewise, when you use a Router transformation in a mapping, the Informatica Server processes the incoming data only once. When you use multiple Filter transformations in a mapping, the Informatica Server processes the incoming data for each transformation.

11. Update Strategy
When you design your data warehouse, you need to decide what type of information to store in targets. As part of your target table design, you need to determine whether to maintain all the historic data or just the most recent changes. For example, you might have a target table, T_CUSTOMERS, that contains customer data. When a customer address changes, you may want to save the original address in the table, instead of updating that portion of the customer record. In this case, you would create a new record containing the updated address, and preserve the original record with the old customer address. This illustrates how you might store historical information in a target table. However, if you want the T_CUSTOMERS table to be a snapshot of current customer data, you would update the existing customer record and lose the original address. The model you choose constitutes your update strategy: how to handle changes to existing records. In PowerMart and PowerCenter, you set your update strategy at two different levels:
a) Within a session. When you configure a session, you can instruct the Informatica Server to either treat all records in the same way (for example, treat all records as inserts), or use instructions coded into the session mapping to flag records for different database operations.
b) Within a mapping. Within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.

SERVER MANAGER
The Informatica Server moves data from sources to targets based on mapping and session metadata stored in a repository.

What is a Mapping?
A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation.

What is a Session?
A session is a set of instructions that describes how and when to move data from sources to targets. Use the Designer to import source and target definitions into the repository and to build mappings. Use the Server Manager to create and manage sessions and batches, and to monitor and stop the Informatica Server. When a session starts, the Informatica Server retrieves mapping and session metadata from the repository to extract data from the source, transform it, and load it into the target.

More about a Session
A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. You create and maintain sessions in the Server Manager.
When you create a session, you enter general information such as the session name, session schedule, and the Informatica Server to run the session. You can also select options to execute pre-session shell commands, send post-session email, and FTP source and target files. Using session properties, you can also override parameters established in the mapping, such as source and target location, source and target type, error tracing levels, and transformation attributes. For details on server activity while executing a session, see Understanding the Server Architecture. You can group sessions into a batch. The Informatica Server can run the sessions in a batch in sequential order, or start them concurrently. Some batch settings override session settings.

Once you create a session, you can use either the Server Manager or the command line program pmcmd to start or stop the session. You can also use the Server Manager to monitor, edit, schedule, abort, copy, and delete the session.

What is a Batch?
Batches provide a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
a) Sequential. Runs sessions one after the other.
b) Concurrent. Runs sessions at the same time.

You might create a sequential batch if you have sessions with source-target dependencies that you want to run in a specific order. You might also create a concurrent batch if you have several independent sessions you need scheduled at the same time. You can place them all in one batch, then schedule the batch as needed instead of scheduling each individual session. You can create, edit, start, schedule, and stop batches with the Server Manager. However, you cannot copy or abort batches. With pmcmd, you can start and stop batches.


REPOSITORY MANAGER
The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and Client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets. The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version. You create and maintain the repository with the Repository Manager client tool. With the Repository Manager, you can also create folders to organize metadata and groups to organize users.

The Informatica repository is an integral part of a data mart. A data mart includes the following components:
a) Targets. The data mart includes one or more databases or flat file systems that store the information used for decision support.
b) A server engine. Every data mart needs some kind of server application that reads, transforms, and writes data to targets. In traditional data warehouses, this server application consists of COBOL or SQL code you write to perform these operations. In PowerMart and PowerCenter, you use a single server application that runs on UNIX or Windows NT to read, transform, and write data.
c) Metadata. Designing a data mart involves writing and storing a complex set of instructions. You need to know where to get data (sources), how to change it, and where to write the information (targets). PowerMart and PowerCenter call this set of instructions metadata. Each piece of metadata (for example, the description of a source table in an operational database) can contain comments about it.
d) A repository. The place where you store the metadata is called a repository. The more sophisticated your repository, the more complex and detailed metadata you can store in it. PowerMart and PowerCenter use a relational database as the repository.


IMPROVING MAPPING PERFORMANCE - TIPS


1. Aggregator Transformation
You can use the following guidelines to optimize the performance of an Aggregator transformation.
a) Use Sorted Input to decrease the use of aggregate caches: The Sorted Input option reduces the amount of data cached during the session and improves session performance. Use this option with the Source Qualifier Number of Sorted Ports option to pass sorted data to the Aggregator transformation.
b) Limit connected input/output or output ports: Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
c) Filter before aggregating: If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.
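Why sorted input needs less cache can be seen in a small Python sketch. It is an analogy, not the Informatica Server's algorithm, and the function names and sample rows are hypothetical. With sorted rows, a group is complete as soon as the group key changes, so only one running total is held at a time. With unsorted rows, every group must stay in memory until the last row has been read.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows, group_key, value_key):
    """Sum value_key per group, assuming rows arrive sorted by group_key."""
    for key, group in groupby(rows, key=itemgetter(group_key)):
        # Each group can be emitted as soon as the key changes.
        yield key, sum(row[value_key] for row in group)


def aggregate_unsorted(rows, group_key, value_key):
    """Sum value_key per group when row order is unknown."""
    totals = {}  # every group is cached until all rows have been read
    for row in rows:
        totals[row[group_key]] = totals.get(row[group_key], 0) + row[value_key]
    return totals


# Hypothetical rows, already sorted by department.
rows = [
    {"dept": "A", "salary": 10},
    {"dept": "A", "salary": 20},
    {"dept": "B", "salary": 5},
]
sorted_result = list(aggregate_sorted(rows, "dept", "salary"))
unsorted_result = aggregate_unsorted(rows, "dept", "salary")
```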

6. ilter +ransformation +he following tips can help filter performance3 a( @se the ilter transformation early in the mapping3 +o ma!imize session performance, 'eep the ilter transformation as close as possible to the sources in the mapping. %ather than passing rows that you plan to discard through the mapping, you can filter out unwanted data early in the flow of data from sources to targets. b( @se the #ource ,ualifier to filter3 +he #ource ,ualifier transformation pro-ides an alternate way to filter rows. %ather than filtering rows from within a mapping, the #ource ,ualifier transformation filters rows when read from a source. +he main difference is that the source Fualifier limits the row set extracted from a source, while the ilter transformation limits the row set sent to a target . #ince a source Fualifier reduces the number of rows used throughout the mapping, it pro-ides better performance. $owe-er, the source Fualifier only lets you filter rows from relational sources, while the ilter transformation filters rows from any type of source. Also, note that since it runs in the database, you must ma'e sure that the source Fualifier filter condition only uses standard #,L. +he ilter transformation can define a condition using any statement or transformation function that returns either a +%@E or AL#E -alue. 7. Eoiner +ransformation +he following tips can help impro-e session performance3

14

a) Perform joins in a database: Performing a join in a database is faster than performing a join in the session. In some cases, this is not possible, such as joining tables from two different databases or flat file systems. If you want to perform a join in a database, you can use the following options:
• Create a pre-session stored procedure to join the tables in a database before running the mapping.
• Use the Source Qualifier transformation to perform the join.
b) Designate as the master source the source with the smaller number of records: For optimal performance and disk storage, designate the master source as the source with the lower number of rows. With a smaller master source, the data cache is smaller, and the search time is shorter.

4. Lookup Transformation
Use the following tips when you configure the Lookup transformation:
a) Add an index to the columns used in a lookup condition: If you have privileges to modify the database containing a lookup table, you can improve performance for both cached and uncached lookups. This is important for very large lookup tables. Since the Informatica Server needs to query, sort, and compare values in these columns, the index needs to include every column used in a lookup condition.
b) Place conditions with an equality operator (=) first: If a Lookup transformation specifies several conditions, you can improve lookup performance by placing all the conditions that use the equality operator first in the list of conditions that appear under the Condition tab.
c) Cache small lookup tables: Improve session performance by caching small lookup tables. The result of the lookup query and processing is the same, regardless of whether you cache the lookup table or not.
d) Join tables in the database: If the lookup table is on the same database as the source table in your mapping and caching is not feasible, join the tables in the source database rather than using a Lookup transformation.
e) Deselect the cache lookup option in the Lookup transformation if there is no lookup override. This improves session performance.
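The master-source tip for the Joiner transformation can be pictured as a hash join in which only the master rows are held in a cache while the detail rows stream past it. The following is a minimal Python sketch of that idea, not Informatica's implementation; the function name `cached_join` and the sample rows are hypothetical.

```python
from collections import defaultdict


def cached_join(master_rows, detail_rows, key):
    """Join two lists of dict rows on `key`, caching only the master side."""
    cache = defaultdict(list)
    for row in master_rows:            # build the cache from the master source
        cache[row[key]].append(row)
    joined = []
    for row in detail_rows:            # stream the detail source past the cache
        for match in cache.get(row[key], []):
            joined.append({**match, **row})
    return joined, len(cache)


# Hypothetical data: the smaller source (departments) is used as the master.
departments = [{"dept_id": 1, "dept": "Finance"}, {"dept_id": 2, "dept": "Sales"}]
employees = [
    {"emp_id": 10, "dept_id": 1},
    {"emp_id": 11, "dept_id": 2},
    {"emp_id": 12, "dept_id": 1},
]
rows, cache_size = cached_join(departments, employees, "dept_id")
```

The cache grows with the number of distinct master keys, which is why picking the smaller source as master keeps it small.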

MAPPING VARIABLES
1. Go to the Mappings tab, click the Parameters and Variables tab, and create a NEW port as below.
$$LastRunTime   Variable   date/time   19   0   Max
Give an Initial Value. For example 1/1/1900.
2. In the EXP Transformation, create a variable as below:
SetLastRunTime (date/time) = SETVARIABLE ($$LastRunTime, SESSSTARTTIME)
3. Go to the SOURCE QUALIFIER Transformation, click the Properties tab, and in the Source Filter area ENTER the following expression:
UpdateDateTime (any date column from source) >= '$$LastRunTime' AND UpdateDateTime < '$$$SessStartTime'
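The three steps above amount to an incremental-extraction window: each run picks up rows changed since the previous run and before the current session start. A minimal Python sketch of that logic follows; the helper names `source_filter` and `in_window`, the date format, and the sample timestamps are assumptions for illustration only.

```python
from datetime import datetime


def source_filter(last_run_time, session_start_time, column="UpdateDateTime"):
    """Build a filter string shaped like the Source Filter expression."""
    fmt = "%m/%d/%Y %H:%M:%S"  # assumed date format
    return (
        f"{column} >= '{last_run_time.strftime(fmt)}' "
        f"AND {column} < '{session_start_time.strftime(fmt)}'"
    )


def in_window(update_time, last_run_time, session_start_time):
    """True when a row changed inside the half-open window [last run, session start)."""
    return last_run_time <= update_time < session_start_time


last_run = datetime(1900, 1, 1)            # the initial value from step 1
session_start = datetime(2024, 5, 1, 2, 0, 0)
filter_text = source_filter(last_run, session_start)

# After a successful run the variable takes the session start time,
# which is the effect of SETVARIABLE in step 2.
next_last_run = session_start
```

Because the window is closed on the left and open on the right, a row stamped exactly at the session start is left for the next run rather than being picked up twice.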

Handle Nulls in DATE

iif(isnull(AgedDate),to_date('1/1/1900','MM/DD/YYYY'),trunc(AgedDate,'DAY'))
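For readers less familiar with the expression syntax, the null-handling rule above can be sketched in Python. This assumes that truncating to 'DAY' simply drops the time portion; the names `clean_date` and `DEFAULT_DATE` are hypothetical.

```python
from datetime import datetime

DEFAULT_DATE = datetime(1900, 1, 1)  # stand-in used when the date is missing


def clean_date(value):
    """Return the default for a missing date, else the date with its time removed."""
    if value is None:
        return DEFAULT_DATE
    return value.replace(hour=0, minute=0, second=0, microsecond=0)
```

For example, `clean_date(None)` gives 1 January 1900 and `clean_date(datetime(2024, 5, 1, 13, 45, 10))` gives midnight on 1 May 2024.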

LOOK UP AND UPDATE STRATEGY EXPRESSION
First, declare a Look Up condition in the Look Up Transformation. For example:
EMPID_IN (column coming from source) = EMPID (column in target table)

Second, drag and drop these two columns into the UPDATE Strategy Transformation.

Check the value coming from the source (EMPID_IN) against the column in the target table (EMPID). If both are equal, the record already exists in the target, so we need to update the record (DD_UPDATE). Otherwise, insert the record coming from the source into the target (DD_INSERT). The UPDATE Strategy expression is:

IIF ((EMPID_IN = EMPID), DD_UPDATE, DD_INSERT)

NOTE: The Update Strategy expression should always be based on the primary keys of the target table.
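The update-or-insert decision can be sketched outside the tool as a key lookup against the target. This is a minimal Python illustration under the assumption that the target's primary keys fit in a set; `update_strategy` and the sample rows are hypothetical names, and the string flags only mirror the DD_UPDATE and DD_INSERT constants.

```python
def update_strategy(source_rows, target_keys, key="EMPID"):
    """Flag each source row for update if its key already exists in the target."""
    flagged = []
    for row in source_rows:
        action = "DD_UPDATE" if row[key] in target_keys else "DD_INSERT"
        flagged.append((action, row))
    return flagged


existing_keys = {101, 102}                       # keys already in the target
incoming = [{"EMPID": 101, "name": "Ann"}, {"EMPID": 205, "name": "Raj"}]
decisions = update_strategy(incoming, existing_keys)
```

Here the first row is flagged for update and the second for insert.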

EXPRESSION TRANSFORMATION

1. IIF (ISNULL (ServiceOrderDateValue1), TO_DATE ('1/1/1900','MM/DD/YYYY'), TRUNC (ServiceOrderDateValue1,'DAY'))

2. IIF (ISNULL (NpaNxxId1) or LENGTH (RTRIM (NpaNxxId1))=0 or TO_NUMBER (NpaNxxId1) <= 0,'UNK', NpaNxxId1)

3. IIF (ISNULL (InstallMethodId),0,InstallMethodId)

Date_Diff(TRUNC(O_ServiceOrderDateValue),TRUNC(O_ServiceOrderDateValue), 'DD')

FILTER CONDITION
To pass only NOT NULL and NOT SPACES values through the transformation:
IIF ( ISNULL(LENGTH(RTRIM(LTRIM(ADSLTN)))) ,0 ,LENGTH(RTRIM(LTRIM(ADSLTN))))>0

SECOND FILTER CONDITION
iif(isnull(USER_NAME),FALSE,TRUE)

(Pass only NOT NULL from the filter)
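The first filter condition keeps a row only when the trimmed value has a length greater than zero. A minimal Python sketch of the same test follows, assuming that trimming removes spaces only; `passes_filter` is a hypothetical name.

```python
def passes_filter(value):
    """Keep a value only if it is neither null nor made up entirely of spaces."""
    return value is not None and len(value.strip(" ")) > 0


kept = [v for v in ["  abc ", "   ", None, "", "x"] if passes_filter(v)]
```

Only "  abc " and "x" survive; the null, the empty string, and the all-spaces value are dropped.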

PERFORMANCE TIPS IN GENERAL

Most of the gains in performance derive from good database design, thorough query analysis, and appropriate indexing. The largest performance gains can be realized by establishing a good database design.

1. Update Table Statistics in database.
SYBASE SYNTAX: update all statistics table_name

Adaptive Server's cost-based optimizer uses statistics about the tables, indexes, and columns named in a query to estimate query costs. It chooses the access method that the optimizer determines has the least cost. But this cost estimate cannot be accurate if statistics are not accurate. Some statistics, such as the number of pages or rows in a table, are updated during query processing. Other statistics, such as the histograms on columns, are only updated when you run the update statistics command or when indexes are created. If you are having problems with a query performing slowly, and seek help from Technical Support or a Sybase news group on the Internet, one of the first questions you are likely to be asked is "Did you run update statistics?" You can use the optdiag command (in Sybase) to see the time update statistics was last run for each column on which statistics exist.

NOTE:

Running the update statistics commands requires system resources. Like other maintenance tasks, it should be scheduled at times when load on the server is light. In particular, update statistics requires table scans or leaf-level scans of indexes, may increase I/O contention, may use the CPU to perform sorts, and uses the data and procedure caches. Use of these resources can adversely affect queries running on the server if you run update statistics at times when usage is high. In addition, some update statistics commands require shared locks, which can block updates.
• Dropping an index does not drop the statistics for the index, since the optimizer can use column-level statistics to estimate costs, even when no index exists. If you want to remove the statistics after dropping an index, you must explicitly delete them with delete statistics.
• Truncating a table does not delete the column-level statistics in sysstatistics. In many cases, tables are truncated and the same data is reloaded. Since truncate table does not delete the column-level statistics, there is no need to run update statistics after the table is reloaded, if the data is the same. If you reload the table with data that has a different distribution of key values, you need to run update statistics.
• You can drop and re-create indexes without affecting the index statistics, by specifying 0 for the number of steps in the with statistics clause to create index. This create index command does not affect the statistics in sysstatistics (in Sybase):
create index title_id_ix on titles (title_id) with statistics using 0 values
This allows you to re-create an index without overwriting statistics that have been edited with optdiag.
• If two users attempt to create an index on the same table, with the same columns, at the same time, one of the commands may fail due to an attempt to enter a duplicate key value in sysstatistics.

2. Create Indexes on KEY fields. Keep Index statistics up to date.
NOTE: If data modification performance is poor, you may have too many indexes. While indexes favor "select operations", they slow down "data modifications".

ABOUT INDEXES
Indexes are the most important physical design element in improving database performance:
• Indexes help prevent table scans. Instead of reading hundreds of data pages, a few index pages and data pages can satisfy many queries.
• For some queries, data can be retrieved from a nonclustered index without ever accessing the data rows.
• Clustered indexes can randomize data inserts, avoiding insert "hot spots" on the last page of a table.
• Indexes can help avoid sorts, if the index order matches the order of columns in an order by clause.
In addition to their performance benefits, indexes can enforce the uniqueness of data.

Indexes are database objects that can be created for a table to speed direct access to specific data rows. Indexes store the values of the key(s) that were named when the index was created, and logical pointers to the data pages or to other index pages. Adaptive Server (Sybase) provides two types of indexes:
• Clustered indexes, where the table data is physically stored in the order of the keys on the index:
  • For allpages-locked tables, rows are stored in key order on pages, and pages are linked in key order.
  • For data-only-locked tables, indexes are used to direct the storage of data on rows and pages, but strict key ordering is not maintained.
• Nonclustered indexes, where the storage order of data in the table is not related to index keys.
You can create only one clustered index on a table because there is only one possible physical ordering of the data rows. You can create up to 249 nonclustered indexes per table. A table that has no clustered index is called a "heap".

3. Drop and Re-create the Indexes that hurt performance.
Drop indexes (in Pre-Session) before inserting data AND re-create indexes (in Post-Session) after data is inserted.
NOTE: With indexes, inserting data is slower. Drop indexes that hurt performance. If an application performs data modifications during the day and generates reports at night, you may want to drop some indexes in the morning and re-create them at night. Drop indexes during periods when frequent updates occur and rebuild them before periods when frequent selects occur.
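The pre-session and post-session steps can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module purely as a stand-in database; the table `sales`, the index `idx_sales_region`, and the function `bulk_load` are hypothetical, and a real session would run the equivalent statements against the actual target database.

```python
import sqlite3


def bulk_load(conn, rows):
    """Drop the index, load the rows, then rebuild the index."""
    cur = conn.cursor()
    cur.execute("DROP INDEX IF EXISTS idx_sales_region")             # pre-session step
    cur.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
    cur.execute("CREATE INDEX idx_sales_region ON sales (region)")   # post-session step
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
bulk_load(conn, [("east", 10.0), ("west", 20.0), ("east", 5.5)])
```

The index is rebuilt once after the load instead of being maintained row by row during it, which is the saving the tip is after.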

4. Also you can improve performance by:
• Using transaction log thresholds to automate log dumps and "avoid running out of space".
• Using thresholds for space monitoring in data segments.
• Using partitions to speed loading of data.

5. To tune the SQL Query
We can use "Parallel Hints" in the SELECT statement of the SQL query. When joining, put the table with the larger number of rows last; in other words, use the table with fewer rows as the MASTER source. Queries that contain ORDER BY or GROUP BY clauses may also benefit from an index on the ORDER BY or GROUP BY columns. Once you optimize the query, use the SQL override option to take full advantage of these modifications.

6. Registering Multiple Servers

Performance can also be increased by registering multiple servers that point to the same repository.

Other methods to Improve Performance


Optimizing the Target Database
If your session writes to a flat file target, you can optimize session performance by writing to a flat file target that is local to the Informatica Server. If your session writes to a relational target, consider performing the following tasks to increase performance:
• Drop indexes and key constraints.
• Increase checkpoint intervals.
• Use bulk loading.
• Use external loading.
• Turn off recovery.
• Increase database network packet size.
• Optimize Oracle target databases.

Dropping Indexes and Key Constraints
When you define key constraints or indexes in target tables, you slow the loading of data to those tables. To improve performance, drop indexes and key constraints before running your session. You can rebuild those indexes and key constraints after the session completes. If you decide to drop and rebuild indexes and key constraints on a regular basis, you can create pre- and post-load stored procedures to perform these operations each time you run the session.
Note: To optimize performance, use constraint-based loading only if necessary.

Increasing Checkpoint Intervals
The Informatica Server performance slows each time it waits for the database to perform a checkpoint. To increase performance, consider increasing the database checkpoint interval. When you increase the database checkpoint interval, you increase the likelihood that the database performs checkpoints as necessary, when the size of the database log file reaches its limit.

Bulk Loading on Sybase and Microsoft SQL Server
You can use bulk loading to improve the performance of a session that inserts a large amount of data to a Sybase or Microsoft SQL Server database. Configure bulk loading on the Targets dialog box in the session properties. When bulk loading, the Informatica Server bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, the Informatica Server cannot perform recovery of the session. Therefore, you must weigh the importance of improved session performance against the ability to recover an incomplete session. If you have indexes or key constraints on your target tables and you want to enable bulk loading, you must drop the indexes and constraints before running the session. After the session completes, you can rebuild them. If you decide to use bulk loading with the session on a regular basis, you can create pre- and post-load stored procedures to drop and rebuild indexes and key constraints. For other databases, even if you configure the bulk loading option, the Informatica Server ignores the commit interval mentioned and commits as needed.

External Loading on Teradata, Oracle, and Sybase IQ
You can use the External Loader session option to integrate external loading with a session. If you have a Teradata target database, you can use the Teradata external loader utility to bulk load target files. If your target database runs on Oracle, you can use the Oracle SQL*Loader utility to bulk load target files. When you load data to an Oracle database using a partitioned session, you can increase performance if you create the Oracle target table with the same number of partitions you use for the session. If your target database runs on Sybase IQ, you can use the Sybase IQ external loader utility to bulk load target files. If your Sybase IQ database is local to the Informatica Server on your UNIX system, you can increase performance by loading data to target tables directly from named pipes. Use pmconfig to enable the SybaseIQLocaltoPMServer option. When you enable this option, the Informatica Server loads data directly from named pipes rather than writing to a flat file for the Sybase IQ external loader.

Increasing Database Network Packet Size
You can increase the network packet size in the Informatica Server Manager to reduce target bottleneck. For Sybase and Microsoft SQL Server, increase the network packet size to 8K - 16K. For Oracle, increase the network packet size in tnsnames.ora and listener.ora. If you increase the network packet size in the Informatica Server configuration, you also need to configure the database server network memory to accept larger packet sizes.

Optimizing Oracle Target Databases
If your target database is Oracle, you can optimize the target database by checking the storage clause, space allocation, and rollback segments.

When you write to an Oracle database, check the storage clause for database objects. Make sure that tables are using large initial and next values. The database should also store table and index data in separate tablespaces, preferably on different disks. When you write to Oracle target databases, the database uses rollback segments during loads. Make sure that the database stores rollback segments in appropriate tablespaces, preferably on different disks. The rollback segments should also have appropriate storage clauses. You can optimize the Oracle target database by tuning the Oracle redo log. The Oracle database uses the redo log to log loading operations. Make sure that redo log size and buffer size are optimal. You can view redo log properties in the init.ora file. If your Oracle instance is local to the Informatica Server, you can optimize performance by using IPC protocol to connect to the Oracle database. You can set up the Oracle database connection in listener.ora and tnsnames.ora.

Improving Performance at mapping level


Optimizing Datatype Conversions
Forcing the Informatica Server to make unnecessary datatype conversions slows performance. For example, if your mapping moves data from an Integer column to a Decimal column, then back to an Integer column, the unnecessary datatype conversion slows performance. Where possible, eliminate unnecessary datatype conversions from mappings. Some datatype conversions can improve system performance. Use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations. For example, many databases store U.S. zip code information as a Char or Varchar datatype. If you convert your zip code data to an Integer datatype, the lookup database stores the zip code 94303-1234 as 943031234. This helps increase the speed of the lookup comparisons based on zip code.

Optimizing Lookup Transformations
If a mapping contains a Lookup transformation, you can optimize the lookup. Some of the things you can do to increase performance include caching the lookup table, optimizing the lookup condition, or indexing the lookup table.

Caching Lookups
If a mapping contains Lookup transformations, you might want to enable lookup caching. In general, you want to cache lookup tables that need less than 300MB.

When you enable caching, the Informatica Server caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the Informatica Server queries the lookup table on a row-by-row basis. You can increase performance using a shared or persistent cache:
• Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping. You can share a named cache between transformations in the same or different mappings.
• Persistent cache. If you want to save and reuse the cache files, you can configure the transformation to use a persistent cache. Use this feature when you know the lookup table does not change between session runs. Using a persistent cache can improve performance because the Informatica Server builds the memory cache from the cache files instead of from the database.

Reducing the Number of Cached Rows
Use the Lookup SQL Override option to add a WHERE clause to the default SQL statement. This allows you to reduce the number of rows included in the cache.

Optimizing the Lookup Condition
If you include more than one lookup condition, place the conditions with an equal sign first to optimize lookup performance.

Indexing the Lookup Table
The Informatica Server needs to query, sort, and compare values in the lookup condition columns. The index needs to include every column used in a lookup condition. You can improve performance for both cached and uncached lookups:
• Cached lookups. You can improve performance by indexing the columns in the lookup ORDER BY. The session log contains the ORDER BY statement.
• Uncached lookups. Because the Informatica Server issues a SELECT statement for each row passing into the Lookup transformation, you can improve performance by indexing the columns in the lookup condition.
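The difference between a cached lookup and a row-by-row lookup, and the effect of a WHERE clause on cache size, can be pictured with a small Python sketch. It is an illustration only: `build_lookup_cache`, `lookup`, and the sample rows are hypothetical, and the `where` argument merely stands in for a filter added through the Lookup SQL Override.

```python
def build_lookup_cache(lookup_rows, key, where=None):
    """Read the lookup table once and index it by key, optionally filtering rows."""
    return {row[key]: row for row in lookup_rows if where is None or where(row)}


def lookup(cache, value, default=None):
    """Answer a lookup from memory instead of querying the table for every row."""
    return cache.get(value, default)


customers = [
    {"cust_id": 1, "region": "east", "active": True},
    {"cust_id": 2, "region": "west", "active": False},
    {"cust_id": 3, "region": "east", "active": True},
]
full_cache = build_lookup_cache(customers, "cust_id")
small_cache = build_lookup_cache(customers, "cust_id", where=lambda r: r["active"])
```

With the filter, the cache holds two rows instead of three, and a lookup for customer 2 returns the default.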

Improving Performance at Repository level

Tuning Repository Performance

The PowerMart and PowerCenter repository has more than 80 tables and almost all tables use one or more indexes to speed up queries. Most databases keep and use column distribution statistics to determine which index to use to execute SQL queries optimally. Database servers do not update these statistics continuously. In frequently-used repositories, these statistics can become outdated very quickly and SQL query optimizers may choose a less than optimal query plan. In large repositories, the impact of choosing a sub-optimal query plan can affect performance drastically. Over time, the repository becomes slower and slower.

To optimize SQL queries, you might update these statistics regularly. The frequency of updating statistics depends on how heavily the repository is used. Updating statistics is done table by table. The database administrator can create scripts to automate the task. You can use the following information to generate scripts to update distribution statistics.
Note: All PowerMart/PowerCenter repository tables and index names begin with "OPB_".

Oracle Database
You can generate scripts to update distribution statistics for an Oracle repository. To generate scripts for an Oracle repository:
1. Run the following queries:
select 'analyze table ', table_name, ' compute statistics;' from user_tables where table_name like 'OPB_%'
select 'analyze index ', INDEX_NAME, ' compute statistics;' from user_indexes where INDEX_NAME like 'OPB_%'

This produces an output like the following:
'ANALYZETABLE'  TABLE_NAME        'COMPUTESTATISTICS;'
--------------  ----------------  --------------------
analyze table   OPB_ANALYZE_DEP   compute statistics;
analyze table   OPB_ATTR          compute statistics;
analyze table   OPB_BATCH_OBJECT  compute statistics;

2. Save the output to a file.
3. Edit the file and remove all the headers. Headers are like the following:
'ANALYZETABLE'  TABLE_NAME        'COMPUTESTATISTICS;'
--------------  ----------------  --------------------
4. Run this as an SQL script. This updates repository table statistics.
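If the table names are already available as a list, the same script can be assembled without the manual header-stripping step. The following Python sketch only generates the statement text shown above and does not connect to any database; `analyze_statements` and the sample names are hypothetical.

```python
def analyze_statements(table_names, prefix="OPB_"):
    """Build one analyze statement per repository table name."""
    # startswith() treats the underscore literally, which is slightly stricter
    # than SQL LIKE 'OPB_%', where "_" matches any single character.
    return [
        f"analyze table {name} compute statistics;"
        for name in table_names
        if name.startswith(prefix)
    ]


script_lines = analyze_statements(["OPB_ATTR", "EMP", "OPB_ANALYZE_DEP"])
```

Joining `script_lines` with newlines gives a file that can be run as the SQL script in step 4.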

Microsoft SQL Server
You can generate scripts to update distribution statistics for a Microsoft SQL Server repository. To generate scripts for a Microsoft SQL Server repository:
1. Run the following query:
select 'update statistics ', name from sysobjects where name like 'OPB_%'
This produces an output like the following:
                    name
------------------  ------------------
update statistics   OPB_ANALYZE_DEP
update statistics   OPB_ATTR
update statistics   OPB_BATCH_OBJECT
2. Save the output to a file.
3. Edit the file and remove the header information. Headers are like the following:
                    name
------------------  ------------------
4. Add a go at the end of the file.
5. Run this as a SQL script. This updates repository table statistics.

Improving Performance at Session level


Optimizing the Session
Once you optimize your source database, target database, and mapping, you can focus on optimizing the session. You can perform the following tasks to improve overall performance:
• Run concurrent batches.
• Partition sessions.
• Reduce errors tracing.
• Remove staging areas.
• Tune session parameters.

Table 19-1 lists the settings and values you can use to improve session performance:

Table 19-1. Session Tuning Parameters

Setting               Default Value              Suggested Minimum Value  Suggested Maximum Value
DTM Buffer Pool Size  12,000,000 bytes [12 MB]   6,000,000 bytes          128,000,000 bytes
Buffer block size     64,000 bytes [64 KB]       4,000 bytes              128,000 bytes
Index cache size      1,000,000 bytes            1,000,000 bytes          12,000,000 bytes
Data cache size       2,000,000 bytes            2,000,000 bytes          24,000,000 bytes
Commit interval       10,000 rows                N/A                      N/A
Decimal arithmetic    Disabled                   N/A                      N/A
Tracing Level         Normal                     Terse                    N/A

How to correct and load the rejected files when the session completes

During a session, the Informatica Server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the Informatica Server writes the rejected row into the reject file. By default, the Informatica Server creates reject files in the $PMBadFileDir server variable directory. The reject file and session log contain information that helps you determine the cause of the reject. You can correct reject files and load them to relational targets using the Informatica reject loader utility. The reject loader also creates another reject file for the data that the writer or target reject during the reject loading. Complete the following tasks to load reject data into the target:
• Locate the reject file.
• Correct bad data.
• Run the reject loader utility.

NOTE: You cannot load rejected data into a flat file target.

After you locate a reject file, you can read it using a text editor that supports the reject file code page. Reject files contain rows of data rejected by the writer or the target database. Though the Informatica Server writes the entire row in the reject file, the problem generally centers on one column within the row. To help you determine which column caused the row to be rejected, the Informatica Server adds row and column indicators to give you more information about each column:
• Row indicator. The first column in each row of the reject file is the row indicator. The numeric indicator tells whether the row was marked for insert, update, delete, or reject.
• Column indicator. Column indicators appear after every column of data. The alphabetical character indicators tell whether the data was valid, overflow, null, or truncated.

The following sample reject file shows the row and column indicators:
3,D,1,D,,D,0,D,1094945255,D,0.00,D,-0.00,D
0,D,1,D,April,D,1997,D,1,D,-1364.22,D,-1364.22,D
0,D,1,D,April,D,2000,D,1,D,2560974.96,D,2560974.96,D
3,D,1,D,April,D,2000,D,0,D,0.00,D,0.00,D
0,D,1,D,August,D,1997,D,2,D,2283.76,D,4567.53,D
0,D,3,D,December,D,1999,D,1,D,273825.03,D,273825.03,D
0,D,1,D,September,D,1997,D,1,D,0.00,D,0.00,D

Row Indicators

The first column in the reject file is the row indicator. The number listed as the row indicator tells the writer what to do with the row of data. Table 15-1 describes the row indicators in a reject file:

Table 15-1. Row Indicators in Reject File

Row Indicator  Meaning  Rejected By
0              Insert   Writer or target
1              Update   Writer or target
2              Delete   Writer or target
3              Reject   Writer

If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject. If a row indicator is 0, 1, or 2, either the writer or the target database rejected the row. To narrow down the reason why rows marked 0, 1, or 2 were rejected, review the column indicators and consult the session log.

Column Indicators

After the row indicator is a column indicator, followed by the first column of data, and another column indicator. Column indicators appear after every column of data and define the type of the data preceding it. Table 15-2 describes the column indicators in a reject file:

Table 15-2. Column Indicators in Reject File

Column Indicator: D
Type of data: Valid data.
Writer Treats As: Good data. Writer passes it to the target database. The target accepts it unless a database error occurs, such as finding a duplicate key.

Column Indicator: O
Type of data: Overflow. Numeric data exceeded the specified precision or scale for the column.
Writer Treats As: Bad data, if you configured the mapping target to reject overflow or truncated data.

Column Indicator: N
Type of data: Null. The column contains a null value.
Writer Treats As: Good data. Writer passes it to the target, which rejects it if the target database does not accept null values.

Column Indicator: T
Type of data: Truncated. String data exceeded a specified precision for the column, so the Informatica Server truncated it.
Writer Treats As: Bad data, if you configured the mapping target to reject overflow or truncated data.
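To make the layout of a reject file row concrete, here is a minimal Python sketch that splits one row into its row indicator and its (data, column indicator) pairs. It assumes a plain comma-separated row with no commas inside the data, and it assumes the indicator letters D, O, N, and T for valid, overflow, null, and truncated; only D appears in the sample rows. The function name `parse_reject_line` is hypothetical.

```python
ROW_INDICATORS = {"0": "insert", "1": "update", "2": "delete", "3": "reject"}
# Assumed letter codes; only "D" is visible in the sample reject file.
COLUMN_INDICATORS = {"D": "valid", "O": "overflow", "N": "null", "T": "truncated"}


def parse_reject_line(line):
    """Split a reject file row into its row action and (value, status) pairs."""
    fields = line.rstrip("\n").split(",")
    row_action = ROW_INDICATORS[fields[0]]
    # fields[1] is the column indicator that follows the row indicator itself.
    rest = fields[2:]
    columns = [
        (rest[i], COLUMN_INDICATORS[rest[i + 1]])
        for i in range(0, len(rest) - 1, 2)
    ]
    return row_action, columns


action, columns = parse_reject_line(
    "0,D,1,D,April,D,1997,D,1,D,-1364.22,D,-1364.22,D"
)
```

For this sample row, `action` is "insert" and `columns` holds six values, all flagged valid, so the session log would be the next place to look for the cause of the reject.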

After you correct the target data in each of the reject files, append ".in" to the name of each reject file you want to load into the target database. For example, after you correct the reject file t_AvgSales_1.bad, rename it t_AvgSales_1.bad.in. Once a reject file has been corrected and renamed to reject_file.in, you can use the reject loader to send it through the writer to the target database.

Use the reject loader utility from the command line to load rejected files into target tables. The syntax for reject loading differs on UNIX and Windows NT/2000 platforms.
Use the following syntax for UNIX:
pmrejldr pmserver.cfg [folder_name:]session_name
Use the following syntax for Windows NT/2000:
pmrejldr [folder_name:]session_name

Recovering Sessions
If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of the mapping, session, and Informatica Server configuration. Use one of the following methods to complete the session:
• Run the session again if the Informatica Server has not issued a commit.
• Truncate the target tables and run the session again if the session is not recoverable.
• Consider performing recovery if the Informatica Server has issued at least one commit.
When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The Informatica Server then reads all sources again and starts processing from the next row ID. For example, if the Informatica Server commits 10,000 rows before the session fails, when you run recovery, the Informatica Server bypasses the rows up to 10,000 and starts loading with row 10,001. The commit point may be different for source- and target-based commits.
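The recovery behavior described above, re-reading the source and skipping everything up to the last committed row, can be sketched in a few lines of Python. This is an illustration of the idea only; `recover`, the in-memory source, and the `load` callback are hypothetical and do not reflect how the Informatica Server stores or reads its recovery table.

```python
def recover(source_rows, last_committed_row_id, load):
    """Re-read the source, skip rows already committed, and load the rest."""
    loaded = 0
    for row_id, row in enumerate(source_rows, start=1):
        if row_id <= last_committed_row_id:
            continue            # bypass rows committed before the failure
        load(row)
        loaded += 1
    return loaded


target = []
# Hypothetical source of 10,005 rows, of which 10,000 were committed earlier.
rows_loaded = recover(range(1, 10006), 10000, target.append)
```

Loading resumes with row 10,001, so only the five remaining rows reach the target.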

By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in the Informatica Server setup before you run a session so the Informatica Server can create and/or write entries in the OPB_SRVR_RECOVERY table.

Causes for Session Failure
• Reader errors. Errors encountered by the Informatica Server while reading the source database or source files. Reader threshold errors can include alignment errors while running a session in Unicode mode.
• Writer errors. Errors encountered by the Informatica Server while writing to the target database or target files. Writer threshold errors can include key constraint violations, loading nulls into a not null field, and database trigger responses.
• Transformation errors. Errors encountered by the Informatica Server while transforming data. Transformation threshold errors can include conversion errors, and any condition set up as an ERROR, such as null input.

Fatal Error
A fatal error occurs when the Informatica Server cannot access the source, target, or repository. This can include loss of connection or target database errors, such as lack of database space to load data. If the session uses a Normalizer or Sequence Generator transformation and the Informatica Server cannot update the sequence values in the repository, a fatal error occurs.
