
ETL Team Development Standards

ETL Development Checklist


1. Develop and publish technical specifications based on functional requirements.
2. Develop Informatica mappings in the development folder, following folder management, version control, naming, data transformation, and dimension table standards.
3. Design the Load Status Table process if necessary.
4. Design the incremental load process if necessary.
5. Develop Informatica workflows in the development folder, following folder management, version control, and naming standards.
6. Test Informatica mappings and sessions.
7. Migrate Informatica objects to production.

1. Technical Specification Development Guidelines

Publishing
Publish completed specifications and all technical

Peer Review
Review completed designs with teammates.

2. Informatica Mapping Development Guidelines

Folder Management
Determine the folder in which to locate new mappings. Create a new folder if necessary, coordinating with the Informatica PowerCenter Server administrator.
Maps and sessions may be developed in personal folders within the info_dev repository, but should be migrated to the dev copy of the relevant production folder after initial development is complete and before QA testing begins.

Version Control
For new Informatica mappings, create the mapping in the info_dev repository using the Designer tool. The version for new maps is 1.0. See Naming Standards below for details.
For revisions to existing Informatica mappings, copy the mappings from the info_prod repository to the info_dev repository within the Designer tool. Increment the version/name of the info_dev instance of the mapping per Naming Standards below.
Do not use shortcuts to develop folders or mappings.

Naming Standards
Within Informatica Designer, maps should be named using the following template:
Area_TargetName_Qualifier_Action_vX_Y where:



Area is one of:
    stg     Staging
    dw      EDW fact/dimension tables

TargetName is the final target table name, all in upper case.

Qualifier is a description of the functionality of the mapping. This only needs to be added if multiple mappings use the same target table.

Action is one of:
    del     Delete
    ins     Insert
    updt    Update
    extr    Insert and Update
    scd     Slowly Changing Dimensions
    copy    Copy (no transformation logic between sources and targets; mainly used for source-to-stage copies, creation of test data, and ad hoc data movement)

v stands for version.

X is the major version number. It is initially set to 1 when a map is first created and is incremented by one for each subsequent major change to that mapping. Major changes involve fundamental changes to a map design, e.g. new sources, transformations and/or targets, or replacing or significantly augmenting existing functionality. For minor mapping revisions, the major version number remains constant.

Y is the minor version number. It is initially set to 0 when a map is first created and is incremented by one for each minor change to a given mapping (e.g. a change to a Filter transformation condition or a change to derived values within an Expression transformation). When the major version number (X above) is incremented, the minor version number is reset to 0.
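The naming template and version rules above can be checked mechanically. The following is a minimal Python sketch, not part of the standard itself. It assumes that the optional Qualifier is written in lower case so it can be told apart from the upper-case TargetName (the standard does not specify the Qualifier's case), and the names `MappingName` and `parse_mapping_name` are hypothetical.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional

# Pattern for Area_TargetName_Qualifier_Action_vX_Y.
# Assumption: Qualifier is lower case; TargetName is upper case per the standard.
_NAME_RE = re.compile(
    r"(?P<area>stg|dw)_"
    r"(?P<target>[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*)"
    r"(?:_(?P<qualifier>[a-z][a-z0-9]*(?:_[a-z0-9]+)*))?"
    r"_(?P<action>del|ins|updt|extr|scd|copy)"
    r"_v(?P<major>\d+)_(?P<minor>\d+)"
)


@dataclass(frozen=True)
class MappingName:
    area: str
    target: str
    qualifier: Optional[str]
    action: str
    major: int
    minor: int

    def __str__(self) -> str:
        parts = [self.area, self.target]
        if self.qualifier:
            parts.append(self.qualifier)
        parts += [self.action, f"v{self.major}_{self.minor}"]
        return "_".join(parts)

    def bump_minor(self) -> "MappingName":
        # Minor change: Y increases by one, X is unchanged.
        return replace(self, minor=self.minor + 1)

    def bump_major(self) -> "MappingName":
        # Major change: X increases by one and Y is reset to 0.
        return replace(self, major=self.major + 1, minor=0)


def parse_mapping_name(name: str) -> MappingName:
    """Parse a mapping name, raising ValueError if it breaks the template."""
    m = _NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a valid mapping name: {name!r}")
    return MappingName(
        area=m["area"],
        target=m["target"],
        qualifier=m["qualifier"],
        action=m["action"],
        major=int(m["major"]),
        minor=int(m["minor"]),
    )
```

For example, `str(parse_mapping_name("dw_EMPLOYEE_DIM_scd_v1_3").bump_major())` yields `dw_EMPLOYEE_DIM_scd_v2_0`, while a name with a lower-case target table is rejected.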

Data Transformation Standards

When values are moved into data warehouse fact tables, they will only be transformed when the consumer requests a transformation:

1. Nulls will remain nulls.
2. Right trim spaces from all varchar columns in the ETL mapping except for Desc or Txt types. Name types should be right trimmed.
3. If a code field is equal to null or spaces in the source table, change it to a single space in the fact table. This will correspond to the default value in the associated lookup table (see Dimension Table Standards below for additional information).
4. Where possible, use the target database sequence generator (e.g. in Oracle, SELECT sequence_name.NEXTVAL FROM DUAL) to generate surrogate values. This simplifies data loads from multiple sources (and multiple Informatica maps) into dimensional tables, and also simplifies ETL migrations from development/QA to production environments.
5. High value date: The date 12/31/9999 should be used in DW_LAST_EFFECTIVE_DT to indicate the maximum date. When the source is PeopleSoft, DW_FEFF_DT should be set equal to the date set in the PeopleSoft effective date column.

6. When the source system is not PeopleSoft, DW_FEFF_DT should be set to the date the data was entered into the source system.
7. DW_LEFF_DT of the old current row should be changed from 12/31/9999 to the DW_FEFF_DT of the new current row minus one day.
8. To determine the value in DW_FIRST_EFFECTIVE_DT:
    a. First, take the value from the source system.
    b. If there is no date in the source system, consult the business owner for the date to use.
    c. If the column is a Data Warehouse field of no interest to the business owner, use the default date of 01/01/1902 to indicate the earliest possible First Effective Date. This date is used to distinguish a date set by the Data Warehouse from PeopleSoft system data, which uses 01/01/1900, and UCB custom data, which uses 01/01/1901.

Dimension Table Standards

9. Right trim spaces from all varchar columns in the ETL mapping except for Desc or Txt types. Name types should be right trimmed.
10. When a consuming group uses a set of values that does not match the current set of values in a dimension table, their values will be added if necessary and cross-referenced to the values already in the table. When a consuming group uses different descriptions for the same table value, their description will also be kept in the table if necessary.
11. All dimensions will have a row whose key is the highest value possible for the key's data type (e.g. 99999) and whose description is Unknown. When a value is moved into the warehouse that does not match any of the actual dimension values contained in the dimension, this key will be assigned.
12. For all dimension table columns, if a report requires dashes to be displayed instead of a space, this change will be made in the report, not in the underlying view or table.
13. Add a row in all lookup tables (dimensions containing lists of code values and associated descriptions) for no value. This will allow an inner join between the lookup table and the fact table.
    a. The code value should be a single space. All source system code values of null or spaces should be converted to a single space before loading into the lookup table. Special consideration should be given to cases where a source system contains multiple code values equal to null and/or one or more spaces.
    b. Effective Date will be set to 01/01/1902. (Note that the default effective date in PeopleSoft applications is 01/01/1900. Special consideration will need to be made for those rare cases where there is data effective before 1900.)
    c. Short Description will be Unknown.
    d. Long Description will be Unknown.
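Several of the rules above are simple enough to express as small functions. The sketch below is illustrative only. It assumes that Desc and Txt column types can be recognised by a `_DESC` or `_TXT` column-name suffix, which the standard does not state, and all function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Mapping, Optional

HIGH_DATE = date(9999, 12, 31)               # maximum "last effective" date
DEFAULT_FIRST_EFFECTIVE = date(1902, 1, 1)   # default set by the Data Warehouse
NO_VALUE_CODE = " "                          # single space for "no value"
UNKNOWN_KEY = 99999                          # example key for the Unknown row


def rtrim_varchar(column_name: str, value: Optional[str]) -> Optional[str]:
    """Right trim spaces, except for Desc/Txt columns; nulls stay null."""
    if value is None:
        return None
    # Assumption: Desc/Txt types are identified by column-name suffix.
    if column_name.upper().endswith(("_DESC", "_TXT")):
        return value
    return value.rstrip(" ")


def normalize_code(value: Optional[str]) -> str:
    """Map null or all-space code values to a single space."""
    if value is None or value.strip(" ") == "":
        return NO_VALUE_CODE
    return value


def closing_date(new_first_effective: date) -> date:
    """Last effective date for the old current row: new first effective date minus one day."""
    return new_first_effective - timedelta(days=1)


def dimension_key(code: str, key_by_code: Mapping[str, int]) -> int:
    """Look up a surrogate key, falling back to the Unknown row's key."""
    return key_by_code.get(code, UNKNOWN_KEY)
```

With these definitions, `closing_date(date(2005, 3, 1))` gives `date(2005, 2, 28)`, and a code that is absent from the dimension resolves to the Unknown key rather than failing the join.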

3. Load Status Table

Load start and end times for each session should be captured in a load status table. A status screen that shows the current load status will be developed to keep consumers informed.

4. Incremental Load Process Design

The table BISSTG.ETL_JOB_TRACKER has been developed to support EDW incremental load processes. It contains the columns listed in the matrix below:
Column Name    Data Type       Comments
JOB_NAME       VARCHAR2(80)    Contains the name of the Informatica session to be tracked.
                               Should be unique for each session to be tracked. Should also
                               be generic, so that the table and mapping do not need to be
                               updated if the session name changes; e.g. BR2_jad_atld rather
                               than b_BR2_jad_atld_p3.
START_DATE     DATE            Timestamp indicating the time a given session started, for
                               date-based incremental loads.
END_DATE       DATE            Timestamp indicating the date/time a given session completed.

To support the functionality of this table, the SP_JOB_TRACKER Oracle Stored Procedure has been created. This stored procedure takes JOB_NAME and either START or FINISH as input (see format below):
SP_JOB_TRACKER (JOB_NAME, [START/FINISH])

If called with START, it creates a new ETL_JOB_TRACKER record, setting END_DATE to NULL. If called with FINISH, it updates END_DATE to equal sysdate in the most recent job tracker record.

The following steps will allow incremental loading of data from source tables, independently of the last successful run of a given mapping.
    o Incorporate SP_JOB_TRACKER_START as an unconnected Stored Procedure transformation object of type Source Pre-load. Use the Informatica Job Name and START as inputs.
    o Incorporate SP_JOB_TRACKER_FINISH as an unconnected Stored Procedure transformation object of type Target Post-load. Use Job Name and FINISH as inputs.
    o Add the following to the Source SQL:
WHERE [Source table date field of interest] > (SELECT MAX(START_DATE) FROM ETL_JOB_TRACKER WHERE JOB_NAME = [Job Name] AND END_DATE IS NOT NULL)
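The interaction between the tracker table and the source filter is easier to see end to end. The sketch below simulates it in Python with an in-memory SQLite database rather than Oracle and Informatica, so it is an approximation of the described behaviour, not the actual stored procedure. The source table `SRC_TABLE`, its `UPDATED_AT` column, the explicit `now` argument (standing in for sysdate), and the function names are all hypothetical.

```python
import sqlite3


def sp_job_tracker(conn, job_name, action, now):
    """Stand-in for SP_JOB_TRACKER; `now` replaces Oracle's sysdate."""
    if action == "START":
        # New tracker record; END_DATE stays NULL until the job finishes.
        conn.execute(
            "INSERT INTO ETL_JOB_TRACKER (JOB_NAME, START_DATE, END_DATE) "
            "VALUES (?, ?, NULL)",
            (job_name, now),
        )
    elif action == "FINISH":
        # Stamp END_DATE on the most recent record for this job.
        conn.execute(
            "UPDATE ETL_JOB_TRACKER SET END_DATE = ? "
            "WHERE JOB_NAME = ? AND START_DATE = ("
            "  SELECT MAX(START_DATE) FROM ETL_JOB_TRACKER WHERE JOB_NAME = ?)",
            (now, job_name, job_name),
        )
    else:
        raise ValueError(f"action must be START or FINISH, got {action!r}")


def incremental_ids(conn, job_name):
    """Source rows changed since the start of the last completed run."""
    rows = conn.execute(
        "SELECT ID FROM SRC_TABLE WHERE UPDATED_AT > ("
        "  SELECT MAX(START_DATE) FROM ETL_JOB_TRACKER "
        "  WHERE JOB_NAME = ? AND END_DATE IS NOT NULL) "
        "ORDER BY ID",
        (job_name,),
    ).fetchall()
    return [r[0] for r in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ETL_JOB_TRACKER "
        "(JOB_NAME TEXT, START_DATE TEXT, END_DATE TEXT)"
    )
    conn.execute("CREATE TABLE SRC_TABLE (ID INTEGER, UPDATED_AT TEXT)")
    conn.executemany(
        "INSERT INTO SRC_TABLE VALUES (?, ?)",
        [
            (1, "2024-01-01 08:00:00"),
            (2, "2024-01-02 08:00:00"),
            (3, "2024-01-03 08:00:00"),
        ],
    )
    job = "BR2_jad_atld"
    # Run 1 completes normally.
    sp_job_tracker(conn, job, "START", "2024-01-01 12:00:00")
    sp_job_tracker(conn, job, "FINISH", "2024-01-01 12:05:00")
    # Run 2 starts but fails, so FINISH is never called.
    sp_job_tracker(conn, job, "START", "2024-01-02 12:00:00")
    # Run 3: pre-load START, source query, post-load FINISH.
    sp_job_tracker(conn, job, "START", "2024-01-03 12:00:00")
    ids = incremental_ids(conn, job)
    sp_job_tracker(conn, job, "FINISH", "2024-01-03 12:10:00")
    return ids
```

In this simulation the third run picks up rows 2 and 3: the failed second run and the in-progress third run both have a NULL END_DATE, so the filter falls back to the start of the first run. Note that when no completed run exists yet, the subquery returns NULL and the comparison selects no rows, so an initial load needs a seeded tracker record or a separate full load.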

5. Informatica Workflow Development

Folder Management
The session folder should be the same as the folder in which mappings are developed using the Designer tool.

Version Control
For new Informatica mappings, create sessions in the info_dev repository using the Server Manager tool, per the Informatica Naming Standards below. Sessions should have the same version number as their associated mappings.
For revisions to existing Informatica mappings, copy sessions from the info_prod repository to the info_dev repository within the Server Manager tool. Verify that connection strings work and that targets are appropriately defined. Maps and associated sessions should share the same version number. See the Naming Standards below for details on how version numbers should be maintained for sessions.

Naming Standards
Within Informatica Workflow Manager, sessions should be named using the following template:
s_MappingName_Qualifier
where:

s stands for session.

MappingName is the name of the Informatica mapping associated with a given session.

Qualifier is a description of the functionality of the session. This only needs to be added if the mapping is associated with multiple sessions.

Within Informatica Workflow Manager, workflows should be named using the following template:
wf_WorkflowName_Frequency
where:

wf stands for workflow.

WorkflowName is a description of the functionality contained within the workflow, e.g. HR_ADM_WKFORCE.

Frequency is how often the workflow runs, e.g. Monthly, Daily, Weekly. Daily can be used for workflows which run Monday through Saturday or Monday through Friday.
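As a small illustration of the session and workflow templates, the following Python sketch builds names from their parts. The helper names are hypothetical, and restricting Frequency to the three values given as examples is an assumption; the standard lists them only as examples.

```python
from typing import Optional

# Assumption: only the frequencies named as examples in the standard are allowed.
FREQUENCIES = ("Daily", "Weekly", "Monthly")


def session_name(mapping_name: str, qualifier: Optional[str] = None) -> str:
    """Build s_MappingName or s_MappingName_Qualifier."""
    parts = ["s", mapping_name]
    if qualifier:
        # Only needed when one mapping is associated with several sessions.
        parts.append(qualifier)
    return "_".join(parts)


def workflow_name(workflow: str, frequency: str) -> str:
    """Build wf_WorkflowName_Frequency."""
    if frequency not in FREQUENCIES:
        raise ValueError(f"unexpected frequency: {frequency!r}")
    return f"wf_{workflow}_{frequency}"
```

Because the session name embeds the full mapping name, including its vX_Y suffix, a session built this way carries the same version number as its mapping.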

6. Unit Testing

Using test data, verify that Informatica mappings perform as expected. Check results, and troubleshoot performance, execution, and data quality issues.

Access privileges may prevent developers from viewing data contained in database views. There are two ways to deal with this:
1. Update the security tables (in dev or QA) to allow access by your userid. Only do this if you really know what you are doing.
2. Apply for security access through SARA (HRMS Dept. Security, Administer Workforce).

7. Migration to Production

Informatica objects
    o Copy mappings from the info_dev repository to the info_prod repository using the Designer tool.
    o Copy sessions from the info_dev repository to the info_prod repository using the Server Manager tool. Modify database connections as necessary to point to production sources and targets. Verify that connection strings work and that sources and targets are appropriately defined. Maps and associated sessions should share the same version number.
    o Assemble sessions into workflows as necessary, per technical specifications and functional requirements.
    o If possible, test mappings/sessions/workflows in production to ensure functionality is as expected.
