
Best Practices

Incorporating Target's Standards

DataStage is a trademark of International Business Machines Corporation

Main Source

"DataStage Technical Design and Construction Procedures"

\\nicsrv10\TTS\E\ETL\Best Practices\DataStageTechDoc\DataStageTech.doc

This is a "living document", that is, a work in progress; changes will be announced

Job Naming

Each DW "project" has a three-letter code

for example, GLB

Within the Jobs branch, create a category with that name

keep all of the project's objects together in order to support MetaStage functions

Job Naming

Job name begins with the database identifier (for example, GTL), followed by a job identifier and sequence number

GTLJB0001
GTLJB0002
GTLJB0002TEST
GTLJB0003


Stage Names

First 3-4 characters identify the stage type, for example

SEQL (Sequential File stage)
LKFS (Lookup File Set stage)

The remainder should be meaningful and descriptive, with the first character capitalized

Link Names

Links prior to the final active stage: shortdesc_InTo_stagedesc
Links after the final active stage: shortdesc_OutTo_stagedesc
Links from a passive stage: In_linkdesc
Links to a passive stage: Out_linkdesc_action
Links from a Lookup stage: Lkup_linkdesc

Example
Images copyright claimed by Ascential Software Corporation

Reusable Components


Create reusable components where possible


shared containers
flexible routines

Annotations

Annotations are to be used to explain processing. The Description Annotation shows the purpose of the job.

Annotations
Description Annotation

Job Descriptions

Job descriptions become the text of the Description Annotation. The short description is visible in the Detail view (Manager).

Stage/Link Naming

Stages are named after

the data they access (passive stages)
the function they perform (active stages)

Links are named for the data they carry

Do not leave default names such as Sequential_File_0

Developing Jobs
1.

Keep it simple

jobs with many stages are hard to debug, maintain, and document

2.

Start small and build to the final solution

plan
use View Data, Copy, and Peek
start from the source and work outward
develop with a 1-node configuration file and a small set of data

Developing Jobs (continued)


3.

Solve the business problem before the performance problem

don't worry too much about partitioning until the sequential flow works as expected

4.

If you have to write to disk, use a persistent Data Set

Developing Jobs (continued)



Iterative Design

Use a Copy or Peek stage as a stub
Test the job in phases: small first, then increasing in complexity
Use the Peek stage to examine records

Example Phase 1

Example Phase 2

Example Phase 3

Transformer Stage

The Transformer stage generates code
Always include a reject link
Always test for a null value before using a column in a function
Be aware of column and stage variable data types

often the developer does not pay attention to the stage variable data type
try to maintain the data type as imported

Avoid data type conversions
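The null-test rule can be sketched as a Transformer output derivation; the link and column names here are hypothetical:

```
Derivation for lnkOut.CustomerName:

If IsNull(lnkIn.CustomerName) Then "UNKNOWN" Else Trim(lnkIn.CustomerName)
```

The derivation supplies a substitute value before the column reaches Trim(), so a null never flows into the function call.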

Job Parameters

Provide insurance against

things that change over time (for example, passwords, filter conditions)
things that differ between environments (for example, DSNs, pathnames, passwords)

Job Parameters

Created in Job Properties. Each parameter has

name
prompt text (mandatory)
type
default value (design time)
help text

Defining Job Parameters


Click to add environment variables

Using Job Parameters

In fields in passive stages, delimit with "#" characters, for example #SourceDir#

Names are case-sensitive
In expressions, choose the parameter from the expression editor

not delimited
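For instance, a Sequential File stage's file path property could combine two parameters (the second parameter name is illustrative):

```
#SourceDir#/#SourceFile#
```

At run time each #...# token is replaced with the parameter's current value.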

Useful Environment Variables

APT_DUMP_SCORE

reports the osh score to the message log

APT_CONFIG_FILE

establishes the name of the configuration file and therefore the degree of parallelism
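Both variables are typically set in the environment before a run; a minimal sketch, in which the file path, project, and job names are hypothetical:

```shell
# Choose the configuration file: a 2-node file gives a degree of parallelism of 2.
export APT_CONFIG_FILE=/opt/ds/configs/2node.apt

# Ask the engine to dump the job score to the message log.
export APT_DUMP_SCORE=1

# The job would then be run, e.g. via dsjob (shown here, not executed):
echo "dsjob -run -jobstatus dw_project GTLJB0001"
```

In practice both are better exposed as job parameters (environment-variable parameters) so they can be changed per run without editing the job.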

DUMP SCORE Output



Setting APT_DUMP_SCORE yields:

the partitioner and collector
the two Data Sets
the mapping of node --> partition

Configuration Files

Make a set for 1X, 2X, …
Use different ones for test versus production
Include as a parameter in each job
Automatic scaling
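A 1-node configuration file might look like the following sketch; the host name and disk paths are hypothetical:

```
{
  node "node1"
  {
    fastname "etl_host"
    pools ""
    resource disk "/data/ds/datasets" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
}
```

A 2X file would add a second node block; passing the file name as a job parameter lets the same job scale without redesign.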
