
Session 10

Improving Performance and Reliability

About Lookup Files

A lookup file is a dataset that associates key values with corresponding data values. Unlike other datasets, a lookup file is not connected to the flows of other components. Lookup files are associated with graphs that use transform functions. Refer to the online help for more information, in particular: Help >> Contents >> DML

About Lookup Expressions

To use a lookup file, you must write a lookup expression. A lookup expression is a DML function that retrieves data from a lookup file. The first argument is the name of the lookup file and the second is the index expression. Ab Initio finds a record in the lookup file whose key value matches the index expression, and returns the entire record.
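
The matching behaviour described above can be pictured with a small Python analogy. This is not Ab Initio DML: the record fields, the sample data, and the helper names `build_lookup` and `lookup` are assumptions made purely for illustration. The sketch indexes a keyed dataset in memory once, then returns the entire record whose key matches the index value.

```python
from typing import Optional

# Hypothetical lookup-file contents: each record has a key field and data fields.
CUSTOMERS = [
    {"cust_id": 101, "name": "Ada", "region": "EU"},
    {"cust_id": 102, "name": "Grace", "region": "US"},
]


def build_lookup(records: list, key: str) -> dict:
    """Index the records by key so each retrieval is a single hash probe."""
    return {rec[key]: rec for rec in records}


def lookup(table: dict, index_value) -> Optional[dict]:
    """Return the whole record whose key matches index_value, or None."""
    return table.get(index_value)


customers_by_id = build_lookup(CUSTOMERS, "cust_id")

# Inside a transform, the fields of the matched record are then available.
match = lookup(customers_by_id, 102)
region = match["region"] if match is not None else "UNKNOWN"
```

Because the whole record comes back, the caller picks the field it needs and decides what to do when no record matches.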

Project Organization: An Introduction

A project is a set of related graphs, record formats, transforms, database configuration files, and documentation that apply to a particular problem. In your host setup file, you should set environment variables to refer to the various subdirectories.

Lookups or Join Components?

Beginning with Ab Initio version 2.1, there are four ways to compute joins: the Match Sorted component, lookup() expressions inside a Reformat, and the Join component in either of its two sorted-input settings.

Use Join

Use Join with the sorted-input parameter set to Input must be sorted when your data is already sorted, or when only one of your inputs is small enough to fit into memory.
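
As an illustration of why sorted input helps, here is a minimal Python sketch of a merge join. It is an analogy only, not the Join component's actual implementation; the function name `merge_join` and the sample records are assumptions. Because both inputs arrive ordered by key, only the current key group of each input has to be held in memory.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key):
    """Inner-join two inputs that are already sorted on `key`."""
    get_key = itemgetter(key)
    left_groups = groupby(left, key=get_key)
    right_groups = groupby(right, key=get_key)
    lk, lrecs = next(left_groups, (None, None))
    rk, rrecs = next(right_groups, (None, None))
    while lrecs is not None and rrecs is not None:
        if lk < rk:
            # Left key has no partner; skip to the next left group.
            lk, lrecs = next(left_groups, (None, None))
        elif lk > rk:
            # Right key has no partner; skip to the next right group.
            rk, rrecs = next(right_groups, (None, None))
        else:
            # Keys match: pair every left record with every right record.
            right_list = list(rrecs)
            for l_rec in lrecs:
                for r_rec in right_list:
                    yield {**l_rec, **r_rec}
            lk, lrecs = next(left_groups, (None, None))
            rk, rrecs = next(right_groups, (None, None))


# Hypothetical inputs, both already sorted on cust_id.
ORDERS = [
    {"cust_id": 101, "amount": 5},
    {"cust_id": 101, "amount": 7},
    {"cust_id": 103, "amount": 9},
]
CUSTOMERS = [
    {"cust_id": 101, "name": "Ada"},
    {"cust_id": 102, "name": "Grace"},
    {"cust_id": 103, "name": "Edsger"},
]
joined = list(merge_join(ORDERS, CUSTOMERS, "cust_id"))
```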

Use Join with the sorted-input parameter set to In memory: Input need not be sorted when all but one of your inputs are small enough to fit into memory, and either you have multiple records with the same key value on the small input, or you need an outer or semi-join.
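
The in-memory case can be sketched in Python as a hash join. Again this is an analogy rather than Ab Initio code, and `hash_join`, its `outer` flag, and the sample data are assumptions. The small input is loaded into a table keyed on the join key, duplicates included, and the large unsorted input is streamed past it.

```python
from collections import defaultdict


def hash_join(large, small, key, outer=False):
    """Join an unsorted large input against a small input held in memory.

    With outer=True, unmatched large-input records are kept (left outer join).
    """
    table = defaultdict(list)
    for rec in small:
        table[rec[key]].append(rec)  # duplicate keys on the small side are kept

    for rec in large:
        matches = table.get(rec[key], [])
        if matches:
            for m in matches:
                yield {**m, **rec}
        elif outer:
            yield dict(rec)


# Hypothetical data: the large input is unsorted, the small one has a duplicate key.
ORDERS = [
    {"cust_id": 103, "amount": 9},
    {"cust_id": 101, "amount": 5},
    {"cust_id": 104, "amount": 1},
]
ADDRESSES = [
    {"cust_id": 101, "city": "Paris"},
    {"cust_id": 101, "city": "Lyon"},
    {"cust_id": 103, "city": "Oslo"},
]
inner_rows = list(hash_join(ORDERS, ADDRESSES, "cust_id"))
outer_rows = list(hash_join(ORDERS, ADDRESSES, "cust_id", outer=True))
```

A plain key-to-record dictionary could not hold both addresses for customer 101, which is why duplicate keys on the small input point toward a join rather than a lookup.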

Use Lookup File


Use a Lookup File when all but one of your inputs are small enough to fit into memory, and either you want the fastest join and one input has no duplicate key values, or you have a complex joining expression that uses several lookup tables at once.

Use Match Sorted


Use Match Sorted when your graph must be backward-compatible with GDE versions lower than 1.3.4.

How Components Group Data

There are four components in the Transform folder that can accept either sorted or unsorted input, depending on the setting of the sorted-input parameter. When this parameter is set to In memory: Input need not be sorted, these components group their data internally according to like keys without sorting them. When these components process unsorted input, they maximize performance by keeping intermediate results in main memory. They are:

Aggregate generates summary records for groups of input records.

Join performs inner, outer, and semi-joins with multiple flows of input records.

Rollup generates summary records for groups of input records. Rollup gives you more control over record selection, grouping, and aggregation than Aggregate.

Scan generates a series of cumulative summary records, such as successive year-to-date totals, for groups of input records.
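
The difference between a Rollup-style summary and a Scan-style running total, both computed over unsorted input with an in-memory table, can be sketched in Python. The function names `rollup` and `scan` borrow the component names only for readability; the code is an assumption-laden analogy, not the components' implementation, and the sample data is invented.

```python
def rollup(records, key, field):
    """One summary record per key: the total of `field`. Input may be unsorted."""
    totals = {}
    for rec in records:
        totals[rec[key]] = totals.get(rec[key], 0) + rec[field]
    return [{key: k, "total": v} for k, v in totals.items()]


def scan(records, key, field):
    """One output per input record, carrying the running total for its key."""
    running = {}
    out = []
    for rec in records:
        running[rec[key]] = running.get(rec[key], 0) + rec[field]
        out.append({**rec, "running_total": running[rec[key]]})
    return out


# Hypothetical unsorted input: the years are interleaved.
SALES = [
    {"year": 2024, "amount": 10},
    {"year": 2023, "amount": 4},
    {"year": 2024, "amount": 5},
    {"year": 2023, "amount": 6},
]
summary = rollup(SALES, "year", "amount")
cumulative = scan(SALES, "year", "amount")
```

Both functions keep one intermediate result per key in a dictionary, which is the sense in which grouping happens in memory without a sort.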

Checkpoints, Phases, Phase Breaks

A checkpoint is a stage that saves status information in temporary files and allows you to recover from failures. A phase is a stage that runs to completion before the start of the next stage; it does not save checkpoint information. C means checkpoint, and P means phase without checkpoint. The visual representations of the stages in your application are called phase breaks.
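
To make the recovery behaviour concrete, here is a hedged Python sketch of a stage runner. It is a conceptual model only and does not reflect how Ab Initio stores or reads its recovery files; `run_stages`, the JSON status file, and the demo stages are all assumptions. Stages run strictly one after another, only checkpoint stages record progress, and a rerun resumes after the last recorded checkpoint.

```python
import json
import os
import tempfile


def run_stages(stages, status_path):
    """Run (name, func, is_checkpoint) stages in order; return the names run.

    If a status file exists from a failed run, resume after the last checkpoint.
    """
    start = 0
    if os.path.exists(status_path):
        with open(status_path) as fh:
            start = json.load(fh)["next_stage"]

    executed = []
    for i in range(start, len(stages)):
        name, func, is_checkpoint = stages[i]
        func()  # each stage runs to completion before the next one starts
        executed.append(name)
        if is_checkpoint:  # only checkpoints save recovery information
            with open(status_path, "w") as fh:
                json.dump({"next_stage": i + 1}, fh)

    if os.path.exists(status_path):
        os.remove(status_path)  # clean finish: nothing left to recover
    return executed


def demo():
    """Fail once in the last stage, then rerun and report what was re-executed."""
    attempts = {"load": 0}

    def load():
        attempts["load"] += 1
        if attempts["load"] == 1:
            raise RuntimeError("simulated failure")

    stages = [
        ("extract", lambda: None, True),     # C: checkpoint
        ("transform", lambda: None, False),  # P: phase without checkpoint
        ("load", load, True),                # C: checkpoint
    ]
    status_path = os.path.join(tempfile.mkdtemp(), "status.json")
    try:
        run_stages(stages, status_path)
    except RuntimeError:
        pass  # the first run fails during "load"
    return run_stages(stages, status_path)


rerun = demo()
```

In this model the rerun repeats the transform stage, because a phase without a checkpoint leaves no saved state to resume from.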

Summary

Checkpoint: a stage of your application that saves status information and allows you to recover from failure. By default, all stages are checkpoints.

DML Lookup Functions: transform functions associated with lookup files. Use the name of the lookup file in the DML lookup function in the transform function to retrieve a value based on a known key.

Components that group internally: components that can store their data in a table and use a hash search algorithm to group records. Ab Initio's components that can group internally are Aggregate, Join, Rollup, and Scan.

Summary Contd

Lookup Files: dataset components that associate key values with corresponding data values. Unlike other datasets, the lookup file is not connected to other components in a graph.

Multistage Transforms: transform components that modify records in up to five stages: input selection, temporary initialization, processing, finalization, and output selection. Each stage is written as a DML transform function. Ab Initio's multistage transforms are Denormalize Sorted, Rollup, Normalize, and Scan.

Packages: lists of types and transform functions associated with transform components, particularly multistage transforms.

Phase: a stage of your application that does not save status information.
