
Slowly Changing Dimensions

Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product's price changes over time;
people change their names for various reasons; country and state names may change over time. These are a few examples of
Slowly Changing Dimensions, since changes happen to them over a period of time.

Slowly Changing Dimensions are often categorized into three types, namely Type 1, Type 2, and Type 3. The following section
deals with how to capture and handle these changes over time.

The "Product" table mentioned below contains a product named, Product1 with Product ID being the primary key. In the year
2004, the price of Product1 was $150 and over the time, Product1's price changes from $150 to $350. With this information,
let us explain the three types of Slowly Changing Dimensions.

Product Price in 2010:


Product ID(PK) Year Product Name Product Price
1 2010 Product1 $150

Type 1: Overwriting the old values.


In the year 2015, if the price of the product changes to $450, then the old values of the columns "Year" and "Product Price"
have to be updated and replaced with the new values. With Type 1, there is no way to find out the old 2010 price of
Product1, since the table now contains only the new price and year information.

Product
Product ID(PK)   Year   Product Name   Product Price
1                2015   Product1       $450
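
A minimal Python sketch of a Type 1 change, assuming the dimension is held as a simple in-memory dictionary; the names product_dim and apply_type1_change are illustrative only, not part of any particular tool.

product_dim = {
    1: {"product_name": "Product1", "year": 2010, "product_price": 150},
}

def apply_type1_change(dim, product_id, new_year, new_price):
    # Overwrite the existing row in place; the previous values are lost.
    row = dim[product_id]
    row["year"] = new_year
    row["product_price"] = new_price

apply_type1_change(product_dim, 1, 2015, 450)
print(product_dim[1])   # {'product_name': 'Product1', 'year': 2015, 'product_price': 450}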

Type 2: Creating an additional record.


With Type 2, the old values are not replaced; instead, a new row containing the new values is added to the product table.
So, at any point in time, the old and new values can be retrieved and easily compared, which is very useful for reporting
purposes.

Product
Product ID(PK)   Year   Product Name   Product Price
1                2010   Product1       $150
1                2011   Product1       $350
1                2012   Product1       $450
1                2015   Product1       $600

The problem with the above data structure is that "Product ID" cannot store duplicate values for Product1, since
"Product ID" is the primary key. Also, the current data structure does not clearly specify the effective date and expiry date
of each price of Product1, i.e., when each price change happened. So it would be better to change the current data structure
to overcome the primary key violation.

Product
DIMKEY(SK)   Product ID   Effective DateTime     Year   Product Name   VERSION   Product Price   Expiry DateTime      FLAG
1001         100          01-01-2004 12:00AM     2004   P1             0         $150            31-12-2004 12:00PM   N
1002         100          01-01-2005 12:00AM     2005   P1             1         $250            30-04-2010 12:00PM   N
1003         100          05-01-2010 12:00AM     2010   P1             2         $350            31-12-2014 12:00PM   N
1004         100          01-01-2015 12:00AM     2015   P1             3         $600            31-12-2015 12:00PM   N
2001         101          01-01-2015 12:00AM     2015   P2             0         $500            NULL                 Y
3001         102          01-01-2016 12:00AM     2016   P3             0         $500            NULL                 Y
1005         100          01-01-2016 12:00AM     2016   P1             4         $600            NULL                 Y

In the changed Product table structure, "Product ID" and "Effective DateTime" form a composite primary key, so there is no
violation of the primary key constraint. The addition of the new columns "Effective DateTime" and "Expiry DateTime"
provides information about each price's effective and expiry dates, which adds clarity and enhances the scope of this table.
The Type 2 approach may need additional space in the database, since an additional row has to be stored for every changed
record. Since dimensions are not that big in the real world, the additional space is negligible.
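
The Type 2 logic described above can be sketched in Python as follows. This is only an illustration of the expire-and-insert pattern, with invented names (product_dim, apply_type2_change) and an in-memory list standing in for the dimension table.

from datetime import datetime

# In-memory stand-in for the Type 2 dimension table shown above.
product_dim = [
    {"dimkey": 1001, "product_id": 100, "product_name": "P1", "product_price": 150,
     "effective_dt": datetime(2004, 1, 1), "expiry_dt": None, "flag": "Y", "version": 0},
]

def apply_type2_change(dim, product_id, new_price, change_dt, new_dimkey):
    # Expire the current row for the product and append a new current row.
    for row in dim:
        if row["product_id"] == product_id and row["expiry_dt"] is None:
            row["expiry_dt"] = change_dt      # close out the old version
            row["flag"] = "N"
            dim.append(dict(row, dimkey=new_dimkey, product_price=new_price,
                            effective_dt=change_dt, expiry_dt=None, flag="Y",
                            version=row["version"] + 1))
            return

apply_type2_change(product_dim, 100, 250, datetime(2005, 1, 1), new_dimkey=1002)
apply_type2_change(product_dim, 100, 350, datetime(2010, 1, 5), new_dimkey=1003)
for row in product_dim:
    print(row)   # full price history is preserved, one row per version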

Type 3: Creating new fields.


With Type 3, only the latest change to the tracked values can be seen. The example below illustrates how to add new
columns and keep track of the changes. From it, we are able to see both the current price and the previous price of the
product, Product1.

Product
Product ID(PK)   Current Year   Product Name   Current Product Price   Old Product Price   Old Year
1                2010           Product1       $150                    $100                2009

The problem with the Type 3 approach is that if the product price keeps changing over the years, the complete history is
not stored; only the latest change is kept. For example, in the year 2015, if Product1's price changes to $350, then we
would no longer be able to see the 2009 price, since the old columns would have been overwritten with the 2010 product
information.

Product
Product ID(PK)   Current Year   Product Name   Current Product Price   Old Product Price   Old Year
1                2015           Product1       $350                    $150                2010
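
A small Python sketch of the Type 3 shift-and-overwrite behaviour, again with illustrative names and an in-memory dictionary in place of the dimension table.

product_dim = {
    1: {"product_name": "Product1", "current_year": 2010, "current_price": 150,
        "old_year": 2009, "old_price": 100},
}

def apply_type3_change(dim, product_id, new_year, new_price):
    # Shift the current values into the "old" columns, then overwrite the current ones.
    row = dim[product_id]
    row["old_year"], row["old_price"] = row["current_year"], row["current_price"]
    row["current_year"], row["current_price"] = new_year, new_price

apply_type3_change(product_dim, 1, 2015, 350)
print(product_dim[1])   # only one level of history survives: 2009's $100 is gone
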
Time Dimension
In a relational data model, for normalization purposes, the year lookup, quarter lookup, month lookup, and week lookup are
not merged into a single table. In dimensional data modeling (star schema), these tables are merged into a single table
called the TIME DIMENSION, for performance and for slicing data.

This dimension helps to find the sales done on a daily, weekly, monthly, and yearly basis. We can perform trend analysis by
comparing this year's sales with the previous year's, or this week's sales with the previous week's.

Example of Time Dimension: Figure 1.11

Year Lookup
Year Id Year Number DateTimeStamp
1 2004 1/1/2005 11:23:31 AM
2 2005 1/1/2005 11:23:31 AM

Quarter Lookup
Quarter Number Quarter Name DateTimeStamp
1 Q1 1/1/2005 11:23:31 AM
2 Q2 1/1/2005 11:23:31 AM
3 Q3 1/1/2005 11:23:31 AM
4 Q4 1/1/2005 11:23:31 AM

Month Lookup
Month Number Month Name DateTimeStamp
1 January 1/1/2005 11:23:31 AM
2 February 1/1/2005 11:23:31 AM
3 March 1/1/2005 11:23:31 AM
4 April 1/1/2005 11:23:31 AM
5 May 1/1/2005 11:23:31 AM
6 June 1/1/2005 11:23:31 AM
7 July 1/1/2005 11:23:31 AM
8 August 1/1/2005 11:23:31 AM
9 September 1/1/2005 11:23:31 AM
10 October 1/1/2005 11:23:31 AM
11 November 1/1/2005 11:23:31 AM
12 December 1/1/2005 11:23:31 AM
Week Lookup
Week Number Day of Week DateTimeStamp
1 Sunday 1/1/2005 11:23:31 AM
1 Monday 1/1/2005 11:23:31 AM
1 Tuesday 1/1/2005 11:23:31 AM
1 Wednesday 1/1/2005 11:23:31 AM
1 Thursday 1/1/2005 11:23:31 AM
1 Friday 1/1/2005 11:23:31 AM
1 Saturday 1/1/2005 11:23:31 AM
2 Sunday 1/1/2005 11:23:31 AM
2 Monday 1/1/2005 11:23:31 AM
2 Tuesday 1/1/2005 11:23:31 AM
2 Wednesday 1/1/2005 11:23:31 AM
2 Thursday 1/1/2005 11:23:31 AM
2 Friday 1/1/2005 11:23:31 AM
2 Saturday 1/1/2005 11:23:31 AM

Time Dimension
Time Dim Id   Year   Day Of Year   Quarter No   Month No   Month Name   Day No   Week No   Day of Week   Cal Date   DateTime Stamp
1             2004   1             Q1           1          January      1        1         5             1/1/2004   1/1/2005 11:23:31 AM
2             2004   32            Q1           2          February     1        5         1             2/1/2004   1/1/2005 11:23:31 AM
3             2005   1             Q1           1          January      1        1         7             1/1/2005   1/1/2005 11:23:31 AM
4             2005   32            Q1           2          February     1
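
The rows above can be generated programmatically. The following Python sketch uses only the standard library; the week-number and day-of-week conventions (Sunday = 1) are assumptions chosen to match the sample rows, not a universal standard.

from datetime import date, timedelta

def time_dimension_rows(start, end):
    # Yield one time-dimension row per calendar day between start and end (inclusive).
    load_ts = "1/1/2005 11:23:31 AM"        # illustrative load timestamp
    day, dim_id = start, 1
    while day <= end:
        yield {
            "time_dim_id": dim_id,
            "year": day.year,
            "day_of_year": day.timetuple().tm_yday,
            "quarter_no": "Q%d" % ((day.month - 1) // 3 + 1),
            "month_no": day.month,
            "month_name": day.strftime("%B"),
            "day_no": day.day,
            "week_no": (day.timetuple().tm_yday - 1) // 7 + 1,
            "day_of_week": day.isoweekday() % 7 + 1,   # Sunday = 1 ... Saturday = 7
            "cal_date": day.strftime("%m/%d/%Y"),
            "datetime_stamp": load_ts,
        }
        day += timedelta(days=1)
        dim_id += 1

for row in time_dimension_rows(date(2004, 1, 1), date(2004, 1, 3)):
    print(row)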

Data Warehouse Concepts


What is a Data Warehouse? According to Bill Inmon, a well-known author of several data warehousing books, "A data
warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's
decision making process".

Example: Consider a bank that has several branches in several countries, has millions of customers, and whose lines of
business are savings and loans. Over the years, in order to store data, application designers in each branch have made their
own decisions as to how an application and database should be built, so the source systems differ in naming conventions,
variable measurements, encoding structures, and physical attributes of data. The following example explains how the data
is integrated from source systems into target systems.

Example of Source Data


System Name Attribute Name Column Name Datatype Values
Source System 1 Customer Application Date CUSTOMER_APPLICATION_DATE NUMERIC(8,0) 11012005
Source System 2 Customer Application Date CUST_APPLICATION_DATE DATE 11012005
Source System 3 Application Date APPLICATION_DATE DATE 01NOV2005

In the example above, the attribute name, column name, datatype, and values differ entirely from one source system to
another. This inconsistency in data can be avoided by integrating the data into a data warehouse with good standards.

Example of Target Data(Data Warehouse)


Target System Attribute Name Column Name Datatype Values
Record #1 Customer Application Date CUSTOMER_APPLICATION_DATE DATE 01112005
Record #2 Customer Application Date CUSTOMER_APPLICATION_DATE DATE 01112005
Record #3 Customer Application Date CUSTOMER_APPLICATION_DATE DATE 01112005

In the above example of target data, attribute names, column names, and datatypes are consistent throughout the target
system. This is how data from various source systems is integrated and accurately stored into the data warehouse.
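
A small Python sketch of this kind of standardization. The per-source input formats (MMDDYYYY and DDMONYYYY) and the DDMMYYYY target format are assumptions inferred from the sample values above, and the function name is illustrative.

from datetime import datetime

# Hypothetical per-source date layouts, inferred from the sample source values.
SOURCE_FORMATS = {
    "Source System 1": "%m%d%Y",   # 11012005
    "Source System 2": "%m%d%Y",   # 11012005
    "Source System 3": "%d%b%Y",   # 01NOV2005
}

def standardize_application_date(system_name, raw_value):
    # Parse the source-specific value and return it in the warehouse standard (DDMMYYYY).
    parsed = datetime.strptime(str(raw_value), SOURCE_FORMATS[system_name])
    return parsed.strftime("%d%m%Y")

print(standardize_application_date("Source System 1", 11012005))     # 01112005
print(standardize_application_date("Source System 3", "01NOV2005"))  # 01112005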

See Figure 1.12 below for Data Warehouse Architecture Diagram.

Figure 1.12 : Data Warehouse Architecture


Data Warehouse & Data Mart
A data warehouse is a relational/multidimensional database that is designed for query and analysis rather than transaction processing. A
data warehouse usually contains historical data that is derived from transaction data. It separates analysis workload from transaction
workload and enables a business to consolidate data from several sources.

In addition to a relational/multidimensional database, a data warehouse environment often consists of an ETL solution, an
OLAP engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to
business users.

There are three types of data warehouses:


1. Enterprise Data Warehouse - An enterprise data warehouse provides a central database for decision support throughout
the enterprise.
2. ODS (Operational Data Store) - This has a broad, enterprise-wide scope, but unlike a real enterprise data warehouse,
data is refreshed in near real time and used for routine business activity.
3. Data Mart - A data mart is a subset of a data warehouse, and it supports a particular region, business unit, or business
function.

Data warehouses and data marts are built on dimensional data modeling where fact tables are connected with dimension
tables. This is most useful for users to access data since a database can be visualized as a cube of several dimensions. A data
warehouse provides an opportunity for slicing and dicing that cube along each of its dimensions.

Data Mart: A data mart is a subset of a data warehouse that is designed for a particular line of business, such as sales,
marketing, or finance. In a dependent data mart, data can be derived from an enterprise-wide data warehouse. In an
independent data mart, data can be collected directly from sources.

Figure 1.12 : Data Warehouse and Datamarts

What is Star Schema?


Star Schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse
schema and contains one or more dimension and fact tables. It is called a star schema because the entity-relationship
diagram between dimensions and fact tables resembles a star, where one fact table is connected to multiple dimensions. The
center of the star schema consists of a large fact table, and it points towards the dimension tables. The advantages of a star
schema are the ability to slice down the data, improved performance, and easy understanding of the data. A small sketch of
such a schema follows the design steps below.
Steps in designing Star Schema
 Identify a business process for analysis (like sales).
 Identify measures or facts (sales dollar).
 Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
 List the columns that describe each dimension (region name, branch name, etc.).
 Determine the lowest level of summary in a fact table (sales dollar).
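
A rough Python sketch of the resulting sales star schema, using in-memory dictionaries for the dimension tables and a list for the fact table; any column names beyond those mentioned above are illustrative assumptions.

# Hypothetical dimension tables, one row each, keyed by surrogate key.
product_dim  = {1: {"product_name": "Product1", "category": "Electronics"}}
location_dim = {1: {"region_name": "North", "branch_name": "Detroit"}}
time_dim     = {1: {"year": 2010, "quarter": "Q1", "month": "January"}}
org_dim      = {1: {"employee_name": "E1", "department": "Sales"}}

# The fact table holds the measure (sales dollar) plus one foreign key per dimension.
sales_fact = [
    {"product_id": 1, "location_id": 1, "time_id": 1, "org_id": 1, "sales_dollar": 250.0},
    {"product_id": 1, "location_id": 1, "time_id": 1, "org_id": 1, "sales_dollar": 400.0},
]

# Slicing: total sales dollar for Product1 in the North region.
total = sum(
    f["sales_dollar"] for f in sales_fact
    if product_dim[f["product_id"]]["product_name"] == "Product1"
    and location_dim[f["location_id"]]["region_name"] == "North"
)
print(total)   # 650.0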

Important aspects of Star Schema & Snow Flake Schema


 In a star schema every dimension will have a primary key.
 In a star schema, a dimension table will not have any parent table.
 Whereas in a snow flake schema, a dimension table will have one or more parent tables.
 Hierarchies for the dimensions are stored in the dimensional table itself in a star schema.
 Whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill down the data from
the topmost level to the lowermost level.

Glossary:

Hierarchy
A logical structure that uses ordered levels as a means of organizing data. A hierarchy can be used to define data
aggregation; for example, in a time dimension, a hierarchy might be used to aggregate data from the Month level to the
Quarter level, from the Quarter level to the Year level. A hierarchy can also be used to define a navigational drill path,
regardless of whether the levels in the hierarchy represent aggregated totals or not.

Level
A position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the Month, Quarter,
and Year levels.

Fact Table
A table in a star schema that contains facts and is connected to dimensions. A fact table typically has two types of columns:
those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a
composite key that is made up of all of its foreign keys.

A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated
facts are often instead called summary tables). A fact table usually contains facts with the same level of aggregation.

Example of Star Schema: Figure 1.6

In the example figure 1.6, the sales fact table is connected to the dimensions location, product, time and organization. It
shows that data can be sliced across all dimensions and, at the same time, aggregated across multiple dimensions. "Sales
Dollar" in the sales fact table can be calculated across all dimensions independently or in a combined manner, as explained
below.

 Sales Dollar value for a particular product
 Sales Dollar value for a product in a location
 Sales Dollar value for a product in a year within a location
 Sales Dollar value for a product in a year within a location sold or serviced by an employee

Snowflake Schema

A snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables, i.e.,
dimension table hierarchies are broken into simpler tables. In the star schema example we had 4 dimensions (location,
product, time, organization) and a fact table (sales).
In the snowflake schema example diagram shown below, there are 4 dimension tables, 4 lookup tables, and 1 fact table. The
reason is that the hierarchies (category, branch, state, and month) are broken out of the dimension tables (PRODUCT,
ORGANIZATION, LOCATION, and TIME) respectively and shown separately. In OLAP, this snowflake schema approach
increases the number of joins and results in poorer performance when retrieving data. A few organizations try to normalize
the dimension tables to save space, but since dimension tables hold comparatively little data, the snowflake schema approach
may be avoided.

Example of Snowflake Schema: Figure 1.7

Fact Table
The centralized table in a star schema is called the FACT table. A fact table typically has two types of columns: those that contain facts
and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of
its foreign keys.

In the example fig 1.6, "Sales Dollar" is a fact (measure) and it can be added across several dimensions. Fact tables store
different types of measures: additive, non-additive, and semi-additive measures. A small sketch contrasting the measure
types follows the list below.
Measure Types
 Additive - Measures that can be added across all dimensions.
 Non Additive - Measures that cannot be added across all dimensions.
 Semi Additive - Measures that can be added across some dimensions but not others.
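
A short Python sketch contrasting an additive measure with a semi-additive one. The account-balance example is a classic illustration of a semi-additive measure and is not taken from figure 1.6; all names are illustrative.

# Sales dollar is additive: it can be summed across any combination of dimensions.
# An account balance is only semi-additive: summing it across the time dimension
# (adding January's balance to February's) produces a meaningless number.
monthly_balance_fact = [
    {"account": "A1", "month": "January",  "balance": 1000.0},
    {"account": "A1", "month": "February", "balance": 1200.0},
    {"account": "A2", "month": "January",  "balance":  500.0},
]

# Valid: sum across the account dimension for a single month.
january_total = sum(f["balance"] for f in monthly_balance_fact if f["month"] == "January")
print(january_total)   # 1500.0

# Not valid: summing balances across months double-counts the same money;
# a semi-additive measure is usually averaged, or the latest value is taken instead.
latest_balance_a1 = [f["balance"] for f in monthly_balance_fact if f["account"] == "A1"][-1]
print(latest_balance_a1)   # 1200.0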

A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated
facts are often instead called summary tables).
In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called Factless
Fact tables.
Steps in designing Fact Table
 Identify a business process for analysis (like sales).
 Identify measures or facts (sales dollar).
 Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
 List the columns that describe each dimension (region name, branch name, etc.).
 Determine the lowest level of summary in a fact table (sales dollar).

Example of a Fact Table with an Additive Measure in Star Schema: Figure 1.6

In the example figure 1.6, the sales fact table is connected to the dimensions location, product, time and organization. The
measure "Sales Dollar" in the sales fact table can be added across all dimensions independently or in a combined manner, as
explained below.

 Sales Dollar value for a particular product
 Sales Dollar value for a product in a location
 Sales Dollar value for a product in a year within a location
 Sales Dollar value for a product in a year within a location sold or serviced by an employee

Database - RDBMS

There are a number of relational databases to store data. A relational database contains normalized data stored in tables.
Tables contain records and columns. RDBMS makes it easy to work with individual records. Each row contains a unique
instance of data for the categories defined by the columns.

RDBMSs are used very frequently in OLTP applications (e.g., ATM transactions), and sometimes data warehouses may also use
relational databases. Please refer to relational data modeling for details on how data from a source system is normalized and
stored in RDBMS databases.

Popular RDBMS Databases


RDBMS Name Company Name
Oracle Oracle Corporation
IBM DB2 UDB IBM Corporation
IBM Informix IBM Corporation
Microsoft SQL Server Microsoft
Sybase Sybase Corporation
Teradata NCR
What are ETL Tools?

ETL Tools are meant to extract, transform, and load the data into the Data Warehouse for decision making. Before the evolution of ETL
tools, the above-mentioned ETL process was done manually using SQL code created by programmers. This task was tedious and
cumbersome in many cases, since it involved many resources, complex coding, and more work hours. On top of that, maintaining the
code posed a great challenge to the programmers.

These difficulties are eliminated by ETL tools, since they are very powerful and they offer many advantages over the old
method in all stages of the ETL process: extraction, data cleansing, data profiling, transformation, debugging, and loading
into the data warehouse.

There are a number of ETL tools available in the market to process the data according to business/technical
requirements. Following are some of those.

Popular ETL Tools


Tool Name Company Name
Informatica Informatica Corporation
DT/Studio Embarcadero Technologies
DataStage IBM
Ab Initio Ab Initio Software Corporation
Data Junction Pervasive Software
Oracle Warehouse Builder Oracle Corporation
Microsoft SQL Server Integration Microsoft
TransformOnDemand Solonde
Transformation Manager ETL Solutions
ETL Tools: What to Learn?
With the help of ETL tools, we can create powerful target Data Warehouses without much difficulty. Following are the various things that
we have to know and learn in order to use an ETL tool.

Software:
» How to install the ETL tool on a server/client?

Working with an ETL Tool:


» How to work with various options like designer, mapping, workflow, scheduling, etc.?
» How to work with sources like DBMS, relational source databases, files, ERPs, etc., and import the source definitions?
» How to import data from data modeling tools, applications, etc.?
» How to work with targets like DBMS, relational databases, files, ERPs, etc., and import the target definitions?
» How to create target definitions?
» How to create mappings between source definitions and target definitions?
» How to create transformations?
» How to cleanse the source data?
» How to create a dimension, slowly changing dimensions, cube, etc.?
» How to create and monitor workflows?
» How to configure, monitor and run the debugger?
» How to view and generate metadata reports?

ETL Concepts
Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and
loading it into the target database.

The first step in the ETL process is mapping the data between source systems and the target database (data warehouse or data
mart). The second step is cleansing the source data in the staging area. The third step is transforming the cleansed source data
and then loading it into the target system.

Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used
instead of ETL.
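
A toy Python sketch of these three steps, reusing the bank example from earlier; the column names and helper functions are illustrative, not from any ETL tool.

from datetime import datetime

source_rows = [
    {"CUST_APPLICATION_DATE": "11012005", "CUSTOMER_NAME": " alice  "},
    {"CUST_APPLICATION_DATE": "12152005", "CUSTOMER_NAME": "BOB"},
]

def extract(rows):
    return list(rows)                       # read from the source system

def cleanse_and_transform(rows):
    staged = []
    for r in rows:                          # staging area: fix formats and values
        staged.append({
            "CUSTOMER_APPLICATION_DATE": datetime.strptime(
                r["CUST_APPLICATION_DATE"], "%m%d%Y").strftime("%d%m%Y"),
            "CUSTOMER_NAME": r["CUSTOMER_NAME"].strip().title(),
        })
    return staged

def load(rows, target):
    target.extend(rows)                     # write to the data warehouse table

warehouse_table = []
load(cleanse_and_transform(extract(source_rows)), warehouse_table)
print(warehouse_table)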

Glossary of ETL (Reference:www.Oracle.com)

Source System
A database, application, file, or other storage facility from which the data in a data warehouse is derived.

Mapping
The definition of the relationship and data flow between source and target objects.

Metadata
Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema
design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and
populate the data warehouse. A repository contains metadata.

Staging Area
A place where data is processed before entering the warehouse.

Cleansing
The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.

Transformation
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing,
aggregating, and integrating data from multiple sources.

Transportation
The process of moving copied or transformed data from a source to a data warehouse.

Target System
A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.
Figure 1.12 : Sample ETL Process Flow

INFORMATICA

Informatica
Informatica is a powerful ETL tool from Informatica Corporation, a leading provider of enterprise data integration and ETL
software.
The important Informatica components are:
 Power Center
 Power Mart
 Power Exchange
 Power Center Connect
 Power Channel
 Metadata Exchange
 Power Analyzer
 Super Glue

Source and Target:


Consider a bank that has many branches throughout the world. In each branch, data may be stored in different source
systems like Oracle, SQL Server, Teradata, etc. When the bank decides to integrate its data from several sources for its
management decisions, it may choose one or more systems like Oracle, SQL Server, or Teradata as its data warehouse
target. Many organizations prefer Informatica to do the ETL process, because Informatica is powerful in designing and
building data warehouses. It can connect to several sources and targets, extract metadata from them, and transform and
load the data into target systems.

Guidelines to work with Informatica Power Center


 Repository: This is where all the metadata information is stored in the Informatica suite. The Power Center Client and the
Repository Server would access this repository to retrieve, store and manage metadata.
 Power Center Client: The Informatica client is used for managing users, identifying source and target system definitions, creating
mappings and mapplets, creating sessions, running workflows, etc.
 Repository Server: This repository server takes care of all the connections between the repository and the Power Center
Client.
 Power Center Server: The Power Center Server does the extraction from the sources and loads the data into the targets.
 Designer: Source Analyzer, Mapping Designer and Warehouse Designer are tools that reside within the Designer wizard. Source
Analyzer is used for extracting metadata from source systems.
Mapping Designer is used to create mappings between sources and targets. A mapping is a pictorial representation of the flow of data
from source to target.
Warehouse Designer is used for extracting metadata from target systems; metadata can also be created in the Designer itself.
 Data Cleansing: PowerCenter's data cleansing technology improves data quality by validating, correctly naming, and
standardizing address data. A person's address may not be the same in all source systems because of typos, or the postal code and
city name may not match the address. These errors can be corrected using the data cleansing process, and the standardized data can
be loaded into the target systems (data warehouse).
 Transformation: Transformations help to transform the source data according to the requirements of the target system. Sorting,
filtering, aggregation, and joining are some examples of transformations. Transformations ensure the quality of the data being
loaded into the target, and this is done during the mapping process from source to target.
 Workflow Manager: Workflow helps to load the data from source to target in a sequential manner. For example, if the fact
tables are loaded before the lookup tables, then the target system will pop up an error message since the fact table is violating the
foreign key validation. To avoid this, workflows can be created to ensure the correct flow of data from source to target.
 Workflow Monitor: This monitor is helpful in monitoring and tracking the workflows created in each Power Center Server.
 Power Center Connect: This component helps to extract data and metadata from ERP systems like IBM's MQSeries,
Peoplesoft, SAP, Siebel etc. and other third party applications.

Informatica

Power Exchange:
Informatica Power Exchange, as a stand-alone service or along with Power Center, helps organizations leverage data by
avoiding manual coding of data extraction programs. Power Exchange supports batch, real-time, and changed data capture
options for mainframe (DB2, VSAM, IMS, etc.), midrange (AS/400 DB2, etc.), and relational databases (Oracle, SQL Server,
DB2, etc.), and for flat files on UNIX, Linux, and Windows systems.

Power Channel:
This helps to transfer large amounts of encrypted and compressed data over a LAN or WAN, through firewalls, to transfer files
over FTP, etc.

Meta Data Exchange:


Metadata Exchange enables organizations to take advantage of the time and effort already invested in defining data
structures within their IT environment when used with Power Center. For example, an organization may be using data
modeling tools such as Erwin, Embarcadero, Oracle Designer, or Sybase PowerDesigner for developing data models. The
functional and technical teams will have spent much time and effort creating the data model's data structures (tables,
columns, data types, procedures, functions, triggers, etc.). By using Metadata Exchange, these data structures can be
imported into Power Center to identify source and target mappings, which saves time and effort. There is no need for the
Informatica developer to create these data structures once again.

Power Analyzer:
Power Analyzer provides organizations with reporting facilities. PowerAnalyzer makes accessing, analyzing, and sharing
enterprise data simple and easily available to decision makers. PowerAnalyzer enables users to gain insight into business
processes and develop business intelligence.

With PowerAnalyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data
warehouse, data mart, operational data store, or other data storage models. PowerAnalyzer works best with a dimensional data
warehouse in a relational database. It can also run reports on data in any table in a relational database that does not conform
to the dimensional model.

Super Glue:
SuperGlue is used for loading metadata from several sources into a centralized place. Reports can be run against SuperGlue
to analyze metadata.

Power Mart:
Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts.
Power Center is used for corporate enterprise data warehouses, and Power Mart is used for departmental data warehouses such
as data marts. Power Center supports global and networked repositories and can be connected to several sources. Power Mart
supports a single repository and can be connected to fewer sources when compared to Power Center. Power Mart can extensibly
grow into an enterprise implementation, and its codeless environment makes developers productive.
Informatica - Transformations
[Submitted by:Radhika, Michigan, US.]

In Informatica, transformations help to transform the source data according to the requirements of the target system, and
they ensure the quality of the data being loaded into the target.

Active Transformation
An active transformation can change the number of rows that pass through it from source to target, i.e., it can eliminate rows
that do not meet the condition in the transformation.

Passive Transformation
A passive transformation does not change the number of rows that pass through it, i.e., it passes all rows through the
transformation.

Transformations can be Connected or UnConnected.

Connected Transformation
Connected transformation is connected to other transformations or directly to target table in the mapping.

UnConnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is called within another
transformation, and returns a value to that transformation.

Following are the list of Transformations available in Informatica:


 Aggregator Transformation
 Expression Transformation
 Filter Transformation
 Joiner Transformation
 Lookup Transformation
 Normalizer Transformation
 Rank Transformation
 Router Transformation
 Sequence Generator Transformation
 Stored Procedure Transformation
 Sorter Transformation
 Update Strategy Transformation
 XML Source Qualifier Transformation
 Advanced External Procedure Transformation
 External Procedure Transformation

Aggregator Transformation
Aggregator transformation is an Active and Connected transformation. This transformation is useful for performing calculations
such as averages and sums (mainly calculations on multiple rows or groups), for example, to calculate the total of
daily sales or the average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE,
MAX, SUM, etc. can be used in the Aggregator transformation.

Expression Transformation
Expression transformation is a Passive and Connected transformation. This can be used to calculate values in a single row
before writing to the target. For example, to calculate discount of each product or to concatenate first and last names or to
convert date to a string field.

Filter Transformation
Filter transformation is an Active and Connected transformation. It can be used to filter rows in a mapping that do not meet
the condition. For example, to find all the employees who are working in Department 10, or to find the products whose price
falls between $500 and $1000.

Joiner Transformation
Joiner Transformation is an Active and Connected transformation. This can be used to join two sources coming from two
different locations or from same location. For example, to join a flat file and a relational source or to join two flat files or to
join a relational source and a XML source.

In order to join two sources, there must be at least one matching port. While joining two sources, it is a must to specify one
source as master and the other as detail.
The Joiner transformation supports the following types of joins:
 Normal
 Master Outer
 Detail Outer
 Full Outer

Normal join discards all the rows of data from the master and detail source that do not match, based on the condition.

Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source and
the matching rows from the master source.

Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards
the unmatched rows from the detail source.

Full outer join keeps all rows of data from both the master and detail sources.
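
The four join types can be illustrated with plain Python on small master/detail row sets; this is only an analogy, not how the Joiner transformation is implemented, and all names are illustrative.

master = [{"product_id": 1, "product_name": "P1"}, {"product_id": 2, "product_name": "P2"}]
detail = [{"product_id": 1, "sales": 100}, {"product_id": 3, "sales": 300}]

def join(master_rows, detail_rows, join_type):
    master_by_id = {m["product_id"]: m for m in master_rows}
    detail_ids = {d["product_id"] for d in detail_rows}
    out = []
    for d in detail_rows:
        m = master_by_id.get(d["product_id"])
        if m is not None:                     # matched rows appear in every join type
            out.append({**m, **d})
        elif join_type in ("master outer", "full outer"):
            out.append(dict(d))               # unmatched detail row kept
    if join_type in ("detail outer", "full outer"):
        for m in master_rows:
            if m["product_id"] not in detail_ids:
                out.append(dict(m))           # unmatched master row kept
    return out

print(join(master, detail, "normal"))         # only the matching product_id 1
print(join(master, detail, "master outer"))   # all detail rows + matching master rows
print(join(master, detail, "detail outer"))   # all master rows + matching detail rows
print(join(master, detail, "full outer"))     # everything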

Lookup Transformation
Lookup transformation is Passive, and it can be either Connected or UnConnected. It is used to look up data in a
relational table, view, or synonym. The lookup definition can be imported either from a source or from a target table.

For example, suppose we want to retrieve all the sales of a product with ID 10, and the sales data resides in another
table. Instead of using the sales table as one more source, we can use a Lookup transformation to look up the data for the
product with ID 10 in the sales table.

Difference between Connected and UnConnected Lookup Transformation:


A Connected lookup receives input values directly from the mapping pipeline, whereas an UnConnected lookup receives values
from a :LKP expression in another transformation.

A Connected lookup returns multiple columns from the same row, whereas an UnConnected lookup has one return port and
returns one column from each row.

A Connected lookup supports user-defined default values, whereas an UnConnected lookup does not support user-defined
default values.

Normalizer Transformation
Normalizer Transformation is an Active and Connected transformation. It is used mainly with COBOL sources where most of
the time data is stored in de-normalized format. Also, Normalizer transformation can be used to create multiple rows from a
single row of data.

Rank Transformation
Rank transformation is an Active and Connected transformation. It is used to select the top or bottom rank of data. For
example, to select top 10 Regions where the sales volume was very high or to select 10 lowest priced products.

Router Transformation
Router is an Active and Connected transformation. It is similar to the Filter transformation. The only difference is that the
Filter transformation drops the data that does not meet the condition, whereas the Router has an option to capture the data
that does not meet the condition. It is useful for testing multiple conditions. It has input, output, and default groups. For
example, if we want to route data by state, such as State=Michigan, State=California, State=New York, and all other states,
it is easy to route the data to different tables.

Sequence Generator Transformation


Sequence Generator transformation is a Passive and Connected transformation. It is used to create unique primary key values
or cycle through a sequential range of numbers or to replace missing keys.

It has two output ports to connect to transformations. By default it has two ports, CURRVAL and NEXTVAL (you cannot add
ports to this transformation). The NEXTVAL port generates a sequence of numbers when connected to a transformation or
target. CURRVAL is NEXTVAL plus the Increment By value (with the default increment of 1, it is NEXTVAL plus one).
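
A rough Python analogue of NEXTVAL and CURRVAL using a generator; this is only an illustration of the numbering behaviour described above, not the PowerCenter implementation.

def sequence_generator(start=1, increment_by=1):
    # Yields (NEXTVAL, CURRVAL) pairs; CURRVAL stays Increment-By ahead of NEXTVAL.
    current = start
    while True:
        nextval = current
        currval = nextval + increment_by
        yield nextval, currval
        current += increment_by

seq = sequence_generator(start=1, increment_by=1)
for _ in range(3):
    nextval, currval = next(seq)
    print(nextval, currval)   # 1 2, then 2 3, then 3 4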

Stored Procedure Transformation


Stored Procedure transformation is a Passive transformation and can be Connected or UnConnected. It is useful for automating
time-consuming tasks, and it is also used for error handling, dropping and recreating indexes, determining free space in the
database, specialized calculations, etc.

The stored procedure must exist in the database before creating a Stored Procedure transformation, and the stored procedure
can exist in a source, target, or any database with a valid connection to the Informatica Server. Stored Procedure is an
executable script with SQL statements and control statements, user-defined variables and conditional statements.

Sorter Transformation
Sorter transformation is a Connected and Active transformation. It allows sorting data in either ascending or descending
order according to a specified field. It can also be configured for case-sensitive sorting and to specify whether the output
rows should be distinct.

Source Qualifier Transformation


Source Qualifier transformation is an Active and Connected transformation. When adding a relational or flat file source
definition to a mapping, it is a must to connect it to a Source Qualifier transformation. The Source Qualifier performs
various tasks such as overriding the default SQL query, filtering records, joining data from two or more tables, etc.

Update Strategy Transformation


Update Strategy transformation is an Active and Connected transformation. It is used to update data in the target table, either to
maintain a history of the data or to keep only the recent changes. You can specify how to treat source rows in the table: insert,
update, delete, or data driven.
XML Source Qualifier Transformation
XML Source Qualifier is a Passive and Connected transformation. XML Source Qualifier is used only with an XML source
definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources.

Advanced External Procedure Transformation


Advanced External Procedure transformation is an Active and Connected transformation. It operates in conjunction with
procedures, which are created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful in
creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed
before emitting any output rows.

External Procedure Transformation


External Procedure transformation is an Active and Connected/UnConnected transformation. Sometimes the standard
transformations, such as the Expression transformation, may not provide the functionality that you want. In such cases an
External Procedure is useful for developing complex functions within a dynamic link library (DLL) or UNIX shared library,
instead of creating the necessary Expression transformations in a mapping.

Differences between Advanced External Procedure and External Procedure Transformations:

External Procedure returns a single value, whereas Advanced External Procedure returns multiple values.
External Procedure supports COM and Informatica procedures, whereas AEP supports only Informatica procedures.
