

SSIS supports numerous transformations that let you combine data originating from multiple
sources, cleanse the data and give it the shape your data destination expects. You can then import the
data into one or more destinations.

Each transformation below is listed with a description and examples of when it would be used.
Aggregate
  Description: Calculates aggregations such as SUM, COUNT, AVG, MIN and MAX based on the values of a given numeric column. This transformation produces additional output records.
  Example uses: Adding aggregated information to your output. This can be useful for adding totals and sub-totals to your output.
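To make the grouping behaviour concrete, here is a minimal Python sketch of what an Aggregate transformation with a group-by column computes. It is an illustration of the semantics only, not SSIS code; the function `aggregate` and the `region`/`amount` columns are hypothetical.

```python
from collections import defaultdict

def aggregate(rows, group_by, value_col):
    """Group rows by `group_by` and compute SUM, COUNT, AVG, MIN and MAX of `value_col`.

    One output record is produced per group, in first-seen group order.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[value_col])
    result = []
    for key, values in groups.items():
        result.append({
            group_by: key,
            "sum": sum(values),
            "count": len(values),
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        })
    return result

# Hypothetical sales rows.
sales = [
    {"region": "East", "amount": 100},
    {"region": "East", "amount": 300},
    {"region": "West", "amount": 50},
]
totals = aggregate(sales, "region", "amount")
```

Here `totals` holds one sub-total record per region, which could then be added to the output alongside the detail rows.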
Audit
  Description: Adds auditing information to the data flow, such as the name of the computer where the package runs, the package version ID and the task name.
  Example uses: Creating detailed logs that indicate where and when the package was executed, how long it took to run the package and the outcome of the execution.
Character Map
  Description: Performs minor manipulations on string columns, such as converting all letters to uppercase or lowercase and reversing bytes.
  Example uses: Applying string manipulations before loading data into the data warehouse. You can also apply the same manipulations to the data while it is being loaded into the warehouse.
Conditional Split
  Description: Accepts an input and determines which output to send each row to, based on the result of an expression.
  Example uses: Cleansing the data by extracting specific rows from the source. If a specific column does not conform to the predefined format (perhaps it has leading spaces or zeros), you can move such records to the error file.
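The routing logic can be sketched in Python as an ordered list of (output name, condition) pairs. Each row goes to the first output whose condition is true, and rows matching no condition go to a default output. This is an illustration, not SSIS code; the `customer_code` column and the `is_malformed` rule are hypothetical.

```python
def conditional_split(rows, conditions, default_output="default"):
    """Send each row to the first output whose condition is true.

    `conditions` is an ordered list of (output_name, predicate) pairs; rows that
    satisfy no predicate are sent to `default_output`.
    """
    outputs = {name: [] for name, _ in conditions}
    outputs[default_output] = []
    for row in rows:
        for name, predicate in conditions:
            if predicate(row):
                outputs[name].append(row)
                break
        else:  # no condition matched
            outputs[default_output].append(row)
    return outputs

def is_malformed(row):
    # Hypothetical format rule: codes must not have leading spaces or leading zeros.
    code = row["customer_code"]
    return code != code.lstrip() or code.startswith("0")

rows = [{"customer_code": "A100"}, {"customer_code": " A101"}, {"customer_code": "0042"}]
split = conditional_split(rows, [("error_file", is_malformed)], default_output="clean")
```

After the call, `split["error_file"]` holds the two non-conforming rows and `split["clean"]` holds the rest.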
Copy Column
  Description: Makes a copy of one or more columns, which subsequent transformations in the data flow can transform further.
  Example uses: Copying columns that need to be cleansed of leading and trailing spaces, applying a Character Map transformation to convert all data to uppercase, and then loading it into the table.
Data Conversion
  Description: Converts input columns from one data type to another.
  Example uses: Converting columns extracted from the data source to the data types expected by the data warehouse. Such conversions let you move data directly from its source into the destination without an intermediary staging database.
Data Mining Query
  Description: Queries a data mining model. Includes a query builder to assist you with developing Data Mining Extensions (DMX) prediction queries.
  Example uses: Evaluating the input data set against a data mining model developed with Analysis Services.
Derived Column
  Description: Calculates a new column value based on one or more existing columns.
  Example uses: Removing leading and trailing spaces from a column. Adding a title of courtesy (Mr., Mrs., Dr., etc.) to a name.
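Both example uses can be pictured as one derived value computed per row. In SSIS itself this would be written in the expression language (for example with its TRIM function). The Python below is only a sketch of the same logic, and the `first_name`, `last_name`, `title` and `display_name` columns are hypothetical.

```python
def add_display_name(row):
    """Return a copy of `row` with a derived `display_name` column."""
    first = row["first_name"].strip()          # remove leading and trailing spaces
    last = row["last_name"].strip()
    title = (row.get("title") or "").strip()   # e.g. "Mr.", "Mrs.", "Dr." (may be absent)
    name = f"{first} {last}"
    # Prefix the title of courtesy only when one is present.
    return {**row, "display_name": f"{title} {name}" if title else name}

employee = add_display_name({"first_name": "  Ann ", "last_name": "Lee ", "title": "Dr."})
```

The original columns are kept unchanged; only the new column is added, which matches how a derived column is usually appended rather than replacing its sources.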
Export Column
  Description: Exports the contents of large columns (TEXT, NTEXT, IMAGE data types) into files.
  Example uses: Saving large strings or images into files while moving the rest of the columns into a transactional database or data warehouse.
Fuzzy Grouping
  Description: Finds close or exact matches between multiple rows in the data source. Adds columns to the output, including the matched values and similarity scores.
  Example uses: Cleansing data by translating various versions of the same value to a common identifier. For example, "Dr", "Dr.", "doctor" and "M.D." should all be considered equivalent.
Fuzzy Lookup
  Description: Compares values in the input data source rows to values in the lookup table. Finds exact matches as well as values that are similar.
  Example uses: Cleansing data by translating various versions of the same value to a common identifier. For example, "Dr", "Dr.", "doctor" and "M.D." should all be considered equivalent.
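The core idea behind a fuzzy lookup, scoring each reference value against the input and keeping the best match above a similarity threshold, can be sketched with Python's `difflib`. Note that `SequenceMatcher.ratio()` is a stand-in chosen for illustration. It is not the similarity algorithm SSIS uses, so scores will differ from the ones the transformation reports. The company names and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher

def fuzzy_lookup(value, reference_values, threshold=0.6):
    """Return (best_match, score) for the most similar reference value,
    or None if no reference value reaches `threshold`.

    Similarity is difflib's ratio (0.0 to 1.0) on lowercased strings.
    """
    best, best_score = None, 0.0
    for candidate in reference_values:
        score = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best is None or best_score < threshold:
        return None
    return best, best_score

companies = ["Microsoft Corporation", "Oracle Corporation"]
match = fuzzy_lookup("Microsft Corp", companies)   # misspelled, abbreviated input
```

An exact match scores 1.0, so the same function also returns exact matches, mirroring the description above.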
Import Column
  Description: Imports the contents of files and appends them to the output. Can be used to append TEXT, NTEXT and IMAGE data columns, obtained from a separate data source, to the input.
  Example uses: This transformation could be useful for web content developers. For example, suppose you offer college courses online. Normalized course metadata, such as course_id, name and description, is stored in a typical relational table. Unstructured course metadata, on the other hand, is stored in XML files. You can use the Import Column transformation to add the XML metadata to a text column in your course table.
Lookup
  Description: Joins the input data set to a reference table, view or row set created by a SQL statement to look up corresponding values. If some rows in the input data have no corresponding rows in the lookup table, the transformation fails by default; you can configure it to redirect such rows to a different output instead.
  Example uses: Obtaining additional data columns. For example, most employee demographic information might be available in a flat file, while other data, such as the department where each employee works, their employment start date and job grade, might be available from a table in a relational database.
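A dictionary keyed on the join column is a simple way to picture a lookup against reference data held in memory (loosely comparable to the Lookup transformation's full-cache mode). In this Python sketch, rows without a match are collected separately, mirroring the option to redirect them rather than fail. The function, the `employee_id` key and the sample data are all hypothetical.

```python
def lookup(rows, reference, key):
    """Enrich each input row with the reference columns that match on `key`.

    Rows without a match are returned in a separate "no match" list instead of
    raising an error.
    """
    matched, no_match = [], []
    for row in rows:
        extra = reference.get(row[key])
        if extra is None:
            no_match.append(row)               # redirected output
        else:
            matched.append({**row, **extra})   # input columns + looked-up columns
    return matched, no_match

# Hypothetical data: demographics from a flat file, job details from a database table.
employees = [{"employee_id": 1, "name": "Ann"}, {"employee_id": 2, "name": "Bob"}]
job_details = {1: {"department": "Sales", "job_grade": 5}}
matched, no_match = lookup(employees, job_details, "employee_id")
```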
Merge
  Description: Merges two sorted inputs into a single output based on the values of the key columns in each data set. Merged columns must have identical or compatible data types. For example, you can merge VARCHAR(30) and VARCHAR(50) columns, but you cannot merge INT and DATETIME columns.
  Example uses: Combining the rows from multiple data sources into a single row set before populating a dimension table in a data warehouse. Using the Merge transformation saves the step of having a temporary staging area. With prior versions of SQL Server, you had to populate the staging area first if your data warehouse had multiple transactional data sources.
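The reason Merge needs sorted inputs is easiest to see in a sketch. Two already-sorted streams can be interleaved in a single pass without re-sorting, which is what Python's `heapq.merge` does. This illustrates the idea and is not SSIS code; the `customer_id`/`source` columns are hypothetical.

```python
import heapq
from operator import itemgetter

def merge_sorted_inputs(first, second, key):
    """Interleave two inputs that are already sorted on `key` into one sorted output.

    heapq.merge walks both inputs once and never re-sorts them, so the result is
    only correctly ordered if each input was sorted beforehand.
    """
    return list(heapq.merge(first, second, key=itemgetter(key)))

# Hypothetical customer rows from two transactional sources, each sorted by customer_id.
store_a = [{"customer_id": 1, "source": "A"}, {"customer_id": 4, "source": "A"}]
store_b = [{"customer_id": 2, "source": "B"}, {"customer_id": 3, "source": "B"}]
combined = merge_sorted_inputs(store_a, store_b, "customer_id")
```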
Merge Join
  Description: Joins two sorted inputs using the INNER JOIN, LEFT OUTER JOIN or FULL OUTER JOIN algorithm. You can specify the columns used for joining the inputs.
  Example uses: Combining the columns from multiple data sources into a single row set before populating a dimension table in a data warehouse. Using the Merge Join transformation saves the step of having a temporary staging area. With prior versions of SQL Server, you had to populate the staging area first if your data warehouse had multiple transactional data sources.
  Note that the Merge and Merge Join transformations can only combine two data sets at a time. However, you could use multiple Merge Join transformations to include additional data sets.
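A sort-merge join is the classic algorithm for joining two inputs that are already sorted on the join key. The Python sketch below implements the inner and left outer variants (full outer is omitted for brevity) and handles duplicate keys on either side. It is not SSIS code. For unmatched left rows it simply leaves the right-hand columns out, whereas SSIS would fill them with NULLs. The `id`, `a` and `b` columns are hypothetical.

```python
def merge_join(left, right, key, how="inner"):
    """Sort-merge join of two inputs already sorted ascending on `key`.

    Supports how="inner" and how="left".
    """
    result = []
    i = j = 0
    while i < len(left):
        lkey = left[i][key]
        # Skip right rows whose key is smaller than the current left key.
        while j < len(right) and right[j][key] < lkey:
            j += 1
        # Find the run of right rows that share the current key.
        k = j
        while k < len(right) and right[k][key] == lkey:
            k += 1
        matches = right[j:k]
        # Join every left row in the current key run against that run.
        while i < len(left) and left[i][key] == lkey:
            if matches:
                for r in matches:
                    result.append({**left[i], **r})
            elif how == "left":
                result.append(dict(left[i]))
            i += 1
        j = k
    return result

left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
right = [{"id": 1, "b": "p"}, {"id": 3, "b": "q"}, {"id": 3, "b": "r"}]
inner_rows = merge_join(left, right, "id")
left_rows = merge_join(left, right, "id", how="left")
```

Because both inputs are consumed in key order, each side is scanned only once, which is why the transformation requires sorted inputs.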
Multicast
  Description: Similar to the Conditional Split transformation, but the entire data set is sent to every output, so it can feed multiple destinations.
  Example uses: Populating the relational warehouse as well as the source file with the output of a Derived Column transformation.
OLE DB Command
  Description: Runs a SQL command for each input data row. Normally your SQL statement will include a parameter (denoted by a question mark), for example:
      UPDATE employee_source SET has_been_loaded = 1 WHERE employee_id = ?
  Example uses: Setting the value of a column with BIT data type (perhaps called "has_been_loaded") to 1 after the data row has been loaded into the warehouse. This way, subsequent loads will only attempt to import the rows that haven't yet made it to the warehouse.
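The per-row, parameterized pattern can be reproduced outside SSIS with Python's built-in `sqlite3` module, which also uses `?` placeholders. This sketch uses an in-memory SQLite table standing in for `employee_source`. The INTEGER flag stands in for the BIT column, and the list of loaded rows is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_source ("
    "employee_id INTEGER PRIMARY KEY, has_been_loaded INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO employee_source (employee_id) VALUES (?)", [(1,), (2,), (3,)])

# Rows that (hypothetically) just reached the warehouse.
loaded_rows = [{"employee_id": 1}, {"employee_id": 3}]
for row in loaded_rows:
    # One parameterized statement per row; the ? is bound to the row's value.
    conn.execute(
        "UPDATE employee_source SET has_been_loaded = 1 WHERE employee_id = ?",
        (row["employee_id"],),
    )
conn.commit()

# The next load only needs the rows that are still flagged 0.
pending = [r[0] for r in conn.execute(
    "SELECT employee_id FROM employee_source WHERE has_been_loaded = 0 ORDER BY employee_id"
)]
```

Because the statement executes once per row, this pattern can be slow on large data sets. A single set-based UPDATE is often preferred when the row volume is high.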
Percentage Sampling
  Description: Loads only a subset of your data, defined as a percentage of all rows in the data source. Note that rows are chosen randomly.
  Example uses: Limiting the data set during the development phases of your project. Your data sources might contain billions of rows, and processing cubes against the entire data set can be prohibitively lengthy. If you're simply trying to ensure that your warehouse functions properly and that data values on transactional reports match the values obtained from your Analysis Services cubes, you might wish to load only a subset of data into your cubes.
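One simple way to sample by percentage is to keep each row independently with probability p/100. The Python sketch below takes that approach, so the sample size is only approximately the requested percentage. It returns both selected and unselected rows, like the transformation's two outputs. This illustrates the idea and is not a description of SSIS internals; the optional `seed` parameter is the sketch's own addition for reproducible test runs.

```python
import random

def percentage_sample(rows, percent, seed=None):
    """Randomly split rows into (selected, unselected), keeping each row with
    probability percent / 100."""
    rng = random.Random(seed)
    selected, unselected = [], []
    for row in rows:
        if rng.random() < percent / 100:
            selected.append(row)
        else:
            unselected.append(row)
    return selected, unselected

selected, unselected = percentage_sample(range(10_000), 10, seed=42)
```

With 10,000 input rows and 10 percent requested, `selected` will hold roughly, but usually not exactly, 1,000 rows.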
Pivot
  Description: Pivots a normalized data set on a certain column to create a more easily readable output, similar to the PIVOT operator in Transact-SQL. You can think of this transformation as converting rows into columns. For example, if your input rows have customer, account number and account balance columns, the output will have the customer and one column for each account.
  Example uses: Creating a row set that displays the table data in a more user-friendly format. The data set could be consumed by a web service or distributed to users through email.
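The customer/account example can be sketched in Python: one output row per customer, with one column per distinct account value. Missing combinations are filled with `None` as a stand-in for NULL. This is an illustration only, not SSIS code, and the `customer`, `account` and `balance` columns are hypothetical.

```python
def pivot(rows, key_col, pivot_col, value_col):
    """Turn rows of (key, pivot value, value) into one row per key with one
    column per distinct pivot value; missing combinations become None."""
    # Distinct pivot values in first-seen order become the new columns.
    pivot_values = list(dict.fromkeys(row[pivot_col] for row in rows))
    result = {}
    for row in rows:
        out = result.setdefault(
            row[key_col], {key_col: row[key_col], **{v: None for v in pivot_values}}
        )
        out[row[pivot_col]] = row[value_col]
    return list(result.values())

balances = [
    {"customer": "Smith", "account": "checking", "balance": 100},
    {"customer": "Smith", "account": "savings", "balance": 500},
    {"customer": "Jones", "account": "checking", "balance": 75},
]
wide = pivot(balances, "customer", "account", "balance")
```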
Row Count
  Description: Counts the number of transformed rows and stores the result in a variable.
  Example uses: Determining the total size of your data set. You could also execute a different set of tasks based on the number of rows you have transformed. For example, if you increase the number of rows in your fact table by 5%, you could perform no maintenance; if you increase the size of the table by 50%, you might wish to rebuild the clustered index.
Row Sampling
  Description: Loads only a subset of your data, defined as a number of rows. Note that rows are chosen randomly.
  Example uses: Limiting the data set during the development phases of your project. Your data warehouse might contain billions of rows, and processing cubes against the entire data set can be prohibitively lengthy. If you're simply trying to ensure that your warehouse functions properly and that data values on transactional reports match the values obtained from your Analysis Services cubes, you might wish to load only a subset of data into your cubes.
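Picking exactly N random rows from a stream of unknown length is a textbook use of reservoir sampling (Algorithm R). The Python sketch below uses that technique to illustrate sampling by row count. It is not a claim about how SSIS implements Row Sampling, and the `seed` parameter is the sketch's own addition.

```python
import random

def row_sample(rows, n, seed=None):
    """Reservoir sampling (Algorithm R): return exactly n rows chosen uniformly
    at random in a single pass, or all rows if there are fewer than n."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < n:
            reservoir.append(row)        # fill the reservoir first
        else:
            j = rng.randint(0, i)        # uniform over 0..i inclusive
            if j < n:
                reservoir[j] = row       # replace with probability n / (i + 1)
    return reservoir

sample = row_sample(range(1000), 10, seed=1)
```

Unlike the percentage-based sketch, this always returns the requested number of rows when enough input is available.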
Script Component
  Description: Every data flow consists of three main kinds of components: sources, destinations and transformations. The Script Component lets you write sources and destinations for otherwise unsupported file formats. It also lets you perform transformations that are not directly available through the built-in transformation algorithms. Much like the Script Task, the Script Component must be written in Visual Basic .NET in SQL Server 2005; SQL Server 2008 and later also support C#.
  Example uses: Custom transformations can call functions in managed assemblies, including the .NET Framework. This type of transformation can be used when the data source (or destination) file format cannot be handled by typical connection managers. For example, some log files might not have tabular data structures. At times you might also need to parse strings one character at a time to import only the needed data elements.
Slowly Changing Dimension
  Description: Maintains historical values of dimension members when member attributes change or new members are introduced.
  Example uses: Maintaining dimension tables in a data warehouse when historical dimension member values must be preserved.
Sort
  Description: Sorts input by column values. You can sort the input by multiple columns, each in ascending or descending order, and specify the precedence of the columns used for sorting. This transformation can also discard rows with duplicate sort values.
  Example uses: Ordering the data before loading it into a data warehouse. This could be useful if you're ordering your dimension by member name values as opposed to sorting by member keys. You can also use the Sort transformation to order the data before feeding it as the input to the Merge Join or Merge transformation.
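Multi-column sorting with per-column direction and precedence, plus optional removal of duplicate sort values, can be sketched in Python using the fact that `list.sort` is stable, including with `reverse=True`. Sorting by the lowest-precedence column first and the highest-precedence column last yields the combined order. This is an illustration, not SSIS code, and the `name`/`id` columns are hypothetical.

```python
def sort_rows(rows, keys, remove_duplicates=False):
    """Sort rows by several columns.

    `keys` is a list of (column, descending) pairs, highest precedence first.
    With remove_duplicates=True, only the first row for each combination of
    sort values is kept.
    """
    result = list(rows)
    # Stable sorts applied from lowest to highest precedence compose correctly.
    for column, descending in reversed(keys):
        result.sort(key=lambda r: r[column], reverse=descending)
    if remove_duplicates:
        seen, unique = set(), []
        for row in result:
            sort_values = tuple(row[c] for c, _ in keys)
            if sort_values not in seen:
                seen.add(sort_values)
                unique.append(row)
        result = unique
    return result

members = [
    {"name": "b", "id": 2},
    {"name": "a", "id": 1},
    {"name": "b", "id": 1},
    {"name": "a", "id": 1},
]
ordered = sort_rows(members, [("name", False), ("id", True)])   # name asc, id desc
deduped = sort_rows(members, [("name", False), ("id", True)], remove_duplicates=True)
```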
Term Extraction
  Description: Extracts terms (nouns and noun phrases) from the input text into the transformation output column.
  Example uses: Processing large text data and extracting main concepts. For example, you could extract the primary terms used in this section of SQLServerPedia by feeding the Term Extraction transformation the text column containing the entire section.
Term Lookup
  Description: Extracts terms from an input column of TEXT data type and matches them with the same or similar terms found in the lookup table. Each term in the lookup table is searched for in the input column. If the term is found, the transformation returns the term as well as the number of times it occurs in the row. You can configure this transformation to perform a case-sensitive search.
  Example uses: Analyzing large textual data for specific terms. For example, suppose you accept email feedback for the latest version of your software. You might not have time to read every email message that arrives in the generic inbox. Instead, you could use this transformation to look for specific terms of interest.
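The "term plus number of occurrences" output can be approximated with whole-word regular-expression matching. The Python sketch below is deliberately simpler than the built-in transformation, which applies its own term-matching rules that this code does not reproduce. The feedback text and the term list are hypothetical.

```python
import re
from collections import Counter

def term_lookup(text, terms, case_sensitive=False):
    """Return {term: occurrence count} for each lookup term found in `text`
    as a whole word; terms that do not occur are omitted."""
    flags = 0 if case_sensitive else re.IGNORECASE
    counts = Counter()
    for term in terms:
        matches = re.findall(r"\b" + re.escape(term) + r"\b", text, flags)
        if matches:
            counts[term] = len(matches)
    return dict(counts)

feedback = "The installer crashed. Installer logs show a crash on startup."
terms = ["installer", "crash", "license"]
found = term_lookup(feedback, terms)
found_case_sensitive = term_lookup(feedback, terms, case_sensitive=True)
```

Note that "crashed" is not counted as "crash" here, because only whole words match.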
Union All
  Description: Combines multiple inputs into a single output. The output is not sorted; rows are passed through as they arrive from each input. You can ignore some columns from each input, but each output column must be mapped to at least one input column.
  Example uses: Importing data from multiple disparate data sources into a single destination. For example, you could extract data from a mail system, a text file, an Excel spreadsheet and an Access database and populate a SQL Server table. Unlike the Merge and Merge Join transformations, Union All can accept more than two inputs.
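The column-mapping rule can be made concrete with a short Python sketch. Each input supplies a mapping from output columns to its own columns, unmapped output columns are filled with `None` (standing in for NULL), and an output column mapped by no input is rejected. This is an illustration, not SSIS code. The source names and columns are hypothetical.

```python
def union_all(inputs, output_columns):
    """Append the rows of every input into one output, without sorting.

    `inputs` is a list of (rows, mapping) pairs, where mapping maps an output
    column to the input column that feeds it.
    """
    for column in output_columns:
        if not any(column in mapping for _, mapping in inputs):
            raise ValueError(f"output column {column!r} is not mapped to any input column")
    result = []
    for rows, mapping in inputs:
        for row in rows:
            result.append(
                {col: row[mapping[col]] if col in mapping else None for col in output_columns}
            )
    return result

# Hypothetical sources with different column names.
spreadsheet = ([{"Name": "Ann", "Mail": "ann@example.com"}], {"name": "Name", "email": "Mail"})
text_file = ([{"full_name": "Bob"}], {"name": "full_name"})
contacts = union_all([spreadsheet, text_file], ["name", "email"])
```

Any number of inputs can be passed in the list, which reflects the difference from Merge and Merge Join noted above.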

Unpivot
  Description: The opposite of the Pivot transformation, Unpivot converts columns into rows. It normalizes an input data set that has many duplicate values in multiple columns by creating multiple rows that have the same value in a single column. For example, if your input has a customer name and separate columns for checking and savings accounts, Unpivot can transform it into a row set that has customer, account and account balance columns.
  Example uses: Massaging a semi-structured input data file and converting it into a normalized input before loading the data into a warehouse.
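The checking/savings example can be sketched in Python as the inverse of a pivot. Each listed value column becomes its own row, with the column name recorded in a new attribute column. In this sketch missing values are skipped, which is a choice made here rather than a statement about SSIS behaviour. All column names are hypothetical.

```python
def unpivot(rows, key_col, value_cols, name_col, value_col):
    """Turn one row with several value columns into one row per value column."""
    result = []
    for row in rows:
        for col in value_cols:
            if row.get(col) is not None:   # skip missing values
                result.append({key_col: row[key_col], name_col: col, value_col: row[col]})
    return result

wide = [{"customer": "Smith", "checking": 100, "savings": 500}]
narrow = unpivot(wide, "customer", ["checking", "savings"], "account", "balance")
```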