Client tier
The Client tier includes the following:
IBM InfoSphere DataStage and QualityStage clients
Administrator
Director
Designer
Server tier
The Server tier includes:
Services
Engine
Repository
Working areas
Information Services Director resource providers
Services tier
Three general categories of Services
Design
Execution
Metadata
Repository tier
The shared repository stores the objects of all IBM Information Server product modules.
The common repository contains the following types of metadata that are required to support InfoSphere DataStage:
Project metadata
Operational metadata
Design metadata
Engine tier
This is a parallel engine that executes IBM Information
Server tasks.
Working areas
These are the temporary storage areas used by the
components.
Topologies
IBM InfoSphere Information Server supports multiple topologies to meet a variety of data integration, hardware, and business requirements.
Consider performance needs when selecting a topology.
The supported topologies are as follows:
Two-tier
Three-tier
Cluster
Grid
Topologies
Two-tier
The engine, application server, and metadata repository are all on the same computer system, while the clients run on different machines.
Three-tier
The engine is on one machine; the application server and metadata repository are co-located on another machine.
Clients run on a third machine.
Topologies
Cluster
This is a slight variation of the three-tier topology.
The engine is duplicated over multiple computers.
In a cluster environment, a single parallel job execution can span multiple computers, each with its own engine.
The processing of a job on multiple machines is driven by a configuration file associated with the job.
Topologies
Grid topology
Grid computing makes it possible to apply more processing power on demand.
This is similar to a cluster, but the machines on which a job executes are determined dynamically, through generation of a dynamic configuration file.
Two-tier
Three tier
DataStage architecture
Runtime architecture
OSH Script
Jobs are created using the Designer.
The jobs are compiled into parallel job flows and reusable components that execute on the parallel Information Server engine.
The Designer generates OSH (Orchestrate Shell) script.
OSH script
Uses the familiar syntax of the UNIX shell.
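For a sense of what a generated script looks like, here is a minimal hand-written OSH fragment. It is purely illustrative: the operator options shown are simplified assumptions, not output copied from the Designer.

    osh " import
            -file input.txt
            -schema record(id:int32; name:string[max=30])
          | tsort -key id
          | export
            -file output.txt
            -schema record(id:int32; name:string[max=30]) "

Operators are chained with the familiar UNIX pipe symbol, which is what makes the generated script readable to shell users.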
A Job
Example of a job
Types of Jobs
Parallel Jobs
Server Jobs
Job Sequences
Parallel Job
Executed by the DataStage parallel engine.
Built-in functionality for pipeline and partition parallelism.
Compiled into OSH (Orchestrate Shell).
OSH executes operators: executable C++ class instances.
Runtime monitoring in DataStage Director
Server Jobs
Job Sequences
Master server jobs that kick off server or parallel jobs and other activities.
Runtime monitoring in DataStage Director
Executed by the Server engine
Stages
Active stage
Active stages model the flow of data and provide
mechanisms for combining data streams,
aggregating data, and converting data from one
data type to another
Alters the number of rows from source to target.
Passive Stage
A passive stage handles access to databases for
the extraction or writing of data.
Does not alter the number of rows from source
to target.
Parallel processing
Parallel processing is the use of multiple processors to
execute the different parts of the same program
simultaneously.
Pipeline Parallelism
Transform, clean, load processes execute
simultaneously
Like a conveyor belt moving rows from process to
process
Start downstream process while upstream process
is running
Advantages
Reduces disk usage for staging areas
Keeps processors busy
Still has limits on scalability
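The conveyor-belt idea can be sketched in a few lines of Python (illustrative only, not DataStage code). Generators let each downstream step start consuming rows while the upstream step is still producing them:

    import io

    def extract(src):
        # Upstream process: emit rows one at a time.
        for line in src:
            yield line.rstrip("\n")

    def transform(rows):
        # Downstream process starts while extract is still running.
        for row in rows:
            yield row.upper()

    def load(rows, dst):
        # Final process consumes rows as they arrive; no full
        # intermediate data set is staged to disk.
        for row in rows:
            dst.write(row + "\n")

    data = io.StringIO("alpha\nbeta\n")
    out = io.StringIO()
    load(transform(extract(data)), out)
    print(out.getvalue())

DataStage runs the three steps on separate processors; Python generators merely interleave them, but the row-at-a-time flow through the pipeline is the same idea.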
Partition Parallelism
Divide the incoming stream of data into subsets to be separately processed by an operation.
Subsets are called partitions (nodes).
This is the key to scalability.
Each partition of data is processed by the same
operation
E.g., if operation is Filter, each partition will be
filtered in exactly the same way
Facilitates near-linear scalability
8 times faster on 8 processors
24 times faster on 24 processors
This assumes the data is evenly distributed
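A minimal Python sketch of the idea (illustrative, not DataStage code): the data is split into partitions, and the same Filter logic runs against every partition in parallel:

    from multiprocessing import Pool

    def filter_op(partition):
        # The same operation is applied to every partition.
        return [row for row in partition if row % 2 == 0]

    def partition_rows(rows, n):
        # Split the incoming stream into n subsets (partitions).
        parts = [[] for _ in range(n)]
        for i, row in enumerate(rows):
            parts[i % n].append(row)
        return parts

    if __name__ == "__main__":
        with Pool(4) as pool:                    # 4 "processing nodes"
            parts = partition_rows(range(20), 4)
            print(pool.map(filter_op, parts))    # each partition filtered identically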
Three-Node Partitioning
Configuration file
DataStage gets the information about the system from
the configuration file.
Resources needed for the job are organized based on
the configuration file.
The configuration file describes every processing
node.
When the system changes, change the file, not the job.
Configuration file provides the hardware configuration.
The path of the configuration file is identified in the
DataStage Administrator.
The environment variable APT_CONFIG_FILE contains
the path of the configuration file
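As an illustration, a minimal parallel configuration file for two nodes might look like the following (the node names, host name, and paths are invented examples, not values from any real installation):

    {
        node "node1"
        {
            fastname "etlhost"
            pools ""
            resource disk "/ds/data1" {pools ""}
            resource scratchdisk "/ds/scratch1" {pools ""}
        }
        node "node2"
        {
            fastname "etlhost"
            pools ""
            resource disk "/ds/data2" {pools ""}
            resource scratchdisk "/ds/scratch2" {pools ""}
        }
    }

Pointing APT_CONFIG_FILE at a file with more nodes makes the same job run with more partitions; the job design itself does not change.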
Partitioning methods
Random Partitioner
Records are randomly distributed over all partitioning
nodes.
Like round robin, random partitioning can rebalance
the partitions of an input data set to guarantee that
each processing node receives an approximately
equal-sized partition.
Random partitioning has slightly higher overhead than round robin because of the extra processing required to calculate a random value for each record.
Same partitioner
The stage using the data set as input performs no
repartitioning and takes as input the partitions output
by the preceding stage.
With this partitioning method, records stay on the
same processing node; that is, they are not
redistributed.
Same is the fastest partitioning method.
This is normally the method DataStage uses when
passing data between stages in your job.
Entire Partitioning
Every instance of a stage on every processing node
receives the complete data set as input.
It is useful when you want the benefits of parallel
execution, but every instance of the operator needs
access to the entire input data set.
Hash partitioner
Partitioning is based on a function of one or more columns (the hash partitioning keys) in each record. The hash partitioner examines one or more fields of each input record (the hash key fields).
Records with the same values for all hash key fields are assigned to the same processing node.
This method is useful for ensuring that related records are in the same partition, which might be a prerequisite for a processing operation.
Hash partitioning does not necessarily result in an even distribution of data between partitions.
For example, if you hash partition a data set based on a zip code field, where a large percentage of your records are from one or two zip codes, you can end up with a few partitions containing most of your records. This behavior can lead to bottlenecks because some nodes are required to process more records than other nodes.
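A short Python sketch of the principle (illustrative): rows with equal key values always hash to the same partition, so skewed key values produce skewed partitions:

    def hash_partition(records, key, n):
        # Records with the same key value always land in the same partition.
        parts = [[] for _ in range(n)]
        for rec in records:
            parts[hash(rec[key]) % n].append(rec)
        return parts

    rows = [{"zip": "95054"}, {"zip": "95054"}, {"zip": "95054"}, {"zip": "10001"}]
    # Three of the four rows share one zip code, so one partition gets most rows:
    print([len(p) for p in hash_partition(rows, "zip", 4)])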
Modulus partitioner
Partitioning is based on a key column modulo the
number of partitions. This method is similar to hash by
field, but involves simpler computation.
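For example, with 4 partitions, a record whose key column holds 27 goes to partition 27 mod 4 = 3, and a key of 28 goes to partition 0.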
Range partitioner
Divides a data set into approximately equal-sized
partitions, each of which contains records with key
columns within a specified range. This method is also
useful for ensuring that related records are in the
same partition.
A range partitioner divides a data set into
approximately equal size partitions based on one or
more partitioning keys. Range partitioning is often a
preprocessing step to performing a total sort on a data
set.
In order to use a range partitioner, you have to make a
range map. You can do this using the Write Range Map
stage.
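For example, with key values from 1 to 100 and four partitions, the range map might route keys 1-25, 26-50, 51-75, and 76-100 to partitions 0 through 3; the exact boundaries come from the range map, which is built by sampling the data.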
DB2 Partitioner
Partitions an input data set in the same way that DB2
would partition it.
For example, if you use this method to partition an
input data set containing update information for an
existing DB2 table, records are assigned to the
processing node containing the corresponding DB2
record. Then, during the execution of the parallel
operator, both the input record and the DB2 table
record are local to the processing node. Any reads and
writes of the DB2 table would entail no network
activity.
Auto Partitioner
Leaves it to DataStage to determine the best partitioning method to use, depending on the type of stage and on what the previous stage in the job has done.
Typically DataStage uses round robin when initially partitioning data, and Same for the intermediate stages of a job.
Collecting
Collecting is the process of combining multiple partitions into a single data set.
Collecting methods
Round robin
Ordered collector
Sort merge collector
Auto collector
Round robin
Reads a record from the first input partition, then from
the second partition, and so on. After reaching the last
partition, starts over.
After reaching the final record in any partition, skips
that partition in the remaining rounds
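A Python sketch of this collector (illustrative, not DataStage code):

    def round_robin_collect(partitions):
        # Take one record per partition in turn; skip partitions that are exhausted.
        iters = [iter(p) for p in partitions]
        collected = []
        while iters:
            live = []
            for it in iters:
                try:
                    collected.append(next(it))
                    live.append(it)
                except StopIteration:
                    pass            # this partition is finished; drop it
            iters = live
        return collected

    print(round_robin_collect([[1, 4], [2, 5, 6], [3]]))   # [1, 2, 3, 4, 5, 6]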
Ordered collector
Reads all records from the first partition, then all
records from the second partition, and so on.
This collection method preserves the order of totally
sorted input data sets. In a totally sorted data set,
both the records in each partition and the partitions
themselves are ordered.
This might be useful as a preprocessing action before
exporting a sorted data set to a single data file.
Sort merge collector
Reads records in an order based on one or more key columns of the record.
Typically used to preserve the sort order when collecting the partitions of a sorted data set.
Auto collector
The default algorithm reads rows from a partition as
soon as they are ready.
This may lead to producing different row orders in
different runs with identical data. The execution is
non-deterministic.
Administrator
Administrator is a client program used to carry out
configuration tasks in DataStage.
It has 3 pages
General
The general page is used to set server-wide
properties.
Project
This lists the projects available and options to
add, edit and delete projects.
NLS
National Language support features.
Attaching to DataStage
Project Page
Add: adds a new DataStage project.
Delete: deletes a project. This button is enabled only if you have administrator status.
Properties: sets the properties of the selected project.
Cleanup: cleans up files in the selected project.
NLS: changes project maps and locales.
Command: executes DataStage Engine commands directly from the selected project.
Add Project
Creating a project
Project properties tabs:
General
Permissions
Tracing
Schedule
Mainframe
Tunables
Parallel
Sequence
Permissions tab
Assign user categories to operating system user groups, or enable
operators to view all the details of an event in a job log file.
The Permissions tab is enabled only if you have logged on to
DataStage using a name that gives you administrator status.
Tracing tab
This is to enable or disable tracing on the server.
Schedule tab
Set up a user name and password to use for running
scheduled DataStage jobs.
The Schedule tab is enabled only if you have logged
on to a Windows NT server.
Parallel tab
Sequence tab
Designer
A graphical user interface for creating DataStage applications, known as jobs.
It is the design interface for both InfoSphere DataStage and InfoSphere QualityStage.
Jobs
A job defines a sequence of steps.
After design, jobs are compiled and run on the parallel processing engine.
Stages
The individual steps that make up a job are called stages.
Some of the DataStage prebuilt stages are Sort, Merge, Join, Filter, Transformer, Lookup, and Aggregator.
Stages provide 80 to 90 percent of the application logic required for enterprise data integration applications.
Each stage has properties that determine how it performs or processes data.
Aggregator Stage
Complex Flat file Stage
Column Export Stage
Data Set Stage
Distributed Transaction
FTP Enterprise
Funnel
Join
Lookup
Merge
Sequential file Stage
Sort Stage
Surrogate Key generator
Transformer
Remove Duplicate stage
Designer window: repository objects, stage palette
Other properties
Load the columns
Columns loaded
Output page
Job
Save job
Run Director
Status View
Annotation Stage
This stage is used to add notes to the diagram window.
Two types of Annotation
Annotation
Description Annotation
Lookup Stage
It is used to perform lookup operations.
A Lookup stage can have:
A single input link
A single output link
An optional reject link
Any number of reference links
Lookup combines records based on the key column.
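The behaviour can be sketched in Python (illustrative; the column names are made up):

    reference = {"C1": "gold", "C2": "silver"}      # reference link held in memory, keyed

    stream = [{"cust": "C1", "qty": 5}, {"cust": "C3", "qty": 2}]
    matched, rejected = [], []
    for row in stream:
        tier = reference.get(row["cust"])           # probe on the key column
        if tier is None:
            rejected.append(row)                    # optional reject link
        else:
            matched.append({**row, "tier": tier})   # output link
    print(matched, rejected)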
Lookup Stage: link the key, specify the key
Lookup: choose how to handle rejects
Join Stage
It performs a join operation on two or more inputs to the stage.
This is similar to an SQL join.
It provides:
Inner
Full Outer
Left Outer
Right Outer
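A compact Python sketch of inner versus left outer behaviour (illustrative; the column names are made up):

    left  = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
    right = [{"id": 1, "amt": 10}]

    right_by_id = {r["id"]: r for r in right}

    inner = [{**l, **right_by_id[l["id"]]} for l in left if l["id"] in right_by_id]
    left_outer = [{**l, **right_by_id.get(l["id"], {"amt": None})} for l in left]

    print(inner)        # only id 1: unmatched rows are dropped
    print(left_outer)   # id 2 kept with amt=None, as in SQL LEFT OUTER JOIN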
Join Stage: choose the key for the join
Join Stage: choose the join type
Merge Stage
The Merge stage is a processing stage.
It can have:
More than one input link
A single output link
The same number of reject links as update links
Merge Stage: choose the merge key
Merge: keep or drop unmatched master rows
Comparison

                      Merge                              Join             Lookup
Stream input          2 to N (master plus updates)       2 to N           1
Reference input       NA                                 NA               1 to N
Output                Merged data; one reject            One output       One output;
                      link per update link                                optional reject link
Sorting requirements  All inputs                         All inputs       Not required
Duplicates            Warning on master                  Allowed          Warning on reference
Key                   Merge key                          Join key         Lookup key
Unmatched rows        Master: drop/keep (warning/        Per join type    Continue, drop,
                      no warning); update: drop/reject                    fail, or reject
Memory                Light                              Light            Heavy
Use when              Large data                         Large data       Small reference data
Funnel Stage
It combines multiple inputs into a single output.
The stage can have any number of input links but a single output link.
The metadata of all the inputs has to be identical.
The Funnel stage operates in 3 modes:
Continuous funnel
Sort funnel
Sequence funnel
Funnel Stage: choose the funnel type
Types of funnel
Continuous Funnel
Continuous funnel combines the records of the input data in no guaranteed order.
It takes one record from each input link in turn.
If data is not available on an input link, the stage skips to the next link rather than waiting.
Sort Funnel
Sort Funnel combines the input records in the order defined
by the value(s) of one or more key columns, and the order
of the output records is determined by these sorting keys.
Sequence Funnel
Sequence copies all records from the first input data set to
the output data set, then all the records from the second
input data set, and so on.
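The three modes can be sketched in Python (illustrative, not DataStage code):

    import heapq
    from itertools import chain, zip_longest

    links = [[1, 4], [2, 5], [3]]          # inputs with identical metadata

    # Sequence funnel: all of link 1, then all of link 2, and so on.
    sequence_out = list(chain.from_iterable(links))

    # Continuous funnel: one record from each link in turn, no guaranteed order.
    continuous_out = [r for grp in zip_longest(*links) for r in grp if r is not None]

    # Sort funnel: output ordered by the key (inputs already sorted on it).
    sort_out = list(heapq.merge(*links))

    print(sequence_out, continuous_out, sort_out)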
Head Stage
Tail Stage
Peek Stage
Column Generator Stage
Row Generator Stage
Write Range Map Stage
Head Stage
It can have a single input link and a single output link.
Selects the first N rows from each partition of an input data set and copies them to the output data set.
This is used to debug large data sets.
Property settings include the following:
Number of records to copy
Partition from which records are copied
Location
Input page tabs:
General
Partitioning
Columns
Advanced
Output page tabs:
General
Mapping
Columns
Advanced
Tail Stage
It can have a single input link and a single output link.
It selects the last N records from each partition and copies them to the output data set.
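Both behaviours in a Python sketch (illustrative):

    partitions = [[1, 2, 3, 4], [5, 6, 7], [8, 9]]

    def head(parts, n):
        # First N rows of each partition.
        return [p[:n] for p in parts]

    def tail(parts, n):
        # Last N rows of each partition.
        return [p[-n:] for p in parts]

    print(head(partitions, 2))   # [[1, 2], [5, 6], [8, 9]]
    print(tail(partitions, 2))   # [[3, 4], [6, 7], [8, 9]]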
Peek Stage
It has a single input link and any number of output links.
It prints record column values, either in the job log or to a separate output link, as it copies records from input to output.
It is helpful for monitoring the progress of an application or diagnosing bugs in it.
Sample Stage
It has a single input link and any number of output links.
Samples an input data set.
Percent mode: extracts rows by selecting them by means of a random number generator and writes a given percentage of them to the output data set.
ODBC Stages
The ODBC stage is used to extract, write, or aggregate data.
Each ODBC stage can have any number of input links or output links.
Specify the data on a link using one of the following methods:
An SQL statement
A user defined SQL query
A stored procedure
Output Mapping
OCI Stage
Output Mapping
Filter Stage: specify the filter condition
Transformer stage
Transformer Stage: column derivations
Transformer Stage: constraints, stage variables
Transformer conditions
Scenario
The product file has pcode and product colour columns.
Products with yellow colour are moved to one file, blue to another, and the rest to a third.
This task is done using Transformer stage constraints, as sketched below.
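In Python terms, the constraints behave like this sketch (illustrative; the pcode and colour column names come from the scenario):

    products = [{"pcode": "P1", "colour": "yellow"},
                {"pcode": "P2", "colour": "blue"},
                {"pcode": "P3", "colour": "red"}]

    yellow_link, blue_link, other_link = [], [], []
    for rec in products:
        if rec["colour"] == "yellow":    # constraint on the first output link
            yellow_link.append(rec)
        elif rec["colour"] == "blue":    # constraint on the second output link
            blue_link.append(rec)
        else:                            # otherwise link catches the rest
            other_link.append(rec)

    print(yellow_link, blue_link, other_link)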
Job Properties
Job Parameters
It is possible to set up parameters for a job.
Parameters are defined in the Job Properties window, where a default value is provided for each.
Containers
A Container is a group of stages and links.
Containers are used to modularize server job designs.
DataStage provides 2 types of containers
Local Container
Shared Container
Types of Containers
Local Container
These are created within a job and are accessible only within that job.
Shared Containers
These are created and stored separately in the repository, in the same way as jobs.
There are 2 types of Shared Containers:
Server Shared Containers
Server shared containers can be included in parallel jobs.
Parallel Shared Containers
1185
Ver.1.0
1186
Ver.1.0
1187
Ver.1.0
1188
Ver.1.0
Shared Container
To store existing stages and links in a shared container:
Choose the stages and links.
Choose Edit > Construct Container > Shared.
Parameters of the components are copied to the shared container as container parameters.
Saving it is the same as saving a job.
Job Sequences
Specifies a sequence of jobs to run.
A sequence can contain control information.
That is, it is possible to specify different courses of action to be taken depending on whether a job succeeds or fails.
Job sequence can be scheduled and run using
DataStage Director.
Restartable sequence
Job sequences are optionally restartable.
Checkpoint information enables DataStage to restart the job sequence.
It is possible to enable or disable checkpoints
Job Sequence: Repository, Palette
Activity Stages
Job
Specifies a Server or parallel job
Routine
Specifies any routine, but not transforms.
Exec command
Specifies an operating system command to execute.
Email Notification
Specifies that an email notification is to be sent at this point of the sequence (using SMTP).
Wait-for-file
Waits for a particular file to appear or disappear
Activity Stages
Nested conditions
Allows you to further branch the execution of a
sequence depending on a condition.
Sequencer
Allows you to synchronize the control flow of
multiple activities in a job sequence.
Start and end loop
Together these two stages allow you to implement a
For...Next or For...Each loop within your sequence
Terminator
Allows you to specify that, if certain situations occur, the jobs the sequence is running are shut down cleanly.
Activity stages
User Variable
Allows you to define variables within a sequence.
These variables can then be used later on in the
sequence, for example to set job parameters.
Exception handler
It is executed if a job in the sequence fails to run (other exceptions are handled by triggers), or if a job aborts and the Automatically handle activities that fail option is set for the sequence.
There can be only one exception handler per sequence.
Triggers
Triggers provide control information to the stage activities.
They specify different courses of action to be taken based on job status.
Trigger names must be unique.
Types of Triggers
Conditional
Unconditional
Otherwise
Layout
A single server job is available to sort an input file.
Wait for a trigger to start the job.
Send a message to a computer after job completion (success or failure).
Handle exceptions.
Sequence design:
For...Next loop to execute the Sort job for 5 input files
Waits for the trigger file to appear
Executes an OS command to send the message
Executes the Sort job
Wait-For-File Activity
Waits for a specified file to appear or disappear.
The Appear option does not delete the file after finding it.
Programming in DataStage
Programming components
Routines
Transforms
Functions
Expressions
Subroutines
Macros
Precedence rules
Routines
Routines are stored in the Routines folder by default.
The following components are classified as routines
Transform functions
Before/After Subroutines
While designing a job it is possible to specify:
Custom UniVerse functions
ActiveX functions
Commands
dsadmin command
DSXImportService command
SyncProject command
Performance tuning in DS
Scenarios
Scenario 1
If we have 3 jobs in a sequence and job 1 fails at run time, how do we still run the other 2 jobs?
Set the trigger on the job activity to Unconditional (Properties > Triggers > Unconditional).
Scenario 2
Try a left outer join using the Lookup stage.
Merge Stage
Pivot Stage
Row merger Stage
Row Splitter Stage
Sort Stage
Transformer Stage
Compress Stage
Expand stage
Copy Stage
Modify Stage
Filter Stage
External filter Stage
Change capture stage
Change apply Stage
Difference Stage
Compare Stage
Encode stage
Decode Stage
Switch Stage
FTP Enterprise stage
Generic stage
Surrogate key generator stage
Slowly Changing dimension Stage
Pivot Enterprise Stage
Checksum stage
Restructure stage