
1.1. What are the ways to execute datastage jobs?

A job can be run using a few different methods:

• from Datastage Director (menu Job -> Run Now...)
• from the command line using the dsjob command
• from a Datastage routine (the DSRunJob function - see the sketch below)
• by a job sequencer
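
As a sketch of the routine-based approach, the DataStage BASIC job control API can be used as follows (the job name "TargetJob" is a hypothetical example):

* attach the job, run it, wait for completion and check the status
hJob = DSAttachJob("TargetJob", DSJ.ERRFATAL)
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob)
Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
If Status = DSJS.RUNOK Then
   Call DSLogInfo("TargetJob finished OK", "JobControl")
End
ErrCode = DSDetachJob(hJob)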

1.2. How to invoke a Datastage shell command?

Datastage shell commands can be invoked from:

• Datastage Administrator (Projects tab -> Command)
• a Telnet client connected to the Datastage server

1.3. How to stop a job when its status is running?

To stop a running job go to DataStage Director and click the stop button (or Job ->
Stop from the menu). If that doesn't help, go to Job -> Cleanup Resources, select the
process which holds a lock and click Logout.

If it still doesn't help, go to the datastage shell and invoke the following command:
ds.tools
It will open an administration panel. Go to 4. Administer processes/locks, then
try invoking one of the clear locks commands (options 7-10).

1.4. How to run and schedule a job from command line?

To run a job from the command line use the dsjob command.

Command syntax: dsjob [-file <file> <server> | [-server <server>] [-user <user>] [-password <password>]] <primary command> [arguments]

The command can be placed in a batch file and run by a system scheduler.
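
A minimal sketch of running a job this way and scheduling it with cron (the server name, credentials, project and job names are hypothetical, and the dsjob path depends on the installation):

dsjob -server dsserver -user dsadm -password secret -run -wait -jobstatus myproject JOB_0100

# crontab entry running the job daily at 2am (adjust the path to your DSEngine/bin):
0 2 * * * /opt/Ascential/DataStage/DSEngine/bin/dsjob -run -wait myproject JOB_0100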

1.5. How to release a lock held by jobs?

Go to the datastage shell and invoke the following command: ds.tools

It will open an administration panel. Go to 4. Administer processes/locks, then
try invoking one of the clear locks commands (options 7-10).

1.6. User privileges for the default DataStage roles?

The role privileges are:

• DataStage Developer - user with full access to all areas of a DataStage project
• DataStage Operator - has privileges to run and manage deployed DataStage jobs
• -none- - no permission to log on to DataStage

1.7. What is the command to analyze a hashed file?

There are two commands to analyze a hashed file. Both should be invoked from the
datastage command shell (a usage sketch follows the list). These are:

• FILE.STAT command
• ANALYZE.FILE command
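
Both take the hashed file name as an argument; for example, from the Datastage shell (the file name HASH_CUSTOMERS is a hypothetical example):

FILE.STAT HASH_CUSTOMERS
ANALYZE.FILE HASH_CUSTOMERS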

1.8. Is it possible to run two versions of datastage on the same pc?

Yes, even though different versions of Datastage use different system dll libraries.
To dynamically switch between Datastage versions install and run DataStage Multi-
Client Manager. That application can unregister and register system libraries used
by Datastage.

1.9. How to send notifications from Datastage as a text message (SMS) to a cell phone

There are a few possible methods of sending SMS messages from Datastage. However,
there is no easy way to do this directly from Datastage and all methods described
below will require some effort.

The easiest way of doing that from the Datastage standpoint is to configure an
SMTP (email) server as a mobile phone gateway. In that case, a Notification
Activity can be used to send a message with a job log and any desired details. The
DSSendMail Before-job or After-job subroutine can also be used to send SMS
messages.
If configured properly, the recipient's email address will have the following format:
600123456@oursmsgateway.com

If there is no possibility of configuring a mail server to send text messages, you can
work around it by using an external application run directly from the operating
system. There is a whole bunch of unix scripts and applications for sending SMS
messages.
In that solution, you will need to create a batch script which takes care of sending
the messages and invoke it from Datastage using the ExecDOS or ExecSh subroutines,
passing the required parameters (like phone number and message body) - see the
sketch below.
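
A minimal sketch of such a script, assuming the email-to-SMS gateway approach and a unix mailx command (the gateway domain is taken from the example above; the script name and parameters are hypothetical):

#!/bin/sh
# send_sms.sh <phone_number> <message> - invoked from Datastage via ExecSh
PHONE=$1
MESSAGE=$2
# the gateway domain depends on the local cellphone provider
echo "$MESSAGE" | mailx -s "Datastage alert" "$PHONE@oursmsgateway.com"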

Please keep in mind that all these solutions may require contacting the local
cellphone provider first and, depending on the country, it may not be free of charge;
in some cases the provider may not support the capability at all.

2. Datastage development and job design

2.1. Error in Link collector - Stage does not support in-process active-to-active
inputs or outputs

To get rid of the error go to Job Properties -> Performance and select
Enable row buffer.
Then select Inter process, which will let the link collector run correctly.
A buffer size of 128Kb should be fine, however it's a good idea to increase the
timeout.

2.2. What is the DataStage equivalent to the LIKE option in ORACLE?

The following statement in Oracle:

select * from ARTICLES where article_name like '%WHT080%';

can be written in DataStage (for example as a constraint expression) as:

incol.empname matches '...WHT080...'

In a DataStage match pattern, '...' stands for zero or more characters of any kind,
so it plays the same role as '%' in an Oracle LIKE clause.

2.3. What is the difference between logging text and final text message in a
terminator stage?

Every stage has a 'Logging Text' area on its General tab which logs an
informational message when the stage is triggered or started.

• Logging Text - an informational, green-line, DSLogInfo() type message.
• Final Warning Text - the red fatal message, which is included in the
sequence abort message.

2.4. Error in STPstage - SOURCE Procedures must have an output link

The error appears in the Stored Procedure (STP) stage when there is no output link
going out of that stage.
To get rid of it go to 'Stage properties' -> 'Procedure type' and select Transform.

2.5. How to invoke an Oracle PLSQL stored procedure from a server job

To run a PL/SQL procedure from Datastage a Stored Procedure (STP) stage can be
used.
However, it needs a flow of at least one record to run.

It can be designed in the following way:

• a source ODBC stage which fetches one record from the database and maps it to
one column - for example: select sysdate from dual
• a Transformer which passes that record through. If required, add the PL/SQL
procedure parameters as columns on the right-hand side of the transformer's
mapping
• a Stored Procedure (STP) stage as the destination. Fill in the connection
parameters, type in the procedure name and select Transform as the procedure
type. In the input tab select 'execute procedure for each row' (it will be run
once).

Design of a DataStage server job with an Oracle PL/SQL procedure call (diagram
omitted; a procedure sketch follows).
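
For illustration, a minimal sketch of a procedure that could be called this way (the procedure name, tables and parameter are hypothetical):

-- hypothetical PL/SQL procedure; the STP stage calls it once per input row
CREATE OR REPLACE PROCEDURE refresh_daily_stats(p_run_date IN DATE) AS
BEGIN
  DELETE FROM daily_stats WHERE stat_date = TRUNC(p_run_date);
  INSERT INTO daily_stats (stat_date, order_count)
    SELECT TRUNC(p_run_date), COUNT(*)
    FROM orders
    WHERE order_date >= TRUNC(p_run_date);
END;
/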

2.6. Is it possible to run a server job in parallel?

Yes, even server jobs can be run in parallel.

To do that go to 'Job properties' -> General and check the Allow Multiple Instance
option.
The job can now be run simultaneously from one or many sequence jobs. When that
happens, datastage creates new entries in Director and each new instance is named
with an automatically generated suffix (for example the second instance of a job
named JOB_0100 will be named JOB_0100.JOB_0100_2). An instance can be deleted at
any time and will be automatically recreated by datastage on the next run.
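
From the command line, separate instances of a multi-instance job are started by appending an invocation id to the job name; a sketch with hypothetical project, job and invocation names:

dsjob -run -wait myproject JOB_0100.batch1
dsjob -run -wait myproject JOB_0100.batch2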

2.7. Error in STPstage - STDPROC property required for stage xxx

The error appears in Stored Procedure (STP) stage when the 'Procedure name' field
is empty. It occurs even if the Procedure call syntax is filled in correctly.
To get rid of the error fill in the 'Procedure name' field.

2.8. Datastage routine to open a text file with error catching

Note! work_dir and file1 are parameters passed to the routine.

* open file1 for sequential access, logging success or aborting on failure
OPENSEQ work_dir : '\' : file1 TO H.FILE1 THEN
   CALL DSLogInfo("******************** File " : file1 : " opened successfully", "JobControl")
END ELSE
   CALL DSLogInfo("Unable to open file", "JobControl")
   ABORT
END

2.9. Datastage routine which reads the first line from a text file

Note! work_dir and file1 are parameters passed to the routine.

* open file1 for sequential access
OPENSEQ work_dir : '\' : file1 TO H.FILE1 THEN
   CALL DSLogInfo("******************** File " : file1 : " opened successfully", "JobControl")
END ELSE
   CALL DSLogInfo("Unable to open file", "JobControl")
   ABORT
END

* read the first record from the file
READSEQ FILE1.RECORD FROM H.FILE1 ELSE
   Call DSLogWarn("******************** File is empty", "JobControl")
END

* take the first 32 characters of the record ("A" mode strips all spaces)
firstline = Trim(FILE1.RECORD[1,32], " ", "A")
Call DSLogInfo("******************** Record read: " : firstline, "JobControl")

* close the file
CLOSESEQ H.FILE1

2.10. How to test a datastage routine or transform?

To test a datastage routine or transform go to the Datastage Manager.
Navigate to Routines, select the routine you want to test and open it. First compile
it and then click 'Test...', which will open a new window. Enter test parameters in
the left-hand side column and click 'Run all' to see the results.
Datastage will remember all the test arguments during future tests.

2.11. When should hashed files be used? What are the benefits of using them?

Hashed files are the best way to store data for lookups. They're very fast when
looking up key-value pairs.
Hashed files are especially useful for storing dictionary-like reference data
(customer details, countries, exchange rates). Stored this way, the data can be
shared across the project and accessed from different jobs.

2.12. How to construct a container and deconstruct it or switch between local and
shared?

To construct a container go to Datastage Designer, select the stages that should be
included in the container and from the main menu select Edit -> Construct
Container, then choose between local and shared.
A local container is only visible in the current job, while a shared container can be
re-used. Shared containers can be viewed and edited in Datastage Manager under the
'Routines' menu.
Local Datastage containers can be converted at any time to shared containers in
Datastage Designer by right-clicking on the container and selecting 'Convert to
Shared'. In the same way a container can be converted back to local.

2.13. Corresponding datastage data types to ORACLE types?

Most of the datastage variable types map very well to oracle types. The biggest
problem is to correctly map the oracle NUMBER(x,y) format.

The best way to do that in Datastage is to convert the oracle NUMBER format to the
Datastage Decimal type and to fill in the Length and Scale columns accordingly -
for example, oracle NUMBER(10,2) becomes Decimal with Length 10 and Scale 2.

There are no problems with string mappings: oracle Varchar2 maps to datastage
Varchar, and oracle Char to datastage Char.

2.14. How to adjust the commit interval when loading data to the database?

In earlier versions of datastage the commit interval could be set up in:
General -> Transaction size (in version 7.x it's obsolete).

Starting from Datastage 7.x it can be set up in the properties of the ODBC or ORACLE
stage in Transaction handling -> Rows per transaction.
If set to 0, the commit will be issued at the end of a successful transaction.

2.15. What is the use of the INROWNUM and OUTROWNUM datastage variables?

@INROWNUM and @OUTROWNUM are internal datastage variables which do the following:

• @INROWNUM counts incoming rows to a transformer in a datastage job
• @OUTROWNUM counts outgoing rows from a transformer in a datastage job

These variables can be used to generate sequences, primary keys, ids, row numbering
and also for debugging and error tracing.
They play a similar role to sequences in Oracle.
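
For example, transformer output-column derivations along these lines generate a surrogate key and a formatted counter (the column names SEQ_NO and BATCH_ID are hypothetical):

* derivation for output column SEQ_NO: number the incoming rows
SEQ_NO = @INROWNUM
* derivation for output column BATCH_ID: zero-padded count of output rows
BATCH_ID = 'B' : Fmt(@OUTROWNUM, "6'0'R")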

2.16. Datastage trim function cuts out more characters than expected

By default the datastage Trim function works this way:
Trim("  a  b  c  d  ") will return "a b c d", while in many other
programming/scripting languages the result "a  b  c  d" would be expected.
That is because by default the R parameter is assumed, which removes leading and
trailing occurrences of the character and reduces multiple occurrences to a single
occurrence.

To get "a  b  c  d" as the result, use the trim function in the following way:
Trim("  a  b  c  d  ", " ", "B")
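
A small sketch of both modes in Datastage BASIC:

* default mode "R": trims both ends and collapses repeated spaces inside
s1 = Trim("  a  b  c  d  ")             ;* s1 = "a b c d"
* mode "B": trims leading and trailing spaces only
s2 = Trim("  a  b  c  d  ", " ", "B")   ;* s2 = "a  b  c  d"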

2.17. Database update actions in the ORACLE stage

The destination table can be updated using various update actions in the Oracle
stage. Be aware of the fact that it's crucial to select the key columns properly, as
they determine which columns appear in the WHERE part of the SQL statement (a sketch
of the generated statements follows the list).
Available actions:

• Clear the table then insert rows - deletes the contents of the table
(DELETE statement) and adds new rows (INSERT).
• Truncate the table then insert rows - deletes the contents of the table
(TRUNCATE statement) and adds new rows (INSERT).
• Insert rows without clearing - only adds new rows (INSERT statement).
• Delete existing rows only - deletes matched rows (issues only the DELETE
statement).
• Replace existing rows completely - deletes the existing rows (DELETE
statement), then adds new rows (INSERT).
• Update existing rows only - updates existing rows (UPDATE statement).
• Update existing rows or insert new rows - updates existing data rows
(UPDATE) or adds new rows (INSERT). An UPDATE is issued first and if it
succeeds the INSERT is omitted.
• Insert new rows or update existing rows - adds new rows (INSERT) or
updates existing rows (UPDATE). An INSERT is issued first and if it succeeds
the UPDATE is omitted.
• User-defined SQL - the data is written using a user-defined SQL statement.
• User-defined SQL file - the data is written using a user-defined SQL
statement from a file.
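
For illustration, the 'Update existing rows or insert new rows' action issues statements along these lines (the table, columns and key are hypothetical; the actual SQL is generated by the stage from the column metadata):

-- key column: CUST_ID; the UPDATE is tried first
UPDATE CUSTOMERS SET NAME = ?, CITY = ? WHERE CUST_ID = ?
-- only when no row matches, the INSERT follows
INSERT INTO CUSTOMERS (CUST_ID, NAME, CITY) VALUES (?, ?, ?)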

2.18. Use and examples of the ICONV and OCONV functions?

The ICONV and OCONV functions are quite often used to handle data in Datastage.
ICONV converts a string to an internal storage format and OCONV converts an
expression to an output format.
Syntax:
Iconv(string, conversion code)
Oconv(expression, conversion code)

Some useful Iconv and Oconv examples:

Iconv("10/14/06", "D2/") = 14167
Oconv(14167, "D-E") = "14-10-2006"
Oconv(14167, "D DMY[,A,]") = "14 OCTOBER 2006"
Oconv(12003005, "MD2$,") = "$120,030.05"

This expression formats a number and rounds it to 2 decimal places:
Oconv(L01.TURNOVER_VALUE*100, "MD2")

Iconv and Oconv can be combined in one expression to reformat a date easily:
Oconv(Iconv("10/14/06", "D2/"), "D-E") = "14-10-2006"

2.19. ERROR 81021 Calling subroutine DSR_RECORD ACTION=2

Error message:

DataStage Repository Interface:
Error calling subroutine: DSR_RECORD (Action=2);
check DataStage is set up correctly in project
Development (Internal Error (81021))

Datastage system help gives the following error description:
SYS.HELP. 081021
MESSAGE.. dsrpc: Error writing to Pipe.

The problem appears when a job sequence is used and it contains many stages
(usually more than 10), and very often when a network connection is slow.

Basically the cause of the problem is a communication failure between the DataStage
client and the server.

The solution to the issue is:

• Do not log in to Datastage Designer using the 'Omit' option on the login screen.
Type in the username and password explicitly and the job should compile
successfully.
• If the above does not help, execute the DS.REINDEX ALL command from the
Datastage shell.

2.20. How to check Datastage internal error descriptions

To check the description of an error number go to the datastage shell (from the
administrator or telnet to the server machine) and invoke the following command:

SELECT * FROM SYS.MESSAGE WHERE @ID='081021';

where in this case the number 081021 is the error number.

The command will produce a brief error description which probably will not be
helpful in resolving the issue, but it can be a good starting point for further
analysis.

2.21. Error timeout waiting for mutex

The error message usually looks as follows:

... ds_ipcgetnext() - timeout waiting for mutex

There may be several reasons for the error and thus several solutions to get rid of
it. The error usually appears when using the Link Collector, Link Partitioner and
Interprocess (IPC) stages. It may also appear when doing a lookup with the use of a
hash file, or if a job is very complex and uses many transformers.

There are a few things to consider to work around the problem:

- increase the buffer size (up to 1024K) and the Timeout value in the Job
properties (on the Performance tab).
- ensure that the key columns in active stages or hashed files are composed of
allowed characters - get rid of nulls and try to avoid language-specific chars which
may cause the problem.
- try to simplify the job as much as possible (especially if it's very complex).
Consider splitting it into two or three smaller jobs, review fetches and lookups and
try to optimize them (especially have a look at the SQL statements).

2.22. ERROR 30107 Subroutine failed to complete successfully

Error message:

Error calling subroutine:
DSR_RECORD (Action=2); or *DataStage*DSR_SELECT (Action=7);
check DataStage is set up correctly in project Development
(Subroutine failed to complete successfully(30107))

Datastage system help gives the following error description:

SYS.HELP. 930107
MESSAGE.. DataStage/SQL: Illegal placement of parameter markers

The problem appears when a project is moved from one environment to another (for
example when deploying a project from a development environment to production).

The solution to the issue is:

• Rebuild the repository index by executing the DS.REINDEX ALL command
from the Datastage shell.

2.23. Datastage Designer hangs when editing job activity properties

The problem appears when running Datastage Designer under Windows XP after
installing patches or Service Pack 2 for Windows.
After opening a job sequence and navigating to the job activity properties window,
the application freezes and the only way to close it is from the Windows Task
Manager.

The solution to the problem is very simple. Just download and install the "XP SP2
patch" for the Datastage client.
It can be found on the IBM client support site (login required):
https://www.ascential.com/eservice/public/welcome.do

Go to the software updates section and select an appropriate patch from the
Recommended DataStage patches section.
Sometimes users face problems when trying to log in (for example when the license
doesn't cover IBM Active Support); in that case it may be necessary to contact IBM
support, which can be reached at WDISupport@us.ibm.com

Oracle PL/SQL versus a commercial ETL tool

When implementing a data warehousing environment in a company, a question
needs to be asked: whether to use already owned tools and technologies or to buy a
new, commercial ETL product. Having in mind the fact that most big companies use
an Oracle database, the question becomes: 'use PL/SQL to do the processing or buy
an ETL tool?'.
Before making a decision on that, the managers need to compare the pros and cons
of buying a commercial ETL solution.

Strengths and weaknesses of using Oracle PL/SQL as an ETL tool:

• costs - PL/SQL comes along with the standard Oracle licence and if an oracle
database is installed, PL/SQL can be used straight away with no additional costs. No
additional hardware is needed. However, implementing an ETL process in PL/SQL is a
lot more time and human resources consuming. This applies both to the implementation
phase and later to production support and enhancements.
The designers need an in-depth knowledge of oracle and it may take months to
become an expert in PL/SQL. An ETL tool like Datastage or Informatica can be learned
during a few days of training, and in fact the designers don't need to know much
about programming or scripting and can do their job without low-level IT knowledge.
On the other hand, a Datastage or Informatica expert may be far more expensive and
less accessible than a PL/SQL consultant.
• workload and time - it's a factor directly related to costs. An ETL tool comes
with a whole framework to simplify the design of the process, which usually includes
an administration panel, GUI frontend, global options, documentation module,
day-to-day operation management, failover capabilities, logging and reporting
module, user management, connectors to different data sources, plugins, etc.
In PL/SQL most of those modules must be programmed manually and this may
significantly increase the implementation time.
• flexibility - an ETL tool comes along with a set of the most commonly used
components and it is rather difficult to expand its capabilities. For example, when
we need to extract data to a non-typical format (for example EDI files, EPIC files)
or process it in a non-standard way, then PL/SQL should be a lot more helpful.
• efficiency - if the company uses Oracle databases, then ETL processing using the
native database language, which is PL/SQL, will be a lot more efficient. There is no
better way to process data than well optimized queries issued on the database's
internal engine; apart from that, an ETL tool very often operates on a different
server than the database, which causes the need to transfer the data across the
network.
• integration - the commercial ETL tools provide functionality to integrate with
different systems, including connections to multiple data sources, operating systems
integration, FTP support, plugins to ERP systems, etc. PL/SQL doesn't have those
features and they have to be implemented externally.

The conclusion is that there is no easy answer to the question which approach is
better. The most important thing is to review the company's needs, calculate costs,
estimate results and then make a choice between buying an ETL tool and using PL/SQL
processing.
