
Handling Special Characters in Datasage

Revision History
Version | Date | Author | Reasons for Change | Section(s) Affected

Contents
HANDLING OF SPECIAL CHARACTERS
    OBJECTIVE
    INTRODUCTION
    DATASTAGE 7.5.1 HANDLING OF SPECIAL CHARACTERS WITHOUT NLS SETTINGS
        DETAILS OF PROCEDURE TO FOLLOW
    DATASTAGE 8.1 HANDLING OF SPECIAL CHARACTERS WITH NLS SETTINGS

HANDLING OF SPECIAL CHARACTERS

OBJECTIVE

The objective of this document is to provide the steps needed to handle special characters (for example, accented and other non-ASCII characters from foreign languages) in DataStage when reading from or writing to a database. The target database must know which character set DataStage is loading; if it differs from the database character set, the database will attempt to convert the data during the operation.

NLS_LANG is the environment variable that Oracle uses for character set recognition. It should be set to the NLS_CHARACTERSET value of the database, either in the .dsenv file or in the Administrator client under the user-defined category of the relevant DataStage project.

INTRODUCTION

DataStage has built-in National Language Support (NLS). With NLS installed, DataStage can do the following:

1) Process data in a wide range of languages
2) Accept data in any character set into most DataStage fields
3) Use local formats for dates, times, and money
4) Sort data according to local rules
5) Convert data between different encodings of the same language

Using NLS, the DataStage server engine holds data in Unicode format. This is an international standard character set that contains nearly all the characters used in languages around the world. DataStage maps data to or from Unicode format as required.

DATASTAGE 7.5.1 HANDLING OF SPECIAL CHARACTERS WITHOUT NLS SETTINGS

An NLS installation is not required for loading special characters, but the NLS_CHARACTERSET value of the database and that of DataStage must be the same.

DETAILS OF PROCEDURE TO FOLLOW

1.) Check whether the NLS_LANG environment variable is already set in the .dsenv file or in the Administrator client. If it is set, you can view its value through the Director in the environment settings of each job's log.

2.) If it is already set, change it to NLS_LANG=$ENV in the Administrator client, which ensures it stays at the currently set value. Then add the parameter to your DataStage job and override the default value there.

3.) To override the variable or set a new value, find the NLS_CHARACTERSET value from the database. For example, on an Oracle database:

SELECT * FROM NLS_DATABASE_PARAMETERS;
-- the NLS_CHARACTERSET row gives the value, e.g. AL32UTF8

or

SELECT USERENV('LANGUAGE') FROM DUAL;
-- gives e.g. AMERICAN_AMERICA.AL32UTF8
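The requirement that the DataStage and database character sets match can be illustrated outside DataStage. Below is a minimal Python sketch (purely illustrative; not DataStage or Oracle code) of what goes wrong when the same bytes are interpreted under two different character sets:

```python
# Illustration only (not DataStage code): the same text encoded under
# two character sets, and what happens when the reader assumes the
# wrong one. This is why NLS_LANG must match the database character set.

name = "Müller"

# A Windows-1252 client sends 'ü' as the single byte 0xFC.
cp1252_bytes = name.encode("cp1252")        # b'M\xfcller'

# A UTF-8 consumer cannot decode that byte sequence at all.
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("0xFC is not valid UTF-8 on its own")

# The reverse mismatch is worse: UTF-8 bytes read as Windows-1252
# decode "successfully" into mojibake instead of raising an error.
utf8_bytes = name.encode("utf-8")           # b'M\xc3\xbcller'
print(utf8_bytes.decode("cp1252"))          # prints "MÃ¼ller"
```

The second failure mode (silent mojibake rather than an error) is the more dangerous one, since the load appears to succeed.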

4.) Provide the value to DataStage by setting the NLS_LANG environment variable to the Oracle character set value, e.g. NLS_LANG=AMERICAN_AMERICA.AL32UTF8. NLS_LANG is a combination of values in the format <language>_<territory>.<characterset>.

5.) Client character set: when NLS_LANG was set to AMERICAN_AMERICA.AL32UTF8, the issue faced was that other interfaces such as the UI and Documentum were unable to view the data. To check the client-side NLS_LANG from SQL*Plus on Windows, run:

SQL> @.[%NLS_LANG%].

If you get something like:

Unable to open file ".[AMERICAN_AMERICA.WE8MSWIN1252]."

then the "file name" between the brackets is the value of the registry parameter. If you instead get:

Unable to open file ".[%NLS_LANG%]."

then NLS_LANG is not set in the registry either. Note that the @.[%NLS_LANG%]. technique reports the NLS_LANG known by the SQL*Plus executable; it does not read the registry itself. However, if you run the HOST command first and NLS_LANG is not set in the environment, then you can be sure the variable is set in the registry if @.[%NLS_LANG%]. returns a valid value. All other NLS parameters can be retrieved with:
SELECT * FROM NLS_SESSION_PARAMETERS;

Note: SELECT USERENV('LANGUAGE') FROM DUAL; gives the session's <language>_<territory> but the DATABASE character set, not the client's, so the value returned is not the client's complete NLS_LANG setting.

The DS variable is set to AMERICAN_AMERICA.WE8MSWIN1252
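NLS_LANG values such as the ones above combine three components in the format <language>_<territory>.<characterset>. A small illustrative Python helper (not part of DataStage; the function name is made up for this sketch) that splits a value into its parts:

```python
# Illustrative helper (not part of DataStage): split an NLS_LANG value
# into its <language>, <territory> and <characterset> components.
def parse_nls_lang(value: str):
    lang_territory, _, charset = value.partition(".")
    language, _, territory = lang_territory.partition("_")
    return language, territory, charset

print(parse_nls_lang("AMERICAN_AMERICA.AL32UTF8"))
# ('AMERICAN', 'AMERICA', 'AL32UTF8')

print(parse_nls_lang("AMERICAN_AMERICA.WE8MSWIN1252"))
# ('AMERICAN', 'AMERICA', 'WE8MSWIN1252')
```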

DATASTAGE 8.1 HANDLING OF SPECIAL CHARACTERS WITH NLS SETTINGS

1. NLS_LANG must be consistent with the database; the procedure to obtain the value is described above. For Oracle loads, $NLS_LANG=AMERICAN_AMERICA.AL32UTF8 is used.

2. Our current understanding of how the ETL processes data when NLS is turned on, in a scenario where we read a source file, transform it, and load the data into Oracle:

a. Find out the NLS map of the source file from the provider. If they cannot provide it, we need to figure it out ourselves (refer to step b).
b. Open the file in UltraEdit (Word might also help). Convert it to hex values and compare them against a UTF-8 character table (http://www.utf8-chartable.de/). A hex value of c2 indicates a 2-byte representation.
c. Once we find the character set that the source provider is using, set the same value in NLS_MAP. Refer to point 3.
d. The assumption is that DataStage reads the data based on the NLS_MAP, converts it to Unicode, and processes it through the stages. Hence it is not necessary to override the NLS_MAP in stages after the Sequential File stage.
e. NLS_LANG is set at the PROJECT level and is used during the data load into Oracle.
f. Verify the data load using: select dump(<column name>, 1016) from <table name>
g. Verify the result against the UTF-8 character table (http://www.utf8-chartable.de/).

3. For parameterising NLS_MAP, GSOR handles it in sequences based on the region. The values used for reading the sources are:

NLS_MAP=windows-1252 for International
NLS_MAP=UTF-8 for Italy and CEE
NLS_MAP=ISO_8859-1:1987 for Austria
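The hex check described in step 2b can be scripted rather than done by hand in UltraEdit. A minimal Python sketch (illustrative; the helper name is made up) that scans raw bytes and flags UTF-8 lead bytes, including the 0xC2 mentioned above:

```python
# Sketch of the hex check from step 2b: scan raw bytes and flag UTF-8
# lead bytes. 0xC2-0xDF opens a 2-byte sequence (the 0xc2 mentioned
# above); 0xE0-0xEF opens a 3-byte sequence.
def utf8_lead_bytes(data: bytes):
    leads = []
    for b in data:
        if 0xC2 <= b <= 0xDF:
            leads.append((hex(b), "2-byte sequence"))
        elif 0xE0 <= b <= 0xEF:
            leads.append((hex(b), "3-byte sequence"))
    return leads

sample = "Größe ±5 %".encode("utf-8")
print(sample.hex(" "))          # the view UltraEdit's hex mode gives
print(utf8_lead_bytes(sample))  # ö and ß start with 0xc3, ± with 0xc2
```

In practice the bytes would come from open(path, "rb").read() on the source file; plain ASCII data produces no lead bytes at all, which is itself a useful signal.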

4. One of the issues faced was that a source provider confirmed they were sending us data in windows-1252 whereas it was actually UTF-8.

ISSUES FACED

1. The wrong NLS character set was supplied by the source system. This issue helped us come up with the steps above.

Issue description: a source system in Austria provided data in a UTF-8 encoded file containing sales names to be inserted into an Oracle table. Records were loaded from a Dataset (the DataStage internal file format) into Oracle, and it was found that some of the sales names were not loaded properly when using $NLS_LANG=AMERICAN_AMERICA.AL32UTF8 (the Oracle environment variable). Yet with the same settings and configuration, many other names containing special characters loaded without issue.

It was verified (by extracting the values to a file) that the Dataset contained the correct values, but the table did not.
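The mismatch described here (UTF-8 data declared as Windows-1252) can be reproduced in a few lines. An illustrative Python sketch, mirroring the dump(<column>, 1016) verification above, under the assumption that the loader trusted the provider's windows-1252 claim: the stored bytes come out double-encoded and no longer match the expected UTF-8 bytes.

```python
# Illustration of the double-encoding that the dump(<column>, 1016)
# check exposes. Assumption for this sketch: the loader trusted the
# provider's claim of Windows-1252 while the file was really UTF-8.
expected = "Müller".encode("utf-8")   # correct storage: ü = c3 bc

# A loader that decodes the UTF-8 bytes as Windows-1252 and re-encodes
# them stores each original byte as a separate character.
stored = expected.decode("cp1252").encode("utf-8")

print("expected:", expected.hex(" "))   # 4d c3 bc 6c 6c 65 72
print("stored:  ", stored.hex(" "))     # 4d c3 83 c2 bc 6c 6c 65 72
print("match:", stored == expected)     # False
```

Comparing the dump output against the UTF-8 character table, as in steps 2f and 2g, reveals exactly this kind of byte-level discrepancy.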
