
Load UNICODE characters from flat files

Oracle OBI Apps provide an OEM version of Informatica Power Center. I found that Power Center processes UNICODE strings well when both source and target are databases. If the source is a flat file, however, Informatica has difficulty loading the text correctly into the target. The limitation seems to be related to Informatica's "Source" transformation. My text files were encoded in UTF8, and I verified the codepage with other tools, yet Informatica consistently interpreted the files using the ISO8859 codepage, which is of course the codepage configured in Power Center Server.

The solution is simple if you know SQL*Loader. I loaded the UTF8 text file into a UNICODE database through SQL*Loader without much transformation, and Informatica then loaded from there into the target tables. Below are a few useful tips I picked up along the way, followed by an example based on a practical system configuration.

1. How to verify text file or csv file codepage?

You have better options on Windows. When you open a file with MS Word 2007, it lets you preview the document using different codepages, and it automatically picks the closest codepage, which in most cases is accurate. I have also used PX Binary Viewer to examine byte sequences. You can clearly tell from the byte sequence whether a file is in ASCII or Latin 1252 codepage, but it takes some effort to tell whether a file is in UTF8 or Simplified Chinese GB2312 encoding.

It is a little more challenging on UNIX. Normally, you can't view UTF8 encoded Chinese or Japanese characters in a terminal window. Some tools, such as PuTTY, can be set to accept UTF8 characters (Window -> Translation setting). Once the translation option is set to UTF8, you can "cat" or "head" the file from PuTTY.

2. How to convert text files to UTF8 codepage?

On UNIX systems, you can use the "iconv" command to convert a file between codepages. The syntax is:

iconv -f from_codepage -t to_codepage file_name > save_file_name

On Windows there is similar software, but I have not tried any. Typically I save files using Notepad by selecting the "Unicode" or "UTF-8" option. But that puts a hidden BOM (Byte Order Mark) at the beginning of the file, which messes up the first string when you try to load it.
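The checks and conversions from sections 1 and 2 can be sketched as a small shell script. This is only a minimal sketch, assuming a UNIX shell with "od", "tail", "tr" and "iconv" available. The function names (has_utf8_bom, strip_utf8_bom) and the file names are hypothetical and made up for illustration, and the codepage names accepted by "iconv" vary between systems (check "iconv -l").

```shell
#!/bin/sh
# Hypothetical helpers; function and file names are illustrative only.

# Succeed if the file starts with the UTF-8 BOM bytes EF BB BF.
has_utf8_bom() {
    [ "$(od -An -tx1 -N 3 "$1" | tr -d ' \n')" = "efbbbf" ]
}

# Write the file to stdout without a leading UTF-8 BOM, if it has one.
strip_utf8_bom() {
    if has_utf8_bom "$1"; then
        tail -c +4 "$1"      # start output at byte 4, skipping the 3 BOM bytes
    else
        cat "$1"
    fi
}

# Demo 1: a small csv file that starts with a BOM, as Notepad's "UTF-8" option saves it.
printf '\357\273\277id,name\n1,abc\n' > with_bom.csv
strip_utf8_bom with_bom.csv > no_bom.csv

# Demo 2: convert an ISO8859-1 file to UTF-8 with iconv.
printf 'caf\351\n' > latin1.txt      # octal 351 is e-acute in ISO8859-1
iconv -f ISO-8859-1 -t UTF-8 latin1.txt > utf8.txt

# Inspect the resulting bytes, the same way a binary viewer would.
od -An -tx1 no_bom.csv
od -An -tx1 utf8.txt
```

Looking at the hex dump is also a quick way to tell a single-byte Latin file from a UTF8 one: the accented character is one byte in latin1.txt and two bytes in utf8.txt.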

3. (experiment) Can we set Power Center server codepage to UTF8?

If you really want to dig into Informatica functions, you can set Informatica Server and Informatica Repository Server to operate in UTF8. That works by itself, but you will find that the Informatica clients no longer work with the repository. If you run the Informatica clients from a Linux or UNIX console, you can probably make it work. I assume that if Informatica Server and Informatica Repository Server are both set to the UTF8 codepage, they should be able to process UTF8 text files properly.

4. Key parameters in Informatica Power Center configuration to support UNICODE

Obviously, you should set the data movement mode to UNICODE, in pmserver.cfg on UNIX or in the Informatica Server configuration on Windows.

If you run PM server or repository server on UNIX, the code page of the repository and PM server for Unicode support should be ISO8859-1. On Windows, they need to be set to Latin Windows 1252. That is tricky when you often have to migrate the repository from Windows (usually the test system) to UNIX (usually the production system).

If your source or target is Oracle, you also need to set the NLS_LANG environment variable to UTF8 (for example, AMERICAN_AMERICA.UTF8).

If you are on UNIX, you also need to make sure the en_US.UTF-8 locale is installed. Run "locale -a" to list all installed locales. Running "locale" without parameters lists the locale currently in use.

A special parameter is needed if you use the Siebel OEM version of Informatica. SiebelUnicodeDB should include every database connection that needs Unicode support. Typically you put a line like this in pmserver.cfg:

SiebelUnicodeDB=chrisli@TNSNAME

Note: both the user name and the TNS name are case sensitive, so be sure you spell them right!

5. Key parameters in SQL*Loader control file to support UNICODE

In the SQL*Loader control file, make sure you have this line:

CHARACTERSET 'UTF8'

6. Oracle/Siebel OEM version Informatica limitations

The Siebel OEM version of Informatica has several limitations. The latest one I used is 7.1.4, which is similar to Informatica 7.1.3 but customized for Siebel products.

This version doesn't support partitioning, which is a big gap in many cases, especially when the data volume is high. SiebelUnicodeDB is a hidden parameter. It works well for me, but it is frequently ignored.

Sample Solution:

OBIEE Server: Version 10.1.3.3.4 on Windows 2003
OBIEE Application Version: 7.9.2
Informatica Server: 32-bit Version 7.1.4 on Solaris 10
Informatica Repository Server: 32-bit Version 7.1.4 on Solaris 10
Data Warehouse Server: 64-bit Oracle 10gR2 with AL32UTF8 Character Set
DAC Server: Solaris 10
DAC Client: Windows 2003
Informatica Clients: Windows 2003
Data Sources:
  Oracle EBS 10.5.11 with UNICODE support
  Flat files with UTF-8 encoding
  Siebel CRM 7.8.2 with UNICODE support
International Data: Western Europe, Chinese, and Japanese
Time Zone: Global time zones
Currency: Global currencies

Key settings:

Database: AL32UTF8 Character Set
Informatica Repository: ISO8859-1 encoding
Informatica Repository Server: ISO8859-1 encoding
Informatica Server: ISO8859-1 encoding
Informatica Server Data Movement Mode: UNICODE
SiebelUnicodeDB=chrisli1@tnsname1 chrisli2@tnsname2 chrisli3@tnsname3
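The staging step described at the top of this post (load the UTF8 flat file into the UNICODE database with SQL*Loader, then let Informatica read from the table) might look roughly like the script below. Treat it as a sketch under assumptions, not the exact setup I ran: the table stg_customers, its columns, the file names and the ORA_CONNECT variable are all hypothetical. Only the CHARACTERSET line, the NLS_LANG setting and the "locale -a" check come from sections 4 and 5.

```shell
#!/bin/sh
# Sketch of the SQL*Loader staging step. Table, column, file and variable
# names are hypothetical; adjust them to your own staging schema.

DATA_FILE=customers_utf8.csv
CTL_FILE=stg_customers.ctl

# Section 4: check that an en_US UTF-8 locale is installed.
if locale -a 2>/dev/null | grep -i 'en_US\.utf-\{0,1\}8' >/dev/null; then
    echo "en_US UTF-8 locale found"
else
    echo "warning: en_US UTF-8 locale not listed by 'locale -a'" >&2
fi

# Section 4: Oracle client character set. Oracle spells this character
# set name UTF8, without a hyphen.
NLS_LANG=AMERICAN_AMERICA.UTF8
export NLS_LANG

# Section 5: the control file declares the character set of the data file.
cat > "$CTL_FILE" <<EOF
LOAD DATA
CHARACTERSET 'UTF8'
INFILE '$DATA_FILE'
APPEND
INTO TABLE stg_customers
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  customer_id,
  customer_name CHAR(400)
)
EOF

# Run the load only if sqlldr is on the PATH and a connect string
# (user/password@tnsname) was supplied in the hypothetical ORA_CONNECT variable.
if command -v sqlldr >/dev/null 2>&1 && [ -n "$ORA_CONNECT" ]; then
    sqlldr userid="$ORA_CONNECT" control="$CTL_FILE" log=stg_customers.log
else
    echo "sqlldr not run; control file written to $CTL_FILE"
fi
```

Once the rows are in the staging table, the Informatica mapping uses that table as its source instead of the flat file, so the ISO8859-1 codepage of the Informatica Server never touches the raw file.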
