Changing Character Set to UTF8 for Oracle Database

To implement an upgrade of one of our applications, our team scheduled a period of
downtime to get the Oracle database (10.2.0.4) ready for it. The DBA team was required
to create a new Oracle database identical to the production database. Thanks to
the downtime of the production database, the steps to create the identical database are
quite straightforward, as listed below.

• Issue “alter database backup controlfile to trace” against the production database
before the downtime
• Extract the “create controlfile” command from the trace file under the udump folder
• Edit the command to reflect the new location of the target database, and change the line
“CREATE CONTROLFILE REUSE DATABASE "OLDDB" NORESETLOGS
NOARCHIVELOG” to “CREATE CONTROLFILE SET DATABASE
"NEWDB" RESETLOGS NOARCHIVELOG”
• Shut down the production database
• Create the init parameter file and password file, and edit listener.ora and tnsnames.ora
• Log on to the idle instance for the target database
• Issue “startup nomount”
• Issue the “create controlfile” command to create the control file
• Issue “alter database open resetlogs” to open the database
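
The edit to the CREATE CONTROLFILE statement can be scripted. The following is a minimal sketch, not the procedure we actually ran: the function name rewrite_controlfile and the two directory arguments are hypothetical, and the pattern assumes the statement header appears in the trace exactly as quoted in the list above.

```shell
#!/bin/sh
# rewrite_controlfile OLD_DB NEW_DB OLD_DIR NEW_DIR
# Reads a CREATE CONTROLFILE statement on stdin and writes the edited
# statement to stdout:
#   - REUSE DATABASE "OLD_DB" NORESETLOGS becomes SET DATABASE "NEW_DB" RESETLOGS
#   - every occurrence of OLD_DIR becomes NEW_DIR (file locations)
# Assumption: the names and directories contain no characters that are
# special to sed (such as "|" or regex metacharacters).
rewrite_controlfile() {
    rc_old_db=$1
    rc_new_db=$2
    rc_old_dir=$3
    rc_new_dir=$4
    sed -e "s/CREATE CONTROLFILE REUSE DATABASE \"$rc_old_db\" NORESETLOGS/CREATE CONTROLFILE SET DATABASE \"$rc_new_db\" RESETLOGS/" \
        -e "s|$rc_old_dir|$rc_new_dir|g"
}
```

For example, with hypothetical directories, `rewrite_controlfile OLDDB NEWDB /u01/oradata/OLDDB /u02/oradata/NEWDB < old.sql > new.sql` would produce the statement to run on the target instance. The result should still be reviewed by hand before it is executed.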

Once the target database is up, we are ready to implement the character set conversion. We
relied heavily on the following three Oracle Metalink documents.

• Changing the NLS_CHARACTERSET to AL32UTF8 / UTF8 (Unicode) [ID 260192.1]
• Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner) [ID 745809.1]
• NLS considerations in Import/Export – Frequently Asked Questions [ID 227332.1]

Here is the step-by-step procedure we followed to complete the character set conversion.

Step 1: Installing and Configuring CSSCAN

The first step in installing CSSCAN is to connect to the database as sysdba and run the script csminst.sql
(in $ORACLE_HOME/rdbms/admin).

If you get errors about the directories log_file_dir and data_file_dir not existing, please ignore them,
because the read privilege on these two directories is no longer granted.

Next, we need to make sure that CSSCAN is installed properly and ready for use. To check that,
simply issue the following OS command.

$ csscan "sys/password@dbtnsname as sysdba" FULL=Y TOCHAR=UTF8 LOG=TOUTF8FIN CAPTURE=Y ARRAY=1000000 PROCESS=2

If we get the “Scanner terminated successfully.” message, CSSCAN is ready to use for the character set
conversion.
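
When the scan is run from a script, this check can be automated. A minimal sketch, assuming the screen output of csscan has been saved to a file; the function name scan_succeeded and the choice of file are assumptions of this example, not part of CSSCAN.

```shell
#!/bin/sh
# scan_succeeded LOGFILE
# Exit status 0 if the saved csscan output contains the success message
# quoted above, non-zero otherwise. LOGFILE is whatever file the screen
# output was saved to (an assumption of this sketch).
scan_succeeded() {
    grep -q 'Scanner terminated successfully' "$1"
}
```

It can then guard the later steps, for example `scan_succeeded scan_output.log || exit 1`, where scan_output.log is a hypothetical file name.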

Step 2: Pre-checking for database

Before starting the character set conversion, we need to run the following pre-checks against the database.

• Invalid objects
• Orphaned datapump master tables (10g and up)
• Objects in the recyclebin (10g and up)
• Leftover temporary tables using CHAR semantics

Note: log on to the database as sysdba to perform the following steps.

1) Invalid objects

To find invalid objects in the database, issue the following SQL statement:

SQL> select distinct owner from dba_objects where status='INVALID';

The above SQL statement lists all schemas that contain INVALID objects. These
invalid objects need to be compiled, or dropped if they are unused. The simplest way to
compile all objects within a single schema is to use the package UTL_RECOMP as
follows, where SCHEMA is the name of the schema.

SQL> exec utl_recomp.recomp_serial('SCHEMA');

2) Orphaned datapump master tables (10g and up)

To check for them, issue the following SQL statement:

SQL> SELECT o.status, o.object_id, o.object_type,
o.owner||'.'||object_name "OWNER.OBJECT"
FROM dba_objects o, dba_datapump_jobs j
WHERE o.owner=j.owner_name AND o.object_name=j.job_name
AND j.job_name NOT LIKE 'BIN$%' ORDER BY 4,2;

If the result is “no rows selected”, proceed to the next step. Otherwise, check Note 336014.1, How To
Cleanup Orphaned DataPump Jobs In DBA_DATAPUMP_JOBS ?

3) Objects in the recyclebin (10g and up)

SQL> SELECT OWNER, ORIGINAL_NAME, OBJECT_NAME, TYPE from dba_recyclebin
order by 1,2;

If there are objects in the recyclebin, purge them:

SQL> PURGE DBA_RECYCLEBIN;

This removes the unneeded objects; otherwise CSALTER will fail with ORA-38301.

4) Leftover temporary tables using CHAR semantics

SQL> select C.owner ||'.'|| C.table_name ||'.'|| C.column_name ||' ('||
C.data_type ||' '|| C.char_length ||' CHAR)'
from all_tab_columns C
where C.char_used = 'C'
and C.table_name in (select table_name from dba_tables where temporary='Y')
and C.data_type in ('VARCHAR2', 'CHAR')
order by 1;

If the result is “no rows selected”, proceed to the next step. Otherwise, check Note 4157602.8,
DBMS_STATS "ORA_TEMP_%_DS_%" temporary tables not cleaned up.

Step 3. Check the source database for “Lossy” data

Run CSSCAN with the following syntax:

$ csscan "sys/password@dbtnsname as sysdba" FULL=Y FROMCHAR=WE8ISO8859P1 TOCHAR=WE8ISO8859P1 LOG=dbcheck CAPTURE=N ARRAY=1000000 PROCESS=2

Here, WE8ISO8859P1 is the current character set of the database. To check the current
character set of the database, issue the following SQL statement.

SQL> select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';

Running the above CSSCAN command creates three files:

1. dbcheck.out: a log of the output of csscan
2. dbcheck.txt: a Database Scan Summary Report
3. dbcheck.err: contains the rowids of the Lossy rows reported in dbcheck.txt (if any)

This checks whether all data is stored correctly in the current character set. Because the
TOCHAR and FROMCHAR character sets are the same, there cannot be any
“Convertible” or “Truncation” data reported in dbcheck.txt. If all the data in the database
is stored correctly at the moment, then only “Changeless” data is reported in
dbcheck.txt.

If there is any “Lossy” data, those rows contain code points that are not correctly
defined in the current character set, and they must be cleaned up before you can continue.
The most common case is a US7ASCII or WE8ISO8859P1 database that reports “Lossy” data;
changing such a US7ASCII/WE8ISO8859P1 source database to WE8MSWIN1252 using
Alter Database Character Set / Csalter will most likely resolve the lossy data.

To convert the character set from WE8ISO8859P1 to WE8MSWIN1252, issue:

SQL> shutdown immediate

SQL> startup restrict

SQL> alter database character set WE8MSWIN1252;

SQL> alter database open;

After the conversion to WE8MSWIN1252, repeat the CSSCAN command above to make sure
there is no “Lossy” data.

Step 4. Check for “Convertible” and “Truncation” data when going to UTF8

$ csscan "sys/password@dbtnsname as sysdba" FULL=Y TOCHAR=UTF8 LOG=TOUTF8 CAPTURE=Y ARRAY=1000000 PROCESS=2

This will create three files:

1. toutf8.out: a log of the output of csscan
2. toutf8.txt: the Database Scan Summary Report
3. toutf8.err: contains the rowids of the Convertible and Lossy rows reported in toutf8.txt

There should be NO entries under “Lossy” in toutf8.txt, because they should have been
filtered out in step 3. If there is “Lossy” data, please redo step 3.

The file toutf8.txt should contain output similar to the following.

[Scan Summary]

All character type data in the data dictionary are convertible to the new character set
All character type application data remain the same in the new character set

[Data Dictionary Conversion Summary]

Datatype               Changeless  Convertible   Truncation        Lossy
-------------------- ------------ ------------ ------------ ------------
VARCHAR2                4,225,489            0            0            0
CHAR                        1,116            0            0            0
LONG                      159,629            0            0            0
CLOB                       17,743        4,349            0            0
VARRAY                     23,462            0            0            0
-------------------- ------------ ------------ ------------ ------------
Total                   4,427,439        4,349            0            0
Total in percentage       99.902%       0.098%       0.000%       0.000%
The data dictionary can be safely migrated using the CSALTER script

[Application Data Conversion Summary]

Datatype               Changeless  Convertible   Truncation        Lossy
-------------------- ------------ ------------ ------------ ------------
VARCHAR2                5,262,627            0            0            0
CHAR                           87            0            0            0
LONG                            0            0            0            0
CLOB                          134            0            0            0
VARRAY                      1,587            0            0            0
-------------------- ------------ ------------ ------------ ------------
Total                   5,264,435            0            0            0
Total in percentage      100.000%       0.000%       0.000%       0.000%

If the report shows only Changeless and Convertible data, we can continue to the next step.
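
Reading the totals by eye is error-prone on a long report, so the check can be scripted. The sketch below assumes the summary report has the layout of the excerpt above, with a five-field “Total” row per summary whose last two fields are Truncation and Lossy; the function names scan_totals and scan_is_clean are hypothetical.

```shell
#!/bin/sh
# scan_totals REPORT
# Print "truncation lossy" for each "Total" row of a scan summary report.
# The "Total in percentage" rows have more than five fields and are skipped.
scan_totals() {
    awk '$1 == "Total" && NF == 5 { gsub(/,/, ""); print $4, $5 }' "$1"
}

# scan_is_clean REPORT
# Exit status 0 only if at least one "Total" row was found and every one
# of them reports 0 for both Truncation and Lossy.
scan_is_clean() {
    scan_totals "$1" | awk '
        { rows++; if ($1 != 0 || $2 != 0) bad = 1 }
        END { if (rows == 0 || bad) exit 1 }'
}
```

A call such as `scan_is_clean toutf8.txt && echo "only Changeless and Convertible data"` then gives a quick pass/fail answer, and the same function works on dbcheck.txt from step 3.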

Step 5. Export application objects

The toutf8.txt report above also contains a detailed list of the application objects that we need to handle
with export/import. A sample of this list looks like:

[Distribution of Convertible, Truncated and Lossy Data by Table]

USER.TABLE                          Convertible   Truncation        Lossy
---------------------------------- ------------ ------------ ------------
MDSYS.SDO_COORD_OP_PARAM_VALS               200            0            0
MDSYS.SDO_GEOR_XMLSCHEMA_TABLE                1            0            0
MDSYS.SDO_STYLES_TABLE                       78            0            0
MDSYS.SDO_XML_SCHEMAS                         4            0            0
SYS.METASTYLESHEET                           80            0            0
SYS.RULE$                                     4            0            0
SYS.SQL$TEXT                                  1            0            0
SYS.WRH$_SQLTEXT                          1,420            0            0
SYS.WRH$_SQL_PLAN                         1,339            0            0
SYS.WRI$_ADV_ACTIONS                      4,754            0            0
SYS.WRI$_ADV_OBJECTS                      2,872            0            0
SYS.WRI$_ADV_RATIONALE                    2,130            0            0
SYS.WRI$_DBU_FEATURE_METADATA                99            0            0
SYS.WRI$_DBU_FEATURE_USAGE                   10            0            0
SYS.WRI$_DBU_HWM_METADATA                    19            0            0
WEBCT.AGN_ASSIGNMENT                      4,130            0            0
WEBCT.AGN_GROUPASSIGNMENT                   221            0            0
WEBCT.AGN_SUBMISSION                     12,531            0            0
WEBCT.AGN_SUBMISSION_COMMENT                594            0            0
WEBCT.ANNOUNCEMENT                        1,776            0            0
WEBCT.ASSMT_ATTEMPT                         144            0            0
WEBCT.ASSMT_ATTEMPT_ITEM                  4,386            0            0
WEBCT.ASSMT_RESPONSE                      3,264            0            0
WEBCT.ASSMT_SETTING                          24            0            0
WEBCT.CALENDAR_ENTRY                      4,634            0            0
WEBCT.CMS_CE_LANGUAGE                         2            0            0
WEBCT.CMS_CONTENT_ENTRY                 586,378            0            0
WEBCT.CMS_LINK                              653            0            0
WEBCT.CMS_UNIQUE_NAME                       618            0            0
WEBCT.CMS_UNIQUE_NAME070809133755           218            0            0
WEBCT.CO_HEADERFOOTER                    11,479            0            0

Please note that only the data dictionary and Oracle-supplied schemas can be safely converted
to the target character set. Other application data, such as the objects in schema WEBCT, must
be converted with the export/import mechanism. The basic procedure is to export the objects
as a backup, drop them, and import them again after the character set conversion
is done.

Export these application objects and then drop them from the database. Please note: “Do
NOT use Expdp/Impdp when going to (AL32)UTF8 or an other multibyte characterset on
ALL 10g versions lower then 10.2.0.4 (including 10.1.0.5).”
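
To build the list of application tables to export, the distribution section of toutf8.txt can be filtered with a small script. This is a sketch under assumptions: the function name app_tables is hypothetical, the report is assumed to have the layout of the sample above, and the default list of Oracle-supplied schemas to skip (SYS, SYSTEM, MDSYS) is only an example that should be adjusted for your database.

```shell
#!/bin/sh
# app_tables REPORT [SKIP_SCHEMAS]
# Print OWNER.TABLE for every row of the
# "[Distribution of Convertible, Truncated and Lossy Data by Table]" section
# whose owner is not listed in SKIP_SCHEMAS (space-separated).
# The default skip list is an assumption of this sketch.
app_tables() {
    at_report=$1
    at_skip=${2:-"SYS SYSTEM MDSYS"}
    awk -v skip="$at_skip" '
        BEGIN { n = split(skip, s, " "); for (i = 1; i <= n; i++) oracle[s[i]] = 1 }
        # Enter the section at its heading, leave it at the next "[...]" heading.
        /^\[Distribution of Convertible/ { in_tab = 1; next }
        /^\[/ { in_tab = 0 }
        # Data rows look like OWNER.TABLE followed by a count; the column
        # header and wrapped continuation lines do not match.
        in_tab && $1 ~ /^[A-Z][A-Z0-9_$#]*\.[A-Z]/ && $2 ~ /^[0-9,]+$/ {
            split($1, part, ".")
            if (!(part[1] in oracle)) print $1
        }' "$at_report"
}
```

Piping the result through `paste -sd, -` gives a comma-separated list that can be pasted into the table list of the export command.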

Step 6. Perform Character Set conversion with CSALTER package

After all application objects/data have been dropped from the database, re-run the csscan command from step 4
and check whether any application objects/data are still listed in toutf8.txt. If not, the database is ready for
conversion with the CSALTER package.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup restrict
ORACLE instance started.

Total System Global Area  884998144 bytes
Fixed Size                  2044616 bytes
Variable Size             436211000 bytes
Database Buffers          440401920 bytes
Redo Buffers                6340608 bytes
Database mounted.
Database opened.
SQL> @$ORACLE_HOME/rdbms/admin/csalter.plb
0 rows created.

Function created.

Function created.

Procedure created.

This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure.
Would you like to proceed (Y/N)?Y
old 6: if (UPPER('&conf') <> 'Y') then
new 6: if (UPPER('Y') <> 'Y') then
Checking data validility…
begin converting system objects
15541 rows in table SYS.WRH$_SQL_PLAN are converted
1129 rows in table SYS.WRH$_SQLTEXT are converted
80 rows in table SYS.METASTYLESHEET are converted
421 rows in table SYS.WRI$_ADV_ACTIONS are converted
19 rows in table SYS.WRI$_DBU_HWM_METADATA are converted
87 rows in table SYS.WRI$_DBU_FEATURE_METADATA are converted
4 rows in table SYS.RULE$ are converted
978 rows in table SYS.WRI$_ADV_OBJECTS are converted
117 rows in table SYS.WRI$_DBU_FEATURE_USAGE are converted
1 row in table SYS.SCHEDULER$_EVENT_LOG is converted
354 rows in table SYS.WRI$_ADV_RATIONALE are converted

PL/SQL procedure successfully completed.

Alter the database character set…
CSALTER operation completed, please restart database

PL/SQL procedure successfully completed.

0 rows deleted.

Function dropped.

Function dropped.

Procedure dropped.

SQL> SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET';

VALUE$
--------------------------------------------------------------------------------
UTF8

Step 7. Import application object/data with preceding export backup

When you start the import job, the screen output looks like the following.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Export file created by EXPORT:V10.02.01 via conventional path
import done in AL32UTF8 character set and AL16UTF16 NCHAR character set
import server uses UTF8 character set (possible charset conversion)

It is clear from this output that the import automatically performs the character set conversion of the application objects/data.

Step 8. Compare related schemas with source database

In my experience, a successful import of the application objects/data does not mean the conversion is
complete. To be safe, it is highly recommended to compare all related schemas between the source database
and the target database to make sure that no object is missing in the target database.
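
One simple way to run this comparison, shown here as a sketch rather than the method we used, is to spool an object inventory from each database (for example one line per object in the form OWNER.OBJECT_NAME OBJECT_TYPE, taken from dba_objects) and list the lines that exist only in the source file. The function name missing_objects and the inventory file format are assumptions of this example.

```shell
#!/bin/sh
# missing_objects SOURCE_LIST TARGET_LIST
# Print every line of SOURCE_LIST that does not appear in TARGET_LIST.
# Each file holds one object per line, e.g. "WEBCT.CMS_LINK TABLE"
# (this inventory format is an assumption of the sketch).
missing_objects() {
    # The target file is read first and remembered; FILENAME is used
    # instead of NR==FNR so that an empty target file is handled correctly.
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next }
         !($0 in seen)' "$2" "$1"
}
```

An empty result from `missing_objects source_objects.txt target_objects.txt` (hypothetical file names) means every object in the source inventory was found in the target.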

In conclusion, this is the approach I used to complete the character set conversion to UTF8 on
one of our application databases. The above steps may not apply to other databases or
applications. If you find it helpful, that's great. If you have any concerns, please refer
to the related articles in Oracle Metalink.
