
Changing the NLS_CHARACTERSET to AL32UTF8 / UTF8 (Unicode) [ID 260192.1]

  Modified 01-FEB-2010     Type BULLETIN     Status PUBLISHED

In this Document
  Purpose
  Scope and Application
  Changing the NLS_CHARACTERSET to AL32UTF8 / UTF8 (Unicode)
     1.A) Prerequisites:
     1.B) When changing an Oracle Applications Database:
     1.C) When to use full export / import and when to use Alter Database Character Set / Csalter?
     1.D) When using Expdp/Impdp (DataPump)
     1.E) Using Alter Database Character Set on 9i
     2) Check the source database for:
     2.a) Invalid objects.
     2.b) Orphaned Datapump master tables (10g and up)
     2.c) Unneeded sample schemas/users.
     2.d) Objects in the recyclebin (10g and up)
     2.e) Leftover Temporary tables using CHAR semantics.
     3) Check the Source database for "Lossy" (invalid code points in the current source character set).
     4) Check for "Convertible" and "Truncation" data when going to AL32UTF8
     5) Dealing with "Truncation" data.
     6.a) Dealing with "Convertible" data.
     6.b) After any "Lossy" is solved, "Truncation" data is planned to be addressed and/or "Convertible" exported / truncated / addressed run Csscan again as final check.
     7) Before using Csalter / Alter Database Character Set check the database for:
     7.a) Partitions using CHAR semantics:
     7.b) Functional indexes on CHAR semantics columns.
     7.c) SYSTIMESTAMP in the DEFAULT value clause for tables using CHAR semantics.
     7.d) Clusters using CHAR semantics.
     7.e) Unused columns using CHAR semantics
     7.f) Check that you have enough room to run Csalter or to import the "Convertible" data again afterwards.
     8) Summary of steps needed to use Alter Database Character Set / Csalter:
     8.a) For 9i and lower:
     8.b) For 10g and up:
     9) Running Csalter/Alter Database Character Set
     9.a) For 8i/9i
     9.b) For 10g and up
     10) Reload the data pump packages after a change to AL32UTF8 in 10g and up.
     11) Import the exported data again.
     11.a) When using Csalter/Alter database and there was "Truncation" data in the csscan done in point 4:
     11.b) When using Full export/import and there was "Truncation" data in the csscan done in point 4:
     11.c) When using Csalter/Alter database and there was NO "Truncation" data, only "Convertible" and "Changeless" in the csscan done in point 4:
     11.d) When using full export/import and there was NO "Truncation" data, only "Convertible" and "Changeless" in the csscan done in point 4:
     12) Check your data
  References

Applies to:

Oracle Server - Enterprise Edition - Version: 8.0.3.0 to 11.2.0.1.0


Information in this document applies to any platform.
Purpose

To provide a guide to change the NLS_CHARACTERSET to AL32UTF8 or UTF8.


This note deals only with the database (server-side) change itself. For the implications on
client and application level when going to AL32UTF8, please see Note 788156.1
AL32UTF8 / UTF8 (Unicode) Database Character Set Implications.
It's strongly recommended to read Note 788156.1 first and to make sure your application and
clients are checked and ready for the change on database level.

This note was originally written for going FROM WE8ISO8859P1, WE8ISO8859P15 or
WE8MSWIN1252 TO AL32UTF8 or UTF8. It can however be used to go from any
NLS_CHARACTERSET to AL32UTF8 / UTF8 (which also means it can be used to go from
UTF8 to AL32UTF8, or the inverse).

This "flow" can also be used to go from any single byte characterset (like US7ASCII or
WE8DEC) to any other multibyte characterset (ZHS16GBK, ZHT16MSWIN950,
ZHT16HKSCS, ZHT16HKSCS31, KO16MSWIN949, JA16SJIS ...); simply substitute
AL32UTF8 with the xx16xxxx target characterset. But in that case going to AL32UTF8 would
simply be a far better idea, see Note 333489.1 Choosing a database character set means
choosing Unicode.

The note is written using AL32UTF8. To use it to go to another characterset (for example
UTF8), simply replace "AL32UTF8" with "UTF8" in the Csscan TOCHAR parameter and, for 9i
and lower, in the ALTER DATABASE CHARACTER SET command.

Scope and Application

Any DBA changing the current NLS_CHARACTERSET to AL32UTF8 / UTF8 or another
multibyte characterset. In this note AL32UTF8 will be used, but it is equally applicable to
UTF8 or other multibyte charactersets.

The current NLS_CHARACTERSET is seen in NLS_DATABASE_PARAMETERS.

select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';

Changing the NLS_CHARACTERSET to AL32UTF8 / UTF8 (Unicode)

1) General remarks on going to AL32UTF8

1.A) Prerequisites:

In this note the Csscan tool is used. Please install it first:
Note 458122.1 Installing and configuring CSSCAN in 8i and 9i
Note 745809.1 Installing and configuring CSSCAN in 10g and 11g
For an overview of the Csscan output and what it means, please see Note 444701.1 Csscan
output explained.

1.B) When changing an Oracle Applications Database:

Please see the following note for an Oracle Applications database: Note 124721.1 Migrating
an Applications Installation to a New Character Set.
This is the only method supported for Oracle Applications databases. If you have any doubt,
log an Oracle Applications SR for assistance.
1.C) When to use full export / import and when to use Alter Database Character Set
/ Csalter?

Full exp/imp can be used at any time. To avoid data loss please do check your source
database with Csscan even when using full export / import  (= follow this note until point 6 and
then go to step 11).
Using Alter Database Character Set / Csalter has an advantage when the amount of
"Convertible" data is low compared to the amount of "Changeless" data and/or when
recreating the database will take a lot of time.

1.D) When using Expdp/Impdp (DataPump)

Do NOT use Expdp/Impdp when going to (AL32)UTF8 or another multibyte characterset on
ALL 10g versions lower than 10.2.0.4 (including 10.1.0.5). 11.1.0.6 is also affected.
It will provoke data corruption unless Patch 5874989 is applied on the Impdp side. Expdp is
not affected, hence the data in the dump file is correct. The "old" exp/imp tools are also not
affected.
This problem is fixed in the 10.2.0.4 and 11.1.0.7 patch sets.
For Windows the fix is included in:
10.1.0.5.0 Patch 20 (10.1.0.5.20P) or later, see Note 276548.1
10.2.0.3.0 Patch 11 (10.2.0.3.11P) or later, see Note 342443.1

1.E) Using Alter Database Character Set on 9i

For 9i systems please make sure you are at least on patch set 9.2.0.4, see Note 250802.1
"Changing character set takes a very long time and uses lots of rollback space".

2) Check the source database for:

2.a) Invalid objects.

select owner, object_name, object_type, status from dba_objects where status='INVALID';

If there are any invalid objects, resolve or drop them, for example by recompiling them as
sketched below.
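
A minimal sketch of recompiling them: utlrp.sql ships with the database under
$ORACLE_HOME/rdbms/admin and recompiles all invalid objects. Objects that stay INVALID
afterwards need to be fixed or dropped manually.

conn / as sysdba
-- recompile all invalid objects in the database
@?/rdbms/admin/utlrp.sql
-- re-check; anything still INVALID needs manual attention
select owner, object_name, object_type from dba_objects where status='INVALID';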

2.b) Orphaned Datapump master tables (10g and up)

SELECT o.status, o.object_id, o.object_type, 
       o.owner||'.'||object_name "OWNER.OBJECT"
  FROM dba_objects o, dba_datapump_jobs j
 WHERE o.owner=j.owner_name AND o.object_name=j.job_name
   AND j.job_name NOT LIKE 'BIN$%' ORDER BY 4,2;

Note 336014.1 How To Cleanup Orphaned DataPump Jobs In DBA_DATAPUMP_JOBS ?
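
If the select returns orphaned jobs, Note 336014.1 describes the full cleanup. A minimal
sketch, assuming SCOTT.SYS_EXPORT_SCHEMA_01 is an orphaned master table returned by
the select (a hypothetical name) and the job is confirmed not to be running:

conn / as sysdba
-- drop the orphaned DataPump master table; PURGE keeps it out of the recyclebin
DROP TABLE scott.sys_export_schema_01 PURGE;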

2.c) Unneeded sample schemas/users.

The 'HR', 'OE', 'SH', 'PM', 'IX', 'BI' and 'SCOTT' users are sample schemas. There is no point
in having these sample schemas in a production system; if they exist we suggest removing
them, as sketched below.
Note 160861.1 Oracle Created Database Users: Password, Usage and Files is useful to
identify the users in your database.
Another user that might be removed is SQLTXPLAIN from Note 215187.1.
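
A minimal sketch of removing one sample schema, assuming you have verified that nothing of
your own is stored in it:

conn / as sysdba
-- drop the sample schema together with all objects it owns
DROP USER hr CASCADE;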

2.d) Objects in the recyclebin (10g and up)


conn / as sysdba
SELECT OWNER, ORIGINAL_NAME, OBJECT_NAME, TYPE from dba_recyclebin order by 1,2;

If there are objects in the recyclebin then perform

conn / as sysdba
PURGE DBA_RECYCLEBIN;

This removes the unneeded objects; if they are left in place, CSALTER will fail with an
ORA-38301.

2.e) Leftover Temporary tables using CHAR semantics.

conn / as sysdba
select C.owner ||'.'|| C.table_name ||'.'|| C.column_name ||' ('||
C.data_type ||' '|| C.char_length ||' CHAR)'
from all_tab_columns C
where C.char_used = 'C'
and C.table_name in (select table_name from dba_tables where temporary='Y')
and C.data_type in ('VARCHAR2', 'CHAR')
order by 1;

These tables MAY (!) give the following error during Alter Database Character Set or Csalter:

ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-14450: attempt to access a transactional temp table already in use

Temporary tables should be recreated by the application when needed, so if the above select
lists tables it's a good idea to confirm the application will recreate them if needed and to drop
them now (or, if the database is still in use, to do this just before the final Csscan run in
point 6.b of this note).
If the reported tables are SYS.ORA_TEMP_X_DS_XXXX (like SYS.ORA_TEMP_1_DS_27681 or
SYS.ORA_TEMP_1_DS_27686) they are leftovers of DBMS_STATS (Note 4157602.8) and can
be dropped without problems at any time, as sketched below.
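
A minimal sketch that generates the DROP statements for such DBMS_STATS leftovers; review
the generated statements before running them:

conn / as sysdba
-- generate DROP statements for leftover DBMS_STATS temporary tables
select 'DROP TABLE ' || owner || '.' || table_name || ' PURGE;'
  from dba_tables
 where owner = 'SYS'
   and temporary = 'Y'
   and table_name like 'ORA_TEMP_%_DS_%';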

3) Check the Source database for "Lossy" (invalid code points in the current
source character set).

Run Csscan with the following syntax:

$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y FROMCHAR=<current
NLS_CHARACTERSET> TOCHAR=<current NLS_CHARACTERSET> LOG=dbcheck CAPTURE=N
ARRAY=1000000 PROCESS=2

* Always run Csscan connecting as a 'sysdba' user; do not use the "system" or "csmig" user.

* The <current NLS_CHARACTERSET> is seen in NLS_DATABASE_PARAMETERS.

select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';

* The TOCHAR=<current NLS_CHARACTERSET> is not a typo; the idea is to check the
CURRENT characterset for code points that are not defined in this NLS_CHARACTERSET
before changing the NLS_CHARACTERSET.

* The PROCESS= parameter influences the load on your system: the higher it is (6 or 8 for
example) the faster Csscan finishes, the lower it is the less impact it has on your system.
Adapt if needed.
* The Csscan SUPPRESS parameter limits the size of the .err file by limiting the amount of
information logged per table. Using SUPPRESS=1000 will log at most 1000 rows for each
table in the .err file. It does not affect the information in the .txt file, only the data logged in
the .err file. This is mainly useful for the first scan of big databases: if you have no idea how
much "Convertible" or "Lossy" data there is, it avoids the .err file growing to hundreds of MB
and it also limits the space used by the Csscan tables under the CSMIG schema.
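
For example, a first scan of a big database could use the same command with SUPPRESS
added:

$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y FROMCHAR=<current
NLS_CHARACTERSET> TOCHAR=<current NLS_CHARACTERSET> LOG=dbcheck CAPTURE=N
ARRAY=1000000 PROCESS=2 SUPPRESS=1000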

This will create 3 files:

dbcheck.out - a log of the output of csscan
dbcheck.txt - the Database Scan Summary Report
dbcheck.err - contains the rowids of the Lossy rows reported in dbcheck.txt (if any)

This run checks whether all data is stored correctly in the current character set. Because the
TOCHAR and FROMCHAR character sets are the same, there cannot be any "Convertible" or
"Truncation" data reported in dbcheck.txt.

If all the data in the database is stored correctly at the moment then there is only
"Changeless" data reported in dbcheck.txt. If this is the case please go to point 4).

If there is any "Lossy" data then those rows contain code points that are not currently defined
correctly and they need to be cleared up before you can continue. If this "Lossy" data is not
checked/corrected then it WILL BE LOST.
Please see Note 444701.1 Csscan output explained for more information about "Lossy" data.
You can also use (after reading Note 444701.1) the flow in Note 225938.1 Database
Character Set Healthcheck.

The most common situation is a US7ASCII/WE8ISO8859P1 database with "Lossy"; in this
case changing your US7ASCII/WE8ISO8859P1 SOURCE database to WE8MSWIN1252
using Alter Database Character Set / Csalter will most likely solve your lossy. The reason is
explained in Note 252352.1 Euro Symbol Turns up as Upside-Down Questionmark. The flow
to do this is found in Note 555823.1 Changing US7ASCII or WE8ISO8859P1 to
WE8MSWIN1252.
Note that using Csscan alone is not enough; you will need to check your whole environment
to deduce the real encoding of the data.

Do not blindly assume your data is WE8MSWIN1252; this does not mean ALL lossy
can be "solved" by going to WE8MSWIN1252.

It cannot be repeated enough that:

* If LOSSY data needs to be saved/corrected then you NEED to change the
NLS_CHARACTERSET FIRST to the "real" characterset of the LOSSY data in the source
database BEFORE going to AL32UTF8. If your WE8ISO8859P1 database has for example
Hebrew stored, you most likely NEED to go to IW8MSWIN1255 before going to AL32UTF8,
since WE8ISO8859P1 simply does not define Hebrew.
* In some cases it is NOT possible to "correct" the "Lossy" data; the best solution is then to
update those rows with something meaningful using for example SQL Developer.

Also do NOT use exp/imp to "correct" lossy data: for example, setting the NLS_LANG to
WE8MSWIN1252 while exporting "Lossy" data from a WE8ISO8859P1 database will NOT
solve the lossy, that data WILL be lost.

When preparing a test environment to debug this you can use two things: a) a restored
physical copy (backup) of the database, or b) an export/import of (part of) the dataset into a
database with the same NLS_CHARACTERSET as the current source database.

If you use this note to go from AL32UTF8 to UTF8 (or inverse) and you have lossy then log a
SR and ask for a review by the "Advanced Resolution Team".
This select will give all the lossy objects found in the last Csscan run:

Note that when using the csscan SUPPRESS parameter this select may give incomplete
results (not all tables).

select distinct z.owner_name || '.' || z.table_name || '(' ||
          z.column_name || ') - ' || z.column_type || 
          ' ' LossyColumns
 from csmig.csmv$errors z
where z.error_type ='DATA_LOSS'
order by LossyColumns
/
Lossy in Data Dictionary objects

When using Csalter/Alter Database Character Set:

Most "Lossy" in the Data Dictionary objects will be corrected by correcting the database as a
whole, if the only "lossy" is found in Data Dictionary objects then follow the tips for
"Convertible" Data Dictionary data .
For example one common thing seen is "Lossy" found only in SYS.SOURCE$,  most of the
time this means some package source code contain illegal codes/bad data. You can use the
selects found in Note 291858.1 "SYS.SOURCE$ marked as having Convertible or Lossy data
in Csscan output" to find what objects are affected. Note that you CANNOT "fix"
SYS.SOURCE$ itself, you need to recreate the objects who's text is stored in
SYS.SOURCE$.

Do NOT truncate or export Data Dictionary objects itself unless this is said to be
possible in  Note 258904.1 .

When using full export/import into a new AL32UTF8 database:

When using export/import to a new database, "Lossy" in Data Dictionary objects is only
relevant when it concerns "Application data". The thing to check is that there is no "lossy" in
tables like SYS.SOURCE$ (package source code), SYS.COM$ (comments on objects),
SYS.VIEW$ (view definitions), SYS.COL$ (column names) or SYS.TRIGGER$ (triggers). The
reason is simply that these Data Dictionary objects contain information about user objects or
PL/SQL code. If you have "convertible" data there, that's not an issue.
For most conversions, if there is "lossy" it will be in SYS.SOURCE$.

4) Check for "Convertible" and "Truncation" data when going to AL32UTF8

Run csscan with the following syntax:

$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y TOCHAR=AL32UTF8


LOG=TOUTF8 CAPTURE=Y ARRAY=1000000 PROCESS=2

This will create 3 files:

toutf8.out - a log of the output of csscan
toutf8.txt - the Database Scan Summary Report
toutf8.err - contains the rowids of the Convertible and Lossy rows reported in toutf8.txt

There should be NO entries under "Lossy" in toutf8.txt because they should have been dealt
with in step 3); if there is "Lossy" data then please redo step 3).

If there are:
* Entries under "Truncation", then go to step 5).
* Entries for "Convertible" and "Changeless" but no "Truncation", then go to step 6).
* NO entries under "Convertible", "Truncation" or "Lossy" and all data is reported as
"Changeless", then proceed to step 7).
5) Dealing with "Truncation" data.

As explained in Note 788156.1, characters may use more BYTES in AL32UTF8 than in the
source characterset. "Truncation" data means the row will no longer fit in the current column
definition once converted to AL32UTF8.

"Truncation" data is always also "Convertible" data, which means that whatever way you do
the change, these rows have to be exported before the character set is changed and re-
imported after the character set has changed. If you proceed with that without dealing with the
truncation issue then the import will fail on these columns because the size of the data
exceeds the maximum size of the column.

Truncation issues will always require some work, there are a number of ways to deal with
them:
A) Update these rows in the source database so that they contain less data.
B) Update the table definition in the source database before exporting so that it can store
more BYTES or by using CHAR length semantics instead of BYTE length semantics (only
possible in Oracle9i and up).
C) Pre-create/adapt the table before the import so that it can contain 'longer' data. Again you
have a choice between simply making it larger in BYTES, or switching from BYTE to CHAR
length semantics.

Typically :
* when using Csalter/Alter database the columns are changed to CHAR semantics after going
to AL32UTF8 but before importing the exported "Convertible/Truncation" data again.
* when using full export import the tables are pre-created in the new database using CHAR
semantics before importing the data.

Note that in some cases the expansion in BYTES is bigger than the maximum data length of
the datatype, and then using CHAR semantics will not help. This limit is 2000 BYTES for
CHAR and 4000 BYTES for VARCHAR2.
In that case you need to either reduce the actual data or change to a datatype (like CLOB)
that allows you to store that length.

Using CHAR semantics is further discussed in Note 144808.1 Examples and limits of BYTE
and CHAR semantics usage. That note also has a link to a script that can change all tables
from BYTE to CHAR semantics; a single column can be changed as sketched below.
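
A minimal sketch of changing a single column, using the hypothetical table SCOTT.TESTUTF8
from the example further below:

-- VARCHAR2(80 CHAR) stores up to 80 characters (max 4000 bytes) instead of 80 bytes
ALTER TABLE scott.testutf8 MODIFY (item_name VARCHAR2(80 CHAR));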

To know how much the data expands you can:

* Use this procedure:

Note that when using the Csscan SUPPRESS parameter this procedure may give incomplete
results (not all tables, or not the correct minimal needed data size).

conn / as sysdba
set serveroutput on
DECLARE
  newmaxsz NUMBER;
BEGIN
  FOR rec IN
    ( SELECT DISTINCT u.owner_name, u.table_name, u.column_name,
             u.column_type, u.owner_id, u.table_id, u.column_id,
             u.column_intid
        FROM csmv$errors u
       WHERE u.error_type = 'EXCEED_SIZE'
       ORDER BY u.owner_name, u.table_name, u.column_name )
  LOOP
    SELECT MAX(cnvsize) INTO newmaxsz
      FROM csm$errors
     WHERE usr# = rec.owner_id AND obj# = rec.table_id
       AND col# = rec.column_id AND intcol# = rec.column_intid;
    DBMS_OUTPUT.PUT_LINE(rec.owner_name ||'.'|| rec.table_name ||' ('||
        rec.column_name ||') - '|| rec.column_type ||' - '||
        newmaxsz || ' Bytes');
  END LOOP;
END;
/

This will give the minimal amount of BYTES the column needs to be to accommodate the
expansion.

* Or check the Csscan output. You can find this in the .err file as "Max Post Conversion Data
Size". For example, check in the .txt file which table has "Truncation"; let's assume there is a
row that says:

-- snip from toutf8.txt
[Distribution of Convertible, Truncated and Lossy Data by Table]

USER.TABLE            Convertible      Truncation       Lossy
--------------------- ---------------- ---------------- ----------------
...
SCOTT.TESTUTF8        69               6                0
...

Then look in the toutf8.err file for "TESTUTF8" until the "Max Post Conversion Data Size" is
bigger than the column size for that table:

-- snip from toutf8.err
User  : SCOTT
Table : TESTUTF8
Column: ITEM_NAME
Type  : VARCHAR2(80)
Number of Exceptions : 6
Max Post Conversion Data Size: 81

The maximum size after going to AL32UTF8 will be 81 bytes for this column.
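
A minimal sketch of adapting this example column before re-importing the data; the two
statements are alternatives, pick one:

-- enlarge the column in bytes to at least the reported post-conversion size
ALTER TABLE scott.testutf8 MODIFY (item_name VARCHAR2(81));
-- or switch the column to CHAR semantics (see the options in point 5)
ALTER TABLE scott.testutf8 MODIFY (item_name VARCHAR2(80 CHAR));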

Csalter/Alter Database Character Set has problems with functional indexes on, and partitions
using, CHAR-based columns; see point 7). If you have such functional indexes or partitions
you can only change those columns to CHAR semantics after the change to AL32UTF8. Any
other table columns can be changed to CHAR semantics before going to AL32UTF8 if
required.

Truncation in Data Dictionary objects is rare and will be solved by using the steps for
"Convertible" Data Dictionary data.

While it's technically only needed to take action on the "Truncation" rows reported by Csscan
it's still a good idea to consider using CHAR Semantics for every column / variable in an
AL32UTF8 database.

6.a) Dealing with "Convertible" data.

Once any "Lossy" or "Truncation" is dealt with, full exp/imp to a new AL32UTF8 database can
be used.
Take the full export and go to point 11). The rest of this note until step 11) deals only with
using Csalter/Alter Database Character Set combined with partial export/import.

When using Csalter/Alter Database Character Set all User/Application Data "Convertible"
data needs to be exported and truncated/deleted.

When using export/import (full or partial) with the "old" Exp/Imp tools, the NLS_LANG setting
is simply AMERICAN_AMERICA.<source NLS_CHARACTERSET>. Expdp/Impdp does not
use the NLS_LANG for data conversion; see Note 227332.1 NLS considerations in
Import/Export - Frequently Asked Questions.
To check for constraint definitions on the tables before exporting and truncating them, Note
1019930.6 Script: To report Table Constraints can be used. A sketch of the export/truncate
step follows below.
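
A minimal sketch of the export/truncate step for one "Convertible" table, assuming the
hypothetical table SCOTT.TESTUTF8 and a WE8MSWIN1252 source database (substitute your
own tables and your own source NLS_CHARACTERSET):

$ export NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
$ exp scott/<password>@<TNSalias> TABLES=testutf8 FILE=convertible.dmp LOG=conv_exp.log

-- after checking constraints (Note 1019930.6), empty the exported table:
TRUNCATE TABLE scott.testutf8;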

The main challenge when using Csalter/Alter Database Character Set is most of the time
"Convertible" data in Data Dictionary objects.
* For 8i/9i ALL "Convertible" data in the Data Dictionary objects needs to be addressed.

* For 10g and up you do not need to take action on "Convertible" Data Dictionary CLOB data:
convertible CLOB in Data Dictionary objects is handled by Csalter. For CHAR, VARCHAR2
and LONG data, however, you do need to take action.

Please see Note 258904.1 Solving Convertible data in Data Dictionary objects when changing
the NLS_CHARACTERSET for selects that give a better overview than the Csscan *.txt file
output of what objects need action, and for how to solve commonly seen "Convertible" data in
Data Dictionary columns.
If there are Data Dictionary columns in your Csscan output that are not listed in Note
258904.1 please log a SR if you need help.

Do NOT truncate or export Data Dictionary objects itself unless this is said to be
possible in Note 258904.1

Note 160861.1 Oracle Created Database Users: Password, Usage and Files may be useful to
identify Oracle-created users in your database. This note may also be useful:
Note 472937.1 Information On Installed Database Components and Schema's.

To remove "Convertible" out of an Intermedia / Oracle Text Index (after it has been removed
from the table) please see Note 176135.1

6.b) After any "Lossy" is solved, "Truncation" data is planned to be addressed and/or
"Convertible" data is exported / truncated / addressed, run Csscan again as a final check.

$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y TOCHAR=AL32UTF8
LOG=TOUTF8FIN CAPTURE=Y ARRAY=1000000 PROCESS=2

6.b.1) For 8i/9i the Csscan output needs to be "Changeless" for all CHAR, VARCHAR2,
CLOB and LONG data (Data Dictionary and User/Application data).

In order to use "Alter Database Character Set" you need to see this message in the
toutf8fin.txt file under [Scan Summary]:

All character type data in the data dictionary remain the same in the new
character set
All character type application data remain the same in the new character set

If so, then continue with step 7).

6.b.2) For 10g and up the Csscan output needs to be

* "Changeless" for all CHAR VARCHAR2, and LONG data (Data Dictionary and
User/Application data )
* "Changeless" for all User/Application data CLOB
* "Changeless" and/or "Convertible" for all Data Dictionary CLOB

And in order to run Csalter you need to see in the toutf8fin.txt file under [Scan Summary] this
message:
All character type application data remain the same in the new character set
and under [Data Dictionary Conversion Summary] this message:
The data dictionary can be safely migrated using the CSALTER script

If you run Csalter without these conditions met then you will see messages like "Unrecognized
convertible data found in scanner result" in the Csalter output.

Before you can run Csalter you need:

* to have the above messages in the .txt file.
* to have had that FULL=Y run completed in the 7 days prior to running Csalter; you can
only run Csalter in the 7 days following the "clean" FULL=Y scan.
* to be sure the session running Csalter is the ONLY session connected to the database,
otherwise Csalter will give the warning 'Sorry only one session is allowed to run this script'.

7) Before using Csalter / Alter Database Character Set check the database for:

7.a) Partitions using CHAR semantics:

conn / as sysdba
select C.owner, C.table_name, C.column_name, C.data_type, C.char_length
  from all_tab_columns C, all_tables T
 where C.owner = T.owner
   and C.table_name = T.table_name
   and C.char_used = 'C'
   and T.PARTITIONED = 'YES'
   and C.table_name not in (select table_name from all_external_tables)
   and C.data_type in ('VARCHAR2', 'CHAR')
 order by 1,2;

If there are any, check out Note 330964.1; such columns will give "ORA-14265: data type or
length of a table subpartitioning column may not be changed" during the change to
AL32UTF8.

7.b) Functional indexes on CHAR semantics columns.

conn / as sysdba
select OWNER, INDEX_NAME, TABLE_OWNER, TABLE_NAME, STATUS, INDEX_TYPE,
       FUNCIDX_STATUS
  from DBA_INDEXES
 where INDEX_TYPE like 'FUNCTION-BASED%'
   and TABLE_NAME in (select unique (table_name) from dba_tab_columns where
       char_used = 'C')
 order by 1,2;

If this gives rows back then the change to AL32UTF8 will fail with "ORA-30556: functional
index is defined on the column to be modified" or with "ORA-02262: ORA-904 occurs while
type-checking column default value expression". If there are functional indexes on columns
using CHAR semantics (this includes NCHAR and NVARCHAR2 columns) the indexes need
to be dropped and recreated after the change. Note that disabling them is not enough.
The DDL of all those indexes can be found using:
conn / as sysdba
SET LONG 2000000
SET PAGESIZE 0
EXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM,'STORAGE',false);
SELECT DBMS_METADATA.GET_DDL('INDEX',u.index_name,u.owner)
  FROM DBA_INDEXES u
 WHERE u.INDEX_TYPE like 'FUNCTION-BASED%'
   AND u.TABLE_NAME in
       (select unique (x.TABLE_NAME)
          from DBA_TAB_COLUMNS x where x.char_used = 'C');
EXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM,'DEFAULT');

7.c) SYSTIMESTAMP in the DEFAULT value clause for tables using CHAR
semantics.

conn / as sysdba
set serveroutput on
BEGIN
  FOR rec IN
    ( SELECT OWNER, TABLE_NAME, COLUMN_NAME, DATA_DEFAULT
        FROM dba_tab_columns WHERE CHAR_USED = 'C' )
  LOOP
    IF UPPER(rec.DATA_DEFAULT) LIKE '%TIMESTAMP%' THEN
      DBMS_OUTPUT.PUT_LINE(rec.OWNER ||'.'|| rec.TABLE_NAME ||'.'|| rec.COLUMN_NAME);
    END IF;
  END LOOP;
END;
/

Such columns will give "ORA-604 error occurred at recursive SQL level %s" together with
"ORA-1866 the datetime class is invalid" during the change to AL32UTF8.
The workaround is to temporarily change the affected tables to use a DEFAULT NULL clause, e.g.:
ALTER TABLE tab MODIFY ( col ... DEFAULT NULL NOT NULL );
After the character set change the default clause can be restored, as sketched below.
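
A minimal sketch, assuming a hypothetical table SCOTT.T with a CHAR semantics column
STAMP_COL whose default is SYSTIMESTAMP:

-- before the character set change: remove the offending default
ALTER TABLE scott.t MODIFY (stamp_col DEFAULT NULL);
-- ... run Csalter / Alter Database Character Set ...
-- after the change: restore the original default clause
ALTER TABLE scott.t MODIFY (stamp_col DEFAULT SYSTIMESTAMP);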

7.d) Clusters using CHAR semantics.

conn / as sysdba
select OWNER, OBJECT_NAME from ALL_OBJECTS where OBJECT_TYPE = 'CLUSTER'
 and OBJECT_NAME in (select unique (TABLE_NAME) from DBA_TAB_COLUMNS where
 char_used ='C') order by 1,2;

If this gives rows back then the change will fail with "ORA-01447: ALTER TABLE does not
operate on clustered columns". Those clusters need to be dropped and recreated after the
change.

7.e) Unused columns using CHAR semantics

conn / as sysdba
select OWNER, TABLE_NAME from DBA_UNUSED_COL_TABS where TABLE_NAME
 in (select unique (TABLE_NAME) from DBA_TAB_COLUMNS where
 char_used ='C') order by 1,2;

Unused columns using CHAR semantics will give an ORA-00604: error occurred at recursive
SQL level 1 with an "ORA-00904: "SYS_C00002_09031813:50:03$": invalid identifier". Note
that the "SYS_C00002_09031813:50:03$" will change for each column. These unused
columns need to be dropped.

7.f) Check that you have enough room to run Csalter or to import the "Convertible"
data again afterwards.

In 10g and up, check in toutf8.txt/toutf8fin.txt the "Expansion" column found under
[Database Size] and verify that you have at least 2 times the expansion listed for the SYSTEM
tablespace free. This is the size needed for Csalter to update the Data Dictionary CLOB data;
otherwise you will see errors like "ORA-01691: unable to extend lob segment
SYS.SYS_LOB0000058943C00039$$ by 1598 in tablespace SYSTEM" during Csalter.
In general (for any version) it's a good idea to check the "Expansion" column and verify that
there is enough free space in each listed tablespace.
The Expansion column gives an estimate of how much more space you need in that
tablespace when going to the new characterset. The Tablespace Expansion for tablespace X
is calculated as the grand total of the differences between the byte length of a string
converted to the target character set and the original byte length of this string, over all strings
scanned in tables in X. The distribution of values in blocks, PCTFREE, free extents, etc. are
not taken into account.
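
A quick way to compare the listed "Expansion" with the free space per tablespace:

conn / as sysdba
-- free space in MB per tablespace, to compare with the csscan "Expansion" column
select tablespace_name, round(sum(bytes)/1024/1024) free_mb
  from dba_free_space
 group by tablespace_name
 order by 1;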

8) Summary of steps needed to use Alter Database Character Set / Csalter:

8.a) For 9i and lower:


8.a.1) Export all the "Convertible" User/Application Data (make sure that the character
set part of the NLS_LANG is set to the current database character set during the export
session).
8.a.2) If you have "Convertible" data for the SYS objects SYS.METASTYLESHEET,
SYS.RULE$ or SYS.JOB$ then follow Note 258904.1 Convertible data in Data Dictionary:
Workarounds when changing character set for those objects.
8.a.3) Truncate the exported tables of point 8.a.1).
8.a.4) Run Csscan again with the syntax of point 6.b) to verify you only have "Changeless"
User/Application Data left.
8.a.5) If this now reports only "Changeless" data then proceed to step 9), otherwise repeat the
previous steps for the rows you have missed.
8.a.6) Adapt any columns if needed to avoid "Truncation".
8.a.7) Import the exported data again.

8.b) For 10g and up:

8.b.1) Export all the "Convertible" User/Application Data (make sure that the character set
part of the NLS_LANG is set to the current database character set during the export session).
8.b.2) Fix any "Convertible" in the SYS schema using Note 258904.1. All "9i only" fixes in
Note 258904.1 Convertible data in Data Dictionary: Workarounds when changing character
set should NOT be done in 10g and up.
8.b.3) Truncate the exported tables of point 8.b.1).
8.b.4) Run Csscan with the syntax of point 6.b) to verify you only have "Convertible" CLOB in
the Data Dictionary and all other data is "Changeless".
8.b.5) If this is now correct then proceed to step 9), otherwise repeat the previous steps for
the rows you have missed.

When using Csscan in 10g and up the toutf8.txt or toutf8fin.txt need to contain this before
doing step 9):

The data dictionary can be safely migrated using the CSALTER script
and
All character type application data remain the same in the new character set

If this is NOT seen in the toutf8.txt then Csalter will NOT work; it means something was
missed or not all steps in this note were followed.

8.b.6) Adapt any columns if needed to avoid "Truncation"


8.b.7) Import the exported data again.

9) Running Csalter/Alter Database Character Set

Please perform a backup of the database. Check the backup. Double-check the backup.

9.a) For 8i/9i

Shutdown the listener and any application that connects locally to the database.
There should be only ONE connection to the database during the WHOLE time, and that is
the sqlplus session where you do the change.

9.a.1) Make sure the PARALLEL_SERVER (8i) and CLUSTER_DATABASE parameters are
set to FALSE or are not set at all. When using RAC you will need to start the database as a
single instance with CLUSTER_DATABASE = FALSE.
conn / as sysdba
sho parameter CLUSTER_DATABASE
sho parameter PARALLEL_SERVER

9.a.2) Execute the following commands in sqlplus connected as "/ AS SYSDBA":


conn / as sysdba
SPOOL Nswitch.log
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
ALTER SYSTEM SET AQ_TM_PROCESSES=0;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET INTERNAL_USE AL32UTF8;
SHUTDOWN IMMEDIATE;
-- in 8i you need to do another startup/shutdown
STARTUP;
SHUTDOWN;

The ALTER DATABASE typically takes only a few minutes or less; the time depends on the
number of columns in the database, not on the amount of data. Without the INTERNAL_USE
clause you get an "ORA-12712: new character set must be a superset of old character set".

9.a.3) Restore the PARALLEL_SERVER (8i) and CLUSTER_DATABASE parameter if


necessary and start the database. For RAC start the other instances.

WARNING WARNING WARNING

NEVER use "INTERNAL_USE" unless you have followed the guidelines in this note STEP BY
STEP and have a good idea what you are doing.

NEVER use "INTERNAL_USE" to "fix" display problems; instead follow Note 179133.1 The
correct NLS_LANG in a Windows Environment or Note 264157.1 The correct NLS_LANG
setting in Unix Environments.

If you use the INTERNAL_USE clause on a database where there is data listed as
"Convertible" without exporting that data first, then that data will be corrupted by changing the
database character set!

9.b) For 10g and up

Csalter.plb needs to be used within 7 days after the Csscan run, otherwise you will get a 'The
CSSCAN result has expired' message.

Shutdown the listener and any application that connects locally to the database.
There should be only ONE connection to the database during the WHOLE time, and that is
the sqlplus session where the change is done. RAC systems need to be started as a single
instance.

Run in sqlplus connected as "/ AS SYSDBA":

conn / as sysdba
-- Make sure the CLUSTER_DATABASE parameter is set
-- to FALSE or is not set at all.
-- If you are using RAC you will need to start the database as a single instance
-- with CLUSTER_DATABASE = FALSE
sho parameter CLUSTER_DATABASE
-- if you are using an spfile, note the current values of
sho parameter job_queue_processes
sho parameter aq_tm_processes
-- (this is Bug 6005344, fixed in 11g)
-- then do

shutdown
startup restrict
SPOOL Nswitch.log
-- do this alter system or you might run into "ORA-22839: Direct updates on
-- SYS_NC columns are disallowed"
-- This is only needed in 11.1.0.6, fixed in 11.1.0.7, not applicable to 10.2 or lower
-- ALTER SYSTEM SET EVENTS '22838 TRACE NAME CONTEXT LEVEL 1,FOREVER';

then run Csalter.plb

@?/rdbms/admin/csalter.plb

-- Csalter will ask for confirmation - do not copy/paste all the actions at one time
-- sample Csalter output:

-- 3 rows created.
...
-- This script will update the content of the Oracle Data Dictionary.
-- Please ensure you have a full backup before initiating this procedure.
-- Would you like to proceed (Y/N)?y
-- old 6: if (UPPER('&conf') <> 'Y') then
-- New 6: if (UPPER('y') <> 'Y') then
-- Checking data validility...
-- begin converting system objects

-- PL/SQL procedure successfully completed.

-- Alter the database character set...
-- CSALTER operation completed, please restart database

-- PL/SQL procedure successfully completed.
...
-- Procedure dropped.

-- if you are using spfile then you need to also

-- ALTER SYSTEM SET job_queue_processes=<original value> SCOPE=BOTH;
-- ALTER SYSTEM SET aq_tm_processes=<original value> SCOPE=BOTH;

shutdown
startup

and the database will be AL32UTF8.

Note: in 10.1 Csalter asks for "Enter value for 1: ".

-- Would you like to proceed ?(Y/N)?Y
-- old 5: if (UPPER('&conf') <> 'Y') then
-- new 5: if (UPPER('Y') <> 'Y') then
-- Enter value for 1:

-> simply hit enter.

10) Reload the data pump packages after a change to AL32UTF8 in 10g and up.

For 10g or up the datapump packages need to be reloaded after a conversion to AL32UTF8.
In order to do this run the following scripts from $ORACLE_HOME/rdbms/admin in sqlplus
connected as "/ AS SYSDBA":

For 10.2.X and higher:

catnodp.sql
catdph.sql
catdpb.sql

For 10.1.X:
catnodp.sql
catdp.sql
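
For example, on 10.2 the reload could look like this, in one sqlplus session:

conn / as sysdba
@?/rdbms/admin/catnodp.sql
@?/rdbms/admin/catdph.sql
@?/rdbms/admin/catdpb.sql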

In some cases exp (the original export tool) fails in 10g after changing to AL32UTF8; please
see Note 339938.1 Full Export From 10.2.0.1 Aborts With EXP-56 ORA-932 (Inconsistent
Datatypes) EXP-0.

11) Import the exported data again.

Note that if the Csscan done in point 4) reported ONLY "Changeless" and NO "Convertible"
data (this is not often seen) then there is no data to import when using Csalter/Alter database.

11.a) When using Csalter/Alter database and there was "Truncation" data in the
csscan done in point 4:

"Truncation" data is always ALSO "Convertible"; it is "Convertible" data that needs action
before you can import it again. "Truncation" is typically handled by pre-creating the tables
using CHAR semantics or an enlarged column size in bytes, after changing the database to
AL32UTF8 using Csalter/Alter database and before starting the import.

Note that simply setting NLS_LENGTH_SEMANTICS=CHAR in the init.ora will NOT work to
go to CHAR semantics.

Once the measures for solving the "Truncation" are in place you can then import the
"Truncation/Convertible" data.

Set the parameter BLANK_TRIMMING=TRUE to avoid the problem documented in Note
779526.1 CSSCAN does not detect data truncation for CHAR datatype - ORA-12899 when
importing.
Use the IGNORE=Y parameter for imp or the TABLE_EXISTS_ACTION=TRUNCATE option
for Impdp to import the data into the pre-created tables, as sketched below.
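
A minimal sketch, continuing the hypothetical SCOTT.TESTUTF8 example from point 5 and the
convertible.dmp export sketched in point 6.a:

-- pre-create the table with CHAR semantics in the -now AL32UTF8- database
CREATE TABLE scott.testutf8
( item_id   NUMBER,
  item_name VARCHAR2(80 CHAR) );

-- then import into the pre-created table
$ imp scott/<password>@<TNSalias> TABLES=testutf8 FILE=convertible.dmp IGNORE=Y LOG=conv_imp.log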

Import the exported data back into the -now AL32UTF8- database. When using the "old"
Exp/Imp tools, the NLS_LANG setting is simply AMERICAN_AMERICA.<source
NLS_CHARACTERSET> OR AMERICAN_AMERICA.AL32UTF8; both are correct.
Expdp/Impdp does not use the NLS_LANG for data conversion.
Once the data is imported go to step 12.

11.b) When using Full export/import and there was "Truncation" data in the csscan
done in point 4:

"Truncation" data is always ALSO "Convertible"; it is "Convertible" data that needs action
before you can import it again. "Truncation" is typically handled by pre-creating the tables
using CHAR semantics or an enlarged column size in bytes, after creating the new
AL32UTF8 database and before starting the import.

Note that simply setting NLS_LENGTH_SEMANTICS=CHAR in the init.ora will NOT work to
go to CHAR semantics.

Once the measures for solving the "Truncation" are in place you can then import the
"Truncation/Convertible" data.

Set the parameter BLANK_TRIMMING=TRUE to avoid the problem documented in Note
779526.1 CSSCAN does not detect data truncation for CHAR datatype - ORA-12899 when
importing.
Use the IGNORE=Y parameter for imp or the TABLE_EXISTS_ACTION=TRUNCATE option
for Impdp to import the data into the pre-created tables.
Import the exported data back into the new AL32UTF8 database. When using the "old"
Exp/Imp tools, the NLS_LANG setting is simply AMERICAN_AMERICA.<source
NLS_CHARACTERSET> OR AMERICAN_AMERICA.AL32UTF8; both are correct.
Expdp/Impdp does not use the NLS_LANG for data conversion.
Once the data is imported go to step 12.

11.c) When using Csalter/Alter database and there was NO "Truncation" data, only
"Convertible" and "Changeless" in the csscan done in point 4:

Set the parameter BLANK_TRIMMING=TRUE to avoid the problem documented in Note
779526.1 CSSCAN does not detect data truncation for CHAR datatype - ORA-12899 when
importing.

Import the exported data back into the -now AL32UTF8- database. When using the "old"
Exp/Imp tools, the NLS_LANG setting is simply AMERICAN_AMERICA.<source
NLS_CHARACTERSET> OR AMERICAN_AMERICA.AL32UTF8; both are correct.
Expdp/Impdp does not use the NLS_LANG for data conversion.

Once the data is imported go to step 12.

11.d) When using full export/import and there was NO "Truncation" data, only
"Convertible" and "Changeless" in the csscan done in point 4:

Create a new AL32UTF8 database and set the parameter BLANK_TRIMMING=TRUE to avoid
the problem documented in Note 779526.1 CSSCAN does not detect data truncation for
CHAR datatype - ORA-12899 when importing.

Import the exported data into the new AL32UTF8 database. When using the "old" Exp/Imp
tools, the NLS_LANG setting is simply AMERICAN_AMERICA.<source
NLS_CHARACTERSET> OR AMERICAN_AMERICA.AL32UTF8; both are correct.
Expdp/Impdp does not use the NLS_LANG for data conversion.

Once the data is imported go to step 12.

12) Check your data

Use a correctly configured client or Oracle SQL Developer / iSQL*Plus and verify your data;
see Note 788156.1 AL32UTF8 / UTF8 (Unicode) Database Character Set Implications.
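
A quick sketch of such a check from sqlplus: DUMP with format 1016 shows the characterset
name and the bytes actually stored, which makes it easy to verify that multibyte data is stored
correctly (SCOTT.TESTUTF8 is the hypothetical example table used earlier in this note):

select item_name, dump(item_name, 1016) "STORED BYTES"
  from scott.testutf8
 where rownum <= 10;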

For RAC restore the CLUSTER_DATABASE parameter, remove the BLANK_TRIMMING=TRUE
parameter if needed and restart the instance(s).
The CSMIG user can also be dropped.
If you did not use CHAR semantics for all CHAR and VARCHAR2 columns and PL/SQL
variables, it might be worth considering this now.

References

BUG:5172797 - DATA PUMP THROWING ORA-600 [KWQBESPAYL:PICKLE] WHEN
CONVERTING TO UTF8
NOTE:179133.1 - The correct NLS_LANG in a Windows Environment
NOTE:225912.1 - Changing the Database Character Set ( NLS_CHARACTERSET )
NOTE:264157.1 - The correct NLS_LANG setting in Unix Environments
NOTE:444701.1 - Csscan output explained
NOTE:458122.1 - Installing and Configuring Csscan in 8i and 9i (Database Character Set
Scanner)
NOTE:745809.1 - Installing and configuring Csscan in 10g and 11g (Database Character Set
Scanner)
NOTE:788156.1 - AL32UTF8 / UTF8 (Unicode) Database Character Set Implications

Related

Products

 Oracle Database Products > Oracle Database > Oracle Database > Oracle Server -
Enterprise Edition

Keywords

AL32UTF8; CHARACTERSET; CSSCAN; MULTIBYTE; NLS_CHARACTERSET; UNICODE

Errors

XP-56; XP-0; EXP-0; EXP-56; ORA-30556; ORA-932; ORA-904; ORA-14265; ORA-38301;
ORA-12712; ORA-1866; ORA-22839; ORA-2262; ORA-12899; ORA-1691; ORA-1447;
ORA-604; ORA-12710; ORA-14450; 604 ERROR
