red as-is with no translation, but character data always undergoes translation
when the client and server character sets are not identical.
Character set translation may be more significant than it appears. Even if all
of your computers use ASCII, there are different implementations of ASCII. Most
Unix servers use seven-bit ASCII, while web applications use an ISO code page
and the Macintosh has an eight-bit character set all its own. In general, you
should never store binary data in a VARCHAR2 or LONG data type, because
character set translation could corrupt it in a heterogeneous computing
environment.
Another important difference between character and binary data is that each
offers a different set of built-in operations. Character data can be converted
to upper or lower case, for example, while binary data (starting in Oracle 7.3,
anyway) can be bitwise ANDed and ORed.
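As a minimal sketch of that difference, the statements below apply case
conversion to character literals and bitwise operations to RAW literals. The
literal values are illustrative only, and the second query assumes the UTL_RAW
package (Oracle 7.3 or later) is installed.

```sql
-- Character data: built-in case conversion functions.
SELECT UPPER ('oracle')            AS upper_text,
       LOWER ('ORACLE')            AS lower_text,
       INITCAP ('oracle database') AS initcap_text
FROM   dual;

-- Binary data: bitwise AND and OR of two RAW values of equal length.
SELECT UTL_RAW.BIT_AND (HEXTORAW ('FF00'), HEXTORAW ('0FF0')) AS and_result,
       UTL_RAW.BIT_OR  (HEXTORAW ('FF00'), HEXTORAW ('0FF0')) AS or_result
FROM   dual;
```

The character functions have no meaning for RAW values, and the bitwise
functions have no meaning for VARCHAR2 values, which is one more reason to pick
the data type that matches the data.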
Limitations Imposed on Some Data Types
Because LONG and LONG RAW data types are unconstrained when stored in the
database, there are limitations on what Oracle can do with them. Traditional
character manipulation functions like upper() and initcap() cannot be used with
LONG data. Likewise, binary manipulation functions like rawtohex() and
utl_raw.bit_and() cannot be used with LONG RAW data.
LONG and LONG RAW data also cannot be used as criteria in a WHERE clause, and
certain SQL functionality like INSERT INTO...SELECT is not supported when LONG
or LONG RAW data types are involved. Further, a table may have at most one
column with the LONG or LONG RAW data type, and Oracle's Replication Option
will not replicate LONG or LONG RAW data.
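The restrictions above can be sketched with a few statements that Oracle is
expected to reject. They use the html_page_texts table (page_id NUMBER,
html_text LONG) that this article defines for Figure 1; html_page_texts_copy
and the alt_text column are hypothetical names introduced only for
illustration. The exact error reported may vary by release; ORA-00997 (illegal
use of LONG datatype) is a typical one.

```sql
-- A LONG column cannot be used as a search criterion: rejected.
SELECT page_id
FROM   html_page_texts
WHERE  html_text LIKE '%Oracle%';

-- Character functions cannot be applied to a LONG column: rejected.
SELECT UPPER (html_text)
FROM   html_page_texts;

-- INSERT INTO ... SELECT cannot copy a LONG column: rejected.
INSERT INTO html_page_texts_copy (page_id, html_text)
SELECT page_id, html_text
FROM   html_page_texts;

-- A table may not have a second LONG column: rejected.
ALTER TABLE html_page_texts ADD (alt_text LONG);
```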
Prior to Oracle 7.3, the OCI does not support piecewise variable binding for
LONG and LONG RAW data. This means that if an application wants to insert a
large image into the database, for example, it must first store the entire
image in one contiguous block of memory before issuing the INSERT statement to
Oracle. (Beginning in Oracle 7.3, an application can submit the data in
separate pieces that will be concatenated when inserted into the database.)
Additional Data Types in Oracle 8
Oracle 8 offers a whole new set of data types that will prove useful for storing
unstructured data in the database:
* CLOB: large character data from a single-byte character set
* NCLOB: large character data, possibly multi-byte
* BLOB: large binary data
* BFILE: large binary data, with the data itself stored entirely outside the
  database
CLOBs and NCLOBs are similar to LONGs in Oracle 7 (and BLOBs are similar to
LONG RAWs), but there are some key differences. The new data types,
collectively called LOBs, are stored much more efficiently than LONGs and LONG
RAWs. Also, a new PL/SQL package called DBMS_LOB makes LOBs much easier to work
with in PL/SQL, and new OCI calls make LOBs easier for external applications to
work with. Oracle 8 supports piecewise random access of LOBs, allowing
applications to access just the part of a LOB they are interested in instead of
having to fetch an entire LOB.
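Piecewise access through DBMS_LOB might look like the following sketch. The
html_page_clobs table, the page_id value 1, and the search string are
assumptions made for illustration; DBMS_LOB.INSTR, DBMS_LOB.SUBSTR, and
DBMS_LOB.GETLENGTH are the package functions being demonstrated.

```sql
-- Hypothetical Oracle 8 table holding page text in a CLOB.
CREATE TABLE html_page_clobs
(
  page_id    NUMBER PRIMARY KEY,
  html_text  CLOB
);

DECLARE
  page_text  CLOB;
  match_pos  INTEGER;
  piece      VARCHAR2(200);
BEGIN
  -- Fetch only the LOB locator, not the LOB data itself.
  SELECT html_text
  INTO   page_text
  FROM   html_page_clobs
  WHERE  page_id = 1;

  -- Search the LOB in place: first occurrence, starting at position 1.
  match_pos := DBMS_LOB.INSTR (page_text, 'Oracle', 1, 1);

  IF match_pos > 0 THEN
    -- Read just 200 characters beginning at the match.
    piece := DBMS_LOB.SUBSTR (page_text, 200, match_pos);
    DBMS_OUTPUT.PUT_LINE ('Found at ' || match_pos || ': ' || piece);
  ELSE
    DBMS_OUTPUT.PUT_LINE ('Not found; LOB length is ' ||
                          DBMS_LOB.GETLENGTH (page_text));
  END IF;
END;
/
```

Note that DBMS_LOB.SUBSTR takes the amount before the offset, the reverse of
the built-in SUBSTR function.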
The BFILE data type allows you to store just a pointer to the data in the
database, with the data itself stored in a file outside the database. In this
case, the database will not be able to ensure the integrity or availability of
the binary data.
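A BFILE column might be set up as in the sketch below. The directory alias, the
operating system path, the table, and the file name are all hypothetical, and
creating a directory alias requires the CREATE ANY DIRECTORY privilege.

```sql
-- Directory alias pointing at an operating system directory (assumed path).
CREATE DIRECTORY photo_dir AS '/u01/app/photos';

CREATE TABLE employee_photo_files
(
  employee_id  NUMBER(9) PRIMARY KEY,
  photo_file   BFILE
);

-- BFILENAME builds a locator from the directory alias and a file name.
-- Oracle does not check at insert time that the file actually exists.
INSERT INTO employee_photo_files (employee_id, photo_file)
VALUES (1001, BFILENAME ('PHOTO_DIR', 'emp1001.jpg'));
```

Because only the locator is stored, deleting or replacing the file at the
operating system level goes unnoticed by the database until the next attempt to
read it.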
Should You Store Unstructured Data in Oracle?
Most applications have a need for at least some unstructured data. The question
of where to store this data often comes up during application design. You could
use some of the data types described in the previous section to store the
unstructured data entirely in the database, or you could store the unstructured
data in operating system files outside the database and store just the names of
the files in the database.
Suppose you are building a search application for a web site where the user can
get a list of web pages that contain a certain word or phrase. You could store
the actual documents to be searched inside the database in a table created as
follows:
7, unstructured data is stored in the same blocks as the structured data. Each
row of an employee table that includes employee photographs, for example, might
require 50 bytes to store structured data (such as employee name and
department) and 2000 bytes to store the image. To retrieve the names of 100
employees would require accessing 205K of data (100 rows of 2,050 bytes each)
because the employee names are sandwiched in between the images. If the images
were stored elsewhere, this query would only require accessing 5K of data (100
rows of 50 bytes each).
Also, Oracle's robust transaction control and read consistency model come at a
cost. Each Oracle block has an area set aside for transaction control, and this
means that you can't fit 8K of data in an 8K Oracle block. So unstructured data
takes up more storage space when stored in Oracle instead of operating system
files. Disks are cheap these days, but also think about the extra memory you'd
want to devote to the buffer cache, and the extra I/Os that will be necessary
to access the data.
Oracle's robust recoverability comes at a high cost, too. When data is inserted
into a table in Oracle, the data is written to a rollback segment and the redo
log in addition to being written into the table itself. This causes a lot of
extra disk I/O when loading data into an Oracle database: up to three times as
much as simply writing the data to operating system files.
Oracle's backup methods can also be extremely inefficient when large volumes of
unstructured data that changes now and then are stored in the database.
Complete database recoveries can only be made from backups taken at the
tablespace level, meaning that you can't back up only the individual data that
has changed since the last backup. You can mark an entire tablespace as
read-only and then you don't need to back it up more than once, but as soon as
you want to update one record of data in that tablespace, you will need to back
up the entire tablespace again. This means that if you store large volumes of
unstructured data in your database and some of this data changes from time to
time, then backing up this data will be much more time consuming than if the
data had been stored in operating system files outside the database. This is
because most operating systems allow you to back up only files that have
changed since the last backup, while Oracle takes an all-or-nothing approach at
the tablespace level.
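The read-only approach can be sketched as follows; photo_data is a hypothetical
tablespace name used only for illustration.

```sql
-- Freeze the tablespace, then back up its datafiles once.
ALTER TABLESPACE photo_data READ ONLY;

-- Before even one row can be changed, the tablespace must be made writable
-- again. After the change, the whole tablespace needs a fresh backup.
ALTER TABLESPACE photo_data READ WRITE;
```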
Yet another concern about storing unstructured data in the database is that it
can lead to database instability. As we discussed earlier, long pieces of
unstructured data can lead to many chained rows in the database. It's an
unfortunate fact of life that the Oracle database can become unstable in
situations where huge numbers of chained rows exist. Oracle has definitely
fixed a lot of bugs in this area over the years, but my clients have
consistently run into stability problems when storing large volumes of
frequently changing large unstructured data in Oracle. As recently as Oracle
release 7.3.2.1.0 running on Sun Solaris 2.5.1, I've had to help clients
rebuild tables and recreate data lost to ORA-00600 errors and interesting
corruptions such as an 1186 byte value appearing in a VARCHAR2(1) column.
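To see how many chained rows a table actually has, something like the following
sketch can be used. It refers to the employee_photos sample table defined in
the next section, and the second ANALYZE statement assumes the CHAINED_ROWS
table has already been created with the utlchain.sql script.

```sql
-- Gather statistics, then read the chained (and migrated) row count.
ANALYZE TABLE employee_photos COMPUTE STATISTICS;

SELECT table_name, num_rows, chain_cnt
FROM   user_tables
WHERE  table_name = 'EMPLOYEE_PHOTOS';

-- Alternatively, record the ROWID of every chained row for inspection.
ANALYZE TABLE employee_photos LIST CHAINED ROWS INTO chained_rows;
```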
Manipulating Unstructured Data in Oracle
First, a schema design tip: If you choose to store unstructured data in Oracle
using the LONG or LONG RAW data types, store the unstructured data in a
separate table used just for that purpose. The table should hold only the
unstructured data itself and a foreign key to link the data to structured data.
This foreign key should also be the primary key for the table. Use Oracle's
declarative integrity constraints to ensure integrity between the structured
and unstructured data. This strategy can improve performance when accessing the
structured data (by reducing chaining and interleaving with unstructured data)
and will confine any potential database instability to the unstructured data
only. Sample employee tables might be created as follows:
CREATE TABLE employees
(
  employee_id  NUMBER(9)     PRIMARY KEY,
  first_name   VARCHAR2(15)  NOT NULL,
  last_name    VARCHAR2(15)  NOT NULL,
  hire_date    DATE          NOT NULL,
  dept_id      NUMBER(9)     NOT NULL
);
CREATE TABLE employee_photos
(
  employee_id  NUMBER(9)     PRIMARY KEY,
  photo_image  LONG RAW      NOT NULL,
  CONSTRAINT employee_photos_employee
    FOREIGN KEY (employee_id)
    REFERENCES employees ON DELETE CASCADE
);
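The effect of this split can be sketched with a few statements against the two
tables above. The dept_id and employee_id values are made up for illustration.

```sql
-- Listing names reads only the compact employees table; no photo blocks
-- are visited.
SELECT first_name, last_name
FROM   employees
WHERE  dept_id = 10;

-- Fetching one photo is a single-row lookup on the shared primary key.
SELECT photo_image
FROM   employee_photos
WHERE  employee_id = 1001;

-- Deleting the employee also removes the photo, because the foreign key
-- was declared with ON DELETE CASCADE.
DELETE FROM employees
WHERE  employee_id = 1001;
```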
Following are some tips you can use when manipulating unstructured data. These
methods will help you get around the limitations Oracle imposes on the RAW,
LONG, and LONG RAW data types.
Using SQL and PL/SQL
VARCHAR2 and RAW variables in PL/SQL blocks may be declared with lengths up to
32,767 bytes. These variables may be manipulated within PL/SQL statements with
all of the regular built-in character and binary functions, even though they
can be longer than the usual column limits of 2000 and 255 bytes respectively.
This allows you to process longer data, as long as it is still shorter than
32K. Figure 1 shows a simple PL/SQL function that can search for text in LONGs
up to 32,767 bytes in length.
CREATE TABLE html_page_texts
(
  page_id    NUMBER PRIMARY KEY,
  html_text  LONG
);
CREATE FUNCTION my_instr
(
  search_page IN NUMBER,
  search_text IN VARCHAR2,
  start_pos   IN NUMBER DEFAULT 1
) RETURN NUMBER
AS
  data VARCHAR2(32767);
BEGIN
  -- Copy the LONG column into a PL/SQL variable (32,767 bytes at most).
  SELECT html_text
  INTO   data
  FROM   html_page_texts
  WHERE  page_id = search_page;
  -- The regular INSTR function works on the PL/SQL variable.
  RETURN INSTR (data, search_text, start_pos);
END my_instr;
Figure 1: A PL/SQL function that can search for text in a LONG column, provided
the LONG data never exceeds 32,767 bytes in length.
The my_instr() function shown in Figure 1 is simplistic and limited in
usefulness, but offers an example of the possibilities. Of course, this
technique only works when your LONG or LONG RAW data won't exceed 32K in
length.
Starting in Oracle release 7.2, there is a way to fetch LONG data of arbitrary
length in PL/SQL by fetching and processing the data in parts that are each of
a manageable size. Figure 2 shows an enhanced version of the my_instr()
function, now supporting LONG data of virtually any length.
CREATE FUNCTION my_instr2
(
  search_page IN NUMBER,
  search_text IN VARCHAR2,
  start_pos   IN NUMBER DEFAULT 1
) RETURN NUMBER
AS
  c     NUMBER := dbms_sql.open_cursor;
  i     NUMBER;
  pos   NUMBER := start_pos;   -- 1-based position of the current piece
  len   NUMBER;                -- number of bytes actually fetched
  data  VARCHAR2(32767);
BEGIN
  dbms_sql.parse (c, 'SELECT html_text FROM html_page_texts WHERE page_id = :p',
                  dbms_sql.native);
  dbms_sql.bind_variable (c, 'p', search_page);
  dbms_sql.define_column_long (c, 1);
  i := dbms_sql.execute_and_fetch (c);
  IF i = 1 THEN
    LOOP
      -- Fetch the next piece of up to 32,767 bytes. The offset argument
      -- of column_value_long counts from zero, hence pos - 1.
      dbms_sql.column_value_long (c, 1, 32767, pos - 1, data, len);
      i := INSTR (data, search_text);
      IF i > 0 THEN
        dbms_sql.close_cursor (c);
        RETURN i + pos - 1;
      END IF;
      EXIT WHEN len < 32767;
      -- Overlap consecutive pieces so a match spanning a boundary is found.
      pos := pos + 32767 - LENGTH (search_text);
    END LOOP;
  END IF;
  dbms_sql.close_cursor (c);
  RETURN NULL;
END my_instr2;
Figure 2: A PL/SQL function that can search for text in LONG data of virtually
any length.
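A call to my_instr2() from SQL*Plus might look like the sketch below. The
page_id value 1 and the search string are assumptions for illustration only.

```sql
SET SERVEROUTPUT ON

DECLARE
  match_pos NUMBER;
BEGIN
  -- Search page 1 for the text, starting at the default position 1.
  match_pos := my_instr2 (1, 'Oracle');
  IF match_pos IS NULL THEN
    DBMS_OUTPUT.PUT_LINE ('No match, or no such page');
  ELSE
    DBMS_OUTPUT.PUT_LINE ('First match at position ' || match_pos);
  END IF;
END;
/
```

Unlike the built-in INSTR, which returns 0 when nothing is found, my_instr2()
as written returns NULL, so callers should test with IS NULL rather than
comparing against zero.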
Prior to Oracle 7.3, there is not a whole lot you can do to manipulate binary
data. You can use the hextoraw() and rawtohex() functions to convert between
binary data and hexadecimal character strings, but that's about it. Starting in
Oracle 7.3, though, you get the UTL_RAW built-in package, which allows you to
bitwise AND and OR, concatenate, pattern match, and complement binary data,
among other things. (UTL_RAW might not be covered in your documentation and it
may not be installed by the Oracle Installer, but look for the file utlraw.sql
in your rdbms/admin directory; the package specification contains enough
comments to get you going.)
Remember that beginning in Oracle release 7.1 you can embed PL/SQL functions in
ordinary SQL statements. This allows you to use the UTL_RAW functions in
ordinary SQL. Here's a simple example:
CREATE TABLE application_users
(
  user_id        NUMBER PRIMARY KEY,
  user_name      VARCHAR2(30),
  session_state  RAW(10)
);
SELECT user_id, user_name
FROM   application_users
WHERE  UTL_RAW.COMPARE (session_state, hextoraw ('FF0088A3B5')) IN (4, 5);
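UTL_RAW.COMPARE returns 0 when the two values are identical and otherwise the
position of the first byte that differs, so the query above selects users whose
session_state first departs from the reference value at byte 4 or 5. Two
further variations on the same table are sketched here; the reference value is
the one from the example above.

```sql
-- Users whose session_state matches the reference value. When the lengths
-- differ, COMPARE pads the shorter value with zero bytes by default.
SELECT user_id, user_name
FROM   application_users
WHERE  UTL_RAW.COMPARE (session_state, HEXTORAW ('FF0088A3B5')) = 0;

-- Length in bytes of each user's session_state.
SELECT user_id, UTL_RAW.LENGTH (session_state) AS state_bytes
FROM   application_users;
```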
The ability to embed PL/SQL functions into straight SQL statements can be
combined with the ability to better manipulate LONG and LONG RAW data in PL/SQL
to allow accessing LONG data in the WHERE clause of a SQL statement. Using the
my_instr() function shown in Figure 1, here is a SQL statement that searches
LONG data in the WHERE clause:
SELECT page_id
FROM   html_page_texts
WHERE  my_instr (page_id, '<TITLE>Database Specialists, Inc.</title> <!-- Graphics and Design by Write Design --> ') > 0;
User beware: Referencing a PL/SQL function in a SQL statement is not as efficien
SELECT *
FROM   sys.dba_views
WHERE  owner = 'SYS'
AND    view_name = 'USER_DB_LINKS';

OWNER                          VIEW_NAME                      TEXT_LENGTH
------------------------------ ------------------------------ ----------
TEXT
------------------------------------------------------------------------------
SYS                            USER_DB_LINKS                         107
select l.name, l.userid, l.password, l.host, l.ctime
from sys.link$ l
where l.ow

SQL> SET LONG 5000
SQL> LIST
  1  SELECT *
  2  FROM   sys.dba_views
  3  WHERE  owner = 'SYS'
  4* AND    view_name = 'USER_DB_LINKS'
SQL> /
buffer overflow. Use SET command to reduce ARRAYSIZE or increase MAXDATA.
SQL> SET ARRAYSIZE 1
SQL> LIST
  1  SELECT *
  2  FROM   sys.dba_views
  3  WHERE  owner = 'SYS'
  4* AND    view_name = 'USER_DB_LINKS'
SQL> /

OWNER                          VIEW_NAME                      TEXT_LENGTH
------------------------------ ------------------------------ ----------
TEXT
------------------------------------------------------------------------------
SYS                            USER_DB_LINKS                         107
select l.name, l.userid, l.password, l.host, l.ctime
from sys.link$ l
riting the export file. Since seven-bit ASCII does not offer equivalents for
many of the special Mac characters, question marks were substituted in their
place. The export file no longer represented the data faithfully, because many
quotes, apostrophes, and other special characters in the data had been replaced
by question marks. The problem was ultimately resolved by setting Unix
environment variables to trick Oracle into thinking the export was being taken
in a Mac character set environment, thus disabling character set translation.
Granted, this was a pretty exotic situation, but it demonstrates the possible
issues character set translation can create.
Conclusion
Oracle provides a variety of data types to allow storage of unstructured data
in a relational database. Storing your unstructured data in Oracle gives you
the benefits of recoverability, transaction control, and access from SQL and
PL/SQL. But these benefits come at some cost: less efficient physical storage,
slower loading due to redo log and rollback segment I/O, and slower backups, to
name a few. Ultimately the pros and cons must be considered on an
application-by-application basis in order to determine whether or not to store
unstructured data in the database instead of operating system files outside the
database.
As a high-level generalization, though, it seems that if you will have large
volumes of sometimes changing unstructured data, you probably should not store
it in an Oracle 7 database. (If you're using Oracle 8 you might want to try
it.) At the other end of the spectrum, if you have small pieces of unstructured
data that change frequently and must be kept synchronized with structured data
and transactions, then it would be beneficial to store the unstructured data in
either an Oracle 7 or Oracle 8 database.
Clearly, the trend is moving toward storing more in your database and less in
operating system files. Back in Oracle V6, LONGs and LONG RAWs were limited to
64K in size. Oracle 7.0 raised the bar to two gigabytes. Oracle 7.2 brought
piecewise fetching of arbitrary length LONG data, and Oracle 7.3 brought PL/SQL
functions for manipulating binary data. Oracle 8 took support for unstructured
data to a new level with the new LOB family of data types, the DBMS_LOB
built-in package, and more efficient physical storage. The steady progress
Oracle is making in the area of support for unstructured data means more
exciting applications for Oracle databases in the future. This is great news
for all those committed to Oracle technology.
The author is a DBA who has been working with Oracle database technology since
Oracle V5.1 in the late 1980s. Although identifying characteristics have been
changed to protect client confidentiality, the anecdotes and code samples
presented here are all based on real life situations encountered by Oracle
users. If you have any questions or comments, you may contact the author by
email at rschrag@dbspecialists.com.