Vous êtes sur la page 1sur 10

Data Obfuscation and Encryption

Jared Still
Certified Oracle DBA and Part Time Perl Evangelist
jkstill@cybcon.com

Data encryption is a hot topic these days. This is hardly a new subject, but has
received and increasing amount of attention, largely due to ecommerce. Protecting
credit card numbers, medical data and other sensitive information has become more
important than ever before, and on a larger scale.

Before launching into a discussion on algorithm choices and methods of implementing


encryption, it is important to consider some related decisions that need to be made first.
It may become clear that encryption is not necessarily what is required.

Please keep in mind that encryption in this article refers to some method of modifying
data so that it is meaningless and unreadable in it’s encrypted form. It also must be
reasonably secure, that is it must not be easily decrypted without the proper key.

Anything less than that will be referred to as obfuscation. This is data that is rendered
unusable by some means, but is not considered as a serious form of encryption.
Examples will follow.

Obfuscation

Why would you want to merely obfuscate data, rather than use a strong encryption
algorithm?

A good example would be an audit report on a medical system. This report may be
generated for an external auditor, and contain sensitive information. The auditor will be
examining the report for information that indicates possible cases of fraud or abuse.

Assume that management has required that Names, Social Security Numbers and other
personal information should not be available to the auditor except on an as needed
basis.

The data needs to be presented to the auditor, but in a way that allows the examination
of all data, so that patterns in the data may be detected.

Encryption would be a poor choice in this case, as the data would be rendered into
ASCII values outside of the range of normal ASCII characters. This would be
impossible to read.
A better choice might be to obfuscate the data with a simple substitution cipher. While
this is not considered encryption, it may be suitable for this situation.

When the auditor finds a possible case of abuse, he will need the real name and SSN of
the party involved. He could obtain this by calling a customer service representative at
the insurance company that supplied the report, and ask for the real information.

The obfuscated data is read to the customer service rep, who then inputs it into an
application that supplies the real data.

The importance of using pronounceable characters becomes very clear. Strong


encryption would render this impossible.

Here’s some simple example code to do the obfuscation:


create or replace package obfs
is

function obfs( varchar2 in ) return varchar2;

pragma restrict_references( obfs, WNPS, WNDS );

function unobfs( varchar2 in ) return varchar2;


pragma restrict_references( unobfs, WNPS, WNDS );

end;
/

create or replace package body obfs


is

xlate_from varchar2(62) :=
‘0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz’;

xlate_to varchar2(62) :=
‘nopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklm’;

function obfs ( clear_text_in varchar2 ) return varchar2


is
begin
return translate( clear_text_in, xlate_from, xlate_to );
end;

function unobfs ( obfs_text_in varchar2 ) return varchar2


is
begin
return translate( obfs_text_in, xlate_to, xlate_from );
end;

end;
/

Here is some sample output:

SSN OBFS SSN


---------- ----------
540407786 srnrnuuvt
542800170 srpvnnoun
542802063 srpvnpntq
541466830 srorttvqn

As you can see, it wouldn’t be very difficult to decipher this scheme given enough data.
A somewhat more effective method involves chopping the text into segments and
rearranging it as well as obfuscating it. Below is some sample output from this
algorithm.

OBFS OBFS
---------- ----------
540407786 &24B23B&Z
542800170 -4B*23&&&
542802063 -4Z&23-&_
541466830 *2_423ZZ&

While this is still not encryption, this data would be more difficult to decipher without the
key. Source code for this in PL/SQL is available at the URL provided at the end of this
article.

Another way to hide sensitive data is through masking. This is different from the
previous example in that the clear text cannot be reconstructed from the displayed data.

This is useful in situations where it is only necessary to display a portion of the data. A
good case for this method is the receipts printed at gas stations and convenience
stores. When a purchase is made with a credit card, the last 4 digits of the credit are
often displayed as clear text, while the rest of the credit card number has been masked
with a series of X’s.

Slop n Slurp 1 Stop Shop


5/25/2000 8:53 P.M.

Football Burrito 1 2.49 2.49


Premium Gasoline 12.5 1.699 21.24
=====
23.73

AMEX 2/02 XXXX-XXXXXX-65498

This method can also be used for reports where the person reading the report requires
only a portion of the sensitive data. This method is also commonly used for the account
numbers on printed transactions from ATM’s.

Encryption
Ok, you’ve decided that your data is too sensitive to trust to anything other than a real
encryption algorithm. You must use some form of encryption to give reasonable
assurance that sensitive information will be hidden from anyone other than those
entitled to it.

The choices of algorithms and implementation methods you can make use of are
dependent on the version of Oracle that your database uses. All solutions discussed
here will be either home grown software, or solutions provided with the Oracle RDBMS,
and operate inside of the database.

If you need to encrypt data in a version 7 database, your choices are somewhat limited.
The only solution outside of third party software is to use an external program to encrypt
and decrypt data.

While a program written in C could do this very quickly, it precludes the ability to encrypt
and decrypt data in a SQL statement, limiting the usefulness of encryption.

The only solution within the database is to write an encryption routine in PL/SQL.

This works well for limited amounts of data, but may be unwieldy for large amounts of
data. This is because PL/SQL is quite a bit slower than a language such as C for this
type of operation.

Timings of various encryption methods are shown in Table 1.

TABLE 1

Cipher Method UPDATE SELECT SELECT


COUNT(*)
Baseline Queries 0.28 1.41 0.03
RC4 in PL/SQL 15.83 17.87 15.59
RC4 in C 0.74 1.96 0.56
DBMS_OBFUSCATION_TOOLKIT 0.82 1.88 0.53
OBFS in PL/SQL 0.65 1.88 0.48

Timings for various methods of obscuring and encrypting data within a


database.

Test platform is a Sun E4500 with 2 Gigabytes of RAM and 4 CPU’s.


Operating System is Solaris 2.6
Database is Oracle 8.1.6

Testing is done on a 1000 row table as shown.

Table CRYPT_TEST
Column Name Type
------------ ----------------------------
SOC_SEC_NBR VARCHAR2(9)
CIPHER_SSN VARCHAR2(16)

All timings are in seconds.


The encryption method used in this case is RC41, a very fast stream cipher. It is fast in
relation to other encryption methods, but as you will see in table 1, it is still rather slow
in PL/SQL.

The complete PL/SQL source code is too lengthy to include in this article, but I will show
the package header. The complete source code with examples can be found at the
URL at the end of the article.

create or replace package rc4


is

-- returns text encrypted with 'return_key'


-- client must decrypt
function crypt_secure (
plain_text_in varchar2,
key_in varchar2,
return_key_in varchar2
) return varchar2;

pragma restrict_references( crypt_secure, WNDS);

-- does not encrypt returned text


function crypt_clear (
plain_text_in varchar2,
key_in varchar2
) return varchar2;

pragma restrict_references( crypt_clear, WNDS);

-- returns text encrypted with 'return_key'


-- client must decrypt
procedure crypt_secure (
cipher_text_inout in out varchar2,
key_in varchar2,
return_key_in varchar2
);
pragma restrict_references( crypt_secure, WNDS);

-- does not encrypt returned text


procedure crypt_clear (
cipher_text_inout in out varchar2,
key_in varchar2);
pragma restrict_references( crypt_clear, WNDS);

end rc4;

As of Oracle 8, it is possible to write external library routines in C, and call them from
within a SQL statement or PL/SQL block. A share library native to the Os must be
created. With NT this would be a DLL. In many versions of Unix this is a .so file. This
method will provide much faster results than a pure PL/SQL implementation.

External libraries are somewhat difficult to setup however. When an external callout is
made, Oracle starts a dedicated process for each session to interface to the shared
1
RC4 is a very fast stream cipher created by Ronald Rivest, one of the founders of RSA
library. The listener.ora, sqlnet.ora and tnsnames.ora files must all be configured
properly to launch this process.

The SQL*Net modifications that worked for this on our Solaris 2.6 with an Oracle 8.1.6
database are included in a README file that is available at the URL at the end of this
article.

For my tests, I again used the streaming cipher RC4, this time coded in C. The
performance of this test was 24 times faster than RC4 as implemented in PL/SQL.
1000 rows of data were update in .75 seconds, as compared to about 18 seconds for
the PL/SQL version of RC4.

While this is quite a bit faster, I was actually expecting better performance than that.
More on that later.

There is though one caveat on this RC4 implementation. While writing this article, I
discovered that my C implementation of RC4 as modified for use with Oracle is less
than perfect.

It has the nasty habit of occasionally truncating the encrypted data at some point. So I
have to make the disclaimer that this code is FOR DEMONSTRATION PURPOSES
ONLY.

Its purpose is to demonstrate the concept of using an external library to encrypt data in
the database, and for comparing the performance of this method to other methods.

This source code is also posted on my web site. It will include a notification stating if
the bug still exists.

Here is the PL/SQL portion of this RC4 implementation.


create or replace library librc4p as
'/home/jkstill/articles/encryption/rc4/c/rc4/rc4p.so

create or replace package crypt


is
function rc4( key_in varchar2, data_in varchar2 ) return varchar2;
-- pragma not required in 8.1 databases
-- pragma restrict_references( rc4, wnds, rnds, wnps );
end;

create or replace package body crypt


is

function rc4p( key_in varchar2, data_in in out varchar2 )


return pls_integer is external
library librc4p
name "rc4p"
language C;

function rc4( key_in varchar2, data_in varchar2 ) return varchar2


is
dummy pls_integer;

v_data varchar2(500);
clear_len integer;
crypt_len integer;

begin

v_data := data_in;
clear_len := length(v_data);
dummy := rc4p( key_in, v_data );

return v_data;

end;

end;

Another encryption choice is available as of version 8.1.6 of the Oracle RDBMS.

That is the DBMS_OBFUSCATION_TOOLKIT. This package is an interface to a 40 bit


implementation of the DES2 algorithm. While 40 bit encryption is not exactly state of the
art, it is probably sufficient for most applications. It is subject to other restrictions as well
due to U.S. export laws. This package cannot be used to encrypt already encrypted
data, as that is one of the export restrictions imposed at the time this package was
released. The Oracle8i Application Developers Guide contains more information.

Here is the package header:

CREATE OR REPLACE PACKAGE dbms_obfuscation_toolkit AS

---------------------- FUNCTIONS AND PROCEDURES -------------


---------------------------- DATA ENCRYPTION ----------------
-- The following routines encrypt and decrypt data.
-- There are two versions of each procedure: one for raw data
-- and the other for strings.
-------------------------------------------------------------

PROCEDURE DESEncrypt(input IN RAW,


key IN RAW,
encrypted_data OUT RAW);

PROCEDURE DESEncrypt(input_string IN VARCHAR2,


key_string IN VARCHAR2,
encrypted_string OUT VARCHAR2);

PROCEDURE DESDecrypt(input IN RAW,


key IN RAW,
decrypted_data OUT RAW);

PROCEDURE DESDecrypt(input_string IN VARCHAR2,


key_string IN VARCHAR2,
decrypted_string OUT VARCHAR2);
2
Data Encryption Standard
END dbms_obfuscation_toolkit;

One of the chief advantages of using this package is that it does not require any of the
setup that is required for using external libraries. You must be using at least version
8.1.6 of the Oracle RDBMS to make use of this package.

Creating it is as simple as logging into the database as SYS and running the package
$ORACLE_HOME/rdbms/admin/catobtk.sql.

It is also a good performer. This implementation of DES was able to update 1000 rows
in about 0.8 seconds in testing. Even though it is fast, the performance was still a
concern. This can be seen by the timing results in Table 1.

This performance of DBMS_OBFUSCATION_TOOLKIT may be fine for many


applications, but would be a cause for concern where large amounts of data are
processed in a single transaction, such as in a Data Warehouse.

Doing an UPDATE of 1000 rows, and encrypting a single column in each row, the DES
encryption took about three times longer to do the update than if no form of encryption
was used.

During SELECT’s though, the performance difference was much less noticeable, as
decrypting a column added about 40% to the time required for a query without any
encryption.

My motivation for using RC4 to encrypt data was to increase the performance of
updating encrypted data. RC4 is about 10 times faster than DES, yet the results show
that RC4 has a very small performance advantage over DES, at least as implemented
in an Oracle database.

This difference leads me to believe that most of the extra overhead in encrypting data is
not in the actual encryption routine, but is in the interface between the Oracle engine
and the encryption routines. I haven’t attempted to prove this yet, but the observed
performance during testing suggests it.

One small advantage that RC4 has over DES is that data can be encrypted in place in a
table. In the text below Table 1 you will see the encrypted column is larger than the
column for the clear text.

DES is a block cipher, and Oracle’s implementation of it requires that the length of the
data be a multiple of 8. The functions I used to interface to the DES routines will pad
the incoming data so that the length is a multiple of 8. Therefore, the size of any
encrypted column will need to be rounded up to the next multiple of 8.
i
Armed with these few tools, this article can serve as springboard for further exploration
of encryption in the database.

This article is not exhaustive by any means. There is a lot of background material
available for cryptography if you are interested in learning more about this.

Web sites:
http://www.mach5.com/crypto
http://www.counterpane.com/sites.html

The definitive source for making application of cryptography is the book Applied
Cryptography by Bruce Schneier.

Source code for this article can be found at


www.jaredstill.com/downloads/obfuscation_demo.tar.gz
i
The encryption keys used here have been short. For real security, a longer key should be used. A minimum key length
should be 56 bit. The DBMS_OBFUSCATION_TOOLKIT currently works with 56 bit keys only.

Vous aimerez peut-être aussi