Ott-03-0035 Unicode and C Business Functions

HowTo
Title: Unicode and C Business Functions

Abstract: Beginning with PeopleSoft EnterpriseOne 8.9, Unicode is fully integrated within the software. This document provides an overview of Unicode, its impact, and the changes required in C business functions.
What is Unicode?
The Unicode Standard is the universal character encoding standard used to represent text for computer processing. Unicode provides a unique number for every character, regardless of the platform, the program, or the language: the standard assigns each character a unique numeric value and a name. However, the Unicode Standard does not define glyph images (the visual representation of a character).
There are several encoding forms, such as UTF-8 and UTF-16. UCS-2 and UTF-16 use two-byte (16-bit) units, which on their own allow only 65,536 (64K) distinct values. To go beyond this, Unicode has a mechanism called surrogates: a pair of two-byte units describes a single character outside the 64K range. Surrogate pairs can describe an additional 1,048,576 (about one million) characters; at the time of writing, about 40 thousand characters were assigned in that range.
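To make the surrogate mechanism concrete, the following C sketch splits a code point above U+FFFF into a high and a low surrogate using the UTF-16 arithmetic defined by the Unicode Standard. The helper name to_surrogate_pair() is hypothetical and is not part of the EnterpriseOne API.

```c
#include <stdint.h>

/* Hypothetical helper: split a supplementary code point
 * (U+10000..U+10FFFF) into a UTF-16 surrogate pair.
 * Returns 0 on success, -1 if cp needs no surrogates or is invalid. */
int to_surrogate_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return -1;
    cp -= 0x10000;                               /* now a 20-bit value */
    *high = (uint16_t)(0xD800 + (cp >> 10));     /* top 10 bits        */
    *low  = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* bottom 10 bits     */
    return 0;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00. Under a UCS-2 treatment, such a character counts as two separate two-byte characters.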
By default, PeopleSoft EnterpriseOne assumes UCS-2 encoding and treats each half of a surrogate pair as a separate character. Because every character occupies two bytes, 0x00 is a valid byte within a character. For example, the letter 'A' is the 16-bit value 0x0041, made up of the bytes 0x00 and 0x41. Byte-oriented string functions such as strlen() and strcpy() stop at the first 0x00 byte, so they do not work with Unicode data.
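The following sketch shows the problem with byte-oriented functions. It assumes JCHAR is a 16-bit unsigned short (this document later describes Unicode characters as unsigned shorts); the typedef and the helper ucs2_len() are stand-ins written for illustration, not the EnterpriseOne definitions.

```c
#include <stddef.h>
#include <string.h>

/* Assumed stand-in for JCHAR: a 16-bit unsigned short. */
typedef unsigned short JCHAR;

/* "AB" as UCS-2: each 16-bit value contains a 0x00 byte. */
const JCHAR sample[] = { 0x0041, 0x0042, 0x0000 };

/* Hypothetical helper: count 16-bit characters up to the 0x0000 terminator. */
size_t ucs2_len(const JCHAR *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* strlen() reads the same memory as bytes and stops at the first 0x00
 * byte: it returns 1 on a little-endian machine and 0 on a big-endian one,
 * never the real length of 2. */
size_t byte_len_of_sample(void)
{
    return strlen((const char *)sample);
}
```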
Changes
New Data Types
Because each Unicode character requires two bytes, the char data type is inadequate in C business functions.
● JCHAR, PJSTR, and PCJSTR are used for Unicode characters and strings.
● ZCHAR, PZSTR, and PCZSTR are used for non-Unicode characters and strings. Use them only in code that must interface with non-Unicode APIs.
To avoid confusion, char will no longer be available as a type. Only JCHAR (Unicode character) and ZCHAR (non-Unicode character) are available.
BYTE (byte storage) is an existing data type that must be used wherever the char data type is currently used to store non-character data, such as byte masks. It may or may not refer to an array of bytes.
file:///C|/Notes/pdf/computer-erp-jde-solution-ott-03-0035-Unicode_and_C_Business_Functions.htm (1 of 11)07/02/2007 10:47:40 AM
Wherever possible, convert to Unicode and use Unicode (two-byte) characters in preference to 8-bit characters.
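As a rough illustration of how these types differ in size, the sketch below uses hypothetical typedefs; the real definitions come from the EnterpriseOne headers and may differ, for example in signedness. It also shows a byte mask declared with BYTE, since mask data is not character data.

```c
/* Hypothetical stand-ins; the real types come from the EnterpriseOne headers. */
typedef unsigned short JCHAR;   /* Unicode (UCS-2) character: 2 bytes */
typedef char           ZCHAR;   /* non-Unicode character: 1 byte      */
typedef unsigned char  BYTE;    /* raw byte storage, such as masks    */

/* Bit flags stored in a byte mask: non-character data, so BYTE is used.
 * Declaring the mask as JCHAR would double its size for no benefit. */
#define FLAG_ACTIVE 0x01
#define FLAG_LOCKED 0x02

BYTE set_flag(BYTE mask, BYTE flag) { return (BYTE)(mask | flag); }
int  has_flag(BYTE mask, BYTE flag) { return (mask & flag) != 0; }
```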
New Macros
Two new macros must be used to define string and character literals:
● _J("Hello") – for Unicode string literals.
● _Z("Hello") – for non-Unicode string literals.
● _J('A') – for Unicode character literals.
● _Z('A') – for non-Unicode character literals.
Plain literals such as 'x' and "xxx" are no longer allowed except inside _J() or _Z(). The arguments to _J and _Z must be literals, because they are macros, not conversion functions.
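One way a literal-wrapping macro can work, and the reason its argument must be a literal, is token pasting. The sketch below assumes a C11 compiler, where the u prefix produces 16-bit char16_t literals. SKETCH_J and SKETCH_Z are hypothetical stand-ins (identifiers that begin with an underscore and a capital letter are reserved in C); the real _J and _Z may be defined differently on each platform.

```c
#include <uchar.h>

/* Hypothetical stand-ins for _J and _Z. The ## operator pastes the prefix
 * onto the literal token, so SKETCH_J("Hi") becomes u"Hi" and
 * SKETCH_J('A') becomes u'A'. Pasting onto a variable name instead, as in
 * SKETCH_J(szName), would produce the unrelated identifier uszName. */
#define SKETCH_J(lit) u ## lit
#define SKETCH_Z(lit) lit

const char16_t *greeting_j = SKETCH_J("Hello");   /* 16-bit characters */
const char     *greeting_z = SKETCH_Z("Hello");   /* 8-bit characters  */
```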
Use the %ls format specifier instead of %s to indicate a Unicode string argument. For example, sprintf (szString, "Hello %s\n", szName); must be changed to jdeSprintf (szString, _J ("Hello %ls\n"), szName);
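The same %s versus %ls distinction exists in the standard C wide-character formatting functions, which makes it easy to try out. The sketch below uses swprintf() with wchar_t as a stand-in; wchar_t is not necessarily the same size as JCHAR (it is 32 bits on most UNIX systems), and jdeSprintf() itself is not shown here.

```c
#include <stddef.h>
#include <wchar.h>

/* In a wide format string, %ls takes a wide-string argument; %s would
 * expect a narrow char* string instead. outChars is a character count. */
int format_greeting(wchar_t *out, size_t outChars, const wchar_t *name)
{
    return swprintf(out, outChars, L"Hello %ls\n", name);
}
```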
The DIM() macro replaces sizeof() wherever a number of characters is needed.
New String Functions
There are two versions of every string function: one for Unicode and one for non-Unicode strings.
Naming standards:
● jdeSxxxxxx() – all Unicode string functions
● jdeZSxxxx() – all non-Unicode string functions
Developers should use the Unicode versions. The exception is code that interfaces with non-Unicode APIs and must manipulate their data. Convert strings to Unicode as early as possible and use Unicode throughout. Traditional string functions such as strcpy(), strlen(), and printf() are no longer allowed.
Replacement functions
Former       New
strcpy()     jdeStrcpy()
strlen()     jdeStrlen()
strstr()     jdeStrstr()
sprintf()    jdeSprintf()
strncpy()    jdeStrncpy()
...          ...
Note that the name jdeStrcpy() was already in use by an existing function, so the slimer (see Converting to Unicode and Its Impact) renames existing calls to jdeStrcpy() to jdeStrncpyTerminate(). Going forward, use jdeStrncpyTerminate() wherever you previously used the old jdeStrcpy().
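The jdeStrxxx() implementations are not shown in this document. As a rough illustration of why separate Unicode versions are needed, the sketch below implements length and copy for 16-bit strings. JCHAR is assumed to be a 16-bit unsigned short, and jstr_len() and jstr_cpy() are hypothetical helpers, not jdeStrlen() and jdeStrcpy().

```c
#include <stddef.h>

typedef unsigned short JCHAR;  /* assumed 16-bit stand-in for JCHAR */

/* Hypothetical: length in characters (not bytes), excluding the terminator. */
size_t jstr_len(const JCHAR *s)
{
    const JCHAR *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}

/* Hypothetical: copy a 16-bit string, including its 0x0000 terminator.
 * Like strcpy(), it assumes dest is large enough. */
JCHAR *jstr_cpy(JCHAR *dest, const JCHAR *src)
{
    JCHAR *d = dest;
    while ((*d++ = *src++) != 0)
        ;
    return dest;
}
```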
New Math Functions
Because the string member of the MATH_NUMERIC data structure is not Unicode, new functions are provided to access that member instead of reading the elements of the underlying data structure directly:
● jdeMathGetCurrencyCodeUNI (MATH_NUMERIC *pMn, JCHAR *szCurrencyCode); Gets the currency code of pMn into the Unicode string szCurrencyCode.
● jdeMathSetCurrencyCodeUNI (MATH_NUMERIC *pMn, JCHAR *szCurrencyCode); Takes a currency code in Unicode and sets it in pMn.
The old function jdeMathGetRawString() returns a Unicode string, but it is not safe to use and will become obsolete. Instead, either use JCHAR *jdeMathGetRawStringEx (MATH_NUMERIC *Value, JCHAR *Str) with a pre-allocated Str buffer, or use FormatMathNumeric().
Conversion Functions
To convert from a Unicode string to a non-Unicode string, and vice versa:
● jdeToUnicode (JCHAR* szUnicode, ZCHAR* szNonUnicode, DIM (szUnicode), NULL);
● jdeFromUnicode (ZCHAR* szNonUnicode, JCHAR* szUnicode, DIM (szNonUnicode), NULL);
The caller must allocate both buffers. The third parameter is the size of the first (destination) buffer in characters, given with DIM(). The fourth parameter is a pointer to the code page to convert from or to. When NULL is passed, the Western European code page is used; pass NULL unless a special conversion is intended.
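A minimal sketch of the caller-allocated conversion pattern, assuming ISO-8859-1 (Latin-1) as the 8-bit code page, in which every byte value equals its Unicode code point. The Western European code page used by EnterpriseOne may be a different one (such as Windows-1252, which differs in the range 0x80 to 0x9F), and latin1_to_ucs2() and ucs2_to_latin1() are hypothetical names. As with DIM (szUnicode) in the calls above, the size argument is the destination size in characters.

```c
#include <stddef.h>

typedef unsigned short JCHAR;   /* assumed 16-bit stand-in for JCHAR */

/* Hypothetical analogue of jdeToUnicode(), assuming a Latin-1 source.
 * destChars is the destination size in characters; output is always
 * terminated and silently truncated if it does not fit. */
void latin1_to_ucs2(JCHAR *dest, const char *src, size_t destChars)
{
    size_t i = 0;
    if (destChars == 0)
        return;
    for (; i + 1 < destChars && src[i] != '\0'; i++)
        dest[i] = (unsigned char)src[i];     /* byte value == code point */
    dest[i] = 0;
}

/* Hypothetical analogue of jdeFromUnicode(): characters above U+00FF
 * have no Latin-1 equivalent and are replaced by '?'. */
void ucs2_to_latin1(char *dest, const JCHAR *src, size_t destChars)
{
    size_t i = 0;
    if (destChars == 0)
        return;
    for (; i + 1 < destChars && src[i] != 0; i++)
        dest[i] = src[i] <= 0xFF ? (char)src[i] : '?';
    dest[i] = '\0';
}
```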
To convert a single character value to Unicode and vice versa, use:
● JCHAR jdeToJCHAR (JCHAR* pChar, ZCHAR zChar, NULL);
● ZCHAR jdeToZCHAR (ZCHAR* zChar, PCHAR pChar, NULL);
The converted character is returned both through the first (pointer) argument and as the function's return value. If the first argument is NULL, the character is returned only as the return value.
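The calling convention described above (result returned and, when the pointer is not NULL, also stored through it) can be sketched as follows. to_jchar() is a hypothetical helper that assumes a Latin-1 source byte; it is not jdeToJCHAR().

```c
#include <stddef.h>

typedef unsigned short JCHAR;  /* assumed 16-bit stand-in for JCHAR */
typedef char           ZCHAR;  /* assumed 8-bit stand-in for ZCHAR  */

/* Hypothetical: convert one Latin-1 byte to a JCHAR. The result is always
 * returned, and also stored through pOut when pOut is not NULL. */
JCHAR to_jchar(JCHAR *pOut, ZCHAR z)
{
    JCHAR c = (unsigned char)z;
    if (pOut != NULL)
        *pOut = c;
    return c;
}
```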
New Wrapper Functions
To simplify calls to system functions such as fopen(), a number of wrapper functions have been created. For example:
fopen ("filename.txt", "wb"); becomes jdeFopen (_J ("filename.txt"), _J ("wb"));
Wrapper functions have also been created for non-Unicode strings. For example, if szNonUni is a non-Unicode file name, use jdeZFopen (szNonUni, _Z ("wb"));
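A wrapper of this kind converts its Unicode arguments and then calls the underlying system function. The sketch below is a hypothetical illustration, not jdeFopen(): to stay short, it accepts only ASCII file names and modes and returns NULL for anything else, and the 260-character buffer is an arbitrary limit chosen for the sketch.

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned short JCHAR;  /* assumed 16-bit stand-in for JCHAR */

/* Hypothetical: narrow an ASCII-only 16-bit string into dst (dstSize bytes).
 * Returns -1 for non-ASCII input or if dst is too small. */
static int ascii_narrow(char *dst, const JCHAR *src, size_t dstSize)
{
    size_t i;
    for (i = 0; src[i] != 0; i++) {
        if (src[i] > 0x7F || i + 1 >= dstSize)
            return -1;
        dst[i] = (char)src[i];
    }
    if (i >= dstSize)
        return -1;
    dst[i] = '\0';
    return 0;
}

/* Hypothetical wrapper: convert both arguments, then call fopen(). */
FILE *sketch_jfopen(const JCHAR *name, const JCHAR *mode)
{
    char zName[260], zMode[8];   /* arbitrary sizes for the sketch */
    if (ascii_narrow(zName, name, sizeof zName) != 0 ||
        ascii_narrow(zMode, mode, sizeof zMode) != 0)
        return NULL;
    return fopen(zName, zMode);
}
```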
New Memset Functions
jdeMemset() is a new memset function that sets memory character by character, rather than byte by byte. jdeMemset() takes a void pointer, a JCHAR, and the number of bytes to set. For example, use jdeMemset (buf, _J (' '), sizeof (buf)); to set every character of the Unicode string buf to 0x0020.
New Flat File Functions
New flat file functions allow PeopleSoft EnterpriseOne to produce and consume text flat files in a specified encoding. For these APIs to work, the encoding must first be set up using P93081 (Work With Flat File Encoding). The available encoding names are stored in UDC H95/FE.
● jdeFwriteConvert(LPBHVRCOM lpBhvrCom, JCHAR *buf, jde_n_char size, size_t count, FILE *stream )
● jdeFreadConvert(LPBHVRCOM lpBhvrCom, JCHAR *buf, jde_n_char size, size_t count, FILE *stream)
● jdeFprintfConvert(LPBHVRCOM lpBhvrCom, FILE *stream, const JCHAR *format, /* [pointer,] */...)
● jdeFscanfConvert(LPBHVRCOM lpBhvrCom, FILE *stream, const JCHAR *format, /* [pointer,] */...)
● jdeFputsConvert(LPBHVRCOM lpBhvrCom, const JCHAR *buf, FILE *stream)
● jdeFgetsConvert(LPBHVRCOM lpBhvrCom, JCHAR *buf, jde_n_char n, FILE *stream)
● jdeFputcConvert(LPBHVRCOM lpBhvrCom, int c, FILE *stream)
● jdeFgetcConvert(LPBHVRCOM lpBhvrCom, FILE *stream)
● jdeGetEncodingNameV1(LPBHVRCOM lpBhvrCom, JCHAR *enc);
fprintf Examples:
a) Character data in the file is encoded the same way as it is encoded in memory:
FILE *fp;
fp = fopen( "c:/testBSFNZ.txt", "w+");
fprintf(fp, "%s%d\n", "Line ", 1);
fclose(fp);
b) Data is written to the file in the Western European code page (jdeFprintf() converts from UCS-2 to the default Western European code page):
FILE *fp;
fp = jdeFopen(_J( "c:/testBSFNZ.txt"), _J("w+"));
jdeFprintf(fp, _J("%ls%d\n"), _J("Line "), 1);
jdeFclose(fp);
c) Data in the file is encoded according to the encoding configured using P93081:
FILE *fp;
fp = jdeFopen(_J( "c:/testBSFNZ.txt"), _J("w+"));
jdeFprintfConvert(lpBhvrCom, fp, _J("%ls%d\n"), _J("Line "), 1);
jdeFclose(fp);
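Example c) writes the file in whatever encoding P93081 specifies. To show what such a conversion involves, the sketch below encodes UCS-2 text as UTF-8 before it would be written. UTF-8 is only one possible target (the available names are in UDC H95/FE), the function ucs2_to_utf8() is hypothetical, and the EnterpriseOne conversion functions are not necessarily implemented this way. For brevity, the sketch rejects surrogate halves instead of combining them.

```c
#include <stddef.h>

typedef unsigned short JCHAR;  /* assumed 16-bit stand-in for JCHAR */

/* Hypothetical: encode a null-terminated UCS-2 string as UTF-8 into out
 * (outSize bytes). Returns the number of bytes written, excluding the
 * terminator, or (size_t)-1 if out is too small or a surrogate is found. */
size_t ucs2_to_utf8(unsigned char *out, size_t outSize, const JCHAR *src)
{
    size_t n = 0;
    if (outSize == 0)
        return (size_t)-1;
    for (; *src != 0; src++) {
        unsigned c = *src;
        size_t need;
        if (c >= 0xD800 && c <= 0xDFFF)
            return (size_t)-1;              /* surrogate half: not handled */
        need = c < 0x80 ? 1 : c < 0x800 ? 2 : 3;
        if (n + need + 1 > outSize)
            return (size_t)-1;              /* keep room for terminator */
        if (need == 1) {
            out[n++] = (unsigned char)c;
        } else if (need == 2) {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[n++] = (unsigned char)(0xE0 | (c >> 12));
            out[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = 0;
    return n;
}
```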
Converting to Unicode and Its Impact
Slimer
The slimer is an application that converts, on average, 90% of pre-Unicode C code to Unicode. The remaining 10%, plus all future changes, must be made by a programmer.
When To Use Number of Bytes vs. Characters
Because each Unicode character takes two bytes, you must pay close attention, when programming C business functions, to whether a value should be a number of characters or a number of bytes.
In general, all APIs that use a string variable and its size should use character length, not byte length.
Functions that operate on a byte array (not necessarily a string), such as jdeAlloc(), take byte lengths. If the array actually holds a string, you can still use jdeStrlen(), but the length passed to jdeAlloc() must be computed as (jdeStrlen() + 1) * sizeof (JCHAR), which includes the terminating null character. This is critical for memory allocation: jdeAlloc() allocates a byte array, not necessarily a string, so it takes a byte count, not a string length. For example:
b = jdeAlloc(0, strlen(a) + 1, 0); must be changed to b = jdeAlloc(0, (jdeStrlen(a) + 1) * sizeof (JCHAR), 0);
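The byte-count rule for allocations can be tried out with the standard library. In the sketch below, malloc() stands in for jdeAlloc() (both take a byte count), JCHAR is assumed to be a 16-bit unsigned short, and jstr_dup() is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned short JCHAR;  /* assumed 16-bit stand-in for JCHAR */

static size_t jstr_len(const JCHAR *s) { size_t n = 0; while (s[n]) n++; return n; }

/* Hypothetical: duplicate a Unicode string. The character count (plus the
 * terminator) is multiplied by sizeof(JCHAR) because malloc(), like
 * jdeAlloc(), takes bytes; passing jstr_len(src) + 1 alone would allocate
 * only half the memory needed. */
JCHAR *jstr_dup(const JCHAR *src)
{
    size_t nChars = jstr_len(src) + 1;               /* characters incl. terminator */
    JCHAR *copy = malloc(nChars * sizeof(JCHAR));    /* bytes */
    if (copy != NULL)
        memcpy(copy, src, nChars * sizeof(JCHAR));   /* memcpy also counts bytes */
    return copy;
}
```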
On the other hand, the jdeStrxxx() functions explicitly handle strings, so they take character lengths, and the sizeof() operator, which returns a byte count, becomes a problem. For example:
● When using jdeStrncpy(), the third parameter is the number of characters, not the number of bytes.
● DIM() is a macro that gives the number of characters in an array, Unicode or otherwise.
● Given JCHAR a[10];, DIM(a) returns 10, while sizeof(a) returns 20.
● strncpy (a, b, sizeof (a)); must become jdeStrncpy (a, b, DIM (a));
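DIM() is commonly written as the array size divided by the element size. The sketch below uses that definition as an assumption (the EnterpriseOne header may define it differently) together with a hypothetical bounded copy, jstr_copy_bounded(), whose size argument is a character count, as with the jdeStrxxx() functions. Like sizeof(), this form of DIM() works only on true arrays, not on pointers.

```c
#include <stddef.h>

typedef unsigned short JCHAR;  /* assumed 16-bit stand-in for JCHAR */

/* Assumed definition: number of elements in a true array. */
#define DIM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical: copy at most destChars - 1 characters and always
 * terminate. destChars is a character count, such as DIM(dest). */
void jstr_copy_bounded(JCHAR *dest, const JCHAR *src, size_t destChars)
{
    size_t i = 0;
    if (destChars == 0)
        return;
    for (; i + 1 < destChars && src[i] != 0; i++)
        dest[i] = src[i];
    dest[i] = 0;
}
```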
Array subscripts are another problem area. If code currently has:
char a[10];
a[sizeof(a) - 1] = '\0'; /* a[9]='\0'; */
…it must be changed to:
JCHAR a[10];
a[DIM(a) - 1] = _J('\0'); /* a[9]=_J('\0'); */
Problem after sliming memset()
The standard C function memset() sets memory byte by byte. For example, if buf is 10 bytes long, memset(buf, ' ', sizeof (buf)); sets the 10 bytes pointed to by buf to the value 0x20 (on an AS/400, the EBCDIC value of a space is used instead).
This still holds true even if ' ' is a Unicode ' ' with the hex value 0x0020, because memset() converts its second argument to a single byte.
If code sets a character array to all spaces with memset(buf, ' ', sizeof(buf)), the slimer converts the call to memset(buf, _J(' '), sizeof(buf)). However, this sets every byte of buf to 0x20, which means the character buf[0] becomes 0x2020, the Dagger character (†) in Unicode.
The basic issue is that a Unicode character set function is needed: one that sets memory character by character, rather than byte by byte. To solve this, a new function, jdeMemset(), takes a void pointer, a JCHAR, and the number of bytes to set. Use jdeMemset (buf, _J (' '), sizeof (buf)); to set every character of the Unicode string buf to 0x0020. Fortunately, memset (buf, 0, sizeof (buf)); works as it always has. Note that the third argument of jdeMemset() is a byte count, not a character count.
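The effect of the slimed memset() call can be reproduced directly. In the sketch below, jchar_fill() is a hypothetical stand-in for jdeMemset() that fills character by character and, like jdeMemset(), takes a byte count; JCHAR is assumed to be a 16-bit unsigned short, and the buffer is assumed to be suitably aligned for JCHAR.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned short JCHAR;  /* assumed 16-bit stand-in for JCHAR */

/* Hypothetical analogue of jdeMemset(): fills character by character.
 * nBytes is a byte count; an odd trailing byte is left untouched. */
void *jchar_fill(void *buf, JCHAR c, size_t nBytes)
{
    JCHAR *p = buf;
    size_t i;
    for (i = 0; i < nBytes / sizeof(JCHAR); i++)
        p[i] = c;
    return buf;
}

/* What the slimed memset(buf, _J(' '), sizeof(buf)) does: memset() writes
 * the byte 0x20 into every byte, so each character becomes 0x2020. */
JCHAR slimed_memset_result(void)
{
    JCHAR buf[4];
    memset(buf, (JCHAR)0x0020, sizeof(buf));
    return buf[0];
}
```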
Pointer Arithmetic
Code that currently casts a void* with (char*) to perform pointer arithmetic on a byte array must be modified. The slimer changes the (char*) cast to a (JCHAR*) cast, which means any pointer arithmetic operates two bytes at a time. For example (lpVoid is a void* that points to a structure, not a string):
memcpy(pPointer, (char*)lpVoid + nOffset, nLen);
…gets slimed to:
memcpy(pPointer, (JCHAR*)lpVoid + nOffset, nLen);
This needs to be changed manually to:
memcpy(pPointer, (BYTE*)lpVoid + nOffset, nLen);
Similar issues can arise with any memory function, such as memmove(); these all operate on byte arrays, not strings. For example, given the slimed code:
JCHAR* source;
memmove(destination, source, 6);
the statement needs to be changed to
memmove(destination, source, (6 * sizeof(JCHAR)));
because memmove takes the number of bytes and in this example the source is a string. In Unicode, the 6 characters will take up 12 bytes.
Again, the memxxx() functions are byte-oriented and are not designed to handle character data. If the source is always a string, converting to the appropriate jdeStrxxx() function is recommended.
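Both pitfalls can be checked with plain C. The sketch below assumes 16-bit JCHAR and 8-bit BYTE stand-ins; copy_field() and move_chars() are hypothetical helpers that show byte-offset arithmetic through BYTE* and a memmove() whose count is converted from characters to bytes.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned short JCHAR;  /* assumed 16-bit stand-in for JCHAR */
typedef unsigned char  BYTE;   /* assumed stand-in for BYTE         */

/* Hypothetical: copy nLen bytes starting nOffset bytes into a structure.
 * BYTE* arithmetic advances one byte per step; a (JCHAR*) cast would
 * advance two. */
void copy_field(void *pPointer, const void *lpVoid, size_t nOffset, size_t nLen)
{
    memcpy(pPointer, (const BYTE *)lpVoid + nOffset, nLen);
}

/* Hypothetical: memmove() counts bytes, so nChars characters of a Unicode
 * string need nChars * sizeof(JCHAR) bytes. */
void move_chars(JCHAR *destination, const JCHAR *source, size_t nChars)
{
    memmove(destination, source, nChars * sizeof(JCHAR));
}
```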
Byte ordering
When sending data across the network to a different platform, the byte order of character data must be taken into account. Unicode characters are stored as unsigned shorts (two bytes), so, unlike single-byte characters, their byte order now matters.
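One common way to handle byte order is to serialize characters in a fixed order agreed on by both sides. The document does not say which order or which API EnterpriseOne uses; the sketch below picks big-endian as an example, JCHAR is an assumed 16-bit stand-in, and both function names are hypothetical.

```c
#include <stddef.h>

typedef unsigned short JCHAR;  /* assumed 16-bit stand-in for JCHAR */

/* Hypothetical: write each character high byte first, so the bytes are
 * the same regardless of the sender's native byte order. */
void jchars_to_be_bytes(unsigned char *out, const JCHAR *src, size_t nChars)
{
    size_t i;
    for (i = 0; i < nChars; i++) {
        out[2 * i]     = (unsigned char)(src[i] >> 8);
        out[2 * i + 1] = (unsigned char)(src[i] & 0xFF);
    }
}

/* Hypothetical: rebuild characters from big-endian bytes on any platform. */
void be_bytes_to_jchars(JCHAR *dst, const unsigned char *in, size_t nChars)
{
    size_t i;
    for (i = 0; i < nChars; i++)
        dst[i] = (JCHAR)((in[2 * i] << 8) | in[2 * i + 1]);
}
```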
Cache Keys
The size of a string cache key is a number of characters, so use DIM() instead of sizeof():
IndexS->CacheKey[0].nOffset = offsetof(I09UI003, glaid);
/** IndexS->CacheKey[0].nSize = sizeof(dsI09UI003.glaid); **/
IndexS->CacheKey[0].nSize = DIM(dsI09UI003.glaid);
IndexS->CacheKey[0].idDataType = EVDT_STRING;
When the key is a single character, hard-code nSize = 1, because DIM() works only with a character array.
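The sketch below mirrors the cache-key example with a hypothetical record layout: the key offset is still a byte offset from offsetof(), the string key size comes from DIM(), and the single-character key is hard-coded to 1. SKETCH_CACHE_REC, SKETCH_KEY, the field length 9, and the DIM() definition are all assumptions for illustration, not the real EnterpriseOne cache structures.

```c
#include <stddef.h>

typedef unsigned short JCHAR;                  /* assumed 16-bit stand-in */
#define DIM(a) (sizeof(a) / sizeof((a)[0]))    /* assumed definition      */

/* Hypothetical record: glaid mirrors the example field; its length is invented. */
typedef struct {
    JCHAR glaid[9];   /* string key      */
    JCHAR cFlag;      /* single-char key */
} SKETCH_CACHE_REC;

/* Hypothetical key descriptor. */
typedef struct {
    size_t nOffset;   /* byte offset of the key field */
    size_t nSize;     /* key length in characters     */
} SKETCH_KEY;

void describe_keys(SKETCH_KEY keys[2])
{
    SKETCH_CACHE_REC rec;
    keys[0].nOffset = offsetof(SKETCH_CACHE_REC, glaid);  /* bytes      */
    keys[0].nSize   = DIM(rec.glaid);                     /* characters */
    keys[1].nOffset = offsetof(SKETCH_CACHE_REC, cFlag);
    keys[1].nSize   = 1;                /* DIM() needs an array */
}
```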
Debugging using Visual C++
To display Unicode strings in the debugger, go to the Tools menu, select Options, and click the Debug tab. Make sure the Display Unicode strings check box is selected.
Slime Examples and Manual Correction
Example 1
/* Original code */
int someFcn (char* a)
{
char szString[10];
char* szCopy;
unsigned short nLen;
nLen = sizeof (szString);
strncpy (szString, a, nLen);
...
}

/* Post slimed code */
int someFcn (JCHAR* a)
{
JCHAR szString[10];
JCHAR *szCopy;
unsigned short nLen;
nLen = sizeof (szString);
jdeStrncpy (szString, a, nLen);
...
}

/* Manually corrected code */
int someFcn (JCHAR* a)
{
JCHAR szString[10];
JCHAR *szCopy;
unsigned short nLen;
nLen = DIM (szString);
jdeStrncpy (szString, a, nLen);
...
}
Example 2
/* Original code */
int someFcn (char* a)
{
char *szCopy;
unsigned short nLen;
nLen = strlen (a) + 1;
szCopy = (char*) jdeAlloc (0, nLen, 0);
strcpy (szCopy, a);
return 0;
}

/* Post slimed code */
int someFcn (JCHAR* a)
{
JCHAR *szCopy;
unsigned short nLen;
nLen = jdeStrlen (a) + 1;
szCopy = (JCHAR*) jdeAlloc (0, nLen, 0);
jdeStrcpy (szCopy, a);
return 0;
}

/* Manually corrected code */
int someFcn (JCHAR *a)
{
JCHAR *szCopy;
unsigned short nLen;
nLen = jdeStrlen (a) + 1;
szCopy = (JCHAR*) jdeAlloc (0, nLen * sizeof (JCHAR), 0);
jdeStrcpy (szCopy, a);
return 0;
}
Example 3
/* Original code */
int someFileFcn (char* szDir)
{
char szFileName[] = "filename.txt";
char* szFullFile = NULL;
FILE* fp = NULL;
szFullFile = (char*) jdeAlloc (0, strlen (szDir) + strlen (szFileName) + 1, 0);
sprintf (szFullFile, "%s%s", szDir, szFileName);
fp = fopen (szFullFile, "wb");
...
}

/* Slimed code */
int someFileFcn (JCHAR* szDir)
{
JCHAR szFileName[] = _J("filename.txt");
JCHAR *szFullFile = NULL;
FILE *fp = NULL;
szFullFile = (JCHAR*) jdeAlloc (0, jdeStrlen (szDir) + jdeStrlen (szFileName) + 1, 0);
jdeSprintf (szFullFile, _J("%ls%ls"), szDir, szFileName);
fp = jdeFopen (szFullFile, _J("wb"));
...
}

/* Manually corrected code */
int someFileFcn (JCHAR* szDir)
{
JCHAR szFileName[] = _J("filename.txt");
JCHAR *szFullFile = NULL;
FILE *fp = NULL;
szFullFile = (JCHAR*) jdeAlloc (0, (jdeStrlen (szDir) + jdeStrlen (szFileName) + 1) * sizeof (JCHAR), 0);
jdeSprintf (szFullFile, _J("%ls%ls"), szDir, szFileName);
fp = jdeFopen (szFullFile, _J("wb"));
...
}
Example 4
/* Original code */
int someFcn()
{
char szString[10] = {0};
/* Set to blank */
memset (szString, ' ', sizeof (szString) - 1);
...
}

/* After slimer */
int someFcn()
{
JCHAR szString[10] = {0};
/* Set to blank */
memset (szString, _J(' '), sizeof (szString) - 1);
...
}

/* Manually corrected code */
int someFcn()
{
JCHAR szString[10] = {0};
/* Set to blank */
jdeMemset (szString, _J(' '), sizeof (szString) - (1 * sizeof (JCHAR)));
...
}
Example 5
/* Fix for Macro defined size */
/* Original code: */
#define ACCOUNT_LENGTH 10 /* this is # of chars */
int someFcn()
{
char szString[ACCOUNT_LENGTH];
/* Set to NULL */
memset (szString, 0, ACCOUNT_LENGTH);
...
}

/* After sliming: */
#define ACCOUNT_LENGTH 10 /* this is # of chars */
int someFcn()
{
JCHAR szString[ACCOUNT_LENGTH];
/* Set to NULL */
memset (szString, 0, ACCOUNT_LENGTH);
...
}

/* Manually corrected code: */
#define ACCOUNT_LENGTH 10 /* this is # of chars */
int someFcn()
{
JCHAR szString[ACCOUNT_LENGTH];
/* Set to NULL */
memset (szString, 0, ACCOUNT_LENGTH * sizeof (JCHAR));
...
}
Resources
Unicode Guidelines
Using CodeChangeCom
Development Standards for Business Function Programming
Flat File Guide
For additional information on Unicode refer to the Unicode web site at http://www.unicode.org
