
MC Press Online

RPG Has SAX Appeal!


Contributed by Jon Paris, Tuesday, 04 March 2008. Last updated Tuesday, 04 March 2008.

In this part of our RPG XML series, you'll learn how to use RPG's XML-SAX op-code to deal with problematic XML documents and handle situations that XML-INTO cannot deal with.

By Jon Paris

In the previous two articles in this series, "%Handling XML-INTO Problems" and "i5/OS Offers Native XML Support in V5R4", we focused on the capabilities of RPG's XML-INTO. As we saw, this op-code processes an entire document, either as a single piece or, when needed or desired, in "chunks" by using the capabilities of the %HANDLER BIF. There are, however, situations in which this will not work for you, often because of limitations in RPG's data structure (DS) capabilities. As you know, a named DS is limited to a maximum size of 64K (at least until V6R1 anyway). What if even a single repeating element will not fit within that limit? That may sound unlikely, but it doesn't take a huge number of repeating text fields to exceed it. Another example, and one that seems to occur quite often, arises when your XML document contains a structure that simply cannot be represented in an RPG DS. To illustrate this, take a look at the new version of our XML document, shown below:


     <Products>
       <Category Code="02">
         <CatDescr>Toasters</CatDescr>
         <Product Code="1234">
(A)        <Description type="short">Two slot chrome</Description>
(B)        <Description type="long">This beautiful two slot chrome finished toaster is
             a perfect complement to any modern kitchen ...</Description>
           <MSRP>22.95</MSRP>
           <SellPrice>15.95</SellPrice>
           <QtyOnHand>247</QtyOnHand>
         </Product>
         <Product Code="2345">
           <Description type="short">Four slot matt black</Description>
           <MSRP>35.75</MSRP>
           <SellPrice>23.95</SellPrice>
           <QtyOnHand>247</QtyOnHand>
         </Product>
       </Category>
       <Category Code="14">
         <CatDescr type="short">Coffee Makers</CatDescr>
         <Product Code="9876">
           <Description>10 cup auto start</Description>

It is substantively the same as in our previous examples, but with one very significant exception: The <Description> element can now be repeated. If that were the only difference, then we could accommodate it by adding a DIM( ) keyword to the element's definition in the DS. But notice that not only does the element repeat, but there is also a new attribute, type, which is used to indicate the type of description (short or long) that is being defined. This presents us with a problem. Since an attribute is treated in the same way as a child element of the parent, the correct RPG definition for "type" would be this:


     d description     DS
     d   type                         5a

But this leaves us with nowhere to put the content of the description since the content of a DS is the sum of its subfields and any data placed there would overwrite those subfields. In other words, in our situation, the description would overwrite the type field (or vice versa). Not a lot of help! In theory, a DS that looks like the one below should solve the problem:

     d description     DS                  Qualified Dim(2)
     d   description               1000a   Varying
     d   type                         5a

In this case, the <Description> would be stored in the field description.description and the "type" attribute would be stored in description.type. Makes sense, doesn't it? Maybe to you, but sadly, not to the compiler.


IBM is aware of this deficiency, and it is on their "to-do" list, but don't expect to see it in V6R1. And don't hold me to it working the way I have described it here; IBM may well have other ideas.

So if we cannot create a DS that matches the structure of the XML data, then we cannot use XML-INTO or at least cannot use it for the whole task. So what are our options?

There are effectively three options:

- The first is to take advantage of RPG's XML-SAX op-code. This can be used either by itself to process the entire document or as a follow-on to an XML-INTO parse to "fill in the gaps." We will be dealing with the usage of XML-SAX in the balance of this article.

- The second is to reformat the document by using an XSL transform so that it is in a format that can be expressed in RPG terms. This is the approach recommended in the IBM Redbook The Ins and Outs of XML and DB2 UDB for i5/OS. If you have the required XSL skills or are prepared to develop them, this is certainly a valid option and can also help to deal with other issues, such as empty elements. Since the Redbook provides a good working example, we won't duplicate that work here.

- Another option would be to process the document in two passes using XML-INTO with a different target DS on each pass. You would also need to use the "AllowExtra" and "AllowMissing" processing options in order to persuade the parser to handle the document since neither of the DSs will exactly match the document. This is not as effective as the XML-SAX option, so we will not be discussing it further.

XML-SAX

The operation of XML-SAX is very different from that of XML-INTO. XML-INTO parses the data from many elements at a time and places the parsed content into the appropriate field in the target DS or array. XML-SAX, on the other hand, parses the document one event at a time. Examples of events include the beginning of an element (i.e., its starting tag), the value of an element, the end of an element (i.e., its ending tag), the name of an attribute, the value of the attribute, etc.

With XML-INTO, the use of a handler procedure is optional, but with XML-SAX %HANDLER must always be specified. Your handler procedure will be called for every event that the parser encounters. It is up to your logic to decide if it should simply ignore the event or react to it in some way.

Logic is needed in the handler to recognize and react to the beginning of each element and attribute and to store the values in the appropriate places. You will perhaps get a better idea of the kind of logic that might be required if you study the list below. It represents the sequence of events and the associated data (in parentheses) that would be passed to the handler when processing the section of the XML document that begins at (A) above and ends at (B).

- Start Element (Description)
- Attribute Name (type)
- Attribute Characters (short)
- End Attribute (type)
- Element Characters (Two slot chrome)
- End Element (Description)
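
To watch this sequence arrive for yourself, a small test program along the following lines may help. Treat it as a minimal sketch rather than production code: the handler name ShowEvent, the variables xmlDoc, evtName, and msg, and the one-line document held in a program variable are all assumptions made for illustration. The handler displays each of the six events listed above, using the named constants that RPG supplies for them, and ignores every other event.

```rpgle
     H DftActGrp(*No)

     D ShowEvent       PR            10i 0
     D   commArea                     1a
     D   event                       10i 0 Value
     D   pstring                       *   Value
     D   stringLen                   20i 0 Value
     D   exceptionId                 10i 0 Value

       // Hypothetical one-line document, held in a variable
     D xmlDoc          S            100a   Varying
     D                                     Inz('<Description type="short">+
     D                                     Two slot chrome</Description>')
     D dummyCommArea   S              1a

      /free
       // No doc= option, so xmlDoc is treated as the document itself
       XML-SAX %HANDLER(ShowEvent : dummyCommArea) %XML(xmlDoc);
       *InLR = *On;
      /end-free

     P ShowEvent       B
     D                 PI            10i 0
     D   commArea                     1a
     D   event                       10i 0 Value
     D   pstring                       *   Value
     D   stringLen                   20i 0 Value
     D   exceptionId                 10i 0 Value

     D string          S          65535a   Based(pstring)
     D evtName         S             22a   Varying
     D msg             S             52a

      /free
       Select;
         When event = *XML_START_ELEMENT;
           evtName = 'Start Element';
         When event = *XML_ATTR_NAME;
           evtName = 'Attribute Name';
         When event = *XML_ATTR_CHARS;
           evtName = 'Attribute Characters';
         When event = *XML_END_ATTR;
           evtName = 'End Attribute';
         When event = *XML_CHARS;
           evtName = 'Element Characters';
         When event = *XML_END_ELEMENT;
           evtName = 'End Element';
         Other;
           return 0;      // Ignore every other event
       EndSl;

       // Only the first stringLen bytes behind the pointer are valid
       If stringLen > 0;
         msg = evtName + ' (' + %subst(string : 1 : stringLen) + ')';
       Else;
         msg = evtName;
       EndIf;
       Dsply msg;

       return 0;          // Zero tells the parser to continue
      /end-free
     P ShowEvent       E
```

DSPLY messages are limited to 52 characters, which is why the text is assigned to msg first; anything longer is simply truncated in this sketch.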



Notice that when we receive the element and attribute data, we have no idea which element/attribute it belongs to. That is up to us to determine. In fact, this is not a difficult task as the data will always belong to the last element/attribute that began but has not yet ended. With so many events being signaled to your handler, you can no doubt see that writing the logic to completely process even a simple document with XML-SAX would be somewhat tedious, requiring a lot of rather repetitive code. Luckily, we rarely require all of the data in a document, and we also have the option to combine XML-SAX with XML-INTO to simplify our task.

So to handle the situation in our example, that is what we will do. We will use XML-INTO to capture the bulk of the data and then process again using XML-SAX to fill in the missing piece: the type codes associated with the descriptions.

Let's look at the code that achieves this (shown at the end of this article).

The first thing to notice is the change in the product DS (A). Notice that we have made the description field an array with two elements and also added the type field as a two-element array. Note that the name of the type field in the DS (descrType) does not match the name of the attribute (type) to ensure that XML-INTO will not try to populate it and to make that fact more obvious to those who come after us. In fact, there is no need to actually include the type in the DS at all, but it is convenient to keep all the data together.

The XML-INTO must have the "allowextra=yes" option specified (B) to accommodate the extra type fields. Without this option, the parse would fail since the new version of the DS no longer corresponds to the XML document. Once XML-INTO has completed, we invoke XML-SAX (C) to reprocess the document.


There is no difference in the definition of %HANDLER, but there is a difference between the information passed to an XML-SAX handler and the information passed to the XML-INTO handler we saw in the last article. Take a look at the prototype at (D) and you will see what I mean. The only parameter that is common to the two versions is the first one, the Communication Area. The remaining parameters are as follows:

- event is a four-byte integer that identifies the type of event being processed. Don't worry about the fact that the event is identified by a number. As you will see later, RPG supplies a number of named constants that can be compared with the event value.

- pstring is a pointer to the beginning of the string containing the event data (e.g., the element/attribute names or data).

- stringLen is the length of the string "pointed to" by the previous parameter. This length must be used to determine if data is present as there are occasions when a valid pointer is passed even though there is no data. Only the number of characters indicated by this parameter should be processed.

- exceptionId is an error code identifying any error passed to the handler by the parser. We will not be discussing this in this article. Check the RPG manuals for more information.

Having seen the parameters passed to the handler, it is time to study the mechanics of the handler procedure MySAXHandler. The first step (E) is to check whether any data was received. If no data is received, then the handler simply returns control to the parser. If data is present, then the procedure RmvWhiteSpace( ) is called to remove any unwanted characters and reduce them to a single space. We will look at what I mean by "unwanted" in a moment. Notice that %SUBST is used to pass only the valid portion of the data to the subprocedure. Remember, we were passed only a pointer and a length, and there is probably other data beyond the point indicated by the length parameter. It is worth noting at this point that the field string, which is based on the pointer, can be very useful during debug. If you display it, you will usually be able to see not only the data you are about to process, but also the next part of the XML document. In other words, you will know what to expect next and can perhaps set appropriate breakpoints. This is not guaranteed as sometimes the pointer references a work area, but it is worth remembering.

What do we mean by "unwanted" and why do we need the RmvWhiteSpace routine? Because carriage returns, new lines, tabs, and excess spaces are often present in XML data (sometimes to make it look "pretty"), and we need to remove them from the data. We will not be studying the detail of this procedure, but you will find it included in the version of the program that is available for download. Hopefully, its operation is self-explanatory. (Many thanks to IBM Toronto's Barbara Morris for supplying this routine.)
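
If you do not have the download to hand, the general idea can be sketched as follows. To be clear, this is not the RmvWhiteSpace routine supplied with the article; SquashBlanks, its variable names, and the little test main line are hypothetical stand-ins. The sketch also assumes that the data arrives in an EBCDIC job CCSID, where tab, carriage return, line feed, and new line are x'05', x'0D', x'25', and x'15'. It turns those characters into blanks, keeps only the first blank of each run, and drops leading and trailing blanks.

```rpgle
     H DftActGrp(*No)

     D SquashBlanks    PR         65535a   Varying
     D   input                    65535a   Varying Const

     D msg             S             52a

      /free
       // Two blanks, an EBCDIC line feed and an EBCDIC tab to squash
       msg = SquashBlanks('This  beautiful' + x'25' + x'05' + 'toaster ');
       Dsply msg;
       *InLR = *On;
      /end-free

     P SquashBlanks    B
     D                 PI         65535a   Varying
     D   input                    65535a   Varying Const

       // Assumed EBCDIC code points: tab, CR, LF, NL
     D TAB_CR_LF_NL    C                   x'050D2515'
     D FOUR_BLANKS     C                   '    '
     D work            S          65535a   Varying
     D result          S          65535a   Varying
     D i               S             10i 0
     D ch              S              1a
     D prevBlank       S               n   Inz(*On)

      /free
       // Turn tab, CR, LF and NL characters into ordinary blanks
       work = %xlate(TAB_CR_LF_NL : FOUR_BLANKS : input);

       // Copy the data, keeping only the first blank of each run.
       // prevBlank starts *On so that leading blanks are dropped too.
       result = '';
       For i = 1 to %len(work);
         ch = %subst(work : i : 1);
         If ch <> ' ' or not prevBlank;
           result = result + ch;
         EndIf;
         prevBlank = (ch = ' ');
       EndFor;

       // At most one trailing blank can remain; remove it
       return %trimr(result);
      /end-free
     P SquashBlanks    E
```

Copying one character at a time keeps the logic easy to follow, at the cost of speed on very large values; if the parser hands you UCS-2 data instead, the hex constants would need to change.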

At (F), the real work begins. A SELECT group is used to identify the type of event we are handling; this is where the named constants mentioned earlier come into play. For example, *XML_START_ELEMENT represents the event code that announces the arrival of a new element name. In the SELECT group at (G), we then identify the specific element that we are dealing with and process accordingly. All this logic is really doing is setting up the appropriate array indices for the Category, Product, and Description arrays. Since we know that the document we are processing is the same one that we just parsed with XML-INTO, we can afford to short-circuit the process, so no attempt is made to match the product codes with the descriptions or anything.

If the event does not represent the beginning of an element, then we next test to see if it is an attribute name (H). If it is, we check to see if it is the type attribute, and if so, we turn on the waitingForType indicator. This indicator allows us to associate the attribute data when it arrives (I) as belonging to the type attribute. Remember, we said earlier that it is up to us to determine that. We then store the value for the type attribute in the appropriate descrType array element.

After processing the document, the XML-SAX parse completes and control returns to the program's main line at (J). At this point, the complete content of the XML document has been stored in our category DS, so our program can process or store that data as necessary. In this simple example, we will just display the data. The logic simply loops through all of the categories and products. As in our previous example, the category loop is controlled by the RPG-supplied xmlElements count in the Program Status Data Structure, which was populated by the XML-INTO operation, and the product loop completes when a blank product code is encountered. The format of our XML document is such that there must be a short description, so the first elements of the description and type arrays are displayed. At (K), the logic then tests to see if a second set is present and, if it is, displays the relevant data.

And that's really all there is to it. I won't describe it here, but I have included in the source code accompanying this article a utility program (XMLSAXLIST) that you might find useful when studying XML documents that you need to process. It uses XML-SAX to parse the document and produces a listing of all the events signaled and the length and content of the associated data. If you run the program, you will be able to see the effect of the RmvWhiteSpace procedure as the original length of the data item is included. If you have any questions about the operation of the program, please let me know.

     H Option(*NoDebugIO : *SrcStmt )

       // This count is populated by XML-INTO whenever the INTO
       // variable is an array
     D progStatus     SDS
     D   xmlElements                 20i 0 Overlay(progStatus: 372)

(D)  D MySAXHandler    Pr            10i 0
     D   commArea                          Like(dummyCommArea)
     D   event                       10i 0 Value
     D   pstring                       *   Value
     D   stringLen                   20i 0 Value
     D   exceptionId                 10i 0 Value

     D RmvWhitespace   pr         65535a   Varying
     D   input                    65535a   Varying Const

     D category        DS                  Qualified Dim(20)
     D   code                         2a
     D   catDescr                    20a
     D   product                           LikeDS(product) Dim(50)

     D product         DS                  Qualified
     D   code                         4a
(A)  D   descrType                    5a   Dim(2)
     D   description                600a   Dim(2)
     D   mSRP                         7p 2
     D   sellPrice                    7p 2
     D   qtyOnHand                    5i 0

     D XML_Source      S            256a   Varying
     D                                     Inz('/Partner400/XML/Example5.xml')

       // Short version of Description for display purposes
     D dispDescription...
     D                 S             40a
     D dummyCommArea   S              1a
     D i               S              5i 0
     D p               S              5i 0

      /Free
(B)    XML-INTO category
         %XML(XML_Source: 'case=any doc=file allowextra=yes +
                           allowmissing=yes');

       // XML-INTO has filled the category array
       // Next we use XML-SAX to fill in the missing type details
(C)    XML-SAX %HANDLER(MySAXHandler: dummyCommArea)
               %XML(XML_Source: 'doc=file');

       Dsply ('xmlElements = ' + %char(xmlElements) );

       // The XML parser's element count is used to control the loop
(J)    For i = 1 to xmlElements;
         Dsply ('Cat: ' + category(i).code + ' ' +
                category(i).catDescr );
         For p = 1 to %Elem(category.product);
           If category(i).product(p).code = *Blanks;
             Leave;   // Exit once blank product code entry located
           Else;
             // Process the current product entry
             dispDescription = category(i).product(p).description(1);
             Dsply ('Product: ' + dispDescription);
             Dsply ('Type: ' + category(i).product(p).descrType(1));
             // If second description is present, display details
(K)          If category(i).product(p).description(2) <> *Blanks;
               dispDescription = category(i).product(p).description(2);
               Dsply ('Product: ' + dispDescription);
               Dsply ('Type: ' + category(i).product(p).descrType(2));
             EndIf;
           EndIf;
         EndFor;
       EndFor;

       *InLR = *On;
      /End-Free

       // SAX handler
     P MySAXHandler    B
     D                 PI            10i 0
     D   commArea                          Like(dummyCommArea)
     D   event                       10i 0 Value
     D   pstring                       *   Value
     D   stringLen                   20i 0 Value
     D   exceptionId                 10i 0 Value

     D string          S          65535a   Based(pstring)
     D data            S          65535a   Varying

       // Static variables used by handler logic
     D catIndex        S             10i 0 Static
     D prodIndex       S             10i 0 Static
     D descIndex       S              5i 0 Static
     D waitingForType  S               n   Static

       // Constants to identify the element and attribute
       // names we are interested in.
     D categorElem     C                   'Category'
     D prodElem        C                   'Product'
     D descrElem       C                   'Description'
     D typeAttr        C                   'type'

      /free
       // If any data is supplied strip whitespace from it
       // otherwise just return to parser
(E)    If stringLen > 0;
         data = RmvWhiteSpace(%subst(string : 1 : stringLen));
       Else;
         return 0;
       endif;

(F)    Select;
       When event = *XML_START_ELEMENT;
         // Whenever we start a new element, we increment the index
         // for that level and zero the index for the next level.
(G)      Select;
         When data = categorElem;
           catIndex += 1;
           prodIndex = 0;
         When data = prodElem;
           prodIndex += 1;
           descIndex = 0;
         When data = descrElem;
           descIndex += 1;
         EndSl;

(H)    When event = *XML_ATTR_NAME;
         // Turn on "waiting" indicator when beginning a "type" attribute
         If data = typeAttr;
           waitingForType = *On;
         EndIf;

(I)    When event = *XML_ATTR_CHARS;
         // If waiting for type information then store type
         if waitingForType;
           category(catIndex).product(prodIndex).descrType(descIndex)
             = data;
           waitingForType = *Off;
         EndIf;
       EndSl;

       return 0;
      /end-free
     P MySAXHandler    E

http://www.mcpressonline.com

Powered by Joomla!

Generated: 29 August, 2008, 00:07
