www.krengeltech.com

Parsing XML With RXS


DOM-Based Parser

User Guide (v2.00+)

As of version 2.00 of RPG-XML Suite, a new XML parser was introduced that takes a much simpler approach to XML parsing. This approach doesn't require the RPG programmer to understand procedure pointers or “call-backs”. Instead, you can access XML data with a one-line subprocedure call such as:


     myVar = RXS_DOMGetData('/PostAdr/name/first/');


This new approach to parsing can save the RPG programmer a lot of time.

You can learn more about these parsing APIs by looking at RXS_DOMBuild and the related APIs.
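To make the DOM idea concrete outside of RPG, here is a minimal Python sketch. It uses the standard library's ElementTree, not RXS. It builds the whole tree once and then looks up a value by path, in the spirit of the one-line RXS_DOMGetData call above. The helper `dom_get_data` and its handling of the trailing '/' are assumptions made for illustration. They do not reproduce RXS_DOMGetData's real behavior or error handling.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the PostAdr sample document used on this page.
SAMPLE = """<PostAdr residential="true">
  <name title="Mr.">
    <first>Aaron</first>
    <last>Bartell</last>
  </name>
  <zip>56001</zip>
</PostAdr>"""


def dom_get_data(root: ET.Element, path: str) -> str:
    """Return the text addressed by a path such as '/PostAdr/name/first/'.

    Hypothetical helper for illustration; it is not the RXS API.
    Returns an empty string when the path does not match.
    """
    # Strip the leading '/' and the trailing '/' that marks element content.
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:
        return ""
    # ElementTree's find() takes a path relative to the root element.
    node = root if len(steps) == 1 else root.find("/".join(steps[1:]))
    if node is None or node.text is None:
        return ""
    return node.text


root = ET.fromstring(SAMPLE)  # build the whole tree once
print(dom_get_data(root, "/PostAdr/name/first/"))  # Aaron
```

Once the tree is built, every further lookup is just another call with a different path. This is what makes the DOM style simpler than registering call-backs.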


Concerns and Limitations

  • For releases of the DOM Parser up to and including v2.3, RXS_DOMBuild can parse at most 32,767 data elements. Both element content values and attribute values count as data elements. For most XML parsing needs this limit is not an issue. If a document has more than 32,767 data elements, you can either use the traditional approach to parsing with RXS_parse or upgrade to v2.4 of the DOM Parser, where this limitation no longer exists.
  • The event-based parser offers better performance but is more complex to use.
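Before choosing the DOM Parser on v2.3 or earlier, it can help to estimate how many data elements a document holds. The Python sketch below is not RXS code. It counts one data element per attribute value and one per element whose text is not just whitespace. That counting rule is our assumption about how the limit is measured, so treat the result as an estimate.

```python
import xml.etree.ElementTree as ET

DOM_LIMIT_V23 = 32767  # data-element limit for DOM Parser releases up to v2.3

SAMPLE = """<PostAdr residential="true">
  <name title="Mr.">
    <first>Aaron</first>
    <last>Bartell</last>
  </name>
  <street>123 Center Rd</street>
  <cty>Mankato</cty>
  <state>MN</state>
  <zip>56001</zip>
  <phone>123-123-1234</phone>
  <phone>321-321-4321</phone>
</PostAdr>"""


def count_data_elements(xml_text: str) -> int:
    """Estimate data elements: attribute values plus non-blank element content.

    Assumption: whitespace-only text (indentation) is not counted, and text
    that follows a child element (mixed content) is ignored.
    """
    root = ET.fromstring(xml_text)
    total = 0
    for elem in root.iter():  # the root element and every descendant
        total += len(elem.attrib)  # one per attribute value
        if elem.text and elem.text.strip():
            total += 1  # one for the element's content value
    return total


count = count_data_elements(SAMPLE)
fits_dom_v23 = count <= DOM_LIMIT_V23
print(count, fits_dom_v23)
```

For the sample address this gives 2 attribute values plus 8 content values, far below the limit.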

Event-Based Parser

From an RPG programmer’s perspective, little about parsing XML documents is entirely new. Remember all those reports you wrote using an input primary physical file? The RPG cycle read each record, and you specified which section of code to execute each time one of the level break fields changed value (for example, printing totals when the customer number changed). XML parsers work with the same mindset: the programmer tells the XML parser which data changes to act on as it parses the XML document. Instead of calling them level or control breaks, an XML parser describes them as "events" that occur during the parsing of an XML document.

Four types of events take place while parsing an XML document:

  1. Encountering the beginning of an element;
  2. Encountering an attribute of an element;
  3. Encountering the content of an element;
  4. Encountering the end of an element.

The following example includes each of those events.

XML Event Parsing diagram


The example above shows all of the events that occur while the XML document is parsed. Each event is numbered to show the order in which the XML parser detects it. The parser reads the document left to right, top to bottom.
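As a rough, runnable illustration of the four event types and their order, the Python sketch below uses the standard `xml.sax` module rather than RXS. The class `EventRecorder`, the event labels, and the `/PostAdr@residential` style of attribute path are hypothetical choices for this example. Whitespace-only text between tags is skipped, which may differ from what RXS reports.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records begin, attribute, content and end events in document order."""

    def __init__(self):
        super().__init__()
        self.path = []    # names of the currently open elements
        self.text = []    # character data seen since the last tag
        self.events = []  # (event label, path, data) tuples

    def _where(self) -> str:
        return "/" + "/".join(self.path)

    def _flush_content(self):
        # Report accumulated text as one content event; skip indentation.
        data = "".join(self.text).strip()
        self.text = []
        if data:
            self.events.append(("content", self._where(), data))

    def startElement(self, name, attrs):
        self._flush_content()
        self.path.append(name)
        self.events.append(("begin", self._where(), ""))
        for attr_name in attrs.getNames():
            self.events.append(
                ("attribute", self._where() + "@" + attr_name, attrs.getValue(attr_name))
            )

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        self._flush_content()
        self.events.append(("end", self._where(), ""))
        self.path.pop()


recorder = EventRecorder()
xml.sax.parseString(
    b'<PostAdr residential="true"><name title="Mr."><first>Aaron</first></name></PostAdr>',
    recorder,
)
for number, event in enumerate(recorder.events, start=1):
    print(number, event)
```

The numbered output follows the same left-to-right, top-to-bottom order: each element begins, its attributes follow, then its content, and finally its end.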

Every time the parser encounters one of these items, it detects an event. It uses four pieces of information to describe each event: the type of event, the XPath of the item encountered, the data content, and the data length. The copybook named RXSCP defines named constants, such as RXS_ELEMCONTENT, that denote the types of events:


 
     D RXS_ELEMCONTENT...
     D                 c                   const('/')
     D RXS_ELEMEND     c                   const('/>')
     D RXS_ELEMBEGIN   c                   const('>')
     D RXS_ATTR        c                   const('@')


The XPath of each event ends with its own character or characters. The XPath of an element begin event ends with ‘>’. The XPath of an element content event ends with ‘/’. The XPath of an element end event ends with ‘/>’. When the parser encounters an attribute, the XPath includes the ‘@’ sign followed by the attribute name.
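The ending conventions can be captured in a few lines. This hypothetical Python sketch copies the constant values shown above from RXSCP and uses them to build and classify event XPaths. The function names are invented for illustration. The attribute format is also an assumption: this sketch places the attribute name directly after '@', with no '/' in between (for example `/PostAdr@residential`).

```python
# Event suffixes, copied from the constant values shown for RXSCP.
RXS_ELEMBEGIN = ">"
RXS_ELEMCONTENT = "/"
RXS_ELEMEND = "/>"
RXS_ATTR = "@"


def event_xpath(kind: str, element_path: str, attr_name: str = "") -> str:
    """Build the event XPath for an element path such as '/PostAdr/zip'."""
    if kind == "begin":
        return element_path + RXS_ELEMBEGIN
    if kind == "content":
        return element_path + RXS_ELEMCONTENT
    if kind == "end":
        return element_path + RXS_ELEMEND
    if kind == "attribute":
        # Assumed format: attribute name directly after '@'.
        return element_path + RXS_ATTR + attr_name
    raise ValueError(f"unknown event kind: {kind}")


def event_kind(xpath: str) -> str:
    """Classify an event XPath by its ending."""
    # '/>' also ends with '>', so test the end suffix before the begin suffix.
    if xpath.endswith(RXS_ELEMEND):
        return "end"
    if xpath.endswith(RXS_ELEMBEGIN):
        return "begin"
    if xpath.endswith(RXS_ELEMCONTENT):
        return "content"
    if RXS_ATTR in xpath:
        return "attribute"
    raise ValueError(f"unrecognized event XPath: {xpath}")


for kind in ("begin", "content", "end"):
    print(kind, event_xpath(kind, "/PostAdr/zip"))
print("attribute", event_xpath("attribute", "/PostAdr", "residential"))
```

The order of the tests in `event_kind` is the one subtle point. Because '/>' ends in '>', checking the begin suffix first would misclassify every end event.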

Why does knowing about XML events matter? It allows the programmer to tell the parser which events to act on. For each event an application needs to track, the programmer provides the XPath of the element or attribute. Along with each XPath, the programmer specifies a subprocedure that the parser should call when it comes across that event. The following code demonstrates these principles:


     RXS_addHandler('/PostAdr/zip/': %paddr(zipHandler));


RXS_addHandler maps events to subprocedures. It “registers” subprocedures with the XML parser so the parser knows what to do when it encounters events of interest, in this case the contents of the <zip> tag. The first parameter passed to RXS_addHandler is the XPath for <zip>'s contents (note the trailing forward slash). The second parameter is the address of the local subprocedure that the parser should call when it encounters <zip>’s content while parsing the document. Using the RXS_ELEMCONTENT constant makes it clearer to code reviewers which event the handler is registered for, as in the following:


     RXS_addHandler('/PostAdr/zip' + RXS_ELEMCONTENT: %paddr(zipHandler));
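Conceptually, handler registration is a lookup table from event XPath to call-back. The Python sketch below models that idea with a dictionary. `add_handler` and `dispatch` are hypothetical stand-ins for what RXS_addHandler and the parser do. They do not describe how RXS is actually implemented.

```python
from typing import Callable, Dict, List

RXS_ELEMCONTENT = "/"  # same value as the RXSCP constant

# A handler receives: event type, XPath, data, data length.
Handler = Callable[[str, str, str, int], None]
handlers: Dict[str, Handler] = {}


def add_handler(xpath: str, handler: Handler) -> None:
    """Register a call-back for one event XPath (models RXS_addHandler)."""
    handlers[xpath] = handler


def dispatch(event_type: str, xpath: str, data: str) -> bool:
    """Call the handler registered for xpath; unregistered events are ignored."""
    handler = handlers.get(xpath)
    if handler is None:
        return False
    handler(event_type, xpath, data, len(data))
    return True


zips: List[str] = []


def zip_handler(p_type: str, p_xpath: str, p_data: str, p_data_len: int) -> None:
    zips.append(p_data)


add_handler("/PostAdr/zip" + RXS_ELEMCONTENT, zip_handler)
handled = dispatch(RXS_ELEMCONTENT, "/PostAdr/zip/", "56001")   # calls zip_handler
skipped = dispatch(RXS_ELEMCONTENT, "/PostAdr/cty/", "Mankato")  # nothing registered
```

`'/PostAdr/zip' + RXS_ELEMCONTENT` and the literal `'/PostAdr/zip/'` produce the same key. Both registration styles shown above therefore select the same event.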


The XML parser issues what’s termed a “call-back” to your program. That means the parser has the ability to call a subprocedure local to your program to notify your program of parser events.

Once an application registers all necessary handlers with the parser, the application can call the RXS_parse subprocedure:


     RXS_parse(gXml: RXS_VAR: %paddr(errHandler));


The first parameter, gXml, is a globally defined variable. In the example used so far, the value is the following XML:

  <PostAdr residential="true">
    <name title="Mr.">
      <first>Aaron</first>
      <last>Bartell</last>
    </name>
    <street>123 Center Rd</street>
    <cty>Mankato</cty>
    <state>MN</state>
    <zip>56001</zip>
    <phone>123-123-1234</phone>
    <phone>321-321-4321</phone>
  </PostAdr>


The second parameter, RXS_VAR, tells the parser to treat the value in the first parameter as raw XML data. Passing RXS_STMF in the second position instead tells the parser to treat the first parameter as the IFS path to a stream file containing XML. The third parameter of RXS_parse is a procedure pointer to a local subprocedure named errHandler. The parser calls errHandler if it comes across XML that it cannot parse (because of missing end tags, for example).
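The error call-back pattern can be sketched in Python with ElementTree, whose `ParseError` carries a `(line, column)` position. The `parse_var` and `err_handler` names are hypothetical. The handler builds its message in the same "Line:... Column:..." shape as the RPG errHandler. Note that ElementTree counts lines from 1 and columns from 0, and RXS may number them differently.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Call-back signature: current line, current column, error text.
ErrHandler = Callable[[int, int, str], None]


def parse_var(xml_text: str, err_handler: ErrHandler) -> bool:
    """Parse raw XML text, reporting any syntax error through err_handler."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, column = exc.position  # ElementTree: 1-based line, 0-based column
        err_handler(line, column, str(exc))
        return False
    return True


errors: List[str] = []


def err_handler(cur_line: int, cur_col: int, err_str: str) -> None:
    # Same message layout as the RPG errHandler.
    errors.append("Line:" + str(cur_line) + " Column:" + str(cur_col) + " " + err_str)


good = parse_var("<PostAdr><zip>56001</zip></PostAdr>", err_handler)
bad = parse_var("<PostAdr><zip>56001</PostAdr>", err_handler)  # <zip> never closed
print(errors)
```

As with errHandler, the call-back only records the problem. The caller decides what to do once parsing returns.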


      //----------------------------------------------------------------
      // @Desc: Handle any errors that the parser encounters.
      //----------------------------------------------------------------
     P errHandler      B
     D errHandler      PI
     D  pCurLine                     10i 0 value
     D  pCurCol                      10i 0 value
     D  pErrStr                    1024a   value varying
      /free
       gError.code = 'PARSE1.1';
       gError.severity = 100;
       gError.pgm = 'PARSE.errHandler';
       gError.text =
         'Line:' + %char(pCurLine) +
         ' Column:' + %char(pCurCol) +
         ' ' + pErrStr;
      /end-free
     P                 E


Once an application invokes the RXS_parse subprocedure, control does not return to the mainline program until the entire XML stream has been parsed. However, the parser WILL call the local subprocedures that have been registered for event handling. For example, when the parser comes across <zip>’s contents, it might call a local subprocedure like zipHandler, defined below:


      //---------------------------------------------
      // @Desc: Handle the <zip> contents event.
      //---------------------------------------------
     P zipHandler      b
     D zipHandler      pi
     D  pType                              value like(RXS_Type)
     D  pXPath                             value like(RXS_XPath)
     D  pData                              value like(RXS_XmlData)
     D  pDataLen                           value like(RXS_Length)
      /free
       RXS_log(RXS_DIAG: 'Zip code is:' + pData);
      /end-free
     P                 e


The zipHandler subprocedure has four parameters: pType, pXPath, pData, and pDataLen. Each event handler must use these four parameter definitions so the parser can report what it found. The parser puts a code for the event detected in the pType parameter. When the parser calls zipHandler, the value equals the RXS_ELEMCONTENT constant. The parser puts the full path of the event detected in the pXPath parameter. In this example the value is '/PostAdr/zip/'. Note that the forward slash at the end of the path specifies the contents of the element rather than its beginning or end. pData contains the value of the <zip> element, which is '56001'. pDataLen is straightforward: it contains the length of the data in pData, which is 5.
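To tie the pieces together, the following Python sketch registers a handler for '/PostAdr/zip/'. It uses the standard `xml.sax` module, not RXS. The handler is called with the same four values described above: the event type, the full XPath, the data, and its length. `CallbackParser` and `zip_handler` are hypothetical names, and only element-content events of leaf elements are modeled. As with RXS_parse, `xml.sax.parseString` does not return until the whole document has been parsed.

```python
import xml.sax
from typing import Callable, Dict, List, Tuple

RXS_ELEMCONTENT = "/"  # same value as the RXSCP constant
Handler = Callable[[str, str, str, int], None]


class CallbackParser(xml.sax.ContentHandler):
    """Calls registered handlers with (type, XPath, data, length) for leaf content."""

    def __init__(self, handlers: Dict[str, Handler]):
        super().__init__()
        self.handlers = handlers
        self.path: List[str] = []
        self.text: List[str] = []

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = []  # restart text collection for the new element

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        data = "".join(self.text).strip()
        xpath = "/" + "/".join(self.path) + RXS_ELEMCONTENT
        handler = self.handlers.get(xpath)
        if data and handler is not None:
            # Arguments mirror pType, pXPath, pData and pDataLen.
            handler(RXS_ELEMCONTENT, xpath, data, len(data))
        self.path.pop()
        self.text = []


calls: List[Tuple[str, str, str, int]] = []


def zip_handler(p_type: str, p_xpath: str, p_data: str, p_data_len: int) -> None:
    calls.append((p_type, p_xpath, p_data, p_data_len))
    print("Zip code is:" + p_data)  # stands in for RXS_log


SAMPLE = b"""<PostAdr residential="true"><name title="Mr."><first>Aaron</first>
<last>Bartell</last></name><zip>56001</zip><phone>123-123-1234</phone></PostAdr>"""

xml.sax.parseString(SAMPLE, CallbackParser({"/PostAdr/zip/": zip_handler}))
```

Only the registered XPath triggers a call. The other elements are parsed but produce no handler activity, just as unregistered events are ignored by RXS_parse.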

zipHandler is interested only in pData, but other handlers might use more of the input parameters. zipHandler uses the RXS_log subprocedure to send a diagnostic message to the job log that looks like this: 'Zip code is:56001'. For a more complete treatment of this example, refer to source file EXAMPLE, member PARSE1 in the RXS library (see Included RXS Example Programs).

To summarize this process, the following diagram illustrates the flow while parsing an XML document with RXS:

Flow of parsing XML with RXS