XML Tools 2.9

SAX-like XML Event Processing

As of XML Tools 2.6, you can implement event-based XML processing by passing an AppleScript script object to the new SAX handler parameter of the parse XML command. This script object is expected to provide a series of handlers that respond to XML parsing events.

This approach is useful when you want to populate a custom data structure directly from XML data instead of extracting the data from the nested collection of XML element classes normally generated by the parse XML command.

Here is a very simple example illustrating how this works:

script EventProcessor
   property elementNames : {}
   
   on XMLStartElement(elementName, elementAttributes)
      -- called when an XML element begins
      set end of elementNames to elementName
   end XMLStartElement
end script

set theXML to "<data>
    <test/>
    <test name=\"mark\">
   data in second test element
    </test>
    data in root element
</data>"

set xxx to parse XML theXML SAX handler EventProcessor
xxx's elementNames
-- Result:
--    {"data", "test", "test"}

In this example, a copy of the EventProcessor script object is passed to the parse XML command. As the parse XML command parses the XML data, it calls the EventProcessor's XMLStartElement handler whenever a new XML element begins. When parsing completes, the EventProcessor object is returned to AppleScript. In this particular case, the XMLStartElement handler records the name of each XML element tag.
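The same mechanism can populate a custom data structure directly, as described earlier. The following is a minimal sketch, not a definitive implementation: the script name TestCollector, its properties, and the record labels tagName and tagText are hypothetical names chosen for illustration. It uses only the XMLStartElement, XMLCharacterData, XMLEndElement, and XMLParseResult handlers documented on this page, and it assumes that character data for one element may be delivered in more than one XMLCharacterData call, so the text is accumulated in a buffer.

```applescript
-- Sketch only: collects each <test> element into a list of records.
script TestCollector
   property collectedTests : {} -- the custom data structure being built
   property currentText : missing value -- text buffer, set only while inside a <test> element
   
   on XMLStartElement(elementName, elementAttributes)
      -- start a fresh buffer when a <test> element begins
      if elementName is "test" then set currentText to ""
   end XMLStartElement
   
   on XMLCharacterData(xmlData)
      -- append only while inside a <test> element (assumption: data may arrive in pieces)
      if currentText is not missing value then set currentText to currentText & xmlData
   end XMLCharacterData
   
   on XMLEndElement(elementName)
      -- store one record per <test> element, then leave "inside <test>" state
      if elementName is "test" then
         set end of collectedTests to {tagName:elementName, tagText:currentText}
         set currentText to missing value
      end if
   end XMLEndElement
   
   on XMLParseResult()
      -- return only the collected list instead of the whole script object
      return collectedTests
   end XMLParseResult
end script

set theXML to "<data><test/><test name=\"mark\">data in second test element</test></data>"
parse XML theXML SAX handler TestCollector
```

Because XMLParseResult is defined, parse XML should return the list of records rather than the script object. Text that belongs to the root element is ignored here because the buffer is only active between the start and end of a test element.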

Here is an example of an event handler object implementing all the handlers that the parse XML command can call. You need only include the handlers for the events that you are interested in handling:

script AllEventHandlers
   
   on XMLStartElement(elementName, elementAttributes)
      -- called when a new XML element begins
      display dialog "XMLStartElement: " & elementName & ", Attributes: " & (length of elementAttributes)
   end XMLStartElement
   
   on XMLEndElement(elementName)
      -- called when an XML element ends
      display dialog "XMLEndElement: " & elementName
   end XMLEndElement
   
   on XMLCharacterData(xmlData)
      -- called when there is XML data for an element
      display dialog "XMLCharacterData: " & xmlData
   end XMLCharacterData
   
   on XMLComment(comment)
      -- called when an XML comment is encountered
      -- must call parse XML with including comments
      display dialog "XMLComment: " & comment
   end XMLComment
   
   on XMLDefaultContent(xmlData)
      -- called for content outside the root element (e.g. the XML declaration)
      display dialog "XMLDefaultContent: " & xmlData
   end XMLDefaultContent
   
   on XMLStartCData()
      -- called at the beginning of an XML CData section
      display dialog "XMLStartCData"
   end XMLStartCData
   
   on XMLEndCData()
      -- called at the end of an XML CData section
      display dialog "XMLEndCData"
   end XMLEndCData
   
   on XMLStartNamespace(prefix, uri)
      -- called when a namespace reference begins
      display dialog "XMLStartNamespace: " & prefix & ", URI: " & uri
   end XMLStartNamespace
   
   on XMLEndNamespace(prefix)
      -- called when a namespace reference ends
      display dialog "XMLEndNamespace: " & prefix
   end XMLEndNamespace
   
   on XMLProcessingInstruction(target, piData)
      -- called when an XML processing instruction is encountered
      -- must call parse XML with including processing instructions
      display dialog "XMLProcessingInstruction: " & target & ", Data: " & piData
   end XMLProcessingInstruction
   
   on XMLNotStandalone()
      -- called when XML is not standalone, and there is no DTD. Return true to allow processing to
      -- continue. If this handler is missing, parse XML's strict standalone parameter value is used
      display dialog "XMLNotStandalone"
      return true -- allow processing to continue
   end XMLNotStandalone
   
   on XMLStartDocTypeDecl(docTypeName, systemID, publicID, hasInternalSubset)
      -- called at the beginning of a DOCTYPE declaration
      display dialog "XMLStartDocTypeDecl: " & docTypeName & ", systemID: " & systemID & ¬
         ", publicID: " & publicID & ", hasInternalSubset: " & hasInternalSubset
   end XMLStartDocTypeDecl
   
   on XMLEndDocTypeDecl()
      -- called at the end of a DOCTYPE declaration
      display dialog "XMLEndDocTypeDecl"
   end XMLEndDocTypeDecl
   
   on XMLExternalEntityRef(context, base, systemID, publicID)
      -- called after an external entity (DTD) has been loaded
      display dialog "XMLExternalEntityRef: " & context & ", base: " & base & ¬
         ", systemID: " & systemID & ", publicID: " & publicID
   end XMLExternalEntityRef
   
   on XMLUnparsedEntityDecl(entityName, base, systemID, publicID, notationName)
      -- called when an unparsed entity declaration is encountered
      display dialog "XMLUnparsedEntityDecl: " & entityName & ", base: " & base & ¬
         ", systemID: " & systemID & ", publicID: " & publicID & ", notationName: " & notationName
   end XMLUnparsedEntityDecl
   
   on XMLNotationDecl(notationName, base, systemID, publicID)
      -- called when a notation declaration is encountered
      display dialog "XMLNotationDecl: " & notationName & ", base: " & base & ¬
         ", systemID: " & systemID & ", publicID: " & publicID
   end XMLNotationDecl
   
   on XMLParseResult()
      -- return the data you want parse XML to return. If this handler is omitted, the entire script object is returned
      return "some data"
   end XMLParseResult
end script

NOTE 1: Attributes are passed to the XMLStartElement handler as a record whose keys are the attribute names and whose values are the corresponding attribute values.
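As an illustration of NOTE 1, the following sketch reads one attribute out of the record passed to XMLStartElement. It is a hedged example rather than a confirmed recipe: the script name NameAttributeCollector and the property foundNames are hypothetical, and the code assumes that attribute names appear as user-defined record labels, so that an attribute called name can be read as |name|. Elements without that attribute are skipped by catching the resulting error inside the handler, which keeps the error from aborting the parse.

```applescript
-- Sketch only: records the value of each element's name attribute, if it has one.
script NameAttributeCollector
   property foundNames : {}
   
   on XMLStartElement(elementName, elementAttributes)
      try
         -- assumption: the attribute is exposed under the user-defined label |name|
         set end of foundNames to |name| of elementAttributes
      on error
         -- this element has no name attribute; ignore it and keep parsing
      end try
   end XMLStartElement
   
   on XMLParseResult()
      return foundNames
   end XMLParseResult
end script

parse XML "<data><test/><test name=\"mark\"/></data>" SAX handler NameAttributeCollector
```

The vertical bars are needed because name is a reserved word in AppleScript. Handling the error locally matters here, since an error that escapes an event handler ends the parse.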

NOTE 2: If an error occurs in one of the XML event handlers, parse XML aborts the parse. When this happens, parse XML reports the result of the XMLParseResult() handler (or the script object itself, if XMLParseResult() is not defined) as the partial result of the error. You can extract this information using this syntax:

script SAXHandler
   property elementNames : {}
   
   on XMLStartElement(elementName, elementAttributes)
      -- called when an XML element begins
      set end of elementNames to elementName
      error "Error Message from SAXHandler" -- signal an error to abort parsing the rest of the XML stream
   end XMLStartElement
   
   on XMLParseResult()
      -- return the data you want parse XML to return. If this handler is omitted, the entire script object is returned
      return elementNames
   end XMLParseResult
end script

try
   set xxx to parse XML "<data>
    <!-- a comment -->
    <test/>
    <test name=\"mark\">
   data in second test element
    </test>
    data in root element
</data>" ¬
      SAX handler SAXHandler with including processing instructions and including comments
on error errMsg partial result pr
   {errMsg, pr} -- partial result is the data returned by XMLParseResult
end try
-- Result:
--    {
--       "xmlstartelement SAX handler error: Error Message from SAXHandler",
--       {
--          "data"
--       }
--    }

NOTE 3: Script Debugger's AppleScript debugger is unable to debug XML event handlers while they are being executed by the parse XML command.

Parameters:

Parameter Type Description
SAX handler (new in v2.6) script object

When the SAX handler parameter is specified, parse XML switches to a SAX-like event-based mode of parsing where handlers in the script object specified are called in response to events as the XML data is parsed.

When this parameter is omitted, parse XML performs as it has done in the past and returns an XML document class containing a nested data structure representing the content of the parsed XML data.

strict standalone boolean

Ignored if the event handler object implements the XMLNotStandalone handler.

expanding external entities boolean

By default, external entity references (e.g. DTDs) are ignored since XML Tools is a non-validating XML parser. When expanding external entities is true, XML Tools uses the Mac OS URL Access facilities to access the externally referenced entity.

If the external entity exists on another machine, you must have an active internet connection.

Supported URL formats: file:///..., http://..., and ftp://...

NOTE: The XMLExternalEntityRef handler is called after the external entity has been loaded.

including comments boolean

By default, comments in your XML data are ignored. The including comments parameter must be true in order for the event handler's XMLComment handler to be called.

including processing instructions boolean

By default, XML processing instructions are ignored. The including processing instructions parameter must be true in order for the event handler's XMLProcessingInstruction handler to be called.

serializing boolean

Ignored.

base path string

Provides a base URL for all external entity IDs. For example, the following code uses a DTD loaded from http://www.latenightsw.com/dtds/mydtd.dtd:

parse XML "<?xml version=\"1.0\"?>
<!DOCTYPE data SYSTEM \"mydtd.dtd\">
<data>
    <tag/>
</data>" ¬
   base path "http://www.latenightsw.com/dtds/" with expanding external entities

preserving whitespace boolean

By default, the parse XML command strips all leading and trailing whitespace characters and normalizes multiple whitespace characters within a string to a single space.

NOTE 1: The xml-space="preserve" attribute is honored when preserving whitespace is false.
NOTE 2: The xml-space="ignore" attribute is not honored when preserving whitespace is true.
NOTE 3: Whitespace characters in CDATA sections are never stripped.

When preserving whitespace is true, parse XML returns all XML data, including whitespace.

The parse XML command will strip whitespace according to these rules before calling the event handler's XMLCharacterData handler.

allowing leading whitespace boolean

The XML specification states that well-formed XML documents have no leading whitespace before the <?xml ... ?> declaration. However, for historical reasons, XML Tools allows XML documents to begin with leading whitespace. If allowing leading whitespace is false, XML Tools will report an error when whitespace appears at the beginning of an XML document.

NOTE: This only applies to documents that begin with a <?xml ...?> declaration. If your document does not have an XML declaration, this option is ignored.

seperate namespace URIs boolean

Ignored.
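The including comments and including processing instructions parameters described above can be combined in one call. The following sketch collects comment text and processing instruction targets. The script name CommentAndPICollector, its properties, and the mytarget processing instruction are hypothetical names used only for illustration. The handler names and parameters are the ones documented on this page.

```applescript
-- Sketch only: gathers comments and processing instruction targets during a parse.
script CommentAndPICollector
   property foundComments : {}
   property foundTargets : {}
   
   on XMLComment(comment)
      -- only called when parse XML is given "with including comments"
      set end of foundComments to comment
   end XMLComment
   
   on XMLProcessingInstruction(target, piData)
      -- only called when parse XML is given "with including processing instructions"
      set end of foundTargets to target
   end XMLProcessingInstruction
   
   on XMLParseResult()
      -- return both lists instead of the whole script object
      return {foundComments, foundTargets}
   end XMLParseResult
end script

set theXML to "<data>
    <!-- a comment -->
    <?mytarget some instruction data?>
    <test/>
</data>"

parse XML theXML SAX handler CommentAndPICollector ¬
   with including comments and including processing instructions
```

If either parameter is left out, the corresponding handler should never be called and its list should stay empty, which is a quick way to confirm that the flags are being passed as intended.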


Copyright © 1998-2007 Late Night Software Ltd. - All Rights Reserved.