|
SAX-like XML Event Processing
With the release of XML Tools 2.6, it is possible to implement event-based
XML processing using XML Tools. This is done by passing an AppleScript
script object to the new SAX handler parameter of the
parse XML command. This script object is expected to provide a series
of handlers that respond to XML parsing events.
This approach is useful when you want to populate a custom data structure
directly from XML data instead of extracting the data from the nested
collection of XML element classes normally generated by the parse XML
command.
Here is a very simple example illustrating how this works:
script EventProcessor
property elementNames : {}
on XMLStartElement(elementName, elementAttributes)
-- called when an XML element begins
set end of elementNames to elementName
end XMLStartElement
end script
set theXML to "<data>
<test/>
<test name=\"mark\">
data in second test element
</test>
data in root element
</data>"
set xxx to parse XML theXML SAX handler EventProcessor
xxx's elementNames
-- Result:
-- {"data", "test", "test"}
In this example, a copy of the EventProcessor script object is passed
to the parse XML command. As the parse XML command parses the XML
data, it calls the EventProcessor's XMLStartElement handler whenever a new XML
element begins. When parsing completes, the EventProcessor object is
returned to AppleScript. In this particular case, the XMLStartElement
handler records the name of each XML element tag.
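As a rough sketch of populating a custom data structure directly, the following handler object pairs each run of character data with the name of the element that encloses it. It uses only the XMLStartElement, XMLEndElement, XMLCharacterData and XMLParseResult handlers described in this section. The names RecordBuilder, elementStack, collected, tagName and tagText are made up for illustration, and the exact character data delivered depends on the whitespace settings in effect:

```applescript
script RecordBuilder
	property elementStack : {} -- names of the currently open elements (hypothetical helper)
	property collected : {} -- one record per run of character data

	on XMLStartElement(elementName, elementAttributes)
		-- remember which element we are inside
		set end of elementStack to elementName
	end XMLStartElement

	on XMLCharacterData(xmlData)
		-- pair the text with the innermost open element
		if elementStack is not {} then
			set end of collected to {tagName:(last item of elementStack), tagText:xmlData}
		end if
	end XMLCharacterData

	on XMLEndElement(elementName)
		-- drop the innermost element name from the stack
		if (count of elementStack) > 1 then
			set elementStack to items 1 thru -2 of elementStack
		else
			set elementStack to {}
		end if
	end XMLEndElement

	on XMLParseResult()
		-- hand back only the collected records instead of the whole script object
		return collected
	end XMLParseResult
end script

set theRecords to parse XML "<data><test name=\"mark\">hello</test></data>" SAX handler RecordBuilder
```

Returning the list from XMLParseResult keeps the caller from having to reach into the returned script object for its properties.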
Here is an example of an event handler object implementing all the handlers
that the parse XML command can call. You need only include the handlers
for events that you are interested in handling:
script AllEventHandlers
on XMLStartElement(elementName, elementAttributes)
-- called when a new XML element begins
display dialog "XMLStartElement: " & elementName & ", Attributes: " & (length of elementAttributes)
end XMLStartElement
on XMLEndElement(elementName)
-- called when an XML element ends
display dialog "XMLEndElement: " & elementName
end XMLEndElement
on XMLCharacterData(xmlData)
-- called when there is XML data for an element
display dialog "XMLCharacterData: " & xmlData
end XMLCharacterData
on XMLComment(comment)
-- called when an XML comment is encountered
-- must call parse XML with including comments
display dialog "XMLComment: " & comment
end XMLComment
on XMLDefaultContent(xmlData)
-- called for content outside the root element (e.g. the XML declaration)
display dialog "XMLDefaultContent: " & xmlData
end XMLDefaultContent
on XMLStartCData()
-- called at the beginning of an XML CData section
display dialog "XMLStartCData"
end XMLStartCData
on XMLEndCData()
-- called at the end of an XML CData section
display dialog "XMLEndCData"
end XMLEndCData
on XMLStartNamespace(prefix, uri)
-- called when a namespace reference begins
display dialog "XMLStartNamespace: " & prefix & ", URI: " & uri
end XMLStartNamespace
on XMLEndNamespace(prefix)
-- called when a namespace reference ends
display dialog "XMLEndNamespace: " & prefix
end XMLEndNamespace
on XMLProcessingInstruction(target, piData)
-- called when an XML processing instruction is encountered
-- must call parse XML with including processing instructions
display dialog "XMLProcessingInstruction: " & target & ", Data: " & piData
end XMLProcessingInstruction
on XMLNotStandalone()
-- called when XML is not standalone, and there is no DTD. Return true to allow processing to continue.
-- If this handler is missing, parse XML's strict standalone parameter value is used.
display dialog "XMLNotStandalone"
return true -- allow processing to continue
end XMLNotStandalone
on XMLStartDocTypeDecl(docTypeName, systemID, publicID, hasInternalSubset)
-- called at the beginning of a DOCTYPE declaration
display dialog "XMLStartDocTypeDecl: " & docTypeName & ", systemID: " & systemID & ¬
", publicID: " & publicID & ", hasInternalSubset: " & hasInternalSubset
end XMLStartDocTypeDecl
on XMLEndDocTypeDecl()
-- called at the end of a DOCTYPE declaration
display dialog "XMLEndDocTypeDecl"
end XMLEndDocTypeDecl
on XMLExternalEntityRef(context, base, systemID, publicID)
-- called after an external entity (DTD) has been loaded
display dialog "XMLExternalEntityRef: " & context & ", base: " & base & ¬
", systemID: " & systemID & ", publicID: " & publicID
end XMLExternalEntityRef
on XMLUnparsedEntityDecl(entityName, base, systemID, publicID, notationName)
-- called when an unparsed entity declaration is encountered
display dialog "XMLUnparsedEntityDecl: " & entityName & ", base: " & base & ¬
", systemID: " & systemID & ", publicID: " & publicID & ", notationName: " & notationName
end XMLUnparsedEntityDecl
on XMLNotationDecl(notationName, base, systemID, publicID)
-- called when a notation declaration is encountered
display dialog "XMLNotationDecl: " & notationName & ", base: " & base & ¬
", systemID: " & systemID & ", publicID: " & publicID
end XMLNotationDecl
on XMLParseResult()
-- return the data you want parse XML to return. If this handler is omitted,
-- the entire script object is returned
return "some data"
end XMLParseResult
end script
NOTE 1: Attributes are passed to the XMLStartElement handler
as a record where the keys are the attribute names and the values are the
corresponding attribute values.
NOTE 2: If an error occurs in one of the XML event
handlers, parse XML aborts the parse. When this happens, the partial
result of the error contains the result of the XMLParseResult() handler
or, if XMLParseResult() is not defined, the script object. You can
extract this information using this syntax:
script SAXHandler
property elementNames : {}
on XMLStartElement(elementName, elementAttributes)
-- called when an XML element begins
set end of elementNames to elementName
error "Error Message from SAXHandler" -- signal an error to abort parsing the rest of the XML stream
end XMLStartElement
on XMLParseResult()
-- return the data you want parse XML to return. If this handler is omitted,
-- the entire script object is returned
return elementNames
end XMLParseResult
end script
try
set xxx to parse XML "<data>
<!-- a comment -->
<test/>
<test name=\"mark\">
data in second test element
</test>
data in root element
</data>" SAX handler SAXHandler with including processing instructions and including comments
on error errMsg partial result pr
{errMsg, pr} -- partial result is the data returned by XMLParseResult
end try
-- Result:
-- {
-- "xmlstartelement SAX handler error: Error Message from SAXHandler",
-- {
-- "data"
-- }
-- }
NOTE 3: Script Debugger's
AppleScript debugger is unable to debug XML event handlers while
they are being executed by the parse XML command.
Parameters:
| Parameter |
Type |
Description |
| SAX handler (new in v2.6) |
script object |
When the SAX handler parameter is
specified, parse XML switches to a SAX-like event-based mode of
parsing where handlers in the script object specified are called
in response to events as the XML data is parsed.
When this parameter is omitted, parse XML performs
as it has done in the past and returns an XML document class containing
a nested data structure representing the content of the parsed
XML data. |
| strict standalone |
boolean |
Ignored if the event handler object implements the
XMLNotStandalone handler.
|
| expanding external entities |
boolean |
By default, external entity references (e.g. DTDs)
are ignored since XML Tools is a non-validating XML parser. When
expanding external entities is true, XML Tools uses the Mac OS
URL Access facilities to access the externally referenced entity.
If the external entity exists on another machine,
you must have an active internet connection.
Supported URL formats: file:///..., http://...,
and ftp://...
NOTE: The XMLExternalEntityRef handler is called
after the external entity has been loaded. |
| including comments |
boolean |
By default, comments in your XML data are ignored.
The including comments parameter must be true in order for the
event handler's XMLComment handler to be called.
|
| including processing instructions |
boolean |
By default, XML processing instructions are ignored.
The including processing instructions parameter must be true in
order for the event handler's XMLProcessingInstruction handler
to be called.
|
| serializing |
boolean |
Ignored.
|
| base path |
string |
Provides a base URL for all external entity IDs.
For example, the following code uses a DTD loaded from http://www.latenightsw.com/dtds/mydtd.dtd:
parse XML "<?xml version=\"1.0\"?>
<!DOCTYPE data SYSTEM \"mydtd.dtd\">
<data>
<tag/>
</data>" base path "http://www.latenightsw.com/dtds/"
|
| preserving whitespace |
boolean |
By default, the parse XML command strips all leading
and trailing whitespace characters and normalizes multiple whitespace
characters within a string to a single space.
NOTE 1: The xml:space="preserve" attribute
is honored when preserving whitespace is false.
NOTE 2: The xml:space="ignore" attribute
is not honored when preserving whitespace
is true.
NOTE 3: Whitespace characters in CDATA sections
are never stripped.
When preserving whitespace is true, parse XML
returns all XML data, including whitespace.
The parse XML command will strip whitespace according
to these rules before calling the event handler's XMLCharacterData
handler. |
| allowing leading whitespace |
boolean |
The XML specification states that well-formed
XML documents have no leading whitespace before the <?xml ...?>
declaration. However, for historical reasons, XML Tools allows
XML documents to contain leading whitespace data. If allowing leading
whitespace is false, XML Tools will report an error when whitespace
appears at the beginning of an XML document.
NOTE: This only applies to documents that begin
with a <?xml ...?> declaration. If your document does not
have an XML declaration, this option is ignored.
|
| seperate namespace URIs |
boolean |
Ignored.
|
|
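As a small sketch of how the including comments and including processing instructions parameters interact with the event handlers, the following handler object collects comments and processing instruction targets. The names ExtrasCollector, foundComments, foundTargets, commentText and handlerResult are made up for illustration, and the XML is a made-up sample. Without the two "including" parameters, neither handler would be called:

```applescript
script ExtrasCollector
	property foundComments : {} -- text of each comment seen
	property foundTargets : {} -- target of each processing instruction seen

	on XMLComment(commentText)
		-- only called when parse XML is used with including comments
		set end of foundComments to commentText
	end XMLComment

	on XMLProcessingInstruction(target, piData)
		-- only called when parse XML is used with including processing instructions
		set end of foundTargets to target
	end XMLProcessingInstruction
end script

set handlerResult to parse XML "<data>
<!-- a comment -->
<?mytarget some data?>
<test/>
</data>" SAX handler ExtrasCollector with including processing instructions and including comments

-- no XMLParseResult handler is defined, so the script object itself is returned
{handlerResult's foundComments, handlerResult's foundTargets}
```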