XML Parsing Reference
The parse XML command parses a block of XML data into an AppleScript
record hierarchy representing the XML information. The XML data may
be read from a file or passed directly to the parse XML command.
The result of the parse XML command is a nested structure of XML element
classes. Each XML element class represents an element (tag) in your XML
data. Nested XML elements are returned in a list within the enclosing
XML element. The root XML element is returned in an XML document class.
The XML document class contains additional properties describing root
level aspects of your XML data (namespaces used, XML declaration, DOCTYPE
declaration, etc.).
Parse XML Syntax
parse XML <xml-data> or <file spec> or <alias> --> XML document class
Example:
This example illustrates what happens when you parse a simple XML document:
set theXML to parse XML "<?xml version=\"1.0\"?>
<data>
<!-- this is an XML comment which does not appear in the parsed result -->
<tag1 attName=\"attribute value\">hello</tag1>
<tag2>
<a/>
<b>World</b>
</tag2>
</data>
"
-- Result:
-- {
--   class:XML document,
--   XML tag:"data",
--   XML contents:{
--     {
--       class:XML element,
--       XML tag:"tag1",
--       XML attributes:{|attName|:"attribute value"},
--       XML contents:{"hello"}
--     },
--     {
--       class:XML element,
--       XML tag:"tag2",
--       XML contents:{
--         {
--           class:XML element,
--           XML tag:"a"
--         },
--         {
--           class:XML element,
--           XML tag:"b",
--           XML contents:{"World"}
--         }
--       }
--     }
--   }
-- }
The resulting structure is a nested collection of XML element classes
matching the logical structure of the parsed XML document. Note that
white space around tags is stripped. Note also that the XML contents
property is omitted for empty XML elements (e.g. <tag/> and <tag></tag>),
so that you can distinguish empty tags from tags whose content was
stripped of white space.
To get the text value of the <tag1> XML element:
item 1 of XML contents of item 1 of XML contents of theXML
-- Result: "hello"
To get the attName attribute of the <tag1> XML element:
|attName| of XML attributes of item 1 of XML contents of theXML
-- Result: "attribute value"
Note the use of AppleScript's pipe syntax to specify the attribute property
name. This is required because AppleScript converts all identifiers to
lowercase internally. The use of pipe syntax preserves identifier case
to match that in the XML data.
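An attribute lookup like the one above fails with an AppleScript error when the record has no such property, for example when the element carries no attName attribute. The following is a minimal sketch of a guarded lookup, assuming theXML holds the result of the first example on this page and that the XML Tools scripting addition is installed. The variable name attValue is illustrative only.

```applescript
-- Sketch only: requires the XML Tools scripting addition.
-- attValue is an illustrative variable name, not part of XML Tools.
try
	set attValue to |attName| of XML attributes of item 1 of XML contents of theXML
on error
	-- The element has no XML attributes record, or no attName property in it
	set attValue to missing value
end try
```

Using missing value as the fallback lets later code test for the absent attribute without a second try block.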
This example parses the same XML data from a file:
tell application "Finder" to set myFolder to container of (path to me) as string
parse XML alias (myFolder & "data.xml")
The Utility AppleScript Code page gives more sample code showing how
to access information from the data structure returned by the parse XML
command.
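As a small illustration of walking the parsed structure, the sketch below collects the tag names of an element's direct children. It assumes the XML Tools scripting addition is installed and that theXML holds the result of the first example on this page. The handler childTagNames is a hypothetical helper and is not part of XML Tools.

```applescript
-- Sketch only: requires the XML Tools scripting addition.
-- childTagNames is a hypothetical helper, not part of XML Tools.
on childTagNames(theElement)
	set tagNames to {}
	try
		set theChildren to XML contents of theElement
	on error
		-- No XML contents property: treat the element as having no children
		return tagNames
	end try
	repeat with aChild in theChildren
		set theItem to contents of aChild
		-- Text appears as plain strings; nested tags appear as XML element records
		if class of theItem is XML element then
			set end of tagNames to XML tag of theItem
		end if
	end repeat
	return tagNames
end childTagNames

childTagNames(theXML)
```

Checking the class of each item, rather than assuming every item is an element, keeps the handler usable when text is mixed in with the child elements.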
NOTE: XML Tools accepts the following XML encodings: UTF-8, UTF-16,
ISO-8859-1, US-ASCII and MacRoman. When parsing XML strings directly,
make sure that the string type (plain text, UTF-8 or UTF-16) is compatible
with the encoding specified in the <?xml ...?> header. More information
on handling encodings with XML Tools is available here.
Parameters:
| Parameter | Type | Description |
| SAX handler (new in v2.6) | script object |
When the SAX handler parameter is specified, parse XML switches
to a SAX-like event-based mode of parsing where handlers in the specified
script object are called in response to events as the XML data
is parsed.
Some of the parse XML parameters are ignored when
SAX handler is specified. The meaning of others may change. Refer
to the XML Event-Based Parsing page, which explains this mode of
operation, for more details. |
| strict standalone | boolean |
When true, the Expat parser reports an error when
parsing XML data that is not standalone. |
| expanding external entities | boolean |
By default, external entity references (e.g. DTDs) are ignored since
XML Tools is a non-validating XML parser. When expanding external entities
is true, XML Tools uses the Mac OS URL Access facilities to access the
externally referenced entity.
If the external entity exists on another machine, you must have an active
internet connection.
Supported URL formats: file:///..., http://..., and ftp://... |
| encoding (new in v2.7) | string |
By default, XML Tools looks (a) for an encoding in the XML declaration,
and (b) for a Unicode byte order mark (BOM) to determine the encoding to
use when processing XML. This parameter allows you to override the encoding
in the XML declaration.
Valid encodings are: UTF-8, UTF-16, ISO-8859-1, US-ASCII and MacRoman. |
| including comments | boolean |
By default, comments in your XML data are ignored.
When including comments is true, comments are included in the resulting
XML contents list for each XML element. Comments are expressed as
instances of the XML comment class.
parse XML "
<data>
hello
<!-- a comment -->
world
</data>" with including comments
-- Result:
-- {
--   class:XML document,
--   XML tag:"data",
--   XML contents:{
--     "hello",
--     {
--       class:XML comment,
--       XML comment:" a comment "
--     },
--     "world"
--   }
-- } |
| including processing instructions | boolean |
default: false
By default, XML processing instructions are ignored.
When including processing instructions is true, XML processing
instructions are included in the resulting XML contents list for
each XML element. Processing instructions are expressed as instances
of the XML processing instruction class.
parse XML "
<data>
hello
<?TARGET xxxx?>
world
</data>" with including processing instructions
-- Result:
-- {
--   class:XML document,
--   XML tag:"data",
--   XML contents:{
--     "hello",
--     {
--       class:XML processing instruction,
--       XML target:"TARGET",
--       XML target data:"xxxx"
--     },
--     "world"
--   }
-- } |
| serializing | boolean |
default: false
When serializing is true, the parse XML command adds a uniquely valued
id property to the XML attributes record of each XML element. This is
useful if you later move elements around and need to locate a particular
element.
parse XML "<?xml version=\"1.0\"?>
<data>
<tag/>
<tag/>
</data>" with serializing
-- Result:
-- {
--   class:XML document,
--   XML element id:3, -- added by with serializing
--   XML tag:"data",
--   XML attributes:{
--     id:3 -- added by with serializing
--   },
--   XML contents:{
--     {
--       class:XML element,
--       XML element id:1, -- added by with serializing
--       XML tag:"tag",
--       XML attributes:{
--         id:1 -- added by with serializing
--       }
--     },
--     {
--       class:XML element,
--       XML element id:2, -- added by with serializing
--       XML tag:"tag",
--       XML attributes:{
--         id:2 -- added by with serializing
--       }
--     }
--   }
-- } |
| base path | string |
Provides a base URL for all external entity IDs. For example,
the following code uses a DTD loaded from http://www.latenightsw.com/dtds/mydtd.dtd.
parse XML "<?xml version=\"1.0\"?>
<!DOCTYPE data SYSTEM \"mydtd.dtd\">
<data>
<tag/>
</data>" base path "http://www.latenightsw.com/dtds/" |
| preserving whitespace | boolean |
By default, the parse XML command strips all leading
and trailing whitespace characters and normalizes multiple whitespace
characters within a string to a single space.
When preserving whitespace is true, parse XML returns
all XML data, including whitespace.
NOTE 1: The xml-space="preserve" attribute is honored when preserving
whitespace is false.
NOTE 2: The xml-space="ignore" attribute is not honored when preserving
whitespace is true.
NOTE 3: Whitespace characters in CDATA sections are never stripped. |
| allowing leading whitespace | boolean |
The XML specification states that well-formed XML documents have no
leading whitespace before the <?xml ... ?> declaration. However, for
historical reasons, XML Tools allows XML documents to contain leading
whitespace. If allowing leading whitespace is false, XML Tools reports
an error when whitespace appears at the beginning of an XML document.
NOTE: This only applies to documents that begin with a <?xml ...?>
declaration. If your document does not have an XML declaration, this
option is ignored. |
| including empty elements (new in v2.6) | boolean |
When parsing empty XML elements (<tag/> and <tag></tag>), the parse XML
command returns an XML contents property containing "". When including
empty elements is false, the parse XML command does not include an XML
contents property value for empty XML elements (this is how the parse XML
command operated prior to v2.6). |
| separate namespace URIs | boolean |
When parsing documents that utilize XML namespaces, parse XML normally
returns an element's tag name and the associated namespace URI as
separate properties.
Here is an example of how parse XML returns XML namespace information:
set theXML to "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">
<xsl:template match=\"doc\">
<out><xsl:value-of select=\".\"/></out>
</xsl:template>
</xsl:stylesheet>
"
parse XML theXML
-- Result:
-- {
--   class:XML document,
--   XML namespace prefix:"xsl",
--   XML namespace uri:"http://www.w3.org/1999/XSL/Transform",
--   XML tag:"stylesheet",
--   XML attributes:{|version|:"1.0"},
--   XML contents:{
--     {
--       class:XML element,
--       XML namespace prefix:"xsl",
--       XML namespace uri:"http://www.w3.org/1999/XSL/Transform",
--       XML tag:"template",
--       XML attributes:{match:"doc"},
--       XML contents:{
--         {
--           class:XML element,
--           XML tag:"out",
--           XML contents:{
--             {
--               class:XML element,
--               XML namespace prefix:"xsl",
--               XML namespace uri:"http://www.w3.org/1999/XSL/Transform",
--               XML tag:"value-of",
--               XML attributes:{|select|:"."}
--             }
--           }
--         }
--       }
--     }
--   },
--   XML namespaces:{
--     {
--       class:XML namespace,
--       XML namespace prefix:"xsl",
--       XML namespace uri:"http://www.w3.org/1999/XSL/Transform"
--     }
--   }
-- }
When separate namespace URIs is false, parse XML returns the namespace URI
and element tag name as a single string in the format "URI:TagName".
parse XML theXML without separate namespace URIs
-- Result:
-- {
--   class:XML document,
--   XML tag:"http://www.w3.org/1999/XSL/Transform:stylesheet",
--   XML attributes:{|version|:"1.0"},
--   XML contents:{
--     {
--       class:XML element,
--       XML tag:"http://www.w3.org/1999/XSL/Transform:template",
--       XML attributes:{match:"doc"},
--       XML contents:{
--         {
--           class:XML element,
--           XML tag:"out",
--           XML contents:{
--             {
--               class:XML element,
--               XML tag:"http://www.w3.org/1999/XSL/Transform:value-of",
--               XML attributes:{|select|:"."}
--             }
--           }
--         }
--       }
--     }
--   },
--   XML namespaces:{
--     {
--       class:XML namespace,
--       XML namespace prefix:"xsl",
--       XML namespace uri:"http://www.w3.org/1999/XSL/Transform"
--     }
--   }
-- } |
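The parameters in the table above can be combined in a single call. The following is a minimal sketch, assuming the XML Tools scripting addition is installed and that a UTF-8 encoded file named data.xml (the file name used earlier on this page) sits next to the script. The particular combination of parameters is chosen only for illustration.

```applescript
-- Sketch only: requires the XML Tools scripting addition.
-- Assumes a UTF-8 encoded "data.xml" exists in the same folder as this script.
tell application "Finder" to set myFolder to container of (path to me) as string

-- Boolean parameters are switched on with "with" and off with "without";
-- the string-valued encoding parameter is passed by name.
set theXML to parse XML alias (myFolder & "data.xml") encoding "UTF-8" with including comments and preserving whitespace without allowing leading whitespace
```

Spelling out without allowing leading whitespace makes the parser report an error for a document that has stray whitespace before its XML declaration, instead of silently accepting it.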