AppleScript Utility Code for XML Handling
The following sample code is useful for processing the AppleScript data
structures created by the XML Tools parse XML command:
getAnElement
getElements
getNamedElement
getElementFromPath
getElementValue
NOTE: The code presented here is included with the XML Tools package
in the XMLToolsLib file.
AppleScript Rant
AppleScript suffers from the serious deficiency of not being able to
discover the names of properties in a record, and not being able to create
property references dynamically (i.e. at run-time based on the value
of a variable). The names of record properties must be defined at compile-time.
While there are tricks that can be done with AppleScript's run script
command, these suffer from poor performance and are generally difficult
to implement reliably.
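For illustration, one such run script trick might look like the sketch below. It is not part of XMLToolsLib: the handler name getPropertyByName is hypothetical, and it assumes the property is a user-defined identifier that can be written between vertical bars. A small script is compiled on every call, which is where the performance cost comes from.

```applescript
-- Hypothetical helper, not part of XMLToolsLib.
-- Looks up a record property whose name is only known at run time by
-- compiling a tiny script on every call. This is slow, and it breaks if
-- propertyName itself contains a "|" character.
on getPropertyByName(theRecord, propertyName)
    set theSource to "on run {theRecord}" & return & ¬
        "return |" & propertyName & "| of theRecord" & return & ¬
        "end run"
    return run script theSource with parameters {theRecord}
end getPropertyByName

-- Ask for the |name| property using a string chosen at run time
getPropertyByName({|name|:"anya", |city|:"Oslo"}, "name")
```

If the named property does not exist, the compiled script raises an error, so a caller would still need a try block around the call.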
These limitations make designing a tool like XML Tools very difficult.
It is for this reason that XML Tools generates a list of records with a
separate record for each element.
This design choice makes it possible to loop through the list items and
discover which elements are present. However, this also makes it difficult
to access specific elements when you know they exist.
The samples below are code that I use in my XML Tools-based scripts
to simplify accessing XML data generated by the parse XML command. For
very complex XML data structures, you may find the new event-based
parsing features of XML Tools 2.6 helpful.
Handlers
getAnElement(theXML, theElementName)
This handler searches an XML Contents collection for a particular XML
element (tag). This is for cases where an XML element appears only once
within another XML element. If the element is not present, missing
value is returned.
on getAnElement(theXML, theElementName)
    -- find and return a particular element
    -- (this presumes there is only one instance of the element)
    repeat with anElement in XML contents of theXML
        if class of anElement is XML element and ¬
            XML tag of anElement is theElementName then
            return contents of anElement
        end if
    end repeat
    return missing value
end getAnElement
Here's a brief example showing how this handler is used:
set theData to "<?xml version =\"1.0\"?>
<data>
<xxx name=\"mark\">XXX 1</xxx>
<yyy>YYY 1</yyy>
<xxx name=\"anya\">XXX 2</xxx>
<yyy>YYY 2</yyy>
<xxx>XXX 3</xxx>
<zzz>ZZZ 5</zzz>
</data>"

getAnElement(parse XML theData, "zzz")
-- Result:
-- {
--     class:XML element,
--     XML tag:"zzz",
--     XML contents:{"ZZZ 5"}
-- }
getElements(theXML, theElementName)
This handler searches an XML Contents collection for all instances of
an XML element. This is for cases where an XML element may appear many
times within another XML element. If the element is not present, an
empty list is returned.
on getElements(theXML, theElementName)
    -- find and return all instances of a particular element
    local theResult
    set theResult to {}
    repeat with anElement in XML contents of theXML
        if class of anElement is XML element and ¬
            XML tag of anElement is theElementName then
            set end of theResult to contents of anElement
        end if
    end repeat
    return theResult as list
end getElements
Here's a brief example showing how this handler is used:
set theData to "<?xml version =\"1.0\"?>
<data>
<xxx name=\"mark\">XXX 1</xxx>
<yyy>YYY 1</yyy>
<xxx name=\"anya\">XXX 2</xxx>
<yyy>YYY 2</yyy>
<xxx>XXX 3</xxx>
<zzz>ZZZ 5</zzz>
</data>"

getElements(parse XML theData, "yyy")
-- Result:
-- {
--     {
--         class:XML element,
--         XML tag:"yyy",
--         XML contents:{"YYY 1"}
--     },
--     {
--         class:XML element,
--         XML tag:"yyy",
--         XML contents:{"YYY 2"}
--     }
-- }
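Because getElements returns a list of element records, a common next step is to loop over that list and pull out each element's text. The following is a minimal sketch of that pattern. It assumes XML Tools is installed and that the getElements handler above and the getElementValue handler described later on this page are defined in the same script. The variable name theValues is only illustrative.

```applescript
-- Sketch only: relies on the getElements and getElementValue handlers
-- from this page being defined in the same script.
set theData to "<?xml version=\"1.0\"?>
<data>
<yyy>YYY 1</yyy>
<xxx>XXX 1</xxx>
<yyy>YYY 2</yyy>
</data>"

set theValues to {}
repeat with anElement in getElements(parse XML theData, "yyy")
    -- anElement is a reference into the list; "contents of" yields the
    -- element record itself, whose first content item is then extracted
    set end of theValues to getElementValue(contents of anElement)
end repeat
return theValues
```

Given the handler definitions, theValues would collect the text of each yyy element in document order. If no yyy elements exist, the loop body never runs and theValues stays empty.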
getNamedElement(theXML, theElementName, theName)
This handler searches an XML Contents collection for an instance of
a particular XML element with a specific name attribute:
on getNamedElement(theXML, theElementName, theName)
    -- find and return the first element with a particular name attribute
    if class of theXML is XML element or class of theXML is XML document then
        -- search the children of an element or document
        repeat with anElement in XML contents of theXML
            try
                if class of anElement is XML element and ¬
                    XML tag of anElement is theElementName and ¬
                    |name| of XML attributes of anElement is theName then
                    return contents of anElement
                end if
            on error number -1728
                -- ignore this error (the element has no name attribute)
            end try
        end repeat
    else if class of theXML is list then
        -- search a plain list of elements
        repeat with anElement in theXML
            try
                if class of anElement is XML element and ¬
                    XML tag of anElement is theElementName and ¬
                    |name| of XML attributes of anElement is theName then
                    return contents of anElement
                end if
            on error number -1728
                -- ignore this error (the element has no name attribute)
            end try
        end repeat
    end if
    return missing value
end getNamedElement
Here's a brief example showing how this handler is used:
set theData to "<?xml version =\"1.0\"?>
<data>
<xxx name=\"mark\">XXX 1</xxx>
<yyy>YYY 1</yyy>
<xxx name=\"anya\">XXX 2</xxx>
<yyy>YYY 2</yyy>
<xxx>XXX 3</xxx>
<zzz>ZZZ 5</zzz>
</data>"

getNamedElement(parse XML theData, "xxx", "anya")
-- Result:
-- {
--     class:XML element,
--     XML tag:"xxx",
--     XML attributes:{|name|:"anya"},
--     XML contents:{"XXX 2"}
-- }
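The handler also accepts a plain list of elements (its second branch), so it can be applied to the output of getElements. The sketch below shows that combination. It assumes XML Tools is installed and that the getElements and getNamedElement handlers from this page are defined in the same script. The variable names are only illustrative.

```applescript
-- Sketch only: relies on the getElements and getNamedElement handlers
-- from this page being defined in the same script.
set theData to "<?xml version=\"1.0\"?>
<data>
<xxx name=\"mark\">XXX 1</xxx>
<xxx name=\"anya\">XXX 2</xxx>
<xxx>XXX 3</xxx>
</data>"

-- First narrow the document down to a list of xxx elements...
set xxxElements to getElements(parse XML theData, "xxx")

-- ...then search that list for the one whose name attribute is "mark".
-- The third xxx element has no attributes; the handler's try block
-- skips it instead of raising error -1728.
getNamedElement(xxxElements, "xxx", "mark")
```

If no element in the list carries the requested name, the handler falls through to its final line and returns missing value.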
getElementFromPath(theXML, theElementPath)
This handler retrieves an XML element based on a path that you specify.
The path is specified as a list of nested XML tag names. If the path is
invalid (i.e. the target element does not exist), missing value is
returned.
on getElementFromPath(theXML, theElementPath)
    if theElementPath is {} then
        -- the whole path has been consumed; theXML is the target
        return theXML
    else
        local foundElement
        set foundElement to getAnElement(theXML, item 1 of theElementPath)
        if foundElement is not missing value and ¬
            class of foundElement is XML element then
            -- descend one level and continue with the rest of the path
            return getElementFromPath(foundElement, rest of theElementPath)
        else
            return missing value
        end if
    end if
end getElementFromPath
Here's a brief example showing how this handler is used:
set theData to "<?xml version =\"1.0\"?>
<data>
<xxx>
<yyy>
<zzz>Testing</zzz>
</yyy>
</xxx>
</data>"

getElementFromPath(parse XML theData, {"xxx", "yyy", "zzz"})
-- Result:
-- {
--     class:XML element,
--     XML tag:"zzz",
--     XML contents:{"Testing"}
-- }
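Since an invalid path yields missing value rather than an error, the result can be handed directly to the getElementValue handler described in the next section, which maps missing value to an empty string. The sketch below illustrates this. It assumes XML Tools is installed and that the getAnElement, getElementFromPath, and getElementValue handlers from this page are defined in the same script. The tag name "nope" is simply a path step that does not exist in the sample data.

```applescript
-- Sketch only: relies on the getAnElement, getElementFromPath and
-- getElementValue handlers from this page being defined in the same script.
set theData to "<?xml version=\"1.0\"?>
<data>
<xxx>
<yyy>
<zzz>Testing</zzz>
</yyy>
</xxx>
</data>"

set theXML to parse XML theData

-- Valid path: the zzz element is found and its first content item is taken
set goodValue to getElementValue(getElementFromPath(theXML, {"xxx", "yyy", "zzz"}))

-- Invalid path: getElementFromPath gives missing value, which
-- getElementValue turns into an empty string instead of an error
set badValue to getElementValue(getElementFromPath(theXML, {"xxx", "nope", "zzz"}))

return {goodValue, badValue}
```

The trade-off is that a caller can no longer tell an empty element from a missing one. Where that distinction matters, test the result of getElementFromPath against missing value first.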
getElementValue(theXML)
The parse XML command does not include an XML Contents property when
an element is empty (i.e. <tag/> or <tag></tag>). This is intended to
allow you to detect empty elements. However, if you don't care about the
distinction, this can make getting element data more difficult because
you need to use a try block to catch the error when you try to access
the missing XML Contents property. The getElementValue handler does
this for you.
on getElementValue(theXML)
    if theXML is missing value or theXML is {} then
        return ""
    else if class of theXML is string then
        return theXML
    else
        try
            return item 1 of XML contents of theXML
        on error number -1728
            -- the element is empty, so it has no XML contents property
            return ""
        end try
    end if
end getElementValue
Here's a brief example showing how this handler is used:
set theData to "<?xml version =\"1.0\"?>
<data>
<test1></test1>
<test2>some data</test2>
</data>"

getElementValue(getAnElement(parse XML theData, "test1"))
-- Result: ""

getElementValue(getAnElement(parse XML theData, "test2"))
-- Result: "some data"