The file builtins/xml.e (not an autoinclude) allows conversion of xml (text) <--> DOM (nested structure).

Deliberately kept as simple as possible, to simplify modification. (I fully expect problems the first time this is used in anger!)

Does not use/validate against DTDs (Document Type Definitions) or XSLs (eXtensible Stylesheet Language).

Comments are only supported after the XML declaration (if present) and either before or after the top-level element, not within it.

Unicode handling: via utf-8. I have tested this on some fairly outlandish-looking samples without any problems.
However it contains very little code to actually deal with unicode, instead relying on utf-8 never embedding any critical control characters (such as '<') within any multibyte encoding: every byte of a multibyte utf-8 sequence is #80 or above. (I even wrote a quick test ditty to confirm that.)
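
A check of that property might look something like the following. This is only a sketch, not the test ditty mentioned above, and it assumes the utf32_to_utf8() builtin is available.

```phix
-- Encode every valid code point above ascii and count any bytes of the
-- result that fall in the ascii range (which is where '<', '>', '&' live).
integer bad = 0
for cp=#80 to #10FFFF do
    if cp<#D800 or cp>#DFFF then    -- skip the utf-16 surrogate range
        string s = utf32_to_utf8({cp})
        for i=1 to length(s) do
            if s[i]<#80 then bad += 1 end if
        end for
    end if
end for
printf(1,"ascii bytes found inside multibyte encodings: %d\n",bad)
```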

Should you need to process utf-16 (or utf-32) then it must be converted to utf-8 beforehand, and the output possibly converted back.
One thing it does actually do is skip a utf-8 BOM at the start of the xml input (string/file). There is nothing in here to help with writing one back, not that prefixing one on the output [externally] should prove in any way difficult.
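
As a rough sketch of both points, assuming the utf8_to_utf16(), utf16_to_utf8() and utf32_to_utf8() builtins, and with a made-up one-line document standing in for real utf-16 input:

```phix
include xml.e

-- stand-in for utf-16 input obtained elsewhere (a sequence of 16-bit code units)
sequence utf16 = utf8_to_utf16("<root>text</root>")

-- convert to utf-8 before parsing
sequence doc = xml_parse(utf16_to_utf8(utf16))

if doc[XML_DOCUMENT]="document" then
    -- prefix a utf-8 BOM (U+FEFF encoded as utf-8, ie #EF,#BB,#BF) externally
    string bom = utf32_to_utf8({#FEFF}),
           out = bom & xml_sprint(doc)
    printf(1,"%d bytes, including the BOM\n",length(out))

    -- and, should the consumer want utf-16, convert the text back
    sequence out16 = utf8_to_utf16(xml_sprint(doc))
    printf(1,"%d utf-16 code units\n",length(out16))
end if
```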

Note: json is widely considered a better choice for data transfer.
It is of course more efficient, but also less descriptive, does not support comments or any form of self-validation, and may prove more brittle unless the provider has the common sense to include a field that adequately specifies the precise version/format being sent (in my experience they rarely do). The bottom line is that you should use xml where you really benefit from it, which is not everywhere, eg use xml for config-type data, but json for bulk data.


include xml.e
constant eg1 = """
<?xml version="1.0" ?>
<root>
  <element>Some text here</element>
</root>
"""
pp(xml_parse(eg1),{pp_Nest,5})
-- output:
--          {"document",                    -- XML_DOCUMENT
--           {"<?xml version=\"1.0\" ?>"},  -- XML_PROLOGUE
--           {"root",                       -- XML_CONTENTS[XML_TAGNAME]
--            {},                           --  XML_ATTRIBUTES
--            {{"element",                  --  XML_CONTENTS[XML_TAGNAME]
--              {},                         --   XML_ATTRIBUTES
--              "Some text here"}}},        --   XML_CONTENTS
--           {}}                            -- XML_EPILOGUE
Note the three uses of XML_CONTENTS. The first is the one and only top-level element, the second is a sequence of elements, which happens to be of length one, and the third is a string, as per the "string, or sequence of nested tags" description. The difference between the first two of those cannot be stressed enough: the top-level element has precisely one '{' before it, whereas more deeply nested elements always sit inside an enclosing sequence, so the first of them has two, ie "{{", except of course as in the third use above, where it is just the lowest-level string contents rather than a further nested element.
In the above, XML_CONTENTS[XML_TAGNAME] means that XML_CONTENTS is a sequence of length 3 starting at that point, and XML_TAGNAME is the first element of that.
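
To make the one-brace/two-brace distinction concrete, here is a small sketch of walking such a structure by hand. The variable names are mine, and the values noted in the comments are simply what the structure shown above implies.

```phix
include xml.e
constant src = """
<?xml version="1.0" ?>
<root>
  <element>Some text here</element>
</root>
"""
sequence doc = xml_parse(src)
if doc[XML_DOCUMENT]="document" then
    sequence root = doc[XML_CONTENTS]           -- the single top-level element
    ?root[XML_TAGNAME]                          -- should be "root"
    sequence children = root[XML_CONTENTS]      -- a sequence of elements
    for i=1 to length(children) do
        sequence child = children[i]            -- each is {tagname,attributes,contents}
        ?child[XML_TAGNAME]                     -- should be "element"
        if string(child[XML_CONTENTS]) then
            ?child[XML_CONTENTS]                -- lowest-level string contents
        end if
    end for
end if
```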


include xml.e
constant eg2 = """
<Address>
  <Number Flat="b">2</Number>
  <Street>Erdzinderand Beat</Street>
  <District>Stooingder</District>
  <City>Bush</City>
</Address>
"""
pp(xml_parse(eg2),{pp_Nest,5})
-- output:
--          {"document",                    -- XML_DOCUMENT
--           {},                            -- XML_PROLOGUE
--           {"Address",                    -- XML_CONTENTS[XML_TAGNAME]
--            {},                           --  XML_ATTRIBUTES
--            {{"Number",                   --  XML_CONTENTS[XML_TAGNAME]
--              {{"Flat"},                  --   XML_ATTRIBUTES[XML_ATTRNAMES]
--               {"b"}},                    --    XML_ATTRVALUES
--              "2"},                       --   XML_CONTENTS
--             {"Street",                   --  XML_CONTENTS[XML_TAGNAME]
--              {},                         --   XML_ATTRIBUTES
--              "Erdzinderand Beat"},       --   XML_CONTENTS
--             {"District",                 --  XML_CONTENTS[XML_TAGNAME]
--              {},                         --   XML_ATTRIBUTES
--              "Stooingder"},              --   XML_CONTENTS
--             {"City",                     --  XML_CONTENTS[XML_TAGNAME]
--              {},                         --   XML_ATTRIBUTES
--              "Bush"}}},                  --   XML_CONTENTS
--           {}}                            -- XML_EPILOGUE


Note the precise content of the resulting xml structure is not documented beyond these constants; the programmer is expected to examine the output from increasingly more complex, but still valid xml, until they understand the structure and how to use the XML_XXX constants, all quite straightforward really, once you get used to it.
The examples above should get you started. At this point in time the structure is quite likely to change with each new release as more functionality is added, and of course more constants and routines are also quite likely to be added with each new release.

global enum XML_DOCUMENT,   -- must be "document"
            XML_PROLOGUE,   -- {} or eg {doctype,comments}
            XML_CONTENTS,   -- (must be a single element)
            XML_EPILOGUE,   -- {} or {final_comments}
            XML_DOCLEN = $  -- 4

global enum XML_TAGNAME,    -- eg "Students"
            XML_ATTRIBUTES  -- {} or {XML_ATTRNAMES,XML_ATTRVALUES}
--          XML_CONTENTS    -- (string, or sequence of nested tags)

global enum XML_ATTRNAMES,  -- eg {"Name","Gender",...}
            XML_ATTRVALUES  -- eg {"Alan","M",...}

global constant XML_DECODE = #0001, -- convert eg &gt; to '>' in attribute values
                XML_ENCODE = #0002  -- the reverse (in xml_sprint)
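
The following sketch shows how the attribute constants and XML_DECODE might be used together. The sample xml and variable names are made up for illustration, and it assumes XML_ATTRIBUTES indexes the second slot of an element, as the example output above suggests.

```phix
include xml.e
constant src = `<Item Note="1 &gt; 0">text</Item>`
sequence plain = xml_parse(src),
         decoded = xml_parse(src,XML_DECODE)
if plain[XML_DOCUMENT]="document"
and decoded[XML_DOCUMENT]="document" then
    -- attributes are held as {names,values}, or {} if there are none
    sequence pa = plain[XML_CONTENTS][XML_ATTRIBUTES],
             da = decoded[XML_CONTENTS][XML_ATTRIBUTES]
    if length(pa) and length(da) then
        integer k = find("Note",pa[XML_ATTRNAMES])
        if k!=0 then
            ?pa[XML_ATTRVALUES][k]  -- attribute value without XML_DECODE
            ?da[XML_ATTRVALUES][k]  -- attribute value with XML_DECODE
        end if
    end if
end if
```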


string s = xml_decode(string s) -- convert all eg &lt; to '<', but leaving any CDATA as-is.
string s = xml_encode(string s) -- Inverse of xml_decode, but without any CDATA handling or any re-coding of anything except the five critical entities (<>&'").
Obviously there is no attempt to preserve CDATA on a round trip.
(These two are really internal routines that are sometimes useful directly.)
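
A minimal sketch of calling the two directly, on a made-up string without any CDATA:

```phix
include xml.e
string raw = "if a < b && c > d then \"ok\"",
       enc = xml_encode(raw),   -- the five critical characters become entities
       dec = xml_decode(enc)    -- and back again
puts(1,enc & "\n")
puts(1,dec & "\n")
puts(1,iff(dec=raw ? "round trip ok\n" : "round trip differs\n"))
```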
sequence res = xml_parse(string xml, integer options=NULL) -- Convert an xml string into a nested structure. options may be XML_DECODE
Returns {-1,"message",...} if xml could not be parsed.
Success can be determined by checking whether result[1] is a string, or -1, or better yet =="document".
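
For instance, a caller might check the result along these lines. The input here is deliberately malformed (mismatched tags) and I would expect it to be rejected, though precisely what the parser accepts is not spelled out here.

```phix
include xml.e
sequence res = xml_parse("<root><element>oops</root>")
if res[1]="document" then
    puts(1,"parsed ok\n")
else
    -- res is {-1,"message",...}
    printf(1,"parse failed: %s\n",{res[2]})
end if
```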
string res = xml_sprint(sequence xml, integer options=NULL) -- convert xml structure to a string. options may be XML_ENCODE
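
A round trip through both routines might be sketched as follows, with a made-up input, decoding attribute values on the way in and re-encoding them on the way out:

```phix
include xml.e
constant src = `<root note="a &amp; b">text</root>`
sequence doc = xml_parse(src,XML_DECODE)
if doc[XML_DOCUMENT]="document" then
    string text = xml_sprint(doc,XML_ENCODE)
    puts(1,text)
end if
```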
sequence res = xml_new_doc(sequence contents={}, prolog=std_prolog, epilog={}) -- create a new xml structure.
note: the default contents is not legal until res[XML_CONTENTS] gets an xml_new_element().
sequence elem = xml_new_element(string tagname, sequence contents) -- returns {tagname,{},contents}, where {} represents an empty set of attributes
contents should be a string or a sequence of nested elements
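
Building a small document from scratch might then look like this (a sketch only, reusing the tag names from the Address example):

```phix
include xml.e
-- leaf elements have string contents, the parent a sequence of elements
sequence children = {xml_new_element("Number","2"),
                     xml_new_element("Street","Erdzinderand Beat")},
         address = xml_new_element("Address",children),
         doc = xml_new_doc(address)
puts(1,xml_sprint(doc))

-- alternatively, start with the default and fill in the contents afterwards
sequence doc2 = xml_new_doc()
doc2[XML_CONTENTS] = xml_new_element("root","text")
puts(1,xml_sprint(doc2))
```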
string res = xml_get_attribute(sequence elem, string name) -- returns attribute value or "" if it does not exist
sequence elem = xml_set_attribute(sequence elem, string attrib_name, attrib_value) -- set or remove (attrib_value of "") an attribute
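
A short sketch of setting, reading and removing an attribute, again borrowing names from the Address example:

```phix
include xml.e
sequence num = xml_new_element("Number","2")
num = xml_set_attribute(num,"Flat","b")     -- add the attribute
?xml_get_attribute(num,"Flat")              -- the value just set
?xml_get_attribute(num,"Floor")             -- no such attribute, so ""
num = xml_set_attribute(num,"Flat","")      -- an empty value removes it
?xml_get_attribute(num,"Flat")              -- now "" as well
```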
sequence res = xml_get_nodes(sequence xml, string tagname) -- return a sequence of all nodes matching tagname
xml can be an entire document or an individual element (but not a sequence of elements)
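
By way of illustration, with a made-up document, passing first the whole document and then a single element:

```phix
include xml.e
constant src = """
<Students>
  <Student>Alan</Student>
  <Student>Beth</Student>
  <Teacher>Carol</Teacher>
</Students>
"""
sequence doc = xml_parse(src)
if doc[XML_DOCUMENT]="document" then
    -- search the entire document
    sequence students = xml_get_nodes(doc,"Student")
    for i=1 to length(students) do
        ?students[i][XML_CONTENTS]
    end for
    -- or just an individual element
    sequence teachers = xml_get_nodes(doc[XML_CONTENTS],"Teacher")
    printf(1,"%d teacher node(s) found\n",length(teachers))
end if
```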
sequence xml = xml_add_comment(sequence xml, string comment, bool as_prolog=true) -- add a comment to the prolog or epilog
note that comments on individual elements are not supported, xml must be the entire top-level document.
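
A minimal sketch, with invented comment text, adding one comment to each end of a new document:

```phix
include xml.e
sequence doc = xml_new_doc(xml_new_element("root","text"))
doc = xml_add_comment(doc,"generated file, do not edit")    -- goes in the prolog
doc = xml_add_comment(doc,"end of file",false)              -- goes in the epilog
puts(1,xml_sprint(doc))
```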
Everything apart from xml_parse() and xml_sprint() is pretty trivial and could easily be accomplished directly.

Note that none of these routines have yet undergone any significant real-world testing, but should be easy to fix/enhance as needed.