Conceptually, it is easy to generate well-formed XML: Every start tag must have a corresponding end tag, and inner elements must end before the outer element's end tag.
To minimize our memory usage, we can write the start tag as soon as we start an element, and write the content of the element as we go. However, to ensure that the XML is well-formed, the sox.py module will check that start tags and end tags match.
The obvious data structure to manage this checking is a stack.
The stack is initially empty.
When an element is started, the start tag is written immediately, and the element name is pushed onto the stack.
When an element's end tag is written, the top element of the stack is popped. Its name should match the name of the end tag just written.
If the stack is empty at the end, then the start and end tags are balanced.
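The stack discipline described above can be sketched in a few lines of Python. This is only an illustration of the technique, not the sox.py code itself; the class name TagBalancer and its method names are invented for this example.

```python
import io


class TagBalancer:
    """Illustrative only: writes tags immediately and checks that they nest."""

    def __init__(self, out):
        self.out = out
        self.stack = []  # names of elements that are still open

    def start(self, name):
        # The start tag goes out right away; only the name is remembered.
        self.out.write("<%s>" % name)
        self.stack.append(name)

    def end(self, name):
        if not self.stack:
            raise ValueError("end tag </%s> has no matching start tag" % name)
        top = self.stack.pop()
        if top != name:
            raise ValueError(
                "end tag </%s> does not match open <%s>" % (name, top))
        self.out.write("</%s>" % name)

    def is_balanced(self):
        # An empty stack at the end means every start tag was closed.
        return not self.stack


out = io.StringIO()
b = TagBalancer(out)
b.start("html")
b.start("body")
b.end("body")
b.end("html")
```

Memory use is proportional to the nesting depth, because only the names of the currently open elements are kept.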
The worst-case memory usage will occur when the file is deeply nested. The author can't imagine an XML application that needs more than a few dozen levels of element nesting.
Here is the basic workflow for programs that use this interface:
Create an instance of class Sox and tell it where the output is going. Let's call this instance s.
To start an element, use the s.start() method. The first argument is the tag name, e.g., “s.start("html", ...)”. The element's start tag is written to the output right away.
You can add various other arguments to this method call to specify attribute values and text content. For example, this call:
s.start("p", "Some text", id="x32")
would generate this output:
<p id="x32">Some text
The s.start() method will return a token object that you will use later to generate the element's end tag. Let's call this instance elt.
The s.start() method also pushes elt on its internal stack to use later to check for start/end tag balancing.
To add text content to this element, use a call to s.write(...) with string values. Those values will be written to the output right away.
Any calls to the s.start() method that you make before an outer element is finished will generate child elements of that outer element. All child elements must be completed before the parent element's end tag is written.
To finish an element, call the end-tag method of the elt token that was returned by s.start(). Assuming that you have completed the generation of any child elements, elt should also be on top of the Sox instance's internal stack, and that value will be popped off the stack. Then this method will write the element's end tag.
There is also an s.leaf() method for generating empty elements and elements with no element children. It does not return a value, but immediately writes the element.
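To make the workflow concrete, here is a small stand-in that mimics the behavior described above. It is a sketch under stated assumptions, not the sox.py implementation: the class names MiniSox and MiniElt, the token method name end(), and the use of xml.sax.saxutils for escaping are all choices made for this example.

```python
import io
from xml.sax.saxutils import escape, quoteattr


class MiniElt:
    """Hypothetical token returned by MiniSox.start()."""

    def __init__(self, sox, tag):
        self.sox = sox
        self.tag = tag

    def end(self):  # hypothetical method name, chosen for this sketch
        # The token being closed must be the innermost open element.
        stack = self.sox.stack
        if not stack or stack[-1] is not self:
            raise ValueError(
                "<%s> is not the innermost open element" % self.tag)
        stack.pop()
        self.sox.out.write("</%s>" % self.tag)


class MiniSox:
    """Hypothetical stand-in for the interface described in the text."""

    def __init__(self, out):
        self.out = out
        self.stack = []  # tokens of elements that are still open

    def _open_tag(self, tag, attrs):
        attr_text = "".join(
            " %s=%s" % (name, quoteattr(value))
            for name, value in attrs.items())
        return "<%s%s>" % (tag, attr_text)

    def start(self, tag, *text, **attrs):
        # Write the start tag and any initial text right away.
        self.out.write(self._open_tag(tag, attrs))
        self.write(*text)
        elt = MiniElt(self, tag)
        self.stack.append(elt)
        return elt

    def write(self, *text):
        for piece in text:
            self.out.write(escape(piece))

    def leaf(self, tag, *text, **attrs):
        # A complete element with no element children; nothing is returned.
        self.out.write(self._open_tag(tag, attrs))
        self.write(*text)
        self.out.write("</%s>" % tag)


out = io.StringIO()
s = MiniSox(out)
html = s.start("html")
p = s.start("p", "Some text", id="x32")
s.write(" and more")
p.end()
s.leaf("br")
html.end()
print(out.getvalue())
```

Because each token knows which writer it belongs to, closing an element needs no arguments, and an out-of-order close is caught by comparing the token against the top of the stack.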
In order to ensure well-formed XML output, the sox.py module will perform several other validity checks.
Element names and attribute names must conform to XML naming rules. The first character must be a letter, underbar, or “:”, or any of a number of Unicode character ranges specified in the XML standard. Additional characters can be any of those plus digits, “-”, “.”, and several other Unicode character ranges.
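As a rough illustration of such a name check, the following sketch tests the ASCII part of the rule only. It is a simplification made for this example: the Unicode ranges from the XML standard are deliberately left out, and the names NAME_RE and is_xml_name are invented here rather than taken from sox.py.

```python
import re

# ASCII-only approximation of an XML name:
# first character: letter, underbar, or colon;
# later characters: the same set plus digits, "-", and ".".
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_xml_name(name):
    """Return True if name passes the simplified ASCII check."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ["html", "xml:lang", "_x-1.2", "1abc", "-a", "a b", ""]:
    print(repr(candidate), is_xml_name(candidate))
```

A fuller check would extend both character classes with the Unicode ranges listed in the XML standard.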
In attribute values, characters with special meaning in XML, including “"”
(double-quote) will be escaped in output as entities