
3. Design rationale

Conceptually, it is easy to generate well-formed XML: Every start tag must have a corresponding end tag, and inner elements must end before the outer element's end tag.

To minimize memory usage, we can write the start tag as soon as we start an element, and write the content of the element as we go. However, to ensure that the XML is well-formed, the sox.py module will check that start tags and end tags match.

The obvious data structure to manage this checking is a stack.

The stack holds one entry for each element that has been started but not yet ended, so the worst-case memory usage will occur when the file is deeply nested. The author can't imagine an XML application that needs more than a few dozen levels of element nesting.
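As a rough illustration of the stack idea, here is a minimal sketch. The class name TagStack and its methods are hypothetical and are not part of sox.py; the sketch tracks only tag names and ignores attributes and content.

```python
class TagStack:
    """Hypothetical helper: track open elements so end tags can be checked."""

    def __init__(self):
        # Names of elements that have been started but not yet ended.
        self._open = []

    def start(self, name):
        """Record a newly opened element and return its start tag."""
        self._open.append(name)
        return "<%s>" % name

    def end(self, name):
        """Return the end tag, but only if name is the innermost open element."""
        if not self._open or self._open[-1] != name:
            raise ValueError(
                "end tag </%s> does not match the innermost open element" % name)
        self._open.pop()
        return "</%s>" % name

    def depth(self):
        """Current nesting depth, which is also the stack's memory footprint."""
        return len(self._open)
```

The stack never holds more entries than the current nesting depth, which is why memory usage stays small even for very large output files.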

3.1. Workflow for XML generation

Here is the basic workflow for programs that use this interface:

  1. Create an instance of class Sox and tell it where the output is going. Let's call this instance s.

  2. To start an element, use the s.start(N, ...) method, where N is the tag name, e.g., “s.start("html", ...)”. The element's start tag is written to the output right away.

    You can add various other arguments to this method call to specify attribute values and text content. For example, this call:

        s.start("p", "Some text", id="x32")
    

    would generate this output:

        <p id="x32">Some text
    

    The .start() method will return a token, an instance of class Elt, that you will use later to generate the element's end tag. Let's call this instance elt. The s.start() method also pushes elt on its internal stack to use later to check for start/end tag balancing.

  3. To add text content to this element, use a call to s.write(...) with string values. Those values will be written to the output right away.

  4. Any calls to the s.start() method that you make before an outer element is finished will generate child elements of that outer element. All child elements must be completed before the parent element's end tag is written.

  5. To finish an element, call the elt.end() method of the elt token that was returned by s.start().

    Assuming that you have completed the generation of any child elements, elt should be on top of the Sox instance's internal stack, and it will be popped off the stack. Then this method will write the element's end tag.

There is also an s.leaf(N, ...) method for generating empty elements and elements with no element children. It does not return a value, but immediately writes the element.
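The workflow above can be tried out with a simplified stand-in. The classes MiniSox and MiniElt below are hypothetical and are not the actual sox.py code: they perform no name checking or attribute escaping, and the way .leaf() renders an empty element is an assumption. They only mimic the call sequence described in the steps.

```python
import io


class MiniElt:
    """Hypothetical stand-in for the Elt token returned by .start()."""

    def __init__(self, sox, name):
        self.sox = sox
        self.name = name

    def end(self):
        # The element being ended must be the innermost open one.
        if not self.sox.stack or self.sox.stack[-1] is not self:
            raise ValueError("element <%s> ended out of order" % self.name)
        self.sox.stack.pop()
        self.sox.out.write("</%s>" % self.name)


class MiniSox:
    """Hypothetical stand-in for Sox: no name checks, no escaping."""

    def __init__(self, out):
        self.out = out      # any object with a .write() method
        self.stack = []     # MiniElt tokens for the currently open elements

    def _open_tag(self, name, attrs):
        # Build '<name a="v" ...' without the closing bracket.
        parts = [name] + ['%s="%s"' % (k, v) for k, v in sorted(attrs.items())]
        return "<" + " ".join(parts)

    def start(self, name, *text, **attrs):
        # Write the start tag and any initial text right away.
        self.out.write(self._open_tag(name, attrs) + ">" + "".join(text))
        elt = MiniElt(self, name)
        self.stack.append(elt)
        return elt

    def write(self, *text):
        self.out.write("".join(text))

    def leaf(self, name, *text, **attrs):
        # Write a complete element with no element children.
        body = "".join(text)
        if body:
            self.out.write("%s>%s</%s>" % (self._open_tag(name, attrs), body, name))
        else:
            self.out.write(self._open_tag(name, attrs) + "/>")


out = io.StringIO()
s = MiniSox(out)
body = s.start("body")                     # step 2: start an outer element
p = s.start("p", "Some text", id="x32")    # step 4: child element
s.write(" and more")                       # step 3: add text content
s.leaf("br")                               # element with no children
p.end()                                    # step 5: children end first
body.end()
print(out.getvalue())
```

With this stand-in, the script prints <body><p id="x32">Some text and more<br/></p></body>. Calling body.end() before p.end() would raise ValueError, because p would still be on top of the stack.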

To ensure well-formed XML output, the sox module will perform several other validity checks.

  • Element names and attribute names must conform to XML naming rules. The first character must be a letter, underbar, or “:”, or any of a number of Unicode character ranges specified in the XML standard. Additional characters can be any of those plus digits, “-”, “.”, and several other Unicode character ranges.

  • In attribute values, the characters “&”, “<”, and “"” (double-quote) will be escaped in output as entities “&amp;”, “&lt;”, and “&quot;”, respectively.
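These two checks might look roughly like the following sketch. The names NAME_RE, is_xml_name, and escape_attr are hypothetical and are not taken from sox.py. The name pattern covers only the ASCII portion of the naming rules and omits the Unicode ranges from the XML standard.

```python
import re

# ASCII-only approximation of the XML name rules: the first character is a
# letter, underbar, or colon; later characters may also be digits, "-", or ".".
# The additional Unicode ranges allowed by the XML standard are omitted here.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*\Z")


def is_xml_name(name):
    """Return True if name passes the simplified (ASCII-only) name check."""
    return NAME_RE.match(name) is not None


def escape_attr(value):
    """Escape "&", "<", and the double-quote for use in an attribute value."""
    # "&" must be replaced first; otherwise the ampersands introduced by
    # the other two replacements would themselves be escaped.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace('"', "&quot;"))
```

For example, is_xml_name("1abc") is False because a name may not start with a digit, and escape_attr('a<b & "c"') returns 'a&lt;b &amp; &quot;c&quot;'.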