
Abstract

Describes a module allowing programs in the Python programming language to generate large XML files without large amounts of processor memory; includes its implementation as a literate program.

This publication is available in Web form and also as a PDF document. Please forward any comments to john@nmt.edu.

This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.

Table of Contents

1. When you need to generate large XML documents
2. Online files
3. Design rationale
3.1. Workflow for XML generation
3.2. Design refinements: convenience features
3.3. Context managers and the “with” construct
4. Interface to the sox module
4.1. The SoxError class
4.2. The Sox class
4.3. The Elt class: An element in progress
4.4. Name validation
5. sox.py: The code prologue
6. Imported modules
7. Manifest constants
7.1. DEFAULT_ENCODING
7.2. NAME_START_RANGES
7.3. NAME_CHAR_RANGES
7.4. ESCAPE_MAP
8. Specification functions
8.1. content-processor
8.2. escaped-attribute
8.3. content-list
8.4. valid-comment
8.5. xml-name
8.6. unicode-okay
9. class SoxError
10. class Sox
11. Sox.__init__()
12. Sox.start(): Start an element
13. Sox._nameCheck(): Is this a valid xml-name?
14. Sox._unicodify(): Force to Unicode
15. Sox._checkCharRange(): Is this code point in a given set of intervals?
16. Sox._sortParams(): Classify content and attribute arguments
17. Sox._unicodeProcessor()
18. Sox._strProcessor()
19. Sox._intProcessor()
20. Sox._boolProcessor()
21. Sox._dictProcessor()
22. Sox._attrEscape(): Escape forbidden characters in attribute values
23. Sox._typeRouter(): Mapping of content parameter types to converter functions
24. Sox._buildAttrs(): Build XML attributes from a dictionary
25. Sox.end()
26. Sox.write()
27. Sox.leaf(): Write an element with no element children
28. Sox._startTag(): Write a start tag
29. Sox._emptyTag(): Write an empty XML tag
30. Sox._endTag(): Write an XML end tag
31. Sox.flush()
32. Sox.cleanup(): Cleanup
33. Sox.close(): Cleanup and close
34. Sox.comment()
35. Sox.pi(): Write a processing instruction
36. Sox.doctype()
37. class Elt: One open element
38. Elt.__init__()
39. Elt.end()
40. Elt.__enter__()
41. Elt.__exit__()
42. soxtest: Test driver

1. When you need to generate large XML documents

This document describes a module for the Python programming language that helps generate large XML documents without holding the entire document in memory.

Sometimes, as when formatting a database query for Web presentation as XHTML, or generating an XSL-FO file destined for PDF rendering, it is necessary to generate quite large XML documents. The author's preferred techniques for reading and writing XML from Python are described in “Python XML processing with lxml”. The etbuilder module presented in that document is especially handy for XML generation, allowing generation of arbitrary XML structures with compact code. However, these methods require that the entire XML document reside as a tree in memory.

Hence, the author needed a module that generates XML sequentially, so that quite large documents can be produced without significant memory requirements.
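
The idea of sequential generation can be sketched briefly. The example below is not the interface of the sox module, which is described later in this document; it is a minimal illustration that assumes the standard library's xml.sax.saxutils.XMLGenerator as a stand-in. The function name write_rows and the element names table and row are hypothetical, chosen only for this sketch.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_rows(out, rows):
    """Stream an XML document to `out`, one <row> element at a time.

    `rows` may be any iterable of (id, text) pairs, for example a
    generator over a database cursor, so only one row has to be held
    in memory at any moment.  (Hypothetical helper for illustration.)
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()                 # writes the XML declaration
    gen.startElement("table", {})       # start tag is written immediately
    for row_id, text in rows:
        gen.startElement("row", {"id": str(row_id)})
        gen.characters(text)            # '&' and '<' are escaped on output
        gen.endElement("row")
    gen.endElement("table")
    gen.endDocument()


# Usage: the rows come from a generator, so no list or tree is ever built.
buf = io.StringIO()
write_rows(buf, ((i, "item & %d" % i) for i in range(3)))
result = buf.getvalue()
```

Each start tag, text node, and end tag goes to the output stream as soon as it is produced, so memory use depends on the size of one row rather than on the size of the whole document. In practice the output would be a file or socket rather than an in-memory buffer.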