Table of Contents
Node class
Document class
Element class
Attr class
CharacterData classes: Text, CDATA, and Comment
DOMImplementation object
xml4create.py: A more Pythonic module for XML file creation
xml4create.py: The source code
xml4create.py
DocumentType class
Document class
Document.splitQName(): Process a qualified name
Document.serialize()
Document.write()
Element class
Element.__init__()
Element.__wrapRoot(): Wrap the root element
Element.__setAttr(): Set up one attribute
Element.__newElement(): Element child of an element
Element.__setitem__()
Element.update(): Copy XML attributes to the element
Text class
Text.__init__()
Comment class
DocumentFragment class
DocumentFragment.serialize()
DocumentFragment.write()
crens
crefrag
The Python programming language is a fine vehicle for writing programs that process or generate XML files. This document is a quick reference to the major features of Python's interface to the XML Document Object Model (DOM), which provides for a tree-like data structure that represents an XML document. We won't cover every feature; this is just a selection of commonly used techniques.
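As a rough illustration of the tree-like structure described above, here is a minimal sketch. It assumes the standard library's xml.dom.minidom rather than the 4Suite package this document covers, and the sample XML and variable names are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
SAMPLE = "<park name='Bosque'><bird>heron</bird><bird>crane</bird></park>"

# Assumption: the standard-library minidom stands in for 4Suite here.
doc = minidom.parseString(SAMPLE)   # Document node: the root of the DOM tree
root = doc.documentElement          # the <park> Element node

# Walk the element children and collect the text inside each one.
birds = [child.firstChild.data
         for child in root.childNodes
         if child.nodeType == child.ELEMENT_NODE]

print(root.tagName, root.getAttribute("name"), birds)
```

Each element, attribute, and piece of text is reachable by navigating from the Document node, which is the property the rest of this reference relies on.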
The reader is assumed to be generally familiar with the Python language and with the structure of XML documents. For more information, see these references:
The python.org page is the central site for the Python language. See also TCC's Python page.
For an introduction to Python's approach to the DOM, see the Python Library Reference page, “xml.dom: The Document Object Model API.”
For help on XML, see TCC's XML page.
This document assumes you have the 4Suite package installed. For information and downloads, see the Fourthought page.
This publication is available in Web form and also as a PDF document. Please forward any comments to tcc-doc@nmt.edu.
This document also presents a number of Python source files in literate programming style: a useful module and a few short test drivers demonstrating its use. These files are available online:
xml4create.py: See Section 13, “xml4create.py: A more Pythonic module for XML file creation”.
xmlcretest: Simple test driver for xml4create.py.
crens: Test driver demonstrating output in multiple namespaces.
crefrag: Test driver demonstrating creation of a document fragment.
For most XML processing, the Document Object Model is a convenient way to read or write XML files. However, there are some situations where other techniques may be necessary.
A DOM tree is resident entirely in memory, so a very large document, such as a five-gigabyte XML file, will probably not fit.
Building a DOM tree from a document can be slow for larger files.
If memory size or performance is an issue, refer to the SAX processing model, which is outside the scope of this document. Here is a good resource:
Jones, Christopher A., and Fred L. Drake, Jr. Python & XML. O'Reilly, 2002, ISBN 0-596-00128-2.
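To give a feel for how the SAX model avoids holding a whole tree in memory, here is a minimal sketch. It assumes the standard library's xml.sax module; the TagCounter class and the sample input are hypothetical and are not part of this document's code.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler: count start tags by name without building a tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per start tag as the parser streams through the input.
        self.counts[name] = self.counts.get(name, 0) + 1


# Hypothetical sample input; a real use would stream a large file instead.
SAMPLE = b"<log><entry/><entry/><entry/></log>"

handler = TagCounter()
xml.sax.parseString(SAMPLE, handler)
print(handler.counts)
```

The handler sees one event at a time and keeps only the running counts, so memory use does not grow with the size of the document.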
Throughout the descriptions of these interfaces, we describe a number of arguments and results as “strings”. It is safest to use Python unicode strings throughout these interfaces. Regular Python strings of class str will be converted to Unicode automatically, but the automatic conversion may not do the right thing, particularly when the string contains non-ASCII characters.
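The advice above can be sketched as follows. This is an illustration under stated assumptions: it is written for Python 3, where the analogous distinction is between bytes and str rather than between str and unicode, it uses the standard library's xml.dom.minidom rather than 4Suite, and the sample data and its Latin-1 encoding are hypothetical.

```python
from xml.dom import minidom

# Hypothetical input: bytes assumed to be Latin-1 encoded.
raw = "caf\u00e9".encode("latin-1")

# Decode explicitly with the known encoding instead of relying on a default.
text = raw.decode("latin-1")

# Decoding the same bytes as ASCII fails, because 0xE9 is not an ASCII byte.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Pass only the decoded Unicode text to the DOM interfaces.
doc = minidom.Document()
elem = doc.createElement("name")
elem.appendChild(doc.createTextNode(text))
doc.appendChild(elem)
xml_out = doc.documentElement.toxml()
print(ascii_ok, xml_out)
```

Decoding at the point where data enters the program keeps the choice of encoding explicit, so the DOM only ever receives Unicode text.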