Table of Contents

1. Introduction
1.1. How to get this publication
1.2. When not to use the DOM
1.3. Unicode good, strings risky
2. Terms you should know
2.1. URI
2.2. URL and URN
2.3. Absolute URI
2.4. Relative URI
2.5. Base URI
2.6. Namespace URI
2.7. qName: Qualified name
2.8. Namespace prefix
2.9. Local name
3. Reading an XML document in Python
3.1. A quick and dirty document reader
3.2. A full-featured reader
4. The structure of a DOM tree
5. The Node class
5.1. Node.appendChild()
5.2. Node.cloneNode()
5.3. Node.hasChildNodes()
5.4. Node.insertBefore()
5.5. Node.isSameNode()
5.6. Node.removeChild()
5.7. Node.replaceChild()
5.8. Node.xpath()
6. The Document class
7. The Element class
7.1. Element.getAttributeNS()
7.2. Element.getAttributeNodeNS()
7.3. Element.hasAttributeNS()
7.4. Element.removeAttributeNS()
7.5. Element.setAttributeNS()
7.6. Element.setAttributeNodeNS()
8. The Attr class
9. The CharacterData classes: Text, CDATA, and Comment
9.1. CharacterData.appendData()
9.2. CharacterData.deleteData()
9.3. CharacterData.insertData()
9.4. CharacterData.replaceData()
9.5. CharacterData.substringData()
10. The DOMImplementation object
10.1. DOMImplementation.createDocument()
10.2. DOMImplementation.createRootNode()
11. Printing a document
12. Creating a document from scratch: factory methods
12.1. Document.createElementNS()
12.2. Document.createTextNode()
12.3. Document.createProcessingInstruction()
12.4. Document.createComment()
12.5. Document.createDocumentFragment()
12.6. An example of document creation
13. xml4create.py: A more Pythonic module for XML file creation
13.1. The DocumentType class
13.2. The Document class
13.3. The Element class
13.4. The Text class
13.5. The Comment class
13.6. The DocumentFragment class
13.7. A small example: XHTML page generation
14. xml4create.py: The source code
14.1. Prologue to xml4create.py
14.2. The DocumentType class
14.3. The Document class
14.4. Document.splitQName(): Process a qualified name
14.5. Document.serialize()
14.6. Document.write()
14.7. The Element class
14.8. Element.__init__()
14.9. Element.__wrapRoot(): Wrap the root element
14.10. Element.__setAttr(): Set up one attribute
14.11. Element.__newElement(): Element child of an element
14.12. Element.__setitem__()
14.13. Element.update(): Copy XML attributes to the element
14.14. The Text class
14.15. Text.__init__()
14.16. The Comment class
14.17. The DocumentFragment class
14.18. DocumentFragment.serialize()
14.19. DocumentFragment.write()
15. Test driver for multiple namespaces: crens
16. Test driver for a document fragment: crefrag

1. Introduction

The Python programming language is a fine vehicle for writing programs that process or generate XML files. This document is a quick reference to the major features of Python's interface to the XML Document Object Model (DOM), which provides for a tree-like data structure that represents an XML document. We won't cover every feature; this is just a selection of commonly used techniques.

The reader is assumed to be generally familiar with the Python language and with the structure of XML documents. For more information, see these references:

1.1. How to get this publication

This publication is available in Web form and also as a PDF document. Please forward any comments to tcc-doc@nmt.edu.

This document also presents a number of Python source files in literate programming style: a useful module and a few short test drivers demonstrating its use. These files are available online:

1.2. When not to use the DOM

For most XML processing, the Document Object Model is a convenient way to read or write XML files. However, there are some situations where other techniques may be necessary.

  • A DOM tree resides entirely in memory, so a very large document, such as a five-gigabyte XML file, probably won't fit.

  • Building a DOM tree from a document can be a slow process for larger files.

If memory size or performance is an issue, consider the SAX processing model, which is outside the scope of this document. Here is a good resource:

Jones, Christopher A., and Fred L. Drake, Jr. Python & XML. O'Reilly, 2002, ISBN 0-596-00128-2.
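To make the contrast with the DOM concrete, here is a minimal sketch of event-driven processing. It uses the xml.sax module from the Python standard library rather than any package described in this document; the ElementCounter class, the count_elements() function, and the sample data are hypothetical names invented for this illustration. The parser calls the handler once per start tag as it streams through the input, so no tree is ever built in memory.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count start tags by name without keeping the document in memory."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per start tag as the parser streams through the input.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(stream):
    """Parse an XML byte stream and return a dict of tag name -> count."""
    handler = ElementCounter()
    xml.sax.parse(stream, handler)
    return handler.counts


# A tiny in-memory stand-in for what would normally be a large file.
sample = io.BytesIO(b"<park><bird/><bird/><tree/></park>")
counts = count_elements(sample)
print(counts)
```

The tradeoff is that the handler sees only one event at a time, so any context it needs (the parent element, for example) must be tracked by hand.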

1.3. Unicode good, strings risky

Throughout the descriptions of these interfaces, we describe a number of arguments and results as “strings”. It is safest to pass Python unicode strings to all of these interfaces. Regular Python strings of class str will be converted to Unicode automatically, but the automatic conversion has to guess at the encoding, so it may not do the right thing when the string contains non-ASCII characters.
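The following sketch shows why an explicit conversion is safer than an automatic one. It is written with Python 3 names, where byte strings are class bytes and Unicode text is class str; that is an assumption about the reader's interpreter, since the text above uses the older names str and unicode. The to_text() helper is hypothetical and is not part of any interface described here.

```python
# The same bytes mean different text under different encodings.
raw = b"Se\xc3\xb1or"  # the word "Se\u00f1or" encoded as UTF-8

as_utf8 = raw.decode("utf-8")      # explicit, and correct for this data
as_latin1 = raw.decode("latin-1")  # a wrong guess silently yields garbage


def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding byte strings explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(len(as_utf8), len(as_latin1))
```

Decoding once, at the point where data enters the program, means every string handed to the DOM interfaces is already Unicode and no guess is ever needed.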