The litlxml program is worth studying as an example not
only of literate programming but also of how easy it is to
process XML files in Python.
This program has gone through several rewrites as techniques for XML processing in Python have evolved.
The current version uses the lxml package.
For more details, see the lxml
homepage. This package yields much higher
performance than earlier approaches:
The Document Object Model (DOM) is designed to be
language-independent, so it is not terribly Pythonic
in its processing model. The stock xml.dom.minidom package is also quite slow,
especially with large XML files. See the documentation for xml.dom.minidom.
Serial processing with SAX (Simple API for XML) is faster than DOM, but messier. In SAX code, you must register callbacks that the parser invokes as each tag or run of content goes by in a single serial pass. See the documentation for Python's xml.sax module.
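To make the comparison concrete, here is a minimal sketch of one small task, collecting the text of every note element, done in the tree style that lxml offers, with the DOM, and with SAX. This is not code from litlxml. The sample document, the element name "note", and all function and class names are hypothetical, chosen only for illustration. The sketch also assumes that, if lxml is not installed, the standard library's xml.etree.ElementTree is an acceptable stand-in for the two calls used here.

```python
import xml.dom.minidom
import xml.sax

try:
    from lxml import etree  # third-party package described in the text
except ImportError:
    # Assumption: ElementTree supports the same fromstring()/iter() calls.
    import xml.etree.ElementTree as etree

# Hypothetical sample document; not taken from litlxml.
SAMPLE = b"<notes><note>first</note><note>second</note></notes>"


def notes_etree(data):
    """Tree style: elements behave like ordinary Python objects."""
    root = etree.fromstring(data)
    return [node.text for node in root.iter("note")]


def notes_minidom(data):
    """DOM style: text lives in separate child nodes of each element."""
    doc = xml.dom.minidom.parseString(data)
    result = []
    for node in doc.getElementsByTagName("note"):
        pieces = [child.data for child in node.childNodes
                  if child.nodeType == child.TEXT_NODE]
        result.append("".join(pieces))
    return result


class NoteHandler(xml.sax.ContentHandler):
    """SAX style: callbacks fire as tags and content go by."""

    def __init__(self):
        super().__init__()
        self.notes = []
        self._buffer = None  # holds text only while inside a <note>

    def startElement(self, name, attrs):
        if name == "note":
            self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "note":
            self.notes.append("".join(self._buffer))
            self._buffer = None


def notes_sax(data):
    handler = NoteHandler()
    xml.sax.parseString(data, handler)
    return handler.notes
```

All three functions return the same list for the sample document. The difference is in how much bookkeeping each one needs: the SAX version has to track its own state between callbacks, while the tree version is two lines.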
This program was written using the Cleanroom, or zero-defect, methodology. The best introduction to the method is Stavely, Allan M., Toward Zero-Defect Programming, Addison-Wesley, 1999, ISBN 0-201-38595-3. Also see my Cleanroom pages for a discussion of how I practice the methodology.