Next / Previous / Contents / TCC Help System / NM Tech homepage

4. Literate exposition of the litlxml program itself

The litlxml program is worth study as an example not only of literate programming but also of how easy it is to process XML files in Python.

4.1. Design notes

This program has gone through several rewrites as techniques for XML processing in Python have evolved.

The current version uses the lxml package. For more details, see the lxml homepage. This package yields much higher performance than earlier approaches:

  • The Document Object Model (DOM) is designed to be language-independent, so it is not terribly Pythonic in its processing model. The stock xml.minidom package is also quite slow, especially with large XML files. See the documentation for minidom.

  • Serial processing with SAX (Simple API for XML) is faster, but messier. In SAX code, you must set up callbacks that are called as tags or content go by in a serial pass. See the documentation for Python's SAX module.

This program was written using the Cleanroom or zero-defect methodology. The best introduction to the method is given in Stavely, Allan M., Toward Zero-defect Programming, Addison-Wesley, 1999, ISBN 0-201-38595-3. Also see my Cleanroom pages for a discussion of how I practice the methodology.