4.1. Design notes

This program has gone through several rewrites as techniques for XML processing in Python have evolved.

The current version uses the xml.etree.ElementTree module, part of the standard Python library; see the online documentation for that module.

Version 1.0 used the lxml package. For more details, see Python processing with lxml. The lxml package yields much higher performance than earlier approaches, such as the Document Object Model (DOM). However, in March 2015, after many years of faithful service, lxml started intermittently omitting entity references in extracted code. The author has not seen this behavior with xml.etree.ElementTree.
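As a rough illustration of why entity references matter here, the following sketch uses xml.etree.ElementTree to pull the text out of code elements in a small document. It is not the program's actual code: the sample document, the function name extract_listings, and the choice of the DocBook programlisting element are assumptions made for this example. The point is that references such as &lt; and &amp; must come back as the characters they stand for, or the extracted code is broken.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the real program reads DocBook source files.
SAMPLE = """<section>
  <para>Some prose.</para>
  <programlisting>if a &lt; b and b &gt; c:
    print("a &amp; c are smaller")
</programlisting>
</section>"""


def extract_listings(xml_text):
    """Return the text content of each programlisting element.

    Illustrative helper only; the name and behavior are assumptions.
    """
    root = ET.fromstring(xml_text)
    # itertext() walks the element's text and the text of any children,
    # with predefined entity references already expanded by the parser.
    return ["".join(node.itertext()) for node in root.iter("programlisting")]


listings = extract_listings(SAMPLE)
```

With this input, the single extracted listing contains the literal characters <, > and &, which is what a source extractor needs to write out.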

This program was written using the Cleanroom or zero-defect methodology. The best introduction to the method is:

Stavely, Allan M. Toward Zero-defect Programming. Addison-Wesley, 1999, ISBN 0-201-38595-3.

See also the author's Cleanroom pages for a discussion of methods and dozens of examples. The author uses one minor notation variant nowadays: lines of the intended function starting with “P:” are preconditions.
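To show what the “P:” notation might look like in practice, here is a small sketch. The function, its name, and the exact bracket layout of the intended function are assumptions for illustration; only the convention that a line starting with “P:” states a precondition comes from the text above.

```python
def mean(values):
    """Return the arithmetic mean of a sequence of numbers.

      [ P: values is a nonempty sequence of numbers
        return the sum of values divided by the length of values ]
    """
    # The precondition rules out the empty sequence, so no
    # division-by-zero check is needed inside the function.
    return sum(values) / len(values)


average = mean([2, 4, 9])
```

The caller is responsible for satisfying the precondition; the body is verified only against inputs that meet it.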

This program is based on an earlier version that works with DocBook-XML 4.3: litlxml: A source extractor for lightweight literate programming, the production version for Writing documentation with DocBook-XML 4.3.