This program has gone through several rewrites as techniques for XML processing in Python have evolved.
The current version uses the
xml.etree.ElementTree module, part of the standard
Python library; see the online
documentation for that module.
Version 1.0 used the
lxml package. For more
details, see Python XML processing with
lxml. This package yields much higher performance than
earlier approaches, such as the Document Object Model (DOM).
However, in March 2015, after many long years of faithful
service, this package started intermittently omitting entity
references in extracted code. The author has not seen this
This program was written using the Cleanroom or zero-defect methodology. The best introduction to the method is:
Stavely, Allan M. Toward Zero-defect Programming. Addison-Wesley, 1999, ISBN 0-201-38595-3.
See also the author's Cleanroom
pages for a discussion of methods and dozens of
examples. The author uses one minor notation variant
nowadays: lines of the intended function starting with
“P:” are preconditions.
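As a rough illustration of that notation, here is a minimal sketch of a Python function whose intended function includes a “P:” precondition line. The function mean() is hypothetical and is not part of this program. The bracketed layout of the intended function is likewise an assumption made for the example, not a quotation of the author's exact style.

```python
def mean(values):
    """Return the arithmetic mean of a sequence of numbers.

      [ P: values is a nonempty sequence of numbers
        return the sum of the elements of values, divided by
        the number of elements ]
    """
    # Hypothetical example: the "P:" line above states what the
    # caller must guarantee. Because the precondition rules out an
    # empty sequence, the body does not check for division by zero.
    return sum(values) / len(values)
```

The precondition shifts responsibility to the caller: the function is only claimed to be correct for inputs that satisfy the “P:” line, so a call such as mean([1, 2, 3]) is covered by the intended function while mean([]) is not.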
This program is based on an earlier version that works with DocBook-XML 4.3: litlxml: A source extractor for lightweight literate programming, the production version for Writing documentation with DocBook-XML 4.3.