The litsource program is worth study as an example not only of literate programming but also of how easy it is to process XML files in Python.
Here are some good resources for learning about Python's XML libraries:
See the Python web site for information and downloads for the Python language.
See the documentation for the W3C Document Object Model (DOM) for general information about the representation of XML documents as trees.
See XML Path Language for information about XPath, a handy notation for describing the location of nodes in DOM trees.
Python & XML, by Christopher A. Jones and Fred L. Drake, Jr. (O'Reilly Press, 2002, ISBN 0-596-00128-2) is an excellent overview of the major approaches, with copious examples.
The PyXML development page has downloads and installation instructions.
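As a rough illustration of the DOM and XPath ideas mentioned above, here is a minimal sketch. It is not taken from litsource: it swaps in the standard library's xml.dom.minidom and xml.etree.ElementTree modules (ElementTree supports only a limited subset of XPath) rather than PyXML, and the sample document, its element and attribute names, and the two helper functions are all hypothetical.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document. The element and attribute names are
# invented for illustration; they are not litsource's actual markup.
SAMPLE = """<article>
  <section id="intro">
    <listing file="hello.py">print("hello")</listing>
  </section>
  <section id="body">
    <listing file="hello.py">print("world")</listing>
    <listing file="other.py">pass</listing>
  </section>
</article>"""


def listing_files_dom(text):
    """Parse text into a DOM tree and return the file attribute of
    every listing element, in document order."""
    doc = xml.dom.minidom.parseString(text)
    return [node.getAttribute("file")
            for node in doc.getElementsByTagName("listing")]


def listings_for_file(text, file_name):
    """Select the listing elements for one file with an XPath-style
    path and return their text content."""
    root = ET.fromstring(text)
    # ".//listing" matches listing elements at any depth below the root;
    # the bracketed predicate keeps only those with a matching attribute.
    path = ".//listing[@file='%s']" % file_name
    return [elem.text for elem in root.findall(path)]


if __name__ == "__main__":
    print(listing_files_dom(SAMPLE))
    print(listings_for_file(SAMPLE, "hello.py"))
```

The first function walks the tree through the DOM interface, while the second lets a path expression do the selection. For this sample, the path version needs no explicit filtering code, which is the convenience XPath is meant to provide.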
This program was written using the Cleanroom or zero-defect methodology. The best introduction to the method is given in Stavely, Allan M., Toward Zero-defect Programming, Addison-Wesley, 1999, ISBN 0-201-38595-3. Also see my Cleanroom pages for a discussion of how I practice the methodology.
The script starts with the usual Python prologue. The first line makes the script self-executing on Unix-like systems. It is followed by brief comments that point to the online form of this literate programming document and state the Cleanroom intended function for the program as a whole.
#!/usr/bin/env python
#================================================================
# litsource:  Extract code from literate-programming source files.
#   For documentation, see:
#     http://www.nmt.edu/tcc/help/lang/python/examples/litsource/
#----------------------------------------------------------------
# Overall intended function:
#   [ output files named in input files given on the command line
#         :=  code fragments designated for those files
#     sys.stderr  +:=  error messages if any ]
#----------------------------------------------------------------