<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
  [
    <!ENTITY litlxml   "<code>litlxml</code>">
    <!ENTITY selfURL
      "http://www.nmt.edu/tcc/help/lang/python/examples/litlxml/">
  ]
>
<article>
  <articleinfo>
    <title>A source extractor for lightweight literate programming</title>
    <titleabbrev>
      &litlxml;: A literate source extractor
    </titleabbrev>
    <authorgroup>
      <author>
        <firstname>John W.</firstname>
        <surname>Shipman</surname>
      </author>
    </authorgroup>
    <address><email>tcc-doc@nmt.edu</email>
    </address>
    <revhistory>
      <revision>
        <revnumber>$Revision: 1.9 $</revnumber>
        <date>$Date: 2009/10/10 22:30:47 $</date>
      </revision>
    </revhistory>
    <abstract>
      <para>
        Describes a script that extracts the source code from a
        program presented in lightweight literate programming
        form, using the DocBook documentation toolchain, the
        Python programming language, and the <code >lxml</code >
        module for XML processing.
      </para>
      <para>
        This publication is available in <ulink url="&selfURL;"
        >Web form</ulink > and also as a <ulink
        url="&selfURL;litlxml.pdf" >PDF document</ulink >.
        Please forward any comments to <userinput
        >tcc-doc@nmt.edu</userinput >.
      </para>
    </abstract>
  </articleinfo>
  <section id="intro">
    <title>Introduction</title>
    <blockquote>
      <attribution>
        <citetitle>Structure and interpretation of computer
        programs</citetitle>, Harold Abelson and Gerald Jay
        Sussman, p. xvii
      </attribution>
      <para>
        Programs must be written for people to read, and only
        incidentally for machines to execute.
      </para>
    </blockquote>
    <para>
      By literate programming, we mean programs that are intended
      to be readable.  The idea comes from Dr. Donald
      E. Knuth and has a long history.  For background, see the
      <ulink url='http://www.literateprogramming.com/'>Literate
      Programming web site</ulink>.
    </para>
    <para>
      Knuth's <code>cweb</code> system interwove
      narrative about the program with the actual source code of
      the program.  One then runs a tool named
      <code>ctangle</code> to generate the source code
      and a different tool named <code>cweave</code> to
      generate the online documentation.
    </para>
    <para>
      The present effort was inspired by similar efforts of
      <ulink url='http://www.nmt.edu/~al/'>Dr. Allan
      M. Stavely</ulink>, who suggested using DocBook as a
      general framework for literate programming.  Refer to
      <ulink url='http://www.nmt.edu/tcc/help/pubs/docbook42/'
      ><citetitle>Writing documentation with DocBook-XML
      4.2</citetitle></ulink> for more information on DocBook.
    </para>
    <para>
      Stavely's idea was to use DocBook's existing
      <code>programlisting</code> element to hold the
      program fragments, adding a
      <code>role='executable'</code> attribute to that
      element to distinguish executable source code from other
      uses of the <code>programlisting</code> element.
      This means that the regular processing of DocBook into HTML
      and PDF forms becomes the new equivalent of Knuth's
      <code>cweave</code> step.
    </para>
    <para>
      The remaining half of the problem, the extraction of the
      executable code from the DocBook source file, is the
      subject of this document.
    </para>
    <para>
      The &litlxml; script is embedded in this document.
      Relevant online files include:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          <ulink url='&selfURL;litlxml' >The &litlxml; script
          itself.</ulink >
        </para>
      </listitem>
      <listitem>
        <para>
          <ulink url='&selfURL;litlxml.xml' >The DocBook source
          for this document.</ulink >
        </para>
      </listitem>
    </itemizedlist>
  </section>
  <section id="encoding">
    <title>Encoding the literate program</title>
    <para>
      One limitation of Stavely's approach was that it assembled
      all the executable code fragments into a single file for
      execution.  But the literate exposition of a C program, for
      example, might require the discussion of two source files, a
      header file named <filename>foo.h</filename> and a code
      file named <filename>foo.c</filename>.  We get around this
      problem by using the <code>role</code> attribute
      of the <code>programlisting</code> element in a
      more flexible way.
    </para>
    <para>
      The general form of a literate program source is a valid
      DocBook-XML file, except that each fragment of executable
      code is wrapped in a <code>programlisting</code>
      element with this general format:
      <programlisting
>&lt;programlisting role='outFile:<replaceable>F</replaceable>'&gt;
  (source text)
&lt;/programlisting&gt;
</programlisting>
      where <code><replaceable>F</replaceable></code>
      is the name of the output file to which that source text
      should be written.
    </para>
    <para>
      We can then handle the above example by using a
      <code>role='outFile:foo.h'</code> attribute on
      fragments of the header file and a
      <code>role='outFile:foo.c'</code> attribute on
      fragments of the code file.  For example:
    </para>
    <programlisting
>&lt;programlisting role='outFile:foo.h'&gt;
  (stuff to be written to foo.h)
&lt;/programlisting&gt;
   ...
&lt;programlisting role='outFile:foo.c'&gt;
  (stuff to be written to foo.c)
&lt;/programlisting&gt;
</programlisting>
    <para>
      Of course, either of those files can be broken into many
      fragments spread throughout the document.  They can even be
      intermingled.
    </para>
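    <para>
      As a rough illustration of how intermingled fragments sort
      themselves out by destination, here is a minimal sketch.  It
      is not the &litlxml; script itself: it uses the standard
      library's <code>xml.etree.ElementTree</code> module instead
      of <code>lxml</code>, and both the sample document and the
      function name <code>groupFragments</code> are invented for
      this example.
    </para>
    <programlisting><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with intermingled fragments.
SAMPLE = """<article>
<programlisting role='outFile:foo.h'>int foo(void);
</programlisting>
<programlisting role='outFile:foo.c'>#include "foo.h"
</programlisting>
<programlisting>not extracted: no role attribute
</programlisting>
<programlisting role='outFile:foo.c'>int foo(void) { return 42; }
</programlisting>
</article>"""

PREFIX = "outFile:"

def groupFragments(xmlText):
    """Return a dictionary mapping each output file name to its
    concatenated fragments, in document order."""
    result = {}
    root = ET.fromstring(xmlText)
    for elt in root.iter("programlisting"):
        role = elt.get("role", "")
        if role.startswith(PREFIX):
            dest = role[len(PREFIX):]
            result[dest] = result.get(dest, "") + "".join(elt.itertext())
    return result

if __name__ == "__main__":
    for name, text in sorted(groupFragments(SAMPLE).items()):
        print("--- %s ---" % name)
        print(text)
```
]]></programlisting>
    <para>
      The listing without a <code>role</code> attribute is
      skipped, and the two <filename>foo.c</filename> fragments are
      joined in the order in which they appear.
    </para>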
    <para>
      There are two important refinements to mention:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          You can use a CDATA section to enclose the source
          fragment.  This XML convention uses special delimiters
          to tell processing programs not to mess with anything
          between
          &#x201c;<code>&lt;![CDATA[</code>&#x201d; and
          &#x201c;<code>]]&gt;</code>&#x201d;.  This is
          especially convenient for enclosing XML fragments,
          because you can use
          &#x201c;<code>&lt;</code>&#x201d; and
          &#x201c;<code>&gt;</code>&#x201d; characters
          without having to escape them.
        </para>
      </listitem>
      <listitem>
        <para>
          If your text is not enclosed in a CDATA section, you
          can use DocBook tags inside the
          <code>programlisting</code> element.
        </para>
        <para>
          For example, you can enclose a function call inside a
          <code>link</code> element that links to the
          definition of that function.  In both the HTML and PDF
          generated from the DocBook file, that function name
          will then be clickable.
        </para>
        <para>
          Another element you might want to use inside a code
          fragment is the <code>co</code> element, to
          label lines of the code with callouts that are defined
          later inside DocBook <code>callout</code> elements.
        </para>
      </listitem>
    </itemizedlist>
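    <para>
      The following sketch suggests why both refinements are safe
      for extraction.  It assumes the standard library's
      <code>xml.etree.ElementTree</code> module as a stand-in for
      <code>lxml</code>; the sample fragments, the
      <code>linkend</code> value, and the function name
      <code>fragmentText</code> are made up for illustration.
    </para>
    <programlisting><![CDATA[
```python
import xml.etree.ElementTree as ET

# The CDATA closing delimiter is assembled from two pieces so that
# this listing can itself be stored inside a CDATA section.
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]" + ">"

# The same fragment, once with entity escapes and once in CDATA.
ESCAPED = ("<programlisting>if (a &lt; b &amp;&amp; b &gt; c)"
           "</programlisting>")
IN_CDATA = ("<programlisting>" + CDATA_OPEN + "if (a < b && b > c)" +
            CDATA_CLOSE + "</programlisting>")

# A fragment with a DocBook link element around a function name.
LINKED = ("<programlisting>x = <link linkend='f'>f</link>(3)\n"
          "</programlisting>")

def fragmentText(xmlText):
    """Return the text of a fragment with all markup removed."""
    return "".join(ET.fromstring(xmlText).itertext())

if __name__ == "__main__":
    print(fragmentText(ESCAPED))
    print(fragmentText(IN_CDATA))
    print(fragmentText(LINKED))
```
]]></programlisting>
    <para>
      The escaped and CDATA forms parse to the same text, and the
      <code>link</code> element contributes only the function name
      it encloses.
    </para>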
    <para>
      Here's an example of the use of callouts, as it would be
      encoded in the DocBook source.  This is from the exposition
      of a schema using <ulink
      url="http://www.nmt.edu/tcc/help/pubs/rnc/">Relax NG
      Compact Format (RNC)</ulink>.
    </para>
    <programlisting
><![CDATA[      <programlisting role='outFile:trails.rnc'>
park = element park
{ attribute name { text }?,   <co id='park.name'/>
  trail*                      <co id='park.trail'/>
}
</programlisting>
      <calloutlist>
        <callout arearefs='park.name'>
          <para>
            This optional attribute contains the name of the park.
          </para>
        </callout>
        <callout arearefs='park.trail'>
          <para>
            The content of a <code>park</code> element
            consists of zero or more <code>trail</code>
            elements.
          </para>
        </callout>
      </calloutlist>]]>
    </programlisting>
  </section>
  <section id="operation">
    <title>Operation of the &litlxml; script</title>
    <para>
      The &litlxml; script, written in Python, extracts the various
      output files from DocBook source files.  Its command line has
      this form:
    </para>
    <programlisting
>litlxml <replaceable>file</replaceable> ...
</programlisting>
    <para>
      Each DocBook-XML source file named on the command line is
      read, and all the <code>programlisting</code>
      elements with the correct <code>role</code>
      attribute are assembled and written to the corresponding
      files.
    </para>
    <section id="makefile">
      <title>Suggested <code>Makefile</code> rules</title>
      <para>
        If you are using the Unix <application >make</application
        > utility to build your document and source files, you
        can add lines to your <code>Makefile</code> to take care
        of building the program source files.
      </para>
      <para>
        The exact rules depend on whether your literate programs
        are executable or not.  We'll assume that both executable
        and non-executable programs are produced, and that the
        variable <code >BASENAME</code > is the name of your
        DocBook file minus its &#x201c;<code >.xml</code
        >&#x201d; extension.
      </para>
      <para>
        First, define three variables like this:

        <programlisting
>
MODULES        =  <replaceable >m1 m2 ...</replaceable >
EXECUTABLES    =  <replaceable >e1 e2 ...</replaceable >
CODE_TARGETS   =  $(EXECUTABLES) $(MODULES)
</programlisting>

        where 
        <code ><replaceable >m1</replaceable ></code >, <code
        ><replaceable >m2</replaceable ></code >, and so forth
        are the names of non-executable files, and
        <code ><replaceable >e1</replaceable ></code >,
        <code ><replaceable >e2</replaceable ></code >, and so on
        are the names of your executable files.
      </para>
      <para>
        Then, in the rules part of your <filename
        >Makefile</filename >, change the first (default) target
        to read like this:
        <programlisting
>all: web pdf code
</programlisting>
      </para>
      <para>
        Add these rules:
      </para>
      <programlisting
>code: $(CODE_TARGETS)

$(CODE_TARGETS): $(BASENAME).xml
        litlxml $&lt;; \
        chmod +x $(EXECUTABLES)
</programlisting>
      <para>
        If no executables are produced, change the latter rule
        to:

        <programlisting
>$(CODE_TARGETS): $(BASENAME).xml
        litlxml $&lt;
</programlisting>
      </para>
      <para>
        A model <code >Makefile</code > is online at <code><ulink
        url='http://www.nmt.edu/tcc/doc/docbook43/user-kit/lit-Makefile'
        /></code>.
      </para>
    </section> <!--End makefile-->
  </section>
  <section id="source">
    <title>Literate exposition of the &litlxml; program
    itself</title>
    <para>
      The &litlxml; program is worth study as an example not
      only of literate programming but also of how easy it is to
      process XML files in Python.
    </para>
    <section id="design-notes">
      <title>Design notes</title>
      <para>
        This program has gone through several rewrites as
        techniques for XML processing in Python have evolved.
      </para>
      <para>
        The current version uses the <code >lxml</code > package.
        For more details, see the <ulink
        url='http://codespeak.net/lxml/' ><code >lxml</code >
        homepage</ulink >.  This package yields much higher
        performance than earlier approaches:
      </para>
      <itemizedlist>
        <listitem>
          <para>
            The Document Object Model (DOM) is designed to be
            language-independent, so it is not terribly Pythonic
            in its processing model.  The stock <code
            >xml.dom.minidom</code > module is also quite slow,
            especially with large XML files.  See the <ulink
            url='http://docs.python.org/lib/module-xml.dom.minidom.html'
            >documentation for <code >minidom</code ></ulink >.
          </para>
        </listitem>
        <listitem>
          <para>
            Serial processing with SAX (Simple API for XML) is
            faster, but messier.  In SAX code, you must set up
            callbacks that are called as tags or content go by in
            a serial pass.  See the <ulink
            url='http://docs.python.org/lib/module-xml.sax.html'
            >documentation for Python's SAX module</ulink >.
          </para>
        </listitem>
      </itemizedlist>
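      <para>
        To give a feel for the callback style that SAX requires,
        here is a minimal sketch that collects the text of every
        <code >programlisting</code > element using the standard
        library's <code >xml.sax</code > module.  The class name
        <code >ListingCollector</code > and the function name
        <code >listingsViaSax</code > are invented for this
        example; they are not part of any earlier version of
        &litlxml;.
      </para>
      <programlisting><![CDATA[
```python
import xml.sax

class ListingCollector(xml.sax.ContentHandler):
    """Collect the text of each programlisting element."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.depth = 0        # Nonzero while inside a programlisting
        self.pieces = []      # Text chunks of the current listing
        self.listings = []    # Completed listings, in document order

    def startElement(self, name, attrs):
        if name == "programlisting":
            self.depth += 1

    def characters(self, content):
        # The parser may deliver text in several chunks.
        if self.depth > 0:
            self.pieces.append(content)

    def endElement(self, name):
        if name == "programlisting":
            self.depth -= 1
            if self.depth == 0:
                self.listings.append("".join(self.pieces))
                self.pieces = []

def listingsViaSax(xmlText):
    """Return a list of the text content of each programlisting."""
    handler = ListingCollector()
    xml.sax.parseString(xmlText.encode("utf-8"), handler)
    return handler.listings

if __name__ == "__main__":
    print(listingsViaSax(
        "<a><programlisting>one</programlisting><para>skip</para></a>"))
```
]]></programlisting>
      <para>
        Even for this small task the handler has to carry state
        between callbacks, which is the messiness referred to above.
      </para>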
      <para>
        This program was written using the Cleanroom or
        zero-defect methodology.  The best introduction to the
        method is given in Stavely, Allan M., <citetitle>Toward
        Zero-defect Programming</citetitle>, Addison-Wesley,
        1999, ISBN 0-201-38595-3.  Also see
        <ulink url="http://www.nmt.edu/~shipman/soft/clean">my
        Cleanroom pages</ulink> for a discussion of how I
        practice the methodology.
      </para>
    </section> <!--End design-notes-->
    <section id="prologue">
      <title>The prologue</title>
      <para>
        The script starts with the usual Python prologue.  The
        first line makes the script self-executing.  This is
        followed by minimal comments pointing to the online form
        of the literate programming document, and the Cleanroom
        intended function for the program as a whole.
      </para>
      <!--NB: It is critical to avoid a blank line at the
       !  beginning of the script, hence the unusual position
       !  of the closing '>' at the end of the next tag:
       !-->
      <programlisting role='outFile:litlxml'
>#!/usr/bin/env python
#================================================================
# litlxml:  Extract code from literate-programming source files.
#   For documentation, see:
#       http://www.nmt.edu/tcc/help/lang/python/examples/litlxml/
#----------------------------------------------------------------
# Overall intended function:
#   [ output files named in input files given on the command line
#         :=  code fragments designated for those files
#     sys.stderr  +:=  error messages if any ]
#----------------------------------------------------------------
</programlisting>
    </section> <!--End prologue-->
    <section id="imports">
      <title>Modules required</title>
      <para>
        Aside from the standard Python <code>sys</code>
        module that gives programs access to their standard I/O
        streams and command line arguments, the program needs
        the <code >etree</code > module from the <code
        >lxml</code > package.
      </para>
      <programlisting role='outFile:litlxml'
>import sys
from lxml import etree
</programlisting>
    </section> <!--End imports-->
    <section id="globals">
      <title>Global declarations</title>
      <para>
        These manifest constants are defined globally.
      </para>
      <variablelist>
        <varlistentry>
          <term>
            <code >PROG_ELT</code >
          </term>
          <listitem>
            <para>
              The name of the <code
              >programlisting</code > element.
            </para>
            <programlisting role='outFile:litlxml'
>#================================================================
# Manifest constants
#----------------------------------------------------------------

PROG_ELT     =  "programlisting"
</programlisting>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>
            <code >ROLE_ATTR</code >
          </term>
          <listitem>
            <para>
              The name of the <code >role</code >
              attribute.
              <programlisting role='outFile:litlxml'
>ROLE_ATTR    =  "role"
</programlisting>
            </para>
          </listitem>
        </varlistentry>
        <varlistentry>
          <term>
            <code >ROLE_PREFIX</code >
          </term>
          <listitem>
            <para>
              The prefix of the <code >role</code >
              attribute that identifies this <code
              >programlisting</code > element as a code
              fragment.
              <programlisting role='outFile:litlxml'
>ROLE_PREFIX  =  "outFile:"
</programlisting>
            </para>
          </listitem>
        </varlistentry>
      </variablelist>
    </section> <!--End globals-->
    <section id='veri-functions'>
      <title>Verification functions</title>
      <para>
        In the Cleanroom methodology, a verification function is
        a shorthand notation for describing various program
        entities.  The author's preference is to use names for
        these functions that contain a hyphen (&#x201c;<code
        >-</code >&#x201d;), so that it is clear that these are
        not Python functions.
      </para>
      <para>
        Our first verification function is <code >lit-elt</code
        >: an XML element that contains literate code.
      </para>
      <programlisting role='outFile:litlxml'
>#================================================================
# Verification functions
#----------------------------------------------------------------
# lit-elt  ==  an XML element whose GID is PROG_ELT, and which
#    has an attribute ROLE_ATTR whose value starts with
#    ROLE_PREFIX
</programlisting>
      <para>
        Next is the <code >lit-dest</code > function.  This
        describes the destination file to which a literate
        fragment is to be written.
      </para>
      <programlisting role='outFile:litlxml'
>#----------------------------------------------------------------
# lit-dest(elt)  ==  the part of the ROLE_ATTR value after
#    ROLE_PREFIX in a lit-elt
</programlisting>
      <para>
        The <code >lit-content</code > verification function
        describes the text inside the literate fragment.  Note
        that literate code can contain XML tags: in some of the
        author's source code, the DocBook <code >link</code > tag
        is used so that the name of a called function or method
        is a link to the definition of that function or method.
        The extracted source text, however, must contain only
        the text inside those tags, never the tags themselves.
      </para>
      <programlisting role='outFile:litlxml'
>#----------------------------------------------------------------
# lit-content(elt)  ==  The text content of element (elt) and
#    any descendants
#----------------------------------------------------------------
</programlisting>
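      <para>
        Verification functions are notation, not code.  Still, it
        may help to see roughly what they would look like if they
        were written out as Python functions.  The sketch below is
        illustrative only: the names <code >isLitElt</code >,
        <code >litDest</code >, and <code >litContent</code > do
        not appear in the script, and the standard library's
        <code >xml.etree.ElementTree</code > module stands in for
        <code >lxml</code >.
      </para>
      <programlisting><![CDATA[
```python
import xml.etree.ElementTree as ET

PROG_ELT = "programlisting"
ROLE_ATTR = "role"
ROLE_PREFIX = "outFile:"

def isLitElt(elt):
    """lit-elt: a PROG_ELT element whose ROLE_ATTR value starts
    with ROLE_PREFIX."""
    return (elt.tag == PROG_ELT and
            elt.get(ROLE_ATTR, "").startswith(ROLE_PREFIX))

def litDest(elt):
    """lit-dest(elt): the part of the ROLE_ATTR value after
    ROLE_PREFIX.  Assumes isLitElt(elt) is true."""
    return elt.get(ROLE_ATTR)[len(ROLE_PREFIX):]

def litContent(elt):
    """lit-content(elt): the text of elt and its descendants."""
    return "".join(elt.itertext())

if __name__ == "__main__":
    elt = ET.fromstring("<programlisting role='outFile:foo.c'>"
                        "int <link>x</link>;</programlisting>")
    print(isLitElt(elt))
    print(litDest(elt))
    print(repr(litContent(elt)))
```
]]></programlisting>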
    </section> <!--veri-functions-->
    <section id="main">
      <title>The main program</title>
      <para>
        The only thing the main program does is iterate over the
        list of files given as command line arguments, processing
        each one in turn by calling <xref linkend='processFile' />.
      </para>
      <programlisting role='outFile:litlxml'
># - - - - -   m a i n

def main():
    """Main program for litlxml."""

    #-- 1 --
    for inFileName in sys.argv[1:]:
        #-- 1 body --
        # [ if inFileName names a readable, valid DocBook XML file ->
        #     output files named in that file  :=  code fragments
        #       designated for those files
        #     sys.stderr  +:=  error messages from processing that file,
        #                      if any
        #   else ->
        #     sys.stderr  +:=  error message ]
        processFile ( inFileName )
</programlisting>
    </section> <!--End main-->
    <section id="processFile">
      <title>
        <code>processFile()</code>: Process one input file
      </title>
      <para>
        This function handles all the processing for one DocBook
        source file.
      </para>
      <programlisting role='outFile:litlxml'
># - - -   p r o c e s s F i l e

def processFile ( fileName ):
    """Process one input file.

      [ fileName is a string ->
          if fileName names a readable, valid DocBook XML file ->
            output files named in lit-elts from that file  :=
              lit-content of those lit-elts
            sys.stderr  +:=  error messages from processing that file,
                             if any
          else ->
            sys.stderr  +:=  error message ]
    """
</programlisting>
      <para>
        The fragments in a given file may be directed to several
        different output files.  To keep track of the output
        files we have seen so far, we'll use a dictionary named
        <code >fileMap</code >, whose keys are file names, and
        each corresponding value is an open, writeable file
        handle for that file.  We'll write the text to each file
        as it is encountered, and leave all the files open until
        the end, at which point we'll close them all.
      </para>
      <programlisting role='outFile:litlxml'
>    #-- 1 --
    fileMap  =  {}
</programlisting>
      <para>
        Next we call the <code >etree</code > package to parse
        the XML file and make it into an element tree.  This may
        raise either of two exceptions:
      </para>
      <itemizedlist>
        <listitem>
          <para>
            If the file can't even be opened, it will raise an
            <code >IOError</code > exception.
          </para>
        </listitem>
        <listitem>
          <para>
            If the file is not well-formed XML, the <code
            >etree</code > package will raise its <code
            >XMLSyntaxError</code > exception.
          </para>
        </listitem>
      </itemizedlist>
      <programlisting role='outFile:litlxml'
>    #-- 2 --
    # [ if fileName names a readable, valid XML file ->
    #     doc  :=  an ElementTree representing that file
    #   else ->
    #     sys.stderr  +:=  error message(s)
    #     return ]
    try:
        doc  =  etree.parse ( fileName )
    except IOError as detail:
        sys.stderr.write ( "*** I/O error opening '%s': %s\n" %
                           (fileName, detail) )
        return
    except etree.XMLSyntaxError as detail:
        sys.stderr.write ( "*** Syntax error opening '%s': %s\n" %
                           (fileName, detail) )
        return
</programlisting>
      <para>
        Now that <code >doc</code > contains the document tree,
        send it off for processing to <xref linkend='processDoc' />.
      </para>
      <programlisting role='outFile:litlxml'
>    #-- 3 --
    # [ (doc is an etree Document) and
    #   (fileMap is a dictionary whose keys are file names and
    #   each corresponding value is a writeable file handle
    #   for that file) ->
    #     fileMap  :=  fileMap with new file names added from
    #         lit-dests in doc
    #     file handles in fileMap  :=  lit-content of those files
    #     sys.stderr  +:=  error messages from processing doc,
    #         if any ]
    processDoc ( fileMap, doc )
</programlisting>
      <para>
        Finally, we close all the output files that are values in
        the <code >fileMap</code > dictionary.
      </para>
      <programlisting role='outFile:litlxml'
>    #-- 4 --
    # [ fileMap is a dictionary whose values are file objects ->
    #     those values  :=  those values, closed ]
    for  outFile in fileMap.values():
        outFile.close()
</programlisting>
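      <para>
        The dictionary-of-open-files pattern used here can be
        tried in isolation.  The sketch below is an assumption-laden
        simplification: the function name
        <code >writeFragments</code > and its
        <code >outDir</code > argument are invented, the fragments
        arrive as ready-made pairs rather than from a document
        tree, and the <code >try</code >/<code >finally</code >
        block is an addition that the script itself does not use.
      </para>
      <programlisting><![CDATA[
```python
import os
import tempfile

def writeFragments(pairs, outDir):
    """Write (fileName, text) pairs to files under outDir, opening
    each file once and closing all of them at the end.  Returns
    the sorted list of file names written."""
    fileMap = {}
    try:
        for name, text in pairs:
            # Open the file only the first time its name is seen.
            if name not in fileMap:
                fileMap[name] = open(os.path.join(outDir, name), "w")
            fileMap[name].write(text)
    finally:
        # Close every file that was opened, even after an error.
        for outFile in fileMap.values():
            outFile.close()
    return sorted(fileMap)

if __name__ == "__main__":
    outDir = tempfile.mkdtemp()
    print(writeFragments([("foo.h", "a\n"), ("foo.c", "b\n"),
                          ("foo.h", "c\n")], outDir))
```
]]></programlisting>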
    </section> <!--processFile-->
    <section id='processDoc'>
      <title><code >processDoc()</code >: Process one document
      tree</title>
      <para>
        Given a document tree, this function finds all the
        literate program elements, attempts to open output files
        for them if they are not already open, and writes the
        code fragments to those files.
      </para>
      <programlisting role='outFile:litlxml'
># - - -   p r o c e s s D o c

def processDoc ( fileMap, doc ):
    """Process one document tree

      [ (fileMap is a dictionary whose keys are file names and
        each corresponding value is a writeable file handle
        for that file) and
        (doc is an etree.ElementTree) ->
          fileMap  :=  fileMap with new output files added from
              lit-elts in doc whose lit-dests can be opened anew
          files named in fileMap  :=  lit-content of those files
          sys.stderr  +:=  error messages for lit-dests that
              cannot be opened for output, if any ]
    """
</programlisting>
      <para>
        To find the root element of <code >doc</code >, we use
        its <code >.getroot()</code > method.  Then we use an
        XPath expression to find all the <code >PROG_ELT</code >
        elements.  The XPath expression <code
        >"//programlisting"</code > means to find all <code
        >programlisting</code > elements no matter where they are
        in the tree; it returns a list of matching elements.
      </para>
      <programlisting role='outFile:litlxml'
>    #-- 1 --
    # [ eltList  :=  a list of all the PROG_ELT elements in doc,
    #                in document order ]
    root  =  doc.getroot()
    eltList  =  root.xpath ( "//" + PROG_ELT )
</programlisting>
      <para>
        For each potentially literate element, we look to see if
        it has a <code >ROLE_ATTR</code > attribute, and if so,
        whether that attribute's value starts with <code
        >ROLE_PREFIX</code >.  If so, it is a literate element,
        and is sent for processing to <xref linkend='processElt'
        />.
      </para>
      <programlisting role='outFile:litlxml'
>    #-- 2 --
    # [ fileMap  :=  fileMap with new file names added from lit-dests
    #                in eltList whose lit-dest files could be opened
    #   files named in fileMap  :=  lit-content of those files
    #   sys.stderr  +:=  error messages from failures to open
    #       those files, if any ]
    for  elt in eltList:
        #-- 2 body --
        # [ if  (elt is a lit-elt) and
        #   (lit-dest(elt) is a key in fileMap) ->
        #     that value from fileMap  +:=  lit-content(elt)
        #   else if (elt is a lit-elt) and
        #   (lit-dest(elt) is not a key in fileMap) and
        #   (a new file named lit-dest(elt) can be opened for
        #   writing) ->
        #     fileMap[lit-dest(elt)]  :=  that new file
        #     that new file  +:=  lit-content(elt)
        #   else if (elt is a lit-elt) and
        #   (lit-dest(elt) is not a key in fileMap) and
        #   (a new file named lit-dest(elt) cannot be opened for
        #   writing) ->
        #     sys.stderr  +:=  error message
        #   else -> I ]
</programlisting>
      <para>
        Two conditions must be met for a literate element.
        There must be a <code >ROLE_ATTR</code > attribute; if
        not, looking it up in the element's <code >.attrib</code
        > dictionary will raise <code >KeyError</code >.
        If there is such an attribute, its value must start with
        <code >ROLE_PREFIX</code >; if it does, the output file
        name is the rest of the value after that prefix.
      </para>
      <programlisting role='outFile:litlxml'
>        try:
            attrValue  =  elt.attrib[ROLE_ATTR]
            if  attrValue.startswith ( ROLE_PREFIX ):
                outName  =  attrValue[len(ROLE_PREFIX):]
                processElt ( fileMap, outName, elt )
        except KeyError:
            pass
</programlisting>
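      <para>
        An alternative to catching <code >KeyError</code > is the
        element's <code >.get()</code > method, which returns
        <code >None</code > for a missing attribute.  The sketch
        below shows that variation; it is not what the script
        does.  The function name <code >destName</code > is
        hypothetical, and the standard library's
        <code >xml.etree.ElementTree</code > module is used in
        place of <code >lxml</code >.
      </para>
      <programlisting><![CDATA[
```python
import xml.etree.ElementTree as ET

ROLE_ATTR = "role"
ROLE_PREFIX = "outFile:"

def destName(elt):
    """Return the output file name for elt, or None if elt is not
    a literate element."""
    attrValue = elt.get(ROLE_ATTR)    # None if the attribute is missing
    if attrValue is not None and attrValue.startswith(ROLE_PREFIX):
        return attrValue[len(ROLE_PREFIX):]
    return None

if __name__ == "__main__":
    for source in ("<programlisting role='outFile:x.py'/>",
                   "<programlisting role='other'/>",
                   "<programlisting/>"):
        print(destName(ET.fromstring(source)))
```
]]></programlisting>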
    </section> <!--processDoc-->
    <section id='processElt'>
      <title><code >processElt()</code >: Process one literate
      element</title>
      <para>
        This function takes three arguments:
      </para>
      <orderedlist>
        <listitem>
          <para>
            The <code >fileMap</code > is the dictionary whose
            keys are the names of output files we've already
            seen, and each corresponding value is a writeable
            file handle for that file.
          </para>
        </listitem>
        <listitem>
          <para>
            The <code >outName</code > is the name of the output
            file.
          </para>
        </listitem>
        <listitem>
          <para>
            The <code >elt</code > is the actual literate
            element, as an <code >etree.Element</code > instance.
          </para>
        </listitem>
      </orderedlist>
      <programlisting role='outFile:litlxml'
># - - -   p r o c e s s E l t

def processElt ( fileMap, outName, elt ):
    """Process one literate element.

      [ (fileMap is a dictionary whose keys are file names and
        each corresponding value is a writeable file handle
        for that file) and
        (outName is a file name as a string) and
        (elt is an etree.Element) ->
          if fileMap has a key (outName) ->
            fileMap[outName]  +:=  text of elt
          else if outName can be opened new for writing ->
            fileMap[outName]  :=  that file, so opened
            that file  :=  text of elt
          else ->
            sys.stderr  +:=  error message(s) ]
    """
</programlisting>
      <para>
        First we check to see if this is a new output file.  If
        so, we try to open it for writing.  This can fail, in
        which case we'll need to send an error message to the
        standard error stream, and return prematurely.
      </para>
      <programlisting role='outFile:litlxml'
>    #-- 1 --
    # [ if outName is a key of fileMap ->
    #     I
    #   else if outName can be opened new for writing ->
    #     fileMap[outName]  :=  that file, so opened
    #   else ->
    #     sys.stderr  +:=  error message(s)
    #     return ]
    if  outName not in fileMap:
        try:
            fileMap[outName]  =  open ( outName, "w" )
        except IOError as detail:
            sys.stderr.write ( "*** Can't open '%s': %s\n" %
                               (outName, detail) )
            return
</programlisting>
      <para>
        At this point we have a destination file handle, <code
        >fileMap[outName]</code >.  We use another XPath
        expression to find all the text descendants of <code
        >elt</code >.  In this expression, the &#x201c;<code
        >descendant-or-self::</code >&#x201d; part is an
        <firstterm >axis specifier</firstterm > that selects
        <code >elt</code >, its children, its children's
        children, and so forth all the way to the leaves of the
        document tree.  The XPath &#x201c;<code >text()</code
        >&#x201d; function selects only text nodes (as opposed to
        element nodes).
      </para>
      <programlisting role='outFile:litlxml'
>    #-- 2 --
    # [ textList  :=  a list of all text descendants of elt ]
    textList  =  elt.xpath ( "descendant-or-self::text()" )

    #-- 3 --
    # [ fileMap[outName]  +:=  elements of textList, concatenated ]
    fileMap[outName].write ( "".join ( textList ) )
</programlisting>
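      <para>
        To see what the list of text descendants looks like for a
        fragment that contains callout markers, consider this
        small sketch.  It makes one substitution that should be
        stated plainly: instead of the <code >lxml</code > XPath
        call, it uses the <code >.itertext()</code > method of the
        standard library's <code >xml.etree.ElementTree</code >
        module, which we believe yields the same strings for a
        fragment like this one.  The function name
        <code >textDescendants</code > is invented for the example.
      </para>
      <programlisting><![CDATA[
```python
import xml.etree.ElementTree as ET

# A fragment with two callout markers, modeled on the RNC example.
FRAGMENT = ("<programlisting>park = element park\n"
            "{ attribute name { text }?,   <co id='park.name'/>\n"
            "  trail*                      <co id='park.trail'/>\n"
            "}\n"
            "</programlisting>")

def textDescendants(elt):
    """Return a list of the text pieces in and under elt, in
    document order."""
    return list(elt.itertext())

if __name__ == "__main__":
    elt = ET.fromstring(FRAGMENT)
    pieces = textDescendants(elt)
    print(len(pieces))
    print("".join(pieces))
```
]]></programlisting>
      <para>
        Each <code >co</code > element splits the text into
        separate pieces, but the markers themselves never reach
        the output file because only text is collected.
      </para>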
    </section> <!--processElt-->
    <section id="epilogue">
      <title>Epilogue</title>
      <para>
        Rather than placing the main at the end of the script, we
        defined it above (<xref linkend='main' />) as a function
        <code >main()</code > so that the code can be
        presented in top-down order.
      </para>
      <para>
        The lines below cause <code >main()</code >
        to be called, assuming that &litlxml; is the main
        script.  Python sets global variable <code
        >__name__</code > to the string <code
        >'__main__'</code > for the outermost script.
      </para>
      <programlisting role='outFile:litlxml'
># - - - - -   e p i l o g u e   - - - - -

if  __name__ == '__main__':
    main()
</programlisting>
    </section> <!--End epilogue-->
  </section>
</article>
