4.4.  processFile: Process one input file

In Python scripts, the main comes at the end; see Section 4.10, “The main”. The remaining functions are presented in top-down order.

The processFile() function handles all the processing for one DocBook source file.


# - - -   p r o c e s s F i l e   - - -

def processFile ( inFileName ):
    """Process one input file.

      [ inFileName is a string ->
          if inFileName names a readable, valid DocBook XML file ->
            output files named in that file  :=  code fragments
              designated for those files
            sys.stderr  +:=  error messages from processing that file,
                             if any
          else ->
            sys.stderr  +:=  error message ]
    """
      

The first step is to open the input file, and report errors if that fails.

    #-- 1 --
    # [ if inFileName names a readable file ->
    #     inFile  :=  that file opened for reading
    #   else ->
    #     sys.stderr  +:=  error message
    #     return ]
    try:
        inFile  =  open ( inFileName )
    except IOError as detail:
        sys.stderr.write ( "*** Can't open file '%s' for reading: %s\n" %
            (inFileName, detail) )
        return
      

The next step is to pass this stream to the PyExpat parser. This rather arcane process is covered in the O'Reilly book on page 113. The result, dom, is the entire input document as a DOM tree.

    #-- 2 --
    # [ if inFile contains a readable, well-formed XML file ->
    #     dom  :=  a DOM [Document Object Model] representation of
    #       that file
    #   else ->
    #     sys.stderr  +:=  error message
    #     return ]
    #--
    # NB: The exception actually raised for malformed XML is
    # xml.parsers.expat.ExpatError, but a catchall handler should
    # suffice, since any failure here means we cannot continue.
    #--
    try:
        reader = PyExpat.Reader()
        dom  =  reader.fromStream ( inFile )
    except Exception as detail:
        sys.stderr.write ( "*** Can't parse file '%s': %s\n" %
            (inFileName, detail) )
        return
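
For readers who do not have the PyExpat reader installed, the same parse-or-report step can be sketched with the standard library's xml.dom.minidom module. This is only an illustration under stated assumptions, not the code this program uses: the helper name parse_stream and the two in-memory sample documents are hypothetical, and here we catch ExpatError specifically rather than using a catchall.

```python
import io
import sys
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parse_stream(stream, name):
    """Return a DOM Document built from stream, or None on a parse error.

    parse_stream is a hypothetical helper; name is used only in the
    error message.
    """
    try:
        # minidom.parse accepts an open file-like object as well as a
        # file name.
        return xml.dom.minidom.parse(stream)
    except ExpatError as detail:
        sys.stderr.write("*** Can't parse file '%s': %s\n" % (name, detail))
        return None


# Two tiny in-memory documents (invented for this sketch): the first is
# well-formed, the second has a mismatched closing tag.
good = parse_stream(io.BytesIO(b"<article><para>Hi</para></article>"),
                    "good.xml")
bad = parse_stream(io.BytesIO(b"<article><para>Hi</article>"),
                   "bad.xml")
```

As in step 2 above, the caller gets either a complete DOM tree or an error message on sys.stderr, never a partial tree.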
      

Next, we extract from the document tree all of the programlisting elements that have a role attribute. (Some of the role attributes may not be properly formed, but we'll deal with those later.)

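It saves us a tremendous amount of work to use XPath to find all these elements and assemble them into a single node list. The Evaluate() function of the xpath module takes an XPath expression and applies it to a node from the DOM tree. The XPath expression we use has three components: “//” selects nodes at any depth in the document; “programlisting” restricts the selection to elements with that name; and the predicate in square brackets, “[@...]”, keeps only the elements that carry the attribute named by ROLE_ATTR (the role attribute).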

    #-- 3 --
    # [ fragList  :=  a node-set containing all of the programlisting
    #       elements from dom that have role attributes ]
    path  =  "//programlisting[@%s]" % ROLE_ATTR
    fragList  =  xpath.Evaluate ( path, dom.documentElement )
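
To see what this kind of expression selects, here is a small sketch that uses the standard library's xml.etree.ElementTree instead of the xpath module. Several things here are assumptions made for illustration: the value "role" for ROLE_ATTR, the sample document, and its role values. ElementTree supports only a subset of XPath and requires the relative form ".//" when searching from an element.

```python
import xml.etree.ElementTree as ET

ROLE_ATTR = "role"    # assumed value; the real constant is defined elsewhere

# A made-up document: two listings have a role attribute, one does not.
SAMPLE = """<article>
  <section>
    <programlisting role="example-a">first</programlisting>
    <programlisting>not selected</programlisting>
  </section>
  <programlisting role="example-b">second</programlisting>
</article>"""

root = ET.fromstring(SAMPLE)

# ".//" searches all descendants of root; "[@role]" keeps only the
# elements that have a role attribute, whatever its value.
path = ".//programlisting[@%s]" % ROLE_ATTR
frag_list = root.findall(path)

# findall returns matches in document order, at any nesting depth.
frag_texts = [frag.text for frag in frag_list]
```

Note that the listing nested inside the section element and the one directly under the root are both found, while the listing without a role attribute is skipped.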
      

Now that we have assembled all the code fragments as a list of programlisting element nodes, we call the writeAllFiles() function to find the set of output files and then write them.

    #-- 4 --
    # [ files named in role attributes of fragList  :=
    #       text descendants of corresponding nodes in fragList ]
    writeAllFiles ( fragList )
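
The idea behind step 4, gathering the text descendants of each fragment under the name given by its role attribute, can be sketched as follows. This is not the writeAllFiles() function itself. It is a hypothetical stand-in that assumes the whole role value is used as the grouping key, builds the result in memory instead of writing files, and uses xml.etree.ElementTree with invented sample data.

```python
import xml.etree.ElementTree as ET


def group_fragments(frag_list, role_attr="role"):
    """Map each role value to the concatenated text of its fragments.

    Hypothetical helper: the role value is used verbatim as the key,
    and fragments are joined in the order they appear in frag_list.
    """
    groups = {}
    for frag in frag_list:
        key = frag.get(role_attr)
        # itertext() yields the text of the element and of all its
        # descendants, in document order.
        text = "".join(frag.itertext())
        groups.setdefault(key, []).append(text)
    return {key: "".join(parts) for key, parts in groups.items()}


# Invented sample: two fragments share a role, and one fragment
# contains inline markup inside its text.
root = ET.fromstring(
    "<article>"
    "<programlisting role='first'>a = 1\n</programlisting>"
    "<programlisting role='second'>x = <emphasis>0</emphasis>\n"
    "</programlisting>"
    "<programlisting role='first'>b = 2\n</programlisting>"
    "</article>")
grouped = group_fragments(root.findall(".//programlisting[@role]"))
```

Fragments that share a role value end up concatenated in document order, and inline markup inside a fragment contributes only its text.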