Next / Previous / Contents / TCC Help System / NM Tech homepage

4. pyrang internals

Here, in lightweight literate programming (LLP) form, is the actual code for pyrang. For more on LLP, see the author's LLP page.

4.1. Code prologue

The actual script starts with the traditional “pound-bang line” to make the script self-executing under Unix. This is followed by a comment that points back to this document.

pyrang
#!/usr/bin/env python
#================================================================
# pyrang:  Writes Python declarations for RNC schema names.
#   For documentation and code, see:
#     http://www.nmt.edu/tcc/help/lang/python/examples/pyrang/
#----------------------------------------------------------------

4.2. Module imports

First we'll need to import the standard Python sys module, which will give us access to the command line arguments and the standard I/O streams.

pyrang
#================================================================
# Imports
#----------------------------------------------------------------

import sys

Command line argument processing is done with the sysargs.py module; see the author's Python library page.

pyrang
from sysargs import *

To read the RNG schema file, we need the Parse function from the 4Suite package.

pyrang
from Ft.Xml import Parse

4.3. Manifest constants

Here we define symbolic constants for two element names and an attribute name from the Relax NG schema document type.

pyrang
ELEMENT_N  =  'element'
ATTRIBUTE_N  =  'attribute'
NAME_A  =  'name'

4.4. main(): The main program

The purpose of the script is to extract all the element and attribute names from the input schema, and write a Python module containing assignment statements that define a symbolic name for each unique name. For example, if the schema has an element named “player” that has an attribute “first-name”, we'll want to generate two Python assignments from those names:

PLAYER_N  =  'player'
FIRST_NAME_A  =  'first-name'

The obvious data structure for accumulating these names is a Python dictionary; we'll call it nameTable. We'll use the Python name (e.g., FIRST_NAME_A) as the key, and the XML name (e.g., first-name) as the value, in each dictionary entry.

The same attribute name may occur in more than one element. In that case, we'll want to avoid writing duplicate declarations. Because dictionary key values must be unique, we can just throw each name we find into the dictionary, and at the end , there will be no duplications.

Here is the overall program flow:

  1. Process the command line arguments, and figure out whether the output is going to a file or to standard output.

    We'll define a small class named Args to encapsulate the processing and representation of the command line arguments. See Section 4.11, “class Args: Command line argument object”.

  2. The 4Suite Parse() function reads the RNG schema document and gives it to us as a DOM tree.

  3. We create nameTable as an empty dictionary, then walk the DOM tree recursively looking for element and attribute names. Each name we find is stored into nameTable.

  4. Finally, each key-value pair in nameTable is written to the output as a Python assignment statement.

The code follows. First, we process the command line arguments; see Section 4.11, “class Args: Command line argument object”.

pyrang
def main():
    """Main program
    """

    #-- 1 --
    # [ if sys.argv contains valid command line arguments ->
    #     args  :=  an Args instance representing those arguments
    #   else ->
    #     sys.stderr  +:=  error message(s)
    #     stop execution ]
    args  =  Args()

This step takes care of reading the input file and building the nameTable dictionary. See Section 4.6, “processInput(): Read the schema”.

pyrang
    #-- 2 --
    # [ args is an Args object ->
    #     if args.inFileName names a readable file containing a
    #     valid RNG schema ->
    #       nameTable  :=  a dictionary whose keys are the Python
    #           manifest constant names for the element and
    #           attribute names in that schema (using args.prefix
    #           as a prefix), and each corresponding value is the
    #           XML name
    #     else ->
    #       sys.stderr  +:=  error message(s)
    #       stop execution ]
    nameTable  =  processInput ( args )

All that remains is to write the output file; see Section 4.10, “writeOutput(): Generate the Python file”.

pyrang
    #-- 3 --
    # [ args is an Args object ->
    #     if args.outFileName is None ->
    #       outFile  :=  sys.stdout
    #     else if args.outFileName names a writeable file ->
    #       outFile  :=  that file opened new for writing
    #     else ->
    #       sys.stderr  +:=  error message(s)
    #       stop execution ]
    if  args.outFileName is None:
        outFile  =  sys.stdout
    else:
        try:
            outFile  =  open ( args.outFileName, "w" )
        except IOError, detail:
            fatal ( "Can't open output file '%s' for writing." %
                    args.outFileName )
    #-- 4 --
    # [ (outFile is a writeable file) and
    #   (nameTable is a dictionary whose keys are Python names
    #   and whose values are XML names ->
    #     outFile  :=  Python statements of the form 'n = v'
    #         for n in the set of keys of nameTable and each v
    #         is the corresponding nameTable value ]
    writeOutput ( args, outFile, nameTable )

4.5. fatal(): Write a message and stop

This function writes a message to the standard error stream and stops execution.

pyrang
def fatal ( *L ):
    """Write a message and terminate.

      [ L is a list of strings ->
          sys.stderr  +:=  (concatenated elements of L)
          stop execution ]
    """
    print >>sys.stderr, "*** Error: %s" % "".join(L)
    sys.exit(1)

4.6. processInput(): Read the schema

This function reads the schema, extracts the element and attribute names, and returns the nameTable dictionary with the Python/XML name pairs.

pyrang
def processInput ( args ):
    """Process the input schema.

      [ args is an Args object ->
          if args.inFileName names a readable file containing a
          valid RNG schema ->
            nameTable  :=  a dictionary whose keys are the Python
                manifest constant names for the element and
                attribute names in that schema (using args.prefix
                as a prefix), and each corresponding value is the
                XML name
          else ->
            sys.stderr  +:=  error message(s)
            stop execution ]
    """

First we create the name table as an empty dictionary.

pyrang
    #-- 1 --
    # [ nameTable  :=  a new, empty dictionary ]
    nameTable  =  {}

Next we attempt to open the input file and hand it to the 4Suite Parse() function to build a DOM tree.

pyrang
    #-- 2 --
    # [ if args.inFileName can be opened for reading ->
    #     inFile  :=  that file, so opened
    #   else ->
    #     sys.stderr  +:=  error message
    #     stop execution ]
    try:
        inFile  =  open ( args.inFileName )
    except IOError, detail:
        fatal ( "Can't open '%s' for reading: %s" %
                (args.inFileName, detail) )

    #-- 3 --
    # [ if inFile contains a valid XML document ->
    #     doc  :=  that document as a DOM tree
    #   else ->
    #     sys.stderr  +:=  error message(s)
    #     stop execution ]
    try:
        doc  =  Parse ( inFile )
    except Exception, detail:
        fatal ( "Can't parse the schema file: %s" % detail )

Walking the tree to find all the element and attribute elements is quite easy with recursion. The function in Section 4.7, “findNames(): Recursive tree walker” is recursive, and finds all the names in a subtree rooted in the node you pass it as its first argument. In a DOM tree, the doc.documentElement attribute is the root Element node of the schema.

pyrang
    #-- 4 --
    # [ (nameTable is a dictionary) and
    #   (args is an Args object) ->
    #     nameTable  :=  nameTable with Python/XML name pairs added
    #         from the subtree rooted at doc.documentElement,
    #         with the names prefixed by args.prefix ]
    findNames ( doc.documentElement, args, nameTable )

Finally, we return the nameTable dictionary we've built.

pyrang
    #-- 5 --
    return nameTable

4.7. findNames(): Recursive tree walker

This function finds all element and attribute names in a given subtree of the schema's DOM tree, and adds entries for each name to the nameTable dictionary.

pyrang
def findNames ( node, args, nameTable ):
    """Find element and attribute names in a subtree.

      [ (node is a DOM Element node) and
        (args is an Args object) and
        (nameTable is a dictionary whose keys are the Python
        names for elements and attributes, and each corresponding
        value is the XML name) ->
          nameTable  :=  nameTable with Python/XML name pairs added
              from the subtree rooted at node, with the names
              prefixed by args.prefix ]
    """

In an RNG schema, we are looking for elements of these two forms:

  <element name="N">...
   <attribute name="N">...

Therefore, all we have to do is check the node's name to see if it is either element or attribute, and in those cases add the name to nameTable. The logic that builds the Python equivalent of the XML name, and prepends args.prefix, is in Section 4.8, “addName(): Add one name to the name table”.

pyrang
    #-- 1 --
    # [ if node.nodeName is ELEMENT_N or ATTRIBUTE_N ->
    #     nameTable  :=  nameTable with an entry added with a
    #         the Python equivalent of node's "name" attribute as
    #         the key, and node's "name" attribute as the value
    #   else -> I ]
    if  node.nodeName == ELEMENT_N:
        eltName  =  node.getAttributeNS ( None, NAME_A )
        addName ( nameTable, args, eltName, "N" )
    elif  node.nodeName == ATTRIBUTE_N:
        attrname  =  node.getAttributeNS ( None, NAME_A )
        addName ( nameTable, args, attrname, "A" )

That takes care of extracting names from node itself. To recursively add names from its subtree, we iterate over node's children.

pyrang
    #-- 2 --
    # [ nameTable  :=  nameTable with new element and attribute
    #       names added from children of node ]
    for  child in node.childNodes:
        findNames ( child, args, nameTable )

4.8. addName(): Add one name to the name table

This function takes care of building the Python equivalent of each XML name and making sure there is an entry for that pair in nameTable.

pyrang
def addName ( nameTable, args, name, suffix ):
    """Add one element or attribute name to nameTable.

      [ (nameTable is a dictionary) and
        (args is an Args object) and
        (name is an XML element or attribute name as a string) and
        (suffix is a string) ->
          nameTable  :=  nameTable with an entry added with
              (args.prefix + (name, uppercased and with each "-"
              replaced by "_") + "_" + suffix) as the key, and
              name as the corresponding value ]
    """

First, we form the Python equivalent of name. Because name comes out of the DOM, it will be a Unicode string, so we use the str() function to convert it to a regular string. The .upper() method uppercases it, and the .translate() method converts hyphens to underbars.

pyrang
    #-- 1 --
    # [ pyName  :=  name, convert to str, uppercased, and with
    #       hyphens converted to underbars, and "_" inserted
    #       at each lowercase->uppercase transition ]
    pyName  =  pythonizeName ( name )

The key consists of pyName, prefixed with args.prefix, with an underbar and the suffix argument appended.

pyrang
    #-- 2 --
    # [ key  :=  args.prefix + pyName + "_" + suffix ]
    key  =  "%s%s_%s" % (args.prefix, pyName, suffix)

Now we are ready to add the new entry to the table.

pyrang
    #-- 3 --
    # [ nameTable  :=  nameTable with an entry whose key=key and
    #       whose value=name ]
    nameTable[key]  =  name

4.9. pythonizeName(): Sanitize an XML name for Python use

This function implements the various rules for converting an XML name.

pyrang
def pythonizeName ( s ):
    """Convert an XML name to its Python equivalent.
    """

The general approach will be to add the translated characters to a list named xlated. We start by creating this list empty.

pyrang
    #-- 1 --
    xlated  =  []

Next we work through the input string, adding each character's worth of content to the xlated list.

pyrang
    #-- 2 --
    # [ xlated  +:=  characters from s, convert to string type,
    #       uppercasing lowercase characters, adding "_" at each
    #       lowercase->uppercase transition, and converting "-"
    #       to "-" ]
    for  i in range(len(s)):
        #-- 2 body --
        # [ if s[i] == '-' ->
        #     xlated  +:=  ['_']
        #   else if s[i] is lowercase ->
        #     xlated  +:=  [str(s[i]), uppercased]
        #   else if (s[i] is uppercase) and (i>0) and
        #   (s[i-1] is lowercase) ->
        #     xlated  +:=  ["-", str(s[i])]
        #   else ->
        #     xlated  +:=  [str(s[i])] ]

        #-- 2.1 --
        c  =  str(s[i])

        #-- 2.2 --
        if  c == '-':
            xlated.append ( '_' )
        elif  c.islower():
            xlated.append ( c.upper() )
        elif  c.isupper():
            if  ( ( i > 0 ) and
                  ( s[i-1].islower() ) ):
                xlated.append ( '_' )
            xlated.append ( c )
        else:
            xlated.append ( c )

The result is the concatenation of the elements of xlated.

pyrang
    #-- 3 --
    return  "".join ( xlated )

4.10. writeOutput(): Generate the Python file

This function writes the actual Python declarations to the output file. This output is prefaced by a brief Python comment warning users not to edit the file, that it is produced by this script.

pyrang
def writeOutput ( args, outFile, nameTable ):
    """Output the Python assignment statements.

      [ (args is an Args object) and
        (outFile is a writeable file) and
        (nameTable is a dictionary whose keys are Python names
        and each value is a string) ->
            outFile  +:=  (opening comment) + (lines of the
                form 'name = value', one for each entry in
                nameTable) ]
    """

    #-- 1 --
    print >>outFile, (
        "'''Do not edit this file.  It was produced automatically from\n"
        "   the %s schema by the %s script.\n"
        "'''" % (args.inFileName, sys.argv[0]) )

Since the ordering of entries in a dictionary is arbitrary, we'll extract the keys and sort them so the output won't be completely random.

pyrang
    #-- 2 --
    # [ outFile  +:  (lines of the form 'name = value', one for
    #       each entry in nameTable) ]
    keyList  =  nameTable.keys()
    keyList.sort()
    for  key in keyList:
        print >>outFile, "%s = '%s'" % (key, nameTable[key] )

    outFile.close()

4.11. class Args: Command line argument object

This class encapsulates the processing of the command line arguments.

pyrang
class Args:
    """Represents the command line arguments.

      Exports:
        Args():
          [ if sys.argv contains valid command line arguments ->
              return a new Args object representing those arguments
            else ->
              sys.stderr  +:=  (usage message) + (error message)
              stop execution ]
        .inFileName:
          [ the input file name argument from sys.argv ]
        .prefix:
          [ if sys.argv specifies a prefix argument ->
              that argument as a string
            else -> "" ]
        .outFileName:
          [ if sys.argv specifies an output file name ->
              that file name as a string
            else -> None ]
    """

The SysArgs class divides command line arguments into switches (such as -p) and positional arguments. As class variables, we now define symbolic names for the command line switches and positional arguments.

pyrang
    PREFIX_SWITCH  =  'p'
    OUTFILE_SWITCH  =  'o'
    INFILE_ARG  =  'inFile'    

The SysArgs constructor expects a list of SwitchArg objects defining the switch-type arguments, and a list of PosArg objects defining the positional arguments. We define these also as class variables.

pyrang
    SWITCH_SPECS  =  [
        SwitchArg ( PREFIX_SWITCH,
            [ "Prefix for each generated name" ], takesValue=1 ),
        SwitchArg ( OUTFILE_SWITCH,
            [ "Optional output file name" ], takesValue=1 ) ]
    POS_SPECS  =  [
        PosArg ( INFILE_ARG,
            [ "Name of input .rng file" ] ) ]

4.12. Args.__init__(): Constructor

This class checks the command line arguments and makes their values available.

pyrang
    def __init__ ( self ):
        """Check and process command line arguments.
        """

Much of the work of checking and collecting command line arguments is done by the SysArgs class from the author's library module sysargs.py to do preliminary argument processing; for documentation on that class, see the author's Python library page.

pyrang
        #-- 1 --
        # [ if sys.argv contains valid command line arguments ->
        #     sysArgs  :=  a SysArgs object representing those
        #         arguments
        #   else ->
        #     sys.stderr  +:=  (usage message) + (error message)
        #     stop execution ]
        sysArgs  =  SysArgs ( self.SWITCH_SPECS, self.POS_SPECS )

At this point we know that all switches and positional arguments were valid, so we can copy them over to our exported attributes.

pyrang
        #-- 2 --
        # [ if sysArgs.switchMap has a key self.PREFIX_SWITCH ->
        #     self.prefix  :=  the corresponding value
        #   else ->
        #     self.prefix  :=  "" ]
        self.prefix  =  sysArgs.switchMap[self.PREFIX_SWITCH]
        if  self.prefix is None:
            self.prefix  =  ""

        #-- 3 --
        # [ if sysArgs.switchMap has a key self.OUTFILE_SWITCH ->
        #     self.outFileName  :=  the corresponding value
        #   else ->
        #     self.outFileName  :=  None ]
        try:
            self.outFileName  =  sysArgs.switchMap[self.OUTFILE_SWITCH]
        except KeyError:
            self.outFileName  =  None

        #-- 4 --
        self.inFileName  =  sysArgs.posMap[self.INFILE_ARG]

4.13. Code epilogue

This is the last bit of code in the script; it calls the main() function to start execution.

pyrang
#================================================================
# Epilogue
#----------------------------------------------------------------

if  __name__  ==  '__main__':
    main()