Here, in lightweight literate programming (LLP) form, is the actual code for pyrang. For more on LLP, see the author's LLP page.
The actual script starts with the traditional “pound-bang line” to make the script self-executing under Unix. This is followed by a comment that points back to this document.
#!/usr/bin/env python
#================================================================
# pyrang: Writes Python declarations for RNC schema names.
#   For documentation and code, see:
#     http://www.nmt.edu/tcc/help/lang/python/examples/pyrang/
#----------------------------------------------------------------
First we'll need to import the standard Python sys module, which will give us access to the
command line arguments and the standard I/O streams.
#================================================================
# Imports
#----------------------------------------------------------------
import sys
Command line argument processing is done with the sysargs.py module; see the author's Python library page.
from sysargs import *
To read the RNG schema file, we need the Parse function from the 4Suite package.
from Ft.Xml import Parse
Here we define symbolic constants for two element names and an attribute name from the Relax NG schema document type.
ELEMENT_N = 'element'
ATTRIBUTE_N = 'attribute'
NAME_A = 'name'
The purpose of the script is to extract all the element
and attribute names from the input schema, and write a
Python module containing assignment statements that
define a symbolic name for each unique name. For
example, if the schema has an element named “player” that has an attribute
“first-name”, we'll want to
generate two Python assignments from those names:
PLAYER_N = 'player'
FIRST_NAME_A = 'first-name'
The obvious data structure for accumulating these names
is a Python dictionary; we'll call it nameTable. We'll use the Python name (e.g.,
FIRST_NAME_A) as the key, and the XML name
(e.g., first-name) as the value, in each
dictionary entry.
The same attribute name may occur in more than one element, and we want to avoid writing duplicate declarations in that case. Because dictionary keys must be unique, we can simply store each name we find in the dictionary; storing a name a second time replaces the existing entry, so the finished table contains no duplicates.
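As a minimal sketch of that deduplication (written for Python 3, unlike the Python 2 script itself), the following uses a hypothetical list of name pairs in which the attribute name "id" turns up twice. The list `found` and the variable names are illustrative assumptions, not part of pyrang.

```python
# Hypothetical (Python name, XML name) pairs, in the order a tree walk
# might find them; the "id" attribute occurs on two different elements.
found = [
    ("PLAYER_N", "player"),
    ("ID_A", "id"),
    ("TEAM_N", "team"),
    ("ID_A", "id"),
    ("FIRST_NAME_A", "first-name"),
]

name_table = {}
for py_name, xml_name in found:
    # Assigning to an existing key replaces its value, so the second
    # "ID_A" does not create a second entry.
    name_table[py_name] = xml_name

# Five pairs went in, but only four distinct keys remain.
for py_name in sorted(name_table):
    print("%s = '%s'" % (py_name, name_table[py_name]))
```

Each declaration is written once no matter how many times the name occurs in the schema.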
Here is the overall program flow:
1. Process the command line arguments, and figure out whether the output is going to a file or to standard output. We'll define a small class named Args to encapsulate the processing and representation of the command line arguments. See Section 4.11, “class Args: Command line argument object”.
2. The 4Suite Parse() function reads the RNG schema document and gives it to us as a DOM tree.
3. We create nameTable as an empty dictionary, then walk the DOM tree recursively looking for element and attribute names. Each name we find is stored into nameTable.
4. Finally, each key-value pair in nameTable is written to the output as a Python assignment statement.
The code follows. First, we process the command line
arguments; see Section 4.11, “class Args: Command line argument object”.
def main():
    """Main program
    """
    #-- 1 --
    # [ if sys.argv contains valid command line arguments ->
    #     args := an Args instance representing those arguments
    #   else ->
    #     sys.stderr +:= error message(s)
    #     stop execution ]
    args = Args()
This step takes care of reading the input file and
building the nameTable dictionary.
See Section 4.6, “processInput(): Read the schema”.
    #-- 2 --
    # [ args is an Args object ->
    #     if args.inFileName names a readable file containing a
    #     valid RNG schema ->
    #       nameTable := a dictionary whose keys are the Python
    #           manifest constant names for the element and
    #           attribute names in that schema (using args.prefix
    #           as a prefix), and each corresponding value is the
    #           XML name
    #     else ->
    #       sys.stderr +:= error message(s)
    #       stop execution ]
    nameTable = processInput ( args )
All that remains is to write the output file; see
Section 4.10, “writeOutput(): Generate the Python file”.
    #-- 3 --
    # [ args is an Args object ->
    #     if args.outFileName is None ->
    #       outFile := sys.stdout
    #     else if args.outFileName names a writeable file ->
    #       outFile := that file opened new for writing
    #     else ->
    #       sys.stderr +:= error message(s)
    #       stop execution ]
    if args.outFileName is None:
        outFile = sys.stdout
    else:
        try:
            outFile = open ( args.outFileName, "w" )
        except IOError, detail:
            fatal ( "Can't open output file '%s' for writing." %
                    args.outFileName )
    #-- 4 --
    # [ (outFile is a writeable file) and
    #   (nameTable is a dictionary whose keys are Python names
    #   and whose values are XML names) ->
    #     outFile := Python statements of the form 'n = v'
    #         for n in the set of keys of nameTable and each v
    #         is the corresponding nameTable value ]
    writeOutput ( args, outFile, nameTable )
This function writes a message to the standard error stream and stops execution.
def fatal ( *L ):
    """Write a message and terminate.
      [ L is a list of strings ->
          sys.stderr +:= (concatenated elements of L)
          stop execution ]
    """
    print >>sys.stderr, "*** Error: %s" % "".join(L)
    sys.exit(1)
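For readers on Python 3, where the `print >>` form is no longer valid, an equivalent might look like the sketch below. It is an illustration of the same behavior, not the script's own code, and the parameter name `parts` is an assumption.

```python
import sys

def fatal(*parts):
    """Write the concatenated message parts to stderr and stop execution."""
    print("*** Error: %s" % "".join(parts), file=sys.stderr)
    sys.exit(1)

# sys.exit() raises SystemExit, so a caller (or a test) can intercept it.
try:
    fatal("Can't open ", "'schema.rng'")
except SystemExit as exit_info:
    exit_code = exit_info.code   # the status passed to sys.exit()
```

Because termination is an exception rather than an immediate process exit, code that calls such a function can still be exercised under a test harness.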
This function reads the schema, extracts the element and
attribute names, and returns the nameTable
dictionary with the Python/XML name pairs.
def processInput ( args ):
    """Process the input schema.
      [ args is an Args object ->
          if args.inFileName names a readable file containing a
          valid RNG schema ->
            nameTable := a dictionary whose keys are the Python
                manifest constant names for the element and
                attribute names in that schema (using args.prefix
                as a prefix), and each corresponding value is the
                XML name
          else ->
            sys.stderr +:= error message(s)
            stop execution ]
    """
First we create the name table as an empty dictionary.
    #-- 1 --
    # [ nameTable := a new, empty dictionary ]
    nameTable = {}
Next we attempt to open the input file and hand it to the
4Suite Parse() function to build a DOM
tree.
    #-- 2 --
    # [ if args.inFileName can be opened for reading ->
    #     inFile := that file, so opened
    #   else ->
    #     sys.stderr +:= error message
    #     stop execution ]
    try:
        inFile = open ( args.inFileName )
    except IOError, detail:
        fatal ( "Can't open '%s' for reading: %s" %
                (args.inFileName, detail) )
    #-- 3 --
    # [ if inFile contains a valid XML document ->
    #     doc := that document as a DOM tree
    #   else ->
    #     sys.stderr +:= error message(s)
    #     stop execution ]
    try:
        doc = Parse ( inFile )
    except Exception, detail:
        fatal ( "Can't parse the schema file: %s" % detail )
Walking the tree to find all the element
and attribute elements is straightforward with
recursion. The function in Section 4.7, “findNames(): Recursive tree walker”
finds all the names in the subtree rooted
at the node passed as its first argument. In a DOM
tree, the doc.documentElement attribute is
the root Element node of the schema.
    #-- 4 --
    # [ (nameTable is a dictionary) and
    #   (args is an Args object) ->
    #     nameTable := nameTable with Python/XML name pairs added
    #         from the subtree rooted at doc.documentElement,
    #         with the names prefixed by args.prefix ]
    findNames ( doc.documentElement, args, nameTable )
Finally, we return the nameTable
dictionary we've built.
    #-- 5 --
    return nameTable
This function finds all element and attribute names in a
given subtree of the schema's DOM tree, and adds entries
for each name to the nameTable dictionary.
def findNames ( node, args, nameTable ):
    """Find element and attribute names in a subtree.
      [ (node is a DOM Element node) and
        (args is an Args object) and
        (nameTable is a dictionary whose keys are the Python
        names for elements and attributes, and each corresponding
        value is the XML name) ->
          nameTable := nameTable with Python/XML name pairs added
              from the subtree rooted at node, with the names
              prefixed by args.prefix ]
    """
In an RNG schema, we are looking for elements of these two forms:
<element name="N">...
<attribute name="N">...
Therefore, all we have to do is check the node's name to
see if it is either element or attribute, and in those cases add the name to
nameTable. The logic that builds the
Python equivalent of the XML name, and prepends args.prefix, is in Section 4.8, “addName(): Add one name to the name table”.
    #-- 1 --
    # [ if node.nodeName is ELEMENT_N or ATTRIBUTE_N ->
    #     nameTable := nameTable with an entry added with
    #         the Python equivalent of node's "name" attribute as
    #         the key, and node's "name" attribute as the value
    #   else -> I ]
    if node.nodeName == ELEMENT_N:
        eltName = node.getAttributeNS ( None, NAME_A )
        addName ( nameTable, args, eltName, "N" )
    elif node.nodeName == ATTRIBUTE_N:
        attrname = node.getAttributeNS ( None, NAME_A )
        addName ( nameTable, args, attrname, "A" )
That takes care of extracting names from node itself. To recursively add names from its
subtree, we iterate over node's children.
    #-- 2 --
    # [ nameTable := nameTable with new element and attribute
    #       names added from children of node ]
    for child in node.childNodes:
        findNames ( child, args, nameTable )
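The same recursive walk can be tried without 4Suite installed. The sketch below is an assumption-laden stand-in: it swaps in the standard library's xml.dom.minidom for 4Suite's Parse(), targets Python 3, and uses a hypothetical inline schema and a hypothetical function name, find_names. It collects (kind, name) pairs in a list rather than building nameTable.

```python
from xml.dom.minidom import parseString

# Hypothetical RNG fragment used only for this illustration.
SCHEMA = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="player">
      <attribute name="first-name"/>
      <element name="team">
        <attribute name="id"/>
      </element>
    </element>
  </start>
</grammar>"""

def find_names(node, found):
    """Append (kind, name) for each element/attribute pattern under node."""
    # Text and comment nodes have node names such as "#text", so they
    # never match and need no special handling here.
    if node.nodeName in ("element", "attribute"):
        found.append((node.nodeName, node.getAttribute("name")))
    # Recurse into the children; leaf nodes have an empty child list.
    for child in node.childNodes:
        find_names(child, found)

found = []
find_names(parseString(SCHEMA).documentElement, found)
for kind, name in found:
    print(kind, name)
```

The walk visits nodes in document order, so nested declarations are reported after the element that contains them.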
This function takes care of building the Python
equivalent of each XML name and making sure there is an
entry for that pair in nameTable.
def addName ( nameTable, args, name, suffix ):
    """Add one element or attribute name to nameTable.
      [ (nameTable is a dictionary) and
        (args is an Args object) and
        (name is an XML element or attribute name as a string) and
        (suffix is a string) ->
          nameTable := nameTable with an entry added with
              (args.prefix + (name, uppercased, with each "-"
              replaced by "_" and "_" inserted at each
              lowercase->uppercase transition) + "_" + suffix)
              as the key, and name as the corresponding value ]
    """
First, we form the Python equivalent of name. Because name comes out of the DOM, it
will be a Unicode string, and the result must be a regular string with lowercase
letters uppercased, hyphens converted to underbars, and an underbar inserted at each
lowercase-to-uppercase transition. That logic lives in the pythonizeName() function,
described after this one.
    #-- 1 --
    # [ pyName := name, convert to str, uppercased, and with
    #       hyphens converted to underbars, and "_" inserted
    #       at each lowercase->uppercase transition ]
    pyName = pythonizeName ( name )
The key consists of pyName, prefixed with
args.prefix, with an underbar and the
suffix argument appended.
    #-- 2 --
    # [ key := args.prefix + pyName + "_" + suffix ]
    key = "%s%s_%s" % (args.prefix, pyName, suffix)
Now we are ready to add the new entry to the table.
    #-- 3 --
    # [ nameTable := nameTable with an entry whose key=key and
    #       whose value=name ]
    nameTable[key] = name
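To see what the key format produces, here is a small Python 3 sketch of the same format string in isolation. The helper name make_key and the sample prefix "RNG_" are hypothetical, chosen only for illustration.

```python
def make_key(prefix, py_name, suffix):
    """Compose a table key as prefix + Python name + "_" + suffix."""
    return "%s%s_%s" % (prefix, py_name, suffix)

# With no prefix, the key is just the Python name plus its suffix.
plain = make_key("", "FIRST_NAME", "A")

# With a prefix, every generated name starts with it.
prefixed = make_key("RNG_", "PLAYER", "N")

# The suffix keeps an element and an attribute that share an XML name
# from landing on the same key.
as_element = make_key("", "NAME", "N")
as_attribute = make_key("", "NAME", "A")
print(plain, prefixed, as_element, as_attribute)
```

The "N" and "A" suffixes are what allow a schema to use the same name for both an element and an attribute without one declaration overwriting the other.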
This function implements the various rules for converting an XML name.
def pythonizeName ( s ):
    """Convert an XML name to its Python equivalent.
    """
The general approach will be to add the translated
characters to a list named xlated.
We start by creating this list empty.
    #-- 1 --
    xlated = []
Next we work through the input string, adding each
character's worth of content to the xlated
list.
    #-- 2 --
    # [ xlated +:= characters from s, converted to string type,
    #   uppercasing lowercase characters, adding "_" at each
    #   lowercase->uppercase transition, and converting "-"
    #   to "_" ]
    for i in range(len(s)):
        #-- 2 body --
        # [ if s[i] == '-' ->
        #     xlated +:= ['_']
        #   else if s[i] is lowercase ->
        #     xlated +:= [str(s[i]), uppercased]
        #   else if (s[i] is uppercase) and (i>0) and
        #   (s[i-1] is lowercase) ->
        #     xlated +:= ["_", str(s[i])]
        #   else ->
        #     xlated +:= [str(s[i])] ]
        #-- 2.1 --
        c = str(s[i])
        #-- 2.2 --
        if c == '-':
            xlated.append ( '_' )
        elif c.islower():
            xlated.append ( c.upper() )
        elif c.isupper():
            if ( ( i > 0 ) and
                 ( s[i-1].islower() ) ):
                xlated.append ( '_' )
            xlated.append ( c )
        else:
            xlated.append ( c )
The result is the concatenation of the elements of xlated.
    #-- 3 --
    return "".join ( xlated )
This function writes the actual Python declarations to the output file. The output begins with a brief triple-quoted string warning users not to edit the file, because it is generated by this script.
def writeOutput ( args, outFile, nameTable ):
    """Output the Python assignment statements.
      [ (args is an Args object) and
        (outFile is a writeable file) and
        (nameTable is a dictionary whose keys are Python names
        and each value is a string) ->
          outFile +:= (opening comment) + (lines of the
              form 'name = value', one for each entry in
              nameTable) ]
    """
    #-- 1 --
    print >>outFile, (
        "'''Do not edit this file. It was produced automatically from\n"
        " the %s schema by the %s script.\n"
        "'''" % (args.inFileName, sys.argv[0]) )
Since the ordering of entries in a dictionary is arbitrary, we'll extract the keys and sort them so that the declarations appear in a predictable order.
    #-- 2 --
    # [ outFile +:= (lines of the form 'name = value', one for
    #       each entry in nameTable) ]
    keyList = nameTable.keys()
    keyList.sort()
    for key in keyList:
        print >>outFile, "%s = '%s'" % (key, nameTable[key] )
    outFile.close()
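The sorted-output step can be exercised in isolation by writing to an in-memory buffer instead of a real file. This Python 3 sketch assumes a hypothetical helper name, write_assignments, and a two-entry sample table, and it omits the opening warning string.

```python
import io

def write_assignments(out_file, name_table):
    """Write one 'NAME = 'value'' line per entry, sorted by key."""
    # sorted() over a dictionary yields its keys in sorted order, which
    # replaces the Python 2 keys()/sort() pair.
    for key in sorted(name_table):
        out_file.write("%s = '%s'\n" % (key, name_table[key]))

buf = io.StringIO()
write_assignments(buf, {"PLAYER_N": "player", "FIRST_NAME_A": "first-name"})
generated = buf.getvalue()
print(generated, end="")
```

Sorting the keys means repeated runs on an unchanged schema give identical output, which keeps diffs of the generated file small.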
This class encapsulates the processing of the command line arguments.
class Args:
    """Represents the command line arguments.
      Exports:
        Args():
          [ if sys.argv contains valid command line arguments ->
              return a new Args object representing those arguments
            else ->
              sys.stderr +:= (usage message) + (error message)
              stop execution ]
        .inFileName:
          [ the input file name argument from sys.argv ]
        .prefix:
          [ if sys.argv specifies a prefix argument ->
              that argument as a string
            else -> "" ]
        .outFileName:
          [ if sys.argv specifies an output file name ->
              that file name as a string
            else -> None ]
    """
The SysArgs class divides command line
arguments into switches (such as -p) and
positional arguments. As class variables, we now define
symbolic names for the command line switches and
positional arguments.
    PREFIX_SWITCH = 'p'
    OUTFILE_SWITCH = 'o'
    INFILE_ARG = 'inFile'
The SysArgs constructor expects a list of
SwitchArg objects defining the switch-type
arguments, and a list of PosArg objects
defining the positional arguments. We define these also
as class variables.
    SWITCH_SPECS = [
        SwitchArg ( PREFIX_SWITCH,
            [ "Prefix for each generated name" ], takesValue=1 ),
        SwitchArg ( OUTFILE_SWITCH,
            [ "Optional output file name" ], takesValue=1 ) ]
    POS_SPECS = [
        PosArg ( INFILE_ARG,
            [ "Name of input .rng file" ] ) ]
The constructor checks the command line arguments and makes their values available.
    def __init__ ( self ):
        """Check and process command line arguments.
        """
Much of the work of checking and collecting command line
arguments is done by the SysArgs class
from the author's library module sysargs.py; for
documentation on that class, see the author's Python library page.
        #-- 1 --
        # [ if sys.argv contains valid command line arguments ->
        #     sysArgs := a SysArgs object representing those
        #         arguments
        #   else ->
        #     sys.stderr +:= (usage message) + (error message)
        #     stop execution ]
        sysArgs = SysArgs ( self.SWITCH_SPECS, self.POS_SPECS )
At this point we know that all switches and positional arguments were valid, so we can copy them over to our exported attributes.
        #-- 2 --
        # [ if sysArgs.switchMap has a key self.PREFIX_SWITCH ->
        #     self.prefix := the corresponding value
        #   else ->
        #     self.prefix := "" ]
        self.prefix = sysArgs.switchMap[self.PREFIX_SWITCH]
        if self.prefix is None:
            self.prefix = ""
        #-- 3 --
        # [ if sysArgs.switchMap has a key self.OUTFILE_SWITCH ->
        #     self.outFileName := the corresponding value
        #   else ->
        #     self.outFileName := None ]
        try:
            self.outFileName = sysArgs.switchMap[self.OUTFILE_SWITCH]
        except KeyError:
            self.outFileName = None
        #-- 4 --
        self.inFileName = sysArgs.posMap[self.INFILE_ARG]
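Readers without the author's sysargs.py module can get the same three attributes from the standard library. The sketch below substitutes argparse for SysArgs, so it is a different technique from the one the script uses, written for Python 3. The function name parse_args and the sample argument list are hypothetical.

```python
import argparse

def parse_args(argv):
    """Return an object with .prefix, .outFileName and .inFileName."""
    parser = argparse.ArgumentParser(prog="pyrang")
    # -p takes a value; an absent switch defaults to the empty string.
    parser.add_argument("-p", dest="prefix", default="",
                        help="Prefix for each generated name")
    # -o takes a value; an absent switch defaults to None (use stdout).
    parser.add_argument("-o", dest="outFileName", default=None,
                        help="Optional output file name")
    # The single positional argument is required.
    parser.add_argument("inFileName", help="Name of input .rng file")
    return parser.parse_args(argv)

args = parse_args(["-p", "RNG_", "schema.rng"])
print(args.prefix, args.outFileName, args.inFileName)
```

As with the Args class, invalid arguments cause argparse to write a usage message to standard error and stop execution, so callers can assume the returned attributes are valid.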