Table of Contents
Makefile rules for pyrang
main(): The main program
fatal(): Write a message and stop
processInput(): Read the schema
findNames(): Recursive tree walker
addName(): Add one name to the name table
pythonizeName(): Sanitize an XML name for Python use
writeOutput(): Generate the Python file
class Args: Command line argument object
The author has written several Python-language applications that process XML files using the DOM (Document Object Model), as described in Python and the XML Document Object Model with 4Suite.
These Python scripts need to refer to XML element and attribute names in order to process them. Suppose, for example, that in an XML application to represent sports team rosters, a team element has player child elements. If the variable teamNode is an XML DOM Element node, we might use this DOM call to get a list of those child elements:
playerList = teamNode.xpath ( "player" )
However, the author prefers to avoid using string constants in code, for two reasons:
Stylistically, it is a good idea to avoid, as much as possible, the use of constants in code. If you see the constant 20, for example, the obvious question is: why 20?
The professional way to use constants is to define a name for the constant, and then document that definition with an explanation of what the value represents.
The author uses names in all caps for such “manifest constants.” In a C-like language, these would typically be declared as named, read-only constants. Python doesn't have read-only variables, so we just use an ordinary variable.
For XML names, he prefers a modest Hungarian notation, adding a characteristic suffix of “_N” for element names (generic identifiers) and “_A” for attribute names. So, for example, the manifest constant name for the player element would be PLAYER_N.
Furthermore, “intercapitalized” names have underbars inserted at each lowercase-to-uppercase transition.
XML allows names to contain three characters that are not valid in Python names: hyphen, period, and colon. This program translates hyphens to underbars. It does not handle periods or colons, so do not use it with schemas whose names contain them.
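The naming rules above can be sketched in a few lines of Python. This is only an illustration of the convention, not the program's own pythonizeName() function: the helper name constant_name and the sample XML names are hypothetical.

```python
import re

ELEMENT_SUFFIX = "_N"      # suffix for element names (generic identifiers)
ATTRIBUTE_SUFFIX = "_A"    # suffix for attribute names


def constant_name(xml_name, suffix):
    """Hypothetical helper: derive a manifest-constant name from an XML name."""
    # Periods and colons are legal in XML names but are not handled here.
    if "." in xml_name or ":" in xml_name:
        raise ValueError("periods and colons are not supported: %r" % xml_name)
    # Insert an underbar at each lowercase-to-uppercase transition.
    result = re.sub(r"([a-z])([A-Z])", r"\1_\2", xml_name)
    # Hyphens are not valid in Python names; translate them to underbars.
    result = result.replace("-", "_")
    return result.upper() + suffix


if __name__ == "__main__":
    # Sample names are made up for illustration.
    samples = [
        ("player", ELEMENT_SUFFIX),
        ("battingAverage", ELEMENT_SUFFIX),
        ("jersey-number", ATTRIBUTE_SUFFIX),
    ]
    for xml_name, suffix in samples:
        print('%s = "%s"' % (constant_name(xml_name, suffix), xml_name))
```

Rejecting periods and colons outright, rather than silently mangling them, is one possible design choice; it keeps two different XML names from mapping to the same Python identifier.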
In the life of the vast majority of applications, the design changes over time. In an XML application, it is particularly likely that element and attribute names will be added or changed. When the schema changes, the programmer must find, check, and possibly repair all references to a changed name in the code.
If we define a manifest constant for each element and attribute name, then a simple string search suffices to find all the references.
So we can rewrite the above example as:
    PLAYER_N = "player"    # Declared at the top of the source file
    ...
    playerList = teamNode.xpath ( PLAYER_N )
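For readers who want something runnable, here is a minimal sketch of the same idea. It assumes the standard library's xml.etree.ElementTree in place of 4Suite so that it is self-contained; the roster document, the name attribute, and the player_names function are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Manifest constants, declared once at the top of the source file.
TEAM_N = "team"        # element name
PLAYER_N = "player"    # element name
NAME_A = "name"        # attribute name (hypothetical)

# A made-up roster document for illustration.
ROSTER = """<team name="Lobos">
  <player name="Ortiz"/>
  <player name="Chen"/>
</team>"""


def player_names(team_node):
    """Return the name attribute of each player child of team_node."""
    return [player.get(NAME_A) for player in team_node.findall(PLAYER_N)]


if __name__ == "__main__":
    team_node = ET.fromstring(ROSTER)
    assert team_node.tag == TEAM_N
    print(player_names(team_node))
```

If the schema later renames the player element, only the PLAYER_N definition changes, and a search for PLAYER_N finds every place that depends on it.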
A more subtle problem in maintainability is that now there are two places where XML element and attribute names are defined: in the schema that defines the XML document type, and in Python programs that process documents of that type. When the schema changes, the programmer has to remember to make parallel changes to the Python code. If the two versions get out of synchronization, Bad Things May Happen.
So we see that having these parallel versions violates the principle of single-sourcing: that is, there should be a single, reference version of any software entity.
The purpose of pyrang, then, is to automate single-sourcing of XML element and attribute names.

You must have these software tools installed:
This script should work with any version of the Python programming language from 2.2 on.
Relax NG is the author's preferred schema language. For more information, see Relax NG Compact Syntax (RNC).
The author prefers to write Relax NG schemas using the RNC (Relax NG Compact Syntax) notation. However, there is currently no easy way to access such a schema from Python.
Fortunately, there is an easy short-cut. James Clark's open-source tool trang can translate RNC schemas into RNG format, which is an XML document type. There are several good packages making it easy to access XML files.
See the trang page for downloads.
4Suite is a package for Python-XML applications. For more information, see Python and the XML Document Object Model (DOM) with 4Suite.
The standard Unix make utility automates the rebuilding of the Python definitions whenever the schema changes. This utility is driven by the file named Makefile in your development directory.
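To show how these tools fit together, the sketch below walks an RNG file (the XML form that trang produces from an RNC schema) and emits one Python definition per element and attribute name. It is a simplified illustration, not the program developed in this document: it assumes the standard library's xml.etree.ElementTree rather than 4Suite, the sample schema and the function names collect_names and constant_lines are hypothetical, and name conversion is limited to hyphen translation and uppercasing.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Relax NG XML syntax.
RNG_NS = "http://relaxng.org/ns/structure/1.0"

# A made-up schema in RNG form, for illustration only.
SAMPLE_RNG = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="team">
      <attribute name="city"/>
      <oneOrMore>
        <element name="player">
          <attribute name="jersey-number"/>
          <text/>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>"""


def collect_names(root):
    """Walk the whole tree, gathering element and attribute names."""
    elements, attributes = set(), set()
    for node in root.iter():
        name = node.get("name")
        if name is None:
            continue
        if node.tag == "{%s}element" % RNG_NS:
            elements.add(name)
        elif node.tag == "{%s}attribute" % RNG_NS:
            attributes.add(name)
    return elements, attributes


def constant_lines(elements, attributes):
    """Build one Python assignment per name, elements first."""
    lines = []
    for names, suffix in ((elements, "_N"), (attributes, "_A")):
        for name in sorted(names):
            constant = name.replace("-", "_").upper() + suffix
            lines.append('%s = "%s"' % (constant, name))
    return lines


if __name__ == "__main__":
    found_elements, found_attributes = collect_names(ET.fromstring(SAMPLE_RNG))
    print("\n".join(constant_lines(found_elements, found_attributes)))
```

Collecting the names into sets means a name that appears in several places in the schema yields a single definition.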
This document has these major sections:
Files referenced or created in this document: