Abstract
Describes a script for extracting the names of XML elements and attributes so that Python scripts can use those names in symbolic form.
This publication is available in Web form and also as a PDF document. Please
forward any comments to tcc-doc@nmt.edu.
Table of Contents
Makefile rules for pyrang
main(): The main program
fatal(): Write a message and stop
processInput(): Read the schema
findNames(): Recursive tree walker
addName(): Add one name to the name table
pythonizeName(): Sanitize an XML name for Python use
writeOutput(): Generate the Python file
class Args: Command line argument object
Args.__init__(): Constructor
The author has written several Python-language applications that process XML files using the DOM (Document Object Model), as described in Python and the XML Document Object Model with 4Suite.
These Python scripts need to refer to XML element and
attribute names in order to process them. Suppose,
for example, that in an XML application to represent
sports team rosters, a team element has
player child elements. If the variable
teamNode is an XML DOM Element node, we might use this DOM call to get a list of those
child elements:
playerList = teamNode.xpath ( "player" )
However, the author prefers to avoid using string constants in code, for two reasons:
Stylistically, it is a good idea to avoid literal constants in code as much as possible. If you see the constant 20, for example, the obvious question is: why 20?
The professional way to use constants is to define a name for the constant, and then document that definition with an explanation of what the value represents.
The author uses names in all caps for such “manifest
constants.” In a C-like language, these would
typically be declared using the #define
construct. Python doesn't have read-only variables, so
we just use an ordinary variable.
For XML names, he prefers a modest Hungarian
notation, adding a characteristic suffix:
“_N” for element names
(generic identifiers) and “_A” for attribute names. So, for example, the
manifest constant name for the player
element would be PLAYER_N.
Furthermore, “intercapitalized” names
such as “nSnakes” should
have underbars inserted at each lowercase-to-uppercase
transition (e.g., “N_SNAKES”).
XML allows names to contain three characters that are not valid in Python names: hyphen, period, and colon. This program translates hyphens to underbars, but names containing periods or colons should not be used with it.
In the life of the vast majority of applications, the design changes over time. In an XML application, it is particularly likely that element and attribute names will be added or changed. When the schema changes, the programmer must find, check, and possibly repair all references to a changed name in the code.
If we define a manifest constant for each element and attribute name, then a simple string search suffices to find all the references.
So we can rewrite the above example as:
PLAYER_N = "player"    # Declared at the top of the source file
...
playerList = teamNode.xpath ( PLAYER_N )
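The naming convention described above (a suffix of “_N” or “_A”, underbars at each lowercase-to-uppercase transition, hyphens translated to underbars) can be sketched in a few lines. This is only an illustration and not the code of pyrang itself: the function name manifest_name is hypothetical, the sample name home-city is invented, and raising an exception for names containing periods or colons is an assumption about how such names might be rejected.

```python
import re

ELEMENT_SUFFIX = "_N"      # suffix for element names (generic identifiers)
ATTRIBUTE_SUFFIX = "_A"    # suffix for attribute names


def manifest_name(xml_name, suffix):
    """Hypothetical helper: derive a manifest constant name from an XML name."""
    # Assumption: names with periods or colons are simply rejected.
    if "." in xml_name or ":" in xml_name:
        raise ValueError("unsupported character in XML name: %r" % xml_name)
    # Insert an underbar at each lowercase-to-uppercase transition.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1_\2", xml_name)
    # Hyphens are not valid in Python names, so translate them to underbars.
    return spaced.replace("-", "_").upper() + suffix


print(manifest_name("player", ELEMENT_SUFFIX))        # PLAYER_N
print(manifest_name("nSnakes", ATTRIBUTE_SUFFIX))     # N_SNAKES_A
print(manifest_name("home-city", ATTRIBUTE_SUFFIX))   # HOME_CITY_A
```

Keeping the suffix as a parameter lets the same routine serve both element and attribute names.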
A more subtle problem in maintainability is that now there are two places where XML element and attribute names are defined: in the schema that defines the XML document type, and in Python programs that process documents of that type. When the schema changes, the programmer has to remember to make parallel changes to the Python code. If the two versions get out of synchronization, Bad Things May Happen.
So we see that having these parallel versions violates the principle of single-sourcing: that is, there should be a single, reference version of any software entity.
The purpose of pyrang, then, is to automate single-sourcing of XML element and attribute names.
You must have these software tools installed:
This script should work with any version of the Python programming language from 2.2 on.
Relax NG is the author's preferred schema language. For more information, see Relax NG Compact Syntax (RNC).
The author prefers to write Relax NG schemas using the RNC (Relax NG Compact Syntax) notation. However, there is currently no easy way to access such a schema from Python.
Fortunately, there is an easy short-cut. James Clark's open-source tool trang can translate RNC schemas into RNG format, which is an XML document type. There are several good packages making it easy to access XML files.
See the trang page for downloads and
documentation.
4Suite is a package for Python-XML applications. For more information, see Python and the XML Document Object Model (DOM) with 4Suite.
The standard Unix make utility
automates the rebuilding of the Python definitions
whenever the schema changes. This utility is driven
by the file named Makefile in
your development directory.
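Because the RNG form of a schema is ordinary XML, the element and attribute names it defines can be collected with any XML library. The following is a minimal sketch of that idea, not the author's implementation: it uses the standard library's xml.etree.ElementTree in place of 4Suite, the function name collect_names and the sample schema are invented for illustration, and it looks only at the name attribute of element and attribute patterns, ignoring names expressed through name child elements or other name classes.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Relax NG XML syntax.
RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Invented sample schema in RNG form, as a tool such as trang might emit.
SAMPLE_RNG = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="team">
      <attribute name="home-city"/>
      <oneOrMore>
        <element name="player">
          <attribute name="jerseyNumber"/>
          <text/>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>
"""


def collect_names(rng_text):
    """Hypothetical helper: return sorted lists of element and attribute names."""
    root = ET.fromstring(rng_text)
    elements, attributes = set(), set()
    # Walk every node in the schema tree, including the root.
    for node in root.iter():
        name = node.get("name")
        if name is None:
            continue
        if node.tag == "{%s}element" % RNG_NS:
            elements.add(name)
        elif node.tag == "{%s}attribute" % RNG_NS:
            attributes.add(name)
    return sorted(elements), sorted(attributes)


element_names, attribute_names = collect_names(SAMPLE_RNG)
print(element_names)      # ['player', 'team']
print(attribute_names)    # ['home-city', 'jerseyNumber']
```

Collecting the names into sets means a name that appears in several places in the schema is reported only once.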
This document has these major sections:
Section 2, “Operation of pyrang”: How to run the pyrang script.
Section 3, “Setting up Makefile rules for pyrang”: How to set up pyrang in your Makefile.
Section 4, “pyrang internals”: The actual code for pyrang, in lightweight literate programming form.
Files referenced or created in this document:
pyrang: The script for pyrang.
sysargs.py: The author's module for processing command line arguments.