The author has written several Python-language applications that process XML files. These Python scripts need to refer to XML element and attribute names in order to process them.
The author's preferred Python tool for both reading and generating XML is described in Python XML processing with lxml. This tool represents an XML document as a tree of Element nodes.
Suppose, for example, that in an XML application to represent sports team rosters, a team element has player child elements. If the variable teamNode is an Element node, we might use this call to get a list of those child elements:
playerList = teamNode.xpath("player")
However, the author prefers to avoid using string constants in code, for two reasons:
Stylistically, it is a good idea to avoid literal constants in code as much as possible. If you see the constant 20, for example, the obvious question is: why 20?
The professional way to use constants is to define a name for the constant, and then document that definition with an explanation of what the value represents.
The author uses names in all caps for such “manifest constants.” In a C-like language, these would typically be declared using a dedicated constant-definition construct. Python doesn't have read-only variables, so we just use an ordinary variable.
For XML names, he prefers a modest Hungarian notation, adding a characteristic suffix of “_N” for element names (generic identifiers) and “_A” for attribute names. So, for example, the manifest constant name for the player element would be PLAYER_N. Furthermore, “intercapitalized” names have underbars inserted at each lowercase-to-uppercase transition.
XML allows names to contain three characters that are not valid in Python names: hyphen, period, and colon. The pyrang utility translates hyphens to underbars; don't use names containing periods or colons with this program.
In the life of the vast majority of applications, the design changes over time. In an XML application, it is particularly likely that element and attribute names will be added or changed. When the schema changes, the programmer must find, check, and possibly repair all references to a changed name in the code.
If we define a manifest constant for each element and attribute name, then a simple string search suffices to find all the references.
So we can rewrite the above example as:
PLAYER_N = "player"    # Declared at the top of the source file
...
playerList = teamNode.xpath(PLAYER_N)
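As a minimal, self-contained sketch of this style, the following uses the standard library's xml.etree.ElementTree in place of lxml, so findall() stands in for xpath(). The roster document, the TEAM_N and NAME_A constants, and the name attribute are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Manifest constants, declared once at the top of the source file.
TEAM_N = "team"      # hypothetical root element name
PLAYER_N = "player"
NAME_A = "name"      # hypothetical attribute name

# A made-up roster document, for illustration only.
ROSTER = '<team><player name="Ann"/><player name="Bo"/><coach name="Cy"/></team>'

teamNode = ET.fromstring(ROSTER)

# findall() with a bare tag name returns the matching direct children,
# much like the relative XPath expression "player" in lxml.
playerList = teamNode.findall(PLAYER_N)
playerNames = [p.get(NAME_A) for p in playerList]
print(playerNames)
```

Because every reference goes through PLAYER_N or NAME_A, searching the source for those identifiers finds each place that depends on the corresponding XML name.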
A more subtle problem in maintainability is that now there are two places where XML element and attribute names are defined: in the schema that defines the XML document type, and in Python programs that process documents of that type. When the schema changes, the programmer has to remember to make parallel changes to the Python code. If the two versions get out of synchronization, Bad Things May Happen.
So we see that having these parallel versions violates the principle of single-sourcing: that is, there should be a single, reference version of any software entity.
The purpose of pyrang, then, is to automate single-sourcing of XML element and attribute names.
You must have these software tools installed:
This script should work with any version of the Python programming language from 2.2 on.
Relax NG is the author's preferred schema language. For more information, see Relax NG Compact Syntax (RNC).
The author prefers to write Relax NG schemas using the RNC (Relax NG Compact Syntax) notation. However, there is currently no easy way to access such a schema from Python.
Fortunately, there is an easy short-cut. James Clark's open-source tool trang can translate RNC schemas into RNG format, which is an XML document type. There are several good packages making it easy to access XML files.
For downloads and documentation, see Trang: Multi-format schema converter based on RELAX NG.
For the author's preferred Python tool for XML work, see Python XML processing with lxml.
The standard Unix build utility automates the rebuilding of the Python definitions whenever the schema changes. This utility is driven by a file in your development directory.
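To make the overall idea concrete, here is a rough sketch of the kind of processing involved: read an RNG document (of the sort trang produces from an RNC schema), collect the element and attribute names, and emit manifest-constant definitions using the “_N” and “_A” suffixes. This is not pyrang's actual implementation. The function names constant_name, collect_names, and definitions, the sample schema, and the use of xml.etree.ElementTree rather than lxml are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"  # RELAX NG namespace URI

# Made-up RNG schema, of the sort trang might produce from an RNC file.
SAMPLE_RNG = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="team">
      <attribute name="team-name"/>
      <oneOrMore>
        <element name="player">
          <attribute name="jerseyNumber"/>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>
"""

def constant_name(xml_name, suffix):
    """Hypothetical helper: turn an XML name into a manifest-constant name."""
    # Insert an underbar at each lowercase-to-uppercase transition.
    name = re.sub(r"([a-z])([A-Z])", r"\1_\2", xml_name)
    # Hyphens are not valid in Python names; translate them to underbars.
    name = name.replace("-", "_")
    return name.upper() + suffix

def collect_names(rng_text):
    """Return (element names, attribute names) found in an RNG document."""
    root = ET.fromstring(rng_text)
    elements = {e.get("name") for e in root.iter("{%s}element" % RNG_NS)
                if e.get("name")}
    attributes = {a.get("name") for a in root.iter("{%s}attribute" % RNG_NS)
                  if a.get("name")}
    return sorted(elements), sorted(attributes)

def definitions(rng_text):
    """Build Python source lines defining one constant per name."""
    elements, attributes = collect_names(rng_text)
    lines = ['%s = "%s"' % (constant_name(n, "_N"), n) for n in elements]
    lines += ['%s = "%s"' % (constant_name(n, "_A"), n) for n in attributes]
    return lines

print("\n".join(definitions(SAMPLE_RNG)))
```

The sketch only looks at name attributes on element and attribute patterns; an RNG schema that gives names through child name elements or name classes would need additional handling.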
This document has these major sections:
Files referenced or created in this document: