
1. Introduction: Why pyrang?

The author has written several Python-language applications that process XML files. These Python scripts need to refer to XML element and attribute names in order to process them.

The author's preferred Python tool for both reading and generating XML is described in Python XML processing with lxml. This tool represents an XML document as a tree of Element nodes.

Suppose, for example, that in an XML application to represent sports team rosters, a team element has player child elements. If the variable teamNode is an Element node, we might use this call to get a list of those child elements:

    playerList = teamNode.xpath("player")

However, the author prefers to avoid using string constants in code, for two reasons:

  1. Stylistically, it is a good idea to avoid, as much as possible, the use of literal constants in code. If you see the constant 20, for example, the obvious question is: why 20?

    The professional way to use constants is to define a name for the constant, and then document that definition with an explanation of what the value represents.

    The author uses names in all caps for such “manifest constants.” In a C-like language, these would typically be declared using the #define construct. Python doesn't have read-only variables, so we just use an ordinary variable.

    For XML element names, he prefers a modest Hungarian notation, adding a characteristic suffix of “_N” for element names (generic identifiers) and “_A” for attribute names. So, for example, the manifest constant name for the player element would be PLAYER_N.

    Furthermore, “intercapitalized” names such as “nSnakes” should have underbars inserted at each lowercase-to-uppercase transition (e.g., “N_SNAKES”).

    Warning

    XML allows names to contain three characters that are not valid in Python names: hyphen, period, and colon. The pyrang utility will translate hyphens to underbars; do not use names containing periods or colons with this program.

  2. In the life of the vast majority of applications, the design changes over time. In an XML application, it is particularly likely that element and attribute names will be added or changed. When the schema changes, the programmer must find, check, and possibly repair all references to a changed name in the code.

    If we define a manifest constant for each element and attribute name, then a simple string search suffices to find all the references.

So we can rewrite the above example as:

    PLAYER_N = "player"    # Declared at the top of the source file
        ...
    playerList = teamNode.xpath(PLAYER_N)
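
The naming rules described above (all caps, an underbar at each lowercase-to-uppercase transition, hyphens translated to underbars, and a “_N” or “_A” suffix) can be sketched in a few lines of Python. This is only an illustration of the convention, not the code that pyrang itself uses; the function name manifest_name is hypothetical, and the sketch assumes a reasonably recent Python.

```python
import re

def manifest_name(xml_name, suffix):
    """Return a manifest-constant name for an XML name (hypothetical helper).

    suffix is "_N" for element names or "_A" for attribute names.
    """
    # Periods and colons have no translation in this convention.
    if "." in xml_name or ":" in xml_name:
        raise ValueError("periods and colons are not supported: %r" % xml_name)
    # Insert an underbar at each lowercase-to-uppercase transition.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1_\2", xml_name)
    # Hyphens are not valid in Python names, so translate them to underbars.
    return spaced.replace("-", "_").upper() + suffix
```

For example, manifest_name("player", "_N") gives "PLAYER_N", and an attribute named "birth-date" would become "BIRTH_DATE_A".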

A more subtle problem in maintainability is that now there are two places where XML element and attribute names are defined: in the schema that defines the XML document type, and in Python programs that process documents of that type. When the schema changes, the programmer has to remember to make parallel changes to the Python code. If the two versions get out of synchronization, Bad Things May Happen.

So we see that having these parallel versions violates the principle of single-sourcing: that is, there should be a single, reference version of any software entity.

The purpose of pyrang, then, is to automate single-sourcing of XML element and attribute names. You must have these software tools installed:

Python

This script should work with any version of the Python programming language from 2.2 on.

Relax NG

Relax NG is the author's preferred schema language. For more information, see Relax NG Compact Syntax (RNC).

trang

The author prefers to write Relax NG schemas using the RNC (Relax NG Compact Syntax) notation. However, there is currently no easy way to access such a schema from Python.

Fortunately, there is an easy short-cut. James Clark's open-source tool trang can translate RNC schemas into RNG format, which is an XML document type. There are several good packages making it easy to access XML files.

For downloads and documentation, see Trang: Multi-format schema converter based on RELAX NG.
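
To show why the RNG form is convenient, here is a minimal sketch of pulling element and attribute names out of an RNG schema. It makes several assumptions: the sample schema is invented for the sports-roster example, the function name schema_names is hypothetical, and the standard library's xml.etree.ElementTree stands in for lxml so that the sketch runs without extra packages. It handles only names given in a name attribute, not names written as child name elements, and it is not the code that pyrang itself uses.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RELAX NG XML syntax.
RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Invented sample: roughly what an RNG file for a small roster
# schema might look like after translation from RNC.
SAMPLE_RNG = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="team">
      <attribute name="city"/>
      <oneOrMore>
        <element name="player">
          <attribute name="number"/>
          <text/>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>"""

def schema_names(rng_text):
    """Return (element names, attribute names) found in an RNG schema."""
    root = ET.fromstring(rng_text)
    elements = set()
    attributes = set()
    # Walk every <element> pattern and keep its name attribute, if any.
    for node in root.iter("{%s}element" % RNG_NS):
        if node.get("name"):
            elements.add(node.get("name"))
    # Do the same for every <attribute> pattern.
    for node in root.iter("{%s}attribute" % RNG_NS):
        if node.get("name"):
            attributes.add(node.get("name"))
    return sorted(elements), sorted(attributes)

element_names, attribute_names = schema_names(SAMPLE_RNG)
```

With the sample schema, element_names holds "player" and "team", and attribute_names holds "city" and "number". Each of these could then be written out as a manifest constant definition.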

lxml

For the author's preferred Python tool for XML work, see Python XML processing with lxml.

make

The standard Unix make utility automates the rebuilding of the Python definitions whenever the schema changes. This utility is driven by the file named Makefile in your development directory.

This document has these major sections:

Files referenced or created in this document: