

Describes a script for extracting the names of XML elements and attributes so that Python scripts can use those names in symbolic form.

This publication is available in Web form and also as a PDF document. Please forward any comments to

Table of Contents

1. Introduction: Why pyrang?
2. Operation of pyrang
3. Setting up Makefile rules for pyrang
4. pyrang internals
4.1. Code prologue
4.2. Module imports
4.3. Manifest constants
4.4. main(): The main program
4.5. fatal(): Write a message and stop
4.6. processInput(): Read the schema
4.7. findNames(): Recursive tree walker
4.8. addName(): Add one name to the name table
4.9. pythonizeName(): Sanitize an XML name for Python use
4.10. writeOutput(): Generate the Python file
4.11. class Args: Command line argument object
4.12. Args.__init__(): Constructor
4.13. Code epilogue

1. Introduction: Why pyrang?

The author has written several Python-language applications that process XML files using the DOM (Document Object Model), as described in Python and the XML Document Object Model with 4Suite.

These Python scripts need to refer to XML element and attribute names in order to process them. Suppose, for example, that in an XML application to represent sports team rosters, a team element has player child elements. If the variable teamNode is an XML DOM Element node, we might use this DOM call to get a list of those child elements:

  playerList = teamNode.xpath ( "player" )

However, the author prefers to avoid using string constants in code, for two reasons:

  1. Stylistically, it is a good idea to avoid, as much as possible, the use of bare literal constants in code. If you see the constant 20, for example, the obvious question is: why 20?

    The professional way to use constants is to define a name for the constant, and then document that definition with an explanation of what the value represents.

    The author uses names in all caps for such “manifest constants.” In a C-like language, these would typically be declared using the #define construct. Python doesn't have read-only variables, so we just use an ordinary variable.

    For XML element names, he prefers a modest Hungarian notation, adding a characteristic suffix of “_N” for element names (generic identifiers) and “_A” for attribute names. So, for example, the manifest constant name for the player element would be PLAYER_N.

    Furthermore, “intercapitalized” names such as “nSnakes” should have underbars inserted at each lowercase-to-uppercase transition (e.g., “N_SNAKES”).

    XML allows names to contain three characters that are not valid in Python names: hyphen, period, and colon. This program translates hyphens to underbars, but it does not handle periods or colons, so avoid XML names that contain them.

  2. In the life of the vast majority of applications, the design changes over time. In an XML application, it is particularly likely that element and attribute names will be added or changed. When the schema changes, the programmer must find, check, and possibly repair all references to a changed name in the code.

    If we define a manifest constant for each element and attribute name, then a simple string search suffices to find all the references.

So we can rewrite the above example as:

  PLAYER_N  =  "player"  # Declared at the top of the source file
  playerList = teamNode.xpath ( PLAYER_N )
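The naming convention described in item 1 above can be sketched in a few lines. This is only an illustration of the stated rules, not the code of pyrang itself: the helper name to_constant_name is hypothetical, and the sketch is written for a current Python 3 interpreter rather than the older versions the script targets.

```python
import re

# Suffixes from the naming convention described in the text.
ELEMENT_SUFFIX = "_N"
ATTRIBUTE_SUFFIX = "_A"


def to_constant_name(xml_name, suffix):
    """Hypothetical helper: derive a manifest-constant name from an XML name."""
    # Periods and colons are valid in XML names but are not handled here.
    if "." in xml_name or ":" in xml_name:
        raise ValueError("unsupported character in XML name: %r" % xml_name)
    # Insert an underbar at each lowercase-to-uppercase transition.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1_\2", xml_name)
    # Translate hyphens to underbars, shift to upper case, add the suffix.
    return spaced.replace("-", "_").upper() + suffix


print(to_constant_name("player", ELEMENT_SUFFIX))       # PLAYER_N
print(to_constant_name("nSnakes", ATTRIBUTE_SUFFIX))    # N_SNAKES_A
print(to_constant_name("home-town", ATTRIBUTE_SUFFIX))  # HOME_TOWN_A
```

Rejecting periods and colons outright, rather than silently mangling them, is a design choice of this sketch; it keeps two different XML names from quietly mapping to the same Python name.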

A more subtle problem in maintainability is that now there are two places where XML element and attribute names are defined: in the schema that defines the XML document type, and in Python programs that process documents of that type. When the schema changes, the programmer has to remember to make parallel changes to the Python code. If the two versions get out of synchronization, Bad Things May Happen.

So we see that having these parallel versions violates the principle of single-sourcing: that is, there should be a single, reference version of any software entity.

The purpose of pyrang, then, is to automate single-sourcing of XML element and attribute names. You must have these software tools installed:


Python

This script should work with any version of the Python programming language from 2.2 on.

Relax NG

Relax NG is the author's preferred schema language. For more information, see Relax NG Compact Syntax (RNC).


trang

The author prefers to write Relax NG schemas using the RNC (Relax NG Compact Syntax) notation. However, there is currently no easy way to access such a schema from Python.

Fortunately, there is an easy short-cut. James Clark's open-source tool trang can translate RNC schemas into RNG format, which is an XML document type, and several good packages make it easy to read XML files from Python.

See the trang page for downloads and documentation.


4Suite

4Suite is a package for building Python-XML applications. For more information, see Python and the XML Document Object Model (DOM) with 4Suite.


make

The standard Unix make utility automates the rebuilding of the Python definitions whenever the schema changes. This utility is driven by the file named Makefile in your development directory.
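To make the RNG short-cut concrete, here is a minimal sketch of pulling element and attribute names out of an RNG file. It rests on several assumptions: it uses the standard library's xml.etree.ElementTree under Python 3 instead of 4Suite, the function name collect_names is hypothetical, and the sample schema is hand-written for the roster example rather than actual trang output. The namespace URI is the standard one for Relax NG.

```python
import xml.etree.ElementTree as ET

# Namespace of the Relax NG XML syntax.
RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Hand-written sample schema (an assumption, not real trang output).
SAMPLE_RNG = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="team">
      <attribute name="name"/>
      <oneOrMore>
        <element name="player">
          <attribute name="jersey-number"/>
          <text/>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>
"""


def collect_names(rng_text):
    """Hypothetical helper: return (element names, attribute names), sorted."""
    root = ET.fromstring(rng_text)
    element_names = set()
    attribute_names = set()
    # Visit every node in the schema tree, including the root.
    for node in root.iter():
        name = node.get("name")
        if name is None:
            continue
        if node.tag == "{%s}element" % RNG_NS:
            element_names.add(name)
        elif node.tag == "{%s}attribute" % RNG_NS:
            attribute_names.add(name)
    return sorted(element_names), sorted(attribute_names)


elements, attributes = collect_names(SAMPLE_RNG)
print(elements)    # ['player', 'team']
print(attributes)  # ['jersey-number', 'name']
```

Collecting the names into sets means a name that is defined at several places in the schema is reported only once.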

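This document has these major sections:

  • Section 2, “Operation of pyrang”.

  • Section 3, “Setting up Makefile rules for pyrang”.

  • Section 4, “pyrang internals”.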

Files referenced or created in this document:

  • pyrang: The script for pyrang.

  • The author's module for processing command line arguments.