12. Automated validation of input files

What happens to your application if you read a file that does not conform to the schema? There are two ways to deal with error handling.

With the lxml module, validating the file against its schema before you process it is inexpensive both in programming effort and in runtime. You can validate a document using either of these major schema languages:

12.1. Validation with a Relax NG schema

The lxml module can validate a document, in the form of an ElementTree, against a schema expressed in the Relax NG notation. For more information about Relax NG, see Relax NG Compact Syntax (RNC).

A Relax NG schema can use two forms: the compact syntax (RNC), or an XML document type (RNG). If your schema uses RNC, you must translate it to RNG format. The trang utility does this conversion for you. Use a command of this form:

trang file.rnc file.rng

Once you have the schema available as an .rng file, use these steps to validate an element tree ET.

  1. Parse the .rng file into its own ElementTree, as described in Section 7.3, “The ElementTree() constructor”.

  2. Use the constructor etree.RelaxNG(S) to convert that tree into a “schema instance,” where S is the ElementTree instance, containing the schema, from the previous step.

    If the tree is not a valid Relax NG schema, the constructor will raise an etree.RelaxNGParseError exception.

  3. Use the .validate(ET) method of the schema instance to validate ET.

    This method returns True if ET validates against the schema, or False if it does not.

    If the method returns False, the schema instance has an attribute named .error_log containing all the errors detected by the schema instance. You can print .error_log.last_error to see the most recent error detected.
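
The three steps above can be sketched as follows. This is a minimal illustration, not part of the original tutorial: the tiny schema, the sample documents, and the helper name check() are all hypothetical, and the schema is parsed from an in-memory byte string rather than from an .rng file on disk.

```python
import io
from lxml import etree

# Hypothetical schema in RNG (XML) form: a <note> element containing text.
RNG_SOURCE = b"""\
<element name="note" xmlns="http://relaxng.org/ns/structure/1.0">
  <text/>
</element>
"""


def check(schema, xml_bytes):
    """Hypothetical helper: validate one document, return (ok, messages)."""
    doc = etree.parse(io.BytesIO(xml_bytes))
    # Step 3: validate the element tree against the schema instance.
    ok = schema.validate(doc)
    # Collect the message text of each entry in the error log.
    messages = [entry.message for entry in schema.error_log]
    return ok, messages


# Step 1: parse the schema into its own ElementTree.
schema_tree = etree.parse(io.BytesIO(RNG_SOURCE))

# Step 2: build the schema instance; a malformed schema raises
# etree.RelaxNGParseError.
try:
    schema = etree.RelaxNG(schema_tree)
except etree.RelaxNGParseError as err:
    raise SystemExit("Schema is not valid Relax NG: %s" % err)

good_ok, good_errors = check(schema, b"<note>hello</note>")
bad_ok, bad_errors = check(schema, b"<memo>hello</memo>")

print(good_ok, bad_ok)
for message in bad_errors:
    print(message)
```

In a real application you would replace the in-memory byte strings with etree.parse() calls on your own .rng file and input file, and decide whether a failed validation should stop processing or merely be reported.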

Presented later in this document are two examples of the use of this validation technique: