
3. Reading an XML document in Python

To extract information from an XML document, you must first read it and convert it into a DOM tree. There is a quick way to do this and a full-featured way. Which one to use depends on whether the document needs a correct base URI (see Section 2.5, “Base URI”).

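If the document does not need a correct base URI, the quick method in Section 3.1 is sufficient; otherwise, use the full-featured method.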

3.1. A quick and dirty document reader

If you don't need to supply a correct base URI, this technique builds a DOM Document object from an XML source document supplied in any of four forms:

  • A string that contains the entire document, e.g., "<dog-list><dog breed='bassett' sex='m' >Rover</dog ></dog-list >"

  • An open, readable file object from which the document can be read.

  • A string that names the file containing the document.

  • The URI of the document, if it is available at that location.

To use this technique, first import the Parse function from the Ft.Xml package:

from Ft.Xml import Parse

Then, to transform an XML document into a DOM tree:

doc = Parse ( source )

where source is any of: a string containing the document, a stream from which to read the document, the name of the document file, or the URI of the document.
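Because 4Suite may not be installed in a current Python environment, the sketch below uses the standard library's xml.dom.minidom instead of Ft.Xml.Parse(). It illustrates one way a single entry point can accept all four kinds of source. The helper name parse_any and its rule for telling the kinds apart (text beginning with "<" is a document, a string with an http/https/ftp/file scheme is a URI, anything else is a file name) are assumptions for illustration, not how 4Suite itself decides.

```python
import io
from urllib.parse import urlparse
from urllib.request import urlopen
from xml.dom import minidom


def parse_any(source):
    """Hypothetical helper: return a minidom Document from any of four source kinds."""
    # 1. A readable file object: anything with a read() method.
    if hasattr(source, "read"):
        return minidom.parse(source)
    # 2. A string that holds the document text itself.
    if source.lstrip().startswith("<"):
        return minidom.parseString(source)
    # 3. A URI with a network or file scheme: open it and parse the stream.
    if urlparse(source).scheme in ("http", "https", "ftp", "file"):
        with urlopen(source) as stream:
            return minidom.parse(stream)
    # 4. Otherwise, treat the string as the name of a local file.
    return minidom.parse(source)


if __name__ == "__main__":
    text = "<dog-list><dog breed='bassett' sex='m' >Rover</dog ></dog-list >"
    doc = parse_any(text)                          # from a string
    doc2 = parse_any(io.BytesIO(text.encode()))    # from a file object
    print(doc.documentElement.tagName, doc2.documentElement.tagName)
```

Checking for a read() method first, rather than testing for a specific class, lets any file-like object (an open file, io.BytesIO, a network response) be passed in.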

The Parse() function returns a DOM Document node, that is, the root of the document tree. For further information on the structure of this tree, see Section 4, “The structure of a DOM tree”.
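The distinction between the Document node and the document's top-level element can be seen in a short sketch. It uses the standard library's minidom rather than 4Suite, on the assumption that both follow the W3C DOM interfaces for the basic attributes used here (nodeType, documentElement, getElementsByTagName).

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    "<dog-list><dog breed='bassett' sex='m' >Rover</dog ></dog-list >")

# The object returned is the Document node itself, not the top-level element.
print(doc.nodeType == Node.DOCUMENT_NODE)      # True
# The top-level element of the document is reached through documentElement.
root = doc.documentElement
print(root.tagName)                            # dog-list
# Walk down to the first <dog> element; its first child is the text node.
dog = root.getElementsByTagName("dog")[0]
print(dog.getAttribute("breed"), dog.firstChild.data)   # bassett Rover
```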

If the source does not exist, this function will raise Ft.Lib.UriException. If it exists but is not well-formed, Parse() will raise Ft.Xml.ReaderException.
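The same two failure modes can be handled separately. In this sketch, again using the standard library's minidom rather than 4Suite, a missing file raises FileNotFoundError and a file that is not well-formed raises xml.parsers.expat.ExpatError; treating these as the counterparts of Ft.Lib.UriException and Ft.Xml.ReaderException is an assumption made for illustration. The helper name load_or_report is hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def load_or_report(filename):
    """Hypothetical helper: parse a file, reporting each failure mode separately."""
    try:
        return minidom.parse(filename)
    except FileNotFoundError:
        # The source does not exist (compare Ft.Lib.UriException in 4Suite).
        print("No such document:", filename)
    except ExpatError as err:
        # The source exists but is not well-formed (compare Ft.Xml.ReaderException).
        print("Not well-formed:", err)
    return None
```

Returning None on failure keeps the sketch short; a real program might prefer to re-raise or log the error.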