
7.6. BirdNoteSet._validate(): Open and validate the file

This method reads the external XML file into an et.ElementTree instance, validates that tree against the schema, and returns the tree's root element. For details of the validation process, see Automated validation of input files in Python XML processing with lxml.

birdnotes.py
# - - - B i r d N o t e S e t . _ v a l i d a t e

    def _validate(self, fileName):
        """Build an XML tree and validate it against the schema.

          [ fileName is a string ->
              if (SCHEMA_RNG names a readable, well-formed RNG
              bird notes schema) and
              (fileName names a readable XML bird notes file
              that validates against that schema) ->
                return the root node of a document representing
                that bird notes file as an et.Element
              else -> raise IOError ]
        """

Before we can validate the notes file, we have to translate the schema itself into an ElementTree.

birdnotes.py
        #-- 1 --
        # [ if SCHEMA_RNG names a readable, well-formed XML file ->
        #     schemaDoc  :=  a new et.ElementTree representing
        #                    that file
        #   else -> raise IOError ]
        try:
            schemaDoc = et.parse(SCHEMA_RNG)
        except et.XMLSyntaxError:
            raise IOError("Schema file '%s' is not "
                "well-formed XML." % SCHEMA_RNG)
        except IOError as detail:
            raise IOError("Can't read schema file '%s': %s" %
                (SCHEMA_RNG, str(detail)))

The next step is to convert the schema's tree into an et.RelaxNG instance that knows how to validate against that schema.

birdnotes.py
        #-- 2 --
        # [ if schemaDoc is a valid Relax NG schema ->
        #     schema  :=  an et.RelaxNG instance representing schemaDoc
        #   else -> raise IOError ]
        try:
            schema = et.RelaxNG(schemaDoc)
        except et.RelaxNGParseError as detail:
            raise IOError("File '%s' is not a valid "
                "RNG schema: %s " % (SCHEMA_RNG, str(detail)))
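
The schema-compilation step can be tried in isolation. The following is a minimal sketch, assuming lxml is installed; the two tiny schemas and the helper name compile_schema are hypothetical stand-ins, not the real bird notes schema or part of birdnotes.py. The second schema is well-formed XML but declares an element with no content pattern, which Relax NG does not allow.

```python
from io import BytesIO

from lxml import etree as et

# Hypothetical one-element schema, standing in for the real SCHEMA_RNG file.
GOOD_RNG = b"""<element name="note" xmlns="http://relaxng.org/ns/structure/1.0">
  <text/>
</element>"""

# Well-formed XML, but not a valid schema: element "b" has no content pattern.
BAD_RNG = b"""<element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="b"/>
  </zeroOrMore>
</element>"""


def compile_schema(rng_bytes):
    """Return an et.RelaxNG built from rng_bytes, or raise IOError."""
    schema_doc = et.parse(BytesIO(rng_bytes))
    try:
        return et.RelaxNG(schema_doc)
    except et.RelaxNGParseError as detail:
        raise IOError("Not a valid RNG schema: %s" % detail)


schema = compile_schema(GOOD_RNG)
try:
    compile_schema(BAD_RNG)
except IOError as detail:
    print(detail)
```

Parsing and compiling are separate failure points: a schema file can be perfectly good XML and still be rejected when et.RelaxNG() examines it.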

Next we convert the bird notes file into a tree. The et.parse() function reads the document and turns it into an et.ElementTree. To find the root element of an ElementTree, use the .getroot() method.

If the file doesn't exist or is unreadable, we'll get an IOError exception. If it exists but is not well-formed, we get an et.XMLSyntaxError exception.

birdnotes.py
        #-- 3 --
        # [ if fileName names a readable, well-formed XML file ->
        #     doc  :=  that file as an et.ElementTree
        #   else -> raise IOError ]
        try:
            doc = et.parse(fileName)
        except et.XMLSyntaxError:
            raise IOError("File '%s' is not well-formed XML." %
                             fileName)
        except IOError as detail:
            raise IOError("Can't read file '%s': %s" %
                (fileName, str(detail)))
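
To see the two failure modes side by side, here is a small sketch, assuming lxml is installed. The helpers load_tree and failure_message and the scratch files are hypothetical, made up for this illustration; only the et.parse() call and the two exception types come from the method above.

```python
import os
import tempfile

from lxml import etree as et


def load_tree(file_name):
    """Parse file_name, reporting either kind of failure as IOError."""
    try:
        return et.parse(file_name)
    except et.XMLSyntaxError as detail:
        # The file was read, but its content is not well-formed XML.
        raise IOError("File '%s' is not well-formed XML: %s" %
                      (file_name, detail))
    except IOError as detail:
        # The file is missing or unreadable.
        raise IOError("Can't read file '%s': %s" % (file_name, detail))


def failure_message(file_name):
    """Return the IOError message for file_name, or None on success."""
    try:
        load_tree(file_name)
    except IOError as detail:
        return str(detail)
    return None


# Hypothetical scratch files, created only for this demonstration.
work_dir = tempfile.mkdtemp()
good_path = os.path.join(work_dir, "good.xml")
bad_path = os.path.join(work_dir, "bad.xml")
missing_path = os.path.join(work_dir, "missing.xml")   # never created
with open(good_path, "w") as f:
    f.write("<notes><day/></notes>")
with open(bad_path, "w") as f:
    f.write("<notes><day></notes>")      # <day> is never closed

for path in (good_path, bad_path, missing_path):
    print(failure_message(path))
```

Folding both exceptions into IOError means the caller needs only one except clause, while the message text still says which problem occurred.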

The schema.validate() method returns True if the document validates and False otherwise; after a failure, the schema's .error_log attribute describes what went wrong. If validation succeeds, we return the document's root element.

birdnotes.py
        #-- 4 --
        # [ if doc fails to validate against schema ->
        #     raise IOError
        #   else -> I ]
        if not schema.validate(doc):
            raise IOError("File %s is not a valid bird notes "
                "file: %s" % (fileName, schema.error_log))

        #-- 5 --
        # [ return the root element of doc ]
        return doc.getroot()
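
The validation step can also be exercised on its own. This is a minimal sketch, assuming lxml is installed; the one-element schema, the sample documents, and the helper name check are hypothetical and much simpler than the real bird notes schema.

```python
from io import BytesIO

from lxml import etree as et

# Hypothetical schema: a <note> element with a required species attribute.
RNG = b"""<element name="note" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="species"/>
  <text/>
</element>"""

schema = et.RelaxNG(et.parse(BytesIO(RNG)))

VALID = b'<note species="rocwre">Singing</note>'
INVALID = b'<note>Singing</note>'        # species attribute is missing


def check(xml_bytes):
    """Return the root element of xml_bytes if it validates, else raise IOError."""
    doc = et.parse(BytesIO(xml_bytes))
    if not schema.validate(doc):
        # error_log holds the validator's messages for the last validation.
        raise IOError("Not a valid file: %s" % schema.error_log)
    return doc.getroot()


root = check(VALID)
print(root.tag, root.get("species"))

try:
    check(INVALID)
except IOError as detail:
    print(detail)
```

Because validate() only reports success or failure, the details have to be pulled from error_log, which is why the method above includes it in the exception message.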