5. archindex.py: Reader for archive index files

The archindex.py module reads one of the index files written by archx.py. For the interface, see Section 5.2, “class ArchiveIndex: Index of one archive”.

5.1. Prologue

The code starts with the module's documentation string and a few vital imports.

archindex.py
"""Reader module for files conforming to archx.rnc.
  For documentation, see:
    http://www.nmt.edu/~john/scans/slides/archx/
"""

The first import, which enables Python generators, must precede all other import statements. Next we import Parse from the 4Suite XML package (see Python and the XML Document Object Model (DOM) with 4Suite) to read the XML file, along with two exceptions that the Parse() function can raise, so we can catch them gracefully. The last import supplies the RNC_* names used below, such as RNC_IMAGE_N.

Note

This module has been ported to the more modern lxml library. See the copy in tcc/p/cranefest/, which should replace the version shown here, based on the older 4Suite XML package.

archindex.py
#================================================================
# Imports
#----------------------------------------------------------------

from __future__ import generators
from Ft.Xml import Parse, ReaderException
from Ft.Lib import UriException
from rnc_archx import *

5.2. class ArchiveIndex: Index of one archive

Here is the interface for retrieving an archive index file. An ArchiveIndex is a container for ArchImage instances, each of which describes one archived image.

Typically you won't call the constructor directly; instead, use the static method ArchiveIndex.readFile() to read the file for you.

archindex.py
# - - - - -   c l a s s   A r c h i v e I n d e x   - - - - -

class ArchiveIndex:
    """Represents one archive index file, conforming to archx.rnc.

      Exports:
        ArchiveIndex():
          [ return a new, empty ArchiveIndex object ]
        .getArchImage ( catNo ):
          [ catNo is an image catalog number as a string ->
              if self has an entry for that catalog number ->
                return an ArchImage instance representing that entry
              else -> raise KeyError ]
        .genArchImages():
          [ generate the ArchImage instances in self, in ascending
            order by catalog number ]
        .addArchImage ( archImage ):
          [ archImage is an ArchImage instance ->
              self  :=  self with archImage added ]
        ArchiveIndex.readFile ( imageCatalog, fileName ):
          [ (imageCatalog is a birdimages.ImageCatalog instance) and
            (fileName is a string) ->
              if (fileName names an XML file valid against
              archx.rnc) and
              (catalog numbers in that file are all found in
              imageCatalog) -> 
                return a new ArchiveIndex object containing ArchImage
                instances representing entries from fileName that
                do match entries in imageCatalog
              else -> raise IOError ]

The internal state of an ArchiveIndex instance consists of one dictionary:

archindex.py
      State/Invariants:
        .__catNoMap:
          [ a dictionary whose values are the ArchImage
            instances contained in self, and each key is the
            catalog number of that instance ]
    """

5.3. ArchiveIndex.__init__(): Constructor

There is little for this nominal constructor to do: just initialize the .__catNoMap dictionary.

archindex.py
# - - -   A r c h i v e I n d e x . _ _ i n i t _ _   - - -

    def __init__ ( self ):
        """Constructor for ArchiveIndex.
        """
        self.__catNoMap  =  {}

5.4. ArchiveIndex.getArchImage(): Retrieve an archived image

Retrieves the ArchImage for the given catalog number, if any. If the catalog number is not in self.__catNoMap, this method will raise KeyError.

archindex.py
# - - -   A r c h i v e I n d e x . g e t A r c h I m a g e   - - -

    def getArchImage ( self, catNo ):
        """Look up a catalog number.
        """
        return  self.__catNoMap [ catNo ]

5.5. ArchiveIndex.genArchImages(): Generate contained image entries

Sorts the catalog numbers, then generates the corresponding ArchImage instances in that order.

archindex.py
# - - -   A r c h i v e I n d e x . g e n A r c h I m a g e s   - - -

    def genArchImages ( self ):
        """Generate self's images.
        """
        catNoList  =  self.__catNoMap.keys()
        catNoList.sort()
        for  catNo in catNoList:
            yield  self.__catNoMap[catNo]
        raise StopIteration
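
The sorted-key generator pattern can be sketched in current Python 3 syntax as follows. MiniIndex and its method names are hypothetical stand-ins for illustration, not part of archindex.py. Because catalog numbers are strings, the ascending order is lexicographic.

```python
class MiniIndex:
    """Hypothetical stand-in for ArchiveIndex: catalog number -> entry."""

    def __init__(self):
        self._catNoMap = {}

    def add(self, catNo, entry):
        self._catNoMap[catNo] = entry

    def genEntries(self):
        """Generate the entries in ascending order by catalog number."""
        # sorted() builds a new list of the keys; the dictionary
        # itself is left untouched.
        for catNo in sorted(self._catNoMap):
            yield self._catNoMap[catNo]
        # Falling off the end finishes the generator.  Under PEP 479
        # (Python 3.7 and later) an explicit "raise StopIteration"
        # here would surface as a RuntimeError instead.


index = MiniIndex()
index.add("b002", "second")
index.add("a001", "first")
index.add("c003", "third")
print(list(index.genEntries()))
```

Anyone porting the method above to Python 3 would need to drop its final raise statement for the reason given in the comment.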

5.6. ArchiveIndex.addArchImage(): Add one image

Add one ArchImage instance to self.

archindex.py
# - - -   A r c h i v e I n d e x . a d d A r c h I m a g e   - - -

    def addArchImage ( self, archImage ):
        """Add one cataloged entry.
        """
        self.__catNoMap[archImage.original.catNo]  =  archImage

5.7. ArchiveIndex.readFile(): Instantiate from XML

This static method reads an XML file conforming to archx.rnc and returns its contents as an ArchiveIndex instance.

archindex.py
# - - -   A r c h i v e I n d e x . r e a d F i l e   - - -

#   @staticmethod
    def readFile ( imageCatalog, fileName ):
        """Read an XML file.
        """

We use the Parse() function to convert the XML file into a DOM tree.

archindex.py
        #-- 1 --
        # [ if  fileName names a readable, well-formed XML file ->
        #     doc  :=  a DOM Document node representing that file
        #   else -> raise IOError ]
        try:
            doc  =  Parse ( fileName )
        except UriException, detail:
            raise IOError, ( "No such file '%s': %s" %
                (fileName, detail) )
        except ReaderException, detail:
            raise IOError, ( "File '%s' not well-formed: %s" %
                (fileName, detail) )
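
For readers without 4Suite, the same error-translation idea can be sketched with the standard library's xml.dom.minidom in Python 3. This is an illustration under assumptions, not the module's code: parseIndex() and demo() are hypothetical names, and minidom signals problems with OSError and ExpatError rather than 4Suite's UriException and ReaderException.

```python
import os
import tempfile
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parseIndex(fileName):
    """Return a DOM Document for fileName, or raise IOError."""
    try:
        return xml.dom.minidom.parse(fileName)
    except ExpatError as detail:
        # The file was readable but is not well-formed XML.
        raise IOError("File '%s' not well-formed: %s" % (fileName, detail))
    except OSError as detail:
        # The file is missing or unreadable.
        raise IOError("Can't read file '%s': %s" % (fileName, detail))


def demo(xmlText):
    """Write xmlText to a temporary file, parse it, report the outcome."""
    with tempfile.TemporaryDirectory() as tmpDir:
        path = os.path.join(tmpDir, "index.xml")
        with open(path, "w") as f:
            f.write(xmlText)
        try:
            doc = parseIndex(path)
            return "ok: <%s>" % doc.documentElement.tagName
        except IOError as detail:
            return "error: %s" % detail


print(demo("<archive><image/></archive>"))
print(demo("<archive><image></archive>"))
```

The caller sees a single exception type whether the file is absent or malformed, which is the contract stated in step 1 above.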

Next we build a node set of all the image nodes in the document, and also create an ArchiveIndex instance.

archindex.py
        #-- 2 --
        # [ xList  :=  a node-set of all RNC_IMAGE_N nodes in doc
        #   archx  :=  a new, empty ArchiveIndex instance ]
        xList  =  doc.documentElement.xpath ( '//%s' % RNC_IMAGE_N )
        archx  =  ArchiveIndex()

Each valid node in xList will be converted to an ArchImage object and added to archx.

archindex.py
        #-- 3 --
        # [ if (all the nodes in xList are valid against
        #   archx.rnc, and their catalog numbers are defined
        #   in imageCatalog) ->
        #     archx  :=  archx with ArchImage instances added
        #                representing valid nodes from xList
        #   else -> raise IOError ]
        for  xNode in xList:
            #-- 3 body --
            # [ xNode is a DOM RNC_IMAGE_N Element node ->
            #     if (xNode is not valid against archx.rnc) or
            #     (xNode's catalog number is not in imageCatalog) ->
            #       raise IOError
            #     else ->
            #       archx  :=  archx with an ArchImage instance
            #                  added representing xNode ]

            #-- 3.1 --
            # [ (imageCatalog is a birdimages.ImageCatalog) and
            #   (xNode is a DOM RNC_IMAGE_N Element node) ->
            #     if (xNode is not valid against archx.rnc) or
            #     (xNode's catalog number is not in imageCatalog) ->
            #       raise IOError
            #     else ->
            #       archImage  :=  an ArchImage instance
            #           representing that catalog number ]
            archImage  =  ArchImage.readNode ( imageCatalog, xNode )

            #-- 3.2 --
            # [ archx  :=  archx with archImage added ]
            archx.addArchImage ( archImage )

Finally the accumulated index is returned to the caller.

archindex.py
        #-- 4 --
        return  archx

    readFile  =  staticmethod ( readFile )
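
The closing assignment rebinds the plain function as a static method. The commented-out @staticmethod line above the definition is the decorator spelling of the same thing, available in Python 2.4 and later. A minimal sketch of the equivalence, using hypothetical classes OldStyle and NewStyle:

```python
class OldStyle:
    def make(label):
        return "made " + label
    # Rebinding form, as used in archindex.py.
    make = staticmethod(make)


class NewStyle:
    # Decorator form: same effect, stated before the definition.
    @staticmethod
    def make(label):
        return "made " + label


# A static method receives no implicit first argument, whether it is
# called through the class or through an instance.
print(OldStyle.make("x"), NewStyle.make("x"), OldStyle().make("x"))
```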

5.8. class ArchImage: Archived catalog entry

Each instance of this class represents one image that is not only in the image catalog but has also been measured for image size and had a thumbnail placed in the thumbnail directory. Most of the cataloging information is represented as an Original instance (as described in An XML-based bird cataloging system), available as the .original attribute of an ArchImage instance.

Here is the class's interface, and its trivial constructor.

archindex.py
# - - - - -   c l a s s   A r c h I m a g e   - - - - -

class ArchImage:
    """Represents the cataloging information for one archived image.

      Exports:
        ArchImage ( original, high, wide ):
          [ (original is a birdimages.Original instance) and
            (high is the image's height in pixels as an int) and
            (wide is the image's width in pixels as an int) ->
              return a new ArchImage object with those values ]
        .original:       [ as passed to constructor, read-only ]
        .high:           [ as passed to constructor, read-only ]
        .wide:           [ as passed to constructor, read-only ]
        ArchImage.readNode ( imageCatalog, xNode ):
          [ (imageCatalog is a birdimages.ImageCatalog instance) and
            (xNode is a DOM RNC_IMAGE_N Element) ->
              if (xNode is not valid against archx.rnc) or
              (xNode's catalog number is not found in
              imageCatalog) ->
                raise IOError
              else  ->
                return an ArchImage instance representing that
                catalog number ]
    """
    def __init__ ( self, original, high, wide ):
        """Constructor for ArchImage
        """
        self.original  =  original
        self.high      =  high
        self.wide      =  wide

5.9. ArchImage.readNode(): Convert an XML node

This static method converts an image node into an ArchImage instance, assuming that its catalog number is found in the image catalog.

archindex.py
# - - -   A r c h I m a g e . r e a d N o d e   - - -

#   @staticmethod
    def readNode ( imageCatalog, xNode ):
        """Convert an XML node to an ArchImage.
        """

First we pull out the catalog number, height, and width.

archindex.py
        #-- 1 --
        # [ catNo  :=  xNode's RNC_CAT_NO_A attribute ]
        catNo  =  xNode.getAttributeNS ( None, RNC_CAT_NO_A )

        #-- 2 --
        # [ if xNode has an RNC_HIGH_A attribute that is a valid
        #   int in string form ->
        #     high  :=  that attribute as an int
        #   else -> raise IOError ]
        high  =  getIntAttr ( xNode, RNC_HIGH_A )

        #-- 3 --
        # [ if xNode has an RNC_WIDE_A attribute that is a valid
        #   int in string form ->
        #     wide  :=  that attribute as an int
        #   else -> raise IOError ]
        wide  =  getIntAttr ( xNode, RNC_WIDE_A )

Translate the catalog number into an Original instance, or fail.

archindex.py
        #-- 4 --
        # [ if catNo matches a catalog number in imageCatalog ->
        #     original  :=  the corresponding Original from
        #                   imageCatalog 
        #   else -> raise IOError ]
        original  =  imageCatalog.getOriginal ( catNo )
        
        #-- 5 --
        return ArchImage ( original, high, wide )

    readNode  =  staticmethod ( readNode )
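
The node-to-object conversion can be sketched in Python 3 with the standard library's xml.dom.minidom. Everything named here is an assumption for illustration: the attribute names stand in for the real RNC_* values defined in rnc_archx, a plain dictionary stands in for birdimages.ImageCatalog, and we assume a failed lookup surfaces as KeyError, which the sketch translates to IOError to match the stated contract.

```python
import xml.dom.minidom

# Hypothetical attribute names; the real ones come from rnc_archx.
CAT_NO_A = "cat-no"
HIGH_A = "high"
WIDE_A = "wide"


def readNode(catalog, node):
    """Return (original, high, wide) for node, or raise IOError."""
    catNo = node.getAttribute(CAT_NO_A)
    try:
        # getAttribute() returns "" for a missing attribute, and
        # int("") raises ValueError, so both cases land here.
        high = int(node.getAttribute(HIGH_A))
        wide = int(node.getAttribute(WIDE_A))
    except ValueError:
        raise IOError("Image '%s': bad or missing dimensions" % catNo)
    try:
        # Assumption: the stand-in catalog raises KeyError on a miss.
        original = catalog[catNo]
    except KeyError:
        raise IOError("Catalog number '%s' not in the catalog" % catNo)
    return (original, high, wide)


catalog = {"a001": "Original for a001"}
doc = xml.dom.minidom.parseString(
    '<image cat-no="a001" high="480" wide="640"/>')
print(readNode(catalog, doc.documentElement))
```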

5.10. getIntAttr(): Retrieve an integer attribute value

This utility function handles the retrieval and conversion of an XML attribute that should contain an integer in string form.

archindex.py
# - - -   g e t I n t A t t r   - - -

def getIntAttr ( node, attrName ):
    """Convert an integer attribute value

      [ (node is a DOM Element node) and
        (attrName is an attribute name as a string) ->
          if node has an attribute named attrName and it
          contains a valid int in string form ->
            return that attribute as an int
          else -> raise IOError ]
    """

    #-- 1 --
    # [ if node has an attribute named attrName ->
    #     rawInt  :=  that attribute's value
    #   else -> raise IOError ]
    rawInt  =  node.getAttributeNS ( None, attrName )
    if  not rawInt:
        raise IOError, ( "Missing %s attribute" % attrName )

    #-- 2 --
    # [ if rawInt is a valid int in string form ->
    #     return int(rawInt)
    #   else -> raise IOError ]
    try:
        result  =  int ( rawInt )
        return result
    except ValueError:
        raise IOError, ( "%s='%s': value not an int" %
                         (attrName, rawInt) )
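
Here is a sketch of the same function in Python 3 syntax, exercised against a node built with the standard library's xml.dom.minidom rather than 4Suite. The element and attribute names in the sample are made up for the demonstration, and the sketch uses minidom's getAttribute(), which returns an empty string when the attribute is absent.

```python
import xml.dom.minidom


def getIntAttr(node, attrName):
    """Return node's attrName attribute as an int, or raise IOError."""
    rawInt = node.getAttribute(attrName)   # "" when absent
    if not rawInt:
        raise IOError("Missing %s attribute" % attrName)
    try:
        return int(rawInt)
    except ValueError:
        raise IOError("%s='%s': value not an int" % (attrName, rawInt))


doc = xml.dom.minidom.parseString('<image high="480" wide="6x40"/>')
node = doc.documentElement
print(getIntAttr(node, "high"))
for name in ("wide", "deep"):
    try:
        getIntAttr(node, name)
    except IOError as detail:
        print(detail)
```

Note that an attribute present with an empty value is reported as missing, because the test is on the truth of the string and not on the attribute's existence.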