
7. birdimages.py: Internals

This section contains the actual code of the birdimages.py module in lightweight literate form. For more information on this methodology, see the author's Lightweight Literate Programming page.

7.1. Prologue

The birdimages.py module starts with a brief module comment that points back to this documentation.

birdimages.py
"""birdimages.py:  Python object for XML files using birdimages.rnc

  Do not edit this file directly.  It is mechanically extracted from
  the documentation:
    http://www.nmt.edu/~john/scans/slides/ims/
"""

7.2. Imports

As always, we need the sys module for access to standard I/O streams.

birdimages.py
#================================================================
# Imports
#----------------------------------------------------------------
import sys

To process the XML input file, we use the technique described in Python XML processing with lxml. We'll use the name et for that implementation of the ElementTree interface.

birdimages.py
from lxml import etree as et

We'll also need global declarations for all the XML element and attribute names from our RNC schema. Rather than attempt to maintain these declarations in parallel with the schema itself, we use the tool described in pyrang: A single-sourcing tool for Python-XML applications. This program reads the Relax NG version of the schema file and writes a module named rnc_slidecat.py containing Python statements that set the values of these variables.

The variables generated by pyrang have this general form:

RNC_name_suffix

where the name is the element or attribute name, and the suffix is “N” for element names and “A” for attribute names. For example, the variable for the original element is “RNC_ORIGINAL_N”.

birdimages.py
from rnc_slidecat import *
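
As an illustration only (it is not part of birdimages.py), the naming rule above can be sketched as a small function. The helper rncName() is hypothetical; it assumes nothing beyond the uppercasing and the “N”/“A” suffixes described above, and pyrang itself may handle other cases differently.

```python
def rncName(name, isAttribute=False):
    """Return the RNC_... variable name for a schema name.

    Hypothetical helper, for illustration only: it applies just the
    rule stated in the text (uppercase the name, then add "N" for an
    element or "A" for an attribute).
    """
    suffix = "A" if isAttribute else "N"
    return "RNC_%s_%s" % (name.upper(), suffix)

# The element example from the text, and the ab6 attribute.
elementVar = rncName("original")
attributeVar = rncName("ab6", isAttribute=True)
```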

7.3. Manifest constants

birdimages.py
# - - - - -   M a n i f e s t    c o n s t a n t s

There is one global constant (other than the ones imported from rnc_slidecat.py): the default image catalog name.

birdimages.py
DEFAULT_FILENAME  =  "birdimages.xml"

7.4. class ImageCatalog: Catalog object

An instance of this class represents the entire catalog file. The constructor returns only an empty catalog and is not intended for direct instantiation. The static method ImageCatalog.readFile() does all the work of building a catalog object from the XML input.

Here is the class declaration and external interface.

birdimages.py
# - - - - -   c l a s s   I m a g e C a t a l o g   - - - - -

class ImageCatalog:
    """Represents the entire catalog.

      Exports:
        ImageCatalog():
          [ returns a new, empty ImageCatalog object ]
        .addOriginal(o):
          [ o is an Original object ->
              if self contains an original with the same catalog number
              as o ->
                raise KeyError
              else ->
                self  :=  self with o added ]
        .getOriginal(catNo):
          [ catNo is a catalog number as a string ->
              if self has an original whose catalog number
              matches catNo ->
                return that original as an Original object
              else -> raise KeyError ]
        .genOriginals():
          [ generate the Originals in self in catalog number order ]
        .genAb6(code):
          [ code is a birdId string ->
              generate the Originals in self whose .ab6
              attributes contain code ]
        ImageCatalog.readFile(f):    # Static method
          [ f names a readable file valid against birdimages.rnc,
            defaulting to DEFAULT_FILENAME ->
              return a new ImageCatalog representing that file ]

Here are the class's internal state items.

birdimages.py
      State/Invariants:
        .__catNoMap:
          [ a dictionary whose values are the Originals in self,
            and each key is the value's .catNo ]
        .__ab6Map:
          [ a dictionary whose keys are all the birdId strings
            that appear in self's .ab6 attributes (uppercased,
            with any trailing blanks removed), and each
            corresponding value is a list of Originals
            that contain that key in their .ab6 attributes ]
    """

The .__ab6Map dictionary exists to support the .genAb6() method. Note that a given Original can appear in more than one of the lists that are values of the .__ab6Map dictionary. For example, an original with the XML attribute “ab6="virrai sora"” would appear in the lists for both .__ab6Map["VIRRAI"] and .__ab6Map["SORA"].

7.5. ImageCatalog.__init__(): Constructor

This trivial constructor simply creates the two internal dictionaries, initially empty.

birdimages.py
# - - -   I m a g e C a t a l o g . _ _ i n i t _ _

    def __init__(self):
        """Constructor for ImageCatalog.
        """
        self.__catNoMap  =  {}
        self.__ab6Map    =  {}

7.6. ImageCatalog.addOriginal(): Add a new catalog entry

This method takes an Original instance and stores it in self.

birdimages.py
# - - -   I m a g e C a t a l o g . a d d O r i g i n a l

    def addOriginal(self, o):
        """Add an original to the catalog.
        """

First we add the new entry to the .__catNoMap dictionary. Duplicates are not allowed, so we check to ensure there isn't already an entry for that catalog number.

birdimages.py
        #-- 1
        # [ if self.__catNoMap has an entry for o.catNo ->
        #     raise KeyError
        #   else ->
        #     self.__catNoMap[o.catNo]  =  o ]
        if  o.catNo in self.__catNoMap:
            raise KeyError("Duplicate catalog number '%s'" % o.catNo)
        self.__catNoMap[o.catNo]  =  o

Adding the new entry to the .__ab6Map dictionary is a bit more complicated. Because the XML ab6 attribute can have multiple codes separated by spaces, we must use the string .split() method to get a list of code strings. For example, if the ab6 attribute is "buwtea^cintea norsho?", the original will be indexed on two strings, "buwtea^cintea" and "norsho?" (both uppercased).

Then, the first time we observe a code, we set up the dictionary value with a new list containing the Original, but once we've seen that code, we append the Original to the existing list.

birdimages.py
        #-- 2
        # [ self.__ab6Map  +:=  entries mapping code |-> o
        #       for all codes in o.ab6 ]
        codeList  =  o.ab6.split()
        for code in codeList:
            key  =  code.rstrip().upper()
            try:
                self.__ab6Map[key].append(o)
            except KeyError:
                self.__ab6Map[key]  =  [o]
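
The indexing step can be tried in isolation with the sketch below, which is an illustration and not part of birdimages.py. The function indexCodes() is hypothetical, and plain strings stand in for Original instances.

```python
def indexCodes(ab6Map, ab6, item):
    """Add item to ab6Map under every code in the ab6 string.

    Hypothetical stand-alone version of step 2 of .addOriginal().
    """
    for code in ab6.split():
        key = code.upper()
        try:
            ab6Map[key].append(item)      # code seen before
        except KeyError:
            ab6Map[key] = [item]          # first time for this code

ab6Map = {}
indexCodes(ab6Map, "buwtea^cintea norsho?", "slide-A")
indexCodes(ab6Map, "norsho?", "slide-B")
```

After these two calls, "slide-A" is filed under both of its codes, and the list for "NORSHO?" holds both slides. The try/except pair could also be written as ab6Map.setdefault(key, []).append(item).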

7.7. ImageCatalog.getOriginal(): Retrieve an original by catalog number

This method uses the .__catNoMap dictionary to look up the original by catalog number. It raises KeyError if the dictionary does not have that key value.

birdimages.py
# - - -   I m a g e C a t a l o g . g e t O r i g i n a l

    def getOriginal(self, catNo):
        """Retrieve the original with a given catalog number.
        """
        return self.__catNoMap[catNo]

7.8. ImageCatalog.genOriginals(): Generate all catalog entries

This method first extracts a list of all the keys in the .__catNoMap dictionary, then sorts them, then generates the values using that sorted list.

birdimages.py
# - - -   I m a g e C a t a l o g . g e n O r i g i n a l s

    def genOriginals(self):
        """Generate all originals in catalog order"""
        keyList  =  sorted(self.__catNoMap)
        for key in keyList:
            yield self.__catNoMap[key]
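
Here is a small stand-alone sketch of the same idea, for illustration only. The function genInKeyOrder() and the sample data are hypothetical. It assumes that every catalog number has the year-month-day-sequence form of 2005-09-05-0003, with zero padding, so that sorting the keys as strings also puts them in chronological order.

```python
def genInKeyOrder(mapping):
    """Generate the values of mapping in ascending key order."""
    for key in sorted(mapping):
        yield mapping[key]

# Hypothetical catalog numbers, deliberately out of order.
slides = {
    "2005-09-05-0003": "third slide of 5 Sep 2005",
    "1999-12-31-0001": "first slide of 31 Dec 1999",
    "2005-09-05-0001": "first slide of 5 Sep 2005",
}
ordered = list(genInKeyOrder(slides))
```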

7.9. ImageCatalog.genAb6(): Generate all entries with a given bird code

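If the catalog has any entries for a given code, that code will be a key in the .__ab6Map dictionary, and the corresponding value will be a list containing the matching Original instances, which we then generate. If there is no such key, the lookup raises KeyError. Because this method is a generator, that exception reaches the caller on the first attempt to retrieve a value, not at the time of the call.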

birdimages.py
# - - -   I m a g e C a t a l o g . g e n A b 6

    def genAb6(self, code):
        """Retrieve originals with a given bird code.
        """

        #-- 1
        # [ if self.__ab6Map has a key that matches code ->
        #     resultList  :=  the corresponding value
        #   else -> raise KeyError ]
        resultList  =  self.__ab6Map[code.rstrip().upper()]

        #-- 2
        # [ generate the elements of resultList ]
        for  result in resultList:
            yield result
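
The sketch below, which is not part of birdimages.py, shows when a generator of this shape reports a missing code. The function genByCode() and its data are hypothetical stand-ins for .genAb6() and .__ab6Map.

```python
def genByCode(codeMap, code):
    """Generate the items filed under code; KeyError if there are none."""
    resultList = codeMap[code.rstrip().upper()]
    for result in resultList:
        yield result

codeMap = {"YELWAR": ["slide-1", "slide-2"]}

# Case and trailing blanks in the requested code do not matter.
found = list(genByCode(codeMap, "yelwar "))

# Calling the generator function runs none of its body...
pending = genByCode(codeMap, "nosuch")
try:
    next(pending)          # ...the dictionary lookup happens here.
    missing = False
except KeyError:
    missing = True
```

A caller that wants to handle unknown codes must therefore put the try block around the loop that consumes the generator, not just around the call.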

7.10. ImageCatalog.readFile(): Read an XML file

This static method takes a file name as an argument and, assuming the file is well-formed, builds an et.ElementTree instance representing the file. It then walks the tree, converting each original element to a catalog entry and adding it to a new ImageCatalog, which it returns.

birdimages.py
# - - -   I m a g e C a t a l o g . r e a d F i l e   - - -      Static

    def readFile(fileName=DEFAULT_FILENAME):
        """Read an XML file, return it as an ImageCatalog.
        """
        #-- 1
        # [ if fileName is a readable, well-formed XML file ->
        #     doc  :=  an et.ElementTree instance representing the file
        #   else -> raise IOError ]
        try:
            doc  =  et.parse(fileName)
        except IOError as detail:
            raise IOError("Can't read the catalog file '%s': %s" %
                (fileName, detail))
        except et.XMLSyntaxError as detail:
            raise IOError("Catalog file '%s' not well-formed: %s" %
                (fileName, detail))
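
To see this error translation at work without lxml installed, here is a sketch that substitutes the standard library's xml.etree.ElementTree, whose parse failure is ET.ParseError, not et.XMLSyntaxError. It is an illustration only; parseCatalog() and the catalog element name are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def parseCatalog(fileName):
    """Parse fileName, reporting any failure as IOError."""
    try:
        return ET.parse(fileName)
    except IOError as detail:
        raise IOError("Can't read the catalog file '%s': %s" %
                      (fileName, detail))
    except ET.ParseError as detail:
        raise IOError("Catalog file '%s' not well-formed: %s" %
                      (fileName, detail))

# Write a deliberately malformed file, then try to parse it.
handle, badName = tempfile.mkstemp(suffix=".xml")
os.write(handle, b"<catalog><original></catalog>")
os.close(handle)
try:
    parseCatalog(badName)
    message = None
except IOError as detail:
    message = str(detail)
finally:
    os.remove(badName)
```

The caller sees a single exception type for both failure modes, and the message says which one occurred.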

First we instantiate a new, empty ImageCatalog object to which we can add the entries from the tree.

birdimages.py
        #-- 2
        # [ cat    :=  a new, empty ImageCatalog object ]
        cat  =  ImageCatalog()

To find all the RNC_ORIGINAL_N elements in doc, we use the tree's .getiterator() method, which visits every element with that name, at any depth, in document order.

birdimages.py
        #-- 3
        # [ cat  :=  cat with Original instances added, made from
        #            the RNC_ORIGINAL_N children of doc ]
        for oNode in doc.getiterator(RNC_ORIGINAL_N):

For the logic that converts the XML representation of each catalog entry into an Original object, see Section 7.13, “Original.readNode(): Build a catalog entry from an Element node”.

birdimages.py
            #-- 3 loop
            # [ oNode is an RNC_ORIGINAL_N node ->
            #     cat  :=  cat with a new original added,
            #              made from oNode ]
            cat.addOriginal(Original.readNode(oNode))

        #-- 4
        return cat

    readFile  =  staticmethod(readFile)
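
The final line above turns readFile into a static method after the fact. The sketch below, an illustration with a hypothetical Catalog class, shows that form next to the equivalent @staticmethod decorator spelling.

```python
class Catalog(object):
    """Hypothetical stand-in for ImageCatalog."""

    def __init__(self):
        self.names = []

    def fromNames(nameList):
        """Factory, wrapped after its definition as in the text."""
        cat = Catalog()
        cat.names.extend(nameList)
        return cat
    fromNames = staticmethod(fromNames)

    @staticmethod
    def fromNamesDecorated(nameList):
        """The same factory, using the decorator spelling."""
        cat = Catalog()
        cat.names.extend(nameList)
        return cat

# Both factories are called on the class, with no instance needed.
a = Catalog.fromNames(["x", "y"])
b = Catalog.fromNamesDecorated(["x", "y"])
```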

7.11. class Original: One catalog entry

Each instance of this class represents one XML original element. Here is the class interface:

birdimages.py
# - - - - -   c l a s s   O r i g i n a l   - - - - -

class Original:
    """Represents one image catalog entry.

      Exports:
        Original(catNo, ab6, state, qual='', scan='', loc='', note='',
                   film='', light='', beh='', desc='', pose=''):
          [ (catNo is the catalog number as a string) and
            (ab6 is a space-separated list of bird-ID strings) and
            (state is a two-letter US postal code) and
            (qual is a quality rating or '') and
            (scan is the scan attribute as an integer, or '' or
             None if there is none) and
            (loc is locality text or '') and
            (note is note text or '') and
            (film contains filmstock comments or '') and
            (light contains lighting comments or '') and
            (beh contains behavior comments or '') and
            (desc contains plumage details or '') and
            (pose contains pose comments or '') ->
              return a new Original object containing those
              values ]
        .catNo:      [ as passed to constructor, read-only ]
        .ab6:        [ as passed to constructor, read-only ]
        .state:      [ as passed to constructor, read-only ]
        .qual:       [ as passed to constructor, read-only ]
        .scan:       [ as passed to constructor, read-only ]
        .loc:        [ as passed to constructor, read-only ]
        .note:       [ as passed to constructor, read-only ]
        .film:       [ as passed to constructor, read-only ]
        .light:      [ as passed to constructor, read-only ]
        .beh:        [ as passed to constructor, read-only ]
        .desc:       [ as passed to constructor, read-only ]
        .pose:       [ as passed to constructor, read-only ]
        Original.readNode(node):  # Static method
          [ node is an RNC_ORIGINAL_N et.Element ->
              if node is valid against birdimages.rnc ->
                return a new Original object representing that
                element
              else -> raise IOError ]
    """

7.12. Original.__init__(): Constructor

This straightforward constructor just stores all the argument values in the instance.

birdimages.py
# - - -   O r i g i n a l . _ _ i n i t _ _

    def __init__(self, catNo, ab6, state, qual='', scan='',
        loc='', note='', film='', light='', beh='', desc='', pose=''):
        """Constructor for Original.
        """
        self.catNo  =  catNo
        self.ab6    =  ab6
        self.state  =  state
        self.qual   =  qual
        self.scan   =  scan
        self.loc    =  loc
        self.note   =  note
        self.film   =  film
        self.light  =  light
        self.beh    =  beh
        self.desc   =  desc
        self.pose   =  pose        

7.13. Original.readNode(): Build a catalog entry from an Element node

This static method operates on an et.Element instance that represents an original element. Assuming that it is valid, it returns a new Original instance representing that element.

birdimages.py
# - - -   O r i g i n a l . r e a d N o d e   - - -   Static method

    def readNode(node):
        """Translate an original element into an Original object.
        """

We could do a lot of error checking here, and our intended function entitles us to throw an IOError exception if the file isn't valid. However, at the moment I prepare the files using nxml-emacs, which continuously validates the file. This allows us to assume here that everything is valid.

Because an et.Element's .attrib attribute works like a dictionary, we can use the usual dictionary .get() method to supply default values for missing attributes.

birdimages.py
        #-- 1
        catNo  =  node.attrib.get(RNC_CAT_NO_A, None)
        ab6    =  node.attrib.get(RNC_AB6_A, None)
        state  =  node.attrib.get(RNC_STATE_A, None)
        qual   =  node.attrib.get(RNC_QUAL_A, None)
        rawScan  =  node.attrib.get(RNC_SCAN_A, None)

        if  rawScan:  scan  =  int(rawScan)
        else:         scan  =  None
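
The handling of the optional integer attribute can be tried on its own with the sketch below. It is an illustration only: readScan() is hypothetical, the attribute name "scan" is an assumed value for RNC_SCAN_A, and the standard library's xml.etree.ElementTree stands in for lxml, since .attrib behaves like a dictionary in both.

```python
import xml.etree.ElementTree as ET

def readScan(node, attrName="scan"):
    """Return node's attrName attribute as an int, or None if absent.

    Hypothetical helper; "scan" is an assumed attribute name.
    """
    rawScan = node.attrib.get(attrName, None)
    if rawScan:
        return int(rawScan)
    return None

withScan = ET.fromstring('<original scan="3"/>')
withoutScan = ET.fromstring('<original/>')

scanPresent = readScan(withScan)
scanAbsent = readScan(withoutScan)
```

Note that an attribute that is present but empty is treated the same as a missing one, because an empty string is false in a Python condition.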

For the values that live in child nodes, we use the childText() function; see Section 7.14, “childText(): Get text from a child node”.

birdimages.py
        #-- 2
        loc    =  childText(node, RNC_LOC_N)
        note   =  childText(node, RNC_NOTE_N)
        film   =  childText(node, RNC_FILM_N)
        light  =  childText(node, RNC_LIGHT_N)
        beh    =  childText(node, RNC_BEH_N)
        desc   =  childText(node, RNC_DESC_N)
        pose   =  childText(node, RNC_POSE_N)

        #-- 3
        return Original(catNo, ab6, state, qual, scan, loc, note,
                          film, light, beh, desc, pose)

    readNode  =  staticmethod(readNode)

7.14. childText(): Get text from a child node

This utility function looks for child nodes with a given name and, if any are found, returns the concatenation of all the text in and under those child nodes. If there is no child by that name, it returns an empty string.

birdimages.py
# - - -   c h i l d T e x t

def childText(node, childName):
    """Return the textual content of a child node, if any.

      [ (node is an et.Element) and
        (childName is a string) ->
          if node has any child nodes named childName ->
            return a Unicode string containing the concatenation
            of all text node descendants of those children
          else -> return '' ]
    """

First we use an XPath expression to get a list of the matching child nodes.

birdimages.py
    #-- 1
    # [ node is an et.Element ->
    #     childList  :=  a list of all children of node named childName ]
    childList  =  node.xpath(childName)

Whether this list is empty or not, we then create a list containing the text from each entry. Then we concatenate the elements of that list and return that as a result.

birdimages.py
    #-- 2
    # [ childList is a node-set ->
    #     textList  :=  a list of the text descendants from each
    #                   node in childList ]
    textList  =  [ nodeText(c) for c in childList ]

    #-- 3
    return "".join(textList)
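
For readers without lxml at hand, the behavior of this function can be approximated with the standard library, as in the sketch below. This is a stand-in and not the module's implementation: childTextSketch() is hypothetical, it uses .findall() and .itertext() in place of the XPath calls, and the element names and inline markup in the sample are assumed.

```python
import xml.etree.ElementTree as ET

def childTextSketch(node, childName):
    """Return the text in and under node's children named childName."""
    parts = []
    for child in node.findall(childName):
        # .itertext() yields every piece of text inside the child,
        # including text nested inside inline markup.
        parts.append("".join(child.itertext()))
    return "".join(parts)

# Hypothetical catalog entry with inline markup inside <loc>.
node = ET.fromstring(
    "<original><loc>Bosque del <i>Apache</i> NWR</loc></original>")

locText = childTextSketch(node, "loc")
noteText = childTextSketch(node, "note")   # no such child
```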

7.15. nodeText(): Get text from an element

This helper function takes as an argument an et.Element instance, and returns a Unicode string containing the concatenation of the text content of that node and all its descendants.

birdimages.py
# - - -   n o d e T e x t

def nodeText(node):
    '''Returns text in and under an et.Element, as Unicode.

      [ node is an et.Element ->
          return the concatenation of all descendant Text nodes
          of node ]
    '''

First, we use an XPath expression to find all the text nodes under the given node. My first attempt at an XPath expression was "text()", but that returns only the immediate text children of a node. Adding the axis specifier "descendant-or-self::" applies the text() node test to the node and all its descendants. Finally, we concatenate the strings.

birdimages.py
    #-- 1
    # [ textList  :=  a list of all text descendants of
    #                 node, in document order ]
    return ''.join(node.xpath('descendant-or-self::text()'))
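
The difference between the two XPath expressions is easy to see on a small sample. The sketch below is an illustration only; it requires lxml, and the note element with its inline markup is hypothetical.

```python
from lxml import etree as et

# Hypothetical element containing inline markup.
node = et.fromstring("<note>Adult <b>male</b> on nest</note>")

# Only the text nodes that are immediate children of <note>.
direct = node.xpath("text()")

# Every text node in and under <note>, in document order.
everything = node.xpath("descendant-or-self::text()")

joined = "".join(everything)
```

With the first expression the word inside the b element is lost; with the second, joining the pieces reproduces the full sentence.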

7.16. cattest: A test driver for ImageCatalog

This small script instantiates an ImageCatalog object and does these tests:

  • Use .getOriginal() to retrieve an image we know to be in there (2005-09-05-0003).

  • Use .genAb6() to retrieve all images of Yellow Warbler (yelwar), a modest number.

  • Dump the entire catalog using .genOriginals().

The script starts with the usual Unix script prologue line and our imports.

cattest
#!/usr/bin/env python
#================================================================
# cattest:  Test the ImageCatalog object.
#   Do not edit this file directly.  It is extracted mechanically
#   from the documentation:
#     http://www.nmt.edu/~john/scans/slides/ims/
#----------------------------------------------------------------

from birdimages import *

Next comes the main program.

cattest
# - - -   m a i n

def main():
    """Main test driver.
    """
    cat = ImageCatalog.readFile()

    print("=== Test: .getOriginal('2005-09-05-0003')")
    orig = cat.getOriginal('2005-09-05-0003')
    showOrig(orig)

    print("\n\n=== Test: genAb6('yelwar')")
    warblerList = list(cat.genAb6('yelwar'))
    for warbler in warblerList:
        showOrig(warbler)

    print("\n\n=== Test: Generate all")
    for  o in cat.genOriginals():
        showOrig(o)

The showOrig() function displays all the components of an Original.

cattest
# - - -   s h o w O r i g

def showOrig(orig):
    """Display the contents of an Original object.
    """
    # Replace non-ASCII characters in the locality with XML
    # character references, keeping the result as text.
    loc = orig.loc.encode('ascii', 'xmlcharrefreplace').decode('ascii')
    parts = ["\n#%s (%s) %s: %s" %
             (orig.catNo, orig.ab6, orig.state, loc)]
    if  orig.qual:  parts.append("  qual=%s" % orig.qual)
    if  orig.note:  parts.append("  note=%s" % orig.note)
    if  orig.film:  parts.append("  film=%s" % orig.film)
    if  orig.light: parts.append("  light=%s" % orig.light)
    if  orig.beh:   parts.append("  beh=%s" % orig.beh)
    if  orig.desc:  parts.append("  desc=%s" % orig.desc)
    if  orig.pose:  parts.append("  pose=%s" % orig.pose)
    print(" ".join(parts))

Finally, the epilogue, calling the main() defined earlier.

cattest
#================================================================
# Epilogue
#----------------------------------------------------------------

if __name__ == "__main__":
    main()