This section contains the actual code of the birdimages.py module in
lightweight literate form. For more information on this
methodology, see the author's Lightweight Literate
Programming page.
The birdimages.py module starts with a brief module comment that
points back to this documentation.
"""birdimages.py: Python object for XML files using birdimages.rnc
For documentation, see:
http://www.nmt.edu/~john/www/scans/slides/ims/
"""
As always, we need the sys module for
access to standard I/O streams.
#================================================================
# Imports
#----------------------------------------------------------------
import sys
To process the XML input file, we use the technique
described in Python XML processing with lxml. We'll use the name et for that implementation of the ElementTree interface.
from lxml import etree as et
We'll also need global declarations for all the XML
element and attribute names from our RNC schema. Rather
than attempt to maintain these declarations in parallel
with the schema itself, we use the tool described in
pyrang: A single-sourcing tool for Python-XML applications.
This program reads the Relax NG
version of the schema file and writes a module named
rnc_slidecat.py containing Python
statements that set up the value of these variables.
The variables generated by pyrang have this general form:
RNC_name_suffix
where name is the element or attribute name, and suffix is
“N” for element names and
“A” for attribute names.
For example, the variable for the original
element is “RNC_ORIGINAL_N”.
from rnc_slidecat import *
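As a rough illustration of the naming convention above, the sketch below derives a variable name from a schema name. The helper rnc_variable_name() is hypothetical and is not part of pyrang; uppercasing matches the RNC_ORIGINAL_N example, but mapping hyphens to underscores is only an assumption.

```python
def rnc_variable_name(name, is_element):
    """Hypothetical helper: build a name of the form RNC_name_suffix."""
    # Assumption: the schema name is uppercased and hyphens become
    # underscores so that the result is a legal Python identifier.
    stem = name.upper().replace("-", "_")
    # "N" marks an element name, "A" marks an attribute name.
    suffix = "N" if is_element else "A"
    return "RNC_%s_%s" % (stem, suffix)


element_var = rnc_variable_name("original", True)
attribute_var = rnc_variable_name("state", False)
print(element_var)
print(attribute_var)
```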
There is one global constant (other than the ones
imported from rnc_slidecat.py):
the default image catalog name.
#================================================================
# Manifest constants
#----------------------------------------------------------------
DEFAULT_FILENAME = "birdimages.xml"
An instance of this class represents the entire catalog
file. The constructor returns only an empty catalog and is not
intended to be called directly by users of the module. The
static method ImageCatalog.readFile() does
all the work of filling the empty catalog object from the
XML input.
Here is the class declaration and external interface.
# - - - - - c l a s s I m a g e C a t a l o g - - - - -
class ImageCatalog:
    """Represents the entire catalog.
      Exports:
        ImageCatalog():
          [ returns a new, empty ImageCatalog object ]
        .addOriginal(o):
          [ o is an Original object ->
              if self contains an original with the same catalog number
              as o ->
                raise KeyError
              else ->
                self := self with o added ]
        .getOriginal(catNo):
          [ catNo is a catalog number as a string ->
              if self has an original whose catalog number
              matches catNo ->
                return that original as an Original object
              else -> raise KeyError ]
        .genOriginals():
          [ generate the Originals in self in catalog number order ]
        .genAb6(code):
          [ code is a birdId string ->
              generate the Originals in self whose .ab6
              attributes contain code ]
        ImageCatalog.readFile(f): # Static method
          [ f names a readable file valid against birdimages.rnc,
            defaulting to DEFAULT_FILENAME ->
              return a new ImageCatalog representing that file ]
Here are the class's internal state items.
      State/Invariants:
        .__catNoMap:
          [ a dictionary whose values are the Originals in self,
            and each key is the value's .catNo ]
        .__ab6Map:
          [ a dictionary whose keys are all the birdId strings
            that appear in self's .ab6 attributes (uppercased, with
            trailing blanks removed), and each
            corresponding value is a list of Originals
            that contain that key in their .ab6 attributes ]
    """
The .__ab6Map dictionary exists to support
the .genAb6() method. Note that a given
Original can appear in more than one of
the lists that are values of the .__ab6Map
dictionary. For example, an original with the XML
attribute “ab6="virrai sora"” would appear in the lists for both .__ab6Map["VIRRAI"] and .__ab6Map["SORA"], because each code is stripped of trailing blanks and uppercased before it is used as a key.
This trivial constructor simply creates the two internal dictionaries, initially empty.
# - - - I m a g e C a t a l o g . _ _ i n i t _ _ - - -
    def __init__ ( self ):
        """Constructor for ImageCatalog.
        """
        self.__catNoMap = {}
        self.__ab6Map = {}
This method takes an Original instance and
stores it in self.
# - - - I m a g e C a t a l o g . a d d O r i g i n a l - - -
    def addOriginal ( self, o ):
        """Add an original to the catalog.
        """
First we add the new entry to the .__catNoMap dictionary. Duplicates are not
allowed, so we check to ensure there isn't already an
entry for that catalog number.
        #-- 1 --
        # [ if self.__catNoMap has an entry for o.catNo ->
        #     raise KeyError
        #   else ->
        #     self.__catNoMap[o.catNo] = o ]
        if o.catNo in self.__catNoMap:
            raise KeyError ( "Duplicate catalog number '%s'" % o.catNo )
        self.__catNoMap[o.catNo] = o
Adding the new entry to the .__ab6Map
dictionary is a bit more complicated. Because the XML
ab6 attribute can have multiple codes
separated by spaces, we use the Python string .split() method to get a list of code strings.
For example, if the original attribute is
"buwtea^cintea norsho?", that will be indexed on two
strings, "buwtea^cintea" and "norsho?".
Then, the first time we observe a code, we set up the
dictionary value with a new list containing the Original, but once we've seen that code, we
append the Original to the existing list.
        #-- 2 --
        # [ self.__ab6Map +:= entries mapping code |-> o
        #   for all codes in o.ab6 ]
        codeList = o.ab6.split()
        for code in codeList:
            key = code.rstrip().upper()
            try:
                self.__ab6Map[key].append(o)
            except KeyError:
                self.__ab6Map[key] = [o]
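The multi-key indexing described above can be sketched on its own, outside the class. This is only an illustration: index_by_codes() is a hypothetical name, plain strings stand in for Original objects, and dict.setdefault() is used here in place of the try/except form.

```python
def index_by_codes(index, record, codes_text):
    """Add record to index under every whitespace-separated code."""
    for code in codes_text.split():
        # Normalize the key the same way the catalog does.
        key = code.rstrip().upper()
        # setdefault() creates the list the first time a key is seen,
        # then returns the existing list on later calls.
        index.setdefault(key, []).append(record)


index = {}
index_by_codes(index, "entry-1", "virrai sora")
index_by_codes(index, "entry-2", "sora")
print(sorted(index.keys()))
```

After these two calls, "entry-1" is listed under both of its codes, and the list for the shared code holds both entries.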
This method uses the .__catNoMap
dictionary to look up the original by catalog number. It
raises KeyError if the dictionary does not
have that key value.
# - - - I m a g e C a t a l o g . g e t O r i g i n a l - - -
    def getOriginal ( self, catNo ):
        """Retrieve the original with a given catalog number.
        """
        return self.__catNoMap[catNo]
This method first extracts a list of all the keys in the
.__catNoMap dictionary, then sorts them,
then generates the values using that sorted list.
# - - - I m a g e C a t a l o g . g e n O r i g i n a l s - - -
    def genOriginals ( self ):
        """Generate all originals in catalog order"""
        keyList = sorted ( self.__catNoMap.keys() )
        for key in keyList:
            yield self.__catNoMap[key]
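The sort-then-generate pattern can be tried in isolation with a plain dictionary. In this sketch the function name gen_in_key_order() and the sample data are made up for illustration. Note that the keys are compared as strings, so the result is chronological only if catalog numbers are zero-padded like the 2005-09-05-0003 example, which is an assumption about the data.

```python
def gen_in_key_order(mapping):
    """Yield the values of mapping in ascending key order."""
    for key in sorted(mapping):
        yield mapping[key]


# Hypothetical catalog numbers mapped to placeholder values.
sample = {
    "2005-09-05-0003": "third",
    "2004-06-01-0010": "first",
    "2005-09-05-0001": "second",
}
ordered = list(gen_in_key_order(sample))
print(ordered)
```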
If the catalog has any entries for a given code, that
code will be a key in the .__ab6Map
dictionary, and the corresponding value will be a list
containing the matching Original
instances, which we then generate. If there is no such
key, we raise KeyError.
# - - - I m a g e C a t a l o g . g e n A b 6 - - -
    def genAb6 ( self, code ):
        """Retrieve originals with a given bird code.
        """
        #-- 1 --
        # [ if self.__ab6Map has a key that matches code ->
        #     resultList := the corresponding value
        #   else -> raise KeyError ]
        resultList = self.__ab6Map[code.rstrip().upper()]
        #-- 2 --
        # [ generate the elements of resultList ]
        for result in resultList:
            yield result
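One consequence of writing the lookup as a generator is worth seeing in a small experiment: the body of a generator function does not run until the first item is requested, so a missing key raises KeyError at that point rather than when the function is called. The sketch below uses a stand-alone function, gen_matches(), and a hand-built index; both are hypothetical stand-ins for the method and its dictionary.

```python
def gen_matches(index, code):
    """Yield the entries filed under code; KeyError if there are none."""
    # This lookup runs on the first next() call, not at call time.
    result_list = index[code.rstrip().upper()]
    for result in result_list:
        yield result


index = {"YELWAR": ["entry-1", "entry-2"]}

gen = gen_matches(index, "nosuch")   # no exception is raised here
try:
    next(gen)                        # the lookup happens now
    raised = False
except KeyError:
    raised = True

found = list(gen_matches(index, "yelwar "))
print(raised, found)
```

A caller that wants the error up front can wrap the call in list(), as the test script does with a list comprehension.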
This static method takes a file name as an argument and,
assuming the file is well-formed, builds an et.ElementTree instance representing the file.
It then walks the tree, converting each original element to a catalog entry and adding
it to a new ImageCatalog, which it returns. Because this is a static method, there is no self.
# - - - I m a g e C a t a l o g . r e a d F i l e - - - Static
    def readFile ( fileName=DEFAULT_FILENAME ):
        """Read an XML file, return it as an ImageCatalog.
        """
        #-- 1 --
        # [ if fileName is a readable, well-formed XML file ->
        #     doc := an et.ElementTree instance representing the file
        #   else -> raise IOError ]
        try:
            doc = et.parse ( fileName )
        except IOError as detail:
            raise IOError ( "Can't read the catalog file '%s': %s" %
                            (fileName, detail) )
        except et.XMLSyntaxError as detail:
            raise IOError ( "Catalog file '%s' not well-formed: %s" %
                            (fileName, detail) )
First we instantiate a
new, empty ImageCatalog object to which we
can add the entries from the tree.
        #-- 2 --
        # [ cat := a new, empty ImageCatalog object ]
        cat = ImageCatalog()
To get all the RNC_ORIGINAL_N elements in
doc, we'll use the .getiterator() method, which visits every matching element in the tree.
        #-- 3 --
        # [ cat := cat with Original instances added, made from
        #   the RNC_ORIGINAL_N children of doc ]
        for oNode in doc.getiterator ( RNC_ORIGINAL_N ):
For the logic that
converts the XML representation of each catalog entry
into an Original object, see
Section 6.13, “Original.readNode(): Build a
catalog entry from an Element node”.
            #-- 3 loop --
            # [ oNode is an RNC_ORIGINAL_N node ->
            #     cat := cat with a new original added
            #     made from oNode ]
            cat.addOriginal ( Original.readNode ( oNode ) )
        #-- 4 --
        return cat
    readFile = staticmethod ( readFile )
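The overall shape of this method, a static factory that parses XML and fills a fresh object, can be sketched with the standard library alone. This is not the module's code: xml.etree.ElementTree stands in for lxml, MiniCatalog and from_xml() are hypothetical names, and the element name "original" and attribute name "cat-no" are assumptions about the schema. The @staticmethod decorator (available since Python 2.4) has the same effect as the explicit staticmethod() call used above.

```python
import xml.etree.ElementTree as ET   # stdlib stand-in for lxml.etree


class MiniCatalog:
    """Hypothetical cut-down catalog holding only catalog numbers."""

    def __init__(self):
        self.cat_nos = []

    @staticmethod
    def from_xml(text):
        """Parse XML text and return a new, filled MiniCatalog."""
        cat = MiniCatalog()
        root = ET.fromstring(text)
        # iter() visits every matching element in document order.
        for node in root.iter("original"):
            cat.cat_nos.append(node.get("cat-no"))
        return cat


SAMPLE = (
    '<catalog>'
    '<original cat-no="2005-09-05-0003"/>'
    '<original cat-no="2005-09-05-0004"/>'
    '</catalog>'
)
cat = MiniCatalog.from_xml(SAMPLE)
print(cat.cat_nos)
```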
Each instance of this class represents one XML original element. Here is the class interface:
# - - - - - c l a s s O r i g i n a l - - - - -
class Original:
    """Represents one image catalog entry.
      Exports:
        Original ( catNo, ab6, state, qual='', scan='', loc='', note='',
                   film='', light='', beh='', desc='', pose='' ):
          [ (catNo is the catalog number as a string) and
            (ab6 is a space-separated list of bird-ID strings) and
            (state is a two-letter US postal code) and
            (qual is a quality rating or '') and
            (scan is the value of the scan attribute, if any, or '') and
            (loc is locality text or '') and
            (note is note text or '') and
            (film contains filmstock comments or '') and
            (light contains lighting comments or '') and
            (beh contains behavior comments or '') and
            (desc contains plumage details or '') and
            (pose contains pose comments or '') ->
              return a new Original object containing those
              values ]
        .catNo: [ as passed to constructor, read-only ]
        .ab6: [ as passed to constructor, read-only ]
        .state: [ as passed to constructor, read-only ]
        .qual: [ as passed to constructor, read-only ]
        .scan: [ as passed to constructor, read-only ]
        .loc: [ as passed to constructor, read-only ]
        .note: [ as passed to constructor, read-only ]
        .film: [ as passed to constructor, read-only ]
        .light: [ as passed to constructor, read-only ]
        .beh: [ as passed to constructor, read-only ]
        .desc: [ as passed to constructor, read-only ]
        .pose: [ as passed to constructor, read-only ]
        Original.readNode(node): # Static method
          [ node is an RNC_ORIGINAL_N et.Element ->
              if node is valid against birdimages.rnc ->
                return a new Original object representing that
                element
              else -> raise IOError ]
    """
This straightforward constructor just stores all the argument values in the instance.
# - - - O r i g i n a l . _ _ i n i t _ _ - - -
    def __init__ ( self, catNo, ab6, state, qual='', scan='',
                   loc='', note='', film='', light='', beh='', desc='', pose='' ):
        """Constructor for Original.
        """
        self.catNo = catNo
        self.ab6 = ab6
        self.state = state
        self.qual = qual
        self.scan = scan
        self.loc = loc
        self.note = note
        self.film = film
        self.light = light
        self.beh = beh
        self.desc = desc
        self.pose = pose
This static method operates on an et.Element instance that represents an original
element. Assuming that it is valid, it returns a new
Original instance representing that
element.
# - - - O r i g i n a l . r e a d N o d e - - - Static method
    def readNode ( node ):
        """Translate an original element into an Original object.
        """
We could do a lot of error checking here, and our
intended function entitles us to throw an IOError exception if the file isn't valid.
However, at the moment I prepare the files using
nxml-emacs, which
continuously validates the file. This allows us to
assume here that everything is valid.
Because an et.Element's .attrib attribute works like a dictionary, we
can use the usual dictionary .get() method
to supply default values for missing attributes.
        #-- 1 --
        catNo = node.attrib.get ( RNC_CAT_NO_A, None )
        ab6 = node.attrib.get ( RNC_AB6_A, None )
        state = node.attrib.get ( RNC_STATE_A, None )
        qual = node.attrib.get ( RNC_QUAL_A, None )
        rawScan = node.attrib.get ( RNC_SCAN_A, None )
        if rawScan: scan = int ( rawScan )
        else: scan = None
For the values that live in child nodes, we use the childText() function; see Section 6.14, “childText(): Get text from a child node”.
        #-- 2 --
        loc = childText ( node, RNC_LOC_N )
        note = childText ( node, RNC_NOTE_N )
        film = childText ( node, RNC_FILM_N )
        light = childText ( node, RNC_LIGHT_N )
        beh = childText ( node, RNC_BEH_N )
        desc = childText ( node, RNC_DESC_N )
        pose = childText ( node, RNC_POSE_N )
        #-- 3 --
        return Original ( catNo, ab6, state, qual, scan, loc, note,
                          film, light, beh, desc, pose )
    readNode = staticmethod ( readNode )
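The attribute handling in step 1 can be exercised with a hand-written element. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml; its .attrib is likewise a dictionary. The literal attribute names "state", "qual", and "scan" are assumptions standing in for the values of the RNC_..._A variables.

```python
import xml.etree.ElementTree as ET   # stdlib stand-in for lxml.etree

# A hypothetical original element with no qual attribute.
node = ET.fromstring('<original state="NM" scan="3"/>')

# .get() returns the default when the attribute is absent.
state = node.attrib.get("state", None)
qual = node.attrib.get("qual", None)

# Convert the scan attribute to an integer only when it is present.
raw_scan = node.attrib.get("scan", None)
scan = int(raw_scan) if raw_scan else None

print(state, qual, scan)
```

Note that a missing attribute comes back as None here, whereas the constructor's own defaults are empty strings, so callers may see either value for an absent field.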
This utility function looks for a child node with a given name and, if one is found, returns all the text nodes in and under that child node. If there is no child by that name, it returns an empty string.
# - - - c h i l d T e x t - - -
def childText ( node, childName ):
    """Return the textual content of a child node, if any.
      [ (node is an et.Element) and
        (childName is a string) ->
          if node has any child nodes named childName ->
            return a Unicode string containing the concatenation
            of all text node descendants of those children
          else -> return '' ]
    """
First we use an XPath expression to get a list of the matching child nodes.
    #-- 1 --
    # [ node is an et.Element ->
    #     childList := a list of all children of node named childName ]
    childList = node.xpath ( childName )
Whether this list is empty or not, we then create a list containing the text from each entry. Then we concatenate the elements of that list and return that as a result.
    #-- 2 --
    # [ childList is a node-set ->
    #     textList := a list of the text descendants from each
    #     node in childList ]
    textList = [ nodeText(c) for c in childList ]
    #-- 3 --
    return "".join ( textList )
This helper function takes as an argument an et.Element instance, and returns a Unicode string
containing the concatenation of the text content of that
node and all its descendants.
# - - - n o d e T e x t - - -
def nodeText ( node ):
    '''Returns text in and under an et.Element, as Unicode.
      [ node is an et.Element ->
          return the concatenation of all descendant Text nodes
          of node ]
    '''
First, we use an XPath expression to find all the text
nodes under the given node. My first
attempt at an XPath expression was "text()", but that returns only the immediate text children of a
node. Adding the axis specifier "descendant-or-self::" applies the text() node test to the node and all its
descendants. Finally, we concatenate the strings.
    #-- 1 --
    # [ textList := a list of all text descendants of
    #   node, in document order ]
    return ''.join ( node.xpath ( 'descendant-or-self::text()' ) )
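For readers without lxml at hand, the behavior of these two helpers can be approximated with the standard library. This sketch is an assumption-laden equivalent, not the module's code: it replaces the XPath queries with Element.itertext() and Element.findall(), the function names are hypothetical, and the element names "original" and "loc" are guesses at the schema.

```python
import xml.etree.ElementTree as ET   # stdlib stand-in for lxml.etree


def node_text(node):
    """Concatenate all text in and under node, in document order."""
    # itertext() yields the text of node and of every descendant.
    return "".join(node.itertext())


def child_text(node, child_name):
    """Return the text under all children named child_name, or ''."""
    # findall() returns an empty list when no child matches,
    # so the join below yields an empty string in that case.
    return "".join(node_text(c) for c in node.findall(child_name))


node = ET.fromstring(
    "<original><loc>Bosque <i>del</i> Apache</loc></original>"
)
loc = child_text(node, "loc")
note = child_text(node, "note")
print(repr(loc), repr(note))
```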
This small script instantiates an ImageCatalog object and does these tests:
Use .getOriginal() to retrieve an
image we know to be in there (2005-09-05-0003).
Use .genAb6() to retrieve all
images of Yellow Warbler (yelwar), a
modest number.
Dump the entire catalog using .genOriginals().
The script starts with the usual Unix script prologue line and our imports.
#!/usr/bin/env python
#================================================================
# cattest: Test the ImageCatalog object.
#   For documentation, see:
#     http://www.nmt.edu/~john/www/scans/slides/ims/
#----------------------------------------------------------------
from birdimages import *
Next comes the main() function.
# - - - m a i n - - -
def main():
    """Main test driver.
    """
    cat = ImageCatalog.readFile()
    print ( "=== Test: .getOriginal('2005-09-05-0003')" )
    orig = cat.getOriginal ( '2005-09-05-0003' )
    showOrig ( orig )
    print ( "\n\n=== Test: genAb6('yelwar')" )
    warblerList = [x for x in cat.genAb6('yelwar')]
    for warbler in warblerList:
        showOrig ( warbler )
    print ( "\n\n=== Test: Generate all" )
    for o in cat.genOriginals():
        showOrig ( o )
The showOrig() function displays all the
components of an Original.
# - - - s h o w O r i g - - -
def showOrig ( orig ):
    """Display the contents of an Original object.
    """
    # Collect the pieces, then print them once on a single line.
    parts = [ "\n#%s (%s) %s: %s" %
              (orig.catNo, orig.ab6, orig.state, orig.loc ) ]
    if orig.qual: parts.append ( " qual=%s" % orig.qual )
    if orig.note: parts.append ( " note=%s" % orig.note )
    if orig.film: parts.append ( " film=%s" % orig.film )
    if orig.light: parts.append ( " light=%s" % orig.light )
    if orig.beh: parts.append ( " beh=%s" % orig.beh )
    if orig.desc: parts.append ( " desc=%s" % orig.desc )
    if orig.pose: parts.append ( " pose=%s" % orig.pose )
    print ( " ".join ( parts ) )
Finally, the epilogue, calling the main()
defined earlier.
#================================================================
# Epilogue
#----------------------------------------------------------------
if __name__ == "__main__":
    main()