The actual archx.py script follows.
The script starts with a comment block pointing back here to the documentation, and two variables for the program name and version number.
#!/usr/local/bin/python
#================================================================
# archx.py: Index one archive directory of bird images.
#   For documentation, see:
#     http://www.nmt.edu/~john/slides/archx/
#----------------------------------------------------------------
PROGRAM_NAME = "archx.py"
EXTERNAL_VERSION = "0.0"
Next come the imports. First, the standard Python modules: we
need sys to get the command line
arguments and standard streams, and os to read directories.
#================================================================
# Imports
#----------------------------------------------------------------
import sys
import os
The Python Imaging Library deals with images: it can size an image and make a thumbnail. For more information, see Python Imaging Library (PIL).
import Image
Next we'll need the author's module for generating XML. For sources and documentation, see Python and the XML Document Object Model (DOM) with 4Suite.
import xml4create as xc
The birdimages.py module is an
interface to the birdimages.xml file that allows us to look
up catalog numbers.
import birdimages
The next import needs a little explanation. In the code
that refers to XML, we don't want to use string constants
for element or attribute names like image
and cat-no. Preferred practice is to
declare a global, manifest constant for each element or
attribute name, so that if the schema changes, we can
rapidly locate all references to the changed name. So we
use a tool named pyrang that
extracts all the element and attribute names from a
schema, and generates Python assignment statements for
each one. See pyrang: A single-sourcing
tool for Python-XML applications. The generated
file is named rnc_archx.py, and
the generated names are prefixed with the string
“RNC_”, and
suffixed with “_N”
for element names and “_A” for attribute names. For example, the name of
the image element is RNC_IMAGE_N, and the name of the cat-no attribute is RNC_CAT_NO_A.
from rnc_archx import *
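To make the naming convention concrete, here is a minimal sketch of how such constant names could be derived from schema names. The helper rnc_constant_name() is hypothetical and is not part of pyrang; the real rnc_archx.py is a generated file of plain assignment statements, and its exact layout is an assumption here.

```python
def rnc_constant_name(xml_name, kind):
    """Hypothetical helper: build an RNC_-style constant name.

    xml_name is an element or attribute name such as "cat-no";
    kind is either "element" or "attribute".
    """
    suffix = {"element": "_N", "attribute": "_A"}[kind]
    # Hyphens are not legal in Python identifiers, so map them to "_".
    return "RNC_" + xml_name.replace("-", "_").upper() + suffix


# Print the kind of assignment statements a generated module might
# contain (assumed layout, for illustration only).
for name, kind in [("image", "element"), ("cat-no", "attribute")]:
    print("%s = %r" % (rnc_constant_name(name, kind), name))
```

With constants like these, a schema change to an element name shows up as a single changed assignment rather than as string literals scattered through the script.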
This section defines constants used throughout the script.
#================================================================
# Manifest constants
#----------------------------------------------------------------
This string is prefixed to the archive number to get the name of the archive directory.
ARCHIVE_PREFIX = "bird-"
This is the name of the XML file representing the catalog of bird images, which we check to ensure that all archived images are cataloged.
BIRD_CATALOG_NAME = "birdimages.xml"
Name of the subdirectory where thumbnail images are written.
THUMB_DIR = "thumb/"
Maximum width of a thumbnail image in pixels.
THUMB_WIDE = 200
Maximum height of a thumbnail image in pixels.
THUMB_HIGH = 200
This script is written under a blanket precondition
that we have write access to the thumbnail directory,
THUMB_DIR, and the index
directory, INDEX_DIR.
Here is the main program, including an intended function for the script as a whole.
#================================================================
# Functions and classes
#----------------------------------------------------------------
# - - - m a i n - - -
def main():
    """Main program.

      [ let
          archive-set == set of archive directories named
                         in command line arguments
          file-set == set of image files in archive directories
                      named in command line arguments
        in:
          sys.stdout +:= report listing files in file-set
              that appear to be bird images but are not in
              the bird image catalog
          thumbnail directory := thumbnail directory with
              thumbnail images made from file-set
          index directory := index directory with arch-NNN.xml
              files added describing archive-set ]
    """
First we must read the bird image catalog, which will
be used to verify that all the archived images are
properly cataloged. The .readFile() method is a static method in
class ImageCatalog that reads
the XML serialization of the image catalog and returns
it as an ImageCatalog object.
    #-- 1 --
    # [ BIRD_CATALOG_NAME is a readable file valid against
    #   birdimages.rnc ->
    #     imageCatalog := an ImageCatalog object representing
    #                     BIRD_CATALOG_NAME ]
    imageCatalog = birdimages.ImageCatalog.readFile (
        BIRD_CATALOG_NAME )
All that remains is to step through the
archive names given as command line arguments, and process
each one.
See Section 4.5, “processArchive(): Process
the contents of one archive”.
    #-- 2 --
    for archNo in sys.argv[1:]:
        #-- 2 body --
        # [ sys.stdout +:= report listing image files in
        #       archive (archNo) that appear to be bird images
        #       but are not in the bird image catalog
        #   thumbnail directory := thumbnail directory with
        #       thumbnail images made from image files in archive (archNo)
        #   index directory := index directory with arch-(archNo).xml
        #       files added describing image files in archive (archNo) ]
        processArchive ( imageCatalog, archNo )
This function does all the processing for one archive directory full of images.
# - - - p r o c e s s A r c h i v e - - -
def processArchive ( imageCatalog, archNo ):
    """Process one archive directory.

      [ (imageCatalog is a birdimages.ImageCatalog instance) and
        (archNo is the numeric part of an archive directory name) ->
          sys.stdout +:= report listing image files in
              archive (archNo) that appear to be bird images
              but are not in imageCatalog
          thumbnail directory := thumbnail directory with
              thumbnail images made from image files in archive (archNo)
          index directory := index directory with an
              arch-(archNo).xml file added describing image files
              in archive (archNo) ]
    """
The archNo argument is the archive number, which must be
appended to ARCHIVE_PREFIX to
get the name of the archive directory. We also allocate
an empty list, imagexList, that
will accumulate descriptions of each bird image as a
sequence of Imagex objects; see
Section 4.8, “class Imagex: An object to
describe one image”.
    #-- 1 --
    # [ archDir := directory name for archive (archNo)
    #   imagexList := a new, empty list ]
    archDir = ARCHIVE_PREFIX + archNo
    imagexList = []
We use the standard library os.listdir() function to get a list of all
the files in that directory. Just for neatness, we'll
then sort it.
    #-- 2 --
    # [ fileList := list of files in directory (archDir), sorted ]
    fileList = os.listdir ( archDir )
    fileList.sort()
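The sort matters for more than neatness in practice, because os.listdir() returns names in arbitrary order. The following is a minimal sketch of the same step; the function name sorted_file_names() and the sample file names are made up for illustration, and the syntax is written to run under current Python as well as the older Python the script targets.

```python
import os
import tempfile


def sorted_file_names(dir_path):
    """Return the names of the entries in dir_path, sorted."""
    # sorted() builds a new list; the script's listdir() followed by
    # .sort() has the same effect.
    return sorted(os.listdir(dir_path))


# Demonstrate on a throwaway directory holding made-up file names.
demo_dir = tempfile.mkdtemp()
for name in ["20050612.07.jpg", "20050101.03.jpg", "n20050612.08.jpg"]:
    open(os.path.join(demo_dir, name), "w").close()

demo_names = sorted_file_names(demo_dir)
```

Because the names begin with a date in yyyymmdd form, sorting them as strings also puts the bird images in chronological order, with the "n"-prefixed nonbird images after them.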
For each file name in fileList,
we call processFile() (see Section 4.6, “processFile(): Check one
image file”) to perform the
processing steps for that file, including the generation
of an Imagex object that will
hold the information we need to write the index file.
    #-- 3 --
    # [ imagexList +:= Imagex objects representing the
    #       files in fileList that are valid bird images and
    #       indexed in imageCatalog
    #   sys.stdout +:= report of bird images in fileList that are
    #       not in imageCatalog ]
    for fileName in fileList:
        #-- 3 loop --
        # [ if (archDir+fileName) is a bird image in imageCatalog ->
        #     imagexList +:= an Imagex object representing
        #         that image
        #     thumbnail directory +:= a thumbnail of that image
        #   else if (archDir+fileName) is a bird image but not
        #   in imageCatalog ->
        #     sys.stdout +:= (message about uncataloged image)
        #   else -> I ]
The body of the loop has three steps. First we build the
relative path to the image file. Then we call processFile()
(see Section 4.6, “processFile(): Check one
image file”), which returns an Imagex object if everything went okay;
otherwise it returns None. Finally,
if the return value is not None,
we append it to imagexList.
        #-- 3.1 --
        # [ pathName := archDir + fileName ]
        pathName = os.path.join ( archDir, fileName )

        #-- 3.2 --
        # [ if pathName is a bird image in imageCatalog ->
        #     thumbnail directory +:= a thumbnail of that
        #         image
        #     result := an Imagex object representing that
        #         image
        #   else if pathName is a bird image that is not in
        #   imageCatalog or is not a readable image file ->
        #     sys.stdout +:= error message
        #     result := None
        #   else ->
        #     result := None ]
        result = processFile ( imageCatalog, pathName )

        #-- 3.3 --
        if result is not None:
            imagexList.append ( result )
All that remains is to write the index file for this
archive. The writeIndex() function (see Section 4.7, “writeIndex(): Output the
index for one archive directory”)
needs two items: the archive directory name, and the list
imagexList containing the
details of the valid, cataloged images.
    #-- 4 --
    # [ imagexList is a list of Imagex objects ->
    #     index directory := index directory with an
    #         (archDir+".xml") file added representing imagexList ]
    writeIndex ( archDir, imagexList )
This function is called to check one file name. If the file isn't a bird image, it gets ignored. If its name resembles that of a bird image, it is checked to make sure it's in the catalog.
# - - - p r o c e s s F i l e - - -
def processFile ( imageCatalog, pathName ):
    """Check one file and, if valid, return an Imagex object.

      [ (imageCatalog is an ImageCatalog) and
        (pathName is a nonempty string) ->
          if (pathName looks like a bird image name) and
          (pathName is a catalog number in imageCatalog) and
          (pathName names a readable image file) ->
            thumbnail directory +:= a thumbnail of that image
            return an Imagex object representing that image
          else if (pathName looks like a bird image name) and
          ((pathName is not a catalog number in imageCatalog) or
           (pathName does not name a readable image file)) ->
            sys.stdout +:= error message
            return None
          else ->
            return None ]
    """
First we disassemble the full path name of the image
file, saving its file name (minus the extension) in
baseName.
    #-- 1 --
    # [ baseName := pathName minus its path component and
    #       file extension ]
    dirPath, fileName = os.path.split ( pathName )
    baseName, extension = os.path.splitext ( fileName )
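As a quick illustration of these two calls, here is a minimal sketch. The helper name base_name_of() and the sample path are invented for the example; only os.path.split() and os.path.splitext() come from the script.

```python
import os


def base_name_of(path_name):
    """Strip the directory part and the final extension from path_name."""
    # os.path.split() separates the directory from the last component.
    dir_path, file_name = os.path.split(path_name)
    # os.path.splitext() splits at the last period only, so a period
    # inside the catalog number is left alone.
    base_name, extension = os.path.splitext(file_name)
    return base_name


sample_path = os.path.join("bird-012", "20050612.07.jpg")
sample_base = base_name_of(sample_path)
```

Here sample_base keeps the period between the date and the frame number, because only the trailing ".jpg" counts as the extension.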
At this writing, all images have one of two file name formats. Bird images have a year-month-day format:
yyyymmddxnn
The nn part is the film frame number.
The x character is usually a period, but can be a lowercase
letter when there are multiple images on the same day
with the same frame number.
Nonbird images are prefixed with the letter
“n”:
nyyyymmddxnn
So at this point in time, the test for whether a file
represents a bird image is to see whether it starts with
a digit. If not, we can just return None; no error checking is done on nonbird
images.
There is one extra wrinkle. If the filename is a
“hidden file” starting with '.', baseName will
be the empty string. In that case the file is clearly
not an image file.
    #-- 2 --
    # [ if baseName is nonempty and starts with a digit ->
    #     I
    #   else -> return None ]
    if ( ( len(baseName) == 0 ) or
         ( not ( baseName[0].isdigit() ) ) ):
        return None
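The same test can be written as a small predicate, sketched below. The function name looks_like_bird_image() is hypothetical. Note that in current Python versions os.path.splitext() treats a leading period as part of the root, so a hidden file such as ".hidden" yields the base name ".hidden" rather than an empty string; either way the name fails the digit test, so the check still rejects it.

```python
def looks_like_bird_image(base_name):
    """True if base_name is nonempty and starts with a digit."""
    return len(base_name) > 0 and base_name[0].isdigit()


# Made-up base names covering each case discussed in the text.
checks = [
    ("20050612.07", looks_like_bird_image("20050612.07")),    # bird image
    ("n20050612.07", looks_like_bird_image("n20050612.07")),  # nonbird image
    ("", looks_like_bird_image("")),                          # empty base name
    (".hidden", looks_like_bird_image(".hidden")),            # hidden file
]
```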
Next we check to see if the image has been cataloged. If
not, we write an error message and return None.
    #-- 3 --
    # [ if baseName is a catalog number in imageCatalog ->
    #     I
    #   else ->
    #     sys.stdout +:= error message
    #     return None ]
    try:
        orig = imageCatalog.getOriginal ( baseName )
    except KeyError:
        print ( "*** Uncataloged: %s" % pathName )
        return None
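The lookup pattern can be tried without the real catalog. The sketch below uses StubCatalog, a hypothetical stand-in for birdimages.ImageCatalog; the only behavior assumed of the real class is what the code above relies on, namely that .getOriginal() raises KeyError for an unknown catalog number.

```python
class StubCatalog(object):
    """Hypothetical stand-in for birdimages.ImageCatalog."""

    def __init__(self, originals):
        # Maps catalog number -> some record describing the original.
        self._originals = dict(originals)

    def getOriginal(self, catNo):
        # Assumed behavior: KeyError for an unknown catalog number.
        return self._originals[catNo]


def is_cataloged(catalog, base_name):
    """True if base_name is a catalog number known to catalog."""
    try:
        catalog.getOriginal(base_name)
    except KeyError:
        return False
    return True


catalog = StubCatalog({"20050612.07": "placeholder record"})
```

A stub like this also makes it possible to exercise processFile() on a test directory without a complete birdimages.xml file.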
Next we use the Imagex class constructor
to read the image file and extract the width and height.
See Section 4.8, “class Imagex: An object to
describe one image”.
    #-- 4 --
    # [ if pathName names a readable, valid image file ->
    #     result := an Imagex object representing that image
    #   else -> raise IOError ]
    result = Imagex ( pathName )
One more task remains: creation of the thumbnail image.
The path name of the thumbnail is THUMB_DIR + baseName + THUMB_EXTENSION. Converting the
full-sized image to a thumbnail is a single method call
on the Image object: the
.thumbnail() method takes a
2-tuple specifying the maximum width and height, shrinks
the image in place, and preserves the aspect ratio.
Technically, we have broken the encapsulation of the
Imagex object by replacing its
.image attribute with a
different image. However, since that attribute is not
used for anything except writing the index file after
this, no harm is done. Also, the steps we would need to
take to avoid this are computationally expensive: we
would need to make a copy of the entire image (many of
which run into the tens of megabytes) before reducing it
to a thumbnail.
    #-- 5 --
    # [ thumbPath := THUMB_DIR + baseName + THUMB_EXTENSION
    #   result := result with its image replaced by a
    #       thumbnail no larger than (THUMB_WIDE, THUMB_HIGH) ]
    thumbPath = "%s%s%s" % (THUMB_DIR, baseName, THUMB_EXTENSION)
    result.image.thumbnail ( (THUMB_WIDE, THUMB_HIGH) )
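To show what "no larger than (THUMB_WIDE, THUMB_HIGH) with the aspect ratio preserved" means numerically, here is a minimal pure-Python sketch of the size calculation. The function fit_within() is hypothetical and is not the PIL implementation; PIL's exact rounding may differ by a pixel, so treat the results as approximate.

```python
def fit_within(wide, high, max_wide, max_high):
    """Scale (wide, high) to fit a bounding box, never enlarging."""
    # Use the smaller of the two ratios so both dimensions fit; the
    # 1.0 cap means an image already small enough is left unchanged.
    scale = min(1.0, float(max_wide) / wide, float(max_high) / high)
    return (max(1, int(round(wide * scale))),
            max(1, int(round(high * scale))))


landscape = fit_within(4000, 3000, 200, 200)   # width is the limit
portrait = fit_within(3000, 4000, 200, 200)    # height is the limit
small = fit_within(100, 50, 200, 200)          # already fits
```

Only one dimension normally reaches its maximum, which is why THUMB_WIDE and THUMB_HIGH are described as maximum values rather than exact sizes.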
Finally, assuming that we can, we write the thumbnail
image. This could fail, but it falls under the blanket
precondition that we have write access to the thumbnail
directory. Assuming all that works, we can then return
the Imagex result to the caller.
    #-- 6 --
    # [ if thumbPath names a file that can be created new ->
    #     that file := result.image with its type determined
    #         by thumbPath's extension
    #   else -> raise IOError ]
    result.image.save ( thumbPath )

    #-- 7 --
    return result
This function writes an XML file conforming to the
archx.rnc schema (see Section 3, “The archx.rnc schema”).
# - - - w r i t e I n d e x - - -
def writeIndex ( archDir, imagexList ):
    """Generate the XML index file.

      [ (archDir is an archive directory name as a string) and
        (imagexList is a list of Imagex objects) ->
          index directory := index directory with an
              (archDir+".xml") file added representing
              imagexList ]
    """
The XML generation technique uses the xml4create.py module, imported as xc; for more information,
see the importation of this module in Section 4.2, “Imports”.
We start by creating the document node. No <!DOCTYPE ...> will be attached to
this XML file.
    #-- 1 --
    # [ doc := a new DOM Document object with root element
    #       of type RNC_ARCHIVE_INDEX_N ]
    doc = xc.Document ( RNC_ARCHIVE_INDEX_N )
Next we add child nodes to the root of this document,
one for each element of imagexList. The actual generation of these child nodes is done by
the .writeNode() method of the
Imagex object; see
Section 4.10, “Imagex.writeNode():
Translate self to XML”.
    #-- 2 --
    # [ imagexList is a list of Imagex objects ->
    #     doc.root := doc.root with nodes added representing
    #         those objects ]
    for imagex in imagexList:
        imagex.writeNode ( doc.root )
Finally, we write the resulting XML file to the index
directory. The name of the index file is INDEX_DIR + archDir + ".xml".
    #-- 3 --
    # [ index directory := index directory with an
    #       (archDir+".xml") file added representing doc ]
    fileName = "%s%s.xml" % (INDEX_DIR, archDir)
    try:
        indexFile = open ( fileName, "w" )
    except IOError as detail:
        print ( "*** Can't open index file '%s' for writing." %
                fileName )
        return
    doc.write ( indexFile )
    indexFile.close()
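For readers without the author's XML module, the shape of the generated document can be sketched with the standard library's xml.dom.minidom instead. This is a stand-in technique, not the script's method. The element name image and attribute name cat-no come from the text above; the root name archive-index and the attribute names wide and high are assumptions inferred from the constant names, and build_index() is a hypothetical helper.

```python
from xml.dom import minidom


def build_index(images, root_name="archive-index"):
    """Build a DOM document from (cat_no, wide, high) tuples.

    root_name, "wide" and "high" are assumed names, not taken from
    the real archx.rnc schema.
    """
    impl = minidom.getDOMImplementation()
    doc = impl.createDocument(None, root_name, None)
    for cat_no, wide, high in images:
        node = doc.createElement("image")
        node.setAttribute("cat-no", cat_no)
        # XML attribute values are strings, so convert the sizes.
        node.setAttribute("wide", str(wide))
        node.setAttribute("high", str(high))
        doc.documentElement.appendChild(node)
    return doc


index_doc = build_index([("20050612.07", 3000, 2000)])
index_xml = index_doc.toxml()
```

As in writeIndex(), the document is built completely in memory first and serialized only at the end, so a failure partway through never leaves a half-written index file.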
Each instance of this class holds the information about one image that we have indexed. The class knows how to add that information to a DOM tree for output as XML.
# - - - - - c l a s s I m a g e x - - - - -
class Imagex:
    """Represents information about one bird image.

      Exports:
        Imagex ( pathName ):
          [ (pathName is a string) ->
              if pathName names a readable, valid image file ->
                return a new Imagex object representing the image
              else ->
                raise IOError ]
        .pathName:    [ as passed to constructor, read-only ]
        .baseName:
          [ self.pathName, stripped of its directory part and
            extension ]
        .image:       [ the image as an Image.Image object ]
        .wide:        [ width in pixels as an integer ]
        .high:        [ height in pixels as an integer ]
        .writeNode ( parent ):
          [ parent is an xmlcreate.Element ->
              parent := parent with a new RNC_IMAGE_N node added
                  representing self ]
    """
Given the path name of an image file, we need to find the image size. Here's the constructor interface:
# - - - I m a g e x . _ _ i n i t _ _ - - -
    def __init__ ( self, pathName ):
        """Constructor for Imagex.
        """
        #-- 1 --
        # [ self.pathName := pathName
        #   self.baseName := pathName, stripped of its directory
        #       part and extension ]
        self.pathName = pathName
        discard, fileName = os.path.split ( pathName )
        self.baseName, discard = os.path.splitext ( fileName )
The Image module from the Python Imaging Library does all
the heavy lifting for us here. For documentation on this
module, see Python Imaging Library (PIL).
This module's Image.open()
function will raise an IOError
exception in two different cases: if the file is
inaccessible or nonexistent; and if the file does not
represent one of the image formats supported by the
Image module. In either case,
we pass the exception back to our caller. If the file is
readable and valid, we get back an Image object.
        #-- 2 --
        # [ if pathName names a readable, valid image file ->
        #     self.image := an Image object representing that image
        #   else ->
        #     raise IOError ]
        self.image = Image.open ( pathName )
The .size attribute of this
object is a 2-tuple (width,
height).
        #-- 3 --
        # [ self.image.size gives (width,height) in pixels ->
        #     self.wide := that width in pixels
        #     self.high := that height in pixels ]
        self.wide, self.high = self.image.size
This method adds a representation of itself as an
<image> element to the DOM
tree.
# - - - I m a g e x . w r i t e N o d e - - -
    def writeNode ( self, parent ):
        """Write an RNC_IMAGE_N node representing self.

          [ parent is an xmlcreate.Element object ->
              parent := parent with a new RNC_IMAGE_N node added
                  representing self ]
        """
The xc.Element constructor
accepts as an optional third argument a dictionary of
attribute names and values. We first build up that
dictionary, then xc.Element
takes care of building the element and its attributes,
and attaching it to the parent.
        attrs = { RNC_CAT_NO_A: self.baseName,
                  RNC_WIDE_A: str ( self.wide ),
                  RNC_HIGH_A: str ( self.high ) }
        child = xc.Element ( parent, RNC_IMAGE_N, **attrs )
The last lines of the script execute the main() function, assuming that the script
is being executed (not imported).
#================================================================
# Epilogue
#----------------------------------------------------------------
if __name__ == "__main__":
    main()