10. The abbr.py module

Machinery that deals only with the bird code system is kept in a separate module, abbr.py.

The module contains an assortment of manifest constants and functions, and one class. The constants and functions are:

ABBR_L

Maximum length of a bird code, 6 in the CBC system.

BLANK_ABBR

A string containing ABBR_L spaces.

RE_ABBR

A regular expression (using Python's standard re regular expression module) that describes a valid bird code.

REL_SIMPLE

The relationship code for simple (non-compound) forms, one space.

REL_HYBRID

The single-character relationship code denoting hybrids, "^".

REL_PAIR

The single-character relationship code denoting a species pair, "|".

abbreviate(eng)

Abbreviates an English name according to the rules of the system. Takes a string containing a name either in the usual word order (e.g., "Aztec Thrush") or in “last, first” order (e.g., "Amakihi, Molokai").

engComma(eng)

Given an English name in the customary order, such as “American Robin”, returns it in the inverted form, e.g., “Robin, American”.

engDeComma(eng)

Given an English name in the inverted form, such as “Robin, American”, returns the customary form, e.g., “American Robin”.
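To make the two word orders concrete, here is a minimal sketch of the inversion and its reverse. The names eng_comma_sketch and eng_de_comma_sketch are hypothetical stand-ins, not the module's own functions, and the rule used here (move the last word to the front) is an assumption; the real engComma() and engDeComma() may handle multi-word group names differently.

```python
def eng_comma_sketch(eng):
    """Invert a name: 'American Robin' becomes 'Robin, American'.

    Assumption: the group name is the last space-separated word.
    """
    name = eng.strip()
    first, sep, last = name.rpartition(" ")
    if not sep:
        # Single-word name such as "Ou": there is nothing to invert.
        return name
    return "%s, %s" % (last, first)


def eng_de_comma_sketch(eng):
    """Undo the inversion: 'Robin, American' becomes 'American Robin'."""
    last, sep, first = eng.partition(",")
    if not sep:
        # No comma present: the name is already in customary order.
        return eng.strip()
    return "%s %s" % (first.strip(), last.strip())
```

Under these assumptions the two functions are inverses of each other for names that contain no comma in customary order.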

10.1. class BirdId

Representation of Christmas Bird Count data is complicated considerably by the use of what we call compound forms: species pairs (e.g., “Hammond's/Dusky Flycatcher”) and hybrids (e.g., “Baltimore Oriole×Bullock's Oriole”). Also supported is a trailing “?” to indicate that the identification is only a guess.

Here is the interface to the BirdId class, which represents simple and compound forms and an optional question mark.

BirdId ( txny, abbr, rel=None, abbr2=None, q=None )

Because BirdId objects connect bird codes to a firm taxonomic foundation, you must pass a Txny object as the first argument to the constructor.

The second argument is a bird code. It can be in either upper or lower case, and either variable-length or right-padded with spaces. It will be stored in normalized form: uppercased and right-padded with spaces to length ABBR_L.
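The normalization described above can be sketched as follows. The function name normalize_abbr_sketch is hypothetical, and rejecting empty or overlong codes with ValueError is an assumption made for illustration; only the uppercasing and right-padding to ABBR_L come from the description.

```python
ABBR_L = 6  # maximum bird code length in the CBC system


def normalize_abbr_sketch(abbr):
    """Return abbr uppercased and right-padded with spaces to ABBR_L."""
    code = abbr.rstrip().upper()
    if not code or len(code) > ABBR_L:
        # Assumption: the sketch refuses codes that cannot fit the field.
        raise ValueError("bad bird code: %r" % abbr)
    return code.ljust(ABBR_L)
```

With this sketch, "ou", "OU", and "ou    " all normalize to the same six-character string, so either input style yields the same stored value.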

For single bird identities, omit the remaining arguments. For compound forms, pass the relationship code in the rel argument (REL_HYBRID for hybrids, REL_PAIR for species pairs) and the second bird code in the abbr2 argument.

The q argument should be the string "?" if the ID is questionable. The default value is None, meaning that the ID is not in question.

Here's an example. Suppose txny is your Txny object. This code snippet sets b1 to a BirdId object representing Ou (a Hawaiian endemic), and b2 to a BirdId object representing Indigo × Lazuli Bunting:

    b1 = BirdId ( txny, "ou" )
    b2 = BirdId ( txny, "lazbun", REL_HYBRID, "indbun" )

This constructor will raise a KeyError exception if any of the abbreviations are undefined in txny.

.txny

The .txny attribute of a BirdId object is the Txny object passed to the constructor (read-only).

.abbr

The first or only bird code, normalized. A normalized code is uppercased, and right-padded with spaces if necessary to length ABBR_L.

.rel

For single forms, this attribute is None. It is set to REL_HYBRID for hybrids, REL_PAIR for species pairs.

.abbr2

For compound forms, this attribute holds the second bird code, normalized.

We stipulate that for any BirdId instance B, B.abbr < B.abbr2. This means that if you're looking for a specific hybrid or pair, you don't have to look in two different places. So we swap the .abbr and .abbr2 values if necessary to make this true. For example, in the object returned by “b2 = BirdId ( txny, "lazbun", REL_HYBRID, "indbun" )”, b2.abbr would be "indbun", and "lazbun" would be stored in b2.abbr2.
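The ordering rule can be illustrated with a small sketch. The helper order_pair_sketch is hypothetical and is not part of the module; it assumes plain string comparison is what decides the order.

```python
def order_pair_sketch(abbr, abbr2=None):
    """Return (abbr, abbr2) ordered so the first code compares less.

    For a single form (abbr2 is None) the input is returned unchanged.
    """
    if abbr2 is not None and abbr2 < abbr:
        return abbr2, abbr
    return abbr, abbr2
```

Because both argument orders give the same result, a hybrid or pair always has one canonical representation, which is what makes a single lookup sufficient.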

.q

Has the value "" (the empty string) if the ID is not in question; "?" if there is a question about the ID; or "-" if the ID is correct but the form is not countable under American Birding Association rules.

.taxon

This attribute will contain a Taxon object representing the smallest taxon that contains this identity. For a single form, this will be the taxon containing the form. For hybrids and species pairs, it will be the smallest taxon that is an ancestor of both forms.

.fullAbbr

Contains a string made from self's .abbr attribute, with the .rel and .abbr2 attributes concatenated only for compound forms. Short codes are blank-stripped.
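As a rough illustration of how such a string could be assembled, consider the sketch below. The function full_abbr_sketch is hypothetical, and it assumes that "blank-stripped" means trailing padding is removed from each code before concatenation.

```python
REL_HYBRID = "^"  # relationship code for hybrids
REL_PAIR = "|"    # relationship code for species pairs


def full_abbr_sketch(abbr, rel=None, abbr2=None):
    """Build a compact code string from normalized parts.

    Assumption: trailing padding is stripped from each code, and rel
    is None for single forms.
    """
    if rel is None:
        return abbr.rstrip()
    return abbr.rstrip() + rel + abbr2.rstrip()
```

For a single form the result is just the stripped code; for a compound form the relationship code sits between the two stripped codes.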

.engComma()

Returns the English name of self in inverted order, that is, “last, first”. Examples: "robin, American"; "mallard x teal, blue-winged"; "ibis, glossy?".

.__str__(self)

This method is called when a BirdId object is converted to a string, implicitly or by explicit use of the str() function. It returns the English name as a string. Examples of its return values: "Nihoa Finch"; "Blue-winged Teal x Cinnamon Teal"; "Dusky Flycatcher / Hammond's Flycatcher".

BirdId.scan ( txny, scan )

This method works with the Scan object, from the author's personal Python library, to process raw bird codes while scanning an input file. For more information on the Scan object, see the author's library reference.

This is a static method, a relatively new feature of Python. For more information on Python static methods, see the Python 2.2 quick reference.

The txny argument is a Txny object providing the taxonomy system in which the codes are to be interpreted. The scan argument is a Scan object used to scan the input stream containing the bird codes.

This method looks for a bird code, optionally followed by a relationship code and a second bird code (which we call a compound code). Examples: "vireo", for “vireo sp.”; "mallar^amewig", Mallard × American Wigeon; and "dowwoo|haiwoo", Downy or Hairy Woodpecker.

If the scan object points at a valid simple or compound code, the scan object is advanced past that code, and the method returns a new BirdId object representing the code. If the scan object doesn't start with a valid code, an error message is sent to the scan object's error log, and a ValueError exception is raised.

This method will recognize a trailing "?" if present.

The method raises KeyError if any bird codes are undefined.

BirdId.scanFlat ( txny, scan )

This is another static method like BirdId.scan(), but it expects to see its input in flat file format. Specifically, the scan object should start with three fixed fields. The first field has length ABBR_L and contains the first or only bird code, left-aligned and right-padded with spaces. The second field is a single character and contains the relationship code: normally blank, but it may contain REL_HYBRID or REL_PAIR for compound codes. The third field has length ABBR_L and contains the second bird code when the relationship code is nonblank. The third field must be blank when the relationship code is blank.

This method does not support the questionable ID flag. If any codes are undefined, it raises KeyError.
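The three-field layout can be sketched as a plain string split. The function split_flat_sketch and the constant FLAT_L are hypothetical names; the sketch works on an ordinary string rather than a Scan object, does no taxonomy lookup, and raising ValueError on malformed records is an assumption.

```python
ABBR_L = 6               # width of each bird code field
REL_SIMPLE = " "         # blank relationship code: simple form
REL_HYBRID = "^"
REL_PAIR = "|"
FLAT_L = ABBR_L + 1 + ABBR_L  # total width of the three fields


def split_flat_sketch(record):
    """Split a flat record into (abbr, rel, abbr2).

    For simple forms, rel and abbr2 are returned as None.
    """
    if len(record) < FLAT_L:
        raise ValueError("flat record too short: %r" % record)
    abbr = record[:ABBR_L]
    rel = record[ABBR_L]
    abbr2 = record[ABBR_L + 1:FLAT_L]
    if rel == REL_SIMPLE:
        if abbr2.strip():
            # The third field must be blank when the relationship is blank.
            raise ValueError("unexpected second code: %r" % record)
        return abbr, None, None
    if rel not in (REL_HYBRID, REL_PAIR):
        raise ValueError("unknown relationship code: %r" % rel)
    return abbr, rel, abbr2
```

A record therefore always occupies 2 × ABBR_L + 1 = 13 characters, whether or not the form is compound.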

BirdId.parse ( txny, s )

This static method is also like BirdId.scan(), but is used when the input is in an ordinary string instead of a Scan object.

This method supports a trailing "?" for questionable IDs. It will raise KeyError for undefined codes.
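The syntax accepted in string form (a code, an optional relationship code and second code, and an optional trailing "?") can be sketched with a regular expression. Everything named here is hypothetical: CODE_RE_SKETCH assumes a code is one to six letters, which may not match the module's real RE_ABBR, and split_code_sketch only splits the string, with no lookup in a Txny object.

```python
import re

# Assumption: a bird code is 1 to 6 letters. The real RE_ABBR may differ.
CODE_RE_SKETCH = re.compile(
    r"^(?P<abbr>[A-Za-z]{1,6})"
    r"(?:(?P<rel>[\^|])(?P<abbr2>[A-Za-z]{1,6}))?"
    r"(?P<q>\?)?$"
)


def split_code_sketch(s):
    """Split a raw code string into (abbr, rel, abbr2, q).

    Parts that are absent come back as None.
    """
    m = CODE_RE_SKETCH.match(s.strip())
    if m is None:
        raise ValueError("not a valid bird code: %r" % s)
    return m.group("abbr"), m.group("rel"), m.group("abbr2"), m.group("q")
```

The pieces returned by such a split correspond to the abbr, rel, abbr2, and q arguments of the constructor, which would still need normalizing and checking against the taxonomy.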