Abstract
Describes a compiler for processing a set of files that describe taxonomic arrangements of North American birds.
This publication is available in Web form and also as a PDF document.
Please forward any comments to john@nmt.edu.
Table of Contents
LOG_FILE
DEFAULT_RANKS_FILE
STD_EXTENSION
ALT_EXTENSION
TREE_EXTENSION
ABBR_EXTENSION
COLL_EXTENSION
XML_EXTENSION
L_RANK_CODE
GENUS_CODE
SUBGENUS_CODE
SPECIES_CODE
SSP_CODE
ALT_HIGHER
ALT_EQUIVALENT
ALT_SUBSPECIFIC
ALT_COLLISION
COLL_SEP
L_SCI
L_ENG
STATUS_RE
STATUS_NORMAL
HT_NAME_RE
SLASH_RE
GENUS_SPECIES_RE
abbr-file: Name of the codes output file
alt-file: Name of the alternate forms input file
can-add-child: Can a given parent have a given child?
coll-file: Name of the collisions output file
eff-ranks: Effective ranks input file name
eng-normalize: Normal form of an English name
input-files: All input files
output-files: All output files
rank-parent: What taxon is the parent of a new taxon of a given rank?
std-file: Name of the standard forms input file
tree-file: Name of the flat tree output file
xml-file: Name of the XML output file
main(): Main program
writeProductFiles: Write all product files
writeTreeFile(): Write the flat tree file
writeSubtree(): Recursive tree walker
writeAbbrFiles(): Write the flat abbreviation and collision files
writeXMLFile(): Write the XML file
class Args: Process command line arguments
class Hier: Taxonomic levels of interest
Hier.__len__(): How many ranks?
Hier.__getitem__(): Return the (k)th rank
Hier.__iter__(): Iterator for ranks
Hier.__contains__(): Is a given code in this hierarchy?
Hier.lookupRankCode()
Hier.canParentHaveChild()
Hier.keyLen(): Find the key length for a given depth
Hier.txKeyFill(): Right-zero-pad a taxonomic key
Hier.writeXML(): Generate XML
Hier.__init__(): Constructor
Hier.__readRanksFile(): Read the ranks file
class Rank: One taxonomic rank
Rank.__cmp__()
Rank.__str__()
Rank.writeXML(): Generate XML
Rank.__init__(): Constructor
Rank.__scanCode(): Scan the code field
Rank.__scanRequired(): Scan the required flag field
Rank.__scanKeyLen(): Scan the key length field
Rank.__scanName(): Scan the rank name field
class Txny: The entire classification
Txny.lookupAbbr(): Find the definition of a code
Txny.lookupSci(): Find a scientific name
Txny.__init__(): Constructor
Txny.__readStd(): Read the standard forms file
Txny.__readStdLine(): Process one line from the standard forms file
Txny.__scanNonSpTail(): Process higher-taxon tail
Txny.__appendTaxon(): Try to append this taxon to the tree
Txny.__scanSpTail(): Process a species tail
Txny.__checkGenus(): Add a new genus?
Txny.__checkSubgenus(): Add a new subgenus?
Txny.__addSpecies(): Add a new species and its bindings
Txny.__addCodes(): Set up symbol table entries for a standard form
Txny.__addCodeStd(): Create a standard binding
Txny.__addCodeColl(): Add standard and collision symbol bindings
Txny.__readAlt(): Read the alternate forms file
Txny.__readAltLine(): Process one line of the alt file
Txny.__scanAbbr(): Scan a code
Txny.__scanHigherAlt(): Scan a higher-taxon line
Txny.__scanEng(): Process an English name field
Txny.__bindHigherAlt(): Bind a higher-taxon code
Txny.__scanEquivalentAlt(): Scan a higher-taxon line
Txny.__bindEquivalentAlt(): Create an equivalence binding
Txny.__scanSubspecificAlt(): Scan a higher-taxon line
Txny.__bindSubspecificAlt()
Txny.__findSubspParent(): Under what species does this new subspecies go?
Txny.__scanCollisionAlt(): Scan a higher-taxon line
Txny.__bindCollision()
Txny.dispatchTable: Routing table for alt records
Txny.__finalCheck(): Verify correctness of the symbol table
class TaxaTree: The taxonomic tree
TaxaTree.__init__(): Constructor
TaxaTree.setRoot(): Store the root taxon
TaxaTree.rankParent(): Under what parent does a new taxon go?
TaxaTree.canAddChild()
TaxaTree.addTaxon()
TaxaTree.lookupTxKey()
TaxaTree.__getitem__(): Retrieve a taxon by scientific name
TaxaTree.__contains__(): Membership test for a scientific name
TaxaTree.writeXML(): Generate XML output
class Taxon: One node in the taxonomy
Taxon.__init__()
Taxon.__len__(): How many children?
Taxon.__getitem__(): Return the (n)th child
Taxon.__iter__(): Iterator for the children
Taxon.__str__()
Taxon.abbr(): Is there a standard code?
Taxon.childKey(): Derive a child's taxonomic key
Taxon.writeFlat(): Write a tree-file record
Taxon.writeXML(): Recursive XML tree writer
Taxon.__writeXMLNode(): Write one node
class StdHead: The common front part of a standard forms line
class NonSpTail: Scanner for the non-species tail
class SpTail: Scanner for the species tail
class RawTaxon: Temporary container for taxon attributes
class AbTab: The symbol table for codes
AbTab.addAbbr(): Create a symbol table entry or find an existing one
AbTab.__getitem__(): Find the symbol table entry for a given code
AbTab.__contains__(): Is this code in the symbol table?
AbTab.__iter__(): Generate all symbol table entries
AbTab.writeXML()
AbTab.__init__()
class AbSym: One symbol table entry
class AbBind: Base class for bindings
class StdBind: Code bound to a taxon
class EqBind: Code equivalent to another code
class CollBind: Cluster of colliding codes
isEngValid(): Validate an English name
The program described herein is part of a system for representing bird taxonomy as computer files. For the overall system documentation, see A system for representing bird taxonomy.
In particular, operation of the nomcompile program is described in a section of the above document: Building the standard product files. To recap, the program reads and checks a set of three files that describe a particular checklist, and writes a number of product files used elsewhere in the general system.
The input files are:
The ranks file defines the taxonomic ranks of interest.
The .std or standard forms file defines what taxa are considered standard in the arrangement.
The .alt or alternate forms file enumerates non-standard names and defines each name's relationship to the standard names in that arrangement.
If no errors are detected, nomcompile writes four product files.
The .tre file is a flat file with one line for each standard taxon.
The .ab6 file is a flat file with one line for each six-letter abbreviation defined in the arrangement.
The .col file is a flat file with one line for each pair of six-letter abbreviations such that the first is a proscribed collision form and the second is one of the valid alternatives.
The .xml file is a complete description of all aspects of the arrangement (ranks, standard forms, alternate names, abbreviations, and collisions). It forms the input to a set of Python-language modules used in processing bird data.
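The pairing rule for the .col file can be illustrated with a small sketch. It assumes the collisions are held in memory as a mapping from each proscribed collision code to its valid alternatives. That layout, the function name collision_pairs, the sort order, and the placeholder codes are all hypothetical and are not taken from nomcompile itself. The sketch returns pairs rather than formatted lines, because the record layout of the .col file is not specified here.

```python
def collision_pairs(collisions):
    """Flatten {collision code: [valid alternatives]} into a list of
    (collision, alternative) pairs, one per eventual .col line.

    Sorting both levels is an assumption made here for reproducible
    output; the actual program may order its records differently.
    """
    pairs = []
    for bad_code in sorted(collisions):
        for good_code in sorted(collisions[bad_code]):
            pairs.append((bad_code, good_code))
    return pairs


# Placeholder six-letter codes, not real bird codes.
example = {"AAABBB": ["AAADDD", "AAACCC"]}
pairs = collision_pairs(example)
```

A collision code with two valid alternatives therefore contributes two pairs, each starting with the proscribed form.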
Apart from the ranks file, all of these files, input and output alike, share the same base file name, which is specified as a command line argument. By convention, base name “aou7” denotes the Seventh Edition of the AOU Check-List, base name “aou749” denotes the arrangement published as the Forty-ninth Supplement, and so forth as new Supplements are issued.
So, for the Forty-ninth Supplement, nomcompile would read input files ranks, aou749.std, and aou749.alt, and write aou749.xml, aou749.tre, aou749.ab6, and aou749.col.
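The naming convention above can be sketched in a few lines of Python. The constant names match entries in the table of contents, but the values assigned here are inferred from the file names in the text, and the helper functions input_files and output_files are hypothetical stand-ins rather than the program's actual code.

```python
# Values inferred from the file names given in the text (assumptions).
DEFAULT_RANKS_FILE = "ranks"
STD_EXTENSION = ".std"
ALT_EXTENSION = ".alt"
TREE_EXTENSION = ".tre"
ABBR_EXTENSION = ".ab6"
COLL_EXTENSION = ".col"
XML_EXTENSION = ".xml"


def input_files(base):
    """Return the three input file names for a given base name."""
    return [DEFAULT_RANKS_FILE, base + STD_EXTENSION, base + ALT_EXTENSION]


def output_files(base):
    """Return the four product file names for a given base name."""
    extensions = (XML_EXTENSION, TREE_EXTENSION, ABBR_EXTENSION, COLL_EXTENSION)
    return [base + ext for ext in extensions]


inputs = input_files("aou749")
outputs = output_files("aou749")
```

With base name aou749, the two lists reproduce the file names listed for the Forty-ninth Supplement.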