John W. Shipman, firstname.lastname@example.org
Zoological Data Processing
507 Fitch Avenue NW
Socorro, NM 87801
The Christmas Bird Count (CBC) censuses, published in the periodicals Audubon Field Notes and American Birds, have been taken since 1900. As a source of long-term population data, they have few peers; however, their utility in printed form is limited.
This document describes a system for representing the data from these censuses in a general computer database form.
Most years' data (1st-62nd and 73rd-90th) were entered by Zoological Data Processing. Data for the 63rd--72nd CBCs were obtained from Carl Bock from old tapes produced by his project at the University of Colorado. Data for the 91st and succeeding CBCs were provided by the National Audubon Society from their publishing operation. There are some minor differences between these sources in the way data were captured; the record layout narratives below note the differences between the years.
This version is limited to the data from the U.S.A. and Canada (including the French islands of St.-Pierre and Miquelon). Inclusion of the data from other countries would require significant modifications to the method of encoding bird names.
The word circle refers to those aspects of a count that generally stay the same from year to year: the count area's latitude and longitude, its geographic name, and so forth. The term count refers to one year's census of that circle.
The term standard refers to those names and coordinates that are used in the computer database. The term as-published refers to the names and coordinates that appear in the published counts.
Actual computer file names, field names, and field contents (character strings as entered and stored) are shown in typewriter type.
The symbol ``_'' is used to represent a space character.
For proper interpretation of the six-letter codes used to describe types of birds, the CBC database depends on an infrastructure of files that describe all the possible bird types from the census file. This system is described in a separate document, `A system for representing taxonomic nomenclature'.
Before we discuss the layout of the database, it is relevant to bring up a sticky problem of organization that affects the design.
One of the most persistent problems in the design of this database is how to treat circles whose centers have shifted slightly one or more times. When the center moves only a minute or two of latitude or longitude, for the purposes of most researchers the censuses from the different centers might as well be treated as from the same location.
However, sometimes even a tiny shift can make a huge difference in the habitat coverage. For example, if a circle containing no open water is shifted so that it includes a reservoir, any conclusions based on sudden increases in waterfowl populations may be unwarranted.
For this reason, I think it would be best if the user of the database has a choice as to whether to lump overlapping circles or not.
An earlier version of this database used the concatenation of the latitude and longitude fields as the link that connected census data to circle and effort data. However, in practice this meant that a number of common operations---such as correcting center coordinates, or deciding whether or not to lump overlapping circles---required a lengthy pass through many megabytes of census data to change the link values.
The solution is to link the files through a single identifier, the countId field, instead of through the coordinates.
We need to identify each count---that is, each set of data for a single circle counted in a particular year. We call this identifier the countId.
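The published counts have used two different schemes for identifying the counts within a year: a sequence number, occasionally with a letter suffix (e.g., 127A), in the older counts; and a state or province code followed by a letter code for the circle in the modern counts.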
These identifiers alone are not sufficient to identify a single count. Count #1225 might be Modoc, CA in one year and Malibu, CA in another year.
So, we define an aggregate called the countId field which consists of the CBC number, left-zero-padded to three digits, followed by the circle's identifier.
In the older counts entered by Shipman (and the data from the 63rd--72nd CBCs converted from the Bock project), the countId has the form
    yyynnnna
where yyy is the CBC (year) number (with left zero fill), nnnn is the sequence number (also with left zero fill), and the a column is available for letter suffixes (e.g., 127A), but usually blank.
In data derived from the publisher of the modern counts (Clinchy Associates), the countId has the form
    yyyssii_
where yyy is the CBC number, ss is the state or province code, and ii is the identifying letter code of the circle.
Here are some examples of countId fields:
    0760034_    76th CBC, count #34.
    0830947A    83rd CBC, count #947A.
    104NMZU_    104th CBC, circle ZU in state NM.
Note that the countId field can be used to sort circles into their published order (at least within each state).
The countId field, then, uniquely identifies one circle counted in one year. So this field is the key that relates the various files of the database.
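The two countId layouts described above can be told apart by whether the four characters after the CBC number are all digits. The following is a minimal sketch of that idea, not part of the database software; the names CountId and parse_count_id are hypothetical, and the space character is written literally where this document shows ``_''.

```python
from typing import NamedTuple, Optional


class CountId(NamedTuple):
    """Parts of a countId (hypothetical container, for illustration)."""
    cbc_number: int            # yyy: CBC (year) number
    sequence: Optional[int]    # nnnn: sequence number (older form only)
    suffix: str                # a: letter suffix, "" when blank
    region: Optional[str]      # ss: state or province code (modern form only)
    circle: Optional[str]      # ii: circle letter code (modern form only)


def parse_count_id(field: str) -> CountId:
    """Split an 8-character countId into its parts."""
    if len(field) != 8:
        raise ValueError(f"countId must be 8 characters: {field!r}")
    cbc_number = int(field[0:3])
    middle = field[3:7]
    if middle.isdigit():
        # Older form yyynnnna: sequence number plus optional letter suffix.
        return CountId(cbc_number, int(middle), field[7].strip(), None, None)
    # Modern form yyyssii_: state/province code plus circle letter code.
    return CountId(cbc_number, None, "", middle[0:2], middle[2:4])
```

For example, parse_count_id("0830947A") yields CBC number 83, sequence 947, and suffix "A", while parse_count_id("104NMZU ") yields CBC number 104, region "NM", and circle "ZU".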
The std file serves as the pivot, relating census and effort records to a particular circle name and center. Since the census records are by far the biggest part of the database, omitting the lat-long of the circle center from that database means that we can change the effective location of census records by using a different std file.
We can, for example, have one version of the std file that does not lump circles, no matter how tiny the shift. This version would be good for waterfowl studies, where a tiny shift might drastically change the inclusion of waterfowl habitat. We could have another version that lumps all circles that overlap at least 10%, and another version that lumps all circles that overlap at least 50%. We can even set up special std files that lump all the counts in a particular bioregion.
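The effect of swapping std files can be illustrated with a small sketch. It assumes, purely for illustration, that a std file can be reduced to a mapping from countId to some circle key; the mappings, the circle keys, and the function name counts_by_circle are all hypothetical.

```python
from collections import defaultdict

# Two hypothetical std mappings from countId to a circle key.
# The first keeps every center separate; the second lumps the two counts.
std_unlumped = {"0760034 ": "circle-A", "0770031 ": "circle-B"}
std_lumped = {"0760034 ": "circle-A", "0770031 ": "circle-A"}


def counts_by_circle(count_ids, std):
    """Group countId values by the circle that the std mapping assigns."""
    groups = defaultdict(list)
    for count_id in count_ids:
        groups[std[count_id]].append(count_id)
    return dict(groups)


count_ids = ["0760034 ", "0770031 "]
separate = counts_by_circle(count_ids, std_unlumped)
lumped = counts_by_circle(count_ids, std_lumped)
```

The census records themselves are never touched: only the mapping changes, which is the point of keeping the coordinates out of the census file.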
So, here are the six principal files of the CBC database and their relationships:
All the files produced for this project are ``flat files,'' meaning that each field has a fixed size. This form was chosen because it is easy to import into almost any database system.
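Because every field has a fixed size, a record can be cut apart by position alone. The sketch below shows one way to do that; the function split_fixed and the three-field layout are hypothetical and do not reproduce the actual field sizes of any file in the database.

```python
def split_fixed(line, layout):
    """Slice one fixed-width record into a dict of named fields.

    layout is a list of (field name, width) pairs in file order.
    """
    record = {}
    start = 0
    for name, width in layout:
        record[name] = line[start:start + width]
        start += width
    return record


# Hypothetical layout for illustration only; it is not the real
# layout of any file described in this document.
EXAMPLE_LAYOUT = [("countId", 8), ("form", 6), ("number", 5)]
record = split_fixed("104NMZU mallar   42", EXAMPLE_LAYOUT)
```

Fields come back as strings exactly as stored, including padding blanks, so numeric fields still need a conversion such as int(record["number"]).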
Every area that has ever been counted will have at least one corresponding circle record in file cir, defining its location. In those cases where the center has been moved one or more times for a distance of more than 1 minute of latitude or longitude, there may be multiple circle records for a count of a given name.
Only the ``standard'' coordinates and names are defined in this file. We have tried to use the most recent names and the most accurate map-checked coordinates whenever possible, but in many cases the coordinates are an estimate or an outright guess. We welcome any additional information that may help us in establishing the true locations of the counts; please forward any such information in writing to the author.
Here are the field sizes, field names, and the descriptions of their contents.
    400909036    40° 9' N. Lat., 90° 36' W. Long. (Rushville, IL)
    512518042    51° 25' N. Lat., 179° 18' E. Long. (Amchitka, AK)
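The two examples above suggest that the nine digits are latitude as ddmm followed by longitude as dddmm, with longitude measured westward and running past 180° for east longitudes (180° 42' W is the same meridian as 179° 18' E). That reading is an inference from the examples, not a stated rule. Under that assumption, a decoder might look like this; the name decode_latlong is hypothetical.

```python
def decode_latlong(field):
    """Decode a 9-digit ddmmdddmm field into decimal degrees.

    Assumes latitude is degrees and minutes north, and longitude is
    degrees and minutes west, exceeding 180 for east longitudes.
    Returns (latitude, longitude) with east longitude positive.
    """
    if len(field) != 9 or not field.isdigit():
        raise ValueError(f"expected 9 digits: {field!r}")
    latitude = int(field[0:2]) + int(field[2:4]) / 60
    west = int(field[4:7]) + int(field[7:9]) / 60
    if west > 180:
        longitude = 360 - west   # past the antimeridian: east longitude
    else:
        longitude = -west        # ordinary west longitude
    return latitude, longitude
```

With this sketch, the Amchitka example decodes to a positive (east) longitude of about 179.3 degrees, matching 179° 18' E.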
    _    No salt water (blank)
    o    Open ocean included in circle
    e    Ocean estuary included in circle (no open ocean)
    p    Pelagic

    _    Ordinary circle, 15-mile diameter (blank)
    p    Pelagic-only transect
    x    Odd-shaped and not pelagic-only
    ab    Alberta
    bc    British Columbia
    mb    Manitoba
    nb    New Brunswick
    nf    Newfoundland (including Labrador)
    nt    Northwest Territories
    ns    Nova Scotia
    on    Ontario
    pe    Prince Edward Island
    pq    Province Québec
    sk    Saskatchewan
    yt    Yukon Territory
The codes are used from left to right as necessary, and the unused fields are all blanks. For example, if the count is entirely within one region, rr is used and ss and tt are blank. If a circle falls within two states, rr is the primary state code (this determines which state's section of the listing includes that count), ss is the secondary state code, and tt is blank. Here are some examples of encoding the regs field:
    mb____    Entirely within Manitoba.
    ny____    Entirely within New York.
    onny__    Listed with Ontario, also in New York.
    iailmo    Iowa, Illinois, Missouri (Keokuk).
    fr____    St. Pierre et Miquelon Islands (France).
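The regs encoding can be unpacked by cutting the six characters into three two-character codes and discarding the blank ones. A minimal sketch follows; the function name parse_regs is hypothetical, and literal spaces stand in for the ``_'' shown in the examples.

```python
def parse_regs(field):
    """Return the region codes in a 6-character regs field.

    The first code is the primary region (rr); any secondary (ss)
    and tertiary (tt) codes follow in order. Blank slots are dropped.
    """
    if len(field) != 6:
        raise ValueError(f"regs must be 6 characters: {field!r}")
    codes = [field[i:i + 2] for i in (0, 2, 4)]
    return [code for code in codes if code.strip()]
```

For the Keokuk example, parse_regs("iailmo") returns the three codes ia, il, and mo, with ia as the primary state.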
    M.A.      Management Area
    N.M.      National Monument
    N.P.      National Park
    N.W.R.    National Wildlife Refuge
    P.P.      Provincial Park
    S.P.      State Park
    W.M.A.    Wildlife Management Area
The std file stands between the circle file and the effort and census data, representing our current best guesses about the exact association between bird sightings and locality names and coordinates.
Each effort record corresponds to one year's counting of a circle. Fields:
Weather data is separated from effort data because it is recorded for relatively few years---just the Bock data (63rd--72nd CBCs).
    bl    Blue
    pc    Partly Cloudy
    mc    Moderately Cloudy
    tc    Totally Cloudy
    pf    Partly Foggy
    mf    Moderately Foggy
    tf    Totally Foggy

    no    No precipitation
    ir    Intermittent rain
    lr    Light rain
    mr    Moderate rain
    hr    Heavy rain
    is    Intermittent snow
    ls    Light snow
    ms    Moderate snow
    hs    Heavy snow
    ic    Intermittent combination (sleet, freezing rain, hail, snow/rain)
    lc    Light combination
    mc    Moderate combination
    hc    Heavy combination

    o     Open
    mo    Mostly open
    po    Partly open
    f     Frozen over
Each form of bird (species or not) mentioned in the body of a count results in one body (census) record. Furthermore, for records where multiple genders or age classes are mentioned, each different combination of gender and age is encoded in a separate record.
Warning! One implication of the above paragraph is that applications programs cannot expect there to be only one record for a given form within a given count. For example, there might be three Bald Eagle records within one count: one record each for adults, immatures, and birds of unknown age. Given all the different age and sex and other codes, there may be many! Therefore, applications programs that want to extract species totals must find all matching records, and then total them.
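The totaling step described in the warning might be sketched as follows. The record shape (countId, form, age code, number) and the Bald Eagle code baleag are assumptions made for illustration; the actual census record has more fields than are shown here.

```python
from collections import defaultdict


def species_totals(records):
    """Sum the numbers of all records sharing a countId and form code."""
    totals = defaultdict(int)
    for count_id, form, _age, number in records:
        totals[(count_id, form)] += number
    return dict(totals)


# Hypothetical records: three Bald Eagle rows for one count, split by age.
records = [
    ("0830947A", "baleag", "a", 3),   # adults
    ("0830947A", "baleag", "i", 2),   # immatures
    ("0830947A", "baleag", " ", 1),   # age unknown
    ("0830947A", "mallar", " ", 40),
]
totals = species_totals(records)
```

Reading only the first matching record would report 3 eagles here instead of 6, which is the mistake the warning is meant to prevent.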
Fields of the census record are:
    x    Hybrid of form and altform, e.g., Mallard x Common Pintail
    /    Either form or altform, e.g., Hammond's/Dusky flycatcher.
    yebloo           Yellow-billed Loon (standard species)
    Gavia            Gavia sp. (form not identified to species)
    raptor           raptor sp. (form not identified to species)
    amewigxmallar    Mallard x American Wigeon (hybrid)
    dowwoo/haiwoo    Downy / Hairy Woodpecker (two alternatives)
    blugoo           Blue Goose (subspecific identification)
    resfli           Red-shafted Flicker (subspecific identification)
For hybrids and pairs of alternatives, the convention is to place in the form field the code that is lower in the alphabet, and place the other code in the altform field. This is so a given hybrid will always have the same code structure; without this rule, someone looking for Mallard x American Wigeon hybrids would have to search for two code groups, amewigxmallar and mallarxamewig.
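The ordering convention for hybrids and alternatives amounts to sorting the two codes before storing them. A minimal sketch, with the hypothetical name pair_code, assuming both codes are plain lowercase six-letter codes:

```python
def pair_code(code_a, code_b, relation):
    """Order two form codes so the alphabetically lower one comes first.

    relation is "x" for a hybrid or "/" for a pair of alternatives.
    Returns (form, relation, altform).
    """
    if relation not in ("x", "/"):
        raise ValueError(f"relation must be 'x' or '/': {relation!r}")
    form, altform = sorted((code_a, code_b))
    return form, relation, altform
```

Whichever order the two codes arrive in, "".join(pair_code("mallar", "amewig", "x")) gives amewigxmallar, so a search needs to look for only one code group.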
    _    Unknown (blank)
    a    Adult
    i    Immature
    p    Female/immature (symbolized by Greek phi)

    _    Unknown
    m    Male
    f    Female
    p    Female/immature (phi)
In order to aid in tracking name and center changes (and typographical errors, some of which persist for many years), aspub tracks as closely as possible the actual published center coordinates and circle names used in each year. This file should be proofed carefully against the published circle accounts.
Exception: in cases where the published latitude and longitude have obviously been transposed, the aspub file need not track this typo. In many cases, the coordinates would not physically fit in the fields anyway.
    AFB     Air Force Base
    Co.     County
    Cos.    Counties
    Ft.     Fort
    GMA     Game Management Area
    I.      Island
    Jct.    Junction
    L.      Lake
    MA      Management Area
    MBR     Migratory Bird Refuge
    Mt.     Mount
    Mtn.    Mountain
    Mts.    Mountains
    NF      National Forest
    NFRA    National Forest Recreation Area
    NG      National Grasslands
    NGP     National Game Preserve
    NHP     National Historical Park
    NL      National Lakeshore
    NM      National Monument
    NP      National Park
    NR      National River
    NRA     National Recreational Area
    NS      National Seashore
    NWR     National Wildlife Refuge
    NWSC    Naval Weapons Support Center
    PP      Provincial Park
    Pt.     Point
    R.      River
    RA      Recreational Area
    SF      State Forest
    SFWA    State Fish & Wildlife Area
    SGA     State Game Area
    SGP     State Game Preserve
    SGR     State Game Reserve/Refuge
    SP      State Park
    SRA     State Recreational Area
    SWR     State Wildlife Refuge
    Twp.    Township
    WA      Wildlife Area
    WMA     Wildlife Management Area
    WR      Wildlife Refuge
    WS      Wildlife Sanctuary
In order that circle records will sort consistently by locality name, abbreviations should not be used at the beginning of the name. For example, ``Point Pelee,'' not ``Pt. Pelee;'' ``Mount Olive,'' not ``Mt. Olive.'' The exception to this rule is ``St.'' for ``Saint:'' ``St. Louis,'' not ``Saint Louis.'' Also, words that are a significant part of a name should not be abbreviated, for example ``Salt Lake City'' instead of ``Salt L. City,'' and ``Rocky Mountain NP,'' not ``Rocky Mtn. NP.''
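The rule about leading abbreviations could be checked mechanically when names are entered. The sketch below expands only a few leading abbreviations taken from the table above and deliberately leaves ``St.'' alone; the function name and the choice of which abbreviations to expand are assumptions, not part of the database software.

```python
# Leading abbreviations to spell out; "St." is intentionally absent
# because the convention keeps "St." for "Saint".
LEADING_EXPANSIONS = {"Pt.": "Point", "Mt.": "Mount", "Ft.": "Fort"}


def expand_leading_abbreviation(name):
    """Spell out an abbreviation at the start of a locality name."""
    first, separator, rest = name.partition(" ")
    if first in LEADING_EXPANSIONS:
        return LEADING_EXPANSIONS[first] + separator + rest
    return name
```

Abbreviations later in the name, such as the NP in ``Rocky Mountain NP,'' are left as they are, since only the first word is examined.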