Christmas Bird Count Database specification

John W. Shipman, john@nmt.edu
Zoological Data Processing
507 Fitch Avenue NW
Socorro, NM 87801
(505) 835-0235
Homepage: http://www.nmt.edu/~shipman


1. Introduction

The Christmas Bird Count (CBC) censuses, published in the periodicals Audubon Field Notes and American Birds, have been taken since 1900. As a source of long-term population data, they have few peers; however, their utility in printed form is limited.

This document describes a system for representing the data from these censuses in a general computer database form.

Most years' data (1st-62nd and 73rd-90th CBCs) were entered by Zoological Data Processing. Data for the 63rd-72nd CBCs were obtained from Carl Bock, from old tapes produced by his project at the University of Colorado. Data for the 91st and succeeding CBCs were provided by the National Audubon Society from their publishing operation. There are some minor differences between these sources in the way data were captured; the record layout narratives below note the differences between the years.

1.1 Scope

This version is limited to the data from the U.S.A. and Canada (including the French islands of St.-Pierre and Miquelon). Inclusion of the data from other countries would require significant modifications to the method of encoding bird names.

1.2 Definitions

The word circle refers to aspects of one count that are generally unchanged from year to year: the count area's latitude and longitude, its geographic name, and so forth. The term count refers to one year's census of that circle.

The term standard refers to those names and coordinates that are used in the computer database. The term as-published refers to the names and coordinates that appear in the published counts.

1.3 Typographic conventions

Actual computer file names, field names, and field contents (character strings as entered and stored) are shown in typewriter type.

The symbol "_" is used to represent a space character.

2. Nomenclatural base

For proper interpretation of the six-letter codes used to describe types of birds, the CBC database depends on an infrastructure of files that describe all the possible bird types from the census file. This system is described in a separate document, "A system for representing taxonomic nomenclature".

3. General structure of the database

Before we discuss the layout of the database, it is relevant to bring up a sticky problem of organization that affects the design.

One of the most persistent problems in the design of this database is how to treat circles whose centers have shifted slightly one or more times. When the center moves only a minute or two of latitude or longitude, for the purposes of most researchers the censuses from the different centers might as well be treated as from the same location.

However, sometimes even a tiny shift can make a huge difference in the habitat coverage. For example, if a circle containing no open water is shifted so that it includes a reservoir, any conclusions based on sudden increases in waterfowl populations may be unwarranted.

For this reason, I think it would be best if the user of the database has a choice as to whether to lump overlapping circles or not.

An earlier version of this database used the concatenation of the latitude and longitude fields as the link that connected census data to circle and effort data. However, in practice this meant that a number of common operations (such as correcting center coordinates, or deciding whether or not to lump overlapping circles) required a lengthy pass through many megabytes of census data to change the link values.

The solution is to use a single identifier called the countId field to identify each year's count of a circle.

3.1 The countId field

We need to identify each count, that is, each set of data for a single circle counted in a particular year. We call this identifier the countId.

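The published counts have used two different schemes for identifying the counts within a year: a sequence number, sometimes with a letter suffix (as in #947A), in the older counts; and a state or province code followed by a letter code for the circle (as in NM, ZU) in the modern counts.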

These identifiers alone are not sufficient to identify a single count. Count #1225 might be Modoc, CA in one year and Malibu, CA in another year.

So, we define an aggregate called the countId field which consists of the CBC number, left-zero-padded to three digits, followed by the circle's identifier.

In the older counts entered by Shipman (and the data from the 63rd-72nd CBCs converted from the Bock project), the countId has the form

    yyynnnna
where yyy is the CBC (year) number (with left zero fill), nnnn is the sequence number (also with left zero fill), and the a column is available for letter suffixes (e.g., 127A), but is usually blank.

In data derived from the publisher of the modern counts (Clinchy Associates), the countId has the form

    yyyssii_
where yyy is the CBC number, ss is the state or province code, and ii is the identifying letter code of the circle.

Here are some examples of countId fields:

  0760034_  76th CBC, count #34.
  0830947A  83rd CBC, count #947A.
  104NMZU_  104th CBC, circle ZU in state NM.
Note that the countId field can be used to sort circles into their published order (at least within each state).
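
The two layouts above can be told apart mechanically. The following is a minimal sketch, not part of the database software; the function names are hypothetical, and it assumes that sequence-number identifiers have digits in the nnnn columns while state or province codes are letters. Note that the code uses a real space where this document prints "_".

```python
def parse_count_id(count_id):
    """Split an 8-character countId into its parts.

    Assumption: columns 4-7 are all digits for the yyynnnna form,
    and contain letters for the yyyssii_ form.
    """
    if len(count_id) != 8:
        raise ValueError(f"countId must be 8 characters: {count_id!r}")
    cbc = int(count_id[:3])      # yyy: CBC number, left zero fill
    body = count_id[3:7]         # nnnn or ssii
    if body.isdigit():
        # Older form: sequence number plus optional letter suffix
        return {"cbc": cbc, "number": int(body), "suffix": count_id[7].strip()}
    # Modern form: state/province code plus circle letter code
    return {"cbc": cbc, "state": body[:2], "circle": body[2:]}


def make_sequence_id(cbc, number, suffix=""):
    """Build a yyynnnna countId; a missing suffix becomes one space."""
    return f"{cbc:03d}{number:04d}{suffix:<1}"


print(parse_count_id("0830947A"))
print(parse_count_id("104NMZU "))
print(repr(make_sequence_id(76, 34)))
```

Because the CBC number comes first and is zero-filled, a plain string sort of these identifiers groups the counts by year before anything else.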

3.2 How the database is linked

The countId field, then, uniquely identifies one circle counted in one year. So this field is the key that relates the various files of the database.

The std file serves as the pivot, relating census and effort records to a particular circle name and center. Since the census records are by far the biggest part of the database, omitting the lat-long of the circle center from those records means that we can change the effective location of census records simply by using a different std file.

We can, for example, have one version of the std file that does not lump circles, no matter how tiny the shift. This version would be good for waterfowl studies, where a tiny shift might drastically change the inclusion of waterfowl habitat. We could have another version that lumps all circles that overlap at least 10%, and another version that lumps all circles that overlap at least 50%. We can even set up special std files that lump all the counts in a particular bioregion.
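
To illustrate the idea of interchangeable std files, here is a small sketch in which each std version is just a mapping from countId to a lat-long key. The key format and the coordinates are invented for the example, and the variable and function names are hypothetical; only the principle (same counts, different grouping) comes from the text above.

```python
from collections import defaultdict

# Hypothetical std contents: countId -> lat-long key of the circle center.
# In the unlumped version, a one-minute shift yields two distinct circles.
STD_UNLUMPED = {
    "0760034 ": "3404N10654W",
    "0770035 ": "3405N10654W",
}
# In the lumped version, both counts point at a single center.
STD_LUMPED = {
    "0760034 ": "3405N10654W",
    "0770035 ": "3405N10654W",
}


def counts_by_circle(std):
    """Group countIds under the circle that a given std version assigns them."""
    groups = defaultdict(list)
    for count_id, latlong in std.items():
        groups[latlong].append(count_id)
    return dict(groups)


print(counts_by_circle(STD_UNLUMPED))   # two circles, one count each
print(counts_by_circle(STD_LUMPED))     # one circle holding both counts
```

The census data are untouched in both cases; only the small mapping changes.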

So, here are the six principal files of the CBC database and their relationships:

  1. The cir table describes each center, and the name that goes with it, along with other permanent features like codes for the bioregion. This table is keyed by lat-long (that is, the latitude concatenated with the longitude).
  2. The std table relates each countId key with a lat-long key, so it connects the cir table with all the other tables.
  3. Each record of the eff table describes one year's count of a single circle. It gives the number of observers, effort data, and other fields such as date. This table is keyed by countId.
  4. The wea table gives weather data, where available. It too is keyed by countId.
  5. Each record of the aspub table describes the name and location of one year's count as published. This file tracks changes in name and nominal coordinates, including even errors, so long as it was published that way. This table is also keyed by countId.
  6. For each form of bird printed in the body of a circle, there is one census (cen) record, keyed by countId.
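
The relationships in this list amount to a two-step lookup: from a census record to the std table by countId, then from std to the cir table by lat-long. The sketch below models each table as an in-memory dictionary. All field names other than countId, the bird code "baleag", and the sample values are placeholders chosen for illustration, not the actual record layouts.

```python
# Placeholder tables; real records have the fields given in the file layouts.
cir = {"3405N10654W": {"name": "Example Circle"}}          # keyed by lat-long
std = {"104NMZU ": "3405N10654W"}                          # countId -> lat-long
eff = {"104NMZU ": {"observers": 12}}                      # keyed by countId
cen = [{"countId": "104NMZU ", "form": "baleag", "number": 3}]


def resolve(cen_record):
    """Attach circle and effort information to one census record."""
    count_id = cen_record["countId"]
    latlong = std[count_id]              # step 1: countId -> lat-long key
    circle = cir[latlong]                # step 2: lat-long key -> circle
    return {
        "circle": circle["name"],
        "latlong": latlong,
        "observers": eff[count_id]["observers"],
        "form": cen_record["form"],
        "number": cen_record["number"],
    }


print(resolve(cen[0]))
```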

4. File layouts

All the files produced for this project are "flat files," meaning that each field has a fixed size. This form was chosen because it is easy to import into almost any database system.
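
Reading a fixed-size layout comes down to slicing each line at known column positions. The following sketch shows one way to do this; the layout shown (field names and widths other than the 8-character countId) is invented for the example and should be replaced by the actual sizes from the layouts in this section.

```python
# Hypothetical layout as (field name, width) pairs, in file order.
EXAMPLE_LAYOUT = [("countId", 8), ("form", 6), ("number", 6)]


def parse_fixed(line, layout):
    """Slice one flat-file line into a dict of raw (unstripped) field strings."""
    record = {}
    pos = 0
    for name, width in layout:
        record[name] = line[pos:pos + width]
        pos += width
    return record


line = "104NMZU baleag     3"
record = parse_fixed(line, EXAMPLE_LAYOUT)
print(record)
print(int(record["number"]))   # numeric fields need explicit conversion
```

Fields are returned unstripped on purpose, since trailing spaces can be significant (as in the countId).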

4.1 Circle file, cir

Every area that has ever been counted will have at least one corresponding circle record in file cir, defining its location. In those cases where the center has been moved one or more times for a distance of more than 1 minute of latitude or longitude, there may be multiple circle records for a count of a given name.

Only the "standard" coordinates and names are defined in this file. We have tried to use the most recent names and the most accurate map-checked coordinates whenever possible, but in many cases the coordinates are an estimate or an outright guess. We welcome any additional information that may help us in establishing the true locations of the counts; please forward any such information in writing to the author.

Here are the field sizes, field names, and the descriptions of their contents.

4.2 Standard linkage file, std

The std file stands between the circle file and the effort and census data, representing our current best guesses about the exact association between bird sightings and locality names and coordinates.

Fields are:

4.3 Effort file, eff

Each effort record corresponds to one year's counting of a circle. Fields:

4.4 Weather file, wea

Weather data is separated from effort data because it is recorded for relatively few years: just the Bock data (63rd-72nd CBCs).

4.5 Census file, cen

Each form of bird (species or not) mentioned in the body of a count results in one body (census) record. Furthermore, for records where multiple genders or age classes are mentioned, each different combination of gender and age is encoded in a separate record.

Warning! One implication of the above paragraph is that applications programs cannot expect there to be only one record for a given form within a given count. For example, there might be three Bald Eagle records within one count: one record each for adults, immatures, and birds of unknown age. Given all the different age and sex and other codes, there may be many! Therefore, applications programs that want to extract species totals must find all matching records, and then total them.
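
As a sketch of the totaling step described in the warning, the following groups census records by count and form and sums them. The field names "form", "age", and "number", the code "baleag", and the sample figures are placeholders for illustration, not the actual census record layout.

```python
from collections import defaultdict


def species_totals(cen_records):
    """Sum all census records that share the same countId and form."""
    totals = defaultdict(int)
    for rec in cen_records:
        totals[(rec["countId"], rec["form"])] += rec["number"]
    return dict(totals)


# Three hypothetical Bald Eagle records within one count:
# adults, immatures, and birds of unknown age.
records = [
    {"countId": "0760034 ", "form": "baleag", "age": "a", "number": 2},
    {"countId": "0760034 ", "form": "baleag", "age": "i", "number": 1},
    {"countId": "0760034 ", "form": "baleag", "age": "",  "number": 4},
]

print(species_totals(records))
```

A program that stopped at the first matching record here would report 2 birds rather than 7.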

Fields of the census record are:

4.6 As-published circle data, aspub

In order to aid in tracking name and center changes (and typographical errors, some of which persist for many years), aspub tracks as closely as possible the actual published center coordinates and circle names used in each year. This file should be proofed carefully against the published circle accounts.

Exception: in cases where the published latitude and longitude have obviously been transposed, the aspub file need not track this typo. In many cases, the coordinates would not physically fit in the fields anyway.

Fields are:


Last updated: 2010/03/10 19:55:50
URL: http://www.nmt.edu/~shipman/z/cbc/db_spec.html