Next / Previous / Contents / Shipman's homepage

3.2. The spatial dimension

Space: Where was the bird seen? There is a lot of room for tradeoffs here. Ultimately it would be a fine thing to have a GIS (Geographic Information System), with its highly flexible mechanism for representing points and larger areas in space.

Setting up a GIS takes a lot of time and effort. What we need for a first approximation is a less formal system that will not create too many problems if the geography is formalized more in the future.

3.2.1. The locality hierarchy

For this system, we'll represent locality as three levels:

  1. The region code is the code for the state or province. This code will be the U.S. postal code for U.S. states (e.g., NM for New Mexico). We won't worry about countries yet (since the author has done no birding abroad), but it should be straightforward to add region codes for foreign countries, and infer the nationality from the region code.

  2. The location name is a single string describing the location within the region. It may be very broad (e.g., “Roosevelt Co.” for an entire county) or more specific (the name of a town).

    A good way to add more flexibility to location names is to use an informal but hierarchical structure, enumerating a series of areas from larger to smaller, separated by colons. Example: “Bosque del Apache: Headquarters”. Three or more levels might be used.

  3. Sometimes you need to describe the locality very precisely. For this situation, we'll allow an optional location detail level, consisting of unstructured English text as needed. For example, “In the northeast corner there are some rusted oil drums that have standing water on them from the recent rains. The birds were coming in here to drink.”

The region code is required, because both scientific and recreational consumers of the data need to know in what state the bird occurred. The location name will do most of the rest of the work for most of the cases, with location detail being added where necessary.

Because location names can be rather lengthy, we resort to a formalized shorthand code for the location names used within a single day's records. This location code is used to save typing in the bulk of records. For example, for the location name “Bosque del Apache NWR” we could use the code BdA. Such codes must start with a letter and may contain only letters, digits, hyphen (“-”), and underbar (_).

3.2.2. GPS coordinates

Coordinates obtained from a Geographical Positioning System (GPS) are extremely useful in nailing down the exact locality of bird sightings, road junctions, and such.

In order to represent GPS coordinates precisely with a minimum of fuss, we use an encoding system for waypoints discussed fully in A Python mapping package.

Here's an excerpt from that document that describes what we call flexible angle format:

  • Numbers with 1, 2 or 3 digits are assumed to be in degrees. For example, “34” is assumed to be 34°, and “107” means 107°.

  • Numbers with 4 or 5 digits are assumed to be in the format DDMM or DDDMM, where DD or DDD is degrees and MM is minutes. So, for example, 3456 means 34° 56', and 10703 means 107° 3'.

  • Numbers with 6 or 7 digits are assumed to have be in format DDMMSS or DDDMMSS, where SS is seconds. Examples: 341638 means 34° 16' 38", and 1075301 means 107° 53' 1".

  • A trailing decimal point and fraction are always allowed. The fraction is assumed to be in the same units as the smallest unit to the left of the decimal point. For example, “3401.57” is interpreted as 34° 01.57'.

The syntax of a waypoint is:

    lat{n|s} lon{e|w}

That is, start with the latitude using the flexible angle format described above, followed by n or s for north or south latitude, followed (optionally) by whitespace, followed by the longitude in flexible angle format, then e or w for east or west longitude.

Here are some examples:

5130s0007e 51°30' S. Lat., 0°7' E. Long.
34n 107w 34° N. Lat., 107° W. Long.
34.0242n 107.1811w 34.0242° N. Lat., 107.1811° W. Lat.