Before we examine the actual noteweb script, a few comments on data structures and algorithms are in order.
Because of the need for navigational links between pages, we can't just go out and find monthly XML files and immediately convert them to HTML. Each monthly page must have previous and next navigational links. So, when we build a monthly page, we need to know which month (if any) is the previous one in sequence, and which is the next in sequence. There is no guarantee that every month has a valid input file. There might even be years with no valid input files.
Therefore, the first thing we have to do is read all the
XML files, rendering each one into a birdnotes.BirdNoteSet instance. Then we can work
through these instances, converting each to an HTML page in
the same subdirectory where we found the XML input file.
(Note that keeping all these BirdNoteSet
instances around may eat up a lot of memory. If that is
ever a problem, we'll just have to make two passes: one to
see which files are valid, and another to render them,
so that we don't have to keep the entire data set in memory
at once.)
We must also generate the index page, with a table of links to all the months. Each row in this table represents one year and contains the links for that year's months. There is, however, no guarantee that years are contiguous. We just look to see what year directories are present, and that determines the set of table rows.
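As a rough illustration of looking to see what year directories are present, here is a minimal sketch. The function name find_year_dirs and the assumption that year directories have four-digit names are hypothetical; the actual script may identify them differently.

```python
import os
import re

# Assumption: a year directory is one whose name is exactly four digits.
YEAR_PAT = re.compile(r"[0-9]{4}")

def find_year_dirs(root):
    """Return the names of the year subdirectories of root, oldest first."""
    names = [
        name for name in os.listdir(root)
        if YEAR_PAT.fullmatch(name)
        and os.path.isdir(os.path.join(root, name))
    ]
    # Four-digit names sort chronologically as plain strings.
    return sorted(names)
```

Gaps between years need no special handling here: a year with no directory simply never appears in the result, so it never gets a table row.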
The above conditions suggest a data structure made from instances of three classes:
One YearCollection instance contains
everything we need to build the index page.
Because this instance contains all the input data, it can figure out which months the previous and next navigational links should point to for a given month.
The YearCollection instance is a
container for YearRow instances, one for
each year for which there is an input data directory.
Each YearRow instance has all the
information needed to build one row of the index table.
Each YearRow instance is a container for
up to twelve MonthCell instances.
Each MonthCell instance has all the
information about one month for which there is an input
XML file, and has everything needed to build the
monthly HTML page.
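The containment relationships among the three classes might be sketched as follows. Only the class names come from the design above. The attribute and method names (month, note_set, add_month, months, add_year, years) are hypothetical, and None stands in for a real birdnotes.BirdNoteSet instance.

```python
class MonthCell:
    """One month for which there is an input XML file."""
    def __init__(self, month, note_set):
        self.month = month        # month number as a string, e.g. '04'
        self.note_set = note_set  # stand-in for a birdnotes.BirdNoteSet

class YearRow:
    """One row of the index table: up to twelve MonthCell instances."""
    def __init__(self, year):
        self.year = year          # year as a string, e.g. '2004'
        self._cells = {}          # month number -> MonthCell

    def add_month(self, cell):
        self._cells[cell.month] = cell

    def months(self):
        """Return the MonthCell instances in calendar order."""
        return [self._cells[m] for m in sorted(self._cells)]

class YearCollection:
    """Container for one YearRow per year that has an input directory."""
    def __init__(self):
        self._rows = {}           # year -> YearRow

    def add_year(self, row):
        self._rows[row.year] = row

    def years(self):
        """Return the YearRow instances in chronological order."""
        return [self._rows[y] for y in sorted(self._rows)]

# Usage: one year containing two months, added out of order.
collection = YearCollection()
row = YearRow("2004")
row.add_month(MonthCell("11", None))
row.add_month(MonthCell("04", None))
collection.add_year(row)
```

Keying each container by a zero-padded string means that a plain sort yields chronological order, and a missing month or year is simply an absent key.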
The first cut at an overall data structure was a list
named yearList containing YearRow instances. Each YearRow
would be a container for BirdNoteSet
instances, one per valid month.
However, there are certain things we need to know about
the months, such as the month number (e.g., '04'). So the MonthCell class
was invented, with an instance holding one BirdNoteSet and ancillary information such as
the month number and the file name of the month page.
Each YearRow would then be a container for
MonthCell instances, one per valid month.
The next problem was connecting up the navigation links
between month pages. Clearly, when rendering a month
page, we must know the URL of the previous month page (if any) and the
next month page (if any).
However, how does the MonthCell instance
know these URLs? Three approaches were considered:
Let the MonthCell class have
attributes that hold the previous/next links;
initialize them to None.
Then, after all the MonthCell
instances are created, make a serial pass through
them and link them up into a bidirectional linked
list.
Finally, render each MonthCell into
HTML in any old order, using the stored previous/next
attributes to set up its navigation.
The aesthetic objection to this approach was that the
MonthCell objects depend on an
external mechanism to set up their linking.
Logically, the information required to find a month's
neighbors should reside at a higher organizational
level.
Define a global function called something like neighbors() that finds the previous/next
neighbors, given a year and month. This function
would walk through the yearList,
starting at the given month, and working backwards
and forwards to find the nearest neighbors.
The objection to this approach is that the yearList must then be global, or be passed
around through many levels.
At this point, the author felt that a third class was
called for: YearCollection, which
manages the overall sequence of years and months.
It can traverse the years in reverse chronological order to build the index table with the most recent years first.
It is the obvious place for logic that can find the neighbors of any month.
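To make the third approach concrete, here is a simplified sketch of the two jobs assigned to YearCollection. It is not the actual implementation: for brevity it tracks bare (year, month) string pairs rather than YearRow and MonthCell instances, and the method names add_month, years_descending, and neighbors are assumptions.

```python
class YearCollection:
    """Simplified stand-in that tracks which (year, month) pairs exist."""
    def __init__(self):
        self._months = {}   # year string -> set of month-number strings

    def add_month(self, year, month):
        self._months.setdefault(year, set()).add(month)

    def years_descending(self):
        """Return the years, most recent first, for the index table."""
        return sorted(self._months, reverse=True)

    def neighbors(self, year, month):
        """Return (previous, next) for a month that is in the collection.

        Each element is a (year, month) tuple, or None at either end
        of the sequence. Raises ValueError for an unknown month.
        """
        # Flatten everything into one chronologically sorted sequence;
        # zero-padded strings sort correctly as text.
        sequence = sorted(
            (y, m) for y, months in self._months.items() for m in months
        )
        pos = sequence.index((year, month))
        previous = sequence[pos - 1] if pos > 0 else None
        following = sequence[pos + 1] if pos + 1 < len(sequence) else None
        return previous, following

# Usage: 2004 is missing entirely, so the neighbors skip over it.
years = YearCollection()
years.add_month("2003", "11")
years.add_month("2005", "02")
years.add_month("2005", "04")
```

Because the lookup lives in the container, a month page can be rendered without any global list and without the month objects having been linked up beforehand.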