Module webstats.py is the top-level script for this application.
The main script starts with a brief prologue. This is major version 4.0.
The first version was discarded when the logs got too large
for in-memory data structures. It was called weblog long before the term
“weblog” was current.
The second version was obsoleted by changes in the
configuration of the web server, and the author found
the documentation badly out of sync with the code.
It was called webstats, but a brief Google search
suggests that this name may infringe on an existing
trademark.
The third version was a fairly thorough rewrite. Besides serving as another literate programming example, which should make the program more maintainable, it fixed a few minor problems, such as the failure to correctly parse access log entries that contain escaped double-quote characters inside double-quoted strings. The web pages were also generated in XHTML, using the techniques described in Python and the XML Document Object Model (DOM).
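The escaped-quote problem mentioned above can be illustrated with a small sketch. This is not the parser used in any version of this program; it assumes only that quoted fields follow the usual access-log convention of backslash-escaping an embedded double quote. The names QUOTED_FIELD and quoted_fields, and the sample log line, are made up for illustration.

```python
import re

# One double-quoted field. Inside the quotes, accept either an ordinary
# character (anything except a quote or a backslash) or a backslash
# followed by any single character, so that \" does not end the field.
QUOTED_FIELD = re.compile(r'"((?:[^"\\]|\\.)*)"')

def quoted_fields(line):
    """Return the raw contents of each double-quoted field in line."""
    return [m.group(1) for m in QUOTED_FIELD.finditer(line)]

# Hypothetical log entry whose user-agent field contains escaped quotes.
sample = (r'192.0.2.1 - - [10/Jan/2009:00:00:01 -0700] '
          r'"GET /a HTTP/1.0" 200 512 "-" "Bot \"x\" 1.0"')
fields = quoted_fields(sample)
```

A naive pattern such as "[^"]*" would stop at the first embedded quote and split the last field in two; the alternation on backslash-escaped characters is what keeps it whole. The escapes are left in place, so a caller that wants the literal text would still need to unescape each field.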
The fourth version was triggered by the upgrade of the
infohost server in January 2009. On
this server, the access logs were rotated weekly, not
daily. By this point, main memory was
large enough to hold all the input data,
obviating the need for the external
sort-merge representation of access data used in the
previous major version.
The author chose a fairly thorough rewrite in order to improve the navigation: the previous version had one huge page with links to all personal and official reports, and moving to a thumb-index by first letter makes it quicker for a user to find their report.
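The thumb-index idea can be sketched as a grouping of report names by first letter. This is only an illustration under assumed inputs, not the code from this program: the function name thumb_index, the "#" bucket for names that do not start with a letter, and the sample names are all hypothetical.

```python
from collections import defaultdict

def thumb_index(names):
    """Group report names under the upper-cased first letter of each name.

    Names that are empty or do not start with a letter go under "#"
    (an assumption made for this sketch).
    """
    groups = defaultdict(list)
    for name in sorted(names, key=str.lower):
        # name[:1] is safe on an empty string, unlike name[0].
        key = name[0].upper() if name[:1].isalpha() else "#"
        groups[key].append(name)
    # Return the buckets in index order.
    return dict(sorted(groups.items()))

index = thumb_index(["shipman", "Adams", "allen", "2009-report"])
```

Each key would then become one entry in the thumb-index, linking to a much shorter page than the single huge list of the previous version.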
Python's XML libraries have also improved a lot since
the previous version. The lxml library and the etbuilder.py module, the latter based on Fredrik Lundh's work, greatly
simplify and clarify the generation of web pages.