At the New Mexico Tech Computer Center, logs of all Web page accesses are maintained in directory /u/www/logs.
The server continuously appends to the file access_log. The format of this file is described in the Apache Web server documentation at http://www.apache.org/docs/mod/mod_log_config.html, under "Common Log Format."
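As an illustration of the Common Log Format, here is a minimal sketch of a line parser. The function name parse_clf_line and the regular expression are assumptions made for this example, not part of weblog itself; the field order (host, ident, user, timestamp, request, status, size) follows the Apache documentation. The sketch does not handle request strings that contain escaped quotes.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of fields from one access_log line, or None if malformed."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A size of "-" means no bytes were sent
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields
```

Returning None for malformed lines lets a caller skip damaged records instead of aborting a nightly run.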
Every night around midnight, a cron job runs that compresses this file and rotates it through seven filenames so that a full week's data is always available. The filenames are access_log.1.gz, access_log.2.gz, and so forth up through access_log.7.gz. The .1.gz is yesterday's, the .2.gz is from the day before yesterday, and so on.
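The rotation scheme can be sketched as follows. The helper names (rotated_log_path, log_date, read_rotated_log) are hypothetical, and the mapping from file number to calendar date is only approximate, since the rotation happens "around" midnight rather than exactly at it. The latin-1 encoding is likewise an assumption, chosen because it never fails on arbitrary bytes.

```python
import gzip
from datetime import date, timedelta

LOG_DIR = "/u/www/logs"  # log directory named in the text


def rotated_log_path(days_ago):
    """Path of the compressed log from `days_ago` days back (1 through 7)."""
    if not 1 <= days_ago <= 7:
        raise ValueError("only seven rotated logs are kept")
    return f"{LOG_DIR}/access_log.{days_ago}.gz"


def log_date(days_ago, today=None):
    """Approximate calendar date covered by access_log.<days_ago>.gz."""
    if today is None:
        today = date.today()
    return today - timedelta(days=days_ago)


def read_rotated_log(path):
    """Yield the lines of a gzip-compressed log file, one at a time."""
    with gzip.open(path, "rt", encoding="latin-1") as log_file:
        for line in log_file:
            yield line.rstrip("\n")
```

Reading through gzip.open avoids writing an uncompressed copy of the log to disk.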
However, weblog must maintain at least a month's statistics. Moreover, the log files are large, even when compressed.
Hence, part of weblog's job is to summarize the data from the log files and write its own database. Summarizing the data reduces disk space requirements, and the database is kept elsewhere so that the timespan can be increased to the required length of a month.
Accordingly, the weblog program will have this intended function:
[ database := database - expired-records + new log file
  web-pages := reports summarizing (database - expired-records + new log file) ]
so that it can be run as a cron job, taking its input from the (unzipped) access_log.1.gz file, discarding any records older than a month, adding records from the latest daily log, and updating the web pages that show its results.
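The intended function above can be sketched in a few lines. The database layout used here (a dict mapping each date to a Counter of hits per page) is purely an assumption for illustration, as are the names update_database, summarize, and RETENTION_DAYS; the actual weblog database format is not specified in this section.

```python
from collections import Counter
from datetime import date, timedelta

RETENTION_DAYS = 31  # assumption: "a month" taken as 31 days


def update_database(database, new_day, new_counts, today):
    """Return database - expired-records + new log file.

    `database` maps a date to a Counter of hits per page (hypothetical layout).
    The input mapping is left unmodified.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    # Discard any day older than the retention window
    kept = {day: counts for day, counts in database.items() if day >= cutoff}
    # Add the summary of the latest daily log
    kept[new_day] = Counter(new_counts)
    return kept


def summarize(database):
    """Total hits per page over every day still in the database."""
    totals = Counter()
    for counts in database.values():
        totals.update(counts)
    return totals
```

A nightly run would call update_database with the summary of access_log.1.gz and then build the report pages from summarize(...) on the result.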