weblog: A web access logging tool
Two files contain the code for this script:
The run file contains the body of the script.
File weblog.py contains three classes and a function that are used by the run script.
Overall, the script has these steps:
Read our database file, weblog_db, containing the usage data from previous days. This database is represented as a WebAccessDB object (see weblog.py for the definition of this class). The WebAccessDB object is stored in variable db.
On the first run of the script, the database file doesn't exist yet. We check for it with os.path.exists(); if it is missing, we call the WebAccessDB constructor with a value of None for the file name, which produces an empty WebAccessDB object.
The logic for this step is contained in function openDatabase().
Read the new log entries from the standard input stream, adding them to the database object db.
This step corresponds to function addLog().
Write the database back out to the file. This step corresponds to function writeDatabase().
Write three web pages into directory /u/www/docs/tcc/webstats/. The first page is homepage.html, a short page that shows the time range and the total on- and off-campus accesses. The other two pages are byurl.html, which lists all accesses alphabetized by URL, and byhits.html, which shows the same data sorted in descending order of number of hits.
The logic for this step is in function writeWeb().
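The first three steps can be sketched as follows. This is only a minimal sketch in modern Python, not the actual run file: the WebAccessDB class here is a hypothetical stand-in that just stores raw lines, its addLine() and write() methods are assumed names, and the function parameters are our own additions for illustration. Only the function names, the file name weblog_db, and the use of os.path.exists() and None come from the description above.

```python
import os
import sys

DB_FILE = "weblog_db"  # database file name given in the text


class WebAccessDB:
    """Hypothetical stand-in for the real class in weblog.py.

    It only stores raw lines, so that the control flow can be shown.
    """

    def __init__(self, fileName=None):
        self.lines = []
        # A file name of None yields an empty database.
        if fileName is not None:
            with open(fileName) as f:
                self.lines = [line.rstrip("\n") for line in f]

    def addLine(self, line):  # assumed method name
        self.lines.append(line)

    def write(self, fileName):  # assumed method name
        with open(fileName, "w") as f:
            for line in self.lines:
                f.write(line + "\n")


def openDatabase(fileName=DB_FILE):
    """Return the stored database, or an empty one on the first run."""
    if os.path.exists(fileName):
        return WebAccessDB(fileName)
    return WebAccessDB(None)


def addLog(db, stream):
    """Add each non-blank line of stream to the database."""
    for line in stream:
        line = line.rstrip("\n")
        if line:
            db.addLine(line)


def writeDatabase(db, fileName=DB_FILE):
    """Write the database back out to its file."""
    db.write(fileName)


def main():
    db = openDatabase()
    addLog(db, sys.stdin)
    writeDatabase(db)
    # The real script would call writeWeb() here to generate the
    # three HTML pages; that step is omitted from this sketch.
```

Under these assumptions, calling main() with an access log on standard input would fold the new entries into weblog_db, creating the file on the first run.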
The weblog.py module contains three classes used by the weblog application.
Class WebAccessDB is a container class for all the data we're keeping about web accesses. It segregates the data by date, so that we can discard records older than we want to keep. It can also write itself to a file, and its constructor can read that file back in.
This object also keeps track of the total number of near (on-campus) and far (off-campus) accesses that it contains, as well as timestamps for its oldest and newest access.
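To make the idea of a date-segregated container concrete, here is a minimal sketch. It is not the real WebAccessDB: the class name AccessStore, its methods, and its internal layout are all assumptions chosen for illustration. It keeps one record per calendar date, so that whole days can be dropped, and it derives the near and far totals and the oldest and newest timestamps from whatever days remain.

```python
from datetime import date, datetime


class AccessStore:
    """Hypothetical sketch of a container that groups accesses by date."""

    def __init__(self):
        # Maps a date to {"first": datetime, "last": datetime,
        #                 "pages": {url: [nearCount, farCount]}}
        self.days = {}

    def add(self, when, url, isNear):
        """Record one access at datetime `when`."""
        day = self.days.setdefault(
            when.date(), {"first": when, "last": when, "pages": {}}
        )
        day["first"] = min(day["first"], when)
        day["last"] = max(day["last"], when)
        counts = day["pages"].setdefault(url, [0, 0])
        counts[0 if isNear else 1] += 1

    def totals(self):
        """Return (near, far) totals over all stored days."""
        near = far = 0
        for day in self.days.values():
            for n, f in day["pages"].values():
                near += n
                far += f
        return near, far

    def timeRange(self):
        """Return (oldest, newest) timestamps, or (None, None) if empty."""
        if not self.days:
            return None, None
        oldest = min(d["first"] for d in self.days.values())
        newest = max(d["last"] for d in self.days.values())
        return oldest, newest

    def discardBefore(self, cutoff):
        """Drop every day strictly earlier than the date `cutoff`."""
        for key in [k for k in self.days if k < cutoff]:
            del self.days[key]
```

Because the totals and the time range are computed from the stored days, they stay correct after old days are discarded. A real implementation might instead cache them and update them incrementally.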
Class PageUsage contains a summary of one or more page accesses. Its contents include the URL of a page and counters for the number of near and far accesses.
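A per-page summary of this kind, together with the two orderings used by byurl.html and byhits.html, might look like the sketch below. The class name PageUsageSketch, the helper functions byUrl() and byHits(), and the definition of "hits" as near plus far accesses are all assumptions; the real PageUsage class may differ.

```python
from dataclasses import dataclass


@dataclass
class PageUsageSketch:
    """Hypothetical summary of the accesses to one URL."""

    url: str
    near: int = 0  # on-campus accesses
    far: int = 0   # off-campus accesses

    @property
    def hits(self):
        # Assumption: a page's hit count is near + far.
        return self.near + self.far


def byUrl(pages):
    """Return the pages in alphabetical order by URL."""
    return sorted(pages, key=lambda p: p.url)


def byHits(pages):
    """Return the pages in descending order of hit count."""
    return sorted(pages, key=lambda p: p.hits, reverse=True)
```

Since sorted() is stable, pages with equal hit counts keep their original relative order in byHits().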
Class PageGet represents one line from the access log file (located in /u/www/logs/access_log). It encapsulates the logic for scanning log file lines and determining whether they are near or far accesses.
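The text does not say what format the access log uses or how near and far accesses are told apart, so the following sketch rests on two loudly stated assumptions: that lines follow the NCSA Common Log Format, and that an access is "near" when the client host name ends with a campus domain suffix. The suffix .example.edu, the class name LogLine, and its attribute names are placeholders, not taken from weblog.py.

```python
import re
from datetime import datetime

# Hypothetical campus domain; the real rule is not given in the text.
CAMPUS_SUFFIX = ".example.edu"

# Assumed Common Log Format:
#   host ident user [timestamp] "METHOD url protocol" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


class LogLine:
    """Hypothetical sketch of one parsed access-log line."""

    def __init__(self, line):
        m = LOG_PATTERN.match(line)
        if m is None:
            raise ValueError("unrecognized log line: %r" % line)
        self.host = m.group("host").lower()
        self.url = m.group("url")
        self.status = int(m.group("status"))
        # Example timestamp: 10/Oct/2000:13:55:36 -0700
        self.when = datetime.strptime(
            m.group("when"), "%d/%b/%Y:%H:%M:%S %z"
        )
        # Assumption: on-campus hosts share a common domain suffix.
        self.isNear = self.host.endswith(CAMPUS_SUFFIX)
```

If the log records numeric IP addresses rather than host names, the near/far test would need to compare address prefixes instead of a domain suffix.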