
51. The pageget.py module: Apache log file functions

All the machinery that deals with the Apache access log file (access_log) lives in module pageget.py.

This module contains two items:

51.1. Prologue to pageget.py

Here are the opening declarations of the pageget.py module.

pageget.py
"""pageget.py:  Functions for the Apache access_log recording page fetches.
"""

The standard sys module gives us access to the standard I/O streams (sys.stdin, sys.stdout, and sys.stderr); see http://docs.python.org/library/sys.html.

pageget.py
#================================================================
# IMPORTS
#----------------------------------------------------------------
import sys
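As a minimal sketch of how the standard streams might be used, the hypothetical function count_requests below (not part of pageget.py) counts the non-blank lines of an input stream that defaults to sys.stdin, and reports blank lines on an error stream that defaults to sys.stderr. It is written for Python 3.

```python
import sys

def count_requests(in_stream=None, err_stream=None):
    """Hypothetical helper: count the non-blank lines read from in_stream
    (default: sys.stdin), warning on err_stream (default: sys.stderr)
    about each blank line."""
    # Resolve the defaults at call time, so a caller that has rebound
    # sys.stdin or sys.stderr (for example, in a test) is honored.
    if in_stream is None:
        in_stream = sys.stdin
    if err_stream is None:
        err_stream = sys.stderr

    count = 0
    for line_no, line in enumerate(in_stream, start=1):
        if line.strip():
            count += 1
        else:
            err_stream.write("*** Line %d is blank; skipped.\n" % line_no)
    return count
```

Any iterable of strings works as in_stream, so a list of lines can stand in for an open log file.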

We'll need the os module to remove redundant “/” and “..” elements from URL paths.

pageget.py
import os
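Path cleanup of this kind is what os.path.normpath does. The sketch below calls posixpath.normpath directly, which is the same function as os.path.normpath on Unix, so that URL paths are always split on “/” regardless of platform. The helper name clean_url_path is hypothetical, and this is not necessarily how pageget.py applies the os module.

```python
import posixpath

def clean_url_path(path):
    """Hypothetical helper: collapse repeated '/' and resolve '.' and
    '..' segments in a URL path."""
    cleaned = posixpath.normpath(path)
    # POSIX normpath deliberately keeps exactly two leading slashes;
    # for a URL path we want a single one.
    if cleaned.startswith("//"):
        cleaned = "/" + cleaned.lstrip("/")
    return cleaned
```

For example, "/a//b/../c/" becomes "/a/c", and a leading "/.." cannot climb above the root, so "/../etc" becomes "/etc".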

We use the Python regular expression module re to break down the log lines into their component parts.

pageget.py
import re
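To show the kind of pattern involved, here is a sketch that assumes the server writes Apache's standard “combined” format, %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i", and also accepts the shorter “common” format, which lacks the last two fields. The group names and the parse_line function are hypothetical; a server with a different LogFormat directive would need a different pattern.

```python
import re

# Hypothetical pattern for Apache "common" or "combined" log lines.
# Quoted fields allow backslash escapes such as \" inside them.
LOG_PAT = re.compile(
    r'(?P<host>\S+) '                         # %h  client address
    r'(?P<ident>\S+) '                        # %l  identd name, usually '-'
    r'(?P<user>\S+) '                         # %u  authenticated user or '-'
    r'\[(?P<time>[^\]]+)\] '                  # %t  [dd/Mon/yyyy:HH:MM:SS zone]
    r'"(?P<request>(?:[^"\\]|\\.)*)" '        # %r  first line of the request
    r'(?P<status>\d{3}) '                     # %>s final status code
    r'(?P<size>\d+|-)'                        # %b  bytes sent, '-' if none
    r'(?: "(?P<referer>(?:[^"\\]|\\.)*)"'     # combined format only:
    r' "(?P<agent>(?:[^"\\]|\\.)*)")?'        #   Referer and User-agent
    r'\s*$')

def parse_line(line):
    """Return a dict of the named fields of one log line, or None if
    the line does not match the pattern."""
    m = LOG_PAT.match(line)
    return m.groupdict() if m else None
```

For a common-format line, the referer and agent entries come back as None.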

The urllib module provides functions to handle URL encoding and decoding.

pageget.py
import urllib
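The module imports urllib in its Python 2 form, where the decoding function is urllib.unquote. In Python 3 the same function lives at urllib.parse.unquote, and that is what the sketch below uses. The helper request_path is hypothetical: it takes the quoted request field of a log line, drops any query string, and undoes the %xx escapes in the path.

```python
from urllib.parse import unquote, urlsplit

def request_path(request):
    """Hypothetical helper: given a request field such as
    'GET /docs/a%20b.html?x=1 HTTP/1.1', return the decoded path
    without its query string, or None if the field is malformed."""
    parts = request.split()
    if len(parts) != 3:
        return None
    _method, target, _protocol = parts
    # urlsplit separates the path from '?query' and '#fragment';
    # unquote then turns escapes like '%20' back into characters.
    return unquote(urlsplit(target).path)
```

Note that unquote leaves “+” alone; only unquote_plus, intended for form data, turns “+” into a space.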

The datetime module provides date and time types, so we can select only the records that fall within a given time interval.

pageget.py
import datetime
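As an illustration of interval selection, the sketch below parses the timestamp from the bracketed %t field, for example 10/Oct/2000:13:55:36 -0700, into a timezone-aware datetime, then tests it against a half-open interval. It assumes Python 3, whose strptime accepts the %z offset directive, and the default C locale for the %b month abbreviation; all three function names are hypothetical.

```python
from datetime import datetime, timezone

# Layout of Apache's %t timestamp, without the surrounding brackets.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def parse_log_time(stamp):
    """Convert a log timestamp to a timezone-aware datetime."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)

def utc_midnight(year, month, day):
    """Aware datetime for 00:00 UTC on the given date."""
    return datetime(year, month, day, tzinfo=timezone.utc)

def in_interval(stamp, start, end):
    """True if start <= time of stamp < end.  start and end must be
    timezone-aware, since aware and naive datetimes cannot be compared."""
    return start <= parse_log_time(stamp) < end
```

Because the comparison uses absolute instants, a request logged late in the evening at -0700 may fall on the next UTC day, so interval bounds should be chosen in the time zone the report is meant to use.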