Three fields remain at the end of the access log record:
the result code (e.g., 404 for “Page not found”), the length
of the returned file, and the referring URL enclosed in
double quotes. An earlier version of this module extracted
the referrer and stored it as a field of the PageGet object;
this version does not retain that field.
So all we need is a pattern that matches the space before the result code, the integer itself, and the one space after it. The length and referrer that follow are ignored.
# - - - s c a n T a i l G r o u p - - -
STATUS_FIELD = "c"    # Result code, e.g., 200 ok, 404 not found
tailPat = re.compile (
    r' '              # Space before status code
    r'(?P<%s>'        # Start STATUS_FIELD group
    r'\d+'            # Result code: all leading digits
    r')'              # End STATUS_FIELD group
    r' '              # One space
    % STATUS_FIELD )
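To see what this pattern accepts, here is a small standalone sketch that rebuilds the same expression and applies it to two sample tails. The sample strings are invented for illustration, and the sketch assumes that the tail group begins with the space that precedes the status code.

```python
import re

STATUS_FIELD = "c"

# Same expression as tailPat: space, named group of digits, space.
tailPat = re.compile(
    r' '
    r'(?P<%s>'
    r'\d+'
    r')'
    r' '
    % STATUS_FIELD)

# Hypothetical tail: status, length, quoted referrer.
sample = ' 404 2326 "http://example.com/start.html"'
m = tailPat.match(sample)
print(m.group(STATUS_FIELD))    # the status digits, still a string

# .match() anchors at position 0, so a tail without the
# leading space does not match and the result is None.
print(tailPat.match('404 2326 "-"'))
```

Because only the status is captured, the length and referrer never need to be well-formed for the match to succeed.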
def scanTailGroup ( tailGroup ):
    """Extract the status from the tail group.

      [ if tailGroup is a valid tail group ->
          return status from tailGroup
        else -> raise ValueError ]
    """
First we apply the regular expression to the raw tail. Then, assuming it matches, we extract the status code, convert it to an integer, and return it as the result.
    #-- 1 --
    # [ if tailGroup matches tailPat ->
    #     m  :=  a Match object describing that match
    #   else ->
    #     raise ValueError ]
    m = tailPat.match ( tailGroup )
    if m is None:
        raise ValueError ( "Invalid status/length/referrer group '%s'" %
                           tailGroup )

    #-- 2 --
    # [ return STATUS_FIELD from m as an integer ]
    status = int ( m.group ( STATUS_FIELD ) )
    return status
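Pulling the pieces together, the following is a self-contained sketch of the function with two calls, one valid and one invalid. It is a condensed restatement rather than the module's own text: the pattern is written as a single literal, and the tail strings are invented for illustration.

```python
import re

STATUS_FIELD = "c"
tailPat = re.compile(r' (?P<%s>\d+) ' % STATUS_FIELD)

def scanTailGroup(tailGroup):
    """Return the status code from tailGroup as an int, or raise ValueError."""
    m = tailPat.match(tailGroup)
    if m is None:
        raise ValueError(
            "Invalid status/length/referrer group '%s'" % tailGroup)
    return int(m.group(STATUS_FIELD))

# A well-formed tail yields the status as an integer.
print(scanTailGroup(' 200 5120 "-"'))

# A tail whose status field is not numeric is rejected.
try:
    scanTailGroup(' - 0 "-"')
except ValueError as err:
    print("rejected: %s" % err)
```

Raising ValueError rather than returning None lets the caller treat a malformed tail the same way as any other malformed part of the record.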