This function parses the accessor group, consisting of everything up to the first square bracket in the log record.
# - - - s c a n A c c e s s G r o u p - - -
def scanAccessGroup ( accessGroup ):
    """Determine the set of effective accessor IP addresses from accessGroup

      [ accessGroup is a string ->
          if accessGroup is a valid host-group ->
            return (effective host list from accessGroup,
                    username from accessGroup or "-" if none)
          else ->
            sys.stderr  +:=  error message
            return an empty list ]
    """
The accessGroup argument includes a trailing space, so we use the string .rstrip() method to remove it. What is left must be four or more fields separated by whitespace:
The primary accessor.
One or more secondary accessors. If there is more than one, all but the last will have a trailing comma.
The penultimate field, which must be "-".
The last field, which is normally "-", or the username if the page is password-protected.
    #-- 1 --
    # [ fieldList  :=  fields of accessGroup separated by
    #       whitespace, omitting trailing whitespace ]
    fieldList = accessGroup.rstrip().split(' ')
The original version of the above line had a subtle bug. Originally, the .split() call had no argument, so fields were split on clumps of whitespace. Then in the 20050316 log I found this accessor group:
    '82.115.10.14  - -'
Note the two spaces before the first hyphen. With no argument, .split() yields three strings instead of four. The fix is to use .split(' '), which treats every space as a separator and so yields the correct four strings, the second of which is empty.
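The difference between the two calls can be checked directly. The following is a minimal sketch using the accessor group quoted above, with the trailing space that the real input carries added back:

```python
# The accessor group from the 20050316 log: two spaces after the
# address, plus the trailing space that .rstrip() removes.
accessGroup = "82.115.10.14  - - "

# No argument: a run of whitespace counts as a single separator,
# so the empty secondary-accessor field disappears.
clumped = accessGroup.rstrip().split()

# Explicit single space: every space is a separator, so the empty
# field between the two adjacent spaces is preserved.
exact = accessGroup.rstrip().split(' ')

print(clumped)   # three strings
print(exact)     # four strings, the second one empty
```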
Next we separate the fields into three groups: the primary
accessor in priHost; a list of
secondary accessors in secHostList; and the user name in username.
    #-- 2 --
    # [ if fieldList consists of four or more fields of which the
    #   next-to-last is "-" ->
    #     priHost  :=  first field of fieldList
    #     secHostList  :=  fields of fieldList from second on,
    #                      omitting last two
    #     username  :=  last field of fieldList
    #   else -> raise ValueError ]
    if  ( ( len ( fieldList ) >= 4 ) and
          ( fieldList[-2] == "-" ) ):
        priHost = fieldList[0]
        secHostList = fieldList[1:-2]
        username = fieldList[-1]
    else:
        raise ValueError ( "Badly formed accessor group: '%s'" %
                           accessGroup )
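To illustrate the slicing in this step, here is a small sketch. The helper name groupFields and the sample field list with 192.0.2.x addresses and the user alice are made up for illustration; the four-field minimum follows the intended-function comment above.

```python
def groupFields(fieldList):
    """Hypothetical helper: split a field list into
    (priHost, secHostList, username) as step 2 describes."""
    if len(fieldList) >= 4 and fieldList[-2] == "-":
        # First field, everything between it and the last two, last field.
        return fieldList[0], fieldList[1:-2], fieldList[-1]
    raise ValueError("Badly formed accessor group: %r" % (fieldList,))

# Made-up group with two secondary accessors and a username.
multi = groupFields(['192.0.2.1', '192.0.2.7,', '192.0.2.9', '-', 'alice'])

# The group from the 20050316 log, as split on single spaces.
single = groupFields(['82.115.10.14', '', '-', '-'])

print(multi)
print(single)
```

Note that in the second case secHostList is a list holding one empty string, not an empty list, which is why the next step must treat "" as a placeholder.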
Next we derive the list of effective hosts. If secHostList is empty or contains only "-" or empty strings, then priHost is the only effective host. Otherwise, secHostList (with the trailing commas removed from its elements) is the effective host list.
    #-- 3 --
    # [ if secHostList is empty or contains only "-" or "" ->
    #     hostList  :=  [ priHost ]
    #   else ->
    #     hostList  :=  secHostList with any trailing commas
    #                   removed from its elements ]
    hostList = findHostList ( priHost, secHostList )
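The findHostList function is defined elsewhere and is not shown here. As a rough sketch of the rule stated in the intended-function comment, and nothing more, it might look like the following. The name findHostListSketch is hypothetical, and the real function may do additional checking, since the next step allows for an empty result.

```python
def findHostListSketch(priHost, secHostList):
    """Hypothetical stand-in for findHostList: return the
    effective host list according to the step 3 rule."""
    # all() is true for an empty list, so this covers both the
    # "empty" and the "only placeholders" cases.
    if all(host in ("", "-") for host in secHostList):
        return [priHost]
    # Otherwise the secondary accessors are the effective hosts,
    # minus any trailing commas.
    return [host.rstrip(',') for host in secHostList]

# Sample values are made up, apart from the address from the log.
onlyPrimary = findHostListSketch('82.115.10.14', [''])
secondaries = findHostListSketch('192.0.2.1', ['192.0.2.7,', '192.0.2.9'])
print(onlyPrimary)
print(secondaries)
```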
    #-- 4 --
    if  len(hostList) == 0:
        error ( "No hosts found: '%s'" % accessGroup )
    return (hostList, username)