
51.8. scanAccessGroup(): Process accessors

This function parses the accessor group, consisting of everything up to the first square bracket in the log record.

pageget.py
# - - -   s c a n A c c e s s G r o u p   - - -

def scanAccessGroup ( accessGroup ):
    """Determine the set of effective accessor IP addresses from accessGroup

      [ accessGroup is a string ->
          if accessGroup is a valid host-group ->
            return (effective host list from accessGroup,
            username from accessGroup or "-" if none)
          else ->
            sys.stderr +:= error message
            return an empty list ]
    """

The accessGroup argument includes a trailing space, so we use the string .rstrip() method to remove that. What is left must be four or more fields separated by whitespace:

  1. The primary accessor.

  2. One or more secondary accessors. If there is more than one, all but the last will have a trailing comma.

  3. The penultimate field must be "-".

  4. The last field is "-" normally, or the username if the page is password-protected.

pageget.py
    #-- 1 --
    # [ fieldList  :=  fields of accessGroup separated by
    #                  whitespace, omitting trailing whitespace ]
    fieldList  =  accessGroup.rstrip().split(' ')

Note

The original version of the above line had a subtle bug. Originally, the .split() call had no argument, so fields were split on clumps of whitespace. Then in the 20050316 log I found this accessor group:

'82.115.10.14  - -'

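Note the two spaces before the first hyphen. Splitting on clumps of whitespace yields three strings instead of the required four. The fix is to use .split(' '), which treats every space as a separator and yields the correct four strings, the second of which is empty.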
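The difference between the two calls can be checked directly. This is a small illustration that uses only the sample string from the note; the variable names are chosen here and do not appear in pageget.py.

```python
# Sample accessor group from the note (two spaces after the address).
group = '82.115.10.14  - -'

# No argument: a run of whitespace counts as a single separator,
# so the empty secondary-accessor field disappears.
clumped = group.split()

# Explicit ' ': every space is a separator, so the empty field
# between the two adjacent spaces is kept.
exact = group.split(' ')

print(clumped)   # three fields
print(exact)     # four fields, the second one empty
```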

Next we separate the fields into three groups: the primary accessor in priHost; a list of secondary accessors in secHostList; and the user name in username.

pageget.py
    #-- 2 --
    # [ if fieldList consists of four or more fields of which the
    #   next-to-last is "-" ->
    #     priHost      :=  first field of fieldList
    #     secHostList  :=  fields of fieldList from second on,
    #                      omitting last two
    #     username     :=  last field of fieldList
    #   else -> raise ValueError ]
    if  ( ( len ( fieldList ) >= 4 ) and
          ( fieldList[-2] == "-" ) ):
        priHost      =  fieldList[0]
        secHostList  =  fieldList[1:-2]
        username     =  fieldList[-1]
    else:
        raise ValueError ( "Badly formed accessor group: '%s'" %
                           accessGroup )
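To make the slicing concrete, here is a short sketch that applies the same three expressions to a made-up accessor group. The addresses and the username "jdoe" are hypothetical and are not taken from any log.

```python
# Hypothetical accessor group: one primary accessor, two secondary
# accessors, the "-" placeholder, a username, and the trailing space.
accessGroup = '10.0.0.1 192.0.2.7, 198.51.100.4 - jdoe '

fieldList = accessGroup.rstrip().split(' ')

priHost = fieldList[0]           # first field
secHostList = fieldList[1:-2]    # everything between the first and the last two
username = fieldList[-1]         # last field

print(priHost)
print(secHostList)
print(username)
```

Note that the first secondary accessor still carries its trailing comma at this point; removing it is left to the next step.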

Next we derive the list of effective hosts: if secHostList is empty, or contains only "-" or empty strings, then priHost is the only effective host. Otherwise, secHostList (with the trailing commas removed from its elements) is the effective host list.

pageget.py
    #-- 3 --
    # [ if secHostList is empty or contains only "-" or "" ->
    #     hostList  :=  [ priHost ]
    #   else ->
    #     hostList  :=  secHostList with any trailing commas
    #                        removed from its elements ]
    hostList  =  findHostList ( priHost, secHostList )

    #-- 4 --
    if  len(hostList) == 0:
        error ( "No hosts found: '%s'" % accessGroup )
    return (hostList, username)
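The rule stated in the step 3 comment can be sketched as follows. This is only an illustration under a hypothetical name, effectiveHosts; the real findHostList is defined elsewhere in pageget.py and may perform additional checks.

```python
def effectiveHosts(priHost, secHostList):
    """Hypothetical sketch of the step 3 rule, not the real findHostList."""
    # all() is also true for an empty list, which covers the
    # "secHostList is empty" case.
    if all(host in ('-', '') for host in secHostList):
        return [priHost]
    # Otherwise the secondary accessors are the effective hosts,
    # with any trailing commas removed.
    return [host.rstrip(',') for host in secHostList]

print(effectiveHosts('82.115.10.14', ['']))
print(effectiveHosts('10.0.0.1', ['192.0.2.7,', '198.51.100.4']))
```

The addresses in the second call are made up for the example; the first call uses the fields produced by the accessor group from the 20050316 log.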