51.15. scanTailGroup(): Process remaining fields

Three fields remain at the end of the access log record: the result code (e.g., 404 for “Page not found”), the length of the returned file, and the referring URL enclosed in double quotes. An earlier version of this module extracted the referrer and stored it as a field of the PageGet object; this version does not retain it.

So all we need is a pattern that matches one space, an integer, and one more space. The length and referrer that follow are simply ignored.
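As a rough illustration of what such a pattern does, the sketch below applies an equivalent one to a made-up tail. The sample string, the group name `status`, and the variable names are assumptions for illustration only; the pattern the module actually uses is defined next.

```python
import re

# Hypothetical tail of an access log record: status, length, referrer.
# (Invented sample data, not taken from a real log.)
sample_tail = ' 404 1527 "http://example.com/index.html"'

# One space, a run of digits captured under a name, then one space.
pat = re.compile(r' (?P<status>\d+) ')

m = pat.match(sample_tail)
status = int(m.group('status'))   # The captured digits, as an integer
rest = sample_tail[m.end():]      # Whatever follows the match is left unparsed

print(status)
print(rest)
```

Because `match()` anchors only at the start of the string, the length and referrer after the second space need no pattern of their own.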

pageget.py
# - - -   s c a n T a i l G r o u p   - - -

STATUS_FIELD    =  "c"          # Result code, e.g., 200 ok, 404 not found

tailPat  =  re.compile (
    r' '                        # Space before status code
    r'(?P<%s>'                  # Start STATUS_FIELD group
      r'\d+'                    # Result code: all leading digits
    r')'
    r' '                        # One space
    % STATUS_FIELD )

def scanTailGroup ( tailGroup ):
    """Extract the status from the tail group.

      [ if tailGroup is a valid tail group ->
          return status from tailGroup
        else -> raise ValueError ]
    """

First we apply the regular expression to the raw tail. Then, assuming it matches, we extract the status code, convert it to an integer, and return it as the result.

pageget.py
    #-- 1 --
    # [ if tailGroup matches tailPat ->
    #     m  :=  a Match object describing that match
    #   else ->
    #     raise ValueError ]
    m  =  tailPat.match ( tailGroup )
    if  m is None:
        raise ValueError ( "Invalid status/length/referrer group '%s'" %
                           tailGroup )

    #-- 2 --
    # [ return STATUS_FIELD from m as an integer ]
    status    =  int ( m.group ( STATUS_FIELD ) )
    return status
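
To check the behavior end to end, here is a minimal self-contained sketch. It restates the pattern and function in condensed form so that it runs on its own, and the sample tails are invented, not taken from a real log. It assumes the tail group begins with the space that precedes the status code, as the pattern requires.

```python
import re

STATUS_FIELD = "c"

# Condensed restatement of tailPat: space, digits (named group), space.
tailPat = re.compile(r' (?P<%s>\d+) ' % STATUS_FIELD)

def scanTailGroup(tailGroup):
    """Return the status code from tailGroup, or raise ValueError."""
    m = tailPat.match(tailGroup)
    if m is None:
        raise ValueError("Invalid status/length/referrer group '%s'" %
                         tailGroup)
    return int(m.group(STATUS_FIELD))

# Hypothetical tails: status, length, quoted referrer.
good = scanTailGroup(' 200 5120 "-"')

# The pattern never examines the length field, so a non-numeric
# length does not affect the result.
not_modified = scanTailGroup(' 304 - "-"')

# A tail whose status field is not an integer is rejected.
try:
    scanTailGroup(' - 5120 "-"')
    rejected = False
except ValueError:
    rejected = True

print(good)
print(not_modified)
print(rejected)
```

The call form `raise ValueError(...)` is used here because it is accepted by both Python 2 and Python 3.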