This filter tests for various minor special cases that crop up in real-world access lines but are of no interest to the audience.
# - - -   A c c e s s S u m m a r y . _ _ s p e c i a l F i l t e r
    def __specialFilter ( self, pageGet ):
        '''Filter out special cases.
          [ pageGet is a PageGet instance ->
              if pageGet is something we would prefer to ignore ->
                return False
              else -> return True ]
        '''
Here are the cases to date that need to be ignored:
Accesses to the /robots.txt file.
Accesses to URLs starting with "/~ ". The server will allow a space after the tilde, but such references are rare.
If the URL starts with "/~" followed by an uppercase letter, the server will redirect that to the lowercased account name. Ignoring them removes bogus entries from the letter table such as "/K".
For a few access lines, the URL starts with "http:". We can ignore those.
        #-- 1 --
        if pageGet.url.startswith("/robots.txt"):
            return False
        #-- 2 --
        if pageGet.url.startswith("/~ "):
            return False
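The next test is for a personal page starting with an uppercase letter. By using the slice expression [2:3] instead of just the index expression [2], we don't have to test to ensure that the URL is at least three characters long: a slice beyond the end of a string always returns the empty string, and the .isupper() method returns False for an empty string. Note that this test examines only the third character; it does not itself verify that the URL begins with "/~".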
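The difference between the slice and the index can be checked in isolation. The following is a minimal sketch, not part of the program itself; the two URL strings are made-up values chosen only to show a URL that is too short and one that is long enough.

```python
# Hypothetical URLs, chosen only to illustrate the behaviour.
short_url = "/~"
long_url = "/~Kent/index.html"

# A slice past the end of the string gives "" rather than raising.
short_slice = short_url[2:3]
# "".isupper() is False, so the uppercase test simply fails for short URLs.
short_is_upper = short_slice.isupper()

# For a long enough URL, the slice is the one-character third position.
long_slice = long_url[2:3]
long_is_upper = long_slice.isupper()

# Plain indexing on the short URL raises IndexError instead.
try:
    short_url[2]
    index_raised = False
except IndexError:
    index_raised = True
```

The slice form therefore needs no separate length check, whereas the index form would need either a length test or an exception handler.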
        #-- 3 --
        if pageGet.url[2:3].isupper():
            return False
        #-- 4 --
        if pageGet.url.startswith("http:"):
            return False
        else:
            return True
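To see the four tests working together, here is a standalone sketch that mirrors the logic above outside the class. The names FakePageGet and special_filter, and all of the sample URLs, are assumptions made for illustration; the real PageGet class is only assumed to expose a .url attribute, as the method above uses.

```python
from collections import namedtuple

# Hypothetical stand-in for PageGet; only the .url attribute is modelled.
FakePageGet = namedtuple("FakePageGet", "url")


def special_filter(page_get):
    """Return False for accesses we prefer to ignore, True otherwise."""
    url = page_get.url
    # 1: requests for the robots exclusion file
    if url.startswith("/robots.txt"):
        return False
    # 2: tilde followed by a space
    if url.startswith("/~ "):
        return False
    # 3: third character is an uppercase letter (empty slice is not upper)
    if url[2:3].isupper():
        return False
    # 4: URLs that begin with the scheme "http:"
    if url.startswith("http:"):
        return False
    return True


# Made-up sample URLs, one per special case plus one ordinary page.
samples = [
    "/robots.txt",
    "/~ kent/",
    "/~Kent/",
    "http://example.com/",
    "/~kent/index.html",
]
results = {url: special_filter(FakePageGet(url)) for url in samples}
```

With these samples, only the last URL passes the filter; each of the first four is rejected by the correspondingly numbered test.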