30. AccessSummary.__spiderFilter(): Filter out search engine spider accesses

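This method checks a PageGet to see whether it is probably an access by a search engine spider. It returns True when the access should be kept (no spider string was found) and False when the access looks like a spider. For the strings that signify spiders, see Section 24.6, “AccessSummary.SPIDER_STRINGS: Spider detection strings”. A string counts as a match if it occurs anywhere in the accessor URL (pageGet.accessor).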

webstats.py
# - - -   A c c e s s S u m m a r y . _ _ s p i d e r F i l t e r

    def __spiderFilter ( self, pageGet ):
        '''Filter out accesses by search engine spiders.

          [ pageGet is a PageGet instance ->
              if no string in self.SPIDER_STRINGS is found in
              pageGet.accessor ->
                return True
              else -> return False ]
        '''
        for s in self.SPIDER_STRINGS:
            if s in pageGet.accessor:
                return False
        return True
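The following is a minimal, self-contained sketch of how this filter might behave in practice. The PageGet stand-in, the three SPIDER_STRINGS values, the keepHumans() helper, and the sample accessor names are all hypothetical and chosen only for illustration; the real detection strings are those listed in Section 24.6.

```python
class PageGet:
    """Hypothetical stand-in: only the .accessor attribute is modeled."""
    def __init__(self, accessor):
        self.accessor = accessor


class AccessSummary:
    # Illustrative values only; the real list is given in Section 24.6.
    SPIDER_STRINGS = ("googlebot", "crawl", "spider")

    def __spiderFilter(self, pageGet):
        """Return False for a probable spider access, True otherwise."""
        for s in self.SPIDER_STRINGS:
            if s in pageGet.accessor:
                return False
        return True

    def keepHumans(self, pageGets):
        """Hypothetical helper: the PageGets that pass the spider filter."""
        return [p for p in pageGets if self.__spiderFilter(p)]


hits = [
    PageGet("crawl-66-249-66-1.googlebot.com"),
    PageGet("dialup-42.example.net"),
    PageGet("spider.example.org"),
]
kept = AccessSummary().keepHumans(hits)
print([p.accessor for p in kept])   # prints ['dialup-42.example.net']
```

Two details are worth noting. The `in` test is a case-sensitive substring match, so an accessor containing "GoogleBot" would not match the string "googlebot". Also, because the method name begins with two underscores, Python mangles it to `_AccessSummary__spiderFilter`, so it is intended to be called only from inside the class, as keepHumans() does here.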