
23. class AccessSummary: Principal data structure

For a general discussion of what an instance of this class holds, and what access methods are required, see Section 2.1, “Data structures”. Here is the interface.

webstats.py
# - - - - -   c l a s s   A c c e s s S u m m a r y

class AccessSummary:
    '''Container for all report summary data.

      Exports:
        AccessSummary(cutoffTime, now):
          [ (cutoffTime is the beginning of the report interval
            as a datetime.datetime) and
            (now is the end of the report interval as a
            datetime.datetime) ->
              return a new, empty AccessSummary instance for that
              interval ]

The .cutoffTime and .now attributes define the beginning and end of the reporting period. The .oldestHit attribute tracks the oldest timestamp actually observed from a relevant access log entry.

webstats.py
        .cutoffTime:
          [ the time self.EXPIRE_DAYS in the past as a
            datetime.datetime ]
        .now:
          [ the time of instantiation as a datetime.datetime ]
        .oldestHit:
          [ the oldest timestamp observed in any access record,
            as a datetime.datetime instance ]
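
As a rough illustration of how the two ends of the reporting period relate, here is a minimal sketch in Python 3. The function name reportInterval and the value given to EXPIRE_DAYS are assumptions made for this example only; the actual constant is defined elsewhere in the program.

```python
import datetime

EXPIRE_DAYS = 30   # hypothetical value; the real constant is defined elsewhere


def reportInterval(now=None, expireDays=EXPIRE_DAYS):
    '''Return (cutoffTime, now) for a report that ends at `now`.

      Hypothetical helper: cutoffTime is expireDays before now.
    '''
    if now is None:
        now = datetime.datetime.now()
    cutoffTime = now - datetime.timedelta(days=expireDays)
    return cutoffTime, now
```

The resulting pair could then be passed to the constructor as AccessSummary(cutoffTime, now).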

We need to accumulate the total number of hits, and the total of remote hits, for the entire report. We can use a HitCount instance to hold those values.

webstats.py
        .sumHitCount:
          [ a HitCount instance giving the overall total
            and remote hit counts for all accesses in self ]

The constructor sets up an empty structure. The next method is used to add access records, in the form of PageGet instances, to the structure. This method takes care of all the filtering that is done on access records, such as removing records that are too old, not successful, and so forth. See Section 26, “AccessSummary.addPageGet(): Process one access record”.

webstats.py
        .addPageGet ( pageGet ):
          [ pageGet is a PageGet instance ->
              if (pageGet is not older than self.cutoffTime) and
              (pageGet is relevant by all filtering criteria) ->
                self  :=  self with that access added ]
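
The filtering contract above might be sketched as follows. This is not the actual implementation: SketchGet is a hypothetical stand-in for PageGet, its field names are assumptions, and only two filters (age and success) are shown. Plain integer counters stand in for the HitCount instance.

```python
import datetime
from dataclasses import dataclass


@dataclass
class SketchGet:
    '''Hypothetical stand-in for PageGet; field names are assumptions.'''
    url: str
    timestamp: datetime.datetime
    isRemote: bool
    isSuccess: bool


class SketchSummary:
    '''Simplified stand-in for AccessSummary, overall totals only.'''
    def __init__(self, cutoffTime, now):
        self.cutoffTime = cutoffTime
        self.now = now
        self.oldestHit = None   # No relevant record seen yet
        self.total = 0
        self.remote = 0

    def addPageGet(self, pageGet):
        # Discard records older than the start of the report interval.
        if pageGet.timestamp < self.cutoffTime:
            return
        # Discard unsuccessful requests (one of several possible filters).
        if not pageGet.isSuccess:
            return
        # Track the oldest timestamp among the records we keep.
        if self.oldestHit is None or pageGet.timestamp < self.oldestHit:
            self.oldestHit = pageGet.timestamp
        self.total += 1
        if pageGet.isRemote:
            self.remote += 1
```

Because rejected records return early, they never affect .oldestHit, which matches the description of that attribute as reflecting only relevant entries.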

We'll need a method that retrieves the access counts, represented as a HitCount instance, for any given URL. This gives us a way to retrieve the statistics for the NMT homepage, and it also serves the logic that builds access report pages for specific users and official directories. See Section 38, “AccessSummary.getUrl(): Retrieve hit counts for a given URL”.

webstats.py
        .getUrl ( url ):
          [ url is a URL as a string ->
              if self has any accesses for url ->
                 return its access counts as a HitCount instance
              else -> raise KeyError ]
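
A dictionary lookup gives the KeyError behavior for free. The sketch below assumes a hypothetical UrlCounts class that stores a [total, remote] pair per URL instead of a real HitCount instance; the add() method is also invented for the example.

```python
class UrlCounts:
    '''Hypothetical per-URL counter; a stand-in for part of AccessSummary.'''
    def __init__(self):
        self._urlMap = {}   # url -> [total hits, remote hits]

    def add(self, url, isRemote):
        '''Count one hit on url.'''
        counts = self._urlMap.setdefault(url, [0, 0])
        counts[0] += 1
        if isRemote:
            counts[1] += 1

    def getUrl(self, url):
        '''Return (total, remote) for url; raise KeyError if unknown.'''
        total, remote = self._urlMap[url]   # Unknown URL raises KeyError
        return (total, remote)
```

A caller that is unsure whether a URL was ever accessed would wrap the call in try/except KeyError.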

To create the hit parade page, we'll need a method that generates a sequence of HitCount instances in hit-parade order. See Section 39, “AccessSummary.genByHits(): Generate the hit parade”.

webstats.py
        .genByHits():
          [ generate the HitCount instances in self sorted
            according to HitCount.__cmp__() ]
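
The ordering itself is defined by HitCount.__cmp__(), which is not shown in this section. As a sketch only, the generator below assumes that hit-parade order means most hits first, with ties broken by URL, and it uses a Python 3 sort key in place of __cmp__(). The mapping from URL to a plain integer count is a simplification for the example.

```python
def genByHits(urlMap):
    '''Generate (url, hits) pairs, most-hit first.

      Assumed ordering: descending hit count, then ascending URL.
    '''
    ordered = sorted(urlMap.items(),
                     key=lambda item: (-item[1], item[0]))
    for url, hits in ordered:
        yield url, hits
```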

To create the table of links to pages by their first character, we'll need two methods: one that generates the first characters of personal account names in order, and another that generates all the personal account names that start with a given first character. See Section 40, “AccessSummary.genPersonalLetters(): First letters of personal accounts” and Section 41, “AccessSummary.genPersonals(): Generate accounts with the same first letter”.

webstats.py
        .genPersonalLetters():
          [ generate the sequence of initial letters of personal
            account names in ascending order as a sequence of strings ]
        .genPersonals(letter):
          [ letter is a 1-character string ->
              generate all the personal account names in self
              that start with letter ]
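
These two generators could be built on a dictionary that maps each first letter to a set of names. In this sketch the functions take the dictionary as an explicit argument rather than living on the class, addPersonal is a hypothetical helper, and yielding the names for one letter in sorted order is an assumption (the specification above does not state an order for genPersonals).

```python
def addPersonal(letterMap, name):
    '''Record name (a nonempty string) under its first character.'''
    letterMap.setdefault(name[0], set()).add(name)


def genPersonalLetters(letterMap):
    '''Generate the first characters in ascending order.'''
    for letter in sorted(letterMap):
        yield letter


def genPersonals(letterMap, letter):
    '''Generate the names starting with letter (sorted by assumption).'''
    for name in sorted(letterMap.get(letter, ())):
        yield name
```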

Next, we'll need a way of retrieving all the official directory names. See Section 42, “AccessSummary.genOfficials(): Generate names of official directories”.

webstats.py
        .genOfficials():
          [ generate all the official directory names in self
            in ascending order ]

To generate one personal or official access report page, we'll need to be able to retrieve all the URLs in that account or directory. See Section 43, “AccessSummary.genPersonUrls(): All URLs for a given person” and Section 44, “AccessSummary.genOfficialUrls(): All URLs for an official directory”.

webstats.py
        .genPersonUrls(person):
          [ person is a TCC account name ->
              generate the URLs in self for this person as a
              sequence of strings ]
        .genOfficialUrls(dir):
          [ dir is an official directory name ->
              generate the URLs in self for this directory as
              a sequence of strings ]

To support these methods, we'll need a number of internal data structures. We'll use the Python set type for collections, because most of the time we'll be testing new letters or directories for membership in those collections.

webstats.py
      State/Invariants:
        .__urlMap:
          [ a dictionary whose keys are all the URLs in self
            and each corresponding value is a HitCount instance
            summarizing the hits on that URL in self's reporting
            period ]
        .__personalLetterMap:
          [ a dictionary whose keys are the first characters of
            personal directories in self, and each corresponding
            value is a set of the personal directories in self
            that have that first character ]
        .__personalMap:
          [ a dictionary whose keys are the names of personal
            directories in self, and each corresponding value
            is a set of the URLs for that person ]
        .__officialMap:
          [ a dictionary whose keys are the names of official
            directories in self, and each corresponding value
            is a set of the URLs for that directory ]
    '''
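
To make the role of the set-valued dictionaries concrete, here is a sketch of how a URL might be filed under its person or directory. The classification rule is an assumption made for this example: a path of the form "/~name/..." is treated as personal and "/name/..." as official. The real rules live elsewhere in the program, and recordUrl is a hypothetical helper.

```python
def recordUrl(url, personalMap, officialMap):
    '''File url under its person or official directory, if any.

      Assumed rule: "/~name/..." is personal, "/name/..." is official,
      and anything else (such as a top-level file) is ignored.
    '''
    parts = url.lstrip("/").split("/")
    first = parts[0]
    if first.startswith("~") and len(first) > 1:
        personalMap.setdefault(first[1:], set()).add(url)
    elif first and len(parts) > 1:
        officialMap.setdefault(first, set()).add(url)


def genPersonUrls(personalMap, person):
    '''Generate the URLs recorded for person (sorted by assumption).'''
    for url in sorted(personalMap.get(person, ())):
        yield url
```

Because each value is a set, recording the same URL many times leaves a single entry, so no explicit membership test is needed before adding.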