Abstract
Describes a system that generates reports summarizing
web page access counts on the New Mexico Tech Computer
Center web server, http://www.nmt.edu/.
This publication is available in Web form and also as a PDF document.
Please forward any comments to tcc-doc@nmt.edu.
Table of Contents
webstats.py module: Main program
webstats.py file: Prologue
inputPhase(): Read the access logs
readLogFile(): Process one access log
buildAllPages(): Generate all output pages
addSummaryTable(): Generate the table summarizing all accesses
buildHitParade(): Build the hit parade
accessReport(): Start a new access report table
accessRow(): Add one row to an access report table
instituteHomepage(): Access report for “/”
buildCategoryTable(): Build categories table and all personal and official reports
buildPersonalSide(): Letter and personal pages
buildLetter(): Build one letter page and related personal pages
addPersonalReport(): Generate links and access report for one personal account
buildReportPage(): Build one access report page
buildOfficialSide(): Build access reports for official directories
fatal(): Write a message and stop
class AccessSummary: Principal data structure
class AccessSummary
AccessSummary.EXPIRE_DAYS: Duration of the report interval
AccessSummary.SYM_DOMAIN: Local domain name, symbolic form
AccessSummary.IP_DOMAIN: Local domain in dotted form
AccessSummary.BAD_STATUS_THRESHOLD: Upper limit for status codes
AccessSummary.IGNORED_EXTENSIONS: File extensions to be ignored
AccessSummary.SPIDER_STRINGS: Spider detection strings
AccessSummary.__init__(): Constructor
AccessSummary.addPageGet(): Process one access record
AccessSummary.__isRelevant(): Filter out irrelevant access records
AccessSummary.__statusFilter(): Filter by status code
AccessSummary.__extFilter(): Ignore certain files by extension
AccessSummary.__spiderFilter(): Filter out search engine spider accesses
AccessSummary.__pwdFilter(): Filter out password-protected pages
AccessSummary.__timeFilter(): Filter out expired records
AccessSummary.__specialFilter(): Special case filter
AccessSummary.FILTER_FUNCTIONS: Collection of filter functions
AccessSummary.__addHit(): Register one access
AccessSummary.__addUrl(): Register one access in self.__urlMap
AccessSummary.__addCategory(): Add URL to appropriate category
AccessSummary.getUrl(): Retrieve hit counts for a given URL
AccessSummary.genByHits(): Generate the hit parade
AccessSummary.genPersonalLetters(): First letters of personal accounts
AccessSummary.genPersonals(): Generate accounts with the same first letter
AccessSummary.genOfficials(): Generate names of official directories
AccessSummary.genPersonUrls(): All URLs for a given person
AccessSummary.genOfficialUrls(): All URLs for an official directory
class HitCount: Hit counts for one URL
HitCount.__init__(): Constructor
HitCount.addHit(): Tally one access
Hitcount.__cmp__(): Comparator method
pageget.py module: Apache log file functions
pageget.py
class FixedTimeZone
scanAccessLog(): Scan an access log file
scanAccessLine(): Process one access log line
scanGroups(): Top-level disassembly of the log line
scanQuoted(): Process double-quoted string with escapes
scanAccessGroup(): Process accessors
findHostList(): Derive the effective host list
scanDateGroup(): Process date
scanCmdGroup: Process command group
cleanURL(): Process the raw URL
asciifyString: Encode non-ASCII characters
asciifyChar(): Escape a non-ASCII character
scanTailGroup(): Process remaining fields
class PageGet: Describes one page access
PageGet.__init__(): Constructor
PageGet.isFar(): Is this an off-campus accessor?
PageGet.__str__(): Debug display
error(): Write a message to stderr
message(): Send a message to standard error and log file
PageGet
This document describes the webstats.py script for reducing and
displaying statistics on page accesses from the Tech
Computer Center's web server. See the
specification for the externals of this script.
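As a rough illustration of what "reducing" access statistics involves, the following minimal sketch tallies successful page accesses per URL from a few Apache-style log lines. It is not the code of webstats.py or pageget.py: the regular expression, the count_hits() helper, the sample lines, and the status cutoff of 400 are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed pattern for the Apache Common Log Format. The actual parser in
# pageget.py is not shown here and may work differently.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def count_hits(lines, bad_status=400):
    """Tally accesses per URL (hypothetical helper, not part of webstats.py).

    Lines that do not parse, or whose status code is at or above
    bad_status (an assumed cutoff), are skipped.
    """
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # unparseable line
        if int(m.group("status")) >= bad_status:
            continue  # error response; do not count it
        hits[m.group("url")] += 1
    return hits


# Made-up sample records for illustration only.
sample = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.3 - - [10/Oct/2000:13:57:12 -0700] "GET /missing.html HTTP/1.0" 404 209',
]

# List URLs from most to least accessed, in the spirit of a "hit parade".
for url, n in count_hits(sample).most_common():
    print(n, url)
```

With the sample lines above, only /index.html is counted (twice), because the third record carries a 404 status. The real script applies further filters, such as those for file extensions, spiders, and expired records, which are described in later sections.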
The code is documented in the “literate programming” style: the source file contains both the documentation and the script's source code. For more information on literate programming and the tool used to extract the source code, see A source extractor for lightweight literate programming.