Abstract
Describes the implementation of a script that indexes all publicly visible personal homepages hosted at New Mexico Tech.
This publication is available in Web form and also as a PDF document.
Please forward any comments to tcc-doc@nmt.edu.
Table of Contents
main(): The main program
checkArguments(): Digest the command line arguments
fatal(): Fatal error
buildKwic(): Survey the users and build the KWIC index
genWebUsers(): Find users with homepages
genAllUsers(): Find all the users in LDAP
isWebUser(): Does this user have a personal Web?
indexUser(): Add one user to the KWIC index
buildAllPages(): Build the output pages
scanInitialLetters(): What initial letters occur in keywords?
buildIndexPage(): Build the top-level page
relToURL(): URL of a relative path
relToAbs(): Absolute path from a relative path
addSubpageLink(): Link from the index page to one subpage
letterInfo(): Information about a subpage
buildSubpage(): Build the subpage for one letter
emptySubpage(): Set up an empty subpage
addRow(): Add one row to the subpage table
class WebUser: Encapsulate user data
Everyone who has an account at the New Mexico Tech (NMT)
Computer Center (TCC) may host a personal homepage with a
URL of the form http://www.nmt.edu/~acctname, where acctname
is the user's login name.
The purpose of this project is to generate an index to all these homepages so that other members of the NMT community can find them.
Typically people will know either the given name or surname of the person they are looking for. Some campus clubs and other organizations also have homepages.
One venerable technology for automatically generating
indexes is KeyWord In Context (KWIC) indexing. This
technique is discussed in a related TCC publication, kwic.py: A
Python module to generate a Key Word In Context (KWIC)
index. That publication also explains
how such indexes are structured.
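To make the idea concrete, here is a minimal sketch of KWIC entry generation. It is not the interface of the kwic.py module: the function name kwic_entries, the stop-word list, and the " / " separator are all assumptions chosen for illustration. Each significant word of a phrase becomes a keyword, and the rest of the phrase is kept as context.

```python
def kwic_entries(phrase, stop_words=frozenset({"a", "an", "and", "of", "the"})):
    """Return one (keyword, context) pair per significant word in phrase.

    Hypothetical helper: the context starts at the keyword, and the
    words that preceded it are appended after a " / " separator.
    """
    words = phrase.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in stop_words:
            continue  # Stop words never become keywords.
        before = " ".join(words[:i])
        after = " ".join(words[i:])
        context = f"{after} / {before}" if before else after
        entries.append((word.lower(), context))
    return entries


# Usage: index a sample phrase and list the entries in keyword order.
for keyword, context in sorted(kwic_entries("Friends of the Library")):
    print(f"{keyword:10} {context}")
```

Sorting the pairs by keyword gives the alphabetical order a reader would scan, so the sample phrase can be found under either "friends" or "library".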
Accordingly, this project provides a script that can be run periodically as a cron job. When run, the script generates a set of pages that index the current personal homepages. Each homepage is indexed under the words in the account holder's Gecos field, which is stored in the TCC's LDAP server along with the other attributes of the user account.
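As an illustration of how a Gecos field might be turned into index keywords, consider the sketch below. It rests on assumptions: the name is taken to be the first comma-separated subfield (the traditional Unix convention, which may not match the TCC's LDAP data), and the function names gecos_keywords and entries_for_account are hypothetical, not the script's own.

```python
import re


def gecos_keywords(gecos):
    """Split the name portion of a Gecos string into keywords."""
    # Assumption: the full name is the first comma-separated subfield.
    full_name = gecos.split(",", 1)[0]
    # Keep runs of letters, apostrophes, and hyphens; drop other punctuation.
    return re.findall(r"[A-Za-z][A-Za-z'-]*", full_name)


def entries_for_account(acct_name, gecos):
    """Return (keyword, display name, URL) triples for one account."""
    # The URL form is the one given above: http://www.nmt.edu/~acctname
    url = f"http://www.nmt.edu/~{acct_name}"
    display = gecos.split(",", 1)[0]
    return [(word.lower(), display, url) for word in gecos_keywords(gecos)]


# Usage with a made-up account and Gecos value.
for entry in entries_for_account("jdoe", "Jane Q. Doe,Room 12,555-0100"):
    print(entry)
```

Indexing under every word of the name is what lets a reader find a page by either given name or surname, and it works equally well for a club whose Gecos field holds the organization's name.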
The top-level page describes the overall structure of the index, and provides a link for each letter of the alphabet.
For each letter of the alphabet, a subpage contains all the index entries for words starting with that letter.
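The two-level layout can be sketched as follows. This is only an illustration under stated assumptions: the subpage file names (index-a.html and so on), the function names, and the choice to link only letters that actually occur among the keywords are hypothetical and not taken from the script itself.

```python
from collections import defaultdict


def group_by_initial(keywords):
    """Map each initial letter to the sorted keywords starting with it."""
    groups = defaultdict(list)
    for word in keywords:
        first = word[:1].lower()
        if first.isalpha():  # Skip empty strings and non-letter initials.
            groups[first].append(word)
    return {
        letter: sorted(words, key=str.lower)
        for letter, words in sorted(groups.items())
    }


def subpage_name(letter):
    """Hypothetical file name for the subpage of one letter."""
    return f"index-{letter}.html"


def index_links(groups):
    """Build one HTML link per letter for the top-level page."""
    return [
        f'<a href="{subpage_name(letter)}">{letter.upper()}</a>'
        for letter in groups
    ]


# Usage with made-up keywords.
groups = group_by_initial(["Doe", "jane", "Dunn", "Chess"])
print(" ".join(index_links(groups)))
```

Keeping one subpage per letter bounds the size of each page, so a reader loads only the entries for the initial letter of the name being sought.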