Abstract

Describes the implementation of a script that indexes all publicly visible personal homepages hosted at New Mexico Tech.

This publication is available in Web form and also as a PDF document. Please forward any comments to tcc-doc@nmt.edu.

Table of Contents

1. How do people search for local homepages?
2. Online files
3. Design notes
3.1. Where are the personal homepages?
3.2. Page format
3.3. Root page design
3.4. Subpage design
4. Operation of the script
5. Prologue
6. Imported modules
7. Manifest constants
7.1. UID_ATTR
7.2. GECOS_ATTR
7.3. LDAP_SERVER
7.4. ACCOUNTS_DN
7.5. WEB_DIR
7.6. NMT_URL
7.7. TCC_URL
7.8. TCC_ABS
7.9. HTML_SUFFIX
7.10. INDEX_NAME
7.11. SUBPAGE_NAME
7.12. LETTER_PAGE_NAME
7.13. BASE_TITLE
7.14. SPACER
8. main(): The main program
9. checkArguments(): Digest the command line arguments
10. fatal(): Fatal error
11. buildKwic(): Survey the users and build the KWIC index
12. genWebUsers(): Find users with homepages
13. genAllUsers(): Find all the users in LDAP
14. isWebUser(): Does this user have a personal Web?
15. indexUser(): Add one user to the KWIC index
16. buildAllPages(): Build the output pages
17. scanInitialLetters(): What initial letters occur in keywords?
18. buildIndexPage(): Build the top-level page
19. relToURL(): URL of a relative path
20. relToAbs(): Absolute path from a relative path
21. addSubpageLink(): Link from the index page to one subpage
22. letterInfo(): Information about a subpage
23. buildSubpage(): Build the subpage for one letter
24. emptySubpage(): Set up an empty subpage
25. addRow(): Add one row to the subpage table
26. class WebUser: Encapsulate user data
27. Epilogue

1. How do people search for local homepages?

Everyone who has an account at the New Mexico Tech (NMT) Computer Center (TCC) may host a personal homepage with a URL of the form http://www.nmt.edu/~acctname, where acctname is the user's login name.

The purpose of this project is to generate an index to all these homepages so that other members of the NMT community can find them.

Typically people will know either the given name or surname of the person they are looking for. Some campus clubs and other organizations also have homepages.

One venerable technology for automatically generating indexes is KeyWord In Context (KWIC) indexing. This technique is described in a related TCC publication, kwic.py: A Python module to generate a Key Word In Context (KWIC) index, which also explains how such indexes are structured.
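As a rough illustration of the KWIC idea, the sketch below produces one index entry per word of a phrase, keeping the words before and after the keyword as its context. It is not the kwic.py interface; the function names, the tuple layout, and the sample accounts are all hypothetical, chosen only for this example.

```python
def kwic_entries(phrase, ref):
    """Return one (keyword, before, after, ref) tuple per word in phrase.

    Hypothetical helper for illustration; not part of kwic.py.
    """
    words = phrase.split()
    entries = []
    for i, word in enumerate(words):
        before = " ".join(words[:i])       # context to the left of the keyword
        after = " ".join(words[i + 1:])    # context to the right of the keyword
        entries.append((word, before, after, ref))
    return entries


def build_kwic(items):
    """Build a KWIC index from (phrase, ref) pairs, sorted by keyword."""
    index = []
    for phrase, ref in items:
        index.extend(kwic_entries(phrase, ref))
    # Sort case-insensitively on the keyword, then on the reference.
    index.sort(key=lambda e: (e[0].lower(), e[3]))
    return index


# Made-up sample data: (name as it might appear in an account record, login name).
sample = [("Ada Lovelace", "alovelace"), ("Chess Club", "chess")]

for keyword, before, after, ref in build_kwic(sample):
    print(f"{before:>12} [{keyword}] {after:<12} ~{ref}")
```

Each phrase appears once for every word it contains, so a reader who knows only a given name or only a surname can find the entry under either word.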

Accordingly, this project provides a script that can be run periodically as a cron job. When run, the script generates a set of pages that index the current personal homepages. Each homepage is indexed by the words in the account holder's Gecos field, taken from the TCC's LDAP server, which defines the attributes of each user account. The generated pages are of two kinds:

  • The top-level page describes the overall structure of the index, and provides a link for each letter of the alphabet.

  • For each letter of the alphabet, a subpage contains all the index entries for words starting with that letter.
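The two-level layout described above can be sketched as a grouping step: split each account's Gecos text into words, then file each word under its initial letter together with the homepage URL. This is a minimal sketch under stated assumptions, not the script's actual code: the function names and sample accounts are hypothetical, the Gecos field is assumed to hold only a name, and words that do not begin with a letter are simply skipped here.

```python
from collections import defaultdict

# URL form given in the text: http://www.nmt.edu/~acctname
BASE_URL = "http://www.nmt.edu/~"


def homepage_url(acct_name):
    """Return the homepage URL for a login name."""
    return BASE_URL + acct_name


def group_by_initial(users):
    """Group index keywords by initial letter.

    users: iterable of (acct_name, gecos_text) pairs (hypothetical input shape).
    Returns a dict mapping a lowercase letter to a sorted list of
    (keyword, url) pairs, i.e. the contents of one subpage per letter.
    """
    pages = defaultdict(list)
    for acct_name, gecos_text in users:
        for word in gecos_text.split():
            initial = word[0].lower()
            if initial.isalpha():          # assumption: skip non-letter initials
                pages[initial].append((word, homepage_url(acct_name)))
    for entries in pages.values():
        entries.sort(key=lambda e: (e[0].lower(), e[1]))
    return dict(pages)


# Made-up sample accounts.
users = [("alovelace", "Ada Lovelace"), ("chess", "Chess Club")]
pages = group_by_initial(users)

# The top-level page would link to one subpage per key of `pages`.
for letter in sorted(pages):
    print(letter, pages[letter])
```

The keys of the resulting dictionary correspond to the links on the top-level page, and each value corresponds to the entries of one subpage.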