The job of the homelist.py script is pretty simple and
self-contained. It uses the LDAP server to generate a list
of current user accounts, and then looks in each account to
see if there is a publicly accessible homepage. It then
writes a list of those homepages (with links to the pages
themselves) to the standard output stream. Redirecting
this stream to the appropriate location will be taken care
of by the crontab entry that runs
the script periodically.
Python is a good choice for an implementation language because modules are readily available for accessing LDAP and file systems, as well as for generating XHTML content.
At the moment, the only easy way to generate XHTML pages
that follow the standard TCC web page style, which is enforced
by the PyStyler template, is to generate a PyStyler .g file that is part of the PyStyler
structure.
To simplify the division of labor, the homelist.py script will
generate only the content of the page's XHTML <body> element, and a static
.g page and its .html equivalent will use a server-side
include to import that generated content.
The XHTML content generated will have this form:
<table border='3' cellpadding='3'>
<tr>
<th>User name</th>
<th>Homepage</th>
</tr>
<tr>
<td>gecos</td>
<td>
<tt><a href='url'>url</a></tt>
</td>
</tr>
...
</table>
<p>
This page is generated daily by a script. For documentation, see
<a href='http://www.nmt.edu/tcc/projects/homelist/'
><citetitle><tt>homelist.py</tt>:
A script to generate a list of user homepages</citetitle>
</a>.
</p>
<p>
Last updated YYYY-MM-DD HH:MM.
</p>
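where the <tr> element containing the <td> cells is
repeated for each user who has a home page, with gecos
replaced by the user's name from the gecos field and url
replaced by the URL of their home page.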
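As an illustration only, the table layout above could be produced along the following lines. This is a minimal sketch, not the script's actual code: the function name render_body, the (account, gecos, url) tuple layout, and the optional now parameter are assumptions made for this example, and the documentation-link paragraph is omitted for brevity.

```python
import datetime
from xml.sax.saxutils import escape


def render_body(entries, now=None):
    """Return the XHTML fragment for a list of (account, gecos, url) tuples.

    The tuple layout and this function name are hypothetical.
    """
    if now is None:
        now = datetime.datetime.now()
    lines = [
        "<table border='3' cellpadding='3'>",
        "  <tr>",
        "    <th>User name</th>",
        "    <th>Homepage</th>",
        "  </tr>",
    ]
    # Rows are ordered by account name, the first item of each tuple.
    for account, gecos, url in sorted(entries, key=lambda e: e[0]):
        # Escape &, <, > and the single quote used as attribute delimiter.
        safe_url = escape(url, {"'": "&#39;"})
        lines += [
            "  <tr>",
            "    <td>%s</td>" % escape(gecos),
            "    <td>",
            "      <tt><a href='%s'>%s</a></tt>" % (safe_url, safe_url),
            "    </td>",
            "  </tr>",
        ]
    lines += [
        "</table>",
        "<p>",
        "  Last updated %s." % now.strftime("%Y-%m-%d %H:%M"),
        "</p>",
    ]
    return "\n".join(lines) + "\n"
```

Under these assumptions, the script would write the returned string to the standard output stream, for example with sys.stdout.write(render_body(entries)).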
The displayed list will be sorted by account name. The
author would prefer to sort by last name and then first
name, but unfortunately the gecos
field is the entire name as "First Last", and there is
no obvious way to discern where the last name begins
(consider surnames like “de la O”,
“Ortiz y Vega”, and “ter Horst”).
A user will be considered to have a homepage if they have
a world-readable directory named either public_html or the older equivalent
www, and that directory contains a
world-readable file named either index.html or the older equivalent homepage.html.
We could just ask the HTTP server to serve us the page with the appropriate URL, but this has the undesirable side effect of causing a page fetch that will show up in our Web statistics on user page fetches. Also, why bother the HTTP server when we can just look for the user files directly?
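The direct file-system check described above might be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the names is_world_readable and find_homepage are hypothetical, "world-readable" is interpreted as the "other" read permission bit being set, and public_html and index.html are assumed to take precedence over their older equivalents.

```python
import os
import stat

# Preferred names first, older equivalents second (ordering is an assumption).
HOME_DIRS = ("public_html", "www")
INDEX_FILES = ("index.html", "homepage.html")


def is_world_readable(path):
    """Return True if path exists and its 'other' read bit is set."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False
    return bool(mode & stat.S_IROTH)


def find_homepage(home_dir):
    """Return the path of the user's homepage file, or None if there is none."""
    for dir_name in HOME_DIRS:
        web_dir = os.path.join(home_dir, dir_name)
        if not (os.path.isdir(web_dir) and is_world_readable(web_dir)):
            continue
        for file_name in INDEX_FILES:
            page = os.path.join(web_dir, file_name)
            if os.path.isfile(page) and is_world_readable(page):
                return page
    return None
```

A stricter check might also require the directory's search (execute) bit for "other", since a web server needs it to reach files inside the directory; the sketch follows the wording above and tests only the read bit.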