<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
 "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"
  [
    <!ENTITY selfURL
      "http://www.nmt.edu/tcc/help/lang/python/examples/manweb/">
    <!ENTITY mw       "<application>manweb</application>">
    <!ENTITY man      "<code>man</code>">
    <!ENTITY m2h      "<application>man2html</application>">
    <!ENTITY zcat     "<application>zcat</application>">
  ]
>
<article>
  <articleinfo>
    <title>&mw;: A CGI script to display &man; pages</title>
    <authorgroup>
      <author>
        <firstname>John W.</firstname>
        <surname>Shipman</surname>
      </author>
    </authorgroup>
    <address><email>tcc-doc@nmt.edu</email>
    </address>
    <revhistory>
      <revision>
        <revnumber>$Revision: 1.16 $</revnumber>
        <date>$Date: 2008/01/09 05:14:14 $</date>
      </revision>
    </revhistory>
    <abstract>
      <para>
        Describes a Web script to display Unix <code >man</code >
        pages online.
      </para>
      <para>
        This publication is available in <ulink url="&selfURL;"
        >Web form</ulink > and also as a <ulink
        url="&selfURL;manweb.pdf" >PDF document</ulink >.  Please
        forward any comments to <userinput
        >tcc-doc@nmt.edu</userinput >.
      </para>
    </abstract>
  </articleinfo>
  <section id='intro'>
    <title>Introduction</title>
    <para>
      The purpose of this script is to display a selected Unix
      &man; page in HTML.  It uses the CGI (Common Gateway
      Interface) protocol to generate and display the web page.
    </para>
    <para>
      Some consideration is given to security.  Arguments passed
      to the script may contain only certain characters.
    </para>
  </section> <!--intro-->
  <section id='man-paths'>
    <title>The path names of &man; pages</title>
    <para>
      The standard location for &man; pages is in subdirectories
      of &#x201c;<filename >/usr/share/man</filename >&#x201d;.
      These subdirectories' names must start with &#x201c;<code
      >man</code >&#x201d;.  A cursory survey
      of &man; page directories in various installs found
      directories named
          
      <code >man0p</code >,
      <code >man1</code >,
      <code >man1p</code >,
      <code >man1x</code >,
      <code >man2</code >, 
      <code >man2x</code>,
      <code >man3</code >,
      <code >man3p</code >,
      <code >man3x</code >,
      <code >man4</code >, 
      <code >man4x</code>,
      <code >man5</code >, 
      <code >man5x</code>,
      <code >man6</code >, 
      <code >man6x</code>,
      <code >man7</code >, 
      <code >man7x</code>,
      <code >man8</code >, 
      <code >man8x</code>,
      <code >man9</code >, 
      <code >man9x</code>,
      <code >manl</code >,
      and
      <code >mann</code >.
    </para>
    <para>
      The only directory structure that matters is the one on the
      web server where this script is installed.  At this writing
      (December 2007), the aging <code >infohost</code > server
      has only <code >man1</code > through <code >man9</code >,
      plus <code >mann</code >, with no suffixes.  However, this
      script will be more liberal, in anticipation of future
      installs that may be more like client machines.
    </para>
    <para>
      The language of &man; pages is <application
      >groff</application >; for details, see <ulink
      url='http://www.gnu.org/software/groff/' >the project page
      at <code >gnu.org</code ></ulink >.
    </para>
    <para>
      The actual files are compressed with <application
      >gzip</application >, so their names end with <filename
      >.gz</filename >.
    </para>
    <para>
      Here are some examples of actual path names:
    </para>
    <programlisting
>/usr/share/man/man1/ls.1.gz
/usr/share/man/man1/Magick++-config.1.gz
/usr/share/man/man0p/math.h.0p.gz
/usr/share/man/man3/tan.3.gz
/usr/share/man/man3/Thread::Queue.3pm.gz
/usr/share/man/man3/trace.3x.gz
/usr/share/man/man3/ui.3ssl.gz
/usr/share/man/man3/vars.3pm.gz
/usr/share/man/man1p/nohup.1p.gz
/usr/share/man/man6x/bouncingcow.6x.gz
</programlisting>
    <para>
      In order to allow access only to legitimate &man; pages,
      this script will limit access to pathnames of this
      general form:
    </para>
    <programlisting
>/usr/share/man/man<replaceable >S</replaceable >/<replaceable
                >P</replaceable >.<replaceable >SX</replaceable
                >.gz
</programlisting>
    <variablelist>
      <varlistentry>
        <term>
          <code ><replaceable >S</replaceable ></code >
        </term>
        <listitem>
          <para>
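            The section identifier.  This must be a single
            digit from 0 to 9 inclusive, or one of the letters
            &#x201c;<code >l</code >&#x201d; or &#x201c;<code
            >n</code >&#x201d; (for directories such as <code
            >manl</code > and <code >mann</code >), optionally
            followed by either &#x201c;<code >p</code
            >&#x201d; or &#x201c;<code >x</code >&#x201d;.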
          </para>
        </listitem>
      </varlistentry>
      <varlistentry>
        <term>
          <code ><replaceable >P</replaceable ></code >
        </term>
        <listitem>
          <para>
            Valid page names must start with a letter or underbar
            (<code >_</code >).  All additional characters must
            be letters, digits, underbars, or any of the
            characters {<code >-</code >, <code >+</code >, <code
            >.</code >, <code >:</code >}.
          </para>
        </listitem>
      </varlistentry>
      <varlistentry>
        <term>
          <code ><replaceable >X</replaceable ></code >
        </term>
        <listitem>
          <para>
            An optional suffix that appears in the file name
            right before the final &#x201c;<code >.gz</code
            >&#x201d;.  This suffix, if any, may contain only
            lowercase letters.
          </para>
        </listitem>
      </varlistentry>
    </variablelist>
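    <para>
      To make the general form concrete, here is a minimal
      sketch, written for Python 3 and not part of the script,
      that splits a path name into its three variable parts.
      The names <code >MAN_PATH_RE</code > and <code
      >split_man_path</code > are hypothetical; the character
      classes are assumed to be the same ones the script uses
      for validation.
    </para>

```python
import re

# Hypothetical helper, not part of manweb.cgi.
MAN_PATH_RE = re.compile(
    r'/usr/share/man/man'
    r'([0-9ln][px]?)/'                # S: section identifier
    r'([_a-zA-Z][-+.:_a-zA-Z0-9]*)'   # P: page name
    r'\.\1'                           # A period, then S again
    r'([a-z]*)'                       # X: optional suffix
    r'\.gz\Z')                        # Must end with .gz


def split_man_path(path):
    """Return (S, P, X) for a well-formed path, or None otherwise."""
    m = MAN_PATH_RE.match(path)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)


for example in ["/usr/share/man/man1/ls.1.gz",
                "/usr/share/man/man0p/math.h.0p.gz",
                "/usr/share/man/man3/Thread::Queue.3pm.gz"]:
    print(example, split_man_path(example))
```

    <para>
      Under these assumptions, the third example splits into
      section <code >3</code >, page name <code
      >Thread::Queue</code >, and suffix <code >pm</code >.
    </para>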
  </section> <!--man-paths-->
  <section id='operation'>
    <title>Operation of the script</title>
    <para>
      To view a &man; page on the Web, use a URL of this form:
      <programlisting
>http://www.nmt.edu/tcc/cgi/manweb.cgi?p=<replaceable
                >P</replaceable >&amp;s=<replaceable
                >S</replaceable >&amp;x=<replaceable
                >X</replaceable >
</programlisting>
      where <code ><replaceable >P</replaceable ></code > is the
      page name, <code ><replaceable >S</replaceable ></code > is
      the optional section name, and <code ><replaceable
      >X</replaceable ></code > is the optional suffix string, as
      described in <xref linkend='man-paths' />.
    </para>
    <para>
      Here are some examples.  The first column shows the part of
      the URL after the &#x201c;<code >?</code >&#x201d;; the
      second column shows the resulting path name relative to
      <filename >/usr/share/man/</filename >.  Note that all
      special characters must be escaped using the convention
      that a character with hexadecimal code <code ><replaceable
      >xx</replaceable ></code > is represented as &#x201c;<code
      >%<replaceable >xx</replaceable ></code >&#x201d;.
    </para>
    <informaltable>
      <tgroup cols="2">
        <colspec align="left"/>
        <colspec align="left"/>
        <thead>
          <row>
            <entry>Arguments</entry>
            <entry>Path</entry>
          </row>
        </thead>
        <tbody>
          <row>
            <entry valign="top"><code>p=ls</code></entry>
            <entry valign="top">
              <filename >man1/ls.1.gz</filename >
            </entry>
          </row>
          <row>
            <entry valign="top"><code>p=Magick%2B%2B%2Dconfig</code></entry>
            <entry valign="top">
              <filename >man1/Magick++-config.1.gz</filename >
            </entry>
          </row>
          <row>
            <entry valign="top"><code>p=math%2Eh&amp;s=0p</code></entry>
            <entry valign="top">
              <filename >man0p/math.h.0p.gz</filename >
            </entry>
          </row>
          <row>
            <entry valign="top">
              <code>s=3&amp;p=Thread%3A%3AQueue&amp;x=pm</code>
            </entry>
            <entry valign="top">
              <filename >man3/Thread::Queue.3pm.gz</filename >
            </entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>
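    <para>
      Links of this form need not be escaped by hand.  The
      following is a minimal sketch, assuming Python 3 and its
      standard <code >urllib.parse</code > module; the helper
      name <code >manweb_url</code > is hypothetical.  Note that
      <code >urlencode()</code > leaves hyphens and periods
      unescaped, which decodes to the same argument values as
      the <code >%2D</code > and <code >%2E</code > forms shown
      in the table.
    </para>

```python
from urllib.parse import urlencode

# Base URL as given in this document.
MANWEB_URL = "http://www.nmt.edu/tcc/cgi/manweb.cgi"


def manweb_url(page, section=None, suffix=None):
    """Build a manweb URL; section and suffix are optional."""
    args = [("p", page)]
    if section is not None:
        args.append(("s", section))
    if suffix is not None:
        args.append(("x", suffix))
    # urlencode() percent-escapes characters such as "+" and ":".
    return MANWEB_URL + "?" + urlencode(args)


print(manweb_url("ls"))
print(manweb_url("Magick++-config"))
print(manweb_url("Thread::Queue", section="3", suffix="pm"))
```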
    <para>
      If any of the supplied arguments are invalid, or if the
      specified &man; page does not exist, the script will
      display an error page.  In the latter case, mail will also
      be sent to <code >webmaster@nmt.edu</code > so that broken
      links can be repaired.
    </para>
    <para>
      One potential problem is that the set of &man; pages on the
      server will be different from the set of pages available on
      clients.  Since this script is primarily for basic &man;
      pages that exist everywhere, we'll deal with that if and
      when it becomes a problem.  One possible solution is to
      duplicate the set of client &man; pages on <code
      >infohost</code >, and add an option to select the client
      or server set.
    </para>
  </section> <!--operation-->
  <section id='internals'>
    <title>Internals</title>
    <para>
      The actual work of reformatting the &man; file into HTML is
      done by <filename >/usr/bin/&m2h;</filename >.  The &mw;
      script is primarily a wrapper that does stringent error
      checking.
    </para>
    <para>
      Here is the actual Python script, in <ulink
      url='http://www.nmt.edu/~shipman/soft/litprog/' >
      lightweight literate programming</ulink > form.
    </para>
    <section id='install'>
      <title>Installation</title>
      <para>
        Executable CGI scripts live in directory &#x201c;<code
        >/u/www/docs/tcc/cgi/</code >&#x201d;.  To install this
        script, either soft-link that directory to here, or copy
        the <code >manweb.cgi</code > file there.
      </para>
      <para>
        Online files related to this project:
      </para>
      <itemizedlist>
        <listitem>
          <para>
            <ulink url='&selfURL;manweb.cgi' ><filename
            >manweb.cgi</filename ></ulink >:  Source of the
            CGI script.
          </para>
        </listitem>
        <listitem>
          <para>
            <ulink url='&selfURL;manweb.xml' ><filename
            >manweb.xml</filename ></ulink >: DocBook source for
            this document.
          </para>
        </listitem>
      </itemizedlist>
    </section> <!--install-->
    <section id='prologue'>
      <title>Prologue</title>
      <para>
        The script starts with the usual &#x201c;pound-bang
        line&#x201d; to make it self-executing, followed by a
        comment pointing readers back to this documentation.
      </para>
      <programlisting role='outFile:manweb.cgi'
>#!/usr/bin/env python
#================================================================
# &mw;.cgi:  Display a man page in HTML.  For documentation, see:
#   &selfURL;
#----------------------------------------------------------------
</programlisting>
    </section> <!--prologue-->
    <section id='imports'>
      <title>Imports</title>
      <para>
        We'll need two standard Python modules: <code >sys</code
        > for the standard streams, and <code >os</code >, which
        provides both the <code >popen</code > function to
        execute commands in a subshell and the <code
        >environ</code > dictionary containing the currently
        defined environment variables.
      </para>
      <programlisting role='outFile:manweb.cgi'
>#================================================================
# Imports
#----------------------------------------------------------------

import sys, os
</programlisting>
      <para>
        Python's <code >cgi</code > module takes care of
        retrieving the arguments from the URL.
      </para>
      <programlisting role='outFile:manweb.cgi'
>import cgi
</programlisting>
      <para>
        The Python regular expression module will handle checking
        the arguments for correct syntax.
      </para>
      <programlisting role='outFile:manweb.cgi'
>import re
</programlisting>
    </section> <!--imports-->
    <section id='constants'>
      <title>Manifest constants</title>
      <para>
        Next we'll define constants used by the script.
      </para>
      <section id='MAN_BASE'>
        <title><code >MAN_BASE</code ></title>
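        <para>
          The common prefix of the absolute paths to the &man;
          section directories.  The section identifier is
          appended directly to this string, which is why it ends
          with &#x201c;<code >man</code >&#x201d; rather than
          with a slash.
        </para>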
      <programlisting role='outFile:manweb.cgi'
>#================================================================
# Manifest constants
#----------------------------------------------------------------

MAN_BASE  =  "/usr/share/man/man"
</programlisting>
      </section> <!--End MAN_BASE-->
      <section id='PAGENAME_RE'>
        <title><code >PAGENAME_RE</code ></title>
        <para>
          Regular expressions are used to check the format of the
          script's arguments.  The <code >PAGENAME_RE</code >
          regular expression is used to check the <code
          ><replaceable >pagename</replaceable ></code >.
        </para>
        <important>
          <para>
            The pattern ends with the end-of-string anchor (<code
            >\Z</code >) to ensure that there are no stray
            unmatched characters at the end of the string.  The
            more familiar <code >$</code > anchor is not strict
            enough for this purpose, because in Python it also
            matches just before a trailing newline.  This is a
            security consideration.  Since the page name will
            become part of a shell command, we need to be sure
            that no shell metacharacters such as pipe (<code
            >|</code >) are included; they can be a penetration
            route for evildoers.
          </para>
        </important>
        <programlisting role='outFile:manweb.cgi'
>PAGENAME_RE  =  re.compile (
    r'[_a-zA-Z]'           # Starts with letter or underbar
    r'[-+.:_a-zA-Z0-9]*'   # Zero or more: + - . : _ letter digit
    r'\Z' )                # Match the entire string
</programlisting>
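        <para>
          The choice of end anchor matters for this check.  In
          Python's <code >re</code > module, <code >$</code >
          also matches just before a trailing newline, while
          <code >\Z</code > matches only at the very end of the
          string.  The following sketch, written for Python 3,
          compares the two on the page-name pattern; the names
          <code >DOLLAR_RE</code > and <code >STRICT_RE</code >
          are hypothetical and exist only for this illustration.
        </para>

```python
import re

# Two variants of the page-name pattern, differing only in the anchor.
DOLLAR_RE = re.compile(r'[_a-zA-Z][-+.:_a-zA-Z0-9]*$')
STRICT_RE = re.compile(r'[_a-zA-Z][-+.:_a-zA-Z0-9]*\Z')

for candidate in ["ls", "Thread::Queue", "ls|rm", "ls\n"]:
    dollar_ok = DOLLAR_RE.match(candidate) is not None
    strict_ok = STRICT_RE.match(candidate) is not None
    print(repr(candidate), dollar_ok, strict_ok)
```

        <para>
          Both variants accept the first two names and reject
          the one containing a pipe character; only the <code
          >$</code > variant accepts the name with a trailing
          newline.
        </para>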
      </section> <!--End PAGENAME_RE-->
      <section id='SECTION_RE'>
        <title><code >SECTION_RE</code ></title>
        <para>
          The <code >SECTION_RE</code > regular expression is
          used to test the section number.  This regular
          expression covers all the currently known cases as well
          as some unlikely combinations such as <code
          >manlp</code > or <code >mannx</code >; there is no
          great downside to letting these through, since there
          won't be any &man; pages at that path.
        </para>
        <programlisting role='outFile:manweb.cgi'
>SECTION_RE  =  re.compile (
    r'[0-9ln]'           # Starts with one digit, letter l, or n
    r'[px]?'             # Optional suffix
    r'\Z' )              # Match the entire string (no trailing newline)
</programlisting>
      </section> <!--End SECTION_RE-->
      <section id='SUFFIX_RE'>
        <title><code >SUFFIX_RE</code ></title>
        <para>
          This is the regular expression used to validate the
          suffix argument: it must consist of one or more
          lowercase letters.
        </para>
        <programlisting role='outFile:manweb.cgi'
>SUFFIX_RE  =  re.compile (
    r'[a-z]+'           # One or more lowercase letters
    r'\Z' )             # Match the entire string (no trailing newline)
</programlisting>
      </section> <!--End SUFFIX_RE-->
      <section id='GATEWAY'>
        <title><code >GATEWAY</code ></title>
        <para>
          This is the name of the environmental variable that
          tells us what gateway interface standard is executing
          us; it is unset if we're not being called as a CGI
          script.
        </para>
        <programlisting role='outFile:manweb.cgi'
>GATEWAY  =  "GATEWAY_INTERFACE"
</programlisting>
      </section> <!--End GATEWAY-->
      <section id='CGI_VERSION'>
        <title><code >CGI_VERSION</code ></title>
        <para>
          This constant is used to check for the correct
          environment: if we are being called as a CGI script,
          the <link linkend='GATEWAY' ><code >GATEWAY</code
          ></link > environmental variable should have this
          value.
        </para>
        <programlisting role='outFile:manweb.cgi'
>CGI_VERSION  =  "CGI/1.1"
</programlisting>
      </section> <!--End CGI_VERSION-->
      <section id='PAGE_NAME_ARG'>
        <title><code >PAGE_NAME_ARG</code ></title>
        <para>
          This is the name of the argument, passed in the URL, that
          defines what page the user wants to see.
        </para>
        <programlisting role='outFile:manweb.cgi'
>PAGE_NAME_ARG  =  "p"
</programlisting>
      </section> <!--End PAGE_NAME_ARG-->
      <section id='SECTION_ARG'>
        <title><code >SECTION_ARG</code ></title>
        <para>
          This is the name of the optional argument, passed in the URL,
          that defines which section contains the desired &man; page.
        </para>
      <programlisting role='outFile:manweb.cgi'
>SECTION_ARG  =  "s"
</programlisting>
      </section> <!--End SECTION_ARG-->
      <section id='DEFAULT_SECTION'>
        <title><code >DEFAULT_SECTION</code ></title>
      <para>
        This constant is the default section number when one is
        not supplied.
      </para>
      <programlisting role='outFile:manweb.cgi'
>DEFAULT_SECTION  =  "1"
</programlisting>
      </section> <!--End DEFAULT_SECTION-->
      <section id='SUFFIX_ARG'>
        <title><code >SUFFIX_ARG</code ></title>
        <para>
          This is the name of the optional suffix argument,
          passed in via the URL, that is appended before the
          terminal &#x201c;<code >.gz</code >&#x201d; of the
          &man; file name.  The default value is an empty string.
        </para>
        <programlisting role='outFile:manweb.cgi'
>SUFFIX_ARG  =  "x"
</programlisting>
      </section> <!--End SUFFIX_ARG-->
    </section> <!--constants-->
    <section id='main'>
      <title>Main program</title>
      <para>
        Here is the main program.  The source code comments in
        square brackets are intended functions, as used in the
        <ulink url='http://www.nmt.edu/~shipman/soft/clean/'
        >Cleanroom software development methodology</ulink >.
      </para>
      <programlisting role='outFile:manweb.cgi'
># - - -   m a i n

def main():
    """Main program: display a man page in HTML.

      [ if (this script is being executed by the CGI protocol) and
        (its arguments are valid) and
        (the arguments describe an existing man page) ->
          sys.stdout  +:=  the output of &m2h; reformatting
                           that page
        else ->
          sys.stdout  +:=  an error message as HTML ]
    """
</programlisting>
      <para>
        We must check that the protocol is CGI, retrieve the
        arguments, and check them for validity.  See <xref
        linkend='processArguments' />, which does not return
        unless everything is valid.
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 1 --
    # [ if (this script is being executed by the CGI protocol) and
    #   (its arguments are valid) ->
    #     pageName  :=  the page name argument
    #     section  :=  the section number argument, defaulting to "1"
    #     suffix  :=  the suffix argument, defaulting to ""
    #   else ->
    #     sys.stdout  +:=  an error message as HTML
    #     stop execution ]
    pageName, section, suffix  =  processArguments()
</programlisting>
      <para>
        Next we build the absolute path name to the input file
        and check to see if it exists.  See <xref
        linkend='findManFile' />.
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 3 --
    # [ if the man file specified by pageName, section, and suffix exists ->
    #     manPath  :=  the absolute pathname to that file
    #   else ->
    #     sys.stdout  +:=  an error message as HTML
    #     stop execution ]
    manPath  =  findManFile ( pageName, section, suffix )
</programlisting>
      <para>
        At this point we can fire up &m2h;, feed it the file, and
        route its output back to our standard output stream.
        Things can still go wrong, however.  See <xref linkend='convert' />.
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 4 --
    # [ if manPath specifies a readable, valid man file ->
    #     sys.stdout  +:=  the output of man2html operating on
    #                      that file
    #   else ->
    #     sys.stdout  +:=  an error message as HTML ]
    convert ( manPath )
</programlisting>
    </section> <!--main-->
    <section id='processArguments'>
      <title><code >processArguments()</code >: Retrieve and
      check the CGI arguments</title>
      <para>
        This function checks the protocol, retrieves the
        arguments, and checks them for validity.
      </para>
      <programlisting role='outFile:manweb.cgi'
># - - -   p r o c e s s A r g u m e n t s

def processArguments():
    """Retrieve and validate the arguments.

      [ if (this script is being executed by the CGI protocol) and
        (its arguments are valid) ->
          return (page name argument, section number argument
          defaulting to "1", suffix defaulting to "")
        else ->
          sys.stdout  +:=  an error message as HTML
          stop execution ]
    """
</programlisting>
      <para>
        First we check to see if the CGI protocol is in effect.
        If so, the environmental variable <code >GATEWAY</code >
        will have the value <code >CGI_VERSION</code >; see <xref
        linkend='GATEWAY' /> and <xref linkend='CGI_VERSION' />.
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 1 --
    # [ if environmental variable GATEWAY is defined and has the
    #   value CGI_VERSION ->
    #     I
    #   else ->
    #     sys.stdout  +:=  an HTML error message
    #     stop execution ]
    try:
        gateway  =  os.environ [ GATEWAY ]
        if  gateway != CGI_VERSION:
            errorPage ( "Incorrect CGI protocol version." )
    except KeyError:
        errorPage ( "This script must be executed using the CGI "
                    "protocol." )
</programlisting>
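      <para>
        The same check can be written as a function of the
        environment, which makes it easy to exercise without a
        web server.  This is only a sketch for Python 3; the
        function name <code >gateway_error</code > is
        hypothetical, and unlike the script it returns the
        message instead of writing an error page.
      </para>

```python
def gateway_error(environ):
    """Return an error message, or None if environ indicates CGI/1.1.

    environ is any dictionary-like object, for example os.environ.
    """
    gateway = environ.get("GATEWAY_INTERFACE")
    if gateway is None:
        # The variable is unset: we were not started by a web server.
        return "This script must be executed using the CGI protocol."
    if gateway != "CGI/1.1":
        return "Incorrect CGI protocol version."
    return None


print(gateway_error({"GATEWAY_INTERFACE": "CGI/1.1"}))
print(gateway_error({}))
```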
      <para>
        Next we use Python's <code >cgi</code > module to extract
        the arguments following the &#x201c;<code >?</code
        >&#x201d; in the URL, converting them into a <code
        >cgi.FieldStorage</code > instance.  For details of this
        module, see the <ulink
        url='http://docs.python.org/lib/module-cgi.html' >online
        documentation</ulink >.
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 2 --
    # [ form  :=  a cgi.FieldStorage instance representing the
    #       arguments from the URL used to invoke this script ]
    form  =  cgi.FieldStorage()
</programlisting>
      <para>
        The <code >PAGE_NAME_ARG</code > argument is required;
        the <code >SECTION_ARG</code > is optional and defaults
        to <code >DEFAULT_SECTION</code >; and the <code
        >SUFFIX_ARG</code > is optional and defaults to an empty
        string.
      </para>
      <para>
        The <code >form</code > instance acts like a dictionary;
        if an argument was not supplied, attempting to extract it
        will raise a <code >KeyError</code > exception.
        See <xref linkend='PAGE_NAME_ARG' />.
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 3 --
    # [ if form[PAGE_NAME_ARG] does not exist ->
    #     sys.stdout  +:=  an error message as HTML
    #     stop execution
    #   else ->
    #     pageName  :=  the corresponding value ]
    try:
        pageName  =  form[PAGE_NAME_ARG].value
    except KeyError:
        errorPage ( "The 'p=PAGENAME' argument is required." )
</programlisting>
      <para>
        For security reasons, we need to make sure that the page
        name does not contain any unusual characters that might
        cause security holes.  See <xref linkend='PAGENAME_RE' />
        for the regular expression used to check its syntax.
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 4 --
    # [ if pageName matches PAGENAME_RE ->
    #     I
    #   else ->
    #     sys.stdout  +:=  an error message as HTML
    #     stop execution ]
    m  =  PAGENAME_RE.match ( pageName )
    if  m is None:
        errorPage ( "The 'p=PAGENAME' argument is not valid." )
</programlisting>
      <para>
        Retrieval and checking of the section name proceeds
        similarly, except that we have to supply a default value
        if the section name wasn't specified.  See <xref
        linkend='SECTION_ARG' /> and <xref
        linkend='DEFAULT_SECTION'/>.  For the regular expression used
        to validate the section name, see <xref
        linkend='SECTION_RE' />.
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 5 --
    # [ if form[SECTION_ARG] does not exist ->
    #     section  :=  DEFAULT_SECTION
    #   else if it exists and is valid ->
    #     section  :=  the corresponding value
    #   else ->
    #     sys.stdout  +:=  an error message as HTML
    #     stop execution ]
    try:
        section  =  form[SECTION_ARG].value
        m  =  SECTION_RE.match ( section )
        if  m is None:
            errorPage ( "The 's=SECTION' argument is not valid." )
    except KeyError:
        section  =  DEFAULT_SECTION
</programlisting>
      <para>
        Also check for the optional suffix argument, which
        defaults to the empty string.  See <xref
        linkend='SUFFIX_ARG' /> and <xref linkend='SUFFIX_RE' />.
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 7 --
    # [ if form[SUFFIX_ARG] does not exist ->
    #     suffix  :=  ""
    #   else if form[SUFFIX_ARG] exists and is valid ->
    #     suffix  :=  the corresponding value
    #   else ->
    #     sys.stdout  +:=  an error message as HTML
    #     stop execution ]
    try:
        suffix  =  form[SUFFIX_ARG].value
        m  =  SUFFIX_RE.match ( suffix )
        if  m is None:
            errorPage ( "The 'x=SUFFIX' argument is not valid." )
    except KeyError:
        suffix  =  ""
</programlisting>
      <para>
        Finally, return the three variable parts of the file
        name.
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 8 --
    return (pageName, section, suffix)
</programlisting>
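      <para>
        For readers who want to experiment with the validation
        rules outside a web server, here is a condensed sketch of
        the same flow as a pure function.  It is written for
        Python 3 and parses the query string with <code
        >urllib.parse.parse_qs</code > rather than the <code
        >cgi</code > module; the function name <code
        >parse_arguments</code > is hypothetical, and it raises
        <code >ValueError</code > where the script would display
        an error page.
      </para>

```python
import re
from urllib.parse import parse_qs

# Patterns modeled on the script's constants, anchored with \Z.
PAGENAME_RE = re.compile(r'[_a-zA-Z][-+.:_a-zA-Z0-9]*\Z')
SECTION_RE = re.compile(r'[0-9ln][px]?\Z')
SUFFIX_RE = re.compile(r'[a-z]+\Z')


def parse_arguments(query_string):
    """Return (pageName, section, suffix) or raise ValueError."""
    form = parse_qs(query_string)
    if "p" not in form:
        raise ValueError("The 'p=PAGENAME' argument is required.")
    page_name = form["p"][0]
    section = form.get("s", ["1"])[0]    # Default section is "1"
    suffix = form.get("x", [""])[0]      # Default suffix is empty
    if PAGENAME_RE.match(page_name) is None:
        raise ValueError("The 'p=PAGENAME' argument is not valid.")
    if SECTION_RE.match(section) is None:
        raise ValueError("The 's=SECTION' argument is not valid.")
    if suffix and SUFFIX_RE.match(suffix) is None:
        raise ValueError("The 'x=SUFFIX' argument is not valid.")
    return page_name, section, suffix


print(parse_arguments("p=Magick%2B%2B%2Dconfig"))
```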
    </section> <!--processArguments-->
    <section id='findManFile'>
      <title><code >findManFile()</code >: Locate the input file</title>
      <para>
        This function takes the page name, section, and suffix,
        builds a full absolute path name, and checks that the
        file exists.
      </para>
      <programlisting role='outFile:manweb.cgi'
># - - -   f i n d M a n F i l e

def findManFile ( pageName, section, suffix ):
    """Locate the man page and verify its existence.

      [ (pageName is a page name as a string) and
        (section is a section number as a string) and
        (suffix is a suffix string) ->
          if that man page exists in that section ->
            return the absolute path to the page
          else ->
            sys.stdout  +:=  an error message as HTML
            stop execution ]
    """
</programlisting>
      <para>
        First build the absolute pathname to the file.
        Then see if the file exists.  See <xref
        linkend='MAN_BASE' /> for the constant that defines the
        path to the top directory of the &man; page structure.       
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 1 --
    # [ manPath  :=  path to the man file for page name
    #       (pageName), section (section), and suffix (suffix) ]
    manPath  =  ( "%s%s/%s.%s%s.gz" %
                  (MAN_BASE, section, pageName, section, suffix) )
    #-- 2 --
    # [ if manPath names an existing file ->
    #     I
    #   else ->
    #     sys.stdout  +:=  an error message in HTML
    #     stop execution ]
    if  not os.path.exists ( manPath ):
        errorPage ( "No such man page: %s" % manPath )

    #-- 3 --
    return manPath
</programlisting>
    </section> <!--findManFile-->
    <section id='convert'>
      <title><code >convert</code >: Convert to HTML and output</title>
      <para>
        This function attempts to send the file specified by
        <code >manPath</code > to &m2h;, capture its output, and
        send it to our standard output.
      </para>
      <programlisting role='outFile:manweb.cgi'
># - - -   c o n v e r t

def convert ( manPath ):
    """Convert a man page to HTML and display it.

      [ manPath is a string ->
          if manPath specifies a readable, valid man file ->
            sys.stdout  +:=  the output of man2html operating on
                             that file
          else ->
            sys.stdout  +:=  an error message as HTML ]
    """
</programlisting>
      <para>
        The first complication is that the &man; file is
        compressed with <code >gzip</code >, and &m2h; will not
        uncompress it.  We can use &zcat; to do that, so the
        command we execute is a pipeline of this form:
        <programlisting
>zcat <replaceable >file</replaceable > | man2html
</programlisting>
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 1 --
    command  =  "zcat %s|man2html" % manPath
</programlisting>
      <para>
        The standard Python function <code >os.popen()</code >
        executes a shell command in a subprocess and returns a
        readable file handle that we can use to retrieve its
        output.  If the command fails, we don't find out until we
        close the file; the <code >.close()</code > method
        returns <code >None</code > if the command exited with
        status zero, and a nonzero exit status otherwise.
        Because the command is a pipeline, the status reported by
        the shell is that of its last command, &m2h;.
      </para>
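      <para>
        A quick way to see what <code >.close()</code > reports
        is to run a few trivial commands.  This sketch assumes
        Python 3 on a Unix-like system with a standard shell; the
        commands are placeholders chosen only for their exit
        statuses.
      </para>

```python
import os

# A command that succeeds: close() returns None.
ok_status = os.popen("exit 0").close()

# A command that fails: close() returns a nonzero integer.
bad_status = os.popen("exit 3").close()

# In a pipeline, the shell reports the status of the last command,
# so a failure in an earlier stage goes unnoticed here.
pipeline_status = os.popen("exit 3 | cat").close()

print(ok_status, bad_status, pipeline_status)
```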
      <programlisting role='outFile:manweb.cgi'
>    #-- 2 --
    # [ pipe  :=  an open file handle for reading the output of
    #             command ]
    pipe  =  os.popen ( command )
</programlisting>
      <para>
        We have to retrieve the output before we can close the
        pipe and find out whether the operation failed.  If it
        failed, it won't send us any data, so in that case this
        step does nothing.
      </para>
      <programlisting role='outFile:manweb.cgi'
>    #-- 3 --
    # [ sys.stdout  +:=  contents of pipe ]
    sys.stdout.write ( pipe.read() )

    #-- 4 --
    # [ if pipe.close returns a false value ->
    #     I
    #   else ->
    #     sys.stdout  +:=  an error message in HTML ]
    status  =  pipe.close()
    if  status:
        errorPage ( "The man2html conversion step for file '%s' "
                    "failed." % manPath )
</programlisting>
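      <para>
        One possible alternative, shown here only as a sketch for
        Python 3, avoids the shell altogether: the standard <code
        >gzip</code > module uncompresses the file, and <code
        >subprocess.run()</code > feeds the result to the
        converter.  The function name <code
        >convert_without_shell</code > and its <code
        >converter</code > parameter are hypothetical; the sketch
        assumes, as the pipeline above does, that &m2h; reads its
        input from the standard input stream.
      </para>

```python
import gzip
import subprocess


def convert_without_shell(man_path, converter=("man2html",)):
    """Return (exit status, output bytes) of the converter.

    man_path is the path to a gzip-compressed man file; converter
    is the command to run, given as a sequence of arguments.
    """
    # Uncompress in-process instead of piping through zcat.
    with gzip.open(man_path, "rb") as man_file:
        source = man_file.read()
    # No shell is involved, so the path is never parsed as a command.
    result = subprocess.run(list(converter), input=source,
                            stdout=subprocess.PIPE)
    return result.returncode, result.stdout
```

      <para>
        Because the exit status is returned separately from the
        output, a caller can decide whether to display an error
        page before writing anything to the standard output
        stream.
      </para>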
    </section> <!--convert-->
    <section id='errorPage'>
      <title><code >errorPage</code ></title>
      <para>
        This function creates an HTML page telling the user that
        the operation failed.  The argument is a text string
        giving details of the failure.
      </para>
      <programlisting role='outFile:manweb.cgi'
># - - -   e r r o r P a g e

def errorPage ( text ):
    """Write an HTML error message page.

      [ text is a string ->
          sys.stdout  +:=  an HTML error message page containing
                           text
          stop execution ]
    """
    #-- 1 --
    # [ sys.stdout  +:=  CGI headers for generating HTML ]
    print "Content-type: text/html"
    print
    print "&lt;html&gt;"
    print "&lt;head&gt;"
    print "  &lt;title&gt;manweb error&lt;/title&gt;"
    print "&lt;/head&gt;"
    print "&lt;body&gt;"
    print "&lt;h1&gt;&lt;tt&gt;manweb&lt;/tt&gt; script error&lt;/h1&gt;"
    print "&lt;p&gt;We were unable to process your request: %s" % text
    print "&lt;/p&gt;"
    print "&lt;/body&gt;"
    print "&lt;/html&gt;"
    sys.exit(1)
</programlisting>
    </section> <!--errorPage-->
    <section id='epilogue'>
      <title>Epilogue</title>
      <para>
        The last lines of the script invoke <code
        >main()</code > when the file is run as a script.
      </para>
      <programlisting role='outFile:manweb.cgi'
>if  __name__ == "__main__":
    main()
</programlisting>
    </section> <!--epilogue-->
  </section> <!--internals-->
  <section id='errors'>
    <title>Error statistics</title>
    <para>
      Here is a list of all errors discovered since first compilation.
    </para>
    <section id='type-errors'>
      <title>Run-time type errors</title>
      <para>
        These are errors that would have been caught in a more
        strongly-typed language.
      </para>
      <orderedlist>
        <listitem>
          <para>
            In <code >processArguments()</code >, the values from
            the <code >FieldStorage</code > constructor are
            instances, not simple strings; it is necessary to
            extract their <code >.value</code > attribute to get
            the actual contents.
          </para>
        </listitem>
        <listitem>
          <para>
            At one point <code >convert()</code > was going to
            take three arguments, including the original page
            name and section, for better error reporting.  I
            changed my mind but didn't change the definition of
            this function.
          </para>
        </listitem>
<!--\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-->
      </orderedlist>
    </section> <!--type-errors-->
  </section> <!--errors-->
</article>

