3.2. Processing

To extract the program code from the document's source file, we use a very simple program that reads the file from beginning to end, finds lines that delimit code fragments, and copies everything between pairs of these lines to an output file. Even a beginning programmer can easily write a program to do this in a few dozen lines. For an XML-based notation such as DocBook, the job is even easier if you use XSL transforms. Here is the XSL that I use for DocBook; notice that it is written as a literate program itself. Both the markup and the processing are completely language-independent.
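As a rough illustration of how small such an extractor can be, here is a minimal sketch in Python. It assumes that each delimiter sits alone on its own line and uses "\beginCode" and "\endCode" as the delimiters; the function name extract_code and its parameters are made up for this example and are not the XSL mentioned above.

```python
import io

# Assumed delimiter lines; any pair of distinctive lines would do.
BEGIN = r"\beginCode"
END = r"\endCode"


def extract_code(source, output, begin=BEGIN, end=END):
    """Copy every line found between begin/end delimiter lines to output.

    `source` is any iterable of text lines (such as an open file), and
    `output` is any object with a write() method. The delimiter lines
    themselves are not copied.
    """
    inside = False
    for line in source:
        marker = line.strip()
        if not inside and marker == begin:
            inside = True        # start of a code fragment
        elif inside and marker == end:
            inside = False       # end of a code fragment
        elif inside:
            output.write(line)   # part of the program: copy it verbatim


if __name__ == "__main__":
    # Small in-memory demonstration; a real run would open two files.
    sample = io.StringIO(
        "Some explanatory prose.\n"
        "\\beginCode\n"
        "print('Hello, world!')\n"
        "\\endCode\n"
        "More prose.\n"
    )
    extracted = io.StringIO()
    extract_code(sample, extracted)
    print(extracted.getvalue(), end="")
```

Because the sketch only compares whole lines against the two delimiters, it never looks at the code itself, which is what keeps this step language-independent.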

To extract the readable document from the source file, we may have a bit of processing to do, depending on the document notation that we are using, the syntax we choose for the markup lines that delimit the code fragments, and the appearance we want. We will want the code fragments to appear in some distinctive way in the document, as the short "Hello, world!" example does: probably set off from the text by vertical white space, perhaps indented or centered horizontally, and perhaps shown in a distinctive font such as the customary typewriter-like font. We might even use borders around the code fragments, or colored text or backgrounds.

If we chose, we could process the code fragments according to the syntax of the programming language, and render them as Knuth's WEB system does, or perhaps (for example) using the kind of "syntax coloring" done by many program editors. In our work thus far, we have not seen enough justification for such a thing to make it worth the trouble. Instead, we follow the "lightweight" path and leave the code alone, simply rendering it all in a monospaced, typewriter-like font. Thus, this part of the processing is completely language-independent too.

If our delimiter lines are not part of the normal syntax of the document notation, we can use another simple program to find these lines and replace them with markup commands (in the document notation) that produce the appearance that we want. For the DocBook notation, we did this for the "<code>" ... "</code>" tags, but when we use "<programlisting role='executable'>" ... "</programlisting>" we don't do anything, because DocBook already has a "programlisting" element that has a reasonable appearance in most rendering systems. With TeX or LaTeX, when we use "\beginCode ... \endCode", we can simply define these words as macros that produce the markup that we want.
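The delimiter-replacement step can be sketched in the same spirit. The Python fragment below is only an illustration under stated assumptions: it supposes that the "<code>" and "</code>" delimiters each occupy a whole line, and that they should be rewritten as "<programlisting>" and "</programlisting>". That mapping, the name replace_delimiters, and the REPLACEMENTS table are all hypothetical choices for this example, not a description of the actual tool.

```python
# Hypothetical mapping from delimiter lines to document-notation markup.
REPLACEMENTS = {
    "<code>": "<programlisting>",
    "</code>": "</programlisting>",
}


def replace_delimiters(lines, replacements=REPLACEMENTS):
    """Yield the lines of a document with delimiter lines swapped for markup.

    Only lines that consist of a delimiter and nothing else (apart from
    surrounding white space) are replaced; all other lines, including
    the code between the delimiters, pass through unchanged.
    """
    for line in lines:
        key = line.strip()
        if key in replacements:
            yield replacements[key] + "\n"
        else:
            yield line


if __name__ == "__main__":
    document = [
        "<para>Here is the program:</para>\n",
        "<code>\n",
        "print('Hello, world!')\n",
        "</code>\n",
    ]
    for out_line in replace_delimiters(document):
        print(out_line, end="")
```

Matching whole lines rather than substrings means that a delimiter word used inline within a paragraph is left alone.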

Of course, other software exists to convert notations such as DocBook and LaTeX into various printable and viewable formats, and most word-processing programs can produce some of these formats directly. For our literate programs we usually use HTML, to allow them to be posted and viewed as web pages. Here are some examples.