The purpose of an index is to help a reader find some word or phrase. A properly constructed book or Web site should have an index that will help the reader find material relevant to a large number of different words or phrases.
However, building a proper index for a book is a tedious process that is best performed by a trained indexer who understands the subject matter. The technique of KWIC (Key Word In Context) indexing arose in the 1960s as an attempt to automate indexing. The basic idea is to identify keywords and present them, in alphabetical order, surrounded by their context.
Here's an example. Suppose you want to build an index of words that appear in a list of film titles. For the film title “Driving Miss Daisy”, there will be three index entries, one for each word; we'll call this the classical style of indexing.
Daisy, Driving Miss
Driving Miss Daisy
Miss Daisy, Driving
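The classical entries for a single title can be generated with a short rotation loop. The following is a minimal sketch, not the interface of kwic.py: the function name `classical_entries` is hypothetical, and it assumes that every whitespace-separated word is a keyword (no stop words are skipped).

```python
def classical_entries(title):
    """Return one index entry per word of `title`, in alphabetical order.

    Each entry starts at one word of the title. Any words that came
    before that word are moved to the end, after a comma.
    """
    words = title.split()
    entries = []
    for i in range(len(words)):
        entry = " ".join(words[i:])
        if i > 0:
            entry += ", " + " ".join(words[:i])
        entries.append(entry)
    # Sort case-insensitively so that capitalization does not affect order.
    return sorted(entries, key=str.casefold)


for entry in classical_entries("Driving Miss Daisy"):
    print(entry)
```

Run on “Driving Miss Daisy”, this prints the three entries in the order shown for the classical style.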
In the original sense, a KWIC index divides the page vertically in two, with the keywords running along the right side of the dividing line in alphabetical order, and the context shown around the keyword, like this:
Driving Miss | Daisy
             | Driving Miss Daisy
     Driving | Miss Daisy
This is called the permuted style because the title is cyclically rotated through the position of each keyword. The kwic.py module can be used to build either the permuted style or the classical style.
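The permuted layout can be sketched in a few lines as well. This is an illustration under stated assumptions rather than the way kwic.py does it: `permuted_rows` and `format_permuted` are hypothetical names, every whitespace-separated word is treated as a keyword, and the context is not wrapped around the ends of the line.

```python
def permuted_rows(title):
    """Return (prefix, keyword_and_suffix) pairs, sorted by keyword."""
    words = title.split()
    rows = []
    for i in range(len(words)):
        prefix = " ".join(words[:i])   # context before the keyword
        rest = " ".join(words[i:])     # the keyword and what follows it
        rows.append((prefix, rest))
    return sorted(rows, key=lambda row: row[1].casefold())


def format_permuted(rows):
    """Right-align the prefixes so every keyword starts after the bar."""
    width = max((len(prefix) for prefix, _ in rows), default=0)
    return [prefix.rjust(width) + " | " + rest for prefix, rest in rows]


for line in format_permuted(permuted_rows("Driving Miss Daisy")):
    print(line)
```

Keeping the sorting step separate from the formatting step means the same rows could be laid out for a fixed-width page or for a two-column table without changing how they are ordered.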
Keyword: a contiguous string consisting of one keyword start character followed by zero or more keyword characters.
is true, or the
Keyword character: any keyword start character, digit, or hyphen (“-”).
Stop word: a common word that is not considered significant, such as “a”, “and”, or “the”.
A list of stop words.
The part of the context that precedes a keyword.
The context that comes after a keyword.
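The definitions above map naturally onto a regular expression and a set lookup. The sketch below is illustrative only and does not reproduce the kwic.py interface. It assumes that a keyword start character is an ASCII letter (the exact rule is not recoverable from the definition above), and the names `KEYWORD_RE`, `STOP_WORDS`, and `keyword_contexts`, as well as the contents of the stop list, are hypothetical.

```python
import re

# Assumption: a keyword start character is an ASCII letter.
# Keyword characters are start characters, digits, and the hyphen.
KEYWORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# Hypothetical stop list; a real index would use a much longer one.
STOP_WORDS = frozenset({"a", "an", "and", "of", "the"})


def keyword_contexts(line, stop_words=STOP_WORDS):
    """Yield (prefix, keyword, suffix) for each significant keyword.

    The prefix is the text before the keyword and the suffix is the
    text after it. Stop words are compared case-insensitively.
    """
    for match in KEYWORD_RE.finditer(line):
        keyword = match.group()
        if keyword.casefold() in stop_words:
            continue
        yield line[:match.start()], keyword, line[match.end():]


for prefix, keyword, suffix in keyword_contexts("The Good, the Bad and the Ugly"):
    print(repr(prefix), repr(keyword), repr(suffix))
```

For this title only “Good”, “Bad”, and “Ugly” are reported; “The”, “the”, and “and” are dropped because they are in the stop list.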