Abstract

KWIC (Key Word In Context) is a venerable method for indexing text. This publication describes a Python-language module to assist in the generation of KWIC indexes.

This publication is available in Web form and also as a PDF document. Please forward any comments to tcc-doc@nmt.edu.

Table of Contents

1. Introduction
2. Files online
3. Theory of KWIC indexing
4. Using the kwic.py module
4.1. Using the KwicIndex class
4.2. Using the KwicWord class
4.3. Using the KwicRef class
5. The kwic module: prologue
6. Imported modules
7. Manifest constants
7.1. STOP_FILE_NAME
8. Specification functions
8.1. ref-key
9. class KwicIndex: The entire index
9.1. KwicIndex.__init__(): Constructor
9.2. KwicIndex.__makeStopSet(): Build the internal stop list
9.3. KwicIndex.__makeUni(): Force Unicode representation
9.4. KwicIndex.__findKeywords(): Find all the keywords in a line
9.5. KwicIndex.__isStart(): Test for a keyword start character
9.6. KwicIndex.__isWord(): Test for a keyword character
9.7. KwicIndex.index(): Index a line of text
9.8. KwicIndex.__addRef(): Add one reference
9.9. KwicIndex.genWords(): Generate the index entries
10. class KwicWord: All references to one keyword
10.1. KwicWord.__init__(): Constructor
10.2. KwicWord.add(): Add one reference
10.3. KwicWord.getKey(): Fabricate the sort key
10.4. KwicWord.genRefs(): Disgorge the references
11. class KwicRef: Record of one reference to one keyword
11.1. KwicRef.__init__(): Constructor
11.2. KwicRef.__cmp__(): Comparator
11.3. KwicRef.__str__()
12. kwictest: A small test driver
13. The default stop_words file