An instance of this class represents the complete KWIC index to all of the lines submitted to it for indexing. Here is the formal interface.
# - - - - -   c l a s s   K w i c I n d e x

class KwicIndex(object):
    '''Represents a keyword index.

      Exports:
        KwicIndex(stopList=None):
          [ if stopList is a sequence including at least one str
            that is not valid UTF-8 ->
              raise UnicodeEncodeError
            else if stopList is a sequence of unicode or UTF-8
            stop words ->
              return a new, empty KwicIndex using stopList as
              its stop word list
            else if file stop_words is readable ->
              return a new, empty KwicIndex using the keywords
              found in that file as its stop word list
            else ->
              return a new, empty KwicIndex with no stop words ]
        .index(s):
          [ s is a unicode or UTF-8 string ->
              self  :=  self with all keywords in s added ]
        .genWords(prefix=''):
          [ prefix is a unicode or UTF-8 string ->
              generate all the unique keywords in self that
              start with prefix as a sequence of KwicWord
              instances, in ascending order by upshifted keyword ]

      State/Invariants:
        .__stopSet:
          [ a set containing words in the stop list as
            upshifted unicode ]
        .__skip:
          [ an instance of pyskip.SkipList representing the
            keyword occurrences in self as SkipWord instances,
            ordered according to SkipWord.__cmp__ ]
    '''
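To make the contract above concrete, here is a minimal sketch of the same three exports. It is not the real implementation: `ToyKwicIndex` is a hypothetical stand-in that keeps a sorted Python list where the real class uses `pyskip.SkipList`, yields plain strings instead of `KwicWord` instances, assumes keywords are whitespace-separated, and does not read the default stop file. It is written for Python 3, where every `str` is already Unicode.

```python
import bisect


class ToyKwicIndex:
    """Hypothetical stand-in for KwicIndex (illustration only)."""

    def __init__(self, stopList=None):
        # Stop words are stored upshifted, mirroring the .__stopSet invariant.
        # Unlike the real constructor, no default stop file is consulted.
        self._stopSet = {word.upper() for word in (stopList or [])}
        # Sorted list of unique, upshifted keywords; stands in for the skip list.
        self._keys = []

    def index(self, s):
        # Assumption: keywords are whitespace-separated tokens.
        for word in s.split():
            key = word.upper()
            if key in self._stopSet:
                continue
            pos = bisect.bisect_left(self._keys, key)
            # Insert only if the key is not already present.
            if pos == len(self._keys) or self._keys[pos] != key:
                self._keys.insert(pos, key)

    def genWords(self, prefix=''):
        # All keys sharing a prefix are contiguous in sorted order, so start
        # at the first key >= prefix and stop at the first non-match.
        target = prefix.upper()
        pos = bisect.bisect_left(self._keys, target)
        while pos < len(self._keys) and self._keys[pos].startswith(target):
            yield self._keys[pos]
            pos += 1
```

With stop words `the` and `of`, indexing `'The Return of the King'` and then calling `genWords('ret')` on this toy class yields only `RETURN`, matching the "ascending order by upshifted keyword" clause of the interface.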
The constructor has two jobs: set up the empty skip list and set up the stop word list.
# - - -   K w i c I n d e x . _ _ i n i t _ _

    def __init__(self, stopList=None):
        '''Constructor.
        '''
        #-- 1 --
        # [ self.__skip  :=  a new, empty pyskip.SkipList instance
        #       that uses KwicWord.getKey as a key extractor ]
        self.__skip = pyskip.SkipList(keyFun=KwicWord.getKey)
Internally, the stop word list is a Python set whose members are uppercased Unicode. If the effective stopList argument value is None, we'll try to read the default stop file; if that file is not there, the set is made empty. See Section 9.2, “KwicIndex.__makeStopSet(): Build the internal stop list”.
        #-- 2 --
        # [ if (stopList is not None) and (stopList includes at
        #   least one str that is not UTF-8) ->
        #     raise UnicodeEncodeError
        #   else if stopList is not None ->
        #     self.__stopSet  :=  a set made from the elements of
        #         stopList, converted to Unicode and upshifted
        #   else if file stop_words is readable ->
        #     self.__stopSet  :=  a set made from the keywords
        #         found in that file, as upshifted Unicode
        #   else ->
        #     self.__stopSet  :=  an empty set ]
        self.__makeStopSet(stopList)
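The three-way decision in the intended function above can be sketched as a standalone function. This is only an illustration, not the method defined in Section 9.2: the name `make_stop_set`, its `stopFileName` parameter, and the assumption that the stop file holds whitespace-separated words are all hypothetical. It is written for Python 3, where decoding invalid UTF-8 bytes raises `UnicodeDecodeError` rather than the `UnicodeEncodeError` named in the interface.

```python
def make_stop_set(stopList=None, stopFileName='stop_words'):
    """Return a set of upshifted stop words (hypothetical sketch)."""
    if stopList is not None:
        result = set()
        for word in stopList:
            if isinstance(word, bytes):
                # Invalid UTF-8 raises UnicodeDecodeError in Python 3.
                word = word.decode('utf-8')
            result.add(word.upper())
        return result

    try:
        with open(stopFileName, encoding='utf-8') as stopFile:
            # Assumption: the file holds whitespace-separated stop words.
            return {word.upper()
                    for line in stopFile
                    for word in line.split()}
    except OSError:
        # No readable stop file: fall back to an empty stop set.
        return set()
```

Note that an explicitly passed empty list yields an empty stop set without touching the file; only `None` triggers the fallback to the default stop file.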