The purpose of this method is to find all the character groups in a line that have the pattern of keywords: they start with a keyword start character followed by zero or more keyword characters.
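As an illustration of that pattern only, it can be written as a regular expression. This is a sketch, not the method used here: the character classes below are assumptions (letters start a keyword; letters, digits, and apostrophes continue it), and `KEYWORD_RE` and `keyword_spans` are hypothetical names. The actual sets are whatever the class's start-character and word-character tests accept.

```python
import re

# Assumption: a keyword starts with an ASCII letter and continues with
# letters, digits, or apostrophes. The real character sets may differ.
KEYWORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9']*")


def keyword_spans(s):
    """Return (start, end) pairs such that s[start:end] is a keyword."""
    return [m.span() for m in KEYWORD_RE.finditer(s)]


# "42" cannot start a keyword, so only "nd" is picked up from "42nd".
print(keyword_spans("a 42nd try"))   # [(0, 1), (4, 6), (7, 10)]
```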
# - - -   K w i c I n d e x . _ _ f i n d K e y w o r d s

    def __findKeywords(self, s):
        '''Find all the keywords in the given string.

          [ s is a string ->
              generate (start,end) tuples bracketing the keywords
              in s such that each keyword is found in s[start:end] ]
        '''
We will use a simple state machine to process the line. The
variable start will be initially set to None, and will mark
the starting position of each keyword. We walk through the
line, examining each character.
If this is the transition between a non-keyword and a
keyword (at a keyword start character), set
start to the current position.
If this is the transition between a keyword and a
non-keyword, generate the tuple bracketing the keyword,
and set start back to None.
        #-- 1 --
        start = None

        #-- 2 --
        # [ if s ends with a keyword ->
        #     start  :=  starting position of that keyword
        #     generate (start, end) tuples bracketing any keywords
        #     that don't end (s)
        #   else ->
        #     generate (start, end) tuples bracketing any keywords
        #     that don't end (s) ]
        for i in range(len(s)):
            #-- 2 body --
            # [ if (start is None) and
            #   (s[i] is a start character) ->
            #     start  :=  i
            #   else if (start is not None) and
            #   (s[i] is not a word character) ->
            #     yield (start, i)
            #     start  :=  None
            #   else -> I ]
            if start is None:
                if self.__isStart(s[i]):
                    start = i
                else:
                    pass
            elif not self.__isWord(s[i]):
                yield (start, i)
                start = None
After inspecting all the characters, if start is not
None, the line ended with a keyword;
generate the bracketing tuple (start, len(s)).
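        #-- 3 --
        if start is not None:
            yield (start, len(s))

        #-- 4 --
        # A plain return ends the generator. Raising StopIteration
        # inside a generator body is a RuntimeError under PEP 479
        # (Python 3.7 and later).
        return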
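To try the state machine outside the class, the same logic can be sketched as a standalone generator. The predicates `is_start` and `is_word` below are hypothetical stand-ins for the class's private start-character and word-character tests, whose definitions are not shown in this section; the character sets chosen here are assumptions made only so the example runs.

```python
import string

# Assumed character sets; the real class may use different ones.
START_CHARS = set(string.ascii_letters)
WORD_CHARS = set(string.ascii_letters + string.digits + "'")


def is_start(c):
    """Hypothetical test: can c begin a keyword?"""
    return c in START_CHARS


def is_word(c):
    """Hypothetical test: can c continue a keyword?"""
    return c in WORD_CHARS


def find_keywords(s):
    """Generate (start, end) tuples such that s[start:end] is a keyword."""
    start = None                  # None means "not inside a keyword"
    for i, c in enumerate(s):
        if start is None:
            if is_start(c):
                start = i         # non-keyword to keyword transition
        elif not is_word(c):
            yield (start, i)      # keyword to non-keyword transition
            start = None
    if start is not None:         # the line ended inside a keyword
        yield (start, len(s))


line = "it's 4 o'clock"
print([line[a:b] for a, b in find_keywords(line)])   # ["it's", "o'clock"]
```

Because each tuple is a half-open range, slicing with `s[start:end]` recovers the keyword directly, including one that runs to the end of the line.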