9.7. KwicIndex.index(): Index a line of text

The logic that converts a line to Unicode (or raises UnicodeEncodeError if the line is a non-UTF-8 str value) is in Section 9.4, “KwicIndex.__findKeywords(): Find all the keywords in a line”.

kwic.py
# - - -   K w i c I n d e x . i n d e x

    def index(self, line, userData=None):
        '''Add all the keyword references in line.
        '''
        #-- 1 --
        # [ if line is unicode ->
        #     s  :=  line
        #   else if line is a valid UTF-8 str ->
        #     s  :=  line converted to unicode using UTF-8
        #   else -> raise UnicodeEncodeError ]
        s = self.__makeUni(line)
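
The conversion in step 1 can be illustrated with a minimal standalone sketch. This is not the author's `__makeUni()` method: the function name `make_uni` is hypothetical, and the sketch is written for Python 3, where the text type is `str` and undecoded input is `bytes`. One difference to note: here invalid input surfaces as `UnicodeDecodeError`, the exception Python's own decoder raises, rather than the `UnicodeEncodeError` named in the intended function above.

```python
def make_uni(line):
    """Return line as text, decoding it from UTF-8 if it is bytes.

    Hypothetical Python 3 analogue of step 1; an illustration only,
    not the code from kwic.py.
    """
    if isinstance(line, str):
        # Already text: use it unchanged.
        return line
    # bytes.decode() raises UnicodeDecodeError on invalid UTF-8 input.
    return line.decode("utf-8")
```

Under these assumptions, `make_uni("abc")` returns its argument unchanged, while `make_uni(b"caf\xc3\xa9")` returns the four-character string `"café"`.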

For the logic that checks the word against the stop list and adds it, see Section 9.8, “KwicIndex.__addRef(): Add one reference”.

kwic.py
        #-- 2 --
        # [ if s is unicode or a valid UTF-8 str ->
        #     self.__skip  :=  self.__skip + (KwicWord instances
        #         representing all the keyword occurrences in s
        #         not in self.__stopSet)
        #   else -> raise UnicodeEncodeError ]
        for (start, end) in self.__findKeywords(s):
            #-- 2 body --
            # [ let
            #      word == s[start:end]
            #      prefix == s[:start].strip()
            #      suffix == s[end:].strip()
            #   in ->
            #     word is a unicode string ->
            #       if word is in self.__stopSet ->
            #         I
            #       else if self.__skip has a KwicWord instance for
            #       word ->
            #         that instance  +:=  a KwicRef instance
            #             with prefix=(prefix), word=(word),
            #             suffix=(suffix), and userData=(userData)
            #       else ->
            #         self.__skip  +:=  a new KwicWord instance
            #             containing a new KwicRef instance
            #             with prefix=(prefix), word=(word),
            #             suffix=(suffix), and userData=(userData) ]
            self.__addRef(s, start, end, userData)
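
The effect of the loop in step 2 can be sketched in a simplified, self-contained form. Everything below is an assumption made for illustration: a plain dictionary stands in for the skip list `self.__skip`, the `Ref` tuple stands in for `KwicRef`, the regular expression `\w+` stands in for whatever `__findKeywords()` actually treats as a keyword, and case handling is omitted entirely.

```python
import re
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the KwicRef class.
Ref = namedtuple("Ref", "prefix word suffix user_data")

# Assumed keyword definition: any run of word characters.
WORD_RE = re.compile(r"\w+")


def index_line(index, stop_set, s, user_data=None):
    """Add one Ref to index for each keyword in s that is not in stop_set."""
    for match in WORD_RE.finditer(s):
        start, end = match.span()
        word = s[start:end]
        if word in stop_set:
            continue  # Stop words are not indexed.
        # The context on each side of the keyword, with blanks trimmed.
        prefix = s[:start].strip()
        suffix = s[end:].strip()
        # A missing key starts a new list, mirroring the case where no
        # KwicWord exists yet for this word.
        index[word].append(Ref(prefix, word, suffix, user_data))


index = defaultdict(list)
index_line(index, {"the", "of"}, "the art of programming", user_data=42)
```

After this call, `index` holds entries for `art` and `programming` only. The reference for `art` carries the prefix `the` and the suffix `of programming`; the reference for `programming` carries the prefix `the art of` and an empty suffix.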