
17. scanInitialLetters(): What initial letters occur in keywords?

homelist2
# - - -   s c a n I n i t i a l L e t t e r s

def scanInitialLetters(kwicIndex):
    '''Find the sequence of letters to be indexed.

      [ kwicIndex is a kwic.KwicIndex instance ->
          return a list of the unique initial letters of keywords in
          kwicIndex, in ascending order by code point, upshifted ]
    '''

We'll use a Python set to find the unique letters in all the keywords. The kwic.KwicIndex.genWords() method generates a sequence of kwic.KwicWord instances; in each of these instances, the .word attribute is the keyword in Unicode form.

homelist2
    #-- 1 --
    letterSet = set()

    #-- 2 --
    # [ letterSet  :=  union(letterSet, (initial letters of all
    #                  keywords in kwicIndex, uppercased)) ]
    for kwicWord in kwicIndex.genWords():
        letterSet.add(kwicWord.word[0].upper())

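Python's built-in sorted() function returns a new sorted list from any iterable, including sets. Because the elements here are one-character strings, the result is in ascending order by code point.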

homelist2
    #-- 3 --
    return sorted(letterSet)
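
As a quick check of the set-then-sort approach, here is a minimal, self-contained sketch. The FakeKwicWord and FakeKwicIndex classes are hypothetical stand-ins that model only the .word attribute and the genWords() method described above; they are not the real kwic module, and the function is renamed scanInitialLettersSketch to mark it as an illustration. The sketch assumes Python 3.

```python
from collections import namedtuple

# Hypothetical stand-in for kwic.KwicWord: only the .word attribute is modeled.
FakeKwicWord = namedtuple("FakeKwicWord", "word")


class FakeKwicIndex:
    """Hypothetical stand-in for kwic.KwicIndex: only genWords() is modeled."""

    def __init__(self, words):
        self._words = list(words)

    def genWords(self):
        # Generate one word object per keyword, as the real method is
        # described as doing.
        for word in self._words:
            yield FakeKwicWord(word)


def scanInitialLettersSketch(kwicIndex):
    """Same logic as scanInitialLetters(), run against the stand-ins."""
    letterSet = set()
    for kwicWord in kwicIndex.genWords():
        # The set silently discards duplicate initials.
        letterSet.add(kwicWord.word[0].upper())
    return sorted(letterSet)


index = FakeKwicIndex(["zebra", "apple", "Avocado", "mango", "\u00e9lan"])
letters = scanInitialLettersSketch(index)
print(letters)   # ['A', 'M', 'Z', 'É']
```

Note that the accented letter sorts after Z: the order is by code point, not by any locale's alphabetical rules. The sketch also shares the original's assumption that no keyword is an empty string, since indexing position 0 of an empty string raises IndexError.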