
7.8. indexer: blockReport

Here's an example of the output we are generating. The first row starts a new block, and the second row shows the format of a character within the block.

  <row>
    <entry align="left" namest="idCol" nameend="fullCol">
      <emphasis role="strong">
        Block [U00000-U0007F]: C0 Controls and Basic Latin
      </emphasis>
    </entry>
  </row>
  <row>
    <entry><code>&#x0021;</code></entry>
    <entry><code>&excl;</code></entry>
    <entry>!</entry>
    <entry>EXCLAMATION MARK</entry>
  </row>
indexer
# - - -   b l o c k R e p o r t

def blockReport(tbody, uniData, entSetNames, uniBlock):
    '''Report on one block of code points.

      [ (tbody is an et.Element) and
        (uniData is a unidata.UniData instance) and
        (uniBlock is a unidata.UniBlock instance within uniData) ->
              tbody  +:=  rows displaying entities for characters
                  in uniBlock whose set names are in entSetNames ]
    '''

We use a dictionary to hold the extracted UniEntity instances. Because a given code point may have multiple entities in the same entity set, the key in this dictionary will be a 2-tuple containing the code point (as an integer) and the entity name. Sorting on this key will then group together all the entities for a given code point.

indexer
    #-- 1
    # [ cpMap  :=  a dictionary whose keys are tuples (cp, eName)
    #       where cp is all the code points of characters in uniData
    #       that are in uniBlock's range and eName is all the
    #       entities in entSetNames, and each related value is 
    #       the corresponding uniEntity instance ]
    cpMap = {}
    for uniChar in uniData.genChars():
        if uniBlock.start <= uniChar.cp <= uniBlock.end:
            for uniEnt in uniChar.genEnts():
                if uniEnt.setName in entSetNames:
                    cpMap[(uniChar.cp, uniEnt)] = uniEnt
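
As a standalone illustration of the keying scheme described above, the sketch below builds a small dictionary keyed on (code point, entity name) tuples and walks it in sorted key order. The `Entity` namedtuple, its field names, and the sample data are hypothetical stand-ins for illustration only; the real `unidata.UniEntity` class may look different.

```python
from collections import namedtuple

# Hypothetical stand-in for unidata.UniEntity; field names are assumptions.
Entity = namedtuple("Entity", ["cp", "setName", "name"])

# Illustrative sample data: two entities share code point 0x26.
entities = [
    Entity(0x26, "setA", "amp"),
    Entity(0x21, "setA", "excl"),
    Entity(0x26, "setB", "AMP"),
]

# Key each entity on (code point, entity name), as the prose describes.
cp_map = {(e.cp, e.name): e for e in entities}

# Tuples sort element by element: first by code point, then by name,
# so entities for the same code point end up adjacent.
ordered = [cp_map[key] for key in sorted(cp_map)]
names = [e.name for e in ordered]
code_points = [e.cp for e in ordered]
```

Because tuples compare element by element, the code point decides the order first and the entity name only breaks ties, which is what keeps all entities for one character together.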

If no character in this block has an entity in any of the requested entity sets, we're done; otherwise, generate the spanned row that starts the block.

indexer
    #-- 2
    # [ if cpMap is empty ->
    #     return
    #   else ->
    #     tbody  +:=  spanned row displaying uniBlock ]
    if len(cpMap) == 0:
        return
    text = ("Block [U{b.start:05X}-U{b.end:05X}]: {b.name}".format(
           b=uniBlock))
    tbody.append(
        E.row(
            E.entry(SPANNED_ROW,
                E.emphasis(text, role='strong'))))
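
To see what the heading text and spanned row look like in isolation, here is a minimal sketch using the standard library's `xml.etree.ElementTree` in place of the `E` builder. The `Block` namedtuple is a hypothetical stand-in for `unidata.UniBlock`, and the attribute values passed to `entry` are assumed to match what `SPANNED_ROW` supplies, based on the sample output at the top of this section.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical stand-in for unidata.UniBlock, with only the attributes used here.
Block = namedtuple("Block", ["start", "end", "name"])

def spanned_row(block):
    """Build a <row> whose single <entry> spans the table and names the block."""
    # {b.start:05X} formats the integer as five uppercase hex digits, zero-padded.
    text = "Block [U{b.start:05X}-U{b.end:05X}]: {b.name}".format(b=block)
    row = ET.Element("row")
    # Assumed contents of SPANNED_ROW, taken from the sample output.
    entry = ET.SubElement(row, "entry",
                          align="left", namest="idCol", nameend="fullCol")
    emphasis = ET.SubElement(entry, "emphasis", role="strong")
    emphasis.text = text
    return row

row = spanned_row(Block(0x0000, 0x007F, "C0 Controls and Basic Latin"))
heading = row.find("entry/emphasis").text
```

The same format string works with any object that exposes `start`, `end`, and `name` attributes, since `str.format` resolves `b.start` by ordinary attribute lookup.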

Next, add a row for each entity in cpMap. The logic that formats each row is in Section 7.9, “indexer: outRow()”.

indexer
    #-- 3
    # [ tbody  +:=  rows displaying elements of cpMap in key order ]
    for key in sorted(cpMap):
        outRow(tbody, cpMap[key])