
7.4. indexer: main()

indexer
# - - - - -   m a i n

def main():
    '''Generate three indexes of the ISO 9573 entities and a Python module.

      [ (unidata's data file is readable and valid) and
        (GROUP_9573 is an entity group in that file) ->
          file by-cp.dbk  :=  a DocBook tbody element containing
              an index of that entity group by code point
          file by-ent.dbk  :=  a DocBook tbody element containing
              an index of that entity group by entity name
          file by-full.dbk  :=  a DocBook tbody element containing
              an index of that entity group by full name
          file OUT_MODULE_PY  :=  a Python module defining constants
              for each entity in that entity group ]
    '''

Each of these table bodies is included inside a four-column table that looks like this:

    <informaltable>
      <tgroup cols="4">
        <colspec align="left" colname='idCol'/>
        <colspec align="left" colname='entCol'/>
        <colspec align="center" colname='picCol'/>
        <colspec align="left" colname='fullCol'/>
        ... table body here ...
      </tgroup>
    </informaltable>

The columns, in order, contain the character ID (e.g., “000A0”), the entity name, the entity itself (assuming it renders), and the full name of the code point. The colname attributes are necessary because, in the table by code point, spanning rows are inserted for each code block that contains at least one entity of interest, and a spanning entry refers to its first and last columns by name.
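
To make the role of the colname attributes concrete, here is a minimal sketch of the two kinds of rows such a table body might contain. The helper names entity_row and block_row are hypothetical and are not part of indexer; the namest and nameend attributes are the standard DocBook (CALS) way to make one entry span a range of named columns.

```python
from xml.sax.saxutils import escape

def entity_row(char_id, ent_name, full_name):
    """Return one ordinary four-column row (hypothetical helper)."""
    return (
        "<row>"
        "<entry>{char_id}</entry>"
        "<entry>{ent_name}</entry>"
        "<entry>&{ent_name};</entry>"   # the entity itself, left unescaped
        "<entry>{full_name}</entry>"
        "</row>"
    ).format(char_id=escape(char_id),
             ent_name=ent_name,
             full_name=escape(full_name))

def block_row(block_title):
    """Return a row whose single entry spans idCol through fullCol."""
    return (
        '<row><entry namest="idCol" nameend="fullCol">'
        "{0}</entry></row>"
    ).format(escape(block_title))

print(block_row("Latin-1 Supplement"))
print(entity_row("000A0", "nbsp", "NO-BREAK SPACE"))
```

Without the colname values declared in the colspec elements, the spanning entry would have nothing to name in namest and nameend.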

The first order of business is to call the UniData constructor to read the unicode.xml file, and extract the UniGroup instance for the ISO 9573 entity group.

indexer
    #-- 1
    # [ if unidata module's data file is readable and valid ->
    #     uniData  :=  a unidata.UniData instance representing
    #                  that file
    #   else ->
    #     sys.stderr  +:=  error message
    #     stop execution ]
    try:
        uniData = unidata.UniData()
    except (IOError, OSError) as x:
        Log().fatal("Can't open the Unicode data file: "
                    "{0}".format(str(x)))

One requirement common to all three of the indices is that they must include only entities in the selected entity group. Hence, we build a Python set containing all the entity set names in that group; see Section 7.6, “indexer: findGroup()”.

indexer
    #-- 2
    # [ if GROUP_9573 is a group name in uniData ->
    #     sys.stdout  +:=  report on that group and its component
    #         entity sets from uniData
    #     entSetNames  :=  a set of the names of those component
    #                      entity sets
    #   else ->
    #     sys.stderr  +:=  error message
    #     stop execution ]
    entSetNames = findGroup(uniData)
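
The reason for using a set here is that each report can then test membership cheaply while scanning the entities. The following is only a sketch of that filtering idea: the dictionary ENTITY_TO_SET and the set names in it are made-up stand-ins, not the real unidata interface or real entity set names.

```python
# Hypothetical stand-in data: entity name -> name of its entity set.
ENTITY_TO_SET = {
    "nbsp": "set-a",
    "alpha": "set-b",
    "AElig": "some-other-set",
}

def entities_in_group(entity_to_set, ent_set_names):
    """Return the sorted entity names whose entity set is in the group."""
    return sorted(
        name
        for name, set_name in entity_to_set.items()
        if set_name in ent_set_names      # constant-time set lookup
    )

# Plays the role of the entSetNames value returned by findGroup().
ent_set_names = {"set-a", "set-b"}
selected = entities_in_group(ENTITY_TO_SET, ent_set_names)
print(selected)
```

Entities whose entity set is not in the group, such as "AElig" in this made-up data, are simply skipped.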

For the logic that produces the three tables, see Section 7.7, “indexer: cpReport()”, Section 7.10, “indexer: entNameReport()”, and Section 7.13, “indexer: cpNameReport()”.

indexer
    #-- 3
    # [ file by-cp.dbk  :=  a DocBook tbody element containing
    #       an index of that entity group by code point ]
    cpReport(uniData, entSetNames)

    #-- 4
    # [ file by-ent.dbk  :=  a DocBook tbody element containing
    #       an index of that entity group by entity name
    #   file OUT_MODULE_PY  :=  a Python module defining constants
    #       for each of those entities ]
    entNameReport(uniData, entSetNames)

    #-- 5
    # [ file by-full.dbk  :=  a DocBook tbody element containing
    #       an index of that entity group by full name ]
    cpNameReport(uniData, entSetNames)
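
In effect, the three reports present the same entities in three different orders. The sketch below illustrates this with a handful of sample records; the tuple layout and the helper char_id are assumptions made for illustration and do not reflect how cpReport(), entNameReport(), or cpNameReport() are actually written.

```python
# Sample records: (code point, entity name, full name). Illustrative only.
RECORDS = [
    (0x00A0, "nbsp", "NO-BREAK SPACE"),
    (0x0026, "amp", "AMPERSAND"),
    (0x00D7, "times", "MULTIPLICATION SIGN"),
    (0x00A9, "copy", "COPYRIGHT SIGN"),
]

def char_id(code_point):
    """Format a code point as a five-digit hex character ID."""
    return "{0:05X}".format(code_point)

by_cp = sorted(RECORDS, key=lambda rec: rec[0])    # by code point
by_ent = sorted(RECORDS, key=lambda rec: rec[1])   # by entity name
by_full = sorted(RECORDS, key=lambda rec: rec[2])  # by full name

for title, ordering in (("by code point", by_cp),
                        ("by entity name", by_ent),
                        ("by full name", by_full)):
    print(title)
    for code_point, ent_name, full_name in ordering:
        print("  {0}  {1:<6}  {2}".format(
            char_id(code_point), ent_name, full_name))
```

With these sample records, all three orderings differ, which is why each index is written to its own file.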