6.4. class UniData: The entire file

This class represents the entire file. Here is the interface.

unidata.py
# - - - - -   c l a s s   U n i D a t a 

class UniData(object):
    '''Represents the entire unicode.xml file.

      Exports:
        UniData():
          [ IN_FILE is a readable file valid against charlist.rnc ->
              return a new UniData instance representing that file ]
        .genGroups():
          [ generate the entity groups in self as a sequence of
            UniGroup instances ]
        .findGroup(groupName):
          [ groupName is a str ->
              if self has an entity group named (groupName) ->
                return that group as a UniGroup instance
              else -> raise KeyError ]
        .genBlocks(): [ generate the UniBlock instances in self ]
        .findBlock(blockName):
          [ if blockName matches a block name in self ->
              return the corresponding UniBlock
            else -> raise KeyError ]
        .genChars():
          [ generate the characters in self as a sequence of UniChar
            instances, in ascending order by code point ]
        .findChar(cp):
          [ cp is a code point as an int ->
              if self has a description for the character with that
              code point ->
                return a UniChar instance with that description
              else -> raise KeyError ]

Here are the class invariants. The three attributes are dictionaries that hold the groups, blocks, and characters, respectively.

unidata.py
      State/Invariants:
        ._groupMap:
          [ a dictionary whose keys are the names of entity groups,
            and each related value is a UniGroup instance representing
            that group ]
        ._blockMap:
          [ a dictionary whose keys are the names of Unicode blocks,
            and each related value is a UniBlock instance defining
            that block ]
        ._cpMap:
          [ a dictionary whose keys are code points as ints, and each
            related value is a UniChar instance representing that
            code point ]
    '''
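
The invariants above suggest how the lookup and generator methods might be backed by dictionaries. What follows is a minimal sketch under stated assumptions, not the actual unidata.py code: the names SketchData and SketchChar are hypothetical stand-ins for UniData and UniChar, the constructor takes a list of characters rather than reading unicode.xml, and only the character methods are shown.

```python
from collections import namedtuple

# Hypothetical stand-in for UniChar; the real class is defined elsewhere.
SketchChar = namedtuple("SketchChar", ["cp", "description"])


class SketchData(object):
    """Illustrative only: a dictionary-backed container that mimics
    the .findChar() and .genChars() contracts described above."""

    def __init__(self, chars):
        # Mirrors the ._cpMap invariant: int code point -> character object.
        self._cpMap = {}
        for char in chars:
            self._cpMap[char.cp] = char

    def findChar(self, cp):
        # A plain dictionary lookup raises KeyError for a missing
        # code point, which matches the stated contract.
        return self._cpMap[cp]

    def genChars(self):
        # Sort the keys explicitly so that characters are generated
        # in ascending order by code point, whatever the insertion order.
        for cp in sorted(self._cpMap):
            yield self._cpMap[cp]


# Characters are deliberately supplied out of code point order.
data = SketchData([
    SketchChar(0x3B1, "GREEK SMALL LETTER ALPHA"),
    SketchChar(0x41, "LATIN CAPITAL LETTER A"),
])
```

The same pattern would presumably apply to ._groupMap with .findGroup() and to ._blockMap with .findBlock(). The interface does not say whether .genGroups() and .genBlocks() promise any particular order, so the sketch makes no claim about them.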