6.26. class UniChar: One code point

The bulk of the unicode.xml file describes the known code points, and each instance of this class represents one of them. An instance is also a container for the UniEntity instances that belong to that code point; see Section 6.30, “class UniEntity: One entity”.

Here is the class interface. The constructor operates directly on the character node from the unicode.xml file.

unidata.py
# - - - - -   c l a s s   U n i C h a r

class UniChar(object):
    '''Represents one code point.

      UniChar(uniData, node):
        [ (uniData is the containing UniData instance) and
          (node is a CHARACTER_N node) ->
            if node defines a single character ->
              return a new UniChar instance representing that node
            else -> raise ValueError ]
      .uniData:    [ as passed to constructor ]
      .id:         [ identifier, e.g., "U000C1" ]
      .cp:         [ code point as an int ]
      .fullName:   [ e.g., "LATIN CAPITAL LETTER A ACUTE" ]
      .genEnts():
        [ generate the entities in self as UniEntity instances in
          ascending order by (entity name, entity set name) ]

Because a character may have multiple entity children for different entity sets, the dictionary that holds the child UniEntity instances uses a key that is a 2-tuple. The first element of each key is the entity's name, and the second is the name of the corresponding entity set. This key structure allows the .genEnts() method to produce the child entities in ascending order by entity name, with the entity set name as a secondary key.
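As a minimal sketch of why the 2-tuple key gives this ordering for free: Python compares tuples element by element, so sorting the keys orders them by entity name first and entity set name second. The names ent_map and gen_ents, and the sample entity and set names, are hypothetical stand-ins for illustration only; plain strings take the place of UniEntity instances.

```python
# Hypothetical stand-in for a UniChar's entity dictionary.
# Keys are (entity name, entity set name); the values would be
# UniEntity instances in the real class, but strings suffice here.
ent_map = {
    ("angst", "8879-isotech"): "entity 3",
    ("Aacute", "xhtml1-lat1"): "entity 2",
    ("Aacute", "8879-isolat1"): "entity 1",
}

def gen_ents(ent_map):
    """Generate the values of ent_map in ascending order by
    (entity name, entity set name)."""
    # Tuples sort lexicographically: first element, then second.
    for key in sorted(ent_map):
        yield ent_map[key]

ordered = list(gen_ents(ent_map))
```

Note that string comparison here is by code point, so an entity name starting with an uppercase letter sorts before one starting with a lowercase letter.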

unidata.py
      State/Invariants:
        ._entMap:
          [ a dictionary whose keys are (entity ID, entity setName)
            and each related entry is a UniEntity ]
    '''
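The relationship between the .id and .cp attributes in the interface above can be illustrated with a small sketch. It assumes that a single-character identifier is the letter "U" followed only by hexadecimal digits, as in the "U000C1" example, and that any other shape should be rejected with ValueError. The helper name cp_from_id and the sample multi-character identifier are hypothetical; this is not necessarily how the constructor itself does the conversion.

```python
import re

# Assumed shape of a single-character identifier: "U" plus hex digits.
_SINGLE_ID = re.compile(r"U([0-9A-Fa-f]+)")

def cp_from_id(char_id):
    """Return the code point, as an int, for an identifier such as
    "U000C1"; raise ValueError if it does not name exactly one
    code point under the assumed format."""
    m = _SINGLE_ID.fullmatch(char_id)
    if m is None:
        raise ValueError("Not a single-character id: %r" % (char_id,))
    # Interpret the digits after the "U" as a hexadecimal number.
    return int(m.group(1), 16)

cp = cp_from_id("U000C1")
```

Under these assumptions, "U000C1" yields the integer 0xC1, while an identifier that strings several code points together is rejected.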