
6.28. UniChar.__init__()

unidata.py
# - - -   U n i C h a r . _ _ i n i t _ _

    def __init__(self, uniData, node):
        '''Constructor.
        '''
        #-- 1
        self.uniData = uniData
        self.id = node.attrib[ID_A]

The schema has a lot of items we don't use. Here are simplified versions of the relevant productions:

character = element character
{ attlist.character,
  entity*,
  description
}
attlist.character &=
  attribute id { xsd:ID },
  attribute dec { text }
description = element description { text }

Many of the entries in the unicode.xml file define multi-character sequences. Here is a complete example of such an entry:

      <character id="U00021-0003D" dec="33-61" image="none">
         <unicodedata/>
         <operator-dictionary priority="260" form="infix" lspace="4"
                              rspace="4"/>
         <description>MULTIPLE CHARACTER OPERATOR: !=</description>
      </character>

We want to ignore entries such as this. Because the dec attribute of a multi-character entry (here, "33-61") is not a valid integer, the call to int() in the next block of code raises a ValueError, which we let propagate back to the caller.

unidata.py
        #-- 2
        # [ if the DEC_A attribute of node is a valid integer ->
        #     self.cp  :=  that value as type int
        #     self.fullName  :=  text of the DESCRIPTION_N child
        #   else -> raise ValueError ]
        self.cp = int(node.attrib[DEC_A])
        self.fullName = node.findtext(DESCRIPTION_N)
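
The effect of this step can be tried outside the class. The following sketch is not part of unidata.py. It uses the standard library's xml.etree.ElementTree, assumes that DEC_A and DESCRIPTION_N name the dec attribute and the description element, and puts a hypothetical helper, codePointAndName(), in place of the constructor. The sample wrapper element and the single-character entry are made up for illustration.

```python
import xml.etree.ElementTree as et

# Assumed values for the constants used in unidata.py.
DEC_A = "dec"
DESCRIPTION_N = "description"

# Made-up sample: one single-character entry, one multi-character entry.
SAMPLE = '''<sample>
  <character id="U00021" dec="33">
    <description>EXCLAMATION MARK</description>
  </character>
  <character id="U00021-0003D" dec="33-61">
    <description>MULTIPLE CHARACTER OPERATOR: !=</description>
  </character>
</sample>'''


def codePointAndName(node):
    '''Hypothetical helper that mimics step 2 for one character node.

      [ if node's dec attribute is a valid integer ->
          return (that integer, text of node's description child)
        else -> raise ValueError ]
    '''
    return (int(node.attrib[DEC_A]), node.findtext(DESCRIPTION_N))


def singleCharacters(root):
    '''Return (code point, name) pairs, skipping multi-character entries.
    '''
    result = []
    for node in root.findall("character"):
        try:
            result.append(codePointAndName(node))
        except ValueError:
            # dec is something like "33-61"; ignore this entry.
            pass
    return result


print(singleCharacters(et.fromstring(SAMPLE)))
```

Only the first entry survives. The second is dropped because int("33-61") raises ValueError, which is how a caller of the constructor might skip multi-character entries.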

Each of the entity children is passed to the UniEntity constructor; see Section 6.30, “class UniEntity: One entity”.

unidata.py
        #-- 3
        # [ self._entMap  :=  as invariant, from ENTITY_N children
        #                     of node ]
        self._entMap = {}
        for entityNode in node.findall(ENTITY_N):
            uniEnt = UniEntity(self, entityNode)
            self._entMap[(uniEnt.id, uniEnt.setName)] = uniEnt
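
As a rough illustration of the resulting dictionary, the sketch below builds the same kind of map with a hypothetical stand-in for UniEntity. The attribute names id and set on the entity element, the value of ENTITY_N, and the sample set names are all assumptions made for this example. The real class is described in Section 6.30.

```python
import xml.etree.ElementTree as et

ENTITY_N = "entity"     # Assumed value of the constant in unidata.py


class FakeEntity:
    '''Hypothetical stand-in for UniEntity: holds only the key fields.
    '''
    def __init__(self, node):
        self.id = node.attrib["id"]          # Assumed attribute name
        self.setName = node.attrib["set"]    # Assumed attribute name


def buildEntMap(node):
    '''Map (entity id, set name) tuples to entity objects.
    '''
    entMap = {}
    for entityNode in node.findall(ENTITY_N):
        ent = FakeEntity(entityNode)
        entMap[(ent.id, ent.setName)] = ent
    return entMap


# Illustrative character node; the set names are sample values.
node = et.fromstring('''<character id="U00021" dec="33">
  <entity id="excl" set="8879-isonum"/>
  <entity id="excl" set="9573-2003-isonum"/>
  <description>EXCLAMATION MARK</description>
</character>''')

entMap = buildEntMap(node)
print(sorted(entMap))
```

One plausible benefit of the two-part key is visible here: the same entity name can appear in more than one entity set, and keying on the tuple keeps both entries instead of letting the second overwrite the first.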