Abstract
Describes a standard set of named Unicode entities and the generation and use of a file defining these entities.
This publication is available in Web form and also as a PDF document.
Please forward any comments to tcc-doc@nmt.edu.
Table of Contents
iso9573.ent file
entmarch: Generating the iso9573.ent file
The unidata module: Interface to the unicode.xml file
unidata.py: Prologue
unidata: Imports
unidata: Manifest constants
class UniData: The entire file
UniData.genGroups(): Generate the groups
UniData.findGroup(): Look up an entity group name
UniData.genBlocks(): Generate the code blocks
UniData.findBlock(): Look up a code block by name
UniData.genChars(): Generate all characters
UniData.findChar(): Look up a character by its code point
UniData.__init__()
UniData._readGroups(): Process entity group data
UniData._readBlocks(): Extract code block data
UniData._readChars(): Process character data
UniData._readOneChar(): Extract the data for one code point
class UniBlock: One Unicode code block
UniBlock.__init__()
UniBlock.__str__()
class UniGroup: A group of entity sets
UniGroup.genEntSets(): Generate contained entity sets
UniGroup.findEntSet()
UniGroup.__init__()
UniGroup.__str__()
class UniEntSet: One entity set
UniEntSet.__init__()
class UniChar: One code point
UniChar.genEnts()
UniChar.__init__()
UniChar.__str__()
class UniEntity: One entity
UniEntity.__init__()
UniEntity.__str__()
indexer: Generate indices

The character set defined by the Unicode Consortium is an international standard intended to greatly expand the character set available for publications. The Unicode standard defines a set of numeric code points, each of which is associated with a particular character.
This publication is a companion to Writing documentation with DocBook-XML 4.3, which
describes a system for publishing technical documentation
that allows a single input format to be rendered into both
HTML and PDF forms. In DocBook, you may use a character entity to specify any Unicode
character. There are two kinds of character entities:
numeric and symbolic. All entities start with
“&” and end with
“;”.
You can always specify any Unicode character in numeric form. There are two formats:
To specify a character by its hexadecimal value, use
the form “&#xN;”, where N is the
hexadecimal value. For example, “&#xa0;” is the code for a
non-breaking space.
To specify a character by its decimal value, use the
form “&#N;”, where N is the decimal value. For example,
“&#160;” is a
different way to specify a non-breaking space.
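Both numeric forms can be derived mechanically from a character's code point. The following Python sketch shows this; the function name `char_refs` is a hypothetical illustration and is not part of the scripts described later. It uses the standard library's `html.unescape` to confirm that both references decode to the same character.

```python
import html


def char_refs(ch: str) -> tuple[str, str]:
    """Return the (hexadecimal, decimal) numeric character references for ch."""
    code_point = ord(ch)                 # e.g. 0xA0 for NO-BREAK SPACE
    hex_ref = f"&#x{code_point:x};"      # hexadecimal form: &#xN;
    dec_ref = f"&#{code_point};"         # decimal form: &#N;
    return hex_ref, dec_ref


if __name__ == "__main__":
    nbsp = "\u00a0"                      # non-breaking space
    hex_ref, dec_ref = char_refs(nbsp)
    print(hex_ref, dec_ref)              # prints: &#xa0; &#160;
    # Both references decode back to the same character.
    assert html.unescape(hex_ref) == html.unescape(dec_ref) == nbsp
```

XML accepts either uppercase or lowercase hexadecimal digits after “&#x”, so “&#xA0;” works equally well; the “x” itself must be lowercase.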
Among the many publications of the International Organization
for Standardization (ISO), ISO/IEC TR
9573-13 defines symbolic names for selected Unicode
characters, so that you can refer to them by name rather than by
number. For example, the entity “&nbsp;” is a
non-breaking space.
However, in order to use these symbolic names, you must include their definitions in your DocBook source file.
This document provides such a file, which defines all the entities in the ISO 9573-2003 group, a version of the ISO/IEC TR 9573-13 standard.
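The reason the definitions are required can be sketched in a few lines of Python. An XML parser replaces each symbolic reference “&name;” with the text declared for name and rejects any name that has no declaration, while numeric references need no declaration at all. The table `ENTITIES` and the function `expand` below are hypothetical stand-ins for illustration, not the contents of iso9573.ent, and the name pattern is a simplified, ASCII-only subset of XML names.

```python
import re

# Hypothetical excerpt of an entity table: name -> code point.
ENTITIES = {
    "nbsp": 0x00A0,   # NO-BREAK SPACE
    "copy": 0x00A9,   # COPYRIGHT SIGN
}

# Matches &#xN; (hexadecimal), &#N; (decimal), or &name; (symbolic).
REF_PATTERN = re.compile(
    r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z_][A-Za-z0-9._-]*));"
)


def expand(text: str, table: dict[str, int]) -> str:
    """Replace character references in text; undefined names raise KeyError.

    For brevity, the five predefined XML entities (amp, lt, gt, quot, apos)
    are not special-cased; they must appear in the table to be accepted.
    """
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits, name = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if dec_digits is not None:
            return chr(int(dec_digits))
        if name not in table:
            # An XML parser would likewise reject an undeclared entity.
            raise KeyError(f"undefined entity: &{name};")
        return chr(table[name])

    return REF_PATTERN.sub(replace, text)
```

For example, `expand("A&nbsp;B", ENTITIES)` returns “A”, a non-breaking space, and “B”, whereas a name missing from the table raises `KeyError`, much as a DocBook document that uses “&nbsp;” without including the entity definitions fails to parse.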
Section 2, “Files for downloading”, provides links to files,
including the iso9573.ent file that you may include
in your DocBook document.
Section 3, “How to use the iso9573.ent file”, explains how to use that file.
Section 5, “entmarch: Generating the iso9573.ent file”, is a Python script that
generates iso9573.ent.
Section 6, “The unidata module: Interface to the unicode.xml
file”, is a Python module that
interfaces to the unicode.xml file that defines all the
entities.
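The actual entmarch script and unidata module are presented in the sections listed above. As a rough preview only, the following Python sketch shows the general shape of the job: read character records from a unicode.xml-style file, keep the entities that belong to the ISO 9573-2003 sets, and emit one XML entity declaration per name. The element and attribute names (`character`, `entity`, `id`, `set`), the `U`-prefixed id format, the set-name prefix `9573-2003-`, the sample data, and the functions `read_entities` and `entity_declarations` are all assumptions made for illustration. The real file's layout and the real iso9573.ent output format may differ.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified excerpt in the style of unicode.xml (the real file is far larger).
SAMPLE = """\
<unicode>
  <charlist>
    <character id="U000A0">
      <entity id="nbsp" set="9573-2003-isonum"/>
    </character>
    <character id="U000A9">
      <entity id="copy" set="9573-2003-isonum"/>
      <entity id="copy" set="html"/>
    </character>
    <character id="U003B1">
      <entity id="alpha" set="9573-2003-isogrk3"/>
    </character>
  </charlist>
</unicode>
"""


def read_entities(root: ET.Element, set_prefix: str) -> dict[str, int]:
    """Map entity name -> code point for entity sets whose name starts with set_prefix."""
    entities: dict[str, int] = {}
    for char in root.iter("character"):
        char_id = char.get("id", "")
        # Skip multi-code-point sequences such as "U0003D-020E5" (assumed id format).
        if not char_id.startswith("U") or "-" in char_id:
            continue
        code_point = int(char_id[1:], 16)
        for ent in char.findall("entity"):
            name = ent.get("id")
            if name and ent.get("set", "").startswith(set_prefix):
                entities[name] = code_point
    return entities


def entity_declarations(entities: dict[str, int]) -> list[str]:
    """Format XML general-entity declarations, sorted by entity name."""
    return [
        f'<!ENTITY {name} "&#x{cp:04X};">'
        for name, cp in sorted(entities.items())
    ]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for line in entity_declarations(read_entities(root, "9573-2003-")):
        print(line)
```

With the sample data, the script prints declarations for alpha, copy, and nbsp, in that order. The second copy entry is ignored because its set name lacks the ISO prefix. Keying the dictionary by name means that a name appearing in two ISO sets is declared only once; a real generator would need an explicit policy for such duplicates.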