

Describes a standard set of named Unicode entities and the generation and use of a file defining these entities.

This publication is available in Web form and also as a PDF document. Please forward any comments to

Table of Contents

1. ISO 9573-2003 gives you lots of special characters
2. Files for downloading
3. How to use the isoents.ent file
4. Indices to the entities
4.1. Index by code point
4.2. Index by entity name
4.3. Index by code point full name
5. entmarch: Generating the isoents.ent file
5.1. entmarch: Prologue
5.2. entmarch: Imports
5.3. entmarch: Manifest constants
5.4. entmarch: main()
5.5. entmarch: extractSet()
5.6. entmarch: extractData()
5.7. entmarch: fatal()
5.8. entmarch: Epilogue
6. The unidata module: Interface to the unicode.xml file
6.1. Prologue
6.2. unidata: Imports
6.3. unidata: Manifest constants
6.4. class UniData: The entire file
6.5. UniData.genGroups(): Generate the groups
6.6. UniData.findGroup(): Look up an entity group name
6.7. UniData.genBlocks(): Generate the code blocks
6.8. UniData.findBlock(): Look up a code block by name
6.9. UniData.genChars(): Generate all characters
6.10. UniData.findChar(): Look up a character by its code point
6.11. UniData.__init__()
6.12. UniData._readGroups(): Process entity group data
6.13. UniData._readBlocks(): Extract code block data
6.14. UniData._readChars(): Process character data
6.15. UniData._readOneChar(): Extract the data for one code point
6.16. class UniBlock: One Unicode code block
6.17. UniBlock.__init__()
6.18. UniBlock.__str__()
6.19. class UniGroup: A group of entity sets
6.20. UniGroup.genEntSets(): Generate contained entity sets
6.21. UniGroup.findEntSet()
6.22. UniGroup.__init__()
6.23. UniGroup.__str__()
6.24. class UniEntSet: One entity set
6.25. UniEntSet.__init__()
6.26. class UniChar: One code point
6.27. UniChar.genEnts()
6.28. UniChar.__init__()
6.29. UniChar.__str__()
6.30. class UniEntity: One entity
6.31. UniEntity.__init__()
6.32. UniEntity.__str__()
7. indexer: Generate indices
7.1. indexer: Prologue
7.2. indexer: Imports
7.3. indexer: Manifest constants
7.4. indexer: main()
7.5. indexer: fatal()
7.6. indexer: findGroup()
7.7. indexer: cpReport()
7.8. indexer: blockReport
7.9. indexer: outRow()
7.10. indexer: entNameReport
7.11. indexer: startModule()
7.12. indexer: moduleWrite()
7.13. indexer: cpNameReport()
7.14. indexer: Epilogue

1. ISO 9573-2003 gives you lots of special characters

The character set defined by the Unicode Consortium is an international standard intended to greatly expand the character set available for publications. The Unicode standard defines a set of numeric code points, each of which is associated with a particular character.
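To make the idea of a code point concrete, here is a short Python sketch. It uses only the standard library's ord() and unicodedata.name() to report a character's code point in the conventional U+XXXX notation along with its Unicode character name. The helper describe() is a hypothetical illustration and is not part of the programs described later in this document.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    cp = ord(ch)  # the numeric code point
    name = unicodedata.name(ch, "<unnamed>")  # fallback for unnamed code points
    return f"U+{cp:04X} {name}"


for ch in ["A", "\u00a0", "\u00e9"]:
    print(describe(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+00A0 NO-BREAK SPACE
# U+00E9 LATIN SMALL LETTER E WITH ACUTE
```

Going the other way, chr(0xE9) returns the character whose code point is 0xE9.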

This publication is a companion to Writing documentation with DocBook-XML 4.3, which describes a system for publishing technical documentation that allows a single input format to be rendered into both HTML and PDF forms. In DocBook, you may use a character entity to specify any Unicode character. There are two kinds of character entities: numeric and symbolic. All entities start with “&” and end with “;”.

You can always specify any Unicode character using the numeric form. There are two formats:

  • To specify a character by its hexadecimal value, use the form “&#xN;”, where N is the hexadecimal value. For example, “&#xa0;” is the code for a non-breaking space.

  • To specify a character by its decimal value, use the form “&#N;”, where N is the decimal value. For example, “&#160;” is another way to specify a non-breaking space.
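The two numeric forms can be generated from a character with a few lines of Python. In this sketch, hex_ref() and dec_ref() are hypothetical helpers. The round-trip check uses the standard library's html.unescape(), which follows HTML5 decoding rules; for a character such as the non-breaking space those rules agree with what an XML parser does.

```python
import html


def hex_ref(ch: str) -> str:
    """Hexadecimal numeric character reference, e.g. '&#xa0;'."""
    return f"&#x{ord(ch):x};"


def dec_ref(ch: str) -> str:
    """Decimal numeric character reference, e.g. '&#160;'."""
    return f"&#{ord(ch)};"


nbsp = "\u00a0"
print(hex_ref(nbsp), dec_ref(nbsp))  # &#xa0; &#160;

# Both forms decode back to the same single character.
assert html.unescape(hex_ref(nbsp)) == nbsp
assert html.unescape(dec_ref(nbsp)) == nbsp
```

Hexadecimal digits in the reference are case-insensitive, so “&#xA0;” and “&#xa0;” are equivalent.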

Among the many publications of the International Organization for Standardization (ISO), ISO/IEC TR 9573-13 assigns names to selected Unicode characters so that they can be referred to by symbolic names. For example, the entity “&nbsp;” is a non-breaking space.

However, in order to use these symbolic names, you must include their definitions in your DocBook source file.
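The need for declarations is easy to see with any XML parser. The Python sketch below uses the standard library's xml.etree.ElementTree (backed by expat) on two tiny hypothetical documents. The first declares “nbsp” in its internal DTD subset and parses cleanly. The second uses “&nbsp;” without a declaration and is rejected, since XML predefines only amp, lt, gt, apos and quot. A real DocBook document would more likely pull in an external file of declarations, which is the job of the isoents.ent file described in section 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that declares the entity before using it.
WITH_DECL = """<!DOCTYPE para [
  <!ENTITY nbsp "&#xa0;">
]>
<para>10&nbsp;kg</para>"""

# The same content with no declaration for &nbsp;.
WITHOUT_DECL = "<para>10&nbsp;kg</para>"


def parse_text(doc: str) -> str:
    """Parse an XML string and return the root element's text."""
    return ET.fromstring(doc).text


print(repr(parse_text(WITH_DECL)))  # '10\xa0kg'

try:
    parse_text(WITHOUT_DECL)
except ET.ParseError as err:
    # expat refuses to expand an entity that was never declared.
    print("parse failed:", err)
```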

The present document includes such a file, which defines all the entities in the ISO 9573-2003 group, a version of the ISO/IEC TR 9573-13 standard.
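An entity file of this kind is essentially a list of XML general-entity declarations, one per name, each expanding to a numeric character reference. The Python sketch below writes such declarations from a small hand-picked sample mapping. The SAMPLE dictionary, the helpers entity_decl() and entity_file(), and the five-digit hexadecimal layout are illustrative assumptions; the real isoents.ent is generated by the entmarch program from the unicode.xml data, and its exact layout may differ.

```python
# Hypothetical sample of entity names and their code points.
SAMPLE: dict[str, int] = {
    "nbsp": 0x00A0,
    "copy": 0x00A9,
    "mdash": 0x2014,
}


def entity_decl(name: str, cp: int) -> str:
    """Return one XML general-entity declaration for code point cp."""
    return f'<!ENTITY {name} "&#x{cp:05X};">'


def entity_file(entities: dict[str, int]) -> str:
    """Return declarations for all entities, one per line, sorted by name."""
    lines = [entity_decl(n, cp) for n, cp in sorted(entities.items())]
    return "\n".join(lines) + "\n"


print(entity_file(SAMPLE), end="")
# <!ENTITY copy "&#x000A9;">
# <!ENTITY mdash "&#x02014;">
# <!ENTITY nbsp "&#x000A0;">
```

Sorting by name keeps the output stable from run to run, which makes regenerated files easy to compare.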