Next / Previous / Contents / TCC Help System / NM Tech homepage

7.3. Type unicode: Strings of 32-bit characters

With the advent of the Web as medium for worldwide information interchange, the Unicode character set has become vital. For general background on this character set, see the Unicode homepage.

To get a Unicode string, prefix the string with u. For example:

u'klarn'

is a five-character Unicode string.

To include one of the special Unicode characters in a string constant, use these escape sequences:

\xHH For a code with the 8-bit hexadecimal value HH.
\uHHHH For a code with the 16-bit hexadecimal value HHHH.
\UHHHHHHHH For a code with the 32-bit hexadecimal value HHHHHHHH.

Examples:

>>> u'Klarn.'
u'Klarn.'
>>> u'Non-breaking-\xa0-space.'
u'Non-breaking-\xa0-space.'
>>> u'Less-than-or-equal symbol: \u2264'
u'Less-than-or-equal symbol: \u2264'
>>> u"Phoenician letter 'wau': \U00010905"
u"Phoenician letter 'wau': \U00010905"
>>> len(u'\U00010905')
1

All the operators and methods of str type are available with unicode values.

Additionally, for a Unicode value U, use this method to encode its value as a string of type str:

U.encode ( encoding )

Return the value of U as type str. The encoding argument is a string that specifies the encoding method. In most cases, this will be 'utf_8'. For discussion and examples, see Section 7.3.1, “The UTF-8 encoding”.