With the advent of the Web as a medium for worldwide information interchange, the Unicode character set has become vital. For general background on this character set, see the Unicode homepage.
To get a Unicode string, prefix the string with u. For example:
u'klarn'
is a five-character Unicode string.
To include one of the special Unicode characters in a string constant, use these escape sequences:
\xHH | For a code with the 8-bit hexadecimal value HH.
\uHHHH | For a code with the 16-bit hexadecimal value HHHH.
\UHHHHHHHH | For a code with the 32-bit hexadecimal value HHHHHHHH.
Examples:
>>> u'Klarn.'
u'Klarn.'
>>> u'Non-breaking-\xa0-space.'
u'Non-breaking-\xa0-space.'
>>> u'Less-than-or-equal symbol: \u2264'
u'Less-than-or-equal symbol: \u2264'
>>> u"Phoenician letter 'wau': \U00010905"
u"Phoenician letter 'wau': \U00010905"
>>> len(u'\U00010905')
1
All the operators and methods of the str type are available with unicode values.
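As a brief illustration of the preceding sentence, the following sketch applies a few familiar str operations to Unicode values. The variable names are invented for this example and do not come from the text. The u prefix is kept so that the same code should also run under Python 3.3 or later, where every string literal is already Unicode.

```python
# Illustrative names only; none of them appear in the reference text.
greeting = u'Klarn'
symbol = u'\u2264'                     # LESS-THAN OR EQUAL TO

combined = greeting + u' ' + symbol    # concatenation
shouted = greeting.upper()             # a str-style method
has_symbol = symbol in combined        # membership test
repeated = symbol * 3                  # repetition
first = combined[0]                    # indexing
```

Each result is itself a Unicode value, except has_symbol, which is a Boolean.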
Additionally, for a Unicode value U, use this method to encode its value as a string of type str:
U.encode(encoding)
Return the value of U as type str. The encoding argument is a string that specifies the encoding method. In most cases, this will be 'utf_8'. For discussion and examples, see Section 7.3.1, “The UTF-8 encoding”.
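The call described above can be sketched as follows. This is a minimal example rather than a full treatment: the variable names are hypothetical, and the decode() call is an addition that is not covered in this section; it is included only to show that the encoding round-trips. Under Python 3 the result of encode() is of type bytes rather than str.

```python
# Hypothetical names for illustration.
text = u'x \u2264 y'                 # five Unicode characters

# Encode as UTF-8: type str in Python 2, bytes in Python 3.
encoded = text.encode('utf_8')

# The symbol U+2264 occupies three bytes in UTF-8, so the
# encoded value is two bytes longer than the original.
char_count = len(text)
byte_count = len(encoded)

# Assumed extra step: decoding recovers the original value.
decoded = encoded.decode('utf_8')
```

Here char_count is 5 and byte_count is 7, since the four ASCII characters take one byte each.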