
13.1. Using the bytes type in 3.x conversion

Versions 2.6+ support a new notation: to create a literal of type bytes, place a “b” just before the opening quote.

>>> s = b'abc'
>>> s
'abc'
>>> type(s)
<type 'str'>

In Python 2.x, such literals behave exactly like regular string literals: as the example shows, the value has type str. The difference appears when you convert your program to the 3.x versions. In Python 3.x, a literal of the form b'...' has type bytes, which is distinct from the 3.x str type, a sequence of Unicode characters.
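The following is a minimal sketch of how the same b'...' literal behaves under either version. The names PY3, data, is_plain_str, first, and text are chosen here for illustration only; the code should run unchanged on 2.6+ and on 3.x.

```python
import sys

# Hypothetical helper flag: True when running under Python 3.x.
PY3 = sys.version_info[0] >= 3

data = b'abc'

# True on 2.6/2.7, where b'...' is just another spelling of str;
# False on 3.x, where bytes and str are separate types.
is_plain_str = isinstance(data, str)

# Indexing differs as well: 2.x yields the one-character string 'a',
# while 3.x yields the integer 97.
first = data[0]

# Decoding produces text on either version
# (type unicode in 2.x, type str in 3.x).
text = data.decode('ascii')
```

Marking byte-oriented data with the b prefix in 2.x code costs nothing, and it records which values should remain bytes after conversion.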

One step in converting your 2.x programs to 3.x is to add this import before all the other imports in your program:

from __future__ import unicode_literals
        

In programs that start with this declaration, every unprefixed string literal has type unicode, with no need for the u'...' prefix. Literals written with the b'...' prefix are not affected and keep type str. Because the unprefixed literals are now unicode, they may include escape sequences of the form '\uXXXX', each of which designates a 16-bit Unicode code point as four hexadecimal digits XXXX.

Here is a demonstration of the difference. Before the import, the \u escape is not recognized, and the value has type str. After the import, the same literal has type unicode and contains a single character.

>>> s = '\u2672'
>>> len(s)
6
>>> s
'\\u2672'
>>> type(s)
<type 'str'>
>>> from __future__ import unicode_literals
>>> t = '\u2672'
>>> len(t)
1
>>> type(t)
<type 'unicode'>
>>> t
u'\u2672'
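
The same comparison can be made inside a single script by compiling the text of a literal twice, once without and once with the unicode_literals compiler flag. This is only a sketch for experimentation, not a conversion technique; the names source, flag, plain, and converted are illustrative. It relies on the standard compile() built-in and the compiler_flag attribute of the feature objects in the __future__ module.

```python
import __future__

# The source text of a literal: quote, backslash, u, 2, 6, 7, 2, quote.
source = r"'\u2672'"

# Compiler flag equivalent to "from __future__ import unicode_literals".
flag = __future__.unicode_literals.compiler_flag

# The final True is dont_inherit, so future statements in effect in
# this file do not leak into the compiled snippets.
plain = eval(compile(source, '<demo>', 'eval', 0, True))
converted = eval(compile(source, '<demo>', 'eval', flag, True))

# Under 2.x this prints "6 1": plain is a six-character str and
# converted is a one-character unicode value.
# Under 3.x both values are one-character str values, so it prints "1 1".
print('%d %d' % (len(plain), len(converted)))
```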