13. The bytes type

To understand why Python versions 2.6 and later have a bytes type, it is necessary to review a little history.

Most early computing used 7- and 8-bit character codes, but these character sets are very limited. In particular, life was difficult in Francophone countries, where “è” and “é” are very different letters. The Unicode standard is the current preferred practice: its code points fit comfortably within 32 bits, and it provides enough characters to last a good while into the future.

Text handling in the Python 2.x releases was awkward due to the presence of two different types for representing character data: str and unicode. Consequently, in the incompatible 3.x releases, all character data is represented as Unicode text by a single type, str.

Therefore, in Python 2.6 the bytes type was added to aid transition to the 3.0 family, which has a separate bytes type for 8-bit character strings. In the 3.x versions, a bytes value is a sequence of zero or more unsigned 8-bit integers, each in the range 0–255, inclusive.
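
As a rough illustration of the 3.x behavior just described, here is a minimal sketch. It assumes a Python 3.x interpreter; under 2.6 or 2.7, where bytes is the same type as str, the results differ. The variable names are chosen only for this example.

```python
# Assumes Python 3.x; under 2.6/2.7 bytes is str and these results differ.

# Build a bytes value from a list of integers, each in the range 0..255.
data = bytes([72, 105, 255])

# In 3.x, indexing a bytes value yields an integer, not a one-character string.
first = data[0]

# Iterating over a bytes value also yields integers.
values = list(data)

# A bytes value may be empty: "zero or more" elements.
empty = bytes()

# Integers outside 0..255 are rejected with a ValueError.
try:
    bytes([256])
    out_of_range_accepted = True
except ValueError:
    out_of_range_accepted = False

print(first, values, len(empty), out_of_range_accepted)
```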

In Python 2.6 and 2.7, the bytes type is a synonym for str. The bytes() function works exactly like the str() function.

>>> s=bytes(987)
>>> s
'987'
>>> type(s)
<type 'str'>

Use this type wherever your program expects 8-bit characters. Doing so will ease the transition to Python 3.x, because the semi-automated translation process (the 2to3 tool) will know that values of bytes type are intended as sequences of 8-bit characters.
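
The advice above might look like the following in practice. This is only a sketch: the helper to_text() and the sample payload are hypothetical names invented for this example, not part of Python or of the original text. It relies on the b"..." literal syntax, which is accepted by Python 2.6 and later as well as by 3.x.

```python
import sys

# A b"..." literal marks this value as 8-bit data in both 2.6+ and 3.x.
payload = b"GET / HTTP/1.0\r\n"

def to_text(raw, encoding="ascii"):
    """Hypothetical helper: decode 8-bit data into a Unicode text string."""
    return raw.decode(encoding)

text = to_text(payload)

# True under both 2.6+ and 3.x, because the name bytes exists in each.
is_bytes = isinstance(payload, bytes)

# True under 2.6/2.7, where bytes is a synonym for str; False under 3.x.
bytes_is_str = bytes is str

print(is_bytes, bytes_is_str, sys.version_info[0])
```

Keeping the decoding step in one place, as to_text() does here, means that the boundary between 8-bit data and text is explicit, which is the distinction Python 3.x enforces.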