Python has extensive features for handling strings of characters. There are two types:
A str value is a string of zero or more
8-bit characters. The common characters you see on
North American keyboards all fit in 8-bit characters. The
official name for this character set is ASCII, for American Standard Code for Information
Interchange. This character set has one surprising property: all
capital letters are considered less than all lowercase
letters, so the string
"Z" sorts before "a".
A unicode value is a string of zero or
more 32-bit Unicode characters. The Unicode character
set covers just about every written language and every
special character ever invented.
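The ordering of capital and lowercase letters described above can be checked with a short sketch. The word list is made-up sample data, and sorting on lowercased copies is one common workaround rather than anything this section prescribes.

```python
# Hypothetical sample data, chosen only for illustration.
words = ["apple", "Zebra", "mango", "Banana"]

# Strings compare by character code, and every capital letter has a
# smaller code than every lowercase letter, so capitalized words come first.
by_code = sorted(words)          # ['Banana', 'Zebra', 'apple', 'mango']

# One workaround: sort on lowercased copies of the words instead.
ignoring_case = sorted(words, key=lambda w: w.lower())
                                 # ['apple', 'Banana', 'mango', 'Zebra']

# The single-character case mentioned in the text.
z_before_a = "Z" < "a"           # True
```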
We'll mainly talk about working with
str values, but most
unicode operations are
similar or identical.
In Python, you can enclose string constants in either
single-quote ('...') or double-quote ("...") characters.
>>> cloneName = 'Clem'
>>> cloneName
'Clem'
>>> print cloneName
Clem
>>> fairName = "Future Fair"
>>> print fairName
Future Fair
>>> fairName
'Future Fair'
When you display a string value in conversational mode,
Python will usually use single-quote characters.
Internally, the values are the same regardless of which
kind of quotes you use. Note also that the print statement shows only the content of a string, without any quotes around it.
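As a small sketch of why the two quote styles are interchangeable, the comparison below shows that both spellings produce the same value. The second half shows a common practical use that the text above does not cover: picking whichever quote style lets the string contain the other quote character.

```python
# The same string written with each kind of quote.
single = 'Future Fair'
double = "Future Fair"
same_value = (single == double)   # True: the quotes are not part of the value

# Choosing the other kind of quote lets a string contain a quote character.
contraction = "ain't"             # apostrophe inside double quotes
quoted = 'She said "hi"'          # double quotes inside single quotes
```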
To convert an integer (
int type) value i to its
string equivalent, use the function “str(i)”.
>>> str(-497)
'-497'
>>> str(000)
'0'
The inverse operation, converting a string s
back into an
integer, is written as “int(s)”.
>>> int("-497")
-497
>>> int("-0")
0
>>> int ( "012this ain't no number" )
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: invalid literal for int(): 012this ain't no number
The last example above shows what happens when you try to convert a string that isn't a valid number.
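In a program, as opposed to conversational mode, you will usually want to catch that ValueError rather than let it stop the program. Here is a minimal sketch; the helper name to_int and its default argument are hypothetical, invented for this illustration.

```python
def to_int(text, default=None):
    """Hypothetical helper: return int(text), or default if text
    is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings that are not valid numbers.
        return default

good = to_int("-497")                      # -497
bad = to_int("012this ain't no number")    # None
round_trip = int(str(-497))                # -497: str() and int() undo each other
```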
To convert a string s containing a
number in base
B, use the form “int(s, B)”.
>>> int ( '0F', 16 )
15
>>> int ( "10101", 2 )
21
>>> int ( "0177776", 8 )
65534
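To reinforce the two-argument form of int(), this sketch writes one value in three bases and also shows what happens when a digit is not legal in the stated base. The variable names are arbitrary.

```python
# The same value, 255, written in three different bases.
from_hex = int("ff", 16)
from_binary = int("11111111", 2)
from_octal = int("377", 8)

# A digit that does not exist in the given base is rejected
# with the same ValueError that any other invalid number gets.
try:
    int("2", 2)
    rejected = False
except ValueError:
    rejected = True
```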
To obtain the 8-bit integer code contained in a
one-character string s, use the function “ord(s)”.
For the inverse operation, converting an integer n
to the character
that has code
n, use “chr(n)”. The numeric value of each character
is defined by the ASCII character set.
>>> chr( 97 )
'a'
>>> ord("a")
97
>>> chr(65)
'A'
>>> ord('A')
65
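Because ord() and chr() are inverses, you can do simple arithmetic on character codes. The following sketch illustrates this; next_letter is a hypothetical helper written only for this example.

```python
# ord() and chr() undo each other for any single character.
letter = "a"
code = ord(letter)            # 97
same_letter = chr(code)       # 'a'

# In ASCII, each lowercase letter's code is 32 more than its capital's.
gap = ord("a") - ord("A")     # 32

def next_letter(ch):
    """Hypothetical helper: return the character whose code follows ch's."""
    return chr(ord(ch) + 1)

after_a = next_letter("A")    # 'B'
```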
In addition to the printable characters with codes in the
range from 32 to 126 inclusive, a Python string can
contain any of the other unprintable, special characters
as well. For example, the null
character, whose official name is
NUL, is the character whose code is zero.
One way to write such a character is to use the form “\xNN”, where NN is
the character's code in hexadecimal (base 16) notation.
>>> chr(0)
'\x00'
>>> ord('\x00')
0
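The hexadecimal form works for any character, not just unprintable ones. This sketch also shows that a NUL is an ordinary character inside a Python string and does not end it; the sample strings are invented for illustration.

```python
capital_a = "\x41"             # hexadecimal 41 is decimal 65, the code for 'A'
is_a = (capital_a == "A")      # True

nul = "\x00"
nul_length = len(nul)          # 1: the escape stands for a single character

# A NUL in the middle of a string is counted like any other character.
padded = "ab\x00cd"
padded_length = len(padded)    # 5
```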
Another special character you may need to deal with is
the newline character, whose
official name is
LF (for “line
feed”). Use the special escape sequence “\n”
to produce this character.
>>> s = "Two-line\nstring."
>>> s
'Two-line\nstring.'
>>> print s
Two-line
string.
As you can see, when a newline character is displayed in
conversational mode, it appears as “
\n”, but when you print it, the character
that follows it will appear on the next line. The code
for this character is 10:
>>> ord('\n')
10
>>> chr(10)
'\n'
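Although it is written with two keystrokes, the newline escape stands for a single character. This sketch checks that, and then uses the string methods split() and join() to take a string apart at its newlines and put it back together.

```python
s = "Two-line\nstring."

# The two-keystroke escape counts as one character.
length = len(s)                  # 16

# Break the string into its lines, then rebuild it.
lines = s.split("\n")            # ['Two-line', 'string.']
rejoined = "\n".join(lines)      # identical to s
```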
Python has several other escape sequences like this one. The
term “escape sequence” refers to a
convention where a special character, the “escape
character”, changes the meaning of the characters
after it. Python's escape character is the backslash (\).
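A few of the other escape sequences are sketched below. These three are standard in Python string constants; the sample strings themselves are made up for illustration.

```python
tabbed = "a\tb"                # \t is the tab character
path = "C:\\temp"              # \\ is one literal backslash
contraction = 'ain\'t'         # \' is a single quote inside single quotes

tab_code = ord("\t")           # 9
backslash_length = len("\\")   # 1: two keystrokes, one character
path_length = len(path)        # 7
```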
There is another handy way to get a string that contains newline characters: enclose the string within triple quotes, that is, three single-quote or three double-quote characters at each end.
>>> multi = """This string
...  contains three
...  lines."""
>>> multi
'This string\n contains three\n lines.'
>>> print multi
This string
 contains three
 lines.
>>> s2 = '''
... xyz
... '''
>>> s2
'\nxyz\n'
>>> print s2

xyz

>>>
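In a script file the same technique works without the conversational-mode prompts. As a minimal sketch, the comparison below confirms that a triple-quoted string is the same value you would get by writing the newline escapes yourself; the sample text is arbitrary.

```python
# Each line break inside the triple quotes becomes one newline character.
multi = """first
second
third"""

# The same value written on one line with explicit escapes.
explicit = "first\nsecond\nthird"

same = (multi == explicit)           # True
newline_count = multi.count("\n")    # 2
```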
Notice that in Python's conversational mode, when you
press Enter at the end of a line and
Python knows that your line is not finished, it displays
a “...” prompt instead of
the usual “>>>” prompt.