
7.2. Strings: the str and unicode types

Python has two string types. Type str holds strings of zero or more 8-bit characters, while unicode strings provide full support for the expanded Unicode character set (see the Unicode homepage).

7.2.1. String constants

There are many forms for string constants:

  • '...': Enclose the string in single quotes.

  • "...": Enclose it in double quotes.

  • '''...''': Enclose it between three single quotes in a row. Unlike the first two forms, such a string can continue over multiple lines, and each line break is included in the string as a newline character.

  • """...""": Enclose it between three double quotes in a row. As with triple single quotes, line breaks are allowed and preserved as "\n" characters.

The above forms give you regular strings. To get a unicode string, prefix the string with u. For example:

u"klarn"

is a five-character Unicode string.
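The following is a small sketch that compares the forms above. The variable names are for illustration only.

```python
# Single and double quotes produce identical strings.
a = 'kiwi'
b = "kiwi"

# Triple-quoted strings may span lines; the line break becomes "\n".
c = '''two
lines'''
d = """two
lines"""

# The u prefix makes a Unicode string; this one has five characters.
k = u"klarn"
```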

In addition, you can use any of these escape sequences inside a string constant:

\newline    A backslash at the end of a line: the backslash and the line break are both ignored, so the string continues on the next line.
\\          Backslash (\)
\'          Single quote (')
\"          Double-quote character (")
\n          Newline (ASCII LF or linefeed)
\b          Backspace (in ASCII, the BS character)
\f          Formfeed (ASCII FF)
\r          Carriage return (ASCII CR)
\t          Horizontal tab (ASCII HT)
\v          Vertical tab (ASCII VT)
\ooo        The character with octal code ooo, e.g., '\177'.
\xhh        The character with hexadecimal value hh, e.g., "\xFF".
\uhhhh      The Unicode character with hexadecimal value hhhh, e.g., u"\uFFFF" (Unicode strings only).
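Each escape sequence in the table stands for exactly one character in the resulting string. The following sketch checks a few of them. The names are illustrative.

```python
# Each escape sequence stands for a single character.
tab = '\t'
octal_a = '\101'    # octal 101 is decimal 65, the code for 'A'
hex_a = '\x41'      # hexadecimal 41 is also decimal 65
quoted = 'it\'s'    # \' puts a single quote inside single quotes

# Backslash-newline: both characters are dropped from the string.
joined = 'abc\
def'

# \uhhhh inside a Unicode string: LATIN SMALL LETTER E WITH ACUTE.
e_acute = u'\u00e9'
```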

Raw strings: If you need to use a lot of backslashes inside a string constant, and doubling them is too confusing, you can prefix any string with the letter r to suppress the interpretation of the escape sequences above. For example, '\\\\' contains two backslashes, but r'\\\\' contains four. Raw strings are particularly useful with the regular expression module.
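The following sketch contrasts a plain string with a raw string and shows the typical use with the `re` module. The pattern and sample text are made up for the example.

```python
import re

plain = '\\\\'      # escapes processed: two backslashes
raw = r'\\\\'       # escapes left alone: four backslashes

# In a regular expression, \d means "a digit"; the raw string passes
# the backslash through to the re module unchanged.
pattern = re.compile(r'\d+')
match = pattern.search('We have 49 pallets')
```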

7.2.2. The string format operator

In addition to the operations common to all sequences, strings support the operator

f % v

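Format values from a tuple v using a format string f; the result is a single string with all the values formatted. If there is only one value, v may be that value by itself instead of a one-element tuple, unless the value is itself a tuple. See the table of format codes below.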

All format codes start with %; the other characters of f appear unchanged in the result. A conversational example:

>>> print "We have %d pallets of %s today." % (49, "kiwis")
We have 49 pallets of kiwis today.
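The right-hand operand can take a few shapes. This sketch, with values chosen only for illustration, shows three of them. The first is a tuple of several values. The second is a single bare value. The third is the one case where a lone value must still be wrapped: when the value is itself a tuple.

```python
# Several values: supply them as a tuple, in the order of the format codes.
line = "We have %d pallets of %s today." % (49, "kiwis")

# A single value may be given on its own.
padded = "%5d" % 1234

# A tuple meant as ONE value must be wrapped in a 1-tuple;
# otherwise its items are taken as separate values.
shown = "%s" % ((1, 2),)

# Here the two items of (1, 2) are treated as two values for one
# format code, which raises TypeError.
try:
    "%s" % (1, 2)
    too_many = False
except TypeError:
    too_many = True
```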

In general, format codes have the form

%[p][m[.n]]c

where:

p   is an optional prefix; see the table of format code prefixes below.
m   specifies the total desired field width. The result will never be shorter than this value, but may be longer if the value doesn't fit; so, "%5d" % 1234 yields " 1234", but "%2d" % 1234 yields "1234".
n   specifies the precision: the number of digits after the decimal point for the %f and %e codes. For %g it is the number of significant digits, and for %s it is the maximum number of characters taken from the value.
c   indicates the type of formatting.
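The following worked cases illustrate the m and .n parts of a format code. The sample values are arbitrary.

```python
# m: minimum field width (values are never truncated to fit it).
wide = "%5d" % 1234          # ' 1234'
narrow = "%2d" % 1234        # '1234'

# .n with %f: digits after the decimal point.
fixed = "%8.3f" % 3.14159    # '   3.142'

# .n with %g: significant digits.
general = "%.3g" % 3.14159   # '3.14'

# .n with %s: at most n characters of the string are used.
clipped = "%.2s" % "kiwi"    # 'ki'
```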

Here are the format codes c:

%s   String; e.g., "%-3s" % "xy" yields "xy " (because the "-" prefix forces left alignment).
%d   Decimal integer conversion, e.g., "%3d" % -4 yields the string " -4".
%e   Exponential format; the exponent part takes at least four characters, such as "e+00". Example: "%08.1e" % 1.9783 yields "02.0e+00".
%E   Same as %e, but an uppercase E is used for the exponent.
%f   Fixed-point format for float values. E.g., "%4.1f" % 1.9783 yields " 2.0".
%g   General numeric format: uses %e style if the exponent is less than -4 or not less than the precision (default 6), otherwise %f style, and drops trailing zeros.
%G   Same as %g, but an uppercase E is used for the exponent if there is one.
%o   Octal, e.g., "%o" % 13 yields "15".
%x   Hexadecimal, e.g., "%x" % 247 yields "f7".
%X   Same as %x, but capital letters are used for the digits A-F, e.g., "%04X" % 247 yields "00F7".
%c   Converts an integer character code to the corresponding character; for example, "%c" % 0x61 yields the string "a".
%%   Places a percent sign (%) in the result. Does not require a corresponding value.
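The following sketch runs one example through each format code in the table. The values are chosen only to make the outputs easy to check. The comments show the resulting strings.

```python
s_code = "%-3s" % "xy"          # 'xy '
d_code = "%3d" % -4             # ' -4'
e_code = "%.1e" % 1.9783        # '2.0e+00'
upper_e = "%E" % 12345.678      # '1.234568E+04'
f_code = "%4.1f" % 1.9783       # ' 2.0'
g_small = "%g" % 0.00001        # '1e-05': exponent -5 < -4, so %e style
g_mid = "%g" % 123.5            # '123.5': %f style, trailing zeros dropped
g_big = "%g" % 1234567.0        # '1.23457e+06': exponent 6 >= precision 6
o_code = "%o" % 13              # '15'
x_code = "%04X" % 247           # '00F7'
c_code = "%c" % 0x61            # 'a'
pct = "100%% of %d" % 49        # '100% of 49'
```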

Format prefixes include:

+   For numeric types, forces the sign to appear even for positive values.
-   Left-justifies the value in the field.
0   For numeric types, use zero fill. For example, "%04d" % 2 produces the value "0002".
#   With the %o (octal) format, add a leading "0"; with the %x (hexadecimal) format, add a leading "0x"; with the %g (general numeric) format, keep all trailing zeroes. Examples:
>>> "%4o" % 127
' 177'
>>> "%#4o" % 127
'0177'
>>> "%x" % 127
'7f'
>>> "%#x" % 127
'0x7f'
>>> "%10.5g" % 0.5
'       0.5'
>>> "%#10.5g" % 0.5
'   0.50000'
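Prefixes can also be combined with a field width and precision. The following sketch shows the +, -, and 0 prefixes, alone and together, with arbitrary sample values.

```python
plus = "%+d" % 5                # '+5'
left = "[%-5d]" % 42            # '[42   ]'
zero_fill = "%05.1f" % 3.14159  # '003.1'
combined = "%+06.2f" % 3.14159  # '+03.14': zeros go after the sign
hex_alt = "%#x" % 255           # '0xff'
g_alt = "%#.3g" % 1.0           # '1.00': # keeps trailing zeros
```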

7.2.3. String formatting from a dictionary

You can use the string format operator % to format a set of values from a dictionary D:

f % D

In this form, the general form for a format code is:

%(k)[p][m[.n]]c

where k is a key in dictionary D, and the rest of the format code is as in the usual string format operator. For each format code, the value of D[k] is used.

For example, suppose D is the dictionary {'baz':39, 'foo':'X'}; then ("=%(foo)s=%(baz)03d=" % D) yields '=X=039='.
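The following sketch repeats the example above and adds three further behaviors. A key may be used more than once. Extra keys in the dictionary are ignored. A missing key is an error. The dictionary named `stock` is purely illustrative.

```python
D = {'baz': 39, 'foo': 'X'}
result = "=%(foo)s=%(baz)03d=" % D     # '=X=039='

# A key may be used more than once, and keys the format never
# mentions are simply ignored.
stock = {'n': 49, 'fruit': 'kiwis', 'unused': None}
report = "%(n)d %(fruit)s, yes, %(n)d %(fruit)s" % stock

# A key missing from the dictionary raises KeyError.
try:
    "%(missing)s" % stock
    missing_ok = True
except KeyError:
    missing_ok = False
```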

7.2.4. String functions

Functions:

str(obj)

Converts obj, an object of any type, to a string. For example, str(17) produces the string '17'.

unicode(s[,enc[,errs]])

Converts an object s, of any type, to a Unicode string. The optional enc argument specifies an encoding, and the optional errs argument specifies what to do in case of errors (see the Python Library Reference for details).

raw_input([p])

Prompt for input with string p, then return a line entered by the user, without the newline. p may be omitted for unprompted input.

7.2.5. String methods

These methods are available on any string or Unicode object S:

S.capitalize()

Return a copy of S with its first character capitalized and the rest lowercased.

S.center(w)

Return S centered in a string of width w, padded with spaces. If w<=len(S), the result is a copy of S. Example: 'x'.center(4) returns ' x  '.

S.count(t[,start[,end]])

Return the number of non-overlapping occurrences of string t in S. To search only a slice S[start:end] of S, supply start and end arguments.

S.endswith(t[,start[,end]])

Predicate to test whether S ends with string t. If you supply the optional start and end arguments, it tests whether the slice S[start:end] ends with t.
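The following sketch exercises .count() and .endswith(), including their optional slice arguments. The sample strings are arbitrary.

```python
n_a = "banana".count("a")                      # 3
n_aa = "aaaa".count("aa")                      # 2: occurrences do not overlap
n_slice = "banana".count("a", 2)               # 2: counts only in "nana"
is_txt = "kiwi.txt".endswith(".txt")           # True
head_kiwi = "kiwi.txt".endswith("kiwi", 0, 4)  # True: tests "kiwi.txt"[0:4]
```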

S.expandtabs([tabsize])

Returns a copy of S with all tabs expanded to spaces. The optional tabsize argument specifies the number of columns between tab stops; the default is 8.
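A tab advances to the next tab stop, not by a fixed number of spaces. The following sketch shows this with two short sample strings.

```python
default_tabs = "a\tb".expandtabs()   # 'a' + 7 spaces + 'b' (stops every 8 columns)
four_a = "a\tb".expandtabs(4)        # 'a' + 3 spaces + 'b'
four_ab = "ab\tc".expandtabs(4)      # 'ab' + 2 spaces + 'c': same stop, fewer spaces
```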

S.find(t[,start[,end]])

If string t is not found in S, return -1; otherwise return the index of the first position in S that matches t. For example, "banana".find("an") returns 1. The optional start and end arguments restrict the search to slice S[start:end].

S.index(t[,start[,end]])

Works like .find(), but if t is not found, it raises a ValueError exception.

S.isalnum()

Predicate that tests whether S is nonempty and all its characters are alphanumeric.

S.isalpha()

Predicate that tests whether S is nonempty and all its characters are letters.

S.isdigit()

Predicate that tests whether S is nonempty and all its characters are digits.

S.islower()

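Predicate that tests whether S contains at least one cased character (letter) and all its cased characters are lowercase. Other characters, such as digits and spaces, are ignored; for example, 'abc1'.islower() is True.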

S.isspace()

Predicate that tests whether S is nonempty and all its characters are whitespace characters.

In Python, the characters considered whitespace include ' ' (space, called SP in ASCII), '\n' (newline, NL), '\r' (return, CR), '\t' (tab, HT), '\f' (form feed, FF), and '\v' (vertical tab, VT).

S.isupper()

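Predicate that tests whether S contains at least one cased character (letter) and all its cased characters are uppercase. Other characters are ignored; for example, 'NI!'.isupper() is True.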
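The following sketch applies each predicate to a sample string. The samples include an empty string and strings that mix letters with other characters.

```python
alnum = "kiwi49".isalnum()       # True
alpha = "kiwi49".isalpha()       # False: digits are not letters
digits = "49".isdigit()          # True
empty = "".isdigit()             # False: S must be nonempty
lower = "kiwi 49".islower()      # True: only cased characters count
upper = "NI!".isupper()          # True
space = " \t\r\n".isspace()      # True
```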

S.join(L)

L must be a sequence of strings. Returns a string containing the members of the sequence with copies of string S inserted between them. For example, '/'.join(['foo', 'bar', 'baz']) returns the string 'foo/bar/baz'.
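The following sketch shows .join() with an empty separator. It also shows what happens when the sequence holds non-strings, which must be converted first. The lists are sample data.

```python
path = '/'.join(['foo', 'bar', 'baz'])            # 'foo/bar/baz'
glued = ''.join(['k', 'i', 'w', 'i'])             # 'kiwi'

# Convert numbers to strings first; join() does not do it for you.
numbers = ', '.join([str(n) for n in [1, 2, 3]])  # '1, 2, 3'
try:
    '/'.join([1, 2])
    joined_ints = True
except TypeError:
    joined_ints = False
```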

S.ljust(w)

Return a copy of S left-justified in a field of width w, padded with spaces. If w<=len(S), the result is a copy of S. Example: "Ni".ljust(4) returns "Ni  ".

S.lower()

Returns a copy of S with all uppercase letters replaced by their lowercase equivalent.

S.lstrip([c])

Return S with all leading characters from string c removed. The default value for c is a string containing all the whitespace characters.

S.replace(old,new[,max])

Return a copy of S with all occurrences of string old replaced by string new. Normally, all occurrences are replaced; if you want to limit the number of replacements, pass that limit as the max argument.
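The following sketch calls .replace() with and without the optional max argument.

```python
all_o = "banana".replace("a", "o")       # 'bonono'
two_o = "banana".replace("a", "o", 2)    # 'bonona': only the first two
unchanged = "banana".replace("x", "o")   # 'banana': nothing to replace
```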

S.rfind(t[,start[,end]])

Like .find(), but if t occurs in S, this method returns the highest starting index.

For example, "banana".rfind("an") returns 3.
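The following sketch puts .find(), .rfind(), and .index() side by side on the same sample string. It shows the -1 result and the ValueError for a missing substring.

```python
s = "banana"
first = s.find("an")        # 1
last = s.rfind("an")        # 3
after2 = s.find("an", 2)    # 3: search restricted to s[2:]
absent = s.find("kiwi")     # -1

# .index() signals "not found" with an exception instead of -1.
try:
    s.index("kiwi")
    found = True
except ValueError:
    found = False
```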

S.rjust(w)

Return a copy of S right-justified in a field of width w, padded with spaces. If w<=len(S), the result is a copy of S.
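The following sketch compares the three justification methods, .center(), .ljust(), and .rjust(). It includes the case where the field is narrower than the string.

```python
centered = 'x'.center(4)    # ' x  '
left = 'Ni'.ljust(4)        # 'Ni  '
right = 'Ni'.rjust(4)       # '  Ni'
too_long = 'kiwi'.ljust(2)  # 'kiwi': never truncated
```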

S.rstrip([c])

Return S with all trailing characters from string c removed. The default value for c is a string containing all the whitespace characters.

S.split([d[,max]])

Returns a list of strings [s0, s1, ...] made by splitting S into pieces wherever the delimiter string d is found. The default is to split up S into pieces wherever clumps of one or more whitespace characters are found. Some examples:

>>> "I'd annex \t \r the Sudetenland".split()
["I'd", 'annex', 'the', 'Sudetenland']
>>> '3/crunchy frog/ Bath & Wells'.split('/')
['3', 'crunchy frog', ' Bath & Wells']
>>> '//Norwegian Blue/'.split('/')
['', '', 'Norwegian Blue', '']
>>> 'never<*>pay<*>plan<*>'.split('<*>')
['never', 'pay', 'plan', '']

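The optional max argument limits the number of splits: at most max splits are made, starting from the front of S, so the result has at most max+1 pieces and the last piece holds the rest of S. For example, 'a/b/c/d/e'.split('/',2) yields the list ['a', 'b', 'c/d/e'].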
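Splitting on whitespace with no argument behaves differently from splitting on a single space. The following sketch shows the difference and also shows the max argument.

```python
ws = 'a  b'.split()                  # ['a', 'b']: a run of whitespace is one separator
edges = ' a b '.split()              # ['a', 'b']: leading/trailing whitespace ignored
sp = 'a  b'.split(' ')               # ['a', '', 'b']: every single space separates
limited = 'a/b/c/d/e'.split('/', 2)  # ['a', 'b', 'c/d/e']
```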

S.splitlines([keepends])

Splits S into lines and returns a list of the lines as strings. Discards the line separators unless the optional keepends argument is true.
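The following sketch uses sample text with mixed line endings. Unlike .split('\n'), .splitlines() also recognizes "\r\n" as a line separator.

```python
text = "one\ntwo\r\nthree"
bare = text.splitlines()         # ['one', 'two', 'three']
kept = text.splitlines(True)     # ['one\n', 'two\r\n', 'three']
naive = text.split('\n')         # ['one', 'two\r', 'three']: stray '\r' remains
```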

S.startswith(t[,start[,end]])

Predicate to test whether S starts with string t. Otherwise similar to .endswith().

S.strip([c])

Return S with all leading and trailing characters from string c removed. The default value for c is a string containing all the whitespace characters.
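The c argument to .strip(), .lstrip(), and .rstrip() is treated as a set of characters, not as a prefix or suffix to remove. The following sketch shows this with sample strings.

```python
both = "  kiwi \n".strip()                # 'kiwi'
leading = "000120".lstrip("0")            # '120'
trailing = "kiwi!!?".rstrip("!?")         # 'kiwi'
# Any mix of the listed characters is stripped from each end.
site = "www.example.com".strip("cmowz.")  # 'example'
```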

S.swapcase()

Return a copy of S with each lowercase character replaced by its uppercase equivalent, and vice versa.

S.translate(new[,drop])

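This method translates or removes characters of S. The new argument is a string of exactly 256 characters, and each character x of S is replaced in the result by new[ord(x)]. If you would like certain characters removed from S before the translation, provide a string of those characters as the drop argument. (For a unicode string, .translate() instead takes a single argument: a dictionary mapping Unicode ordinals to ordinals, Unicode strings, or None; characters mapped to None are deleted.)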

S.upper()

Return a copy of S with all lowercase characters replaced by their uppercase equivalents.
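The following sketch applies the case-changing methods to one mixed-case sample string.

```python
s = "hELLO wORLD"
cap = s.capitalize()   # 'Hello world': rest of the string is lowercased
swap = s.swapcase()    # 'Hello World'
up = s.upper()         # 'HELLO WORLD'
low = s.lower()        # 'hello world'
```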

S.zfill(w)

Return a copy of S left-filled with '0' characters to width w. For example, '12'.zfill(5) returns '00012'.
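The following sketch shows .zfill() on a negative number and on a string longer than the width. It also compares .zfill() with the equivalent "%05d" format.

```python
z = '12'.zfill(5)       # '00012'
neg = '-12'.zfill(5)    # '-0012': a leading sign stays in front
wide = 'kiwi'.zfill(2)  # 'kiwi': never truncated
same = '%05d' % 12      # '00012', the format-operator equivalent
```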