The re module provides functions for matching strings against regular expressions. See the O'Reilly book Mastering Regular Expressions by Friedl and Oram for the whys and hows of regular expressions. We discuss only the commonest functions here. Refer to the Python Library Reference for the full feature set.
Note: The raw string notation r'...' is most useful for regular expressions; see raw strings, above.
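As a small illustration of why the note matters, the sketch below compares plain and raw string literals. The variable names and the sample path are made up for this example.

```python
import re

# Without the r prefix, Python itself interprets backslash escapes
# before the re module ever sees the pattern.
plain = '\n'      # one character: a newline
raw = r'\n'       # two characters: a backslash and the letter n

# To match one literal backslash, the regular expression needs \\.
# As a raw string that is r'\\'; as a plain string it must be '\\\\'.
backslash_raw = r'\\'
backslash_plain = '\\\\'

path = 'C:\\temp'                       # the text C:\temp (one backslash)
found = re.search(backslash_raw, path)  # locates that single backslash
```

Both spellings of the backslash pattern are the same string; the raw form is simply easier to read.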
These characters have special meanings in regular expressions:
. | Matches any character except a newline. |
^ | Matches the start of the string. |
$ | Matches the end of the string. |
r* | Matches zero or more repetitions of regular expression r. |
r+ | Matches one or more repetitions of r. |
r? | Matches zero or one r. |
r*? | Non-greedy form of r*; matches as few characters as possible. The normal * operator is greedy: it matches as much text as possible. |
r+? | Non-greedy form of r+. |
r?? | Non-greedy form of r?. |
r{m,n} | Matches from m to n repetitions of r. For example, r'x{3,5}' matches between three and five copies of letter 'x'; r'0{4}' matches the string '0000'. |
r{m,n}? | Non-greedy version of the previous form. |
[...] | Matches one character from a set of characters. You can put all the allowable characters inside the brackets, or use a-b to mean all characters from a to b inclusive. For example, regular expression r'[abc]' will match either 'a', 'b', or 'c'. Pattern r'[0-9a-zA-Z]' will match any single letter or digit. |
[^...] | Matches any character not in the given set. |
rs | Matches expression r followed by expression s. |
r|s | Matches either r or s. |
(r) | Matches r and forms it into a group that can be retrieved separately after a match; see MatchObject, below. Groups are numbered starting from 1. |
(?:r) | Matches r but does not form a group for later retrieval. |
(?P<n>r) | Matches r and forms it into a named group, with name n, for later retrieval. |
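The following sketch exercises several rows of the table above: greedy and non-greedy repetition, counted repetition, character sets, and groups. The sample strings and variable names are illustrative only.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.match(r'<.*>', text).group()

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
lazy = re.match(r'<.*?>', text).group()

# Counted repetition: exactly four zeros at the start of the string.
zeros = re.match(r'0{4}', '00001234').group()

# A character set with ranges, repeated with +.
word = re.match(r'[0-9a-zA-Z]+', 'abc123!xyz').group()

# A named group, a non-capturing alternation, and a numbered group.
# The named group is also group 1; (?:-|/) does not get a number.
m = re.match(r'(?P<year>[0-9]{4})(?:-|/)([0-9]{2})', '2001/07')
year = m.group('year')
month = m.group(2)
```

Because (?:...) does not create a group, the month is group 2 rather than group 3.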
These special sequences are recognized:
\n | Matches the same text as a group that matched earlier, where n is the number of that group. For example, r'([a-zA-Z]+):\1' matches the string "foo:foo". |
\A | Matches only at the start of the string. |
\b | Matches the empty string but only at the start or end of a word (where a word is set off by whitespace or a non-alphanumeric character). For example, r'foo\b' would match "foo" but not "foot". |
\B | Matches the empty string when not at the start or end of a word. |
\d | Matches any digit. |
\D | Matches any non-digit. |
\s | Matches any whitespace character. |
\S | Matches any non-whitespace character. |
\w | Matches any alphanumeric character or the underscore. |
\W | Matches any character that is neither alphanumeric nor the underscore. |
\Z | Matches only at the end of the string. |
\\ | Matches a backslash (\) character. |
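The special sequences can be tried out as follows. This is only a sketch: the test strings are invented for the example, and the first two patterns are the ones quoted in the table.

```python
import re

# \1 must repeat exactly what group 1 matched.
same = re.match(r'([a-zA-Z]+):\1', 'foo:foo')
different = re.match(r'([a-zA-Z]+):\1', 'foo:bar')

# \b anchors the pattern at a word boundary.
whole_word = re.search(r'foo\b', 'foo')
inside_word = re.search(r'foo\b', 'foot')

# \d, \s and \w combined: a number, some whitespace, then a word.
m = re.match(r'(\d+)\s+(\w+)', '42   apples')

# \A and \Z pin the match to the very start and very end of the string.
exact = re.search(r'\A\d+\Z', '2024')
not_exact = re.search(r'\A\d+\Z', '2024 AD')
```

Here different, inside_word, and not_exact are all None, because the backreference, the word boundary, and the end anchor respectively cannot be satisfied.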
There are two ways to use a re regular expression. Assuming you import the module with import re, you can test whether a regular expression r matches a string s with the construct re.match(r,s).
However, if you will be matching the same regular expression many times, the performance will be better if you compile the regular expression using re.compile(r), which returns a compiled regular expression object. You can then check a string s for matching by using the .match(s) method on that object.
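The two approaches might look like this in practice. The pattern, the sample lines, and the name error_re are assumptions made for the example.

```python
import re

lines = ['error: disk full', 'ok', 'error: timeout']

# One-off use: pass the pattern and the string to re.match().
first = re.match(r'error: (.*)', lines[0])

# Repeated use: compile once, then call .match() on the compiled object.
error_re = re.compile(r'error: (.*)')
messages = []
for line in lines:
    m = error_re.match(line)
    if m is not None:          # .match() returns None when nothing matches
        messages.append(m.group(1))
```

Both forms return the same kind of match object, so code that inspects the result does not need to know which form produced it.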
Here are the functions in module re:
compile(r[,f])
Compile regular expression r. Returns a compiled regular expression object; see the table of methods on such objects below. To get case-insensitive matching, use re.I as the f argument. There are other flags that may be passed to the f argument; see the Python Library Reference.
match(r,s[,f])
If r matches the start of string s, return a MatchObject (see below), otherwise return None.
search(r,s[,f])
Like the match() method, but matches r anywhere in s, not just at the beginning.
split(r,s[,maxsplit=m])
Splits string s into pieces where pattern r occurs. If r does not contain groups, returns a list of the parts of s that lie between the matches of r, in order. If r contains groups, the text matched by each group is also returned, as separate elements between the non-matching parts. If the m argument is given, it specifies the maximum number of splits that will be made, and the leftovers will be returned as an extra string at the end of the list.
sub(r,R,s[,count=c])
Returns a copy of s in which the leftmost non-overlapping parts that match r are replaced using R; returns s unchanged if there is no match. The R argument can be a string or a function that takes one MatchObject argument and returns the string to be substituted. If the c argument is supplied (defaulting to 0), no more than c replacements are done, where a value of 0 means do them all.
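The module-level functions above can be exercised together in one short sketch. All sample strings, and the helper function double, are invented for illustration.

```python
import re

# compile() with the re.I flag for case-insensitive matching.
hello_re = re.compile(r'hello', re.I)
ci = hello_re.match('HeLLo world')

# match() only succeeds at the start; search() scans the whole string.
at_start = re.match(r'world', 'hello world')
anywhere = re.search(r'world', 'hello world')

# split() without groups keeps only the text between the separators.
parts = re.split(r',\s*', 'a, b,c')

# split() with a group also returns what the group matched.
parts_with_seps = re.split(r'(,)\s*', 'a, b,c')

# maxsplit limits the number of splits; the remainder is the last item.
limited = re.split(r',\s*', 'a, b,c', maxsplit=1)

# sub() with a replacement string, with and without a count limit.
dashed = re.sub(r'\s+', '-', 'one  two   three')
dashed_once = re.sub(r'\s+', '-', 'one  two   three', count=1)

# sub() with a function: it receives each match and returns the new text.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, '3 apples and 10 pears')
```

Passing maxsplit and count by keyword, as done here, keeps the calls readable and avoids confusing them with the flags argument.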
Compiled regular expression objects returned by
re.compile() have these methods:
.match(s[,ps[,pe]])
If the start of string s matches, return a MatchObject; if there is no match, return None. If ps is given, it specifies the index within s where matching is to start; this defaults to 0. If pe is given, it specifies the index within s where matching stops, as if s were only pe characters long.
.search(s[,ps[,pe]])
Like match(), but matches anywhere in s.
.split(s[,maxsplit=m])
Like re.split().
.sub(R,s[,count=c])
Like re.sub().
.pattern
The string from which this object was compiled.
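A compiled object's methods, including the optional start and end positions, can be sketched as below. The name digits_re and the sample strings are assumptions for the example.

```python
import re

digits_re = re.compile(r'\d+')
s = 'abc12345'

no_match = digits_re.match(s)           # 'a' is not a digit
from_3 = digits_re.match(s, 3)          # start matching at index 3
bounded = digits_re.match(s, 3, 6)      # treat s as if it ended at index 6
found = digits_re.search(s)             # scan forward for the first match

pieces = digits_re.split('a1b22c')      # like re.split()
masked = digits_re.sub('#', 'a1b22c')   # like re.sub()
source = digits_re.pattern              # the original pattern string
```

With the end position set to 6, only the characters at indices 3 through 5 are available, so the bounded match is shorter than the unbounded one.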
A MatchObject is the object returned by a successful .match() or .search(). Such an object has these methods and attributes:
.group([n])
Retrieves the text that matched. If there are no arguments, returns the entire string that matched. To retrieve just the text that matched the nth group, pass in an integer n, where the groups are numbered starting at 1. For example, for a MatchObject m, m.group(2) would return the text that matched the second group, or None if the second group did not take part in the match.
If you have named the groups in your regular expression using a construct of the form (?P<name>...), the n argument can be the name as a string. For example, if you have a group (?P<year>[\d]{4}) (which matches four digits), you can retrieve that field using m.group("year").
.groups()
Return a tuple (s1,s2,...) containing all the matched strings, where si is the string that matched the ith group.
.start([n])
Returns the location where a match started. If no argument is given, returns the index within the string where the entire match started. If an argument n is given, returns the index of the start of the match for the nth group.
.end([n])
Returns the location where a match ended. If no argument is given, returns the index of the first character past the match. If n is given, returns the index of the first character past where the nth group matched.
.span([n])
Returns a 2-tuple (m.start(n),m.end(n)).
.pos
The effective ps value passed to .match() or .search().
.endpos
The effective pe value passed to .match() or .search().
.re
The regular expression object used to produce this MatchObject.
.string
The s argument passed to .match() or .search().
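The methods and attributes of a match object can be seen together in the following sketch. The pattern, the sample sentence, and the names date_re and opt are illustrative assumptions.

```python
import re

date_re = re.compile(r'(?P<year>\d{4})-(\d{2})')
s = 'Due 2001-07 at noon'
m = date_re.search(s, 2)        # start searching at index 2

whole = m.group()               # the entire matched text
year = m.group('year')          # a group retrieved by name
month = m.group(2)              # a group retrieved by number
both = m.groups()               # all groups as a tuple

span_all = m.span()             # (start, end) of the whole match
span_month = m.span(2)          # (start, end) of group 2

start_pos = m.pos               # the start index that was passed in
end_pos = m.endpos              # defaults to the length of the string

# A group that exists in the pattern but did not take part in the
# match yields None.
opt = re.match(r'(a)(b)?', 'a')
missing = opt.group(2)
```

The end index in each span is one past the last matched character, so s[m.start():m.end()] reproduces m.group().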