
28.5.1. Characters in regular expressions

Note: The raw string notation r'...' is most useful for regular expressions; see raw strings, above.
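The difference matters most for sequences like \b, which Python's own string escapes and the regular expression syntax both define. Below is a small sketch in Python 3 using only the standard re module (the sample text and variable names are illustrative). It shows that without the r prefix, '\b' becomes a backspace character before re ever sees it.

```python
import re

# Without the r prefix, Python turns '\b' into a backspace (\x08) before
# re sees it, so this pattern looks for literal backspace characters.
plain = re.search('\bcat\b', 'the cat sat')

# With the r prefix the backslashes reach re intact: \b means "word boundary".
raw = re.search(r'\bcat\b', 'the cat sat')

print(plain)         # None
print(raw.group())   # cat

# These two spellings are the same pattern; the raw form is easier to read.
print('\\d+' == r'\d+')   # True
```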

These characters have special meanings in regular expressions:

. Matches any character except a newline.
^ Matches the start of the string.
$ Matches the end of the string, or just before a newline at the end of the string.
r* Matches zero or more repetitions of regular expression r.
r+ Matches one or more repetitions of r.
r? Matches zero or one r.
r*? Non-greedy form of r*; matches as few characters as possible. The normal * operator is greedy: it matches as much text as possible.
r+? Non-greedy form of r+.
r?? Non-greedy form of r?.
r{m,n} Matches from m to n repetitions of r; the form r{m} matches exactly m repetitions. For example, r'x{3,5}' matches between three and five copies of the letter 'x', and r'(bl){4}' matches the string 'blblblbl'.
r{m,n}? Non-greedy version of the previous form.
[...] Matches one character from a set of characters. You can put all the allowable characters inside the brackets, or use a-b to mean all characters from a to b inclusive. For example, regular expression r'[abc]' will match either 'a', 'b', or 'c'. Pattern r'[0-9a-zA-Z]' will match any single letter or digit.
[^...] Matches any character not in the given set.
rs Matches expression r followed by expression s.
r|s Matches either r or s.
(r) Matches r and forms it into a group that can be retrieved separately after a match; see MatchObject, below. Groups are numbered starting from 1.
(?:r) Matches r but does not form a group for later retrieval.
(?P<n>r) Matches r and forms it into a named group, with name n, for later retrieval.
(?P=n) Matches whatever string matched an earlier (?P<n>r) group.
(?#...) A comment: the “...” portion is ignored.
(?=...) The “...” portion must match, but it is not consumed by the match. This is called a lookahead match. For example, r'a(?=bcd)' matches the string 'abcd' but not the string 'abcxyz'. Unlike the regular expression r'abcd', the matched portion here is only 'a', not 'abcd'.
(?!...) This is similar to (?=...), except that the “...” portion must not match; again, no characters are consumed. For example, r'a(?!bcd)' matches 'axyz', with 'a' as the matched portion, but does not match 'abcdef'. This is called a negative lookahead match.
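The repetition and set operators above are easiest to compare side by side. The following sketch is in Python 3. re.match, re.fullmatch (Python 3.4 and later) and re.findall are standard re functions, and the sample strings are made up for illustration. It contrasts each greedy operator with its non-greedy form, then shows {m,n}, {m}, character sets and alternation.

```python
import re

html = '<b>bold</b>'

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.match(r'<.*>', html).group()
# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.match(r'<.*?>', html).group()
print(greedy)   # <b>bold</b>
print(lazy)     # <b>

# ? versus ??: the greedy form takes the optional 'b', the lazy form skips it.
opt_greedy = re.match(r'ab?', 'abb').group()    # 'ab'
opt_lazy = re.match(r'ab??', 'abb').group()     # 'a'

# {m,n} bounds the repetition count; {m} means exactly m.
in_range = re.fullmatch(r'x{3,5}', 'xxxx') is not None      # True
too_few = re.fullmatch(r'x{3,5}', 'xx') is not None         # False
exact = re.fullmatch(r'(bl){4}', 'blblblbl') is not None    # True

# {m,n} is greedy; {m,n}? takes as few repetitions as it can.
brace_greedy = re.match(r'x{2,4}', 'xxxxx').group()    # 'xxxx'
brace_lazy = re.match(r'x{2,4}?', 'xxxxx').group()     # 'xx'

# Character sets, negated sets, and alternation.
alnum = re.findall(r'[0-9a-zA-Z]', 'a-1_B')    # ['a', '1', 'B']
non_digits = re.findall(r'[^0-9]', 'a1b2')     # ['a', 'b']
either = re.match(r'cat|dog', 'dogs').group()  # 'dog'
```

A non-greedy operator does not guarantee the shortest possible match overall. It only prefers fewer repetitions at each point where the engine has a choice.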
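The grouping, comment and lookahead constructs can be sketched the same way. This is Python 3. The group name word and the sample strings are hypothetical, and group(), group(name) and groups() are standard match-object methods. The lookahead lines reproduce the 'abcd' and 'axyz' examples from the table.

```python
import re

# (r): a capturing group; groups are numbered from 1.
email = re.match(r'(\w+)@(\w+)\.com', 'alice@example.com')
user, host = email.group(1), email.group(2)     # 'alice', 'example'

# (?:r): groups the pattern for + but captures nothing, so only (c) is numbered.
nc = re.match(r'(?:ab)+(c)', 'ababc')
print(nc.groups())    # ('c',)

# (?P<n>r) names a group; (?P=n) must match the very same text again.
pair = r'(?P<word>[a-z]+)-(?P=word)'
same = re.match(pair, 'bye-bye')        # matches, word == 'bye'
different = re.match(pair, 'bye-now')   # None

# (?#...) is a comment inside the pattern and is ignored.
commented = re.match(r'\d+(?#the digits)x', '42x').group()   # '42x'

# (?=...) lookahead: 'bcd' must follow, but only 'a' is consumed.
look = re.match(r'a(?=bcd)', 'abcd').group()   # 'a'
look_fail = re.match(r'a(?=bcd)', 'abcxyz')    # None

# (?!...) negative lookahead: 'bcd' must NOT follow; nothing is consumed.
neg = re.match(r'a(?!bcd)', 'axyz').group()    # 'a'
neg_fail = re.match(r'a(?!bcd)', 'abcdef')     # None
```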

The special sequences in the table below are recognized. However, many of them function in ways that depend on the locale; see Section 19.4, “What is the locale?”. For example, the r'\s' sequence matches characters that are considered whitespace in the current locale.

\n Matches the same text as a group that matched earlier, where n is the number of that group. For example, r'([a-zA-Z]+):\1' matches the string "foo:foo".
\A Matches only at the start of the string.
\b Matches the empty string, but only at the start or end of a word, where a word is a run of characters matched by \w. For example, r'foo\b' matches "foo" in "foo bar" but not in "foot".
\B Matches the empty string when not at the start or end of a word.
\d Matches any digit.
\D Matches any non-digit.
\s Matches any whitespace character.
\S Matches any non-whitespace character.
\w Matches any alphanumeric character plus the underscore '_'.
\W Matches any character that \w does not match.
\Z Matches only at the end of the string.
\\ Matches a backslash (\) character.
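The special sequences can be tried out on short ASCII samples, as in the sketch below. It is written in Python 3, and the strings and variable names are illustrative only. One assumption affects the results: in Python 3, str patterns follow Unicode rules rather than the locale, so on non-ASCII text \d, \s and \w may behave differently from the locale-dependent behavior described above. The \A/\Z lines use the standard re.MULTILINE flag, which makes ^ and $ match at each line but leaves \A and \Z anchored to the whole string.

```python
import re

# \1: repeat whatever group 1 matched.
backref = re.match(r'([a-zA-Z]+):\1', 'foo:foo').group()   # 'foo:foo'
backref_fail = re.match(r'([a-zA-Z]+):\1', 'foo:bar')      # None

# \b is a zero-width word boundary; \B matches only where \b does not.
at_boundary = re.search(r'foo\b', 'foo bar') is not None   # True
in_word = re.search(r'foo\b', 'foot') is not None          # False
inside = re.search(r'oo\B', 'foot') is not None            # True

# \d, \w, \W and \s on a small ASCII sample.
text = 'id_7 = 42;'
digits = re.findall(r'\d', text)     # ['7', '4', '2']
words = re.findall(r'\w+', text)     # ['id_7', '42']
others = re.findall(r'\W', text)     # [' ', '=', ' ', ';']
squeezed = re.sub(r'\s+', ' ', 'a \t b\n c')   # 'a b c'

# \A and \Z ignore re.MULTILINE, which makes ^ and $ match at every line.
lines = 'one\ntwo'
starts = re.findall(r'^\w+', lines, re.MULTILINE)    # ['one', 'two']
first = re.findall(r'\A\w+', lines, re.MULTILINE)    # ['one']
last = re.findall(r'\w+\Z', lines, re.MULTILINE)     # ['two']

# r'\\' is a two-character pattern that matches ONE backslash.
slash_at = re.search(r'\\', r'C:\temp').start()     # 2
```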