28.5.1. Characters in regular expressions

Note: The raw string notation `r'...'` is most useful for regular expressions; see raw strings, above.
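Here is a small Python illustration of why raw strings matter. In an ordinary string literal, `\b` is the backspace escape, so the `re` module never sees the two-character word-boundary sequence. The sample string `'a b'` is made up for illustration.

```python
import re

# In an ordinary string literal, '\b' is one character: backspace (ASCII 8).
# This pattern therefore searches for a literal backspace.
plain = re.search('\b', 'a b')

# In a raw string, the backslash is kept, so re receives the two
# characters backslash and 'b': the word-boundary sequence.
raw = re.search(r'\b', 'a b')

print(plain)        # None: 'a b' contains no backspace character
print(raw.start())  # 0: there is a word boundary just before 'a'
```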

These characters have special meanings in regular expressions:

- `.` Matches any character except a newline.
- `^` Matches the start of the string.
- `$` Matches the end of the string (or just before a newline at the very end of the string).
- `r*` Matches zero or more repetitions of regular expression `r`.
- `r+` Matches one or more repetitions of `r`.
- `r?` Matches zero or one `r`.
- `r*?` Non-greedy form of `r*`; matches as few repetitions as possible. The normal `*` operator is greedy: it matches as much text as possible.
- `r+?` Non-greedy form of `r+`.
- `r??` Non-greedy form of `r?`.
- `r{m,n}` Matches from `m` to `n` repetitions of `r`; `r{m}` matches exactly `m` repetitions. For example, `r'x{3,5}'` matches between three and five copies of letter `'x'`; `r'(bl){4}'` matches the string `'blblblbl'`.
- `r{m,n}?` Non-greedy version of the previous form.
- `[...]` Matches one character from a set of characters. You can put all the allowable characters inside the brackets, or use `a-b` to mean all characters from `a` to `b` inclusive. For example, regular expression `r'[abc]'` will match either `'a'`, `'b'`, or `'c'`. Pattern `r'[0-9a-zA-Z]'` will match any single letter or digit.
- `[^...]` Matches any character not in the given set.
- `rs` Matches expression `r` followed by expression `s`.
- `r|s` Matches either `r` or `s`.
- `(r)` Matches `r` and forms it into a group that can be retrieved separately after a match; see `MatchObject`, below. Groups are numbered starting from 1.
- `(?:r)` Matches `r` but does not form a group for later retrieval.
- `(?P<n>r)` Matches `r` and forms it into a named group, with name `n`, for later retrieval.
- `(?P=n)` Matches whatever string matched an earlier `(?P<n>r)` group.
- `(?#...)` Comment: the `...` portion is ignored and may contain a comment.
- `(?=...)` The `...` portion must be matched, but is not consumed by the match. This is sometimes called a lookahead match. For example, `r'a(?=bcd)'` matches the string `'abcd'` but not the string `'abcxyz'`. Compared to using `r'abcd'` as the regular expression, the difference is that in this case the matched portion would be `'a'` and not `'abcd'`.
- `(?!...)` This is similar to `(?=...)`, except that it specifies a regular expression that must not match; like `(?=...)`, it consumes no characters. For example, `r'a(?!bcd)'` would match `'axyz'` and return `'a'` as the matched portion, but it would not match `'abcdef'`. This is called a negative lookahead match.
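The following Python snippet is a quick sketch that exercises several of the constructs above with `re.match`, `re.search`, and `re.findall`. The sample strings are invented for illustration, and the code should run unchanged under Python 2.7 or Python 3.

```python
import re

text = '<b>bold</b>'

# Greedy * takes as much as possible, so .* runs on to the last '>'.
greedy = re.match(r'<.*>', text).group()            # '<b>bold</b>'
# Non-greedy *? stops at the first '>' that lets the match succeed.
lazy = re.match(r'<.*?>', text).group()             # '<b>'

# {m,n} is greedy as well; {m,n}? takes the minimum count.
most = re.match(r'x{3,5}', 'xxxxxxx').group()       # 'xxxxx'
least = re.match(r'x{3,5}?', 'xxxxxxx').group()     # 'xxx'
exact = re.match(r'(bl){4}', 'blblblbl').group()    # 'blblblbl'

# A character set and a complemented set.
alnum = re.findall(r'[0-9a-zA-Z]', 'a-1_Z')         # ['a', '1', 'Z']
not_digit = re.findall(r'[^0-9]', 'a1b2')           # ['a', 'b']

# Alternation.
pet = re.match(r'cat|dog', 'dog').group()           # 'dog'

# Numbered groups, a non-capturing group, and an inline comment.
m = re.match(r'([a-z]+)@([a-z]+)', 'alice@example')
user, host = m.group(1), m.group(2)                 # 'alice', 'example'
only_c = re.match(r'(?:ab)+(c)', 'ababc').groups()  # ('c',)
commented = re.match(r'ab(?#ignored)c', 'abc').group()  # 'abc'

# A named group, reused later in the same pattern with (?P=name).
pair = re.match(r'(?P<word>[a-z]+)-(?P=word)', 'ha-ha')
word = pair.group('word')                           # 'ha'
mismatch = re.match(r'(?P<word>[a-z]+)-(?P=word)', 'ha-ho')  # None

# Lookahead assertions check the following text without consuming it.
ahead = re.match(r'a(?=bcd)', 'abcd').group()       # 'a'
ahead_fail = re.match(r'a(?=bcd)', 'abcxyz')        # None
neg = re.match(r'a(?!bcd)', 'axyz').group()         # 'a'
neg_fail = re.match(r'a(?!bcd)', 'abcdef')          # None

# $ also matches just before a newline at the very end of the string.
trailing = re.search(r'end$', 'the end\n')          # a match object
```

Note that the lookahead results mirror the examples in the text: only `'a'` is returned as the matched portion, because the lookahead part is checked but not consumed.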

The special sequences below are recognized. However, several of them (`\w`, `\W`, `\b`, `\B`, `\s`, and `\S`) can behave in ways that depend on the locale when the `re.LOCALE` flag is used; see Section 19.4, “What is the locale?”. For example, with that flag the `r'\s'` sequence matches characters that are considered whitespace in the current locale.

- `\n` Matches the same text as a group that matched earlier, where `n` is the number of that group (`\1`, `\2`, and so on). For example, `r'([a-zA-Z]+):\1'` matches the string `"foo:foo"`.
- `\A` Matches only at the start of the string.
- `\b` Matches the empty string, but only at the start or end of a word (that is, at a boundary between a `\w` character and either a `\W` character or the edge of the string). For example, `r'foo\b'` would match `"foo"` but not `"foot"`.
- `\B` Matches the empty string when not at the start or end of a word.
- `\d` Matches any digit.
- `\D` Matches any non-digit.
- `\s` Matches any whitespace character.
- `\S` Matches any non-whitespace character.
- `\w` Matches any alphanumeric character plus the underbar `'_'`.
- `\W` Matches any character that `\w` does not match: anything other than an alphanumeric character or the underbar.
- `\Z` Matches only at the end of the string; unlike `$`, it does not match before a trailing newline.
- `\\` Matches a backslash (`\`) character.
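A short Python sketch of the special sequences follows. The sample strings are invented for illustration, and the results shown assume plain ASCII input with no flags, so locale settings do not come into play.

```python
import re

# \1 must repeat exactly the text captured by group 1.
echo = re.match(r'([a-zA-Z]+):\1', 'foo:foo')      # matches 'foo:foo'
no_echo = re.match(r'([a-zA-Z]+):\1', 'foo:bar')   # None

# \b and \B are zero-width word-boundary tests.
edge = re.search(r'foo\b', 'foo bar')      # matches 'foo'
in_word = re.search(r'foo\b', 'foot')      # None
middle = re.search(r'\Boo\B', 'book')      # matches the 'oo' inside 'book'

# \A and \Z anchor to the absolute start and end of the string.
starts = re.search(r'\Afoo', 'foo bar')    # matches
late = re.search(r'\Abar', 'foo bar')      # None: 'bar' is not at the start
strict_end = re.search(r'end\Z', 'the end\n')  # None: a newline follows 'end'

# Class shorthands and their complements.
digits = re.findall(r'\d', 'a1b22')                # ['1', '2', '2']
only_digits = re.sub(r'\D', '', '(505) 555-1234')  # '5055551234'
tokens = re.findall(r'\S+', ' a  bc ')             # ['a', 'bc']
words = re.findall(r'\w+', 'snake_case, x1!')      # ['snake_case', 'x1']
non_word = re.findall(r'\W', 'a_b-c')              # ['-']: '_' is a word char

# r'\\' is a two-character pattern that matches one backslash.
slash_at = re.search(r'\\', 'C:\\temp').start()    # 2
```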