A quick reference guide for pyparsing, a recursive descent parser framework for the Python programming language.
Table of Contents
ParserElement: The basic parser building block
CaselessKeyword: Case-insensitive keyword match
CaselessLiteral: Case-insensitive string match
CharsNotIn: Match characters not in a given set
Combine: Fuse components together
Dict: A scanner for tables
Each: Require components in any order
Empty: Match empty content
FollowedBy: Adding lookahead constraints
Forward: The parser placeholder
GoToColumn: Advance to a specified position in the line
Group: Group repeated items into a list
Keyword: Match a literal string not adjacent to specified context
LineEnd: Match end of line
LineStart: Match start of line
Literal: Match a specific string
MatchFirst: Try multiple matches in a given order
NoMatch: A parser that never matches
NotAny: General lookahead condition
OneOrMore: Repeat a pattern one or more times
Optional: Match an optional pattern
Or: Parse one of a set of alternatives
ParseFatalException: Get me out of here!
ParseResults: Result returned from a match
QuotedString: Match a delimited string
Regex: Match a regular expression
SkipTo: Search ahead for a pattern
StringEnd: Match the end of the text
StringStart: Match the start of the text
Suppress: Omit matched text from the result
Upcase: Uppercase the result
White: Match whitespace
Word: Match characters from a specified set
WordEnd: Match only at the end of a word
WordStart: Match only at the start of a word
ZeroOrMore: Match any number of repetitions including none
col(): Convert a position to a column number
countedArray: Parse N followed by N things
delimitedList(): Create a parser for a delimited list
dictOf(): Build a dictionary from key/value pairs
downcaseTokens(): Lowercasing parse action
getTokensEndLoc(): Find the end of the tokens
line(): In what line does a location occur?
lineno(): Convert a position to a line number
matchOnlyAtCol(): Parse action to limit matches to a specific column
matchPreviousExpr(): Match the text that the preceding expression matched
matchPreviousLiteral(): Match the literal text that the preceding expression matched
nestedExpr(): Parser for nested lists
oneOf(): Check for multiple literals, longest first
srange(): Specify ranges of characters
removeQuotes(): Strip leading and trailing quotes
replaceWith(): Substitute a constant value for the matched text
traceParseAction(): Decorate a parse action with trace output
upcaseTokens(): Uppercasing parse action
alphanums: The alphanumeric characters
alphas: The letters
alphas8bit: Supplement Unicode letters
cStyleComment: Match a C-language comment
commaSeparatedList: Parser for a comma-separated list
cppStyleComment: Parser for C++ comments
dblQuotedString: String enclosed in double quotes
dblSlashComment: Parser for a comment that starts with “//”
empty: Match empty content
hexnums: All hex digits
javaStyleComment: Comments in Java syntax
lineEnd: An instance of LineEnd
lineStart: An instance of LineStart
nums: The decimal digits
printables: All the printable non-whitespace characters
punc8bit: Some Unicode punctuation marks
pythonStyleComment: Comments in the style of the Python language
quotedString: Parser for a default quoted string
restOfLine: Match the rest of the current line
sglQuotedString: String enclosed in single quotes
stringEnd: Matches the end of the string
unicodeString: Match a Python-style Unicode string
The purpose of the pyparsing module is to give programmers using the Python programming language a tool for extracting information from structured textual data.
In order to find information within structured text, we must be able to describe that structure. The pyparsing module builds on the fundamental syntax description technology embodied in Backus-Naur Form, or BNF. Some familiarity with the various syntax notations based on BNF will be most helpful to you in using this package.
The pyparsing module matches patterns in the input text using a recursive descent parser: we write BNF-like syntax productions as Python expressions, and pyparsing provides a machine that matches the input text against those productions.
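As a minimal sketch of what "BNF-like syntax productions" look like in practice, the example below translates a tiny grammar into pyparsing objects. The grammar itself, and the names integer, sum_expr, and parse_sum, are hypothetical and chosen only for illustration; the classes and the parseString() method are the ones listed in this reference.

```python
from pyparsing import Literal, ParseException, Word, ZeroOrMore, nums

# Hypothetical grammar, written first as BNF-like productions:
#   integer ::= digit+
#   sum     ::= integer ( "+" integer )*

integer = Word(nums)                              # one or more decimal digits
plus = Literal("+")                               # the literal plus sign
sum_expr = integer + ZeroOrMore(plus + integer)   # integer ( "+" integer )*


def parse_sum(text):
    """Match text against sum_expr and return the matched tokens as a list.

    parseAll=True makes the match fail unless the whole string is consumed.
    """
    return sum_expr.parseString(text, parseAll=True).asList()


if __name__ == "__main__":
    # Whitespace between tokens is skipped by default.
    print(parse_sum("12 + 7 + 30"))   # ['12', '+', '7', '+', '30']

    try:
        parse_sum("12 +")             # dangling operator: not a valid sum
    except ParseException as err:
        print("no match:", err)
```

Each Python assignment corresponds to one production, and the + operator on parser objects expresses "followed by". The tokens come back as strings; converting them to numbers would be the job of a parse action.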
The pyparsing module works best when you can describe the exact syntactic structure of the text you are analyzing. A common application of pyparsing is the analysis of log files, whose entries generally have a predictable structure made up of fields such as dates and IP addresses. Possible applications of the module to natural language work are not addressed here.
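To make the log file case concrete, here is a small sketch that parses one log entry. The line layout (date, IP address, request method, path, separated by spaces) is an assumption made up for this example, not a format defined anywhere in this document, and the names log_entry and parse_log_line are likewise hypothetical.

```python
from pyparsing import Combine, Word, alphas, nums, printables

# Assumed (hypothetical) log line layout:
#   2024-01-15 192.168.0.1 GET /index.html

# Combine fuses adjacent pieces into a single token and rejects
# whitespace between them.
date = Combine(Word(nums, exact=4) + "-" + Word(nums, exact=2) + "-" + Word(nums, exact=2))
ip_address = Combine(Word(nums) + "." + Word(nums) + "." + Word(nums) + "." + Word(nums))
method = Word(alphas)       # e.g. GET or POST
path = Word(printables)     # any run of non-whitespace characters

# Calling an expression with a string attaches a results name to it.
log_entry = date("date") + ip_address("ip") + method("method") + path("path")


def parse_log_line(text):
    """Parse one log line and return its named fields as a dictionary."""
    return log_entry.parseString(text, parseAll=True).asDict()


if __name__ == "__main__":
    fields = parse_log_line("2024-01-15 192.168.0.1 GET /index.html")
    print(fields["date"], fields["ip"], fields["method"], fields["path"])
```

This sketch checks only the shape of each field. It does not verify that the date is a real calendar date or that each part of the address is in the range 0 to 255; such checks would typically be added as parse actions.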
Useful online references include:
Complete online reference documentation for each class, function, and variable.
See the tutorial at
The package author's 2004 tutorial is slightly dated but still useful.
For a small example (about ten syntax productions) of an application that uses this package, see icalparse: A pyparsing parser for
For a modest example, abaraw: A shorthand notation for bird records describes a file format with about thirty syntax productions. The actual implementation appears in a separate document, abaraw internal maintenance specification, which is basically a pyparsing core with a bit of application logic that converts the parsed input to XML to be passed to later processing steps.
Not every feature is covered here; this document is an attempt to cover the features most people will use most of the time. See the reference documentation for all the grisly details.
In particular, the author feels strongly that pyparsing is not the right tool for parsing XML and HTML, so numerous related features are not covered here. For a much better XML/HTML tool, see Python XML processing with lxml.