5. Classes

Here are the classes defined in the pyparsing module.

5.1. ParserElement: The basic parser building block

Definition: A parser is an instance of some subclass of pp.ParserElement.

In the process of building a large syntax out of small pieces, you define a parser for each piece and then combine the pieces into larger and larger aggregations until you have a parser that matches the entire input.

To assemble parsers into larger configurations, you will use pyparsing's built-in classes such as pp.And, pp.Or, and pp.OneOrMore. Each of these class constructors returns a parser, and many of them accept one or more parsers as arguments.

For example, if a certain element of the syntax described by some parser p is optional, then pp.Optional(p) returns another parser (that is, another instance of a subclass of pp.ParserElement). That parser matches pattern p if it occurs at that point in the input; if the input does not match p, it matches nothing and the parse continues.
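
Here is a minimal sketch of that idea, assuming Python 3 with pyparsing installed and imported as pp (the session examples in this section use Python 2 syntax). The names minus, digits, and signed_int are made up for illustration.

```python
import pyparsing as pp

# Small pieces: a minus sign and a run of digits.
minus = pp.Literal("-")
digits = pp.Word(pp.nums)

# pp.Optional(minus) is itself a parser, so it combines like any other piece.
signed_int = pp.Optional(minus) + digits

print(signed_int.parseString("-42").asList())  # ['-', '42']
print(signed_int.parseString("42").asList())   # ['42']
```

When the sign is absent, the optional piece simply contributes nothing to the result.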

Here are the methods available on a parser p, that is, on any instance of a subclass of pp.ParserElement.

p.addParseAction(f1, f2, ...)

Attaches one or more additional parse actions to p, after any actions it already has, and returns p. See the p.setParseAction() method below for a discussion of parse actions.
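
A short sketch, assuming Python 3 and pyparsing imported as pp; to_int and double are hypothetical helper names. It shows an action added with addParseAction running after the action already attached, and receiving that action's output.

```python
import pyparsing as pp

def to_int(toks):
    # First action: turn the matched digit string into an int.
    return int(toks[0])

def double(toks):
    # Second action: sees the tokens already converted by to_int.
    return toks[0] * 2

# setParseAction installs to_int; addParseAction appends double after it.
number = pp.Word(pp.nums).setParseAction(to_int).addParseAction(double)

print(number.parseString("21")[0])  # 42
```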

p.copy()

Returns a copy of p.
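
One common reason to copy a parser is to give the copy different behavior while leaving the original alone. A minimal sketch, assuming Python 3 and pyparsing imported as pp; digits and as_int are illustrative names:

```python
import pyparsing as pp

digits = pp.Word(pp.nums)

# Attach a parse action to the copy only; digits itself has no actions.
as_int = digits.copy().setParseAction(lambda toks: int(toks[0]))

print(repr(digits.parseString("7")[0]))  # '7'
print(repr(as_int.parseString("7")[0]))  # 7
```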

p.ignore(q)

This method modifies p so that it ignores any number of occurrences of text that matches pattern q, and returns p. This is a useful way to instruct your parser to ignore comments.

>>> number = pp.Word(pp.nums)
>>> name = pp.Word(pp.alphas).ignore(number)
>>> print name.parseString('23 84 98305478 McTeagle')
['McTeagle']
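
For the comment-skipping use just mentioned, here is a minimal sketch assuming Python 3 and pyparsing imported as pp. It relies on pp.pythonStyleComment, pyparsing's predefined parser for a '#' comment running to the end of the line; the names word, words, and text are illustrative.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

# Anything from '#' to the end of the line is skipped between words.
words = pp.OneOrMore(word).ignore(pp.pythonStyleComment)

text = "spam  # a comment\n eggs"
print(words.parseString(text).asList())  # ['spam', 'eggs']
```
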

p.leaveWhitespace()

This method instructs p not to skip whitespace before matching the input text. The method returns p.

When used on a parser that includes multiple pieces, this method suppresses whitespace skipping for all the included pieces. Here is an example:

>>> word = pp.Word(pp.alphas)
>>> num = pp.Word(pp.nums)
>>> wn = (word + num).leaveWhitespace()
>>> nwn = num + wn
>>> print nwn.parseString('23xy47')
['23', 'xy', '47']
>>> print nwn.parseString('23 xy47')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/pyparsing.py", line 1032, in parseString
    raise exc
pyparsing.ParseException: Expected W:(abcd...) (at char 2), (line:1, col:3)

Note

To save space, in subsequent examples we will omit all the “Traceback” lines except the last.

>>> print nwn.parseString('23xy 47')
pyparsing.ParseException: Expected W:(0123...) (at char 4), (line:1, col:5)

Note that even though num skips whitespace by default, whitespace is still disallowed before the '47' in '23xy 47': that num is one of the pieces of wn, and the leaveWhitespace() call on wn disabled automatic whitespace skipping for all of its pieces.
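
A typical use of leaveWhitespace() is requiring two tokens to be adjacent. Here is a minimal sketch, assuming Python 3 and pyparsing imported as pp, of a hypothetical price parser in which the digits must immediately follow the dollar sign:

```python
import pyparsing as pp

dollar = pp.Literal("$")
# leaveWhitespace() returns the Word itself, set not to skip leading whitespace.
amount = pp.Word(pp.nums).leaveWhitespace()
price = dollar + amount

print(price.parseString("$42").asList())  # ['$', '42']

try:
    price.parseString("$ 42")  # the blank after '$' is not skipped
    rejected = False
except pp.ParseException as err:
    rejected = True
    print("rejected:", err)
```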

p.parseFile(f, parseAll=False)

Try to match the contents of a file against parser p. The argument f may be either the name of a file or a file-like object.

If p matches only a leading part of the file's contents, the rest is ignored and that is not considered an error. To require that p match the entire contents, pass the argument parseAll=True.
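
Here is a minimal sketch, assuming Python 3 and pyparsing imported as pp, that uses io.StringIO as a stand-in for an open file, since parseFile() accepts a file-like object as well as a file name. The names numbers, whole, and prefix are illustrative.

```python
import io
import pyparsing as pp

numbers = pp.OneOrMore(pp.Word(pp.nums))

# Any object with a read() method works; io.StringIO stands in for a file.
whole = numbers.parseFile(io.StringIO("3 14 159\n"), parseAll=True)
print(whole.asList())  # ['3', '14', '159']

# Without parseAll, trailing text the parser cannot match is ignored.
prefix = numbers.parseFile(io.StringIO("3 14 pi\n"))
print(prefix.asList())  # ['3', '14']
```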

p.parseString(s, parseAll=False)

Try to match string s against parser p. If there is a match, it returns a pp.ParseResults instance; see Section 5.26, “ParseResults: Result returned from a match”. If there is no match, it raises a pp.ParseException.

By default, it is not considered an error if p matches only a leading part of s. If you want to ensure that all of s matched p, pass the keyword argument parseAll=True.
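
The difference parseAll makes, and the position information carried by the exception, can be seen in this minimal sketch (Python 3, pyparsing imported as pp). It uses the loc, lineno, and col attributes of pp.ParseException; word and error_loc are illustrative names.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

# Without parseAll, matching a leading part is enough; "42" is ignored.
print(word.parseString("spam 42").asList())  # ['spam']

error_loc = None
try:
    word.parseString("spam 42", parseAll=True)
except pp.ParseException as err:
    # loc is a 0-based offset; lineno and col are 1-based.
    error_loc = err.loc
    print("stopped at offset", err.loc, "line", err.lineno, "col", err.col)
```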

p.scanString(s)

Search through string s to find regions that match p. This method is a generator that yields a sequence of tuples (r, start, end), where r is a pp.ParseResults instance that represents the matched part, and start and end are the beginning and ending offsets within s that bracket the position of the matched text.

>>> name = pp.Word(pp.alphas)
>>> text = "**** Farcical aquatic  ceremony"
>>> for result, start, end in name.scanString(text):
...     print "Found {0} at [{1}:{2}]".format(result, start, end)
... 
Found ['Farcical'] at [5:13]
Found ['aquatic'] at [14:21]
Found ['ceremony'] at [23:31]
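
As a further sketch (Python 3, pyparsing imported as pp; integer, text, and found are illustrative names), scanString() can pull every integer out of free text along with its position:

```python
import pyparsing as pp

integer = pp.Word(pp.nums)
text = "Order 66 shipped 3 crates on day 144"

found = []
for toks, start, end in integer.scanString(text):
    # start and end bracket the match, so slicing text recovers it.
    assert text[start:end] == toks[0]
    found.append((toks[0], start, end))

print(found)  # [('66', 6, 8), ('3', 17, 18), ('144', 33, 36)]
```
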

p.setBreak()

When this parser is about to be used, call up the Python debugger pdb.

p.setFailAction(f)

This method modifies p so that it will call function f if it fails to parse. The method returns p.

Here is the calling sequence for a fail action:

f(s, loc, expr, err)
s

The input string.

loc

The location in the input where the parse failed, as an offset counting from 0.

expr

The parser that failed. When printed, it shows the parser's name if one was attached with p.setName(), as in the example below.

err

The exception instance that the parser raised.

Here is an example.

>>> def oops(s, loc, expr, err):
...     print ("s={0!r} loc={1!r} expr={2!r}\nerr={3!r}".format(
...            s, loc, expr, err))
... 
>>> fail = pp.NoMatch().setName('fail-parser').setFailAction(oops)
>>> r = fail.parseString("None shall pass!")
s='None shall pass!' loc=0 expr=fail-parser
err=Expected fail-parser (at char 0), (line:1, col:1)
pyparsing.ParseException: Expected fail-parser (at char 0), (line:1, col:1)
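
A fail action can also be used simply to log where parsing went wrong. Here is a minimal sketch, assuming Python 3 and pyparsing imported as pp; record_failure and failures are hypothetical names. The fail action returns normally, so the exception still reaches the caller.

```python
import pyparsing as pp

failures = []

def record_failure(s, loc, expr, err):
    # Keep the offset and the (named) parser that failed.
    failures.append((loc, str(expr)))

number = pp.Word(pp.nums).setName("number").setFailAction(record_failure)

try:
    number.parseString("abc")
except pp.ParseException:
    pass  # the exception is still raised after record_failure runs

print(failures)  # [(0, 'number')]
```
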

p.setName(name)

Attaches a name to this parser for debugging purposes. The argument is a string. The method returns p.

>>> print pp.Word(pp.nums)
W:(0123...)
>>> count = pp.Word(pp.nums).setName('count-parser')
>>> print count
count-parser
>>> count.parseString('FAIL')
pyparsing.ParseException: Expected count-parser (at char 0), (line:1, col:1)

In the above example, if you convert a parser to a string, you get a generic description of it: the string “W:(0123...)” tells you it is a Word parser and shows you the first few characters in the set. Once you have attached a name to it, the string form of the parser is that name. Note that when the parse fails, the error message identifies what it expected by naming the failed parser.
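
Names pay off most when a parser is built from several similar pieces. In this minimal sketch (Python 3, pyparsing imported as pp; month, day, date, and message are illustrative names), two otherwise identical Word parsers get different names, so the error message says which one failed. The exact wording of the message varies between pyparsing versions, but it begins with “Expected day”.

```python
import pyparsing as pp

month = pp.Word(pp.nums).setName("month")
day = pp.Word(pp.nums).setName("day")
date = month + "/" + day

message = None
try:
    date.parseString("12/x")
except pp.ParseException as err:
    message = str(err)

print(message)  # names the 'day' parser as the one that failed
```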

p.setParseAction(f1, f2, ...)

This method attaches one or more parse actions to p, replacing any parse actions it already had, and returns p. When p matches the input, it calls each function fi in the order specified.

The calling sequence for a parse action can be any of these four prototypes:

f()
f(toks)
f(loc, toks)
f(s, loc, toks)

These are the arguments your function will receive, depending on how many arguments it accepts:

s

The string being parsed. If your string contains tab characters, see the reference documentation for notes about tab expansion and its effect on column positions.

loc

The location of the matching substring as an offset (index, counting from 0).

toks

A pp.ParseResults instance containing the results of the match.

A parse action can replace the result (the toks argument) by returning a new value, such as a modified list. If it returns None, the result is not changed. Here is an example parser with two parse actions.

>>> name = pp.Word(pp.alphas)
>>> def a1():
...     print "In a1"
... 
>>> def a2(s, loc, toks):
...     print "In a2: s={0!r} loc={1!r} toks={2!r}".format(
...         s, loc, toks)
...     return ['CENSORED']
... 
>>> newName = name.setParseAction(a1, a2)
>>> r = newName.parseString('Gambolputty')
In a1
In a2: s='Gambolputty' loc=0 toks=(['Gambolputty'], {})
>>> print r
['CENSORED']
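
A common use of parse actions is converting matched text to Python values. In this minimal sketch (Python 3, pyparsing imported as pp; integer, total, and result are illustrative names), the action on the inner parser runs first, so the action on the enclosing parser receives integers rather than strings:

```python
import pyparsing as pp

# Each integer is converted as soon as it is matched...
integer = pp.Word(pp.nums).setParseAction(lambda toks: int(toks[0]))

# ...so the action on the enclosing parser receives ints, not strings.
total = pp.OneOrMore(integer).setParseAction(lambda toks: sum(toks))

result = total.parseString("1 2 3 4")
print(result[0])  # 10
```
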

p.setResultsName(name)

For parsers that deposit the matched text into the ParseResults instance returned by .parseString(), you can use this method to attach a name to that matched text. Once you do this, you can retrieve the matched text from the ParseResults instance by using that instance as if it were a Python dict.

>>> count = pp.Word(pp.nums)
>>> beanCounter = count.setResultsName('beanCount')
>>> r = beanCounter.parseString('7388')
>>> r.keys()
['beanCount']
>>> r['beanCount']
'7388'

The result of this method is a copy of p. Hence, if you have defined a useful parser, you can create several instances, each with a different results name. Continuing the above example, if we then use the count parser, we find that it does not have the results name that is attached to its copy beanCounter.

>>> r2 = count.parseString('8873')
>>> r2.keys()
[]
>>> print r2
['8873']
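
Because each call returns a separately named copy, one base parser can supply several named fields. A minimal sketch, assuming Python 3 and pyparsing imported as pp; the field names are illustrative. It also uses attribute-style access (r.month), which ParseResults supports as an alternative to r['month'].

```python
import pyparsing as pp

integer = pp.Word(pp.nums)

# Each setResultsName call returns a separately named copy of integer.
date = (integer.setResultsName("year") + "-" +
        integer.setResultsName("month") + "-" +
        integer.setResultsName("day"))

r = date.parseString("1999-12-31")
print(r["year"], r["month"], r["day"])  # 1999 12 31
print(r.month)                          # 12
```
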

p.setWhitespaceChars(s)

For parser p, change its definition of whitespace to the characters in string s. The method returns p.
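
For example, a line-oriented parser might treat only blanks and tabs as whitespace, so that a newline ends the match. A minimal sketch, assuming Python 3 and pyparsing imported as pp; text, anywhere, word, and one_line are illustrative names:

```python
import pyparsing as pp

text = "alpha beta\ngamma"

# Default whitespace includes newlines, so all three words match.
anywhere = pp.OneOrMore(pp.Word(pp.alphas))
print(anywhere.parseString(text).asList())  # ['alpha', 'beta', 'gamma']

# Treat only blanks and tabs as whitespace: the newline now stops the match.
word = pp.Word(pp.alphas)
word.setWhitespaceChars(" \t")
one_line = pp.OneOrMore(word)
print(one_line.parseString(text).asList())  # ['alpha', 'beta']
```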

p.suppress()

This method returns a new parser that matches the same text as p but does not add the matched text to the ParseResults. This is useful for omitting punctuation. See also Section 5.32, “Suppress: Omit matched text from the result”.

>>> name = pp.Word(pp.alphas)
>>> lb = pp.Literal('[')
>>> rb = pp.Literal(']')
>>> pat1 = lb + name + rb
>>> print pat1.parseString('[hosepipe]')
['[', 'hosepipe', ']']
>>> pat2 = lb.suppress() + name + rb.suppress()
>>> print pat2.parseString('[hosepipe]')
['hosepipe']
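
The same effect can be had with the pp.Suppress class directly. This minimal sketch (Python 3, pyparsing imported as pp; the key=value grammar and its names are hypothetical) builds the parser both ways:

```python
import pyparsing as pp

key = pp.Word(pp.alphas)
value = pp.Word(pp.nums)

# Two equivalent ways to drop the '=' from the results.
assign1 = key + pp.Literal("=").suppress() + value
assign2 = key + pp.Suppress("=") + value

print(assign1.parseString("width = 80").asList())  # ['width', '80']
print(assign2.parseString("width = 80").asList())  # ['width', '80']
```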

Additionally, these ordinary Python operators are overloaded to work with ParserElement instances.

p1+p2

Equivalent to “pp.And([p1, p2])”: matches p1 followed by p2.

p * n

For a parser p and an integer n, the result is a parser that matches n repetitions of p. You can give the operands in either order: for example, “3 * p” is the same as “p * 3”.

>>> threeWords = pp.Word(pp.alphas) * 3
>>> text = "Lady of the Lake"
>>> print threeWords.parseString(text)
['Lady', 'of', 'the']
>>> print threeWords.parseString(text, parseAll=True)
pyparsing.ParseException: Expected end of text (at char 12), (line:1, col:13)

p1 | p2

Equivalent to “pp.MatchFirst([p1, p2])”: tries p1, then p2, and uses the first one that matches.

p1 ^ p2

Equivalent to “pp.Or([p1, p2])”: tries both and uses the longest match.

p1 & p2

Equivalent to “pp.Each([p1, p2])”: matches both p1 and p2, in either order.

~ p

Equivalent to “pp.NotAny(p)”.
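
The operators differ mainly in how they choose among alternatives. This minimal sketch (Python 3, pyparsing imported as pp; all parser names are illustrative) contrasts | and ^ on input where one alternative is a prefix of the other, and also shows &, ~, and multiplication with the integer on the left:

```python
import pyparsing as pp

lt = pp.Literal("<")
le = pp.Literal("<=")

first = lt | le    # MatchFirst: the first alternative that matches wins
longest = lt ^ le  # Or: all alternatives are tried; the longest match wins
print(first.parseString("<=").asList())    # ['<']
print(longest.parseString("<=").asList())  # ['<=']

# Each: both pieces must appear, in either order.
color = pp.Keyword("red") | pp.Keyword("blue")
size = pp.Word(pp.nums)
spec = color & size
print(spec.parseString("red 42").asList())  # ['red', '42']
print(spec.parseString("42 red").asList())  # ['42', 'red']

# NotAny: succeeds, consuming nothing, only where the keyword does not match.
identifier = ~pp.Keyword("end") + pp.Word(pp.alphas)
print(identifier.parseString("endless").asList())  # ['endless']

# 3 * p is the same as p * 3.
word = pp.Word(pp.alphas)
print((3 * word).parseString("a b c d").asList())  # ['a', 'b', 'c']
```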

Class pp.ParserElement also supports one static method:

pp.ParserElement.setDefaultWhitespaceChars(s)

This static method changes the definition of whitespace to be the characters in string s. Calling this method has this effect on all subsequent instantiations of any pp.ParserElement subclass.

>>> blanks = ' \t-=*#^'
>>> pp.ParserElement.setDefaultWhitespaceChars(blanks)
>>> text = ' \t-=*#^silly ##*=---\t  walks--'
>>> nameList = pp.OneOrMore(pp.Word(pp.alphas))
>>> print nameList.parseString(text)
['silly', 'walks']
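
Because this setting is global, it affects every parser created afterward, including parsers in unrelated code. A cautious pattern, sketched below assuming Python 3 with pyparsing imported as pp, is to save the current default from pp.ParserElement.DEFAULT_WHITE_CHARS, build the parsers that need the new setting, and then restore it. Parsers built in between keep the whitespace characters that were in effect when they were created. The names saved, line_words, and any_words are illustrative.

```python
import pyparsing as pp

saved = pp.ParserElement.DEFAULT_WHITE_CHARS
try:
    # Only blanks and tabs count as whitespace for parsers built here.
    pp.ParserElement.setDefaultWhitespaceChars(" \t")
    line_words = pp.OneOrMore(pp.Word(pp.alphas))
finally:
    # Restore the global default so later parsers are unaffected.
    pp.ParserElement.setDefaultWhitespaceChars(saved)

any_words = pp.OneOrMore(pp.Word(pp.alphas))  # built with the default again

text = "one two\nthree"
print(line_words.parseString(text).asList())  # ['one', 'two']
print(any_words.parseString(text).asList())   # ['one', 'two', 'three']
```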