4. How to structure the returned ParseResults

When your input matches the parser you have built, the .parseString() method returns an instance of class ParseResults.

For a complex structure, this instance may have many different bits of information inside it that represent the important pieces of the input. The exact internal structure of a ParseResults instance depends on how you build up your top-level parser.

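You can access the resulting ParseResults instance in two different ways: by position, as if it were a list, or by name, as if it were a dictionary, for any component to which a results name has been attached.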

Here are some general principles for structuring your parser's ParseResults instance.

4.1. Use pp.Group() to divide and conquer

As with any nontrivial program, structuring a parser of any complexity will be more tractable if you use the “divide and conquer” principle, also known as stepwise refinement.

In practice, this means that the top-level ParseResults should contain no more than, say, five or seven components. If there are too many components at this level, look at the total input and divide it into two or more subparsers. Then structure the top level so that it contains just those pieces. If necessary, divide the smaller parsers into smaller parsers, until each parser is clearly defined in terms of built-in primitive functions or other parsers that you have built.
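As a minimal sketch of this kind of stepwise refinement, the following builds a top-level parser from three small subparsers. The grammar (a name, an equals sign, and a number) and the names identifier, equals, number, and assignment are invented for illustration. The code assumes Python 3 and that pyparsing has been imported as pp.

```python
import pyparsing as pp

# Lowest level: each piece is defined with built-in primitives.
identifier = pp.Word(pp.alphas, pp.alphanums + '_')
equals = pp.Literal('=')
number = pp.Word(pp.nums)

# Top level: defined only in terms of the smaller parsers above.
assignment = identifier + equals + number

result = assignment.parseString('width = 42')
print(result.asList())   # ['width', '=', '42']
```

The top-level result here has only three components, and each one corresponds to a subparser that can be read and tested on its own.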

pp.Group(), described in Section 5.13, “Group: Group repeated items into a list”, is the basic tool for creating these levels of abstraction.

  • Normally, when your parser matches multiple things, the result is a ParseResults instance that acts like a list of the strings that matched. For example, if your parser matches a list of words, it might return a ParseResults that prints as if it were a list. Using the type() function we can see the actual type of the result, and that the components are Python strings.

    >>> word = pp.Word(pp.alphas)
    >>> phrase = pp.OneOrMore(word)
    >>> result = phrase.parseString('farcical aquatic ceremony')
    >>> print result
    ['farcical', 'aquatic', 'ceremony']
    >>> type(result)
    <class 'pyparsing.ParseResults'>
    >>> type(result[0])
    <type 'str'>
  • However, when you apply pp.Group() to some parser, all the pieces it matches are returned as a single element: a nested pp.ParseResults that acts like a list.

For example, suppose your program is disassembling a sequence of words, and you want to treat the first word one way and the rest of the words another way. Here's our first attempt.

>>> ungrouped = word + phrase
>>> result = ungrouped.parseString('imaginary farcical aquatic ceremony')
>>> print result
['imaginary', 'farcical', 'aquatic', 'ceremony']

That result doesn't really match our concept that the parser is a sequence of two things: a single word, followed by a sequence of words.

By applying pp.Group() like this, we get a parser that will return a sequence of two items that match our concept.

>>> grouped = word + pp.Group(phrase)
>>> result = grouped.parseString('imaginary farcical aquatic ceremony')
>>> print result
['imaginary', ['farcical', 'aquatic', 'ceremony']]
>>> print result[1]
['farcical', 'aquatic', 'ceremony']
>>> type(result[1])
<class 'pyparsing.ParseResults'>
>>> result[1][0]
'farcical'
>>> type(result[1][0])
<type 'str'>
  1. The grouped parser has two components: a word and a pp.Group. Hence, the result returned acts like a two-element list.

  2. The first element is an actual string, 'imaginary'.

  3. The second element is another pp.ParseResults instance that acts like a list of strings.
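Because the grouped result acts like a two-element list, it can also be unpacked like one. The sketch below assumes Python 3 and uses the .asList() method, which converts a ParseResults instance, including any nested ones, into plain Python lists. The names first and rest are illustrative only.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
phrase = pp.OneOrMore(word)
grouped = word + pp.Group(phrase)

result = grouped.parseString('imaginary farcical aquatic ceremony')

# Two top-level elements: a string, then a nested ParseResults.
first, rest = result
print(first)             # imaginary
print(rest.asList())     # ['farcical', 'aquatic', 'ceremony']
print(len(result))       # 2
```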

So for larger grammars, the pp.ParseResults instance, which the top-level parser returns when it matches, will typically be a many-layered structure containing this kind of mixture of ordinary strings and other instances of pp.ParseResults.
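To give a rough idea of what such a layered result can look like, here is a small sketch with two levels of grouping. The command syntax and the names pair and command are hypothetical. The example also uses pp.Suppress(), which is not covered in this section, to drop the '=' from the result. Python 3 is assumed.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
number = pp.Word(pp.nums)

# One "name=value" pair, grouped so it stays together in the result.
pair = pp.Group(word + pp.Suppress('=') + number)

# A command is a verb followed by a grouped list of pairs.
command = word + pp.Group(pp.OneOrMore(pair))

result = command.parseString('resize width=80 height=24')
print(result.asList())   # ['resize', [['width', '80'], ['height', '24']]]
print(result[1][1][1])   # 24
```

The top level still has only two components, a string and a group, even though the group itself contains further groups.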

The next section will give you some suggestions on managing the structure of these beasts.