
3. A small, complete example

To give you the general idea, here is a small, complete, runnable example that uses pyparsing.

A Python identifier name consists of one or more characters, in which the first character is a letter or the underbar (“_”) character, and any additional characters are letters, underbars, or digits. In extended BNF we can write it this way:

first       ::=  letter | "_"
letter      ::=  "a" | "b" | ... | "z" | "A" | "B" | ... | "Z"
digit       ::=  "0" | "1" | ... | "9"
rest        ::=  first | digit
identifier  ::=  first rest*

That last production can be read as: “an identifier consists of one first followed by zero or more rest”.
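For comparison, the same grammar can be checked by hand in plain Python, without pyparsing. This is only a sketch to make the productions concrete: the names FIRST, REST, and is_identifier are invented here, and unlike the script below it tests the whole string rather than just its beginning.

```python
import string

# first ::= letter | "_"
FIRST = set(string.ascii_letters + "_")
# rest ::= first | digit
REST = FIRST | set(string.digits)

def is_identifier(text):
    """Return True if text matches: identifier ::= first rest*"""
    # There must be one leading character, and it must come from FIRST.
    if not text or text[0] not in FIRST:
        return False
    # Zero or more following characters, each of which must come from REST.
    return all(ch in REST for ch in text[1:])

for sample in ["a", "foo", "_", "Z04", "", "1", "a_#"]:
    print(repr(sample), is_identifier(sample))
```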

Here is a script that implements that syntax and then tests it against a number of strings.

trivex
#!/usr/bin/env python
#================================================================
# trivex: Trivial example
#----------------------------------------------------------------

# - - - - -   I m p o r t s

import sys

The next line imports the pyparsing module and renames it as pp.

trivex
import pyparsing as pp

# - - - - -   M a n i f e s t   c o n s t a n t s

In the next line, the pp.alphas variable is a string containing all lowercase and uppercase letters. The pp.Word() class produces a parser that matches a string made up of the characters in its first argument; the exact=1 keyword argument tells that parser to accept exactly one character from that string. So first is a parser (that is, a pp.ParserElement instance) that matches exactly one letter or an underbar.

trivex
first = pp.Word(pp.alphas+"_", exact=1)

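The pp.alphanums variable is a string containing all the letters and all the digits. So the rest parser matches a run of one or more letters, digits, or underbar characters. Note that this differs slightly from the BNF above, where rest stands for a single character: here one match of rest covers an entire nonempty rest* sequence.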

trivex
rest = pp.Word(pp.alphanums+"_")
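To see what these two parsers accept on their own, the following sketch tries each one against a few inputs. It is separate from the trivex script and assumes pyparsing is installed; the helper name tokens and the sample strings are made up for illustration.

```python
import string
import pyparsing as pp

first = pp.Word(pp.alphas + "_", exact=1)
rest = pp.Word(pp.alphanums + "_")

def tokens(parser, text):
    """Hypothetical helper: return the matched tokens as a plain list,
    or None if the parser does not match at the start of text."""
    try:
        return list(parser.parseString(text))
    except pp.ParseException:
        return None

# pp.alphas holds the ASCII letters; pp.alphanums adds the digits.
print(set(pp.alphas) == set(string.ascii_letters))
print(set(pp.alphanums) == set(string.ascii_letters + string.digits))

print(tokens(first, "foo"))   # only the leading character is consumed
print(tokens(first, "9x"))    # a digit cannot start an identifier
print(tokens(rest, "oo_1"))   # a whole run of letters, digits, underbars
print(tokens(rest, "#"))      # no acceptable character at all
```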

The Python “+” operator is overloaded for instances of the pp.ParserElement class to mean sequence: that is, the identifier parser matches what the first parser matches, followed optionally by what the rest parser matches.
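As a quick check of how the overloaded “+” behaves, here is a sketch using two throwaway parsers. The names letter, digit, and pair are hypothetical and are not part of the trivex script; the point is only that “+” builds a pp.And instance that matches its operands in sequence.

```python
import pyparsing as pp

letter = pp.Word(pp.alphas, exact=1)   # exactly one letter
digit = pp.Word(pp.nums, exact=1)      # exactly one digit

# "+" does not add anything; it builds a sequence parser.
pair = letter + digit
print(type(pair).__name__)
print(list(pair.parseString("x7")))

# The order matters: a digit followed by a letter is rejected.
try:
    pair.parseString("7x")
    order_ok = True
except pp.ParseException:
    order_ok = False
print(order_ok)
```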

trivex
identifier = first+pp.Optional(rest)

testList = [ # List of test strings    
    # Valid identifiers
    "a", "foo", "_", "Z04", "_bride_of_mothra",
    # Not valid
    "", "1", "$*", "a_#" ]

# - - - - -   m a i n

def main():
    """Run each test string through the identifier parser.
    """
    for text in testList:
        test(text)

# - - -   t e s t

def test(s):
    '''See if s matches identifier.
    '''
    print("---Test for '{0}'".format(s))

When you call the .parseString() method on an instance of the pp.ParserElement class, it either returns the matched elements as a pp.ParseResults instance, which behaves much like a list, or raises a pp.ParseException.

trivex
    try:
        result = identifier.parseString(s)
        print("  Matches: {0}".format(result))
    except pp.ParseException as x:
        print("  No match: {0}".format(str(x)))

# - - - - -   E p i l o g u e

if __name__ == "__main__":
    main()

Here is the output from this script:

---Test for 'a'
  Matches: ['a']
---Test for 'foo'
  Matches: ['f', 'oo']
---Test for '_'
  Matches: ['_']
---Test for 'Z04'
  Matches: ['Z', '04']
---Test for '_bride_of_mothra'
  Matches: ['_', 'bride_of_mothra']
---Test for ''
  No match: Expected W:(abcd...) (at char 0), (line:1, col:1)
---Test for '1'
  No match: Expected W:(abcd...) (at char 0), (line:1, col:1)
---Test for '$*'
  No match: Expected W:(abcd...) (at char 0), (line:1, col:1)
---Test for 'a_#'
  Matches: ['a', '_']

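The return value is an instance of the pp.ParseResults class; when printed, it appears as a list of the matched strings. You will note that for single-letter test strings, the resulting list has only a single element, while for multi-letter strings, the list has two elements: the first character (the part that matched first) followed by the remaining characters that matched the rest parser. Note also that 'a_#' is reported as a match even though it is listed among the invalid strings: by default, .parseString() succeeds as soon as the parser matches the beginning of the string, and it ignores any unmatched text that follows.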
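One way to reject strings such as 'a_#', whose leading part matches but which have leftover text, is to pass parseAll=True to .parseString(). The sketch below assumes a pyparsing version that accepts that keyword; the helper name matches_whole is invented for this illustration and is not part of the trivex script.

```python
import pyparsing as pp

first = pp.Word(pp.alphas + "_", exact=1)
rest = pp.Word(pp.alphanums + "_")
identifier = first + pp.Optional(rest)

def matches_whole(text):
    """Hypothetical helper: True only if the entire string is an identifier."""
    try:
        identifier.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False

# Default behavior: the leading "a_" matches and the "#" is ignored.
print(list(identifier.parseString("a_#")))

# With parseAll=True, the leftover "#" makes the parse fail.
for text in ["a", "_bride_of_mothra", "a_#"]:
    print(repr(text), matches_whole(text))
```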

If we want the resulting list to have only one element, we can change one line to get this effect:

identifier = pp.Combine(first+pp.Optional(rest))

The pp.Combine() class tells pyparsing to combine all the matching pieces in its argument list into a single result. Here is an example of two output lines from the revised script:

---Test for '_bride_of_mothra'
  Matches: ['_bride_of_mothra']
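To compare the two forms side by side, here is a small sketch that parses the same strings with and without pp.Combine(). The names split_id and joined_id are hypothetical, chosen only to tell the two parsers apart.

```python
import pyparsing as pp

first = pp.Word(pp.alphas + "_", exact=1)
rest = pp.Word(pp.alphanums + "_")

split_id = first + pp.Optional(rest)               # pieces stay separate
joined_id = pp.Combine(first + pp.Optional(rest))  # pieces are joined

for text in ["a", "Z04", "_bride_of_mothra"]:
    # Print the uncombined tokens next to the combined result.
    print(list(split_id.parseString(text)), list(joined_id.parseString(text)))
```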