Next / Previous / Contents / Shipman's homepage

2. Structuring your application

Here is a brief summary of the general procedure for writing a program that uses the pyparsing module.

  1. Write the BNF that describes the structure of the text that you are analyzing.

  2. Install the pyparsing module if necessary. Most recent Python install packages will include it. If you don't have it, you can download it from the pyparsing homepage.

  3. In your Python script, import the pyparsing module. We recommend that you import it like this:

    import pyparsing as pp

    The examples in this document will refer to the module throughout as pp.

  4. Your script will assemble a parser that matches your BNF. A parser is an instance of the abstract base class pp.ParserElement that describes a general pattern.

    Building a parser for your input file format is a bottom-up process. You start by writing parsers for the smallest pieces, and assemble them into larger and larger pieces until you have a parser for the entire file.

  5. Build a Python string (type str or unicode) containing the input text to be processed.

  6. If the parser is p and the input text is s, this code will try to match them:


    If the syntax of s matches the syntax descriped by p, this expression will return an object that represents the parts that matched. This object will be an instance of class pp.ParseResults.

    If the input does not match your parser, it will raise an exception of class pp.ParseException. This exception will include information about where in the input the parse faiiled.

    The .parseString() method proceeds in sequence through your input text, using the pieces of your parser to match chunks of text. The lowest-level parsers are sometimes called tokens, and parsers at higher levels are called patterns.

    You may attach parse actions to any component parser. For example, the parser for an integer might have an attached parse action that converts the string representation into a Python int.


    White space (non-printing) characters such as space and tab are normally skipped between tokens, although this behavior can be changed. This greatly simplifies many applications, because you don't have to clutter the syntax with a lot of patterns that specify where white space can be skipped.

  7. Extract your application's information from the returned ParseResults instance. The exact structure of this instance depends on how you built the parser.

To see how this all fits together: