
4. The Scan class: Managing progress through a stream

The Scan class was invented to manage the scanning of a file, especially when error messages may be written that are related to locations in the file. It is intended for writing small compilers and other applications that syntax-check files. All error and other message logging is handled using the Log singleton described in Section 3, “The singleton Log object”.

This method of stream processing assumes:

These constraints guarantee that minimal amounts of storage will be taken up even while reading quite large input files, so long as individual lines are not too large.

Each Scan instance keeps track of the current position in the current line. A number of different methods allow you to move the scan position forward (and backward, within the line, if you like), and to advance to the next line (if there is one).

The Scan class also supports an optional line-comment character. For example, you might ask for your Scan instance to ignore a percent sign (%) and everything after it on that line.

Often it is useful to identify the site of an error with more information than just the file name and line number. The Scan object also allows you to provide a callback procedure that can furnish a string that identifies where you are in the file. For example, suppose you are writing a compiler for a language that divides code into modules. The callback procedure might identify which module you are currently in.

This Python class is based on an earlier version implemented in Icon, which has a large and well-designed set of scanning primitives. Ralph Griswold's book, The Icon Programming Language, explains these scanning primitives, but you won't need to know Icon to use the Python Scan object.

Here is the interface to the Scan class.

Scan(inFile, commentPrefix=None, callback=None)

The required inFile argument specifies the stream to be read. The constructor attempts to read the first line; if the stream is empty, it will set the .atEndFile attribute.

  • If inFile is a string, Scan will attempt to open a file by that name. If the open fails, the constructor will raise an IOError exception.

  • inFile may also be a file or file-like object that provides .readline() and .close() methods.
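The two accepted forms of inFile, and the constructor's attempt to read the first line, can be sketched as a small helper. This is a minimal sketch, not the module's actual code. The name open_stream is hypothetical, and the sketch assumes the first line is fetched with .readline() and its line terminator stripped.

```python
import io

def open_stream(inFile):
    """Hypothetical helper: accept a file name or a file-like object.

    Returns (stream, firstLine, atEndFile). A str argument is treated as a
    file name; open() raises OSError (IOError in older Pythons) on failure.
    """
    stream = open(inFile) if isinstance(inFile, str) else inFile
    raw = stream.readline()
    if raw == "":
        # readline() returns "" only at end of stream: the stream was empty.
        return stream, None, True
    return stream, raw.rstrip("\r\n"), False

stream, first, atEnd = open_stream(io.StringIO("alpha\nbeta\n"))
print(first, atEnd)  # alpha False
```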

If a commentPrefix string is specified, all input lines that start with that string will be treated as comments. The default behavior is not to consider anything a comment.

If you want to provide additional information in some error messages, pass a function reference as the callback argument. Whenever an error message is issued through the Scan instance, this procedure will be called, with the Scan instance as its sole argument, and the result of this call (which should be a string) will be appended to the error message. This is provided so that the application can identify the point of the error relative to the file's logical structure, not just the current line's contents and its physical position in the file.
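One way to supply such a callback is a bound method of an object that tracks the application's own position in the input. In this sketch, ModuleTracker, its .module attribute, and the file name prog.src are hypothetical application code. A types.SimpleNamespace stands in for a Scan instance, because this callback only reads the .lineNo attribute.

```python
from types import SimpleNamespace

class ModuleTracker:
    """Hypothetical application object that knows the current module."""

    def __init__(self):
        self.module = None   # the parser would set this on entering a module

    def where(self, scan):
        # Called with the Scan instance as its only argument; the returned
        # string is what gets appended to the error message.
        name = self.module or "<top level>"
        return "in module %s (line %d)" % (name, scan.lineNo)

tracker = ModuleTracker()
tracker.module = "io_utils"
print(tracker.where(SimpleNamespace(lineNo=12)))  # in module io_utils (line 12)
# With the real class, this would be passed as: Scan("prog.src", callback=tracker.where)
```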

.atEndFile

This read-only attribute is a bool. It is initially False (unless the stream is empty) and is set to True when the end of the stream has been reached.

.line

The contents of the current line, as a string, without any trailing line termination. It may be empty. If there is a commentPrefix in force, the comment part of the line is removed. Read-only.

.rawLine

Same as the .line attribute except that the comment is not removed if there is one. Read-only.
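The relationship between .rawLine and .line can be sketched as follows. This assumes, as the description of .line above suggests, that the comment part runs from the first occurrence of the comment prefix to the end of the line; the function name strip_comment is hypothetical.

```python
def strip_comment(rawLine, commentPrefix=None):
    """Hypothetical: return rawLine with any comment part removed."""
    if not commentPrefix:
        return rawLine                  # no comment prefix in force
    cut = rawLine.find(commentPrefix)
    return rawLine if cut < 0 else rawLine[:cut]

print(repr(strip_comment("x = 1 % set x", "%")))  # 'x = 1 '
```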

.lineNo

The current line number, counting from 1. Read-only.

.pos

Index of the current scan position within the current line, counting from 0. It ranges from 0 through len(line); a value of len(line) means the position is at the end of the line. Read-only.

.close()

Call this method when stream scanning is complete.

.atEndLine()

A predicate that returns True if the current scan position is at the end of the current line, False otherwise.

.nextLine()

No matter where the scan position is on the current line, try to move to the start of the next line. If there are no lines remaining, leave the position at the end of the file. You can test whether you are at the end of the file by checking the .atEndFile attribute described above. Returns True if the current line was not the last, False otherwise.
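The usual way to walk a whole stream combines .nextLine() with the .atEndFile attribute. Because the real class is not reproduced here, this sketch uses a hypothetical stand-in, LineCursor, that models only those two members plus .line and .lineNo; it assumes lines are read with .readline() and their terminators stripped.

```python
import io

class LineCursor:
    """Hypothetical stand-in modelling .nextLine() and .atEndFile."""

    def __init__(self, stream):
        self.stream = stream
        self.line, self.lineNo, self.pos = "", 0, 0
        self.atEndFile = False
        self.nextLine()                 # like Scan, read the first line up front

    def nextLine(self):
        raw = self.stream.readline()
        if raw == "":
            # No lines remain: stay at the end of the file.
            self.atEndFile = True
            self.line, self.pos = "", 0
            return False
        self.line, self.pos = raw.rstrip("\r\n"), 0
        self.lineNo += 1
        return True

c = LineCursor(io.StringIO("one\ntwo\n"))
lines = []
while not c.atEndFile:
    lines.append((c.lineNo, c.line))
    c.nextLine()
print(lines)  # [(1, 'one'), (2, 'two')]
```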

.error(*L)

Writes an error message.

  • If there have been no previous messages issued for the current line, the line's contents are sent to Log(), followed by the result returned by the callback procedure (if there is one).

  • A caret (^) is displayed under the current scan position.

  • The arguments are the error message, as one or more strings that are concatenated to form the message. The complete message is then sent to Log().error().
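The layout of a report can be sketched as a pure function that returns the output lines instead of sending them to Log(). The name format_error is hypothetical, its context parameter stands in for the callback's result, and the sketch does not implement the rule that the line's contents are shown only for the first message on that line.

```python
def format_error(line, pos, *L, context=None):
    """Hypothetical: build the lines of an error report.

    line    -- contents of the current line
    pos     -- current scan position (0-based)
    L       -- message fragments, concatenated as .error() does
    context -- optional string, such as a callback might return
    """
    out = [line]
    if context:
        out.append(context)
    out.append(" " * pos + "^")         # caret under the scan position
    out.append("".join(L))
    return out

for text in format_error("let x = = 3", 8, "Unexpected ", "'='"):
    print(text)
```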

.syntax(*L)

Works the same as the .error() method, but after transmitting the error message, it raises a SyntaxError exception.

.warning(*L)

Works the same as .error(), but the message is sent to the Log().warning() method.

.msgKind(kind, *L)

The message is built up as in the .error() method, but the result is sent to Log().msgKind(kind).

.move(n)

If at least n characters remain on the current line, advances the position that far and returns the characters between the current position and the new position. If the current line has fewer than n characters remaining, raises IndexError.
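A sketch of the .move() rule as a plain function. Since it has no object to update, the hypothetical move() returns the new position alongside the skipped text; rejecting a negative n is an added assumption.

```python
def move(line, pos, n):
    """Hypothetical: return (skipped text, new position) or raise IndexError."""
    # This sketch also rejects negative n; that is an assumption.
    if n < 0 or pos + n > len(line):
        raise IndexError("fewer than %d characters remain on the line" % n)
    return line[pos:pos + n], pos + n

print(move("hello world", 0, 5))  # ('hello', 5)
```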

.tab(p)

Try to move the position within the current line to p. If successful, it returns the string between the current position and p.

If p is nonnegative, it is treated as the normal Python string index—0 for the first character, 1 for the second, and so on.

You may also describe positions relative to the end of the string, but these do not follow the standard Python behavior for negative indices. Position -1 is at the end of the string; -2 is before the last character on the line; and so on.

If position p is within the line, the position is set as requested. If it is out of range for the current line length, this method will raise IndexError.
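The position convention used by .tab() can be sketched with two hypothetical functions: resolve() maps a Scan-style position to an ordinary index (so -1 becomes len(line)), and tab() returns the text between the old and new positions. Allowing backward moves follows the remark above that the position may move backward within the line; returning the same substring in that case is an assumption modelled on Icon's tab().

```python
def resolve(line, p):
    """Hypothetical: convert a Scan-style position p to a 0-based index.

    Nonnegative p is an ordinary index; negative p counts from the end,
    with -1 meaning the end of the line (index len(line)).
    """
    index = p if p >= 0 else len(line) + 1 + p
    if not 0 <= index <= len(line):
        raise IndexError("position %d is outside the line" % p)
    return index

def tab(line, pos, p):
    """Hypothetical: return (text between pos and p, new position)."""
    new = resolve(line, p)
    lo, hi = min(pos, new), max(pos, new)   # backward moves allowed too
    return line[lo:hi], new

print(tab("abcdef", 2, -1))  # ('cdef', 6)
```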

.isPos(p)

A predicate to test whether the position in the current line is equivalent to p, where the values of p are as described above for the .tab() method.

For example, to test to see if there is exactly one character remaining on the line in scan object s, you could use the expression s.isPos(-2), which would return True if you are positioned on the last character, False otherwise.

.find(s)

This method searches the remainder of the current line for a string s. If the string is found, this method returns the index on the current line (counting from 0) where the first match begins. If string s does not occur in the remaining part of the current line, the method returns None.
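Python's own str.find() accepts a start index and reports positions relative to the whole string, so a sketch of this behavior is short. The function name find and its explicit pos parameter are hypothetical. Note that a match at index 0 is a valid result, so callers should test the result with "is None" rather than for truth.

```python
def find(line, pos, s):
    """Hypothetical: index of s in line at or after pos, else None."""
    i = line.find(s, pos)       # searches line[pos:] but reports full-line indices
    return None if i < 0 else i

print(find("a=b=c", 2, "="))  # 3
```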

.upToRe(r)

This method is used to look for a specific pattern on the current line, at or after the current position. The argument must be a regular expression (as a string) or a compiled regular expression.

If the pattern is found, this method returns the index on the current line where the first match starts. If the pattern is not found, it returns None.
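A compiled pattern's .search() method takes a starting position, which makes this behavior easy to sketch. The function name upToRe and its explicit pos parameter are hypothetical; re.compile() returns an already-compiled pattern unchanged, so both argument forms work.

```python
import re

def upToRe(line, pos, r):
    """Hypothetical: start index of the first match of r at or after pos."""
    pattern = re.compile(r)      # accepts a string or a compiled pattern
    m = pattern.search(line, pos)
    return None if m is None else m.start()

print(upToRe("x = 42;", 0, r"\d+"))  # 4
```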

.deblankFile()

Tries to advance to the next non-whitespace character in the file, if any. Unlike most of the scanning methods, this method may move beyond the current line. If there are no characters remaining in the file, or the character at the current scan position is not whitespace, it does nothing.

.deblankLine()

If there is anything left on the current line, and the character at the current scan position is a whitespace character, the position is advanced to the next non-whitespace character, but not past the end of the current line. Otherwise it does nothing.
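The within-line case can be sketched as a simple loop; the hypothetical deblankLine() returns the new position, and the bounds check guarantees it never moves past the end of the line.

```python
def deblankLine(line, pos):
    """Hypothetical: skip whitespace, but never past the end of the line."""
    while pos < len(line) and line[pos].isspace():
        pos += 1
    return pos

print(deblankLine("x   = 1", 1))  # 4
```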

.match(s)

The argument s must be a nonempty string. If the current line at the current position starts with s, the method returns the position just after the match; otherwise it returns None.

.matchArb(s)

Like .match(), but ignores case—that is, it treats all letters in both s and the current line as if they were uppercased.

.tabMatch(s)

Like .match(), but if s matches, the current position is moved to a point just after the match, and the matching contents are returned.

.tabMatchArb(s)

Like .tabMatch(), but ignores case.
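The match family can be sketched with str.startswith(), which accepts a starting position. These hypothetical functions take the line and position explicitly, so tabMatch() returns the new position alongside the matched text instead of updating .pos. The case-insensitive version compares uppercased copies, which assumes uppercasing does not change the string's length (true for ASCII text).

```python
def match(line, pos, s):
    """Hypothetical: position just after s if the line continues with s at pos."""
    return pos + len(s) if line.startswith(s, pos) else None

def matchArb(line, pos, s):
    """Case-insensitive variant: compare uppercased copies."""
    return match(line.upper(), pos, s.upper())

def tabMatch(line, pos, s):
    """Like match(), but return (matched text, new position) or None."""
    end = match(line, pos, s)
    return None if end is None else (line[pos:end], end)

print(tabMatch("begin x", 0, "begin"))  # ('begin', 5)
```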

.reMatch(r)

The argument r is a regular expression (in string or compiled RE form). If the current line beginning at the current position matches r, the method returns the MatchObject instance; otherwise it returns None.

.tabReMatch(r)

If the string or compiled regular expression r matches the current line at the current position, returns a MatchObject instance and advances the position to a point just after the matching content; otherwise it returns None.
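A compiled pattern's .match() method anchors the match at a given position, which is exactly the test described here. The function tabReMatch below is a hypothetical stand-alone sketch that returns the match object together with the new position. In current Python versions the match object's class is re.Match.

```python
import re

def tabReMatch(line, pos, r):
    """Hypothetical: (match object, new position) if r matches at pos, else None."""
    m = re.compile(r).match(line, pos)   # Pattern.match anchors at pos
    return None if m is None else (m, m.end())

m, newPos = tabReMatch("count = 12", 0, r"[A-Za-z_]\w*")
print(m.group(), newPos)  # count 5
```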

.integer(maxLen=None)

If the current line at the current position starts with one or more digits, optionally preceded by + or -, the position is advanced past those characters, and the value is returned as an int; otherwise it returns None.

If you wish to limit the length of the matched number, pass the maximum number of digits as the maxLen argument.
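A sketch of this rule using a regular expression: an optional sign followed by digits, with maxLen capping the digit count (the sign is not counted). The function name and its (value, new position) return form are hypothetical. The pattern uses [0-9] rather than \d so that only ASCII digits are accepted, which is an assumption.

```python
import re

def integer(line, pos, maxLen=None):
    """Hypothetical: (int value, new position), or None if no integer at pos."""
    digits = r"[0-9]+" if maxLen is None else r"[0-9]{1,%d}" % maxLen
    m = re.compile(r"[+-]?" + digits).match(line, pos)
    return None if m is None else (int(m.group()), m.end())

print(integer("-42,", 0))  # (-42, 3)
```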

.fixed()

If the current line at the current position starts with a float (that is, one or more digits and at most one decimal point), optionally preceded by a sign, this method advances the position to a point just after the float, and returns the matching part as a float; otherwise it returns None.

Warning

Note that this does not allow fixed-point numbers to start with a decimal point, so the string ".1" would not be considered valid. The author feels that fixed-point constants less than one should always have a leading zero, because in a number like “.1” it is too easy to miss the decimal point or to dismiss it as a flyspeck on the monitor.
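The accepted syntax can be sketched as a regular expression that requires at least one digit before an optional decimal point. The exact grammar is an assumption: here digits after the point are optional, so "1." is accepted, and there is no exponent part. The function name fixed and its (value, new position) return form are hypothetical.

```python
import re

# Assumed grammar: optional sign, one or more digits, then optionally a
# decimal point followed by zero or more digits. A leading "." is rejected.
FIXED_RE = re.compile(r"[+-]?[0-9]+(?:\.[0-9]*)?")

def fixed(line, pos):
    """Hypothetical: (float value, new position), or None."""
    m = FIXED_RE.match(line, pos)
    return None if m is None else (float(m.group()), m.end())

print(fixed("3.25 m", 0))  # (3.25, 4)
```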

.flatInt(n)

If the current line at the current position starts with an integer in a fixed-size field of length n, the method advances the position to a point just after that field and returns the integer as type int; otherwise it returns None.
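A sketch of reading a fixed-width field, as found in column-oriented formats. The function flatInt below is hypothetical and assumes the field may be padded with blanks; Python's int() already ignores surrounding whitespace, so only a blank or non-numeric field, or a line too short to hold the whole field, yields None.

```python
def flatInt(line, pos, n):
    """Hypothetical: (int value, new position) for the n-character field at pos."""
    field = line[pos:pos + n]
    if len(field) < n:
        return None                      # not enough characters left on the line
    try:
        return int(field), pos + n       # int() ignores surrounding blanks
    except ValueError:
        return None                      # field is blank or not an integer

print(flatInt("  42 17", 0, 4))  # (42, 4)
```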