Scan class was invented to manage the scanning of a
file, especially when error messages may be written that
are related to locations in the file. It is intended for
writing small compilers and other applications that
syntax-check files. All error and other message logging
is handled using the
Log singleton described in
Section 3, “The singleton
This method of stream processing assumes:
that the file is structured into lines separated by ASCII newline characters; and
that the caller will never want to back up to a previous line, though backing up within a line is allowed.
These constraints guarantee that minimal amounts of storage will be taken up even while reading quite large input files, so long as individual lines are not too large.
Scan instance keeps track of the current position in
the current line. A number of different methods allow you
to move the scan position forward (and backward, within the
line, if you like), and to advance to the next line (if
there is one).
Scan class also supports an optional line-comment
character. For example, you might ask for your
instance to ignore a percent sign (
everything after it on that line.
Often it is useful to identify the site of an error with
more information than just the file name and line number.
Scan object also allows you to provide a callback
procedure that can furnish a string that identifies where
you are in the file. For example, suppose you are writing
a compiler for a language that divides code into modules.
The callback procedure might identify which module you are
This Python class is based on an earlier version
implemented in Icon, which has a large and well-designed
set of scanning primitives. Ralph Griswold's book,
The Icon Programming Language,
explains these scanning primitives, but you won't need to
know Icon to use the Python
Here is the interface to the
inFile, commentPrefix=None, callback=None)
specifies the stream to be read. The constructor attempts
to read the first line; if the stream is empty, it will
Scan will attempt to open a file by that
name. If the open fails, the constructor will raise
also be a file or file-like object that provides
commentPrefix string is specified,
all input lines that start with that string will be
treated as comments. The default behavior is not to
consider anything a comment.
If you want to provide additional information in some
error messages, pass a function reference as the
callback argument. Whenever an error
message is issued through the
Scan instance, this
procedure will be called, with the
Scan instance as
its sole argument, and the result of this call (which
should be a string) will be appended to the error
message. This is provided so that the application
can identify the point of the error relative to the
file's logical structure, not just the current line's
contents and its physical position in the file.
This read-only attribute is a
initially false, and set to
the end of the stream has been reached.
The contents of the current line, as a string,
without any trailing line termination. It may be
empty. If there is a
force, the comment part of the line is removed.
Same as the
.line attribute except
that the comment is not removed if there is one.
The current line number, counting from 1. Read-only.
Index of the current scan position within the current
Call this method when stream scanning is complete.
A predicate that returns
True if the
current scan position is at the end of the current
No matter where the scan position is on the current line,
try to move to the start of the next line. If there are
no lines remaining, leave the position at the end of the
file. You can test to see if you are at the end of the
file by using the
described above. Returns
True if the current
line was not the last,
Writes an error message.
If there have been no previous messages issued for the
current line, the line's contents are sent to
Log(), followed by the result returned by
the callback procedure (if there is one).
A carat (
^) is displayed under
the current scan position.
The arguments are the error message, as one or more
strings that are concatenated to form the message.
The complete message is then sent to
Works the same as the
but after transmitting the error message, it raises a
Works the same as
.error(), but the
message is sent to the
The message is built up as in the
method, but the result is sent to
If at least
characters remain on the current line, advances the
position that far and returns the characters between the
current position and the new position. If the current
line has fewer than
Try to move the position within the current line to
successful, it returns the string between the current
nonnegative, it is treated as the normal Python
string index—0 for the first character, 1 for
the second, and so on.
You may also describe positions relative to the end of the string, but these do not follow the standard Python behavior for negative indices. Position -1 is at the end of the string; -2 is before the last character on the line; and so on.
within the line, the position is set as requested. If it
is out of range for the current line length, this method
A predicate to test whether the position in the current
line is equivalent to
, where the
described above for the
For example, to test to see if there is exactly one
character remaining on the line in scan object
s, you could use the expression
s.isPos(-2), which would return
True if you are positioned on the last
This method searches the remainder of the current line for
the string is found, this method returns the index on the
current line (counting from 0) where the first match
begins. If string
does not occur
in the remaining part of the current line, the method
This method is used to look for a specific pattern on the current line, at or after the current position. The argument must be a regular expression (as a string) or a compiled regular expression.
If the pattern is found, this method returns the
index on the current line where the first match
starts. If the pattern is not found, it returns
Tries to advance to the next non-whitespace character in the file, if any. This method may move beyond the current line, unlike most of the scanning methods. If there are no characters remaining in the file, or the next character is not a whitespace character, it does nothing.
If there is anything left on the current line, and the character at the current scan position is a whitespace character, the position is advanced to the next non-whitespace character, but not past the end of the current line. Otherwise it does nothing.
must be a nonempty string. If the current line at the
current position starts with
, the method
returns the position just after the match; otherwise it
.match(), but ignores
case—that is, it treats all letters in both
current line as if they were uppercased.
.match(), but if
current position is moved to a point just after the match,
and the matching contents are returned.
.tabMatch(), but ignores case.
a regular expression (in string or compiled RE form). If
the current line beginning at the current position matches
, the method
MatchObject instance; otherwise
If the string or compiled regular expression
current line at the current position, returns a
MatchObject instance and advances the
position to a point just after the matching content;
otherwise it returns
If the current line at the current position starts with
one or more digits, optionally preceded by
-, the position is advanced past those
characters, and the value is returned as an
int; otherwise it returns
If you wish to limit the length of the matched number, pass the maximum number of digits in as an argument.
If the current line at the current position starts
with a float (that is, one or more digits and at most
one decimal point), optionally preceded by a sign,
this method advances the position to a point just
after the float, and returns the matching part
otherwise it returns
Note that this does not allow fixed-point numbers to
start with a decimal point, so the string
".1" would not be considered valid. The
author feels that fixed-point constants less than one
should always have a leading zero, because in a number
like “.1” it is too easy too miss the decimal point or
ignore it as a flyspeck on the monitor.
If the current line at the current position starts with an
integer in a fixed-size field of length
, the method
advances the position to a point just after that field and
returns the integer as type
int; otherwise it