The Scan class was invented to manage the scanning of a
file, especially when error messages may be written that
are related to locations in the file. It is intended for
writing small compilers and other applications that
syntax-check files. All error and other message logging
is handled using the Log singleton described in
Section 3, “The singleton Log object”.
This method of stream processing assumes:
that the file is structured into lines separated by ASCII newline characters; and
that the caller will never want to back up to a previous line, though backing up within a line is allowed.
These constraints guarantee that minimal amounts of storage will be taken up even while reading quite large input files, so long as individual lines are not too large.
Each Scan instance keeps track of the current position in
the current line. A number of different methods allow you
to move the scan position forward (and backward, within the
line, if you like), and to advance to the next line (if
there is one).
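The line-at-a-time model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: MiniCursor is a hypothetical stand-in written for this example, not the real Scan class, and it omits comments, logging, and most of the scanning methods.

```python
import io


class MiniCursor:
    """Hypothetical stand-in for illustration; not the real Scan class."""

    def __init__(self, stream):
        self.stream = stream      # any object with a readline() method
        self.line = ""            # current line, without its newline
        self.lineNo = 0           # 1-based once a line has been read
        self.pos = 0              # scan position within self.line
        self.atEndFile = False
        self.nextLine()           # try to read the first line

    def nextLine(self):
        """Move to the start of the next line; return False if there is none."""
        raw = self.stream.readline()
        if raw == "":             # readline() returns "" only at end of stream
            self.atEndFile = True
            self.pos = len(self.line)
            return False
        self.line = raw.rstrip("\n")
        self.lineNo += 1
        self.pos = 0
        return True

    def atEndLine(self):
        return self.pos >= len(self.line)

    def move(self, n):
        """Advance n characters within the line; return the text passed over."""
        if n < 0 or self.pos + n > len(self.line):
            raise IndexError("cannot move %d characters from position %d"
                             % (n, self.pos))
        text = self.line[self.pos:self.pos + n]
        self.pos += n
        return text


cursor = MiniCursor(io.StringIO("first line\nsecond\n"))
word = cursor.move(5)             # consumes "first"
moved = cursor.nextLine()         # True: the cursor is now on "second"
```

Only one line is held in memory at a time, which is why the approach stays cheap on large files as long as individual lines are short.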
The Scan class also supports an optional line-comment
character. For example, you might ask for your Scan
instance to ignore a percent sign (%) and
everything after it on that line.
Often it is useful to identify the site of an error with
more information than just the file name and line number.
The Scan object also allows you to provide a callback
procedure that can furnish a string that identifies where
you are in the file. For example, suppose you are writing
a compiler for a language that divides code into modules.
The callback procedure might identify which module you are
currently in.
This Python class is based on an earlier version
implemented in Icon, which has a large and well-designed
set of scanning primitives. Ralph Griswold's book,
The Icon Programming Language,
explains these scanning primitives, but you won't need to
know Icon to use the Python Scan object.
Here is the interface to the Scan class.
Scan(inFile, commentPrefix=None, callback=None)
The required inFile argument specifies the stream to be read.
The constructor attempts to read the first line; if
the stream is empty, it will set the .atEndFile attribute.
If inFile is a string, Scan will attempt to
open a file by that name. If the open fails, the
constructor will raise an IOError
exception.
inFile may also be a file or file-like object that
provides .readline() and .close() methods.
If a commentPrefix string is specified,
all input lines that start with that string will be
treated as comments. The default behavior is not to
consider anything a comment.
If you want to provide additional information in some
error messages, pass a function reference as the
callback argument. Whenever an error
message is issued through the Scan instance, this
procedure will be called, with the Scan instance as
its sole argument, and the result of this call (which
should be a string) will be appended to the error
message. This is provided so that the application
can identify the point of the error relative to the
file's logical structure, not just the current line's
contents and its physical position in the file.
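As an illustration of the callback idea, here is one way an application might track its logical position. The ModuleTracker class and its moduleName attribute are hypothetical names invented for this sketch. The only thing taken from the description above is the calling convention: the callback receives the Scan instance and returns a string.

```python
class ModuleTracker:
    """Hypothetical callback object that remembers the current module name."""

    def __init__(self):
        self.moduleName = None    # set by the application as it parses

    def __call__(self, scan):
        # 'scan' is the Scan instance; this sketch does not need to inspect it.
        if self.moduleName is None:
            return "(outside any module)"
        return "in module '%s'" % self.moduleName


tracker = ModuleTracker()
before = tracker(None)            # no module seen yet
tracker.moduleName = "lexer"      # the application updates this while parsing
after = tracker(None)
```

An instance like tracker would then be passed as the callback argument when the Scan object is constructed.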
.atEndFile
This read-only attribute is a bool,
initially False, and set to True when
the end of the stream has been reached.
.line
The contents of the current line, as a string,
without any trailing line termination. It may be
empty. If there is a commentPrefix in
force, the comment part of the line is removed.
Read-only.
.rawLine
Same as the .line attribute except
that the comment is not removed if there is one.
Read-only.
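The relationship between .rawLine and .line can be illustrated with a small helper. This is a sketch, not the class's own code: strip_comment is a hypothetical name, and it assumes that the comment prefix and everything after it are dropped, as in the percent-sign example earlier.

```python
def strip_comment(raw_line, comment_prefix=None):
    """Return raw_line with any trailing comment removed (assumed behavior)."""
    if not comment_prefix:
        return raw_line           # no comment convention in force
    head, _sep, _tail = raw_line.partition(comment_prefix)
    return head


raw = "x = 1 % set x"
cooked = strip_comment(raw, "%")  # plays the role of .line; raw plays .rawLine
```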
.lineNo
The current line number, counting from 1. Read-only.
.pos
Index of the current scan position within the current
line, in range(len(line)). Read-only.
.close()
Call this method when stream scanning is complete.
.atEndLine()
A predicate that returns True if the
current scan position is at the end of the current
line, False otherwise.
.nextLine()
No matter where the scan position is on the current
line, try to move to the start of the next line. If
there are no lines remaining, leave the position at
the end of the file. You can test whether you are
at the end of the file by checking the .atEndFile attribute described above.
Returns True if the current line was
not the last, False otherwise.
.error(*L)
Writes an error message.
If there have been no previous messages issued for the
current line, the line's contents are sent to Log(), followed by the result returned by the
callback procedure (if there is one).
A caret (^) is displayed under
the current scan position.
The arguments are the error message, as one or more
strings that are concatenated to form the message. The
complete message is then sent to Log().error().
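The caret display can be pictured with the following sketch. The helper name mark_position and the exact two-line layout are assumptions made for illustration; the real output goes through the Log object and may be formatted differently.

```python
def mark_position(line, pos):
    """Return the line followed by a second line with a caret under pos."""
    return line + "\n" + " " * pos + "^"


display = mark_position("let x = ;", 8)   # caret lands under the semicolon
```

Padding with spaces only lines up when the source line contains no tab characters before the scan position.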
.syntax(*L)
Works the same as the .error() method,
but after transmitting the error message, it raises a
SyntaxError exception.
.warning(*L)
Works the same as .error(), but the
message is sent to the Log().warning()
method.
.msgKind(kind, *L)
The message is built up as in the .error() method, but the result is sent to
Log().msgKind(kind).
.move(n)
If at least n characters remain on the current line,
advances the position that far and returns the
characters between the current position and the new
position. If the current line has fewer than n characters
remaining, raises IndexError.
.tab(p)
Try to move the position within the current line to
p. If
successful, it returns the string between the current
position and p.
If p is
nonnegative, it is treated as the normal Python
string index: 0 for the first character, 1 for
the second, and so on.
You may also describe positions relative to the end of the string, but these do not follow the standard Python behavior for negative indices. Position -1 is at the end of the string; -2 is before the last character on the line; and so on.
If position p is within the line, the position is set as
requested. If it is out of range for the current
line length, this method will raise IndexError.
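Because the negative positions differ from ordinary Python indexing, a small conversion function may make the convention easier to see. This is a sketch: normalize_pos is a hypothetical helper, and it assumes that a nonnegative p equal to the line length (the end of the line) is in range.

```python
def normalize_pos(p, line_len):
    """Convert a .tab()-style position to a plain index in 0..line_len."""
    # Nonnegative values are ordinary indices; -1 means the end of the line,
    # -2 means just before the last character, and so on.
    index = p if p >= 0 else line_len + 1 + p
    if not 0 <= index <= line_len:
        raise IndexError("position %d is out of range for a line of length %d"
                         % (p, line_len))
    return index


end_of_line = normalize_pos(-1, 5)       # 5
before_last = normalize_pos(-2, 5)       # 4
```

Note that in standard Python, index -1 would refer to the last character rather than the position after it.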
.isPos(p)
A predicate to test whether the position in the
current line is equivalent to p, where the values of p are as
described above for the .tab() method.
For example, to test whether there is exactly one
character remaining on the line in scan object s, you could use the expression s.isPos(-2), which would return True if you are positioned on the last
character, False otherwise.
.find(s)
This method searches the remainder of the current
line for a string s. If the string is found, this method
returns the index on the current line (counting from
0) where the first match begins. If string s does not
occur in the remaining part of the current line, the
method returns None.
.upToRe(r)
This method looks for a specific pattern on the current line, at or after the current position. The argument r must be a regular expression (as a string) or a compiled regular expression.
If the pattern is found, this method returns the
index on the current line where the first match
starts. If the pattern is not found, it returns
None.
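The search described here corresponds closely to the standard re module's search with a starting position. The function below is a stand-alone sketch with a hypothetical name, up_to_re; it takes the line and position as explicit arguments, whereas the method itself uses the Scan instance's state.

```python
import re


def up_to_re(line, pos, r):
    """Return the index of the first match of r at or after pos, else None."""
    pattern = re.compile(r)       # accepts a string or a compiled pattern
    m = pattern.search(line, pos)
    return None if m is None else m.start()


where = up_to_re("alpha = 42", 0, r"[0-9]+")
```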
.deblankFile()
Tries to advance to the next non-whitespace character in the file, if any. This method may move beyond the current line, unlike most of the scanning methods. If there are no characters remaining in the file, or the next character is not a whitespace character, it does nothing.
.deblankLine()
If there is anything left on the current line, and the character at the current scan position is a whitespace character, the position is advanced to the next non-whitespace character, but not past the end of the current line. Otherwise it does nothing.
.match(s)
The argument s must be a nonempty string. If the current
line at the current position starts with s, the method
returns the position just after the match; otherwise
it returns None.
.matchArb(s)
Like .match(), but ignores
case; that is, it treats all letters in both
s and the
current line as if they were uppercased.
.tabMatch(s)
Like .match(), but if s matches, the
current position is moved to a point just after the
match, and the matching contents are returned.
.tabMatchArb(s)
Like .tabMatch(), but ignores case.
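The literal-matching family can be summarized by one stand-alone function. This is a sketch under assumptions: match_at and its ignore_case flag are hypothetical, and the case-insensitive branch simply uppercases both sides, as the .matchArb() description suggests. The tabbing variants would additionally store the returned position as the new scan position.

```python
def match_at(line, pos, s, ignore_case=False):
    """Return the position just after s if line has s at pos, else None."""
    if not s:
        raise ValueError("s must be a nonempty string")
    candidate = line[pos:pos + len(s)]
    if ignore_case:
        matched = candidate.upper() == s.upper()
    else:
        matched = candidate == s
    return pos + len(s) if matched else None


exact = match_at("BEGIN block", 0, "begin")                       # None
relaxed = match_at("BEGIN block", 0, "begin", ignore_case=True)   # 5
```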
.reMatch(r)
The argument r is a regular expression (in string or
compiled RE form). If the current line beginning at
the current position matches r, the method returns the
MatchObject instance; otherwise
it returns None.
.tabReMatch(r)
If the string or compiled regular expression matches the
current line at the current position, returns a rMatchObject instance and advances the
position to a point just after the matching content;
otherwise it returns None.
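A match anchored at the current position, followed by an advance, can be sketched with the standard re module as follows. The function name tab_re_match is hypothetical, and because this sketch has no Scan instance to update, it returns the new position alongside the match object.

```python
import re


def tab_re_match(line, pos, r):
    """Match r exactly at pos; return (match, new_pos) or (None, pos)."""
    m = re.compile(r).match(line, pos)   # match() anchors at pos
    if m is None:
        return None, pos                 # position is left unchanged
    return m, m.end()


m, new_pos = tab_re_match("name: value", 0, r"[a-z]+")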
.integer(maxLen=None)
If the current line at the current position starts
with one or more digits, optionally preceded by + or -, the position is
advanced past those characters, and the value is
returned as an int; otherwise it returns None.
If you wish to limit the length of the matched number, pass the maximum number of digits in as an argument.
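The integer rule can be expressed as a regular expression. The following is a sketch, not the method's actual implementation: scan_integer is a hypothetical stand-alone function that returns both the value and the new position, and it assumes the limit counts digits only, not the sign.

```python
import re


def scan_integer(line, pos, max_len=None):
    """Return (value, new_pos) for a signed integer at pos, else None."""
    digits = "[0-9]+" if max_len is None else "[0-9]{1,%d}" % max_len
    m = re.compile("[+-]?" + digits).match(line, pos)
    if m is None:
        return None
    return int(m.group()), m.end()


signed = scan_integer("x=-42;", 2)               # (-42, 5)
limited = scan_integer("12345", 0, max_len=3)    # (123, 3)
```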
.fixed()
If the current line at the current position starts
with a float (that is, one or more digits and at most
one decimal point), optionally preceded by a sign,
this method advances the position to a point just
after the float, and returns the matching part
as a float;
otherwise it returns None.
Note that this does not allow fixed-point numbers
to start with a decimal point, so the string ".1" would not be considered valid. The
author feels that fixed-point constants less than
one should always have a leading zero, because in a
number like “.1” it is too easy to
miss the decimal point or ignore it as a flyspeck
on the monitor.
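The fixed-point rule, including the required leading digit, might be written as the pattern below. This is a sketch: scan_fixed is a hypothetical stand-alone function, and it assumes that a trailing decimal point with no digits after it (as in "1.") is accepted.

```python
import re

# Optional sign, one or more digits, then at most one decimal point
# with optional digits after it. A leading "." is deliberately not allowed.
_FIXED = re.compile(r"[+-]?[0-9]+(?:\.[0-9]*)?")


def scan_fixed(line, pos):
    """Return (value, new_pos) for a fixed-point number at pos, else None."""
    m = _FIXED.match(line, pos)
    if m is None:
        return None
    return float(m.group()), m.end()


pi_ish = scan_fixed("3.14 rest", 0)    # (3.14, 4)
rejected = scan_fixed(".1", 0)         # None: no leading digit
```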
.flatInt(n)
If the current line at the current position starts
with an integer in a fixed-size field of length n, the method
advances the position to a point just after that
field and returns the integer as type int; otherwise it returns None.
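Fixed-width fields are common in column-oriented data, such as a date packed as YYYYMMDD. The sketch below shows one possible reading of the rule. The name flat_int is hypothetical, and the function assumes the field must consist entirely of ASCII digits; whether the real method accepts signs or padding blanks is not stated here.

```python
import re


def flat_int(line, pos, n):
    """Return (value, new_pos) for an n-digit field at pos, else None."""
    field = line[pos:pos + n]
    if len(field) < n or re.fullmatch("[0-9]+", field) is None:
        return None               # too short, or not all digits
    return int(field), pos + n


year = flat_int("20240131", 0, 4)     # (2024, 4)
month = flat_int("20240131", 4, 2)    # (1, 6)
```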