This function recognizes a string enclosed in double quotes at the front of its argument. The quoted string may contain double-quote characters escaped with a backslash.
# - - - s c a n Q u o t e d - - -
def scanQuoted ( s ):
    """Remove a double-quoted string from the front of s, with escaping

      [ s is a string ->
          if s starts with a double-quoted string with escaping ->
            return (contents of that string with escaped quotes
                    unescaped, remainder of s)
          else -> raise ValueError ]
    """
We'll use an index pos to mark our progress through the string, and a list L to accumulate the character content of the string.
    #-- 1 --
    pos = 0
    L = []
s must start with a double quote, or we're in trouble right away.
    #-- 2 --
    # [ if s starts with '"' ->
    #     pos := 1
    #   else -> raise ValueError ]
    # startswith() rather than s[0]: an empty s must raise
    # ValueError here, not IndexError.
    if s.startswith ( '"' ):
        pos = 1
    else:
        raise ValueError ( "Expecting '\"' at the start of the "
                           "command group." )
Next we move pos along until we either find an unescaped quote or run out of string.

A little note on an edge case. In the body of the loop below, we test the substring s[pos:pos+2] to see if it is an escaped double quote. What happens if pos is pointing at the last character of s? In that case, the expression s[pos:pos+2] evaluates without raising an exception, and returns the single character at s[pos].
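The slicing behavior described in this note can be checked in isolation. The short sketch below is not part of scanQuoted; the names last, tail, index_failed, and empty are made up for illustration.

```python
s = 'abc'
last = len(s) - 1          # index of the final character, 'c'

# A slice that runs past the end of the string is clipped, not rejected.
tail = s[last:last + 2]

# Plain indexing, by contrast, is strict about bounds.
try:
    s[last + 1]
    index_failed = False
except IndexError:
    index_failed = True

# A slice that starts at the very end of the string is simply empty.
empty = s[len(s):len(s) + 2]
```

Here tail holds only the last character, so comparing it against the two-character sequence for an escaped quote is safely false.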
    #-- 3 --
    # [ pos  :=  pos advanced to the closing quote or end of s,
    #            whichever comes first
    #   L  +:=  characters between s[pos:] and closing quote or
    #           end of s, whichever comes first ]
    while ( ( pos < len(s) ) and
            ( s[pos] != '"' ) ):
        #-- 3 body --
        # [ if s[pos:] starts with '\"' ->
        #     L  +:=  s[pos+1]
        #     pos  +:=  2
        #   else ->
        #     L  +:=  s[pos]
        #     pos  +:=  1 ]
        if ( s[pos:pos+2] == r'\"' ):
            L.append ( s[pos+1] )
            pos += 2
        else:
            L.append ( s[pos] )
            pos += 1
    #-- 4 --
    # [ if pos < len(s) ->
    #     return (elements of L concatenated, s[pos+1:])
    #   else ->
    #     raise ValueError ]
    if pos < len(s):
        return ("".join(L), s[pos+1:])
    else:
        raise ValueError ( "No closing double-quote: '%s'" % s )
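For reference, the steps above can be assembled into one function and exercised on a few inputs. This is a sketch rather than the original listing: it assumes Python 3, so the exceptions are raised in call form, and it uses startswith() so that an empty argument also raises ValueError. The sample strings and the names plain, escaped, and unterminated_error are illustrative only.

```python
def scanQuoted(s):
    """Sketch: split a leading double-quoted string off the front of s."""
    # startswith() also copes with an empty s, where s[0] would
    # raise IndexError instead of ValueError.
    if not s.startswith('"'):
        raise ValueError("Expecting '\"' at the start of the command group.")
    pos = 1
    L = []
    # Advance to the closing quote or the end of s, whichever comes first.
    while pos < len(s) and s[pos] != '"':
        if s[pos:pos + 2] == r'\"':
            L.append(s[pos + 1])     # keep the quote, drop the backslash
            pos += 2
        else:
            L.append(s[pos])
            pos += 1
    if pos < len(s):
        return ("".join(L), s[pos + 1:])
    raise ValueError("No closing double-quote: '%s'" % s)


# A plain quoted string followed by more text.
plain = scanQuoted('"abc" def')

# Escaped quotes inside the string are unescaped in the result.
escaped = scanQuoted(r'"say \"hi\"" rest')

# A missing closing quote is reported as ValueError.
try:
    scanQuoted('"unterminated')
    unterminated_error = None
except ValueError as exc:
    unterminated_error = str(exc)
```

The first call splits the input into the contents abc and the remainder starting at the space before def. The second returns the contents with its two inner quotes restored.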