
51.7. scanQuoted(): Process double-quoted string with escapes

This function recognizes a string enclosed in double quotes at the front of its argument. The string may contain double-quote characters escaped with a backslash (\").

pageget.py
# - - -   s c a n Q u o t e d   - - -

def scanQuoted ( s ):
    """Remove a double-quoted string from the front of s, with escaping

      [ s is a string ->
          if s starts with a double-quoted string with escaping ->
            return (contents of that string with escaped quotes
            unescaped, remainder of s)
          else -> raise ValueError ]
    """

We'll use an index pos to mark our progress through the string, and a list L to accumulate the character content of the string.

pageget.py
    #-- 1 --
    pos  =  0
    L    =  []

s must start with a double quote, or we're in trouble right away.

pageget.py
    #-- 2 --
    # [ if s starts with '"' ->
    #     pos  :=  1
    #   else -> raise ValueError ]
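    # startswith() is safe on an empty string; s[0] would raise
    # IndexError there instead of the promised ValueError.
    if  s.startswith ( '"' ):
        pos  =  1
    else:
        raise ValueError ( "Expecting '\"' at the start of the "
                           "command group." )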

Next we move pos along until we either find an unescaped quote or run out of string.

A note on an edge case: in the body of the loop below, we test the slice s[pos:pos+2] to see whether it is an escaped double quote. What happens if pos points at the last character of s? In that case the slice evaluates without raising an exception and returns the single character s[pos], which cannot match the two-character escape sequence.
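
The slicing behavior described above can be checked in isolation. The following is a minimal sketch, not part of pageget.py; the names tail, is_escape, and indexed_ok are made up for illustration.

```python
# Slicing past the end of a string is clipped; indexing past it is not.
s = 'ab\\'             # three characters: a, b, backslash
pos = len(s) - 1       # index of the last character

tail = s[pos:pos+2]    # the slice stops at the end of s: one character
is_escape = (tail == r'\"')   # a one-character string never matches

try:
    s[pos+1]           # plain indexing past the end raises IndexError
    indexed_ok = True
except IndexError:
    indexed_ok = False
```

Here tail holds only the trailing backslash, so the comparison against the two-character escape is simply false, and the loop falls through to the ordinary-character branch.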

pageget.py
    #-- 3 --
    # [ pos  :=   pos advanced to the closing quote or the end
    #             of s, whichever comes first
    #   L    +:=  characters from s[pos:] up to the closing quote
    #             or the end of s, whichever comes first, with
    #             escaped quotes unescaped ]
    while ( ( pos < len(s) ) and
            ( s[pos] != '"' ) ):
        #-- 3 body --
        # [ if s[pos:] starts with '\"' ->
        #     L    +:=  s[pos+1]
        #     pos  +:=  2
        #   else ->
        #     L    +:=  s[pos]
        #     pos  +:=  1 ]
        if  ( s[pos:pos+2] == r'\"' ):
            L.append ( s[pos+1] )
            pos  +=  2
        else:
            L.append ( s[pos] )
            pos  +=  1

    #-- 4 --
    # [ if pos < len(s) ->
    #     return (elements of L concatenated, s[pos+1:])
    #   else ->
    #     raise ValueError ]
    if  pos < len(s):
        return ("".join(L), s[pos+1:])
    else:
        raise ValueError ( "No closing double-quote: '%s'" % s )
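
To see the pieces working together, here is a condensed restatement of the four steps with a few sample calls. It is a sketch rather than the pageget.py source: it uses the call form of raise, which runs under both Python 2 and Python 3, and it tests s[:1] so that an empty string raises ValueError, which is what the intended function in the docstring appears to call for. The variable names plain and escaped are made up for the example.

```python
def scanQuoted(s):
    """Condensed restatement of steps 1-4 (a sketch, not the original)."""
    # Step 2: s must start with a double quote.
    if s[:1] != '"':
        raise ValueError("Expecting '\"' at the start of the "
                         "command group.")
    pos = 1
    L = []
    # Step 3: advance to the closing quote or the end of s.
    while pos < len(s) and s[pos] != '"':
        if s[pos:pos+2] == r'\"':
            L.append(s[pos+1])   # keep the quote, drop the backslash
            pos += 2
        else:
            L.append(s[pos])
            pos += 1
    # Step 4: pos < len(s) means we stopped on the closing quote.
    if pos < len(s):
        return ("".join(L), s[pos+1:])
    raise ValueError("No closing double-quote: '%s'" % s)

# Sample calls
plain = scanQuoted('"abc" rest')             # ('abc', ' rest')
escaped = scanQuoted(r'"say \"hi\"" tail')   # ('say "hi"', ' tail')
```

An unterminated string such as '"abc' raises ValueError in step 4, because the loop runs off the end of s without meeting an unescaped quote.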