Statistics on sgmltag.py
------------------------
Unverified.

Program size:  141/522 code lines, 27.0%.

================================================================
Syntax errors
----------------------------------------------------------------
S1
----------------------------------------------------------------
================================================================
Bugs that would be caught in a more strongly-typed language
----------------------------------------------------------------
T1  In SGMLTag.__init__(), this line:

        if self.attrList:

    should be:

        if attrList:
----------------------------------------------------------------
================================================================
Logic bugs
----------------------------------------------------------------
B1  SGMLTag.__str__():  Fails to produce the opening "<".
----------------------------------------------------------------
B2  Tag_Attr_Scan_Value():  Wrong rule for unquoted strings.
    It looks for a string starting with a character from
    giStartCset, but that set includes only letters.  Loose HTML
    allows numbers to be unquoted as well, e.g., "".  So these
    lines:

        pastFirst = scan.any ( giStartCset )
        if pastFirst:
            ...

    should be:

        pastFirst = scan.any ( giCset )
        if pastFirst is not None:
            ...

    Also, there is no `else' case for this test:

        pastOpen = scan.match ( '"' )
        if pastOpen:

    There should be an else case that issues an error if the
    value neither is alphanumeric nor starts with a double-quote.
----------------------------------------------------------------
B3  In SGML_Tag_Scan_Comment(), this line crashes because `gi'
    is undefined:

        return SGMLTag ( gi, 0, None, string.join ( L, "" ) )

    It should be:

        return SGMLTag ( SGML_COMMENT_GI, 0, None,
                         string.join ( L, "" ) )
----------------------------------------------------------------
B4  In SGMLTag.__str__(), the initial "--" is missing from the
    reconstituted string.  This is because the .gi member is
    only "!", not "!--".

    Change:

        if self.gi == SGML_COMMENT_GI:
            return "<%s%s%s" % ( self.gi, self.text,
                                 SGML_COMMENT_TAIL )

    to:

        if self.gi == SGML_COMMENT_GI:
            return "<%s%s%s" % ( SGML_COMMENT_HEAD, self.text,
                                 SGML_COMMENT_TAIL )
----------------------------------------------------------------
B5  In SGML_Tag_Scan_Comment(), the scanning logic does not move
    past the closing "-->", which causes it to be treated as
    ordinary text.  With this bug and B4, a comment of the form:

        <!--text-->

    will be reconstituted as:

        <!text-->-->

    Change prime 2.2 from:

        #-- 2.2 --
        # [ if endPos is None ->
        #     scan  :=  scan advanced to the start of the next line
        #     L  +:=  (remainder of the current line) + "\n"
        #   else ->
        #     scan  :=  scan advanced up to endPos
        #     L  +:=  text from scan up to endPos
        #     done  :=  1 ]
        if endPos is not None:
            L.append ( scan.tab ( endPos ) )
            done = 1
        else:
            ...

    to:

        #-- 2.2 --
        # [ if endPos is None ->
        #     scan  :=  scan advanced to the start of the next line
        #     L  +:=  (remainder of the current line) + "\n"
        #   else ->
        #     scan  :=  scan advanced up to endPos, plus the length
        #               of SGML_COMMENT_TAIL
        #     L  +:=  text from scan up to endPos
        #     done  :=  1 ]
        if endPos is not None:
            L.append ( scan.tab ( endPos ) )
            scan.move ( len ( SGML_COMMENT_TAIL ) )
            done = 1
        else:
            ...
----------------------------------------------------------------
B6  In Tag_Attr_Scan_Value(), if the value starts with a
    character that is not a double-quote and not in giCset, the
    error message is issued correctly ("Tag attribute values
    must be alphanumeric or enclosed in double-quotes"), but
    control then falls through to prime 2 and then prime 3,
    where this line:

        return ( value, isQuoted )

    fails because `value' has never been set.  The fix is to add
    a return statement to this else clause:

        else:
            scan.error ( "Tag attribute values must be alphanumeric "
                         "or enclosed in double-quotes ('\"')." )
            return (None, 0)
----------------------------------------------------------------
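
Note on B2 and B6.  The control flow that these two fixes aim for
can be sketched in isolation.  This is only an illustration, not
the code in sgmltag.py: it works on a plain string and an index
instead of the module's scan object, and the names
scan_attr_value, GI_CSET and errors, as well as the "unterminated"
message, are assumptions made up for the sketch.  The contents of
GI_CSET (letters and digits) are likewise assumed.

```python
import string

# Assumption: characters allowed in an unquoted value are letters and
# digits; the real giCset in sgmltag.py may contain more.
GI_CSET = set(string.ascii_letters + string.digits)


def scan_attr_value(text, pos, errors):
    """Scan an attribute value starting at text[pos].

    Returns ((value, is_quoted), new_pos).  On any error a message is
    appended to `errors` and ((None, 0), pos) is returned at once, so
    no later step can reach an unset `value` (the B6 failure).
    """
    # Quoted case: take everything up to the closing double-quote.
    if pos < len(text) and text[pos] == '"':
        close = text.find('"', pos + 1)
        if close < 0:
            # Hypothetical message, not taken from the original module.
            errors.append("Unterminated quoted attribute value.")
            return (None, 0), pos
        return (text[pos + 1:close], 1), close + 1

    # Unquoted case (B2): accept digits as well as letters.
    end = pos
    while end < len(text) and text[end] in GI_CSET:
        end += 1
    if end > pos:
        return (text[pos:end], 0), end

    # Neither quoted nor alphanumeric (B6): report and return early.
    errors.append("Tag attribute values must be alphanumeric "
                  "or enclosed in double-quotes ('\"').")
    return (None, 0), pos
```

Every branch ends in its own return, so there is no shared final
`return ( value, isQuoted )' that an error path could fall into.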