The node argument to this static method is a
para node, or some other element that
has the same content. The method returns a new
Paragraph instance representing that content.
In general, the content can be any mixture of ordinary text
and text marked up with inline elements.
To conform to the representation discussed in Section 19, “
class Paragraph: One paragraph of mixed
text”, we have to convert
lxml's representation into a sequence of phrases,
where each phrase is either “unmarked” (plain text) or
marked up with an element tag. In the lxml model, any initial
unmarked text will be found in the
.text attribute of the
given node. If any text in the paragraph is marked up, it
will be found in the node's child elements, in each child's
.text attribute. In addition, each child element's
.tail attribute, if it is not None,
holds the unmarked text that follows that element.
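The .text/.tail model described above can be sketched with a small example. This sketch uses the standard library's xml.etree.ElementTree, which follows the same .text and .tail conventions as lxml.etree; the sample para element and its code and emphasis children are made up for illustration and are not taken from the original document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample paragraph; the element names are illustrative only.
node = ET.fromstring(
    "<para>Use <code>len()</code> to count "
    "<emphasis>all</emphasis> items.</para>")

# Initial unmarked text lives in the parent node's .text attribute.
initial_text = node.text

# Each child holds its marked-up text in .text, and the unmarked
# text that follows the child in .tail.
pieces = [(child.tag, child.text, child.tail) for child in node]

print(repr(initial_text))
for piece in pieces:
    print(piece)
```

Note that the text " to count " belongs to the code child's .tail, not to the parent, which is why the method has to examine both attributes of every child.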
# - - -   P a r a g r a p h . r e a d N o d e

    @staticmethod
    def readNode(node):
        """Convert para-content to a Paragraph instance.
        """
First, we'll create an empty
Paragraph instance. Then, if there is any initial
text in the node's .text attribute,
we add it to that instance as an unmarked phrase.
        #-- 1 --
        # [ result  :=  a new, empty Paragraph ]
        result = Paragraph()
For the method that adds one phrase, see Paragraph-addPhrase.
        #-- 2 --
        # [ if node.text is not None ->
        #     result  :=  result with an unmarked phrase added
        #                 containing (node.text)
        #   else -> I ]
        if node.text is not None:
            result.addPhrase(None, node.text)
Next, process the children (if any) in order. For
each child, add its
.text as a marked phrase;
then, if there is any
.tail text, add that as
an unmarked phrase.
        #-- 3 --
        # [ result  :=  result with content added from children,
        #               if any ]
        for child in node:
            result.addPhrase(child.tag, child.text)
            if child.tail is not None:
                result.addPhrase(None, child.tail)

        #-- 4 --
        return result
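To see the whole method at work, here is a self-contained sketch. The Paragraph class below is a hypothetical stand-in that merely stores (tag, text) tuples in a list named phrases; the real class and its addPhrase method are defined elsewhere in this document and may store phrases differently. The sketch again assumes xml.etree.ElementTree in place of lxml, and the sample input is invented.

```python
import xml.etree.ElementTree as ET


class Paragraph:
    """Hypothetical stand-in: stores phrases as (tag, text) tuples."""

    def __init__(self):
        self.phrases = []

    def addPhrase(self, tag, text):
        # A tag of None means the phrase is unmarked text.
        self.phrases.append((tag, text))

    @staticmethod
    def readNode(node):
        """Convert para-content to a Paragraph instance."""
        result = Paragraph()
        # Initial unmarked text, if any.
        if node.text is not None:
            result.addPhrase(None, node.text)
        # Each child contributes a marked phrase, then its tail.
        for child in node:
            result.addPhrase(child.tag, child.text)
            if child.tail is not None:
                result.addPhrase(None, child.tail)
        return result


# Hypothetical sample input.
node = ET.fromstring(
    "<para>See <code>readNode</code> for "
    "<emphasis>details</emphasis>.</para>")
para = Paragraph.readNode(node)
for phrase in para.phrases:
    print(phrase)
```

The phrases come out in document order, alternating between unmarked and marked text wherever the source does.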