19.7. Paragraph.readNode(): Process a para element (static method)

The node argument to this static method is either a para node, or some other element that has the same content. It returns a new Paragraph instance representing that content.

In general, the content can be any mixture of ordinary text and text marked up with elements such as genus. To conform to the representation discussed in Section 19, “class Paragraph: One paragraph of mixed text”, we have to convert lxml's representation into a sequence of phrases, where each phrase is either “unmarked” (plain text) or “marked-up”.

In the lxml model, any initial unmarked text will be found in the .text attribute of the given node. Marked-up text is found in the node's child elements, in each child's .text attribute. Each child's .tail attribute holds any unmarked text that follows that child, up to the next child element or the end of the paragraph.
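
As a small illustration of where each piece of text ends up, here is a sketch that parses one made-up para element. It assumes the standard library's xml.etree.ElementTree, which uses the same .text and .tail model as lxml; the sample sentence is invented for this example.

```python
import xml.etree.ElementTree as ET

# A made-up paragraph: plain text, one marked-up genus, more plain text.
para = ET.fromstring(
    "<para>The genus <genus>Turdus</genus> includes robins.</para>")

# Unmarked text before the first child lives in the parent's .text.
initial_text = para.text        # "The genus "

# Marked-up text lives in the child's .text, labeled by the child's .tag.
child = para[0]
marked = (child.tag, child.text)  # ("genus", "Turdus")

# Unmarked text after the child lives in the child's .tail,
# not in the parent.
trailing = child.tail           # " includes robins."
```

Note that the trailing text belongs to the child element and not to the para node, which is why the method has to look at each child's .tail.
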
# - - -   P a r a g r a p h . r e a d N o d e

    def readNode(node):
        """Convert para-content to a Paragraph instance."""

First, we'll create an empty Paragraph instance. Then, if there is any initial .text, we add it to that instance as an unmarked phrase.
        #-- 1 --
        # [ result  :=  a new, empty Paragraph ]
        result = Paragraph()

For the method that adds one phrase, see Paragraph.addPhrase().
        #-- 2 --
        # [ if node.text is not None ->
        #     result  :=  result with an unmarked phrase added
        #                 containing (node.text)
        #   else -> I ]
        if  node.text is not None:
            result.addPhrase(None, node.text)

Next, process the children (if any) in order. For each child, add its .text as a marked phrase; then, if there is any .tail text, add that as an unmarked phrase.
        #-- 3 --
        # [ result  :=  result with content added from children,
        #               if any ]
        for  child in node:
            result.addPhrase(child.tag, child.text)
            if  child.tail is not None:
                result.addPhrase(None, child.tail)

        #-- 4 --
        return result
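
To see the whole method at work, here is a self-contained sketch. The Paragraph class in it is a hypothetical stand-in that simply records each phrase as a (tag, text) pair; the real class described in Section 19 may store phrases differently. It also assumes xml.etree.ElementTree in place of lxml, and the sample sentence is invented.

```python
import xml.etree.ElementTree as ET


class Paragraph:
    """Hypothetical stand-in: stores phrases as (tag, text) pairs."""

    def __init__(self):
        self.phrases = []

    def addPhrase(self, tag, text):
        # tag is None for unmarked text, else the marking element's name.
        self.phrases.append((tag, text))

    @staticmethod
    def readNode(node):
        """Convert para-content to a Paragraph instance."""
        result = Paragraph()
        # Step 2: initial unmarked text, if any.
        if node.text is not None:
            result.addPhrase(None, node.text)
        # Step 3: each child's marked text, then its unmarked tail.
        for child in node:
            result.addPhrase(child.tag, child.text)
            if child.tail is not None:
                result.addPhrase(None, child.tail)
        return result


node = ET.fromstring(
    "<para>The <genus>Turdus</genus> thrushes and "
    "<genus>Sialia</genus> bluebirds.</para>")
para = Paragraph.readNode(node)
for tag, text in para.phrases:
    print(repr(tag), repr(text))
```

With this input the sketch records five phrases, alternating unmarked and marked text. Two limitations are visible here: a child with no text (such as an empty element) contributes a phrase whose text is None, and any markup nested inside a child is not visited.
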