19.4. Paragraph.writeNode(): Write as XML

Converting an instance of this class back to XML is one situation where the quirky structure of the lxml package makes us work a little harder. The instance's ._phraseList attribute is the input to this process; the output is a new para element attached to the given parent, possibly with child elements such as genus or cite.

However, because of the way lxml places text into either .text or .tail attributes, we must translate the ._phraseList in this way:

  1. All initial phrases with no markup are concatenated to form the .text of the new para element.

  2. For each element of the phrase list of the form (markup, t), where markup is not None, create a new child element under the para element, whose tag is markup and whose .text is t.

    Any following markup-free phrases are concatenated and stored in the .tail of the new child element.
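The two rules above can be sketched as a short loop. This is only an illustration, not the method developed in this section: the helper name phrases_to_para is hypothetical, and it uses the standard library's xml.etree.ElementTree as a stand-in for lxml, on the assumption that both follow the same .text/.tail model.

```python
import xml.etree.ElementTree as et  # stand-in for lxml.etree (assumption)


def phrases_to_para(parent, phrase_list, tag="para"):
    """Hypothetical helper: build a para element from (markup, text) pairs."""
    para = et.SubElement(parent, tag)
    last_child = None  # most recently created marked-up child, if any
    for markup, text in phrase_list:
        if markup is not None:
            # Rule 2: a marked-up phrase becomes a new child element.
            last_child = et.SubElement(para, markup)
            last_child.text = text
        elif last_child is None:
            # Rule 1: leading markup-free phrases accumulate in para.text.
            para.text = (para.text or "") + text
        else:
            # Markup-free phrases after a child accumulate in its .tail.
            last_child.tail = (last_child.tail or "") + text
    return para


root = et.Element("narrative")
para = phrases_to_para(root, [
    (None, "Read about "),
    ("genus", "Tyrannus couchii"),
    (None, " in "),
    ("cite", "NMOS Journal"),
    (None, " next month."),
])
print(et.tostring(para, encoding="unicode"))
```

The single last_child variable is what decides whether a markup-free phrase belongs in the .text of the paragraph or in the .tail of a child.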

Here's an example. The input text looks like this:

    Read about <genus>Tyrannus couchii</genus> in
    <cite>NMOS Journal</cite> next month.

In the XML representation, the para node has .text='\n  Read about ' and .tail=None. It has two children. The genus child has .text='Tyrannus couchii' and .tail=' in\n    '. The cite child has .text='NMOS Journal' and .tail=' next month.\n  '.

The corresponding ._phraseList would be:

[(None,    '\n  Read about '),
 ('genus', 'Tyrannus couchii'),
 (None,    ' in\n    '),
 ('cite',  'NMOS Journal'),
 (None,    ' next month.\n  ')]
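To see where the parser actually puts each piece of text, one can parse the example and inspect the attributes directly. This sketch again uses the standard library's xml.etree.ElementTree as a stand-in for lxml. The two-space indentation of the source string is an assumption, so the exact whitespace in each value depends on how the real file is indented.

```python
import xml.etree.ElementTree as et  # stand-in for lxml.etree (assumption)

# The example paragraph, with an assumed two-space indentation.
source = (
    "<para>\n"
    "  Read about <genus>Tyrannus couchii</genus> in\n"
    "  <cite>NMOS Journal</cite> next month.\n"
    "</para>"
)

para = et.fromstring(source)
genus, cite = list(para)

# Text before the first child lives in the parent's .text;
# text after each child lives in that child's .tail.
for name, node in (("para", para), ("genus", genus), ("cite", cite)):
    print(name, repr(node.text), repr(node.tail))
```

Note that no text is attached to the para element's .tail here: everything after the last child belongs to that child's .tail.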

The first step is to create the new para element and attach it to the parent.

# - - -   P a r a g r a p h . w r i t e N o d e

    def writeNode(self, parent):
        """Attach a new paragraph to the parent node.
        """
        #-- 1 --
        # [ parent  :=  parent with a new rnc.PARA_N child added
        #   para  :=  that child ]
        para = et.SubElement(parent, rnc.PARA_N)

The logic that converts the ._phraseList back to XML is packaged in a separate method; see Section 19.5, “Paragraph.writeContent(): Write the content of a paragraph”. This is necessary for the special case in which a Narrative instance contains only one Paragraph; in that case, the output is not wrapped in a para element.

        #-- 2 --
        # [ para  :=  para with XML content made from
        #             self._phraseList ]