4.10. descendantText(): Find all contained text nodes

The purpose of this function is to find all the text content within, or under, a given node. This is necessary because the content of a literate node may contain markup that must be filtered out of the output text.

For example, a literate block may contain a DocBook xref element wrapped around a call to some function, so that the reader of the document can click on that link to go to the definition of the called function.

The Element.itertext() method of the etree module does the work of walking the entire subtree, generating each piece of text in document order.
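As a small illustration of that traversal, the sketch below uses the standard library's xml.etree.ElementTree, which also provides itertext(); the etree implementation used elsewhere in this document may differ. The sample markup is made up for the illustration, following the xref example described above.

```python
import xml.etree.ElementTree as et

# Hypothetical literate block: a function call wrapped in an
# xref-style element, as in the example described above.
SAMPLE = ('<programlisting>total = '
          '<xref linkend="sum">sum</xref>(values)</programlisting>')

node = et.fromstring(SAMPLE)

# itertext() yields each piece of text in document order: the
# element's own text, then each child's text followed by that
# child's tail (the text after its closing tag).
pieces = list(node.itertext())

# Joining the pieces gives the text with the markup filtered out.
plain = ''.join(pieces)
```

Here pieces holds three strings, one before the wrapper element, one inside it, and one after it; joining them gives the line of code as the reader would see it.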

# - - -   d e s c e n d a n t T e x t

def descendantText(elt):
    '''Equivalent of XPath descendant-or-self::text()

      [ P: elt is an et.Element
        return the concatenation of all text in or below elt
        as a string ]
    '''
    return ''.join(elt.itertext())
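
A short usage sketch follows. It repeats the function so that it can be run on its own, and it assumes the standard library's xml.etree.ElementTree stands in for the et module named in the docstring; the sample paragraph is hypothetical.

```python
import xml.etree.ElementTree as et

def descendantText(elt):
    '''Return the concatenation of all text in or below elt.'''
    return ''.join(elt.itertext())

# Hypothetical sample: a paragraph holding nested inline markup.
para = et.fromstring(
    '<para>Call <function>read<emphasis>Line</emphasis></function>'
    ' twice.</para>')

# Text of the entire subtree, with every tag stripped.
whole = descendantText(para)

# Text of the <function> subtree only.  The text after its closing
# tag is that element's tail, so it is not part of this result.
inner = descendantText(para[0])
```

With ElementTree, the text that follows an element's closing tag is stored as that element's tail, so calling the function on a child returns only what lies between the child's own start and end tags.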