
9.10. Element.getiterator(): Make an iterator to walk a subtree

Sometimes you want to walk through all or part of a document, looking at all the elements in document order. Similarly, you may want to walk through all or part of a document and look for all the occurrences of a specific kind of element.

The .getiterator() method on an Element instance returns a Python iterator that visits elements in these ways. (Newer releases of lxml deprecate this method in favor of the equivalent .iter() method.) Here is the general form, for an Element instance E:

E.getiterator(tag=None)

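If you omit the argument, the iterator visits every element in the subtree rooted at E, starting with E itself, in preorder. If you pass a tag name, the iterator visits only the elements in that subtree that have that tag.

Preorder traversal of a tree means that we visit the root first, then each of its subtrees from left to right (that is, in document order). This is a depth-first traversal: we visit the root, then its first child, then its first child's first child, and so on until we run out of descendants. Then we move back up to the nearest ancestor that still has unvisited children, and repeat.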

Here is an example showing the traversal of an entire tree. In this tree, the root element a has two children, b and e, and b in turn has two children, c and d.

A preorder traversal of this tree goes in this order: a, b, c, d, e.

>>> from lxml import etree
>>> xml = '''<a><b><c/><d/></b><e/></a>'''
>>> tree = etree.fromstring(xml)
>>> walkAll = tree.getiterator()
>>> for elt in walkAll:
...     print elt.tag,
... 
a b c d e
>>> 
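
The visiting order shown above can be reproduced with a hand-written recursive walk, which may make the definition of preorder more concrete. The following is only a sketch under some assumptions: it is written for Python 3, it uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra packages, and it calls .iter() because .getiterator() is no longer available in recent versions of the standard library. The function name preorder is a hypothetical helper, not part of either library.

```python
import xml.etree.ElementTree as ET


def preorder(element):
    """Yield the element itself, then each of its subtrees, left to right.

    Hypothetical helper written for illustration only.
    """
    yield element
    for child in element:          # children are stored in document order
        yield from preorder(child)


root = ET.fromstring("<a><b><c/><d/></b><e/></a>")

# Order produced by the hand-written walk.
manual_order = [elt.tag for elt in preorder(root)]

# Order produced by the library's own iterator (.iter() with no tag).
builtin_order = [elt.tag for elt in root.iter()]

print(manual_order)
print(builtin_order)
```

Both lists should contain the tags in the same order as the traversal above: a, b, c, d, e.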

In the next example, we pass a tag name to the method so that the iterator visits only the bird elements.

>>> xml = '''<bio>
...   <bird type="Bushtit"/>
...   <butterfly type="Mourning Cloak"/>
...   <bird type="Mew Gull"/>
...   <group site="Water Canyon">
...     <snake type="Sidewinder"/>
...     <bird type="Verdin"/>
...   </group>
...   <bird type="Pygmy Nuthatch"/>
... </bio>'''
>>> root = etree.fromstring(xml)
>>> for elt in root.getiterator('bird'):
...     print elt.get('type', 'Unknown')
... 
Bushtit
Mew Gull
Verdin
Pygmy Nuthatch
>>> 

Note in the above example that the iterator visits the Verdin element even though it is not a direct child of the root element.
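
To make that point concrete, the sketch below compares a whole-subtree walk with a search restricted to direct children. It rests on a few assumptions: it targets Python 3, it uses the standard library's xml.etree.ElementTree in place of lxml, and it uses .iter('bird') as the stand-in for .getiterator('bird'). The variable names are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

xml = """<bio>
  <bird type="Bushtit"/>
  <butterfly type="Mourning Cloak"/>
  <bird type="Mew Gull"/>
  <group site="Water Canyon">
    <snake type="Sidewinder"/>
    <bird type="Verdin"/>
  </group>
  <bird type="Pygmy Nuthatch"/>
</bio>"""
root = ET.fromstring(xml)

# Walks the entire subtree under root, so nested birds are included.
all_birds = [elt.get("type") for elt in root.iter("bird")]

# A bare tag name in findall() matches direct children only.
child_birds = [elt.get("type") for elt in root.findall("bird")]

# Starting the walk at an inner element visits only that part of the document.
group = root.find("group")
group_tags = [elt.tag for elt in group.iter()]

print(all_birds)
print(child_birds)
print(group_tags)
```

Here all_birds should include Verdin while child_birds should not, and group_tags should list only the group element and its two children.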