If you have done XML work using the Document Object Model
(DOM), you will find that the lxml package has a quite
different way of representing documents as trees. In the
DOM, trees are built out of nodes represented as Node instances. Some nodes are Element instances, representing whole elements.
Each Element has an assortment of child
nodes of various types: Element nodes for
its element children; Attribute nodes for
its attributes; and Text nodes for textual
content.
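As a rough illustration of the DOM view just described, the following sketch uses Python's standard xml.dom.minidom module to list the child nodes of a short XHTML paragraph. The variable names (XHTML, children, href) are chosen only for this example. Note that in minidom, attribute nodes are reached through the element itself (for example with getAttributeNode()) rather than appearing in its childNodes list.

```python
from xml.dom import Node, minidom

# A short XHTML paragraph used only for illustration.
XHTML = ('<p>To find out <em>more</em>, see the\n'
         '<a href="http://www.w3.org/XML">standard</a>.</p>')

doc = minidom.parseString(XHTML)
p = doc.documentElement

# Readable names for the two node types that occur among the children.
KIND = {Node.ELEMENT_NODE: "Element", Node.TEXT_NODE: "Text"}

# For element children record the tag name; for text children, the text.
children = [
    (KIND[n.nodeType], n.nodeName if n.nodeType == Node.ELEMENT_NODE else n.data)
    for n in p.childNodes
]
for kind, value in children:
    print(kind, repr(value))

# The href attribute is itself a node, but it is not one of the child nodes.
link = p.getElementsByTagName("a")[0]
href = link.getAttributeNode("href")
print(href.nodeType == Node.ATTRIBUTE_NODE, href.value)
```

The paragraph element ends up with five children that alternate between Text and Element nodes, which is the mixed-node structure the DOM description refers to.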
Here is a small fragment of XHTML, and its representation as a DOM tree:

<p>To find out <em>more</em>, see the
<a href="http://www.w3.org/XML">standard</a>.</p>
The above diagram shows the conceptual structure of the
XML. The lxml view of an XML document, by contrast, builds
a tree of only one node type: the Element.
The main difference between the ElementTree view used in lxml and
the classical DOM view is how text is associated with
elements.
An instance of lxml's Element class contains
these attributes:
.tag
The name of the element, such as "p"
for a paragraph or "em" for emphasis.
.text
The text inside the element, if any, up to
the first child element. This attribute
is None if the element is empty or has
no text before the first child element.
.tail
The text following the element. This is the most unusual departure.
In the DOM model, any text following an element E is associated
with the parent of E; in lxml, that text is
considered the “tail” of E.
.attrib
A Python dictionary containing the element's XML
attribute names and their corresponding values. For
example, for the element “<h2 class="arch" id="N15">”, that
element's .attrib would be the dictionary
“{"class": "arch", "id": "N15"}”.
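The four attributes described above can be inspected directly. This is a minimal sketch: the enclosing div element, the heading text "Arches", and the trailing text are invented for illustration, and the import falls back to the standard library's xml.etree.ElementTree (which exposes the same four attributes) in case lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the standard-library ElementTree is an acceptable
    # stand-in here, since it has the same .tag/.text/.tail/.attrib.
    import xml.etree.ElementTree as etree

# Hypothetical sample document built around the h2 element from the text.
root = etree.fromstring(
    '<div><h2 class="arch" id="N15">Arches</h2> and more</div>')
h2 = root[0]

print(h2.tag)            # the element name
print(h2.text)           # text inside h2, before any child element
print(h2.tail)           # text after </h2>, up to the next tag
print(dict(h2.attrib))   # attribute names mapped to values

# The div has no text before its first child, so its .text is None.
print(root.text)
```

Here the text " and more" belongs to the h2 element's .tail, even though it appears inside the div and outside the h2.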
To access sub-elements, treat an element as a list.
For example, if node is an Element instance, node[0] is
the first sub-element of node.
If node doesn't have any sub-elements,
this operation will raise an IndexError exception.
You can find out the number of sub-elements using the
len() function. For example, if
node has five children, len(node) will return a value of 5.
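The list-like behavior described above might look like this in practice. The ul/li document is a made-up example, and the import again falls back to the standard library's ElementTree if lxml is unavailable.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: standard-library fallback with the same list-like API.
    import xml.etree.ElementTree as etree

# Hypothetical sample: a list element with three children.
node = etree.fromstring("<ul><li>one</li><li>two</li><li>three</li></ul>")

count = len(node)      # number of sub-elements
first = node[0]        # first sub-element
last = node[-1]        # negative indexes work as they do for lists
texts = [child.text for child in node]   # iteration visits each child

# An element with no sub-elements raises IndexError when indexed.
try:
    first[0]
    raised = False
except IndexError:
    raised = True

print(count, first.text, last.text, texts, raised)
```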
One advantage of the lxml view is that a tree is now made of
only one type of node: each node is an Element instance. Here is our XML fragment again, and a picture
of its representation in lxml.

<p>To find out <em>more</em>, see the
<a href="http://www.w3.org/XML">standard</a>.</p>
Notice that in the lxml view, the text ", see
the\n" (which includes the newline) is contained in
the .tail attribute of the em
element, not associated with the p element
as it would be in the DOM view. Also, the "." at the end of the paragraph is in the .tail attribute of the a (link) element.
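To check where each piece of text ends up, the fragment can be parsed and its .text and .tail attributes examined. This is a sketch under the assumption that the fragment contains a newline after "see the", as the description above indicates; the variable names are illustrative, and the import falls back to the standard library's ElementTree if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: standard-library fallback with the same text/tail model.
    import xml.etree.ElementTree as etree

XHTML = ('<p>To find out <em>more</em>, see the\n'
         '<a href="http://www.w3.org/XML">standard</a>.</p>')

p = etree.fromstring(XHTML)
em, a = p[0], p[1]

print(repr(p.text))    # text in p before its first child
print(repr(em.text))   # text inside em
print(repr(em.tail))   # text after </em>, including the newline
print(repr(a.text))    # text inside the link
print(repr(a.tail))    # the period after </a>
print(a.get("href"))   # attribute lookup by name

# Walking every .text and .tail in document order recovers the full text.
full_text = "".join(p.itertext())
print(repr(full_text))
```

Every character of the paragraph is accounted for by some element's .text or .tail, so no separate text node type is needed.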
Now that you know how XML is represented in lxml, consider its
three general application areas.