lxml.objectify module

The lxml.objectify module implements a Python object API for XML. It is based on lxml.etree.

class lxml.objectify.BoolElement

Bases: IntElement

Boolean type based on string values: ‘true’ or ‘false’.

Note that this inherits from IntElement to mimic the behaviour of Python’s bool type.
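A minimal sketch of how a BoolElement typically shows up in practice, assuming the default objectify type inference is active. The tag names `enabled` and `debug` are made up for illustration.

```python
from lxml import objectify

# Parse a small document; objectify infers the element type from the text.
root = objectify.fromstring(
    "<root><enabled>true</enabled><debug>false</debug></root>"
)

enabled = root.enabled
debug = root.debug

# The children are BoolElement instances, which are also IntElement instances.
is_bool = isinstance(enabled, objectify.BoolElement)
is_int = isinstance(enabled, objectify.IntElement)

# pyval exposes the parsed Python value.
enabled_value = enabled.pyval
debug_value = debug.pyval
```

Here `enabled_value` and `debug_value` are plain Python booleans parsed from the element text.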

_init(self): Called after object initialisation. Custom subclasses may override this if they recursively call _init() in the superclasses.

_setText(s): For use in subclasses only. Don’t use unless you know what you are doing.

_setValueParser(function)

Set the function that parses the Python value from a string.

Do not use this unless you know what you are doing.

addattr(self, tag, value)

Add a child value to the element.

As opposed to append(), it sets a data value, not an element.
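The difference from append() can be sketched as follows. The tag name `item` is hypothetical; the point is that addattr() takes a plain value and creates the child element for it.

```python
from lxml import objectify

root = objectify.Element("root")

# Each call appends a new <item> child holding the given value.
root.addattr("item", 1)
root.addattr("item", 2)

child_count = root.countchildren()

# Iterating over root.item walks all <item> siblings with the same tag.
values = [item.pyval for item in root.item]
```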

addnext(self, element)

Adds the element as a following sibling directly after this element.

This is normally used to set a processing instruction or comment after the root node of a document. Note that tail text is automatically discarded when adding at the root level.

addprevious(self, element)

Adds the element as a preceding sibling directly before this element.

This is normally used to set a processing instruction or comment before the root node of a document. Note that tail text is automatically discarded when adding at the root level.
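As an illustration of the root-level use case for addprevious() and addnext(), the sketch below places a comment before the root element and a processing instruction after it. The comment text and the `done` target are invented for the example.

```python
from lxml import etree, objectify

root = objectify.fromstring("<root><a>1</a></root>")

# Place a comment before the root element and a processing instruction after it.
root.addprevious(etree.Comment("generated file"))
root.addnext(etree.ProcessingInstruction("done", "ok"))

# Serialise the whole document, not just the root element, to see the siblings.
serialized = etree.tostring(root.getroottree()).decode("ascii")
```

Serialising only `root` would leave out the root-level siblings, which is why the sketch goes through getroottree().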

append(self, element): Adds a subelement to the end of this element.

clear(self, keep_tail=False)

Resets an element. This function removes all subelements, clears all attributes and sets the text and tail properties to None.

Pass keep_tail=True to leave the tail text untouched.
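A small sketch of clear() with keep_tail=True, assuming an lxml version that supports the keep_tail argument. The document content is made up.

```python
from lxml import objectify

root = objectify.fromstring('<root><item id="7">x</item>after</root>')
item = root.item

# Remove text and attributes, but keep the text that follows the end tag.
item.clear(keep_tail=True)

remaining_attributes = dict(item.attrib)
remaining_text = item.text
remaining_tail = item.tail
```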

countchildren(self): Return the number of children of this element, regardless of their name.

cssselect(expr, *, translator)

Run the CSS expression on this element and its children, returning a list of the results.

Equivalent to lxml.cssselect.CSSSelect(expr)(self) – note that pre-compiling the expression can provide a substantial speedup.

descendantpaths(self, prefix=None): Returns a list of object path expressions for all descendants.
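For illustration, the returned path expressions are dotted strings that can be fed back into objectify.ObjectPath. The sketch below uses an invented document.

```python
from lxml import objectify

root = objectify.fromstring("<root><a><b>1</b></a><c>2</c></root>")

# Collect the object path expressions of the subtree.
paths = root.descendantpaths()

# A path expression can be resolved again with ObjectPath.
b_value = objectify.ObjectPath("root.a.b")(root).pyval
```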

extend(self, elements): Extends the current children by the elements in the iterable.

find(self, path, namespaces=None)

Finds the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findall(self, path, namespaces=None)

Finds all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findtext(self, path, default=None, namespaces=None)

Finds text for the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.
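The three find methods share the same path and namespaces handling, which can be sketched as below. The namespace URI `urn:example` and the prefix `n` are assumptions made for the example.

```python
from lxml import objectify

xml = '<root xmlns:n="urn:example"><n:item>a</n:item><n:item>b</n:item></root>'
root = objectify.fromstring(xml)

# Prefix-to-namespace mapping used to resolve "n:" in the path expressions.
ns = {"n": "urn:example"}

first = root.find("n:item", namespaces=ns)
all_items = root.findall("n:item", namespaces=ns)
first_text = root.findtext("n:item", namespaces=ns)

# findtext() falls back to the default when nothing matches.
missing_text = root.findtext("n:missing", default="n/a", namespaces=ns)
```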

get(self, key, default=None): Gets an element attribute.

getchildren(self): Returns a sequence of all direct children. The elements are returned in document order.

getiterator(self, tag=None, *tags)

Returns a sequence or iterator of all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags, see iter.

Deprecated: Note that this method is deprecated as of ElementTree 1.3 and lxml 2.0. It returns an iterator in lxml, which diverges from the original ElementTree behaviour. If you want an efficient iterator, use the element.iter() method instead. You should only use this method in new code if you require backwards compatibility with older versions of lxml or ElementTree.

getnext(self): Returns the following sibling of this element or None.

getparent(self): Returns the parent of this element or None for the root element.

getprevious(self): Returns the preceding sibling of this element or None.

getroottree(self)

Return an ElementTree for the root node of the document that contains this element.

This is the same as following element.getparent() up the tree until it returns None (for the root element) and then building an ElementTree for the last parent that was returned.

index(self, child, start=None, stop=None)

Find the position of the child within the parent.

This method is not part of the original ElementTree API.

insert(self, index, element): Inserts a subelement at the given position in this element.

items(self): Gets element attributes, as a sequence. The attributes are returned in an arbitrary order.

iter(self, tag=None, *tags)

Iterate over all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags: pass "{ns}localname" as tag. Either or both of ns and localname can be * for a wildcard; ns can be empty for no namespace. "localname" is equivalent to "{}localname" (i.e. no namespace) but "*" is "{*}*" (any or no namespace), not "{}*".

You can also pass the Element, Comment, ProcessingInstruction and Entity factory functions to look only for the specific element type.

Passing multiple tags (or a sequence of tags) instead of a single tag will let the iterator return all elements matching any of these tags, in document order.

iterancestors(self, tag=None, *tags)

Iterate over the ancestors of this element (from parent to parent).

Can be restricted to find only elements with specific tags, see iter.

iterchildren(self, tag=None, *tags, reversed=False)

Iterate over the children of this element.

As opposed to using normal iteration on this element, the returned elements can be reversed with the ‘reversed’ keyword and restricted to find only elements with specific tags, see iter.
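A short sketch of the `reversed` keyword and the tag filter of iterchildren(), using made-up tag names.

```python
from lxml import objectify

root = objectify.fromstring("<root><a>1</a><b>2</b><a>3</a><c>4</c></root>")

# Children in document order and in reverse document order.
forward = [child.tag for child in root.iterchildren()]
backward = [child.tag for child in root.iterchildren(reversed=True)]

# Restrict the iteration to children with a specific tag.
only_a = [child.pyval for child in root.iterchildren("a")]
```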

iterdescendants(self, tag=None, *tags)

Iterate over the descendants of this element in document order.

As opposed to el.iter(), this iterator does not yield the element itself. The returned elements can be restricted to find only elements with specific tags, see iter.

iterfind(self, path, namespaces=None)

Iterates over all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

itersiblings(self, tag=None, *tags, preceding=False)

Iterate over the following or preceding siblings of this element.

The direction is determined by the ‘preceding’ keyword which defaults to False, i.e. forward iteration over the following siblings. When True, the iterator yields the preceding siblings in reverse document order, i.e. starting right before the current element and going backwards.

Can be restricted to find only elements with specific tags, see iter.
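The effect of the `preceding` keyword can be illustrated with a flat list of siblings. The element names are invented.

```python
from lxml import objectify

root = objectify.fromstring("<root><a>1</a><b>2</b><c>3</c><d>4</d></root>")
start = root.c

# Default: following siblings in document order.
following = [sibling.tag for sibling in start.itersiblings()]

# preceding=True: preceding siblings, nearest first.
preceding = [sibling.tag for sibling in start.itersiblings(preceding=True)]
```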

itertext(self, tag=None, *tags, with_tail=True)

Iterates over the text content of a subtree.

You can pass tag names to restrict text content to specific elements, see iter.

You can set the with_tail keyword argument to False to skip over tail text.
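As a sketch of what `with_tail` changes, consider mixed content where a child element is followed by tail text. The markup is made up for the example.

```python
from lxml import objectify

root = objectify.fromstring("<p>hello<b>bold</b>tail</p>")

# By default, the tail text of descendants is part of the result.
with_tails = list(root.itertext())

# with_tail=False skips the text that follows the </b> end tag.
without_tails = list(root.itertext(with_tail=False))
```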

keys(self): Gets a list of attribute names. The names are returned in an arbitrary order (just like for an ordinary Python dictionary).

makeelement(self, _tag, attrib=None, nsmap=None, **_extra): Creates a new element associated with the same document.

remove(self, element): Removes a matching subelement. Unlike the find methods, this method compares elements based on identity, not on tag value or contents.

replace(self, old_element, new_element): Replaces a subelement with the element passed as second argument.

set(self, key, value): Sets an element attribute. In HTML documents (not XML or XHTML), the value None is allowed and creates an attribute without value (just the attribute name).

values(self): Gets element attribute values as a sequence of strings. The attributes are returned in an arbitrary order.

xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables): Evaluate an XPath expression using the element as context node.
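The keyword arguments collected in `**_variables` become XPath variables. A minimal sketch, where the variable name `threshold` and the document are assumptions for illustration:

```python
from lxml import objectify

root = objectify.fromstring(
    "<root><item>1</item><item>2</item><item>3</item></root>"
)

# $threshold in the expression is bound to the keyword argument.
matches = root.xpath("item[. > $threshold]", threshold=1)
values = [el.pyval for el in matches]

# Expressions that do not select nodes return plain values (here a float).
count = root.xpath("count(item)")
```

Passing values as variables avoids building the expression by string formatting.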

attrib: Element attribute dictionary. Where possible, use get(), set(), keys(), values() and items() to access element attributes.

base

The base URI of the Element (xml:base or HTML base URL). None if the base URI is unknown.

Note that the value depends on the URL of the document that holds the Element if there is no xml:base attribute on the Element or its ancestors.

Setting this property will set an xml:base attribute on the Element, regardless of the document type (XML or HTML).

nsmap

Namespace prefix->URI mapping known in the context of this Element. This includes all namespace declarations of the parents.

Note that changing the returned dict has no effect on the Element.
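Both points about nsmap can be sketched as follows: declarations made on an ancestor are visible on a descendant, and the returned dict is only a copy. The namespace URIs are invented.

```python
from lxml import objectify

xml = '<root xmlns:n="urn:example"><n:child><leaf>1</leaf></n:child></root>'
root = objectify.fromstring(xml)
leaf = root.find("{urn:example}child").find("leaf")

# The declaration made on <root> is visible from the nested <leaf>.
inherited = leaf.nsmap

# Modifying the returned dict does not change the element.
scratch = leaf.nsmap
scratch["x"] = "urn:other"
still_unchanged = "x" not in leaf.nsmap
```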

prefix: Namespace prefix or None.

pyval

sourceline: Original line number as found by the parser or None if unknown.

tag: Element tag

tail: Text after this element’s end tag, but before the next sibling element’s start tag. This is either a string or the value None, if there was no text.

text

class lxml.objectify.ElementMaker(self, namespace=None, nsmap=None, annotate=True, makeelement=None)

Bases: object

An ElementMaker that can be used for constructing trees.

Example:

>>> from lxml.objectify import ElementMaker
>>> M = ElementMaker(annotate=False)
>>> attributes = {'class': 'par'}
>>> html = M.html( M.body( M.p('hello', attributes, M.br, 'objectify', style="font-weight: bold") ) )

>>> from lxml.etree import tostring
>>> print(tostring(html, method='html').decode('ascii'))
<html><body><p style="font-weight: bold" class="par">hello<br>objectify</p></body></html>

To create tags that are not valid Python identifiers, call the factory directly and pass the tag name as first argument:

>>> root = M('tricky-tag', 'some text')
>>> print(root.tag)
tricky-tag
>>> print(root.text)
some text

Note that this module has a predefined ElementMaker instance called E.

class lxml.objectify.FloatElement

Bases: NumberElement
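A minimal sketch of a FloatElement in use, assuming the default objectify type inference. The tag name `price` is hypothetical.

```python
from lxml import objectify

root = objectify.fromstring("<root><price>1.5</price></root>")
price = root.price

# Text that parses as a float, but not as an int, yields a FloatElement.
is_float = isinstance(price, objectify.FloatElement)

# Number elements take part in arithmetic like the Python value they wrap.
value = price.pyval
total = price * 2
```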

_init(self): Called after object initialisation. Custom subclasses may override this if they recursively call _init() in the superclasses.

_setText(s): For use in subclasses only. Don’t use unless you know what you are doing.

_setValueParser(function)

Set the function that parses the Python value from a string.

Do not use this unless you know what you are doing.

addattr(self, tag, value)

Add a child value to the element.

As opposed to append(), it sets a data value, not an element.

addnext(self, element)

Adds the element as a following sibling directly after this element.

This is normally used to set a processing instruction or comment after the root node of a document. Note that tail text is automatically discarded when adding at the root level.

addprevious(self, element)

Adds the element as a preceding sibling directly before this element.

This is normally used to set a processing instruction or comment before the root node of a document. Note that tail text is automatically discarded when adding at the root level.

append(self, element): Adds a subelement to the end of this element.

clear(self, keep_tail=False)

Resets an element. This function removes all subelements, clears all attributes and sets the text and tail properties to None.

Pass keep_tail=True to leave the tail text untouched.

countchildren(self): Return the number of children of this element, regardless of their name.

cssselect(expr, *, translator)

Run the CSS expression on this element and its children, returning a list of the results.

Equivalent to lxml.cssselect.CSSSelect(expr)(self) – note that pre-compiling the expression can provide a substantial speedup.

descendantpaths(self, prefix=None): Returns a list of object path expressions for all descendants.

extend(self, elements): Extends the current children by the elements in the iterable.

find(self, path, namespaces=None)

Finds the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findall(self, path, namespaces=None)

Finds all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findtext(self, path, default=None, namespaces=None)

Finds text for the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

get(self, key, default=None): Gets an element attribute.

getchildren(self): Returns a sequence of all direct children. The elements are returned in document order.

getiterator(self, tag=None, *tags)

Returns a sequence or iterator of all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags, see iter.

Deprecated: Note that this method is deprecated as of ElementTree 1.3 and lxml 2.0. It returns an iterator in lxml, which diverges from the original ElementTree behaviour. If you want an efficient iterator, use the element.iter() method instead. You should only use this method in new code if you require backwards compatibility with older versions of lxml or ElementTree.

getnext(self): Returns the following sibling of this element or None.

getparent(self): Returns the parent of this element or None for the root element.

getprevious(self): Returns the preceding sibling of this element or None.

getroottree(self)

Return an ElementTree for the root node of the document that contains this element.

This is the same as following element.getparent() up the tree until it returns None (for the root element) and then building an ElementTree for the last parent that was returned.

index(self, child, start=None, stop=None)

Find the position of the child within the parent.

This method is not part of the original ElementTree API.

insert(self, index, element): Inserts a subelement at the given position in this element.

items(self): Gets element attributes, as a sequence. The attributes are returned in an arbitrary order.

iter(self, tag=None, *tags)

Iterate over all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags: pass "{ns}localname" as tag. Either or both of ns and localname can be * for a wildcard; ns can be empty for no namespace. "localname" is equivalent to "{}localname" (i.e. no namespace) but "*" is "{*}*" (any or no namespace), not "{}*".

You can also pass the Element, Comment, ProcessingInstruction and Entity factory functions to look only for the specific element type.

Passing multiple tags (or a sequence of tags) instead of a single tag will let the iterator return all elements matching any of these tags, in document order.

iterancestors(self, tag=None, *tags)

Iterate over the ancestors of this element (from parent to parent).

Can be restricted to find only elements with specific tags, see iter.

iterchildren(self, tag=None, *tags, reversed=False)

Iterate over the children of this element.

As opposed to using normal iteration on this element, the returned elements can be reversed with the ‘reversed’ keyword and restricted to find only elements with specific tags, see iter.

iterdescendants(self, tag=None, *tags)

Iterate over the descendants of this element in document order.

As opposed to el.iter(), this iterator does not yield the element itself. The returned elements can be restricted to find only elements with specific tags, see iter.

iterfind(self, path, namespaces=None)

Iterates over all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

itersiblings(self, tag=None, *tags, preceding=False)

Iterate over the following or preceding siblings of this element.

The direction is determined by the ‘preceding’ keyword which defaults to False, i.e. forward iteration over the following siblings. When True, the iterator yields the preceding siblings in reverse document order, i.e. starting right before the current element and going backwards.

Can be restricted to find only elements with specific tags, see iter.

itertext(self, tag=None, *tags, with_tail=True)

Iterates over the text content of a subtree.

You can pass tag names to restrict text content to specific elements, see iter.

You can set the with_tail keyword argument to False to skip over tail text.

keys(self): Gets a list of attribute names. The names are returned in an arbitrary order (just like for an ordinary Python dictionary).

makeelement(self, _tag, attrib=None, nsmap=None, **_extra): Creates a new element associated with the same document.

remove(self, element): Removes a matching subelement. Unlike the find methods, this method compares elements based on identity, not on tag value or contents.

replace(self, old_element, new_element): Replaces a subelement with the element passed as second argument.

set(self, key, value): Sets an element attribute. In HTML documents (not XML or XHTML), the value None is allowed and creates an attribute without value (just the attribute name).

values(self): Gets element attribute values as a sequence of strings. The attributes are returned in an arbitrary order.

xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables): Evaluate an XPath expression using the element as context node.

attrib: Element attribute dictionary. Where possible, use get(), set(), keys(), values() and items() to access element attributes.

base

The base URI of the Element (xml:base or HTML base URL). None if the base URI is unknown.

Note that the value depends on the URL of the document that holds the Element if there is no xml:base attribute on the Element or its ancestors.

Setting this property will set an xml:base attribute on the Element, regardless of the document type (XML or HTML).

nsmap

Namespace prefix->URI mapping known in the context of this Element. This includes all namespace declarations of the parents.

Note that changing the returned dict has no effect on the Element.

prefix: Namespace prefix or None.

pyval

sourceline: Original line number as found by the parser or None if unknown.

tag: Element tag

tail: Text after this element’s end tag, but before the next sibling element’s start tag. This is either a string or the value None, if there was no text.

text

class lxml.objectify.IntElement

Bases: NumberElement
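A minimal sketch of an IntElement in use, assuming the default objectify type inference. The tag name `quantity` is hypothetical.

```python
from lxml import objectify

root = objectify.fromstring("<root><quantity>42</quantity></root>")
quantity = root.quantity

# Text that parses as an integer yields an IntElement.
is_int = isinstance(quantity, objectify.IntElement)

# The element compares and computes like the integer it wraps.
value = quantity.pyval
incremented = quantity + 1
```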

_init(self): Called after object initialisation. Custom subclasses may override this if they recursively call _init() in the superclasses.

_setText(s): For use in subclasses only. Don’t use unless you know what you are doing.

_setValueParser(function)

Set the function that parses the Python value from a string.

Do not use this unless you know what you are doing.

addattr(self, tag, value)

Add a child value to the element.

As opposed to append(), it sets a data value, not an element.

addnext(self, element)

Adds the element as a following sibling directly after this element.

This is normally used to set a processing instruction or comment after the root node of a document. Note that tail text is automatically discarded when adding at the root level.

addprevious(self, element)

Adds the element as a preceding sibling directly before this element.

This is normally used to set a processing instruction or comment before the root node of a document. Note that tail text is automatically discarded when adding at the root level.

append(self, element): Adds a subelement to the end of this element.

clear(self, keep_tail=False)

Resets an element. This function removes all subelements, clears all attributes and sets the text and tail properties to None.

Pass keep_tail=True to leave the tail text untouched.

countchildren(self): Return the number of children of this element, regardless of their name.

cssselect(expr, *, translator)

Run the CSS expression on this element and its children, returning a list of the results.

Equivalent to lxml.cssselect.CSSSelect(expr)(self) – note that pre-compiling the expression can provide a substantial speedup.

descendantpaths(self, prefix=None): Returns a list of object path expressions for all descendants.

extend(self, elements): Extends the current children by the elements in the iterable.

find(self, path, namespaces=None)

Finds the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findall(self, path, namespaces=None)

Finds all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findtext(self, path, default=None, namespaces=None)

Finds text for the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

get(self, key, default=None): Gets an element attribute.

getchildren(self): Returns a sequence of all direct children. The elements are returned in document order.

getiterator(self, tag=None, *tags)

Returns a sequence or iterator of all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags, see iter.

Deprecated: Note that this method is deprecated as of ElementTree 1.3 and lxml 2.0. It returns an iterator in lxml, which diverges from the original ElementTree behaviour. If you want an efficient iterator, use the element.iter() method instead. You should only use this method in new code if you require backwards compatibility with older versions of lxml or ElementTree.

getnext(self): Returns the following sibling of this element or None.

getparent(self): Returns the parent of this element or None for the root element.

getprevious(self): Returns the preceding sibling of this element or None.

getroottree(self)

Return an ElementTree for the root node of the document that contains this element.

This is the same as following element.getparent() up the tree until it returns None (for the root element) and then building an ElementTree for the last parent that was returned.

index(self, child, start=None, stop=None)

Find the position of the child within the parent.

This method is not part of the original ElementTree API.

insert(self, index, element): Inserts a subelement at the given position in this element.

items(self): Gets element attributes, as a sequence. The attributes are returned in an arbitrary order.

iter(self, tag=None, *tags)

Iterate over all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags: pass "{ns}localname" as tag. Either or both of ns and localname can be * for a wildcard; ns can be empty for no namespace. "localname" is equivalent to "{}localname" (i.e. no namespace) but "*" is "{*}*" (any or no namespace), not "{}*".

You can also pass the Element, Comment, ProcessingInstruction and Entity factory functions to look only for the specific element type.

Passing multiple tags (or a sequence of tags) instead of a single tag will let the iterator return all elements matching any of these tags, in document order.

iterancestors(self, tag=None, *tags)

Iterate over the ancestors of this element (from parent to parent).

Can be restricted to find only elements with specific tags, see iter.

iterchildren(self, tag=None, *tags, reversed=False)

Iterate over the children of this element.

As opposed to using normal iteration on this element, the returned elements can be reversed with the ‘reversed’ keyword and restricted to find only elements with specific tags, see iter.

iterdescendants(self, tag=None, *tags)

Iterate over the descendants of this element in document order.

As opposed to el.iter(), this iterator does not yield the element itself. The returned elements can be restricted to find only elements with specific tags, see iter.

iterfind(self, path, namespaces=None)

Iterates over all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

itersiblings(self, tag=None, *tags, preceding=False)

Iterate over the following or preceding siblings of this element.

The direction is determined by the ‘preceding’ keyword which defaults to False, i.e. forward iteration over the following siblings. When True, the iterator yields the preceding siblings in reverse document order, i.e. starting right before the current element and going backwards.

Can be restricted to find only elements with specific tags, see iter.

itertext(self, tag=None, *tags, with_tail=True)

Iterates over the text content of a subtree.

You can pass tag names to restrict text content to specific elements, see iter.

You can set the with_tail keyword argument to False to skip over tail text.

keys(self): Gets a list of attribute names. The names are returned in an arbitrary order (just like for an ordinary Python dictionary).

makeelement(self, _tag, attrib=None, nsmap=None, **_extra): Creates a new element associated with the same document.

remove(self, element): Removes a matching subelement. Unlike the find methods, this method compares elements based on identity, not on tag value or contents.

replace(self, old_element, new_element): Replaces a subelement with the element passed as second argument.

set(self, key, value): Sets an element attribute. In HTML documents (not XML or XHTML), the value None is allowed and creates an attribute without value (just the attribute name).

values(self): Gets element attribute values as a sequence of strings. The attributes are returned in an arbitrary order.

xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables): Evaluate an XPath expression using the element as context node.

attrib: Element attribute dictionary. Where possible, use get(), set(), keys(), values() and items() to access element attributes.

base

The base URI of the Element (xml:base or HTML base URL). None if the base URI is unknown.

Note that the value depends on the URL of the document that holds the Element if there is no xml:base attribute on the Element or its ancestors.

Setting this property will set an xml:base attribute on the Element, regardless of the document type (XML or HTML).

nsmap

Namespace prefix->URI mapping known in the context of this Element. This includes all namespace declarations of the parents.

Note that changing the returned dict has no effect on the Element.

prefix: Namespace prefix or None.

pyval

sourceline: Original line number as found by the parser or None if unknown.

tag: Element tag

tail: Text after this element’s end tag, but before the next sibling element’s start tag. This is either a string or the value None, if there was no text.

text

class lxml.objectify.LongElement

Bases: NumberElement

_init(self): Called after object initialisation. Custom subclasses may override this if they recursively call _init() in the superclasses.

_setText(s): For use in subclasses only. Don’t use unless you know what you are doing.

_setValueParser(function)

Set the function that parses the Python value from a string.

Do not use this unless you know what you are doing.

addattr(self, tag, value)

Add a child value to the element.

As opposed to append(), it sets a data value, not an element.

addnext(self, element)

Adds the element as a following sibling directly after this element.

This is normally used to set a processing instruction or comment after the root node of a document. Note that tail text is automatically discarded when adding at the root level.

addprevious(self, element)

Adds the element as a preceding sibling directly before this element.

This is normally used to set a processing instruction or comment before the root node of a document. Note that tail text is automatically discarded when adding at the root level.

append(self, element): Adds a subelement to the end of this element.

clear(self, keep_tail=False)

Resets an element. This function removes all subelements, clears all attributes and sets the text and tail properties to None.

Pass keep_tail=True to leave the tail text untouched.

countchildren(self): Return the number of children of this element, regardless of their name.

cssselect(expr, *, translator)

Run the CSS expression on this element and its children, returning a list of the results.

Equivalent to lxml.cssselect.CSSSelect(expr)(self) – note that pre-compiling the expression can provide a substantial speedup.

descendantpaths(self, prefix=None): Returns a list of object path expressions for all descendants.

extend(self, elements): Extends the current children by the elements in the iterable.

find(self, path, namespaces=None)

Finds the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findall(self, path, namespaces=None)

Finds all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findtext(self, path, default=None, namespaces=None)

Finds text for the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

get(self, key, default=None): Gets an element attribute.

getchildren(self): Returns a sequence of all direct children. The elements are returned in document order.

getiterator(self, tag=None, *tags)

Returns a sequence or iterator of all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags, see iter.

Deprecated: Note that this method is deprecated as of ElementTree 1.3 and lxml 2.0. It returns an iterator in lxml, which diverges from the original ElementTree behaviour. If you want an efficient iterator, use the element.iter() method instead. You should only use this method in new code if you require backwards compatibility with older versions of lxml or ElementTree.

getnext(self): Returns the following sibling of this element or None.

getparent(self): Returns the parent of this element or None for the root element.

getprevious(self): Returns the preceding sibling of this element or None.

getroottree(self)

Return an ElementTree for the root node of the document that contains this element.

This is the same as following element.getparent() up the tree until it returns None (for the root element) and then building an ElementTree for the last parent that was returned.

index(self, child, start=None, stop=None)

Find the position of the child within the parent.

This method is not part of the original ElementTree API.

insert(self, index, element): Inserts a subelement at the given position in this element.

items(self): Gets element attributes, as a sequence. The attributes are returned in an arbitrary order.

iter(self, tag=None, *tags)

Iterate over all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags: pass "{ns}localname" as tag. Either or both of ns and localname can be * for a wildcard; ns can be empty for no namespace. "localname" is equivalent to "{}localname" (i.e. no namespace) but "*" is "{*}*" (any or no namespace), not "{}*".

You can also pass the Element, Comment, ProcessingInstruction and Entity factory functions to look only for the specific element type.

Passing multiple tags (or a sequence of tags) instead of a single tag will let the iterator return all elements matching any of these tags, in document order.
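The tag matching rules above are easier to see side by side. The following sketch uses a hypothetical namespace URI (urn:example) and prefix (n).

```python
from lxml import objectify

root = objectify.fromstring(
    '<root xmlns:n="urn:example">'
    "<n:x/><x/><y><n:x/></y>"
    "</root>"
)

# "x" means "{}x": only elements without a namespace.
plain_x = [el.tag for el in root.iter("x")]

# A fully qualified tag matches only that namespace.
namespaced_x = [el.tag for el in root.iter("{urn:example}x")]

# "{*}x" matches the local name x in any namespace or none.
any_ns_x = [el.tag for el in root.iter("{*}x")]

# Several tags: elements matching any of them, in document order.
x_or_y = [el.tag for el in root.iter("x", "y")]
```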

iterancestors(self, tag=None, *tags)

Iterate over the ancestors of this element (from parent to parent).

Can be restricted to find only elements with specific tags, see iter.

iterchildren(self, tag=None, *tags, reversed=False)

Iterate over the children of this element.

As opposed to using normal iteration on this element, the returned elements can be reversed with the ‘reversed’ keyword and restricted to find only elements with specific tags, see iter.
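A minimal sketch of iterchildren() on a made-up document. Note that in lxml.objectify, iterating an element directly walks the siblings that share its tag rather than its children, so iterchildren() is the explicit way to visit children.

```python
from lxml import objectify

root = objectify.fromstring("<root><a/><b/><a/><c/></root>")

forward = [el.tag for el in root.iterchildren()]
backward = [el.tag for el in root.iterchildren(reversed=True)]
only_a_and_c = [el.tag for el in root.iterchildren("a", "c")]

# Plain iteration on an objectified element: same-tag siblings only.
same_tag_siblings = [el.tag for el in root.a]
```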

iterdescendants(self, tag=None, *tags)

Iterate over the descendants of this element in document order.

As opposed to el.iter(), this iterator does not yield the element itself. The returned elements can be restricted to find only elements with specific tags, see iter.

iterfind(self, path, namespaces=None)

Iterates over all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

itersiblings(self, tag=None, *tags, preceding=False)

Iterate over the following or preceding siblings of this element.

The direction is determined by the ‘preceding’ keyword which defaults to False, i.e. forward iteration over the following siblings. When True, the iterator yields the preceding siblings in reverse document order, i.e. starting right before the current element and going backwards.

Can be restricted to find only elements with specific tags, see iter.
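The effect of the 'preceding' keyword can be sketched as follows, using a hypothetical four-child document.

```python
from lxml import objectify

root = objectify.fromstring("<root><a/><b/><c/><d/></root>")
c = root.c

# Default direction: siblings after <c>, in document order.
following = [el.tag for el in c.itersiblings()]

# preceding=True: siblings before <c>, nearest first.
preceding = [el.tag for el in c.itersiblings(preceding=True)]

# The tag filter works in either direction.
preceding_a = [el.tag for el in c.itersiblings("a", preceding=True)]
```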

itertext(self, tag=None, *tags, with_tail=True)

Iterates over the text content of a subtree.

You can pass tag names to restrict text content to specific elements, see iter.

You can set the with_tail keyword argument to False to skip over tail text.
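The sketch below shows how with_tail and the tag filter change the result of itertext(). The mixed-content document is made up for illustration.

```python
from lxml import objectify

root = objectify.fromstring("<root>Hello <b>big</b> world</root>")

# Default: element text plus the tail text of descendants.
all_text = list(root.itertext())

# with_tail=False drops " world", which is the tail of <b>.
without_tail = list(root.itertext(with_tail=False))

# Restrict to the text of <b> elements only.
b_text = list(root.itertext("b", with_tail=False))
```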

keys(self): Gets a list of attribute names. The names are returned in an arbitrary order (just like for an ordinary Python dictionary).

makeelement(self, _tag, attrib=None, nsmap=None, **_extra): Creates a new element associated with the same document.

remove(self, element): Removes a matching subelement. Unlike the find methods, this method compares elements based on identity, not on tag value or contents.

replace(self, old_element, new_element): Replaces a subelement with the element passed as second argument.

set(self, key, value): Sets an element attribute. In HTML documents (not XML or XHTML), the value None is allowed and creates an attribute without value (just the attribute name).

values(self): Gets element attribute values as a sequence of strings. The attributes are returned in an arbitrary order.

xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables): Evaluate an xpath expression using the element as context node.
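A small sketch of xpath() with an XPath variable passed as a keyword argument. The document, the attribute name and the variable name wanted are assumptions made for this example.

```python
from lxml import objectify

root = objectify.fromstring(
    "<root><item id='1'>a</item><item id='2'>b</item></root>"
)

# Keyword arguments become XPath variables ($wanted here).
hits = root.xpath("item[@id = $wanted]", wanted="2")
hit_text = hits[0].text

# Expressions can also return strings or numbers instead of elements.
texts = [str(t) for t in root.xpath("item/text()")]
item_count = root.xpath("count(item)")
```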

attrib: Element attribute dictionary. Where possible, use get(), set(), keys(), values() and items() to access element attributes.

base

The base URI of the Element (xml:base or HTML base URL). None if the base URI is unknown.

Note that the value depends on the URL of the document that holds the Element if there is no xml:base attribute on the Element or its ancestors.

Setting this property will set an xml:base attribute on the Element, regardless of the document type (XML or HTML).

nsmap

Namespace prefix->URI mapping known in the context of this Element. This includes all namespace declarations of the parents.

Note that changing the returned dict has no effect on the Element.
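To illustrate both points, the sketch below reads nsmap on a child element and then modifies the returned dict. All namespace URIs and prefixes are hypothetical.

```python
from lxml import objectify

root = objectify.fromstring(
    '<root xmlns="urn:default" xmlns:p="urn:p">'
    '<child xmlns:q="urn:q"/>'
    "</root>"
)
child = root.child

# Declarations from the parent are visible on the child.
inherited = child.nsmap

# The dict is a snapshot; editing it does not touch the element.
inherited["x"] = "urn:x"
still_unknown = "x" not in child.nsmap

# The parent does not see declarations made further down.
q_on_root = "q" in root.nsmap
```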

prefix: Namespace prefix or None.

pyval

sourceline: Original line number as found by the parser or None if unknown.

tag: Element tag

tail: Text after this element’s end tag, but before the next sibling element’s start tag. This is either a string or the value None, if there was no text.

text

class lxml.objectify.NoneElement

Bases: ObjectifiedDataElement

_init(self): Called after object initialisation. Custom subclasses may override this if they recursively call _init() in the superclasses.

_setText(s): For use in subclasses only. Don’t use unless you know what you are doing.

addattr(self, tag, value)

Add a child value to the element.

As opposed to append(), it sets a data value, not an element.

addnext(self, element)

Adds the element as a following sibling directly after this element.

This is normally used to set a processing instruction or comment after the root node of a document. Note that tail text is automatically discarded when adding at the root level.

addprevious(self, element)

Adds the element as a preceding sibling directly before this element.

This is normally used to set a processing instruction or comment before the root node of a document. Note that tail text is automatically discarded when adding at the root level.
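The root-level use of addprevious() and addnext() might look like the sketch below. The stylesheet target and the comment text are placeholders.

```python
from lxml import etree, objectify

root = objectify.fromstring("<root/>")

# A processing instruction before the root element ...
root.addprevious(
    etree.ProcessingInstruction("xml-stylesheet", 'href="style.xsl"')
)
# ... and a comment after it.
root.addnext(etree.Comment("end"))

# Serialise the whole document so the root-level siblings are included.
xml = etree.tostring(root.getroottree())
```

Serialising root alone would omit these siblings, which is why the example goes through getroottree().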

append(self, element): Adds a subelement to the end of this element.

clear(self, keep_tail=False)

Resets an element. This function removes all subelements, clears all attributes and sets the text and tail properties to None.

Pass keep_tail=True to leave the tail text untouched.

countchildren(self): Return the number of children of this element, regardless of their name.

cssselect(expr, *, translator)

Run the CSS expression on this element and its children, returning a list of the results.

Equivalent to lxml.cssselect.CSSSelect(expr)(self) – note that pre-compiling the expression can provide a substantial speedup.

descendantpaths(self, prefix=None): Returns a list of object path expressions for all descendants.
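As a hedged sketch, the paths returned by descendantpaths() can be fed back into ObjectPath to check that each one resolves. The sample document is made up.

```python
from lxml import objectify

root = objectify.fromstring(
    "<root><a><b>1</b><b>2</b></a><c/></root>"
)

paths = root.descendantpaths()

# Every returned expression should resolve against the same root.
all_resolve = all(objectify.ObjectPath(p).hasattr(root) for p in paths)
```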

extend(self, elements): Extends the current children by the elements in the iterable.

find(self, path, namespaces=None)

Finds the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findall(self, path, namespaces=None)

Finds all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findtext(self, path, default=None, namespaces=None)

Finds text for the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.
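The three find methods share the same namespaces argument. The sketch below uses a hypothetical prefix p bound to urn:example.

```python
from lxml import objectify

root = objectify.fromstring(
    '<root xmlns:p="urn:example">'
    "<p:item>1</p:item><p:item>2</p:item>"
    "</root>"
)
ns = {"p": "urn:example"}   # prefix-to-namespace mapping

first = root.find("p:item", namespaces=ns)
all_items = root.findall("p:item", namespaces=ns)
first_text = root.findtext("p:item", namespaces=ns)

# findtext() falls back to the default when nothing matches.
missing_text = root.findtext("p:nope", default="n/a", namespaces=ns)
```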

get(self, key, default=None): Gets an element attribute.

getchildren(self): Returns a sequence of all direct children. The elements are returned in document order.

getiterator(self, tag=None, *tags)

Returns a sequence or iterator of all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags, see iter.

Deprecated: This method is deprecated as of ElementTree 1.3 and lxml 2.0. In lxml it returns an iterator, which diverges from the original ElementTree behaviour. For an efficient iterator, use the element.iter() method instead. Only use this method in new code if you require backwards compatibility with older versions of lxml or ElementTree.

getnext(self): Returns the following sibling of this element or None.

getparent(self): Returns the parent of this element or None for the root element.

getprevious(self): Returns the preceding sibling of this element or None.

getroottree(self)

Return an ElementTree for the root node of the document that contains this element.

This is the same as following element.getparent() up the tree until it returns None (for the root element) and then building an ElementTree for the last parent that was returned.

index(self, child, start=None, stop=None)

Find the position of the child within the parent.

This method is not part of the original ElementTree API.

insert(self, index, element): Inserts a subelement at the given position in this element.

items(self): Gets element attributes, as a sequence. The attributes are returned in an arbitrary order.

iter(self, tag=None, *tags)

Iterate over all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags: pass "{ns}localname" as tag. Either or both of ns and localname can be * for a wildcard; ns can be empty for no namespace. "localname" is equivalent to "{}localname" (i.e. no namespace) but "*" is "{*}*" (any or no namespace), not "{}*".

You can also pass the Element, Comment, ProcessingInstruction and Entity factory functions to look only for the specific element type.

Passing multiple tags (or a sequence of tags) instead of a single tag will let the iterator return all elements matching any of these tags, in document order.

iterancestors(self, tag=None, *tags)

Iterate over the ancestors of this element (from parent to parent).

Can be restricted to find only elements with specific tags, see iter.

iterchildren(self, tag=None, *tags, reversed=False)

Iterate over the children of this element.

As opposed to using normal iteration on this element, the returned elements can be reversed with the ‘reversed’ keyword and restricted to find only elements with specific tags, see iter.

iterdescendants(self, tag=None, *tags)

Iterate over the descendants of this element in document order.

As opposed to el.iter(), this iterator does not yield the element itself. The returned elements can be restricted to find only elements with specific tags, see iter.

iterfind(self, path, namespaces=None)

Iterates over all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

itersiblings(self, tag=None, *tags, preceding=False)

Iterate over the following or preceding siblings of this element.

The direction is determined by the ‘preceding’ keyword which defaults to False, i.e. forward iteration over the following siblings. When True, the iterator yields the preceding siblings in reverse document order, i.e. starting right before the current element and going backwards.

Can be restricted to find only elements with specific tags, see iter.

itertext(self, tag=None, *tags, with_tail=True)

Iterates over the text content of a subtree.

You can pass tag names to restrict text content to specific elements, see iter.

You can set the with_tail keyword argument to False to skip over tail text.

keys(self): Gets a list of attribute names. The names are returned in an arbitrary order (just like for an ordinary Python dictionary).

makeelement(self, _tag, attrib=None, nsmap=None, **_extra): Creates a new element associated with the same document.

remove(self, element): Removes a matching subelement. Unlike the find methods, this method compares elements based on identity, not on tag value or contents.

replace(self, old_element, new_element): Replaces a subelement with the element passed as second argument.

set(self, key, value): Sets an element attribute. In HTML documents (not XML or XHTML), the value None is allowed and creates an attribute without value (just the attribute name).

values(self): Gets element attribute values as a sequence of strings. The attributes are returned in an arbitrary order.

xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables): Evaluate an xpath expression using the element as context node.

attrib: Element attribute dictionary. Where possible, use get(), set(), keys(), values() and items() to access element attributes.

base

The base URI of the Element (xml:base or HTML base URL). None if the base URI is unknown.

Note that the value depends on the URL of the document that holds the Element if there is no xml:base attribute on the Element or its ancestors.

Setting this property will set an xml:base attribute on the Element, regardless of the document type (XML or HTML).

nsmap

Namespace prefix->URI mapping known in the context of this Element. This includes all namespace declarations of the parents.

Note that changing the returned dict has no effect on the Element.

prefix: Namespace prefix or None.

pyval

sourceline: Original line number as found by the parser or None if unknown.

tag: Element tag

tail: Text after this element’s end tag, but before the next sibling element’s start tag. This is either a string or the value None, if there was no text.

text

class lxml.objectify.NumberElement

Bases: ObjectifiedDataElement

_init(self): Called after object initialisation. Custom subclasses may override this if they recursively call _init() in the superclasses.

_setText(s): For use in subclasses only. Don’t use unless you know what you are doing.

_setValueParser(function)

Set the function that parses the Python value from a string.

Do not use this unless you know what you are doing.

addattr(self, tag, value)

Add a child value to the element.

As opposed to append(), it sets a data value, not an element.

addnext(self, element)

Adds the element as a following sibling directly after this element.

This is normally used to set a processing instruction or comment after the root node of a document. Note that tail text is automatically discarded when adding at the root level.

addprevious(self, element)

Adds the element as a preceding sibling directly before this element.

This is normally used to set a processing instruction or comment before the root node of a document. Note that tail text is automatically discarded when adding at the root level.

append(self, element): Adds a subelement to the end of this element.

clear(self, keep_tail=False)

Resets an element. This function removes all subelements, clears all attributes and sets the text and tail properties to None.

Pass keep_tail=True to leave the tail text untouched.

countchildren(self): Return the number of children of this element, regardless of their name.

cssselect(expr, *, translator)

Run the CSS expression on this element and its children, returning a list of the results.

Equivalent to lxml.cssselect.CSSSelect(expr)(self) – note that pre-compiling the expression can provide a substantial speedup.

descendantpaths(self, prefix=None): Returns a list of object path expressions for all descendants.

extend(self, elements): Extends the current children by the elements in the iterable.

find(self, path, namespaces=None)

Finds the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findall(self, path, namespaces=None)

Finds all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findtext(self, path, default=None, namespaces=None)

Finds text for the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

get(self, key, default=None): Gets an element attribute.

getchildren(self): Returns a sequence of all direct children. The elements are returned in document order.

getiterator(self, tag=None, *tags)

Returns a sequence or iterator of all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags, see iter.

Deprecated: This method is deprecated as of ElementTree 1.3 and lxml 2.0. In lxml it returns an iterator, which diverges from the original ElementTree behaviour. For an efficient iterator, use the element.iter() method instead. Only use this method in new code if you require backwards compatibility with older versions of lxml or ElementTree.

getnext(self): Returns the following sibling of this element or None.

getparent(self): Returns the parent of this element or None for the root element.

getprevious(self): Returns the preceding sibling of this element or None.

getroottree(self)

Return an ElementTree for the root node of the document that contains this element.

This is the same as following element.getparent() up the tree until it returns None (for the root element) and then building an ElementTree for the last parent that was returned.

index(self, child, start=None, stop=None)

Find the position of the child within the parent.

This method is not part of the original ElementTree API.

insert(self, index, element): Inserts a subelement at the given position in this element.

items(self): Gets element attributes, as a sequence. The attributes are returned in an arbitrary order.

iter(self, tag=None, *tags)

Iterate over all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags: pass "{ns}localname" as tag. Either or both of ns and localname can be * for a wildcard; ns can be empty for no namespace. "localname" is equivalent to "{}localname" (i.e. no namespace) but "*" is "{*}*" (any or no namespace), not "{}*".

You can also pass the Element, Comment, ProcessingInstruction and Entity factory functions to look only for the specific element type.

Passing multiple tags (or a sequence of tags) instead of a single tag will let the iterator return all elements matching any of these tags, in document order.

iterancestors(self, tag=None, *tags)

Iterate over the ancestors of this element (from parent to parent).

Can be restricted to find only elements with specific tags, see iter.
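A minimal sketch of iterancestors() on a made-up nested document, showing the order from the nearest parent up to the root.

```python
from lxml import objectify

root = objectify.fromstring("<root><a><b><c/></b></a></root>")
c = root.a.b.c

# Nearest ancestor first, root last.
ancestors = [el.tag for el in c.iterancestors()]

# With a tag filter, only matching ancestors are yielded.
only_a = [el.tag for el in c.iterancestors("a")]
```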

iterchildren(self, tag=None, *tags, reversed=False)

Iterate over the children of this element.

As opposed to using normal iteration on this element, the returned elements can be reversed with the ‘reversed’ keyword and restricted to find only elements with specific tags, see iter.

iterdescendants(self, tag=None, *tags)

Iterate over the descendants of this element in document order.

As opposed to el.iter(), this iterator does not yield the element itself. The returned elements can be restricted to find only elements with specific tags, see iter.
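The difference from iter() is only the starting element, as this small sketch with a hypothetical document shows.

```python
from lxml import objectify

root = objectify.fromstring("<root><a><b/></a><c/></root>")

# iter() starts with the element itself ...
with_self = [el.tag for el in root.iter()]

# ... iterdescendants() starts with its first descendant.
descendants_only = [el.tag for el in root.iterdescendants()]
```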

iterfind(self, path, namespaces=None)

Iterates over all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

itersiblings(self, tag=None, *tags, preceding=False)

Iterate over the following or preceding siblings of this element.

The direction is determined by the ‘preceding’ keyword which defaults to False, i.e. forward iteration over the following siblings. When True, the iterator yields the preceding siblings in reverse document order, i.e. starting right before the current element and going backwards.

Can be restricted to find only elements with specific tags, see iter.

itertext(self, tag=None, *tags, with_tail=True)

Iterates over the text content of a subtree.

You can pass tag names to restrict text content to specific elements, see iter.

You can set the with_tail keyword argument to False to skip over tail text.

keys(self): Gets a list of attribute names. The names are returned in an arbitrary order (just like for an ordinary Python dictionary).

makeelement(self, _tag, attrib=None, nsmap=None, **_extra): Creates a new element associated with the same document.

remove(self, element): Removes a matching subelement. Unlike the find methods, this method compares elements based on identity, not on tag value or contents.

replace(self, old_element, new_element): Replaces a subelement with the element passed as second argument.

set(self, key, value): Sets an element attribute. In HTML documents (not XML or XHTML), the value None is allowed and creates an attribute without value (just the attribute name).

values(self): Gets element attribute values as a sequence of strings. The attributes are returned in an arbitrary order.

xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables): Evaluate an xpath expression using the element as context node.

attrib: Element attribute dictionary. Where possible, use get(), set(), keys(), values() and items() to access element attributes.

base

The base URI of the Element (xml:base or HTML base URL). None if the base URI is unknown.

Note that the value depends on the URL of the document that holds the Element if there is no xml:base attribute on the Element or its ancestors.

Setting this property will set an xml:base attribute on the Element, regardless of the document type (XML or HTML).

nsmap

Namespace prefix->URI mapping known in the context of this Element. This includes all namespace declarations of the parents.

Note that changing the returned dict has no effect on the Element.

prefix: Namespace prefix or None.

pyval

sourceline: Original line number as found by the parser or None if unknown.

tag: Element tag

tail: Text after this element’s end tag, but before the next sibling element’s start tag. This is either a string or the value None, if there was no text.

text

class lxml.objectify.ObjectPath(path)

Bases: object

Immutable object that represents a compiled object path.

Example of a path: ‘root.child[1].{other}child[25]’

addattr(self, root, value)

Append a value to the target element in a subtree.

If any of the children on the path does not exist, it is created.

hasattr(self, root)

setattr(self, root, value)

Set the value of the target element in a subtree.

If any of the children on the path does not exist, it is created.

find
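A compiled path can be reused against any matching root. The sketch below, using made-up element names, looks up an existing element by calling the path object and then uses setattr() to create a missing branch.

```python
from lxml import objectify

root = objectify.fromstring("<root><a><b>1</b><b>2</b></a></root>")

# Calling the path object resolves it against a root element.
path = objectify.ObjectPath("root.a.b[1]")
second_b_text = path(root).text

# hasattr() reports whether the path resolves without raising.
missing = objectify.ObjectPath("root.x.y")
existed_before = missing.hasattr(root)

# setattr() creates <x> and <y> as needed and stores the value.
missing.setattr(root, "new")
exists_after = missing.hasattr(root)
new_text = missing(root).text
```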

class lxml.objectify.ObjectifiedDataElement

Bases: ObjectifiedElement

This is the base class for all data type Elements. Subclasses should override the ‘pyval’ property and possibly the __str__ method.

_init(self): Called after object initialisation. Custom subclasses may override this if they recursively call _init() in the superclasses.

_setText(s): For use in subclasses only. Don’t use unless you know what you are doing.

addattr(self, tag, value)

Add a child value to the element.

As opposed to append(), it sets a data value, not an element.

addnext(self, element)

Adds the element as a following sibling directly after this element.

This is normally used to set a processing instruction or comment after the root node of a document. Note that tail text is automatically discarded when adding at the root level.

addprevious(self, element)

Adds the element as a preceding sibling directly before this element.

This is normally used to set a processing instruction or comment before the root node of a document. Note that tail text is automatically discarded when adding at the root level.

append(self, element): Adds a subelement to the end of this element.

clear(self, keep_tail=False)

Resets an element. This function removes all subelements, clears all attributes and sets the text and tail properties to None.

Pass keep_tail=True to leave the tail text untouched.

countchildren(self): Return the number of children of this element, regardless of their name.

cssselect(expr, *, translator)

Run the CSS expression on this element and its children, returning a list of the results.

Equivalent to lxml.cssselect.CSSSelect(expr)(self) – note that pre-compiling the expression can provide a substantial speedup.

descendantpaths(self, prefix=None): Returns a list of object path expressions for all descendants.

extend(self, elements): Extends the current children by the elements in the iterable.

find(self, path, namespaces=None)

Finds the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findall(self, path, namespaces=None)

Finds all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findtext(self, path, default=None, namespaces=None)

Finds text for the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

get(self, key, default=None): Gets an element attribute.

getchildren(self): Returns a sequence of all direct children. The elements are returned in document order.

getiterator(self, tag=None, *tags)

Returns a sequence or iterator of all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags, see iter.

Deprecated: This method is deprecated as of ElementTree 1.3 and lxml 2.0. In lxml it returns an iterator, which diverges from the original ElementTree behaviour. For an efficient iterator, use the element.iter() method instead. Only use this method in new code if you require backwards compatibility with older versions of lxml or ElementTree.

getnext(self): Returns the following sibling of this element or None.

getparent(self): Returns the parent of this element or None for the root element.

getprevious(self): Returns the preceding sibling of this element or None.

getroottree(self)

Return an ElementTree for the root node of the document that contains this element.

This is the same as following element.getparent() up the tree until it returns None (for the root element) and then building an ElementTree for the last parent that was returned.

index(self, child, start=None, stop=None)

Find the position of the child within the parent.

This method is not part of the original ElementTree API.

insert(self, index, element): Inserts a subelement at the given position in this element.

items(self): Gets element attributes, as a sequence. The attributes are returned in an arbitrary order.

iter(self, tag=None, *tags)

Iterate over all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags: pass "{ns}localname" as tag. Either or both of ns and localname can be * for a wildcard; ns can be empty for no namespace. "localname" is equivalent to "{}localname" (i.e. no namespace) but "*" is "{*}*" (any or no namespace), not "{}*".

You can also pass the Element, Comment, ProcessingInstruction and Entity factory functions to look only for the specific element type.

Passing multiple tags (or a sequence of tags) instead of a single tag will let the iterator return all elements matching any of these tags, in document order.

iterancestors(self, tag=None, *tags)

Iterate over the ancestors of this element (from parent to parent).

Can be restricted to find only elements with specific tags, see iter.

iterchildren(self, tag=None, *tags, reversed=False)

Iterate over the children of this element.

As opposed to using normal iteration on this element, the returned elements can be reversed with the ‘reversed’ keyword and restricted to find only elements with specific tags, see iter.

iterdescendants(self, tag=None, *tags)

Iterate over the descendants of this element in document order.

As opposed to el.iter(), this iterator does not yield the element itself. The returned elements can be restricted to find only elements with specific tags, see iter.

iterfind(self, path, namespaces=None)

Iterates over all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

itersiblings(self, tag=None, *tags, preceding=False)

Iterate over the following or preceding siblings of this element.

The direction is determined by the ‘preceding’ keyword which defaults to False, i.e. forward iteration over the following siblings. When True, the iterator yields the preceding siblings in reverse document order, i.e. starting right before the current element and going backwards.

Can be restricted to find only elements with specific tags, see iter.

itertext(self, tag=None, *tags, with_tail=True)

Iterates over the text content of a subtree.

You can pass tag names to restrict text content to specific elements, see iter.

You can set the with_tail keyword argument to False to skip over tail text.

keys(self): Gets a list of attribute names. The names are returned in an arbitrary order (just like for an ordinary Python dictionary).

makeelement(self, _tag, attrib=None, nsmap=None, **_extra): Creates a new element associated with the same document.

remove(self, element): Removes a matching subelement. Unlike the find methods, this method compares elements based on identity, not on tag value or contents.

replace(self, old_element, new_element): Replaces a subelement with the element passed as second argument.

set(self, key, value): Sets an element attribute. In HTML documents (not XML or XHTML), the value None is allowed and creates an attribute without value (just the attribute name).

values(self): Gets element attribute values as a sequence of strings. The attributes are returned in an arbitrary order.

xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables): Evaluate an xpath expression using the element as context node.

attrib: Element attribute dictionary. Where possible, use get(), set(), keys(), values() and items() to access element attributes.

base

The base URI of the Element (xml:base or HTML base URL). None if the base URI is unknown.

Note that the value depends on the URL of the document that holds the Element if there is no xml:base attribute on the Element or its ancestors.

Setting this property will set an xml:base attribute on the Element, regardless of the document type (XML or HTML).

nsmap

Namespace prefix->URI mapping known in the context of this Element. This includes all namespace declarations of the parents.

Note that changing the returned dict has no effect on the Element.

prefix: Namespace prefix or None.

pyval

sourceline: Original line number as found by the parser or None if unknown.

tag: Element tag

tail: Text after this element’s end tag, but before the next sibling element’s start tag. This is either a string or the value None, if there was no text.

text

class lxml.objectify.ObjectifiedElement

Bases: ElementBase

Main XML Element class.

Element children are accessed as object attributes. Multiple children with the same name are available through a list index. Example:

>>> from lxml.objectify import XML
>>> root = XML("<root><c1><c2>0</c2><c2>1</c2></c1></root>")
>>> second_c2 = root.c1.c2[1]
>>> print(second_c2.text)
1

Note that you cannot (and must not) instantiate this class or its subclasses.

_init(self): Called after object initialisation. Custom subclasses may override this if they recursively call _init() in the superclasses.

addattr(self, tag, value)

Add a child value to the element.

As opposed to append(), it sets a data value, not an element.
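A short sketch of addattr(), assuming a made-up child name item. Each call appends a new child that holds the given Python value.

```python
from lxml import objectify

root = objectify.Element("root")

# Each call adds one <item> child carrying the value.
root.addattr("item", 1)
root.addattr("item", "two")

# Children with the same name are reachable as a sequence.
values = [el.pyval for el in root.item]
child_count = root.countchildren()
```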

addnext(self, element)

Adds the element as a following sibling directly after this element.

This is normally used to set a processing instruction or comment after the root node of a document. Note that tail text is automatically discarded when adding at the root level.

addprevious(self, element)

Adds the element as a preceding sibling directly before this element.

This is normally used to set a processing instruction or comment before the root node of a document. Note that tail text is automatically discarded when adding at the root level.

append(self, element): Adds a subelement to the end of this element.

clear(self, keep_tail=False)

Resets an element. This function removes all subelements, clears all attributes and sets the text and tail properties to None.

Pass keep_tail=True to leave the tail text untouched.

countchildren(self): Return the number of children of this element, regardless of their name.

cssselect(expr, *, translator)

Run the CSS expression on this element and its children, returning a list of the results.

Equivalent to lxml.cssselect.CSSSelect(expr)(self) – note that pre-compiling the expression can provide a substantial speedup.

descendantpaths(self, prefix=None): Returns a list of object path expressions for all descendants.
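As an illustration, the sketch below lists the object paths of a small, made-up tree and resolves one of them again with objectify.ObjectPath.

```python
from lxml import objectify

root = objectify.fromstring("<root><a><b>1</b></a><c>3</c></root>")

# Each entry is a dotted object path such as "root.a.b".
paths = root.descendantpaths()
for path in paths:
    print(path)

# A path string can be turned back into an element lookup.
b_value = objectify.ObjectPath("root.a.b")(root).pyval
```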

extend(self, elements): Extends the current children by the elements in the iterable.

find(self, path, namespaces=None)

Finds the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findall(self, path, namespaces=None)

Finds all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.
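The following sketch shows how the namespaces mapping is used with find-style methods. The namespace URI and the prefix "p" are arbitrary choices for this example; the prefix in the path does not need to match the one used in the document.

```python
from lxml import objectify

NS = "http://example.com/ns"  # hypothetical namespace URI

root = objectify.fromstring(
    '<root xmlns:x="%s">'
    "<x:item>1</x:item><x:item>2</x:item><item>3</item>"
    "</root>" % NS
)

# The prefix "p" is resolved through the mapping, not through the document.
prefixed = root.findall("p:item", namespaces={"p": NS})

# Without a prefix, only the element in no namespace matches.
plain = root.findall("item")

first_text = root.findtext("p:item", namespaces={"p": NS})
```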

findtext(self, path, default=None, namespaces=None)

Finds text for the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

get(self, key, default=None): Gets an element attribute.

getchildren(self): Returns a sequence of all direct children. The elements are returned in document order.

getiterator(self, tag=None, *tags)

Returns a sequence or iterator of all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags, see iter.

Deprecated: Note that this method is deprecated as of ElementTree 1.3 and lxml 2.0. It returns an iterator in lxml, which diverges from the original ElementTree behaviour. If you want an efficient iterator, use the element.iter() method instead. You should only use this method in new code if you require backwards compatibility with older versions of lxml or ElementTree.

getnext(self): Returns the following sibling of this element or None.

getparent(self): Returns the parent of this element or None for the root element.

getprevious(self): Returns the preceding sibling of this element or None.

getroottree(self)

Return an ElementTree for the root node of the document that contains this element.

This is the same as following element.getparent() up the tree until it returns None (for the root element) and then building an ElementTree for the last parent that was returned.

index(self, child, start=None, stop=None)

Find the position of the child within the parent.

This method is not part of the original ElementTree API.

insert(self, index, element): Inserts a subelement at the given position in this element.

items(self): Gets element attributes, as a sequence. The attributes are returned in an arbitrary order.

iter(self, tag=None, *tags)

Iterate over all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags: pass "{ns}localname" as tag. Either or both of ns and localname can be * for a wildcard; ns can be empty for no namespace. "localname" is equivalent to "{}localname" (i.e. no namespace) but "*" is "{*}*" (any or no namespace), not "{}*".

You can also pass the Element, Comment, ProcessingInstruction and Entity factory functions to look only for the specific element type.

Passing multiple tags (or a sequence of tags) instead of a single tag will let the iterator return all elements matching any of these tags, in document order.
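The tag filters described above can be sketched as follows, using a small document with a made-up namespace URI.

```python
from lxml import objectify

root = objectify.fromstring(
    '<root xmlns:x="http://example.com/ns">'
    "<a><x:b>1</x:b></a><b>2</b>"
    "</root>"
)

# "{*}b" matches the local name "b" in any namespace or none.
any_ns = [el.tag for el in root.iter("{*}b")]

# A bare "b" only matches elements in no namespace.
no_ns = [el.tag for el in root.iter("b")]

# Several tags can be passed; matches come back in document order.
several = [el.tag for el in root.iter("a", "b")]
```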

iterancestors(self, tag=None, *tags)

Iterate over the ancestors of this element (from parent to parent).

Can be restricted to find only elements with specific tags, see iter.

iterchildren(self, tag=None, *tags, reversed=False)

Iterate over the children of this element.

As opposed to using normal iteration on this element, the returned elements can be reversed with the ‘reversed’ keyword and restricted to find only elements with specific tags, see iter.

iterdescendants(self, tag=None, *tags)

Iterate over the descendants of this element in document order.

As opposed to el.iter(), this iterator does not yield the element itself. The returned elements can be restricted to find only elements with specific tags, see iter.

iterfind(self, path, namespaces=None)

Iterates over all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

itersiblings(self, tag=None, *tags, preceding=False)

Iterate over the following or preceding siblings of this element.

The direction is determined by the ‘preceding’ keyword which defaults to False, i.e. forward iteration over the following siblings. When True, the iterator yields the preceding siblings in reverse document order, i.e. starting right before the current element and going backwards.

Can be restricted to find only elements with specific tags, see iter.
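A short sketch of both iteration directions, using an invented four-child document:

```python
from lxml import objectify

root = objectify.fromstring("<root><a/><b/><c/><d/></root>")

# Default direction: siblings that follow <c>.
following = [el.tag for el in root.c.itersiblings()]

# preceding=True walks backwards, starting right before <c>.
preceding = [el.tag for el in root.c.itersiblings(preceding=True)]

print(following, preceding)
```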

itertext(self, tag=None, *tags, with_tail=True)

Iterates over the text content of a subtree.

You can pass tag names to restrict text content to specific elements, see iter.

You can set the with_tail keyword argument to False to skip over tail text.

keys(self): Gets a list of attribute names. The names are returned in an arbitrary order (just like for an ordinary Python dictionary).

makeelement(self, _tag, attrib=None, nsmap=None, **_extra): Creates a new element associated with the same document.

remove(self, element): Removes a matching subelement. Unlike the find methods, this method compares elements based on identity, not on tag value or contents.

replace(self, old_element, new_element): Replaces a subelement with the element passed as second argument.

set(self, key, value): Sets an element attribute. In HTML documents (not XML or XHTML), the value None is allowed and creates an attribute without value (just the attribute name).

values(self): Gets element attribute values as a sequence of strings. The attributes are returned in an arbitrary order.

xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables): Evaluate an xpath expression using the element as context node.

attrib: Element attribute dictionary. Where possible, use get(), set(), keys(), values() and items() to access element attributes.

base

The base URI of the Element (xml:base or HTML base URL). None if the base URI is unknown.

Note that the value depends on the URL of the document that holds the Element if there is no xml:base attribute on the Element or its ancestors.

Setting this property will set an xml:base attribute on the Element, regardless of the document type (XML or HTML).

nsmap

Namespace prefix->URI mapping known in the context of this Element. This includes all namespace declarations of the parents.

Note that changing the returned dict has no effect on the Element.

prefix: Namespace prefix or None.

sourceline: Original line number as found by the parser or None if unknown.

tag: Element tag

tail: Text after this element’s end tag, but before the next sibling element’s start tag. This is either a string or the value None, if there was no text.

text

class lxml.objectify.ObjectifyElementClassLookup(self, tree_class=None, empty_data_class=None)

Bases: ElementClassLookup

Element class lookup method that uses the objectify classes.

class lxml.objectify.PyType(self, name, type_check, type_class, stringify=None)

Bases: object

User defined type.

Named type that contains a type check function, a type class that inherits from ObjectifiedDataElement and an optional “stringification” function. The type check must take a string as argument and raise ValueError or TypeError if it cannot handle the string value. It may be None in which case it is not considered for type guessing. For registered named types, the ‘stringify’ function (or str() if None; unicode() on Python 2) is used to convert a Python object with type name ‘name’ to the string representation stored in the XML tree.

Example:

PyType('int', int, MyIntClass).register()

Note that the order in which types are registered matters. The first matching type will be used.
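The sketch below registers a custom type whose check function accepts only strings starting with "24.12.". The class ChristmasDate, the check function and the type name "date" are hypothetical names chosen for this example. The type is unregistered again at the end so that the global type list is left unchanged.

```python
from lxml import objectify


class ChristmasDate(objectify.ObjectifiedDataElement):
    """Hypothetical data class used only for this illustration."""

    def is_christmas_eve(self):
        return self.text.startswith("24.12.")


def check_christmas_date(text):
    # A type check must raise ValueError or TypeError on a mismatch.
    if not text.startswith("24.12."):
        raise ValueError("not a Christmas date")


xmas_type = objectify.PyType("date", check_christmas_date, ChristmasDate)
xmas_type.register()
try:
    root = objectify.fromstring(
        "<root><a>24.12.2000</a><a>12.24.2000</a></root>"
    )
    first_is_custom = isinstance(root.a, ChristmasDate)
    second_is_custom = isinstance(root.a[1], ChristmasDate)
finally:
    # Undo the registration so other code sees the default types.
    xmas_type.unregister()
```

Neither value parses as a number, so the first one falls through to the new check, while the second one fails it and stays a plain string element.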

register(self, before=None, after=None)

Register the type.

The additional keyword arguments ‘before’ and ‘after’ accept a sequence of type names that must appear before/after the new type in the type list. If any of them is not currently known, it is simply ignored. Raises ValueError if the dependencies cannot be fulfilled.

unregister(self)

name

stringify

type_check

xmlSchemaTypes

The list of XML Schema datatypes this Python type maps to.

Note that this must be set before registering the type!

class lxml.objectify.StringElement

Bases: ObjectifiedDataElement

String data class.

Note that this class does not support the sequence protocol of strings: len(), iter(), str_attr[0], str_attr[0:1], etc. are not supported. Instead, use the .text attribute to get a ‘real’ string.
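For example, string operations can be applied to the .text attribute (or .pyval) instead of the element itself. The tag name "title" is an arbitrary choice.

```python
from lxml import objectify

root = objectify.fromstring("<root><title>lxml</title></root>")
title = root.title

class_name = type(title).__name__

# Use the real string for indexing, slicing and len().
first_char = title.text[0]
prefix = title.text[:2]
length = len(title.text)

# strlen() reports the length of the text without going through len().
same_length = title.strlen()
```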

_init(self): Called after object initialisation. Custom subclasses may override this if they recursively call _init() in the superclasses.

_setText(s): For use in subclasses only. Don’t use unless you know what you are doing.

addattr(self, tag, value)

Add a child value to the element.

As opposed to append(), it sets a data value, not an element.

addnext(self, element)

Adds the element as a following sibling directly after this element.

This is normally used to set a processing instruction or comment after the root node of a document. Note that tail text is automatically discarded when adding at the root level.

addprevious(self, element)

Adds the element as a preceding sibling directly before this element.

This is normally used to set a processing instruction or comment before the root node of a document. Note that tail text is automatically discarded when adding at the root level.

append(self, element): Adds a subelement to the end of this element.

clear(self, keep_tail=False)

Resets an element. This function removes all subelements, clears all attributes and sets the text and tail properties to None.

Pass keep_tail=True to leave the tail text untouched.

countchildren(self): Return the number of children of this element, regardless of their name.

cssselect(expr, *, translator)

Run the CSS expression on this element and its children, returning a list of the results.

Equivalent to lxml.cssselect.CSSSelect(expr)(self) – note that pre-compiling the expression can provide a substantial speedup.

descendantpaths(self, prefix=None): Returns a list of object path expressions for all descendants.

extend(self, elements): Extends the current children by the elements in the iterable.

find(self, path, namespaces=None)

Finds the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findall(self, path, namespaces=None)

Finds all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

findtext(self, path, default=None, namespaces=None)

Finds text for the first matching subelement, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

get(self, key, default=None): Gets an element attribute.

getchildren(self): Returns a sequence of all direct children. The elements are returned in document order.

getiterator(self, tag=None, *tags)

Returns a sequence or iterator of all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags, see iter.

Deprecated: Note that this method is deprecated as of ElementTree 1.3 and lxml 2.0. It returns an iterator in lxml, which diverges from the original ElementTree behaviour. If you want an efficient iterator, use the element.iter() method instead. You should only use this method in new code if you require backwards compatibility with older versions of lxml or ElementTree.

getnext(self): Returns the following sibling of this element or None.

getparent(self): Returns the parent of this element or None for the root element.

getprevious(self): Returns the preceding sibling of this element or None.

getroottree(self)

Return an ElementTree for the root node of the document that contains this element.

This is the same as following element.getparent() up the tree until it returns None (for the root element) and then building an ElementTree for the last parent that was returned.

index(self, child, start=None, stop=None)

Find the position of the child within the parent.

This method is not part of the original ElementTree API.

insert(self, index, element): Inserts a subelement at the given position in this element.

items(self): Gets element attributes, as a sequence. The attributes are returned in an arbitrary order.

iter(self, tag=None, *tags)

Iterate over all elements in the subtree in document order (depth first pre-order), starting with this element.

Can be restricted to find only elements with specific tags: pass "{ns}localname" as tag. Either or both of ns and localname can be * for a wildcard; ns can be empty for no namespace. "localname" is equivalent to "{}localname" (i.e. no namespace) but "*" is "{*}*" (any or no namespace), not "{}*".

You can also pass the Element, Comment, ProcessingInstruction and Entity factory functions to look only for the specific element type.

Passing multiple tags (or a sequence of tags) instead of a single tag will let the iterator return all elements matching any of these tags, in document order.

iterancestors(self, tag=None, *tags)

Iterate over the ancestors of this element (from parent to parent).

Can be restricted to find only elements with specific tags, see iter.

iterchildren(self, tag=None, *tags, reversed=False)

Iterate over the children of this element.

As opposed to using normal iteration on this element, the returned elements can be reversed with the ‘reversed’ keyword and restricted to find only elements with specific tags, see iter.

iterdescendants(self, tag=None, *tags)

Iterate over the descendants of this element in document order.

As opposed to el.iter(), this iterator does not yield the element itself. The returned elements can be restricted to find only elements with specific tags, see iter.

iterfind(self, path, namespaces=None)

Iterates over all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping that allows the usage of XPath prefixes in the path expression.

itersiblings(self, tag=None, *tags, preceding=False)

Iterate over the following or preceding siblings of this element.

The direction is determined by the ‘preceding’ keyword which defaults to False, i.e. forward iteration over the following siblings. When True, the iterator yields the preceding siblings in reverse document order, i.e. starting right before the current element and going backwards.

Can be restricted to find only elements with specific tags, see iter.

itertext(self, tag=None, *tags, with_tail=True)

Iterates over the text content of a subtree.

You can pass tag names to restrict text content to specific elements, see iter.

You can set the with_tail keyword argument to False to skip over tail text.

keys(self): Gets a list of attribute names. The names are returned in an arbitrary order (just like for an ordinary Python dictionary).

makeelement(self, _tag, attrib=None, nsmap=None, **_extra): Creates a new element associated with the same document.

remove(self, element): Removes a matching subelement. Unlike the find methods, this method compares elements based on identity, not on tag value or contents.

replace(self, old_element, new_element): Replaces a subelement with the element passed as second argument.

set(self, key, value): Sets an element attribute. In HTML documents (not XML or XHTML), the value None is allowed and creates an attribute without value (just the attribute name).

strlen()

values(self): Gets element attribute values as a sequence of strings. The attributes are returned in an arbitrary order.

xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables): Evaluate an xpath expression using the element as context node.

attrib: Element attribute dictionary. Where possible, use get(), set(), keys(), values() and items() to access element attributes.

base

The base URI of the Element (xml:base or HTML base URL). None if the base URI is unknown.

Note that the value depends on the URL of the document that holds the Element if there is no xml:base attribute on the Element or its ancestors.

Setting this property will set an xml:base attribute on the Element, regardless of the document type (XML or HTML).

nsmap

Namespace prefix->URI mapping known in the context of this Element. This includes all namespace declarations of the parents.

Note that changing the returned dict has no effect on the Element.

prefix: Namespace prefix or None.

pyval

sourceline: Original line number as found by the parser or None if unknown.

tag: Element tag

tail: Text after this element’s end tag, but before the next sibling element’s start tag. This is either a string or the value None, if there was no text.

text

lxml.objectify.DataElement(_value, attrib=None, nsmap=None, _pytype=None, _xsi=None, **_attributes)

Create a new element from a Python value and XML attributes taken from keyword arguments or a dictionary passed as second argument.

Automatically adds a ‘pytype’ attribute for the Python type of the value, if the type can be identified. If ‘_pytype’ or ‘_xsi’ are among the keyword arguments, they will be used instead.

If the _value argument is an ObjectifiedDataElement instance, its py:pytype, xsi:type and other attributes and nsmap are reused unless they are redefined in attrib and/or keyword arguments.
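A minimal sketch of the three ways of calling DataElement described above. The attribute name "unit" is made up for illustration.

```python
from lxml import objectify

PYTYPE = objectify.PYTYPE_ATTRIBUTE

# The Python type is detected and recorded in the pytype attribute.
number = objectify.DataElement(5)
number_pytype = number.get(PYTYPE)

# _pytype overrides the detected type: the value stays a string.
forced = objectify.DataElement("5", _pytype="str")

# Additional keyword arguments become XML attributes.
weight = objectify.DataElement(3.5, unit="kg")

print(number.pyval, repr(forced.pyval), weight.pyval, weight.get("unit"))
```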

lxml.objectify.Element(_tag, attrib=None, nsmap=None, _pytype=None, **_attributes)

Objectify specific version of the lxml.etree Element() factory that always creates a structural (tree) element.

NOTE: requires parser based element class lookup activated in lxml.etree!

lxml.objectify.XML(xml, parser=None, base_url=None)

Objectify specific version of the lxml.etree XML() literal factory that uses the objectify parser.

You can pass a different parser as second argument.

The base_url keyword argument allows setting the original base URL of the document to support relative paths when looking up external entities (DTD, XInclude, …).

lxml.objectify.__unpickleElementTree(data)

lxml.objectify.annotate(element_or_tree, ignore_old=True, ignore_xsi=False, empty_pytype=None, empty_type=None, annotate_xsi=0, annotate_pytype=1)

Recursively annotates the elements of an XML tree with ‘xsi:type’ and/or ‘py:pytype’ attributes.

If the ‘ignore_old’ keyword argument is True (the default), current ‘py:pytype’ attributes will be ignored for the type annotation. Set to False if you want to reuse existing ‘py:pytype’ information (if and only if it is appropriate for the element text value).

If the ‘ignore_xsi’ keyword argument is False (the default), existing ‘xsi:type’ attributes will be used for the type annotation, if they fit the element text values.

Note that the mapping from Python types to XSI types is usually ambiguous. Currently, only the first XSI type name in the corresponding PyType definition will be used for annotation. Thus, you should consider naming the widest type first if you define additional types.

The default ‘py:pytype’ annotation of empty elements can be set with the empty_pytype keyword argument. Pass ‘str’, for example, to make string values the default.

The default ‘xsi:type’ annotation of empty elements can be set with the empty_type keyword argument. The default is not to annotate empty elements. Pass ‘string’, for example, to make string values the default.

The keyword arguments ‘annotate_xsi’ (default: 0) and ‘annotate_pytype’ (default: 1) control which kind(s) of annotation to use.
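With the default arguments, only ‘py:pytype’ attributes are written. The sketch below annotates a small, made-up tree and collects the resulting type names.

```python
from lxml import objectify

PYTYPE = objectify.PYTYPE_ATTRIBUTE

root = objectify.fromstring("<root><a>1</a><b>text</b><c>1.5</c></root>")

# Defaults: annotate_pytype=1, annotate_xsi=0.
objectify.annotate(root)

pytypes = {el.tag: el.get(PYTYPE) for el in root.iterchildren()}
print(pytypes)
```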

lxml.objectify.deannotate(element_or_tree, pytype=True, xsi=True, xsi_nil=False, cleanup_namespaces=False)

Recursively de-annotate the elements of an XML tree by removing ‘py:pytype’ and/or ‘xsi:type’ attributes and/or ‘xsi:nil’ attributes.

If the ‘pytype’ keyword argument is True (the default), ‘py:pytype’ attributes will be removed. If the ‘xsi’ keyword argument is True (the default), ‘xsi:type’ attributes will be removed. If the ‘xsi_nil’ keyword argument is True (default: False), ‘xsi:nil’ attributes will be removed.

Note that this does not touch the namespace declarations by default. If you want to remove unused namespace declarations from the tree, pass the option cleanup_namespaces=True.
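A sketch of a round trip: the tree is annotated first and then stripped again, including the namespace declarations that the annotation introduced.

```python
from lxml import etree, objectify

root = objectify.fromstring("<root><a>1</a></root>")

objectify.annotate(root)
annotated = etree.tostring(root)

# Remove the type attributes and the now unused namespace declarations.
objectify.deannotate(root, cleanup_namespaces=True)
cleaned = etree.tostring(root)

print(annotated)
print(cleaned)
```

Without cleanup_namespaces=True, the attributes are removed but the namespace declarations remain in the serialised output.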

lxml.objectify.dump(element): Return a recursively generated string representation of an element.

lxml.objectify.enable_recursive_str(on=True): Enable a recursively generated tree representation for str(element), based on objectify.dump(element).
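The sketch below prints the dump of a small, made-up tree and then switches str() to the same representation. The setting is global, so it is switched off again afterwards.

```python
from lxml import objectify

root = objectify.fromstring('<root><a attr="x">1</a><b>text</b></root>')

# One line per element: tag, value and the objectify class in brackets.
text = objectify.dump(root)
print(text)

objectify.enable_recursive_str(True)
try:
    recursive = str(root)
finally:
    # Restore the default str() behaviour for all objectify elements.
    objectify.enable_recursive_str(False)
```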

lxml.objectify.fromstring(xml, parser=None, base_url=None)

Objectify specific version of the lxml.etree fromstring() function that uses the objectify parser.

You can pass a different parser as second argument.

The base_url keyword argument allows setting the original base URL of the document to support relative paths when looking up external entities (DTD, XInclude, …).

lxml.objectify.getRegisteredTypes()

Returns a list of the currently registered PyType objects.

To add a new type, retrieve this list and call unregister() for all entries. Then add the new type at a suitable position (possibly replacing an existing one) and call register() for all entries.

This is necessary if the new type interferes with the type check functions of existing ones (normally only int/float/bool) and must be tried before other types. To add a type that is not yet parsable by the current type check functions, you can simply register() it, which will append it to the end of the type list.
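To see which types are currently registered, the list can simply be inspected. This sketch only reads the list and does not modify it.

```python
from lxml import objectify

# Each entry is a PyType object with a name attribute.
names = [pytype.name for pytype in objectify.getRegisteredTypes()]
print(names)
```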

lxml.objectify.makeparser(remove_blank_text=True, **kw)

Create a new XML parser for objectify trees.

You can pass all keyword arguments that are supported by etree.XMLParser(). Note that this parser defaults to removing blank text. You can disable this by passing the remove_blank_text boolean keyword option yourself.
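The effect of remove_blank_text can be sketched by parsing the same indented document with the default parser and with a parser that keeps whitespace.

```python
from lxml import etree, objectify

xml = "<root>\n  <a>1</a>\n</root>"

# Default objectify parser: whitespace-only text nodes are dropped.
compact = etree.tostring(objectify.fromstring(xml))

# Passing the option explicitly keeps the original whitespace.
keep_parser = objectify.makeparser(remove_blank_text=False)
preserved = etree.tostring(objectify.fromstring(xml, keep_parser))

print(compact)
print(preserved)
```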

lxml.objectify.parse(f, parser=None, base_url=None)

Parse a file or file-like object with the objectify parser.

You can pass a different parser as second argument.

The base_url keyword allows setting a URL for the document when parsing from a file-like object. This is needed when looking up external entities (DTD, XInclude, …) with relative paths.
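A minimal sketch that parses from an in-memory file-like object. A real file name or open file works the same way.

```python
import io

from lxml import objectify

source = io.BytesIO(b"<root><a>1</a><b>text</b></root>")

# parse() returns an ElementTree; the objectified root comes from getroot().
tree = objectify.parse(source)
root = tree.getroot()

print(root.a.pyval, root.b.pyval)
```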

lxml.objectify.pyannotate(element_or_tree, ignore_old=False, ignore_xsi=False, empty_pytype=None)

Recursively annotates the elements of an XML tree with ‘pytype’ attributes.

If the ‘ignore_old’ keyword argument is True, current ‘pytype’ attributes will be ignored and replaced. Otherwise (the default, per the signature above), they will be checked and only replaced if they no longer fit the current text value.

Setting the keyword argument ignore_xsi to True makes the function additionally ignore existing xsi:type annotations. The default is to use them as a type hint.

The default annotation of empty elements can be set with the empty_pytype keyword argument. The default is not to annotate empty elements. Pass ‘str’, for example, to make string values the default.

lxml.objectify.pytypename(obj): Find the name of the corresponding PyType for a Python object.
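For instance, the names for a few built-in Python values can be looked up like this:

```python
from lxml import objectify

# Map some sample values to the name of their PyType.
names = [objectify.pytypename(value) for value in (5, 1.5, True, "text")]
print(names)
```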

lxml.objectify.set_default_parser(new_parser=None)

Replace the default parser used by objectify’s Element() and fromstring() functions.

The new parser must be an etree.XMLParser.

Call without arguments to reset to the original parser.

lxml.objectify.set_pytype_attribute_tag(attribute_tag=None)

Change name and namespace of the XML attribute that holds Python type information.

Do not use this unless you know what you are doing.

Reset by calling without argument.

Default: “{http://codespeak.net/lxml/objectify/pytype}pytype”

lxml.objectify.xsiannotate(element_or_tree, ignore_old=False, ignore_pytype=False, empty_type=None)

Recursively annotates the elements of an XML tree with ‘xsi:type’ attributes.

If the ‘ignore_old’ keyword argument is True, current ‘xsi:type’ attributes will be ignored and replaced. Otherwise (the default, per the signature above), they will be checked and only replaced if they no longer fit the current text value.

Note that the mapping from Python types to XSI types is usually ambiguous. Currently, only the first XSI type name in the corresponding PyType definition will be used for annotation. Thus, you should consider naming the widest type first if you define additional types.

Setting the keyword argument ignore_pytype to True makes the function additionally ignore existing pytype annotations. The default is to use them as a type hint.

The default annotation of empty elements can be set with the empty_type keyword argument. The default is not to annotate empty elements. Pass ‘string’, for example, to make string values the default.
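A short sketch that annotates a made-up tree and reads back the ‘xsi:type’ attributes. The values include a namespace prefix chosen by lxml, so the example only looks at the part after the colon.

```python
from lxml import objectify

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

root = objectify.fromstring("<root><a>1</a><b>text</b></root>")
objectify.xsiannotate(root)

# Strip the prefix to get the plain XML Schema type name.
xsi_types = {
    el.tag: el.get(XSI_TYPE).split(":")[-1] for el in root.iterchildren()
}
print(xsi_types)
```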