Materialized Path trees

This is an efficient implementation of Materialized Path trees for Django, as described by Vadim Tropashko in SQL Design Patterns. Materialized Path is probably the fastest way of working with trees in SQL without the need of extra work in the database, like Oracle’s CONNECT BY or sprocs and triggers for nested intervals.

In a materialized path approach, every node in the tree will have a path attribute, where the full path from the root to the node will be stored. This has the advantage of needing very simple and fast queries, at the risk of inconsistency because of the denormalization of parent/child foreign keys. This can be prevented with transactions.

django-treebeard uses a particular approach: every step in the path has a fixed width and has no separators. This makes queries predictable and faster at the cost of using more characters to store a step. To address this problem, every step number is encoded.

Also, two extra fields are stored in every node: depth and numchild. This makes the read operations faster, at the cost of a little more maintenance on tree updates/inserts/deletes. Don’t worry, even with these extra steps, materialized path is more efficient than other approaches.

Warning

As with all tree implementations, please be aware of the Known Caveats.

Note

The materialized path approach makes heavy use of LIKE in your database, with clauses like WHERE path LIKE '002003%'. If you think that LIKE is too slow, you’re right, but in this case the path field is indexed in the database, and all LIKE clauses that don’t start with a % character will use the index. This is what makes the materialized path approach so fast.

Inheritance diagram of MP_Node
class treebeard.mp_tree.MP_Node(*args, **kwargs)

Bases: Node

Abstract model to create your own Materialized Path Trees.

Warning

Do not change the values of path, depth or numchild directly: use one of the included methods instead. Consider these values read-only.

Warning

Do not change the values of the steplen, alphabet or node_order_by after saving your first object. Doing so will corrupt the tree.

Warning

If you need to define your own Manager class, you’ll need to subclass MP_NodeManager.

Also, if in your manager you need to change the default queryset handler, you’ll need to subclass MP_NodeQuerySet.

Example:

class SortedNode(MP_Node):
   node_order_by = ['numval', 'strval']

   numval = models.IntegerField()
   strval = models.CharField(max_length=255)

Read the API reference of treebeard.models.Node for info on methods available in this class, or read the following section for methods with particular arguments or exceptions.

steplen

Attribute that defines the length of each step in the path of a node. The default value of 4 allows a maximum of 1679615 children per node. Increase this value if you plan to store large trees (a steplen of 5 allows more than 60M children per node). Note that increasing this value, while increasing the number of children per node, will decrease the max depth of the tree (by default: 63). To increase the max depth, increase the max_length attribute of the path field in your model.

alphabet

Attribute: the alphabet that will be used in base conversions when encoding the path steps into strings. The default value, 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ is the most optimal possible value that is portable between the supported databases (which means: their default collation will order the path field correctly).

Note

In case you know what you are doing, there is a test that is disabled by default that can tell you the optimal default alphabet in your enviroment. To run the test you must enable the TREEBEARD_TEST_ALPHABET enviroment variable:

$ TREEBEARD_TEST_ALPHABET=1 py.test -k test_alphabet

In OS X Mavericks, good readable values for the three supported databases in their default configuration:

Database

Optimal Alphabet

Base

MySQL 5.6.17

0-9A-Z

36

PostgreSQL 9.3.4

0-9A-Za-z

62

Sqlite3

0-9A-Za-z

62

The default value is MySQL’s since it will work in all DBs, but when working with a better database, changing the alphabet value is recommended in order to increase the density of the paths.

For an even better approach, change the collation of the path column in the database to handle raw ASCII, and use the printable ASCII characters (0x20 to 0x7E) as the alphabet.

node_order_by

Attribute: a list of model fields that will be used for node ordering. When enabled, all tree operations will assume this ordering.

Example:

node_order_by = ['field1', 'field2', 'field3']
path

CharField, stores the full materialized path for each node. The default value of it’s max_length, 255, is the max efficient and portable value for a varchar. Increase it to allow deeper trees (max depth by default: 63)

Note

django-treebeard uses Django’s abstract model inheritance, so to change the max_length value of the path in your model, you have to redeclare the path field in your model:

class MyNodeModel(MP_Node):
    path = models.CharField(max_length=1024, unique=True)

Note

For performance, and if your database allows it, you can safely define the path column as ASCII (not utf-8/unicode/iso8859-1/etc) to keep the index smaller (and faster). Also note that some databases (mysql) have a small index size limit. InnoDB for instance has a limit of 765 bytes per index, so that would be the limit if your path is ASCII encoded. If your path column in InnoDB is using unicode, the index limit will be 255 characters since in MySQL’s indexes, unicode means 3 bytes per character.

Note

django-treebeard uses numconv for path encoding.

depth

PositiveIntegerField, depth of a node in the tree. A root node has a depth of 1.

numchild

PositiveIntegerField, the number of children of the node.

classmethod add_root(**kwargs)

Adds a root node to the tree.

This method saves the node in database. The object is populated as if via:

` obj = cls(**kwargs) `

Raises:

PathOverflow – when no more root objects can be added

See: treebeard.models.Node.add_root()

add_child(**kwargs)

Adds a child to the node.

This method saves the node in database. The object is populated as if via:

` obj = self.__class__(**kwargs) `

Raises:

PathOverflow – when no more child nodes can be added

See: treebeard.models.Node.add_child()

add_sibling(pos=None, **kwargs)

Adds a new node as a sibling to the current node object.

This method saves the node in database. The object is populated as if via:

` obj = self.__class__(**kwargs) `

Raises:

PathOverflow – when the library can’t make room for the node’s new position

See: treebeard.models.Node.add_sibling()

move(target, pos=None)

Moves the current node and all it’s descendants to a new position relative to another node.

Raises:

PathOverflow – when the library can’t make room for the node’s new position

See: treebeard.models.Node.move()

classmethod get_tree(parent=None)
Returns:

A queryset of nodes ordered as DFS, including the parent. If no parent is given, the entire tree is returned.

See: treebeard.models.Node.get_tree()

Note

This metod returns a queryset.

classmethod find_problems()

Checks for problems in the tree structure, problems can occur when:

  1. your code breaks and you get incomplete transactions (always use transactions!)

  2. changing the steplen value in a model (you must dump_bulk() first, change steplen and then load_bulk()

Returns:

A tuple of five lists:

  1. a list of ids of nodes with characters not found in the alphabet

  2. a list of ids of nodes when a wrong path length according to steplen

  3. a list of ids of orphaned nodes

  4. a list of ids of nodes with the wrong depth value for their path

  5. a list of ids nodes that report a wrong number of children

Note

A node won’t appear in more than one list, even when it exhibits more than one problem. This method stops checking a node when it finds a problem and continues to the next node.

Note

Problems 1, 2 and 3 can’t be solved automatically.

Example:

MyNodeModel.find_problems()
classmethod fix_tree(destructive=False, fix_paths=False)

Solves some problems that can appear when transactions are not used and a piece of code breaks, leaving the tree in an inconsistent state.

The problems this method solves are:

  1. Nodes with an incorrect depth or numchild values due to incorrect code and lack of database transactions.

  2. “Holes” in the tree. This is normal if you move/delete nodes a lot. Holes in a tree don’t affect performance,

  3. Incorrect ordering of nodes when node_order_by is enabled. Ordering is enforced on node insertion, so if an attribute in node_order_by is modified after the node is inserted, the tree ordering will be inconsistent.

Parameters:
  • fix_paths – A boolean value. If True, a slower, more complex fix_tree method will be attempted. If False (the default), it will use a safe (and fast!) fix approach, but it will only solve the depth and numchild nodes, it won’t fix the tree holes or broken path ordering.

  • destructive – Deprecated; alias for fix_paths.

Example:

MyNodeModel.fix_tree()
class treebeard.mp_tree.MP_NodeManager(*args, **kwargs)

Bases: Manager

Custom manager for nodes in a Materialized Path tree.

class treebeard.mp_tree.MP_NodeQuerySet(model=None, query=None, using=None, hints=None)

Bases: QuerySet

Custom queryset for the tree node manager.

Needed only for the custom delete method.