Materialized Path trees¶
This is an efficient implementation of Materialized Path
trees for Django, as described by Vadim Tropashko in SQL Design
Patterns. Materialized Path is probably the fastest way of working with
trees in SQL without the need of extra work in the database, like Oracle’s
CONNECT BY
or sprocs and triggers for nested intervals.
In a materialized path approach, every node in the tree will have a
path
attribute, where the full path from the root
to the node will be stored. This has the advantage of needing very simple
and fast queries, at the risk of inconsistency because of the
denormalization of parent
/child
foreign keys. This can be prevented
with transactions.
django-treebeard
uses a particular approach: every step in the path has
a fixed width and has no separators. This makes queries predictable and
faster at the cost of using more characters to store a step. To address
this problem, every step number is encoded.
Also, two extra fields are stored in every node:
depth
and numchild
.
This makes the read operations faster, at the cost of a little more
maintenance on tree updates/inserts/deletes. Don’t worry, even with these
extra steps, materialized path is more efficient than other approaches.
Warning
As with all tree implementations, please be aware of the Known Caveats.
Note
The materialized path approach makes heavy use of LIKE
in your
database, with clauses like WHERE path LIKE '002003%'
. If you think
that LIKE
is too slow, you’re right, but in this case the
path
field is indexed in the database, and all
LIKE
clauses that don’t start with a %
character will use
the index. This is what makes the materialized path approach so fast.
- class treebeard.mp_tree.MP_Node(*args, **kwargs)¶
Bases:
Node
Abstract model to create your own Materialized Path Trees.
Warning
Do not change the values of
path
,depth
ornumchild
directly: use one of the included methods instead. Consider these values read-only.Warning
Do not change the values of the
steplen
,alphabet
ornode_order_by
after saving your first object. Doing so will corrupt the tree.Warning
If you need to define your own
Manager
class, you’ll need to subclassMP_NodeManager
.Also, if in your manager you need to change the default queryset handler, you’ll need to subclass
MP_NodeQuerySet
.Example:
class SortedNode(MP_Node): node_order_by = ['numval', 'strval'] numval = models.IntegerField() strval = models.CharField(max_length=255)
Read the API reference of
treebeard.models.Node
for info on methods available in this class, or read the following section for methods with particular arguments or exceptions.- steplen¶
Attribute that defines the length of each step in the
path
of a node. The default value of 4 allows a maximum of 1679615 children per node. Increase this value if you plan to store large trees (asteplen
of 5 allows more than 60M children per node). Note that increasing this value, while increasing the number of children per node, will decrease the maxdepth
of the tree (by default: 63). To increase the maxdepth
, increase the max_length attribute of thepath
field in your model.
- alphabet¶
Attribute: the alphabet that will be used in base conversions when encoding the path steps into strings. The default value,
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
is the most optimal possible value that is portable between the supported databases (which means: their default collation will order thepath
field correctly).Note
In case you know what you are doing, there is a test that is disabled by default that can tell you the optimal default alphabet in your enviroment. To run the test you must enable the
TREEBEARD_TEST_ALPHABET
enviroment variable:$ TREEBEARD_TEST_ALPHABET=1 py.test -k test_alphabet
In OS X Mavericks, good readable values for the three supported databases in their default configuration:
Database
Optimal Alphabet
Base
MySQL 5.6.17
0-9A-Z
36
PostgreSQL 9.3.4
0-9A-Za-z
62
Sqlite3
0-9A-Za-z
62
The default value is MySQL’s since it will work in all DBs, but when working with a better database, changing the
alphabet
value is recommended in order to increase the density of the paths.For an even better approach, change the collation of the
path
column in the database to handle raw ASCII, and use the printable ASCII characters (0x20 to 0x7E) as thealphabet
.
- node_order_by¶
Attribute: a list of model fields that will be used for node ordering. When enabled, all tree operations will assume this ordering.
Example:
node_order_by = ['field1', 'field2', 'field3']
- path¶
CharField
, stores the full materialized path for each node. The default value of it’s max_length, 255, is the max efficient and portable value for avarchar
. Increase it to allow deeper trees (max depth by default: 63)Note
django-treebeard uses Django’s abstract model inheritance, so to change the
max_length
value of the path in your model, you have to redeclare the path field in your model:class MyNodeModel(MP_Node): path = models.CharField(max_length=1024, unique=True)
Note
For performance, and if your database allows it, you can safely define the path column as ASCII (not utf-8/unicode/iso8859-1/etc) to keep the index smaller (and faster). Also note that some databases (mysql) have a small index size limit. InnoDB for instance has a limit of 765 bytes per index, so that would be the limit if your path is ASCII encoded. If your path column in InnoDB is using unicode, the index limit will be 255 characters since in MySQL’s indexes, unicode means 3 bytes per character.
Note
django-treebeard
uses numconv for path encoding.
- depth¶
PositiveIntegerField
, depth of a node in the tree. A root node has a depth of 1.
- numchild¶
PositiveIntegerField
, the number of children of the node.
- classmethod add_root(**kwargs)¶
Adds a root node to the tree.
This method saves the node in database. The object is populated as if via:
` obj = cls(**kwargs) `
- Raises:
PathOverflow – when no more root objects can be added
- add_child(**kwargs)¶
Adds a child to the node.
This method saves the node in database. The object is populated as if via:
` obj = self.__class__(**kwargs) `
- Raises:
PathOverflow – when no more child nodes can be added
- add_sibling(pos=None, **kwargs)¶
Adds a new node as a sibling to the current node object.
This method saves the node in database. The object is populated as if via:
` obj = self.__class__(**kwargs) `
- Raises:
PathOverflow – when the library can’t make room for the node’s new position
- move(target, pos=None)¶
Moves the current node and all it’s descendants to a new position relative to another node.
- Raises:
PathOverflow – when the library can’t make room for the node’s new position
- classmethod get_tree(parent=None)¶
- Returns:
A queryset of nodes ordered as DFS, including the parent. If no parent is given, the entire tree is returned.
See:
treebeard.models.Node.get_tree()
Note
This metod returns a queryset.
- classmethod find_problems()¶
Checks for problems in the tree structure, problems can occur when:
your code breaks and you get incomplete transactions (always use transactions!)
changing the
steplen
value in a model (you mustdump_bulk()
first, changesteplen
and thenload_bulk()
- Returns:
A tuple of five lists:
a list of ids of nodes with characters not found in the
alphabet
a list of ids of nodes when a wrong
path
length according tosteplen
a list of ids of orphaned nodes
a list of ids of nodes with the wrong depth value for their path
a list of ids nodes that report a wrong number of children
Note
A node won’t appear in more than one list, even when it exhibits more than one problem. This method stops checking a node when it finds a problem and continues to the next node.
Note
Problems 1, 2 and 3 can’t be solved automatically.
Example:
MyNodeModel.find_problems()
- classmethod fix_tree(destructive=False, fix_paths=False)¶
Solves some problems that can appear when transactions are not used and a piece of code breaks, leaving the tree in an inconsistent state.
The problems this method solves are:
Nodes with an incorrect
depth
ornumchild
values due to incorrect code and lack of database transactions.“Holes” in the tree. This is normal if you move/delete nodes a lot. Holes in a tree don’t affect performance,
Incorrect ordering of nodes when
node_order_by
is enabled. Ordering is enforced on node insertion, so if an attribute innode_order_by
is modified after the node is inserted, the tree ordering will be inconsistent.
- Parameters:
fix_paths – A boolean value. If True, a slower, more complex fix_tree method will be attempted. If False (the default), it will use a safe (and fast!) fix approach, but it will only solve the
depth
andnumchild
nodes, it won’t fix the tree holes or broken path ordering.destructive – Deprecated; alias for
fix_paths
.
Example:
MyNodeModel.fix_tree()
- class treebeard.mp_tree.MP_NodeManager(*args, **kwargs)¶
Bases:
Manager
Custom manager for nodes in a Materialized Path tree.
- class treebeard.mp_tree.MP_NodeQuerySet(model=None, query=None, using=None, hints=None)¶
Bases:
QuerySet
Custom queryset for the tree node manager.
Needed only for the custom delete method.