There are a number of commonly used ways to represent tree structures in a database, each suited for different access patterns. This page attempts to outline the most common ones, along with what operations are easy and hard to do from a database.
The Adjacency List model is the most common and most obvious way of representing a tree structure in a database. Unfortunately, retrieving all descendants of a node, usually the most common operation, is difficult and expensive, requiring many queries on large trees.
Columns | parent_id |
Cheap operations | Retrieve parent, Retrieve children, Insert, Change parent, Delete |
Expensive operations | Retrieve tree, Retrieve descendants, Retrieve ancestors |
This method is similar to the Adjacency List method, with the exception that it stores the ID of the root of the tree in addition to the ID of the direct parent node. When storing multiple trees in a table, such as threads in a discussion board, this makes retrieving the entire tree easy. Retrieving descendants of a small part of the tree remains difficult, however.
Columns | root_id, parent_id |
Cheap operations | Retrieve parent, Retrieve children, Insert, Change parent, Delete, Retrieve tree |
Expensive operations | Retrieve descendants, Retrieve ancestors |
This method works by storing in each node a complete list of ancestors, from the root down. The list is typically either stored as a delimited string (eg, "1,24,500") or as an array of IDs. For ease of querying, the id of the node itself is usually included as the last element in the list. This method makes most tree operations easy, but requires the ancestor list for the parent node to be retrieved before a node can be inserted. On deep trees, this method also results in very large lists, resulting in a lot of storage overhead.
Columns | ancestors (array or string) |
Cheap operations | Retrieve parent, Retrieve children, Insert, Change parent, Delete, Retrieve tree, Retrieve descendants, Retrieve ancestors |
Expensive operations | None |
This method works by assigning a 'left' and 'right' ID to each node. Nodes are numbered in depth-first order, so every node is guaranteed to have a 'left' value greater than the 'left' of all its ancestors, and a 'right' value greater than the 'right' of all its ancestors, and sorting a (sub)tree by its 'left' values will return it in depth-first order.
This is also the only method listed here that allows nodes to have an explicit internal ordering of children of each node other than any external ordering provided.
Note that 'retrieve parent' and 'retrieve children' are listed as expensive operations because they are non-optimal. However, they are still no less efficient than 'retrieve ancestors' and 'retrieve descendants'.
'Delete' is listed as a cheap operation under the assumption that 'left' and 'right' IDs do not need to remain contiguous.
Columns | left, right |
Cheap operations | Delete, Retrieve tree, Retrieve descendants, Retrieve ancestors |
Expensive operations | Retrieve parent, Retrieve children, Insert, Change parent |
This method functions as a combination of the Nested Set and Adjacency List methods, conferring most of the benefits of each.
Columns | parent_id, left, right |
Cheap operations | Retrieve parent, Retrieve children, Delete, Retrieve tree, Retrieve descendants, Retrieve ancestors |
Expensive operations | Insert, Change parent |
This method operates identically to the Nested Set method, described above, with the addition of a depth field, which contains the count of nodes between the current node and the root of the tree. This makes querying direct children and parents cheap.
This is generally an improvement on Nested Set with Adjacency List, since you can retrieve children only to a certain depth.
Columns | left, right, depth |
*Cheap operations | Retrieve parent, Retrieve children, Delete, Retrieve tree, Retrieve descendants, Retrieve ancestors |
Expensive operations | Insert, Change parent |
No page links to DatabaseTreeStructures.